Research for Dragonfly Streaming Scheduler

Research notes for the design of the Dragonfly Streaming Scheduler.

Here, P2P streaming means that the data downloaded by a peer, which will then be passed to the target place, is not stored in the local file system. Instead, the data is kept in memory and sent directly to the target node.

This is necessary for P2P downloading when the peer server has a cloud disk. In that case, writing the data to the peer server's file system is itself limited by network I/O. If P2P streaming is adopted, that writing step is eliminated entirely.
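As a minimal sketch of the idea, assuming illustrative names (relay is not a real Dragonfly function), piece data can flow from the source connection straight to the target through an in-memory buffer, with no write to the local file system:

```go
package streaming

import "io"

// relay copies downloaded piece data straight from the source
// connection to the target writer, buffering only in memory and
// never touching the local file system.
func relay(dst io.Writer, src io.Reader) error {
	// 64 KiB in-memory buffer; no intermediate file is created.
	buf := make([]byte, 64<<10)
	_, err := io.CopyBuffer(dst, src, buf)
	return err
}
```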

In this post, I research how to design the scheduler on the supernode/server side to distribute pieces among peer servers.

Is Random Scheduling Sufficient in P2P Video Streaming?

P2P streaming systems can be broadly classified into two categories: tree-based and mesh-based. Tree-based systems have well-organized overlay structures and typically distribute data from a peer to its child peers. Mesh-based systems, by contrast, are not restricted to a static topology: peering relationships are established and terminated based on content availability and bandwidth availability at peers. Dragonfly is based on the latter type of system.

Baseline Scheduling: Random Pull Scheduling

This scheduling algorithm is similar to the current DF algorithm. I'll summarize it below:

  • Distribute pieces evenly. Select the piece with the smallest number of copies in the entire P2P network, so that the distribution of each piece stays balanced and no piece becomes a scarce resource. (A sketch combining this rule with the next one follows this list.)
  • Nearest distance first. For a peer, the piece closest to the piece currently being downloaded is selected preferentially, so that the peer approximates sequential reads and writes, which improves file I/O efficiency.
  • Local blacklist and global blacklist. An example makes this easier to understand: when peer A fails to download from peer B, B is added to A's local blacklist, and A's subsequent download tasks filter B out. When the number of failed downloads from B reaches a certain threshold, B is added to the global blacklist, and all download tasks filter B out.
  • Self-isolation. After peer A fails to download from other peers multiple times, it will only download from the supernode and is also added to the global blacklist, so peer A no longer interacts with any peer other than the supernode.
  • Peer load balancing. This mechanism controls the number of pieces each peer can upload and download simultaneously, as well as its priority as a target peer.
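As a concrete illustration of the first two rules, here is a minimal Go sketch. The types and names (Piece, NextPiece) are hypothetical, not Dragonfly's actual data model, and candidates are assumed to be pre-filtered by the local and global blacklists.

```go
package scheduler

import "sort"

// Piece is a candidate piece as seen by the scheduler.
type Piece struct {
	Index  int // position of the piece within the file
	Copies int // number of peers in the network holding this piece
}

// NextPiece returns the rarest candidate piece; ties are broken by
// distance to the piece currently being downloaded, approximating
// sequential I/O on the downloading peer.
func NextPiece(candidates []Piece, current int) (Piece, bool) {
	if len(candidates) == 0 {
		return Piece{}, false
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].Copies != candidates[j].Copies {
			return candidates[i].Copies < candidates[j].Copies // rarest first
		}
		return abs(candidates[i].Index-current) < abs(candidates[j].Index-current)
	})
	return candidates[0], true
}

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}
```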

Adaptive Queue-based Chunk Scheduling

This scheduling algorithm is not particularly useful for the DF scenario.

A pull signal is sent from peer a to the server whenever its forwarding queue becomes empty or falls below a threshold (step 1 in Figure 1). The server responds to a pull signal by sending three data chunks back to peer a (step 2). These chunks are stored in the forwarding queue (step 3) and relayed to peer b and peer c (step 4). When the server has responded to all 'pull' signals in its 'pull' signal queue, it uploads one data chunk to all peers (step 5). That data chunk is not stored in a peer's forwarding queue and is not relayed further.
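For clarity, here is a minimal Go sketch of the server-side loop described above. The three-chunks-per-pull constant follows the description; all type and field names are assumptions for illustration.

```go
package aqcs

// Chunk and Peer are assumed types; the paper's actual protocol
// messages are more involved.
type Chunk struct {
	Data    []byte
	Forward bool // receiver should store this chunk and relay it
}

type Peer interface {
	Send(c Chunk)
}

const chunksPerPull = 3 // step 2: the server answers each pull with 3 chunks

type Server struct {
	pulls []Peer       // step 1: peers whose pull signals are queued
	next  func() Chunk // produces the next data chunk from the stream
	peers []Peer       // all connected peers
}

// serveRound answers every queued pull signal with chunksPerPull
// forwardable chunks (the receiving peer performs steps 3 and 4:
// queue the chunks and relay them), then uploads one non-forwardable
// chunk to every peer (step 5).
func (s *Server) serveRound() {
	for len(s.pulls) > 0 {
		p := s.pulls[0]
		s.pulls = s.pulls[1:]
		for i := 0; i < chunksPerPull; i++ {
			c := s.next()
			c.Forward = true
			p.Send(c)
		}
	}
	c := s.next()
	c.Forward = false // consumed locally, never relayed further
	for _, p := range s.peers {
		p.Send(c)
	}
}
```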

Random Useful Packet Forwarding

This paper introduces a brand-new concept: fresh data chunks.

The source server in RUPF always gives the highest priority to fresh data chunks, i.e. chunks that have not been sent out before.

There are also differences between the RUPF and DF architectures. The former is designed so that each peer has to choose a subset of peers to serve. A DF peer, in contrast, serves requests from other peers as scheduled by the supernode; the peer itself makes no such choice. Thus the concept of deprived peers can be dropped in DF.

This brand-new concept has a distinctive significance in streaming.

A Proposal: Making Fresh Data Chunks the First Priority in Piece Scheduling

Scenario: suppose there are three streaming peers A, B, and C, and one non-streaming peer D. All of A, B, C, and D hold pieces that another peer E needs: A, B, and C hold piece a, while D holds piece b, and E is interested in both. Under the current scheduling algorithm, E would first download b from D (it has the fewest copies), and then download a from one of A, B, or C.

As I currently understand it, although piece a is held by three peers, it will be removed soon (streaming peers keep data only in memory), so its availability may actually be lower than that of piece b. In other words, I believe data stored by non-streaming peers has higher availability than data held in streaming transfers. So I would prefer to download a first and b afterwards.

Then, in the paper Is Random Scheduling Sufficient in P2P Video Streaming?, I came across the concept of fresh data chunks: the pieces that have been shared the fewest times, or never shared at all. If this is used as the first scheduling priority, then whenever piece b has been shared more times than piece a, the problem above is solved.
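A minimal Go sketch of the proposed ordering, with hypothetical types and fields (Candidate, TimesShared, Copies): freshness (fewest shares) becomes the first key, and the existing rarest-first rule becomes the tie-breaker.

```go
package scheduler

import "sort"

// Candidate is a piece that peer E could download next.
type Candidate struct {
	PieceID     string
	TimesShared int // how often this piece has been uploaded to peers
	Copies      int // number of peers currently holding it
}

// Prioritize sorts candidates so that the freshest piece (fewest
// shares) comes first; ties fall back to the rarest-first rule.
// In the scenario above, if piece a has been shared fewer times
// than piece b, a is now scheduled before b.
func Prioritize(cs []Candidate) {
	sort.Slice(cs, func(i, j int) bool {
		if cs[i].TimesShared != cs[j].TimesShared {
			return cs[i].TimesShared < cs[j].TimesShared // fresh first
		}
		return cs[i].Copies < cs[j].Copies // rarest first as tie-breaker
	})
}
```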
