Archive for the ‘Network coding’ Category

On Large-Scale P2P Streaming Systems with Network Coding

Monday, June 30th, 2008

Recent years have witnessed real-world success of peer-to-peer (P2P) live streaming systems, especially in start-up companies. Given these success stories, it is not surprising that there is a large base of research aiming at designing “good” P2P streaming systems. Even with wide research attention, however, a number of fundamental performance metrics that characterize “good” P2P streaming systems are not yet well understood. Let us revisit a few performance metrics as examples.

With respect to playback quality, if media segments do not arrive in a timely fashion, they have to be skipped at playback, which degrades the playback quality. How do we consistently maintain high playback quality at all participating peers? With respect to the initial buffering delay, which is experienced by a peer when it first joins or switches to a new channel, how do we improve user experience with the shortest initial buffering delay? With respect to server bandwidth costs, how do we minimize such costs by maximizing bandwidth contribution from participating peers? Last but not the least important, how do we design a system that scales well to accommodate a large flash crowd and a high degree of peer dynamics?

We believe that these performance metrics should be given priority when evaluating a protocol that is designed specifically for peer-to-peer live streaming. The playback quality and the initial buffering delay matter most to the user experience, which determines the level of user satisfaction. The server bandwidth costs, however, matter most to the companies in operation, as they directly determine most of the ongoing operational costs.

While it may be possible to implement a protocol design in real-world systems without “paper and pencil” analytical studies, we believe in the values of such analytical studies of large-scale P2P streaming systems. Analytical studies help us to understand the fundamentals with mathematical rigor, rather than by resorting to intuition or trial-and-error. Analytical studies also help us to understand properties that are hard to evaluate with real-world studies without risks, such as the ability to scale well to a large number of peers with good performance.

At first glance, developing a theoretical framework that helps protocol designers to understand all of these performance metrics may appear too good to be true. Most existing analytical studies therefore focused on only one or two performance metrics instead. For instance, Y. Liu [ACM Multimedia 2007] provided a centralized streaming algorithm that achieves minimum delay. Massoulie et al. [INFOCOM 2007] proposed a streaming algorithm that approaches maximum streaming rate. Bonald et al. [SIGMETRICS 2008] identified several streaming algorithms that achieve near-optimal streaming rate and delay.

In contrast, in our recent research paper to appear in ACM Multimedia 2008, we seek to mathematically analyze and understand a new family of streaming protocols, simultaneously considering all these performance metrics. Our rescue comes from the use of network coding, which not only eliminates some of the mathematical difficulties associated with previous theoretical models, but also leads to simple and practical streaming protocols. This simplification allows us to take into account all of the important performance metrics, even under a more realistic system model.

We identify several design principles for streaming systems with network coding and show that any protocol following our design principles is sufficient to achieve provably good overall performance in realistic settings. In particular, we have provided sufficient conditions on the required server bandwidth costs to ensure smooth playback under peer dynamics, and with short initial buffering delays. A closer look at these conditions further suggests that our design principles are able to achieve near-optimal bandwidth utilization during flash crowds and to handle severe peer dynamics with ease.

Our design principles can be readily implemented in practice with low overhead. In fact, our previous work on R2 is a particular example of such implementations. Through such an example of a practical protocol design, combined with our theoretical framework of fundamental properties, we have provided a striking demonstration of an effective streaming protocol design with network coding. We expect to see real-world implementations of our design principles in the near future. Clearly, the design principles identified in our work might not be unique, and our analytical studies are concerned with a family of such protocols, focusing on important elements, rather than protocol details. We believe our theoretical framework will shed considerable light into the journey towards the best possible protocols in P2P streaming systems.

A slide presentation on this project is available for further reading, in both Adobe Flash and Adobe PDF formats.

R2: Practical Peer-to-Peer Live Streaming with Network Coding

Saturday, October 13th, 2007

Recent real-world success of peer-to-peer live streaming, such as UUSee and PPLive, has led to the question of what further improvements one can make to existing streaming protocol design. It is well known that Peer-to-Peer (P2P) streaming conserves server bandwidth by shifting the load to peers and their ISPs, and as such scales well to a larger population of subscribers, and offers better resilience to flash crowds, when server bandwidth supply is temporarily insufficient to meet demand. However, when pushed too far, it may be hard to monitor and control the quality of streaming playback, as routine peer departures are detrimental to streaming performance in a transient fashion. The delay in delivering a live event to all peers is much higher with relaying, and the initial buffering delay can be higher.

Previous research in the literature has focused on constructing high-quality peer-level topologies. Tree-based per-slice push, such as Chunkyspread [ICNP 2006], may shorten the delays between live events and their playback time, but they suffer overhead of maintaining trees and unavoidable implementation complexity. Mesh-based per-segment pull, such as CoolStreaming [INFOCOM 2005], may be simple to implement and offer better resilience to frequent peer departures, but the overhead of requests and buffer availability exchanges is quite significant, and the delays between live events and their playback time may be much longer.

With the recent surge of interest in practical network coding, it is interesting to see if network coding would be helpful in delivering performance gains in P2P streaming, with respect to three important performance metrics: (1) resilience to peer dynamics; (2) the utilization of scarce server bandwidth; and (3) the time required for initial buffering delays when switching to new channels.

In the R2 project, our objective is to design, analyze, and implement a new peer-to-peer live streaming protocol, that hopefully combines the benefits of tree-based per-slice push and mesh-based per-segment pull, but without much complexity in its implementation. In the first phase, we are interested in weather network coding can offer any tangible benefits in P2P streaming, without revising the traditional mesh-based per-segment pull protocol. We have performed an experimental evaluation of the benefits of network coding over a vanilla pull-based streaming protocol, with the conclusion that there exist some benefits offered by network coding, but they are nothing to write home about. Our papers summarizing results in this step have been published in IEEE Transactions on Multimedia (PDF), and IEEE INFOCOM 2007 (PDF).

In the second phase, we are naturally curious about the possibility of designing a much improved protocol that is tailored to take full advantage of network coding. Towards this objective, we have designed R2 (Random push with Random network coding), and believed that R2 enjoys the advantages of both push-based and pull-based streaming protocols. To show its practicality and effectiveness, our experimental results are based on a complete implementation of R2 with emulated peers on a server cluster, coupled with an optimized library of random network coding tuned for performance. Our paper on R2 is to appear in the December 2007 issue of the IEEE Journal on Selected Areas in Communications, Special Issue on Advances in P2P Streaming.

If you are still reading, you may also be interested in our presentation slides dedicated to the R2 project, available as an Adobe PDF document.

Tuning the implementation of network coding

Wednesday, June 20th, 2007

Since 2001, network coding has received a tremendous amount of research attention in the networking community. The essence of network coding is to allow coding at intermediate network nodes. Network coding is shown to help to resist dynamics over time in peer-to-peer (P2P) content distribution, or to improve throughput in wireless networks.

The lack of commercial use of network coding in any P2P applications may be due to the high coding complexity of random linear codes, as the number of blocks to code scales up. As random linear nodes are used, or implicitly assumed, in almost all practical proposals of using network coding in recent papers, we thought it would be a good idea to see how much coding — what may be the typical block length — that modern desktop processors are able to handle, by carefully designing a performanced-tuned implementation of random network coding. Ideally, such an implementation would use the best development tools available today, and take advantage of all available hardware support from modern processors. We thought it would be an important and challenging step towards commercial use of network coding, especially in P2P applications that use modern processors.

Our recent paper titled “Parallelized Progressive Network Coding With Hardware Acceleration,” to appear in the Proceedings of IEEE IWQoS 2007 (held in Chicago, from June 21-22, 2007), has presented results from such an implementation that we have completed. In this implementation, we have used a combination of three tricks: progressive decoding using Gauss-Jordan elimination so that blocks can be decoded on-the-fly as they are received, a loop-based implementation of GF(256) multiplication that takes advantage of SIMD vector instructions in modern processors, and a threaded implementation that aggressively utilizes multi-core CPUs and their L2 caches. The result has been a network coding implementation written in C++, with a strong focus on multi-core performance, optimized for Intel, AMD, and PowerPC processor families (AltiVec on PowerPC and SSE2 on x86). In addition, our code takes care of platform-specific details (e.g., byte-aligned memory allocation for SIMD) and readily compiles and runs on Microsoft Windows, Mac OS X, and Linux.

This graph shows the decoding performance of our implementation (where n is the number of blocks to code):

performance.gif

This table shows the performance of both encoding and decoding on various processors and operating systems:

comparison.gif

A slide presentation of this work is available as an Adobe PDF document or a Flash slide show.

This work has brought us some good news. We show that for practical block sizes and block lengths (such as 128 blocks of 4 KB each), it is practical to perform network coding on-the-fly in P2P applications, with little computational overhead (since most of the time required for coding overlaps with the time for transmitting or receiving coded data). If the network is the bottleneck, the load on processors should be light, as the coding performance we have shown is far better than typical DSL bandwidth (at most 500 KB per second).