Characterizing peer-to-peer streaming flows

Live peer-to-peer (P2P) multimedia streaming applications have been successfully and commercially deployed in the Internet with up to millions of users at any given time. Within the academic community, there has been an increasing level of attention on measurement studies with these real-world streaming applications. Prominent examples that are better known to academia include CoolStreaming (an INFOCOM 2005 paper), PPLive and TVAnts (a significant and increasing number of measurement studies treating them as a “blackbox”).

We expect many more papers in the near-term future taking the same research metholodogy: treating commercial applications as a “blackbox,” and perform extensive measurements by monitoring their traffic characteristics. In fact, Skype has already received a similar level of scrutiny with such a research approach, and if we were allowed a crystal ball (as of early July 2007), we would predict papers on Youtube, Joost, Zattoo, and perhaps even the iPhone when it uses EDGE (or, later, 3G HSDPA) — if we were allowed sufficient resources in addition to that crystal ball before it’s too late, perhaps we would launch a few projects on these ourselves!

As a commonly adopted design for most of the recent successful P2P live streaming applications, blocks of live media contents are being delivered over a mesh overlay topology, featuring a BitTorrent-like reciprocal exchanges of useful content blocks among multiple peers. It is also interesting to observe that most current-generation P2P streaming applications employ relatively simple peer selection and mesh construction protocol designs. They typically use central tracking servers to gain initial knowledge of existing peers in the channels, and periodically exchange peer lists among peers themselves. As mesh-based streaming topologies play an important role towards the commercial success of P2P streaming, it is critical to acquire a thorough and in-depth understanding of the topological characteristics of these large-scale P2P meshes. It would be an intriguing research challenge to investigate how the constructed topologies actually behave in practice, dynamically evolve over time, and react to extreme scenarios such as huge flash crowds.

There has been quite a number of existing papers on exploring P2P topologies of P2P decentralized search (such as Gnutella and its variants). The problem is that such P2P decentralized search applications are quite different from modern P2P streaming applications based on block exchanges and long-lived meshes, leading to possibly different topological properties. As a matter of fact, topological characteristics in block-exchanging P2P streaming applications may well be different from their file-sharing predecessors (such as BitTorrent), due to its biased peer selection protocol design towards the timely delivery of media content.

The bad news is, there are only so much one can do with a measurement-based study treating the application as a “blackbox.” If we wish to gain in-depth insights and a complete understanding of P2P streaming characteristics, we need to collaborate with commercial solution providers and add instrumentation facilities into the product itself! With the Magellan project, we wish to achieve exactly that, by collaborating with UUSee Inc., one of the leading P2P live streaming solution providers in mainland China, supported by reputable venture capital firms in North America.

Our ICDCS 2007 paper, that we just presented in Toronto, Canada, represents the first milestone of Magellan, with our observations from exploring graph theoretical properties in actually formed live streaming topologies. All of our observations are based on over 120 GB of traces and 10 million unique IP addresses that we have collected over a two-month period (September to October 2006) using dedicated trace servers, and instrumented client code to proactively submit reports to trace servers. With emphasis on their evolutionary nature over a long period of time, we have utilized and extended classical graph measurement metrics — such as the degree, clustering coefficient, and reciprocity — to investigate various aspects of the streaming topologies at different times of the day, in different days in a week, and in flash crowd scenarios. We also compare our discoveries with existing results related to file sharing applications, with further insights unique to P2P streaming.

We have brought several insights to the spotlight in this paper. First, we show that the current-generation P2P streaming platform scales very well, even in large flash crowd scenarios. Second, we observe that the degree distribution towards active neighbors in a peer-to-peer mesh does not follow the power-law distribution. Third, we argue that ISP-based clusters are formed from the dynamic peer selection process, carried out during live streaming sessions, even with exceedingly simple algorithms of peer selection. Finally, we believe that mesh P2P topologies do not resemble a tree in reality, and that the level of reciprocity among peers is high, which plays a key role towards the effectiveness of utilizing peer upload bandwidth.

If you are still reading, you may also be interested in our presentation slides dedicated to this work, available as either an Adobe PDF document or an Flash slide show.

Update on September 12, 2007: I just noticed three papers on YouTube trace analysis in IMC 2007 (Internet Measurement Conference). I guess my “crystal ball” works quite accurately when I wrote this short essay back in July: would Joost be the next one to analyze?