Characterizing peer-to-peer streaming flows
Dubbed the “Internet currency” by researchers at Harvard University, available bandwidth among peers is of pivotal importance to large-scale P2P streaming applications. It is therefore important for each peer in a peer-to-peer streaming system to select a small, appropriate set of neighbouring peers with active connections, so that an abundant, or at least satisfactory, amount of bandwidth is available between the peer and each of its neighbours. After all, having a few hundred active neighbours at 1 KB per second each may not be much help in delivering multimedia content in a timely fashion.
The objective of our second milestone in the Magellan project was precisely to see whether it is possible to select “good” active peers with a satisfactory level of available bandwidth. In our recent paper, to appear in the IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on Advances in Peer-to-Peer Streaming Systems, we explore whether imprecise, approximate knowledge of the available TCP bandwidth between two peers can be derived with minimal active probing or, better yet, with no probing at all.
To this end, we conducted an exhaustive investigation of the statistical properties of TCP throughput for streaming flows among peers, using more than 230 GB of UUSee traces and 370 million live streaming flows over a four-month period (November 2006 to February 2007). In particular, we investigated TCP throughput distributions across peer ISP, area, and type categories; statistically tested the correlation between TCP throughput and application-layer factors using regression models; and studied how TCP throughput values evolve over the trace period.
We have made quite a number of interesting observations. First, we have discovered that the ISPs that peers belong to are highly significant in determining inter-peer bandwidth, even more important than their geographic locations. Second, we have also found excellent linear correlations between the availability of peer last-mile bandwidth and inter-peer bandwidth within the same ISP and between a subset of ISPs, with different linear regression coefficients for different pairs of ISPs. Finally, we have observed daily evolutionary patterns of inter-peer bandwidth.
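To make the second observation concrete, the following is a minimal sketch of fitting one linear regression per ISP pair using ordinary least squares. It is an illustration only, not the procedure from the paper: the record layout (source ISP, destination ISP, a single last-mile bandwidth value, measured throughput) and the function names `fit_line` and `fit_per_isp_pair` are assumptions made for this example.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_per_isp_pair(flows):
    """Group flow records by (source ISP, destination ISP) and fit each group.

    Each record is a hypothetical tuple:
    (src_isp, dst_isp, last_mile_kbps, throughput_kbps).
    """
    groups = defaultdict(list)
    for src_isp, dst_isp, last_mile, throughput in flows:
        groups[(src_isp, dst_isp)].append((last_mile, throughput))

    coeffs = {}
    for pair, samples in groups.items():
        xs = [s[0] for s in samples]
        ys = [s[1] for s in samples]
        coeffs[pair] = fit_line(xs, ys)
    return coeffs


# Made-up records for illustration; these are not values from the traces.
flows = [
    ("A", "A", 100.0, 50.0),
    ("A", "A", 200.0, 100.0),
    ("A", "A", 300.0, 150.0),
    ("A", "B", 100.0, 30.0),
    ("A", "B", 200.0, 50.0),
    ("A", "B", 300.0, 70.0),
]
coeffs = fit_per_isp_pair(flows)
```

Keeping a separate (slope, intercept) pair for every ISP pair mirrors the observation that the regression coefficients differ from one pair of ISPs to another.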
Based on these insights, we have designed a “throughput expectation index” that makes it possible to select high-bandwidth peers without performing any measurements. This index is computed based on the ISPs that peers belong to, a table of linear regression coefficients for different pairs of ISPs, and the time of the day. All linear regression coefficients are derived off-line using historical measurement traces. We have cross-checked the set of peers selected using such an index against the set of real-world top-ranked peers in the traces from a different time period, and discovered a surprisingly good match between the two.
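As a rough illustration of how such an index might be used to rank candidate peers, consider the sketch below. The exact definition of the index is given in the paper and is not reproduced here; the formula in this example (a per-ISP-pair linear prediction scaled by an hour-of-day factor), the nominal last-mile value, and all names and numbers are hypothetical stand-ins chosen only to show the three inputs working together without any probing.

```python
# Hypothetical nominal last-mile bandwidth, used so that no measurement is needed.
NOMINAL_LAST_MILE_KBPS = 200.0


def expectation_index(local_isp, peer_isp, hour, coeffs, hour_factor):
    """Score a candidate using only ISP membership, offline coefficients, and time.

    coeffs maps (local_isp, peer_isp) to a (slope, intercept) pair derived offline.
    hour_factor maps an hour of day (0-23) to a multiplier; a missing hour means 1.0.
    The formula is an assumption for illustration, not the paper's definition.
    """
    slope, intercept = coeffs.get((local_isp, peer_isp), (0.0, 0.0))
    predicted = slope * NOMINAL_LAST_MILE_KBPS + intercept
    return predicted * hour_factor.get(hour, 1.0)


def select_peers(local_isp, candidates, hour, coeffs, hour_factor, k):
    """Return the ids of the k candidates with the highest index.

    candidates is a list of (peer_id, peer_isp) tuples.
    """
    ranked = sorted(
        candidates,
        key=lambda c: expectation_index(local_isp, c[1], hour, coeffs, hour_factor),
        reverse=True,
    )
    return [peer_id for peer_id, _ in ranked[:k]]


# Made-up coefficient table and time-of-day factors, for illustration only.
coeffs = {("A", "A"): (0.5, 0.0), ("A", "B"): (0.2, 10.0), ("A", "C"): (0.1, 5.0)}
hour_factor = {21: 0.8}  # e.g. an evening hour with lower expected throughput
candidates = [("p1", "C"), ("p2", "A"), ("p3", "B")]

best = select_peers("A", candidates, 21, coeffs, hour_factor, k=2)
```

Because the coefficient table and the time-of-day factors are computed offline, ranking a candidate list this way requires only dictionary lookups at selection time.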
Interested readers are referred to the web site of the first author, Chuan Wu, for more details.