skip to content

Timetable (SCSW08)

Statistics of Networks

Thursday 24th June 2010 to Friday 25th June 2010

Thursday 24th June 2010
08:50 to 09:00 Opening Remarks INI 1
09:00 to 09:45 H Shen ([North Carolina])
Robust Estimation of the Self-similarity Parameter in Network Traffic
We shall discuss the problem of estimating the self-similarity parameter of network traffic traces. Robust estimators will be reviewed that are less sensitive to commonly encountered non-stationary traffic conditions, such as sudden level shifts and breaks. Numerical results from simulated experiments and real trace applications will be used for illustration.
09:45 to 10:30 M Handcock ([UCLA])
Modeling networks when data is missing or sampled
Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors. Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g. recording) errors. This is clearly not true in practice, as much network data is collected though sample surveys. In addition even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data). In this paper we develop the conceptual and computational theory for inference based on sampled network information. We first review forms of network sampling designs used in practice. We consider inference from the likelihood framework, and develop a typology of network data that reflects their treatment within this frame. We then develop inference for social network models based on information from adaptive network designs. We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network. This is joint work with Krista J. Gile, Nuffield College, Oxford.
10:30 to 11:00 Morning Coffee
11:00 to 11:45 M Coates ([McGill])
The Value of Clustering for Distributing Content in Mobile Social Networks
In several situations it is desirable to distribute content to all users in a wireless network. Examples include the dissemination of software and firmware updates, push-based video-on-demand systems, and news feeds. When the content is not extremely time-sensitive, there is an opportunity to reduce the consumption of server-client bandwidth by exploiting local connections between users. Rather than transmitting the entire content to each user individually, the content is divided between multiple users. When users meet they exchange data, with the result that the content percolates through the network and eventually each user has the entire content. As a network designer, we have no control over the contacts between users, which are dictated by social interactions and user behaviour. We can model the contact events as a random process and, making certain assumptions about the behaviour of this process, design procedures to optimally distribute the content among the users so that it spreads through the network as quickly as possible. In this talk, I will describe a content distribution procedure based on clustering the users into groups using a network clustering algorithm that strives to maximize the conductance of each cluster. The procedure is suboptimal, but the performance deterioration is relatively small, and there are significant improvements in terms of estimation overhead and robustness.
11:45 to 12:30 M Crovella ([Boston])
Inferring Invisible Traffic
A traffic matrix encompassing the entire Internet would be very valuable. Unfortunately, from any given vantage point in the network, most traffic is invisible. In this talk I will describe results that hold some promise for this problem. First, I will show a new characterization result: traffic matrices (TMs) typically show very low effective rank. This result refers to TMs that are purely spatial (have no temporal component), over a wide range of spatial granularities. Next, I will define an inference problem whose solution allows one to infer invisible TM elements. This problem relies crucially on an atomicity property that I will define. Finally, I will show example solutions of this inference problem via two different methods: subset regression and matrix completion. The example consists of an AS inferring the amount of invisible traffic passing between other pairs of ASes. Using this example I will illustrate the accuracy of the methods as a function of spatial granularity.

This work is joint with Lokesh Setia, Vineet Bharti, Pankaj Setia, Gonca Gursun, and Anukool Lakhina.
12:30 to 13:30 Lunch at Wolfson Court
14:00 to 14:45 M Newman ([Michigan])
Community Structure and Link Prediction in Networks
We investigate two questions of broad interest in the study of networks: the discovery of “communities” (regions of unusually high density within networks) and link prediction (judgements about which links in a network are mostly likely to be either false negatives or false positives). We approach both problems using likelihood-based fits to appropriate random graph models, including stochastic block models, edge-placement models, and hierarchical models. The overall conclusion is that these models appear to work well in general, giving useful answers to the questions of interest, but that there are a number of substantive issues, about which our understanding is currently poor, that prevent their wider application.
14:45 to 15:30 MW Mahoney ([Stanford])
Community Structure in Large Social and Information Networks
The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics, a set of people who share common interests, or more generally as a set of nodes in a network more similar amongst themselves than with the remainder of the network. Motivated by difficulties we experienced at actually finding meaningful communities in very large real-world networks, we have performed a large scale analysis of a wide range of social and information networks. Our main methodology used local spectral methods and involved computing isoperimetric properties of the networks at various size scales—a novel application of ideas from statistics and scientific computation to internet data analysis, which required the development of new algorithmic tools, as well as the reinterpretation of the statistical basis underlying traditional spectral approximation algorithms.

Our empirical results suggest a significantly more refined picture of community structure than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis in modern large social and information networks.
15:30 to 16:00 Afternoon Tea
16:00 to 16:45 N Duffield ([AT&T])
New Methods for Sampling and Estimation in Communications Networks
The scale of communications networks, and the immense amount of operational measurements they make available, and the large numerical ranges of data values, together present formidable challenges for data analysis, reduction, and fusion. This talk describes some new sampling based approaches to these problems.
16:45 to 18:15 Poster Session
18:45 to 19:30 Dinner at Wolfson Court
Friday 25th June 2010
09:00 to 09:45 P Bühlmann (ETH Zürich)
Sparse Graphs and Causal Inference
Understanding cause-effect relationships between variables is of interest in many fields of science. To effectively address such questions, we need to look beyond the framework of variable selection or importance from models describing associations only. We will show how graphical modeling and intervention calculus can be used for quantifying intervention and causal effects, particularly for high-dimensional, sparse settings where the number of variables can greatly exceed sample size. Besides methodology and theory we illustrate some findings on gene intervention effects (of single gene deletions) in yeast.
09:45 to 10:30 M Maggioni ([Duke])
Multiscale Methods for the Analysis of Dynamic Graphs
Dynamic graphs arise in a variety of real-world situations: from social networks, to engineered physical networks, to graphs associated with data sets (e.g. financial transactions) that vary in time. The challenges are the need to develop robust tools and metrics for comparing graphs at different times, in order to model statistical significant changes, and capture anomalies: in real-world situation a graph/network will vary stochastically in time with vertex/edge additions/deletions, and classical tools such as graph isomorphism are not robust enough to handle such changes. We use multiscale decompositions of graph and random walks at multiple scales to introduce metrics of change (in time) of a graph, that allow use to capture changes of different magnitude at different scales and “locations” on the graph. We apply these techniques to synthetic graphs as well as real world data sets, and discuss strengths and weaknesses of this approach.
10:30 to 11:00 Morning Coffee
11:00 to 11:45 W Willinger ([AT&T])
When Everything Looks Like a Nail: Graph Models of the Internet
The general appeal of abstracting real-world networks to simple graphs is understandable and has been partly responsible for fueling the new field of “network science”. However, as the Internet application has demonstrated, such abstractions that ignore much of what engineers consider as critical come at a price. For example, they can lead to the study of graph models that have little in common with the real-world networks that motivated these models in the first place. In turn, they tend to focus on cyber-threat models that are largely of academic interest because they are incapable of dealing with the most potent and potentially lethal real-world threats.

Fortunately, the Internet application also suggests an alternative and more engineering-based approach to the “art” of abstracting real-world networks. This approach emphasizes network function over network structure and requires (1) a basic understanding of the process by which network connectivity measurements are obtained, (2) a careful assessment of the kind of inferences that the available measurements can and cannot support, and (3) a break with tradition when it comes to network modeling and model validation. I will illustrate with specific Internet-related examples that ignoring any of these issues is bound to lead to specious network models or alluring results that quickly collapse when scrutinized with real data or examined by domain experts.

11:45 to 12:30 E Volz ([Michigan])
Diffusion in Networks and Infectious Disease Epidemics
I review several special cases of epidemics spreading through random networks which reduce to simple solutions based on ordinary differential equations. This reveals a link between traditional mass action models of epidemics in which contacts are instantaneous and uncorrelated, and networks which have dynamically rearranging ties. I then present several applications to HIV and Influenza. A simple modification to our equations allows us to model sero-sorting of HIV positive individuals, which is the tendency to rearrange relationships to those with matching infection status. Other extensions to the model are explored, including diffusion in networks with clustering (transitivity), and heterogeneous susceptibility and infectiousness. We find that ODE approximations using probability generating functions are more precise and computationally tractable than corresponding systems based on pair approximation.
12:30 to 13:30 Lunch at Wolfson Court
14:00 to 14:45 A Feldmann (Technische Universität Berlin)
An Opportunity for ISP and Application Collaboration
Our analysis of residential broadband Internet traffic reveals a number of surprises in terms of the mental models we developed from the measurement literature. For example, we find that HTTP— not peer-to-peer—traffic dominates by a significant margin; that more often than not the home user’s immediate ISP connectivity contributes more to the round-trip times the user experiences than the WAN portion of the path; and that the DSL lines are frequently not the bottleneck in bulk-transfer performance.

Moreover, we note that applications often put their own traffic optimization strategies in place. Among the popular examples are Content Distribution Networks, large content providers with multiple data centers around the world, as well as P2P networks. However, these applications rarely respect or even consider the underlying network. To overcome this, we suggest that applications and ISPs should collaborate. We propose and evaluate the feasibility of a solution using as ISP offered information service. 16
14:45 to 15:30 M Roughan ([Adelaide])
Statistically Accurate Network Measurements
Estimations of Internet loss rates from measurements often have large statistical errors. Any meaningful application of Internet loss measurement therefore needs to incorporate these uncertainties in the results. However, estimating measurement errors is difficult as it requires the knowledge of the loss process itself. Many experiments use a simple and unrealistic Bernoulli model for the loss process and severely underestimate these errors.

Here we develop SAIL a measurement tool that can accurately estimate not only the loss rate of an end-to-end path but also its statistical errors. The key idea in our method is to capture the correlation between packet losses using an ON/OFF renewal model and to use a Hidden Semi- Markov Model algorithm to infer the parameters of the ON and OFF periods from measurement data. Once the model parameters are known, statistical properties of the loss process such as the loss rate, its variance, and the distribution of the loss and no-loss bursts can be easily computed.

This is co-work with Hung Nguyen.
15:30 to 16:00 Afternoon Tea
16:00 to 16:45 E Kolaczyk ([Boston])
(Anti)social Behavior in Malicious Internet Source IPs: Characterisation and Detection
We consider the problem of monitoring Internet traffic at the IP address level, for the purpose of identifying malicious source IPs. This problem is highly challenging, due to such diverse factors as data volume, limited measurement vantage, sampling effects, and user privacy concerns. Moreover, efforts typically are made for traffic from the very IP addresses we seek to detect to blend in with the rest of (normal) traffic. In this talk, we present work characterising the traffic behavior of IP source addresses from a social network perspective and exploiting our characterizations to build simple but effective detection tools. Specifically, we analyze network flow data, collected on a major Internet backbone network, in combination with log records from Internet security programs, using both local and global network representations and network analysis tools. Our findings are twofold. First, we show that malicious source nodes in IP traffic are distinctive in their communication behavior, in that they “interact” with other nodes without substantively ‘participating’ in the natural communities within which the latter exist. Second, we demonstrate that, with appropriate social network analysis tools, this behavior can be exploited in developing detection algorithms. This is joint work with Qi Ding, Natallia Katenka, Paul Barford, and Mark Crovella.
16:45 to 17:30 P Thiran ([EPFL])
Locating Congested Links in the Internet with Unicast Probes
How can we locate congested IP (Internet Protocol) links from end-to-end measurements using active probing mechanisms? For practical reasons, we want to avoid using IP multicast protocols, which are not widely deployed, and to avoid relying on tight temporal synchronization between beaconing nodes, which is difficult to achieve between distant sites. Like other problems in network tomography or traffic matrix estimation, this inverse problem is ill-conditioned: the end-of-end measurement outcomes do not allow to uniquely identify the variables representing the status of the IP links. To overcome this critical problem, current methods often use the unrealistic assumption that all IP links have the same prior probability of being congested. We find that this assumption is not needed: spatial correlations are sufficient to either learn these probabilities, or to identify the variances of the link loss rates. We can then use the learned probabilities or variances as priors to find rapidly the congested links at any time, with an order of magnitude gain in accuracy over existing algorithms. These solutions scale well and are therefore applicable in today’s Internet, as shown by the results obtained both by simulation and real implementation using the PlanetLab network over the Internet.

This joint work with Hung X. Nguyen (Univ. of Adelaide), Denisa Ghita, Katerina Argyrak, and Maciej Kurant (EPFL).
18:45 to 19:30 Dinner at Wolfson Court
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons