skip to content

Timetable (SNAW01)

Graph limits and statistics

Monday 11th July 2016 to Friday 15th July 2016

Monday 11th July 2016
09:00 to 09:35 Registration
09:35 to 09:45 Welcome from John Toland (INI Director) and Organisers INI 1
09:45 to 10:30 Cristina Toninelli
Kinetically constrained spin models
We shall discuss kinetically constrained spin models (KCSM), a class ofinteracting particle systems which have been introduced in physics liter-ature to model liquid-glass and more general jamming transitions. Theevolution of KCSM follows a continuous time Markov process whoseelementary moves are birth and death of particles. The key feature isthat a move occurs only if the conguration satises a local constraint.These constraints induce the existence of clusters of blocked particles,the occurrence of several invariant measures and ergodicity breakingtransitions. We shall present some results concerning the identicationof the ergodic regime and the exponential convergence to equilibriumin this regime. After establishing a connection with monotone cellu-lar automata of bootstrap percolation type, we shall conclude our talkwith a discussion of a particular KCSM for which the correspondingcellular automaton has a peculiar discontinuous percolation transition.
10:30 to 11:15 Peter Csikvari
Statistical matching theory with a glimpse towards extremal regular graphs
In this talk we will survey some recent development on statistical properties of matchings of very large and infinite graphs. The main goal of the talk is to describe a few applications of a new concept called matching measure. These applications include new results on the number of (perfect) matchings in large girth graphs as well as simple new proofs of certain statistical physical theorems. In particular, we will sketch a new proof of Schrijver's lower bound for the number of perfect matchings of regular bipartite graphs. We will also mention some extremal graph theoretic results about other graph parameters of regular graphs.

This talk is based on joint papers with various subsets of Miklos Abert, Peter E. Frenkel, Tamas Hubai and Gabor Kun.
11:15 to 11:45 Morning Coffee
11:45 to 12:30 Oliver Riordan
The phase transition in Achlioptas processes
The classical random graph process starts with a fixed set of n vertices and no edges. Edges are then added one-by-one, uniformly at random. One of the most interesting features of this process, established by Erdős and Rényi more than 50 years ago, is the phase transition near n/2 edges, where a single `giant' component emerges from a sea of small components. This example serves as a starting point for understanding phase transitions in a wide variety of other contexts. Around 2000, Dimitris Achlioptas suggested an innocent-sounding variation of the model: at each stage two edges are selected at random, but only one is added, the choice depending on (typically) the sizes of the components it would connect. This may seem like a small change, but these processes do not have the key independence property that underlies our understanding of the classical process. One can ask many questions about Achlioptas processes; the most interesting concern the phase transition: does the critical value change from n/2? Is the nature of the transition the same or not? I will describe a number of results on these questions from joint work with Lutz Warnke.
12:30 to 13:30 Lunch @ Wolfson Court
13:45 to 14:30 Paul Smith
Towards universality in bootstrap percolation
Bootstrap percolation is a broad class of monotone cellular automata, which has links to the Glauber dynamics of the Ising model and other areas of statistical physics. Starting with random initial conditions, the question is to determine the threshold for complete occupation of the underlying graph. Until relatively recently, only nearest-neighbour models (and relatively minor variants of these models) had been studied -- and these are now very well understood. In this talk I will discuss a new `universality' theory for bootstrap percolation, which has emerged in the last few years. In particular, I will explain the classification of two-dimensional models, give more precise results for so-called `critical' models (also in two dimensions), and talk about a new classification theorem for higher dimensional models.
14:30 to 15:00 Peter Orbanz
Subsampling, symmetry and averaging in networks
Consider a very large graph---say, the link graph of a large social network. Now invent a randomized algorithm that extracts a smaller subgraph. If we use the subgraph as sample data and perform statistical analysis on this sample, what can we learn about the underlying network? Clearly, that should depend on the subsampling algorithm. I show how the choice of algorithm defines a notion of (1) distributional invariance and (2) of averaging within a single large graph. Under suitable conditions, the resulting averages satisfy a law of large numbers, such that statistical inference from a single sample graph is indeed possible. From this algorithmic point of view, graphon models arise from a specific choice of sampling algorithm, various known pathologies of these models are explained as a selection bias.

15:00 to 15:30 Afternoon Tea & Posters
15:30 to 16:00 Francois Caron
Sparse and modular networks using exchangeable random measures
Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, it is well known that the graph is necessarily either dense (the number of edges scales quadratically with the number of nodes) or trivially empty.
Here, we instead consider representing the graph as a measure on the plane. For the associated definition of exchangeability, we rely on the Kallenberg representation theorem (Kallenberg, 1990). For certain choices of such exchangeable random measures underlying the graph construction, the network process is sparse with power-law degree distribution, and can accommodate an overlapping block-structure.
A Markov chain Monte Carlo algorithm is derived for efficient exploration of the posterior distribution and allows to recover the structure of a range of networks ranging from dense to sparse based on our flexible formulation.

Joint work with Emily Fox and Adrien Todeschini

16:00 to 16:30 David Blei
Exponential Family Embeddings
Word embeddings are a powerful approach for capturing semantic  similarity among terms in a vocabulary.  In this talk, I will describe exponential family embeddings, a class of methods that  extends the idea of word embeddings to other types of  high-dimensional data. As examples, we studied neural data with real-valued observations, count data from a market basket analysis, and ratings data from a movie recommendation system.  We then extended this idea to networks, giving a novel type of "latent space" model.  The main idea behind an EF-EMB is to model each observation conditioned on a set of other  observations.  This set is called the context, and the way the  context is defined is a modeling choice that depends on the problem.  In language the context is the surrounding words; in neuroscience  the context is close-by neurons; in market basket data the context is other items in the shopping cart; in networks the context is edges emanating from a node pair.  Each type of embedding model  defines the context, the exponential family of conditional  distributions, and how the latent embedding vectors are shared  across data. We infer the embeddings with a scalable algorithm based  on stochastic gradient descent.  We found exponential family embedding models to be more effective than other types of dimension reduction.  They better reconstruct held-out data and find interesting qualitative  structure.
16:30 to 17:30 Welcome Wine Reception
Tuesday 12th July 2016
09:00 to 09:30 David Aldous
A framework for imperfectly observed networks
Model a network as an edge-weighted graph, where the (unknown) weight $w_e$ of edge $e$ indicates the frequency of observed interactions, that is over time $t$ we observe a Poisson($t w_e$) number of interactions across edges $e$.  How should we estimate some given statistic of the underlying network?   This leads to wide-ranging and challenging problems.
09:30 to 10:00 Richard Kenyon
Can you see the sound of a drum?
We discuss planar networks and harmonic functions on them, and in particular two reconstruction problems for "hidden" networks:
1. To what extent can one reconstruct a network from boundary measurements?
2. Can one reconstruct conductances from a ``harmonic embedding" of the network?
10:00 to 10:30 Charles Radin
Emergent phases in constrained random networks
Structures are shown to emerge in large random networks, under variable
constraints, when examined at large scale. They are automatically created
by the coalescence of nodes into a small number of equivalence classes,
which must cooperate to create structure in order to satisfy the constraints.
The structures change gradually as constraints vary within a phase, and
change sharply at phase transitions.

This talk is based on joint work with Richard Kenyon, Kui Ren and Lorenzo Sadun.

Related Links
10:30 to 11:00 Morning Coffee
11:00 to 11:30 Nikhil Srivastava
Spectral Sparsification of Graphs
Modern graph algorithms increasingly access and manipulate graphs via their Laplacian operators and other associated “continuous” objects, rather than purely discretely.  An important primitive in this paradigm is spectral sparsification: being able to approximate the Laplacian of a given graph by that of a graph with significantly fewer edges. I will survey some of the key results in this area, drawing on tools from random matrix theory, matrix analysis, and electrical network theory.
11:30 to 12:00 He Sun
Partitioning Well-Clustered Graphs: Spectral Clustering Works!
We study variants of the widely used spectral clustering that partitions a graph into k clusters by (1) embedding the vertices of a graph into a low-dimensional space using the bottom eigenvectors of the Laplacian matrix, and (2) grouping the embedded points into k clusters via k-means algorithms. We show that, for a wide class of  graphs, spectral clustering gives a good approximation of the optimal clustering. While this approach was proposed in the early 1990s and has comprehensive applications, prior to our work  similar results were known only for graphs generated from stochastic models.

We also give a nearly-linear time algorithm for partitioning well-clustered graphs based on  computing a matrix exponential andapproximate nearest neighbor data structures.

Based on joint work with Richard Peng (Georgia Institute of Technology), and Luca Zanetti (University of Bristol).

12:00 to 12:30 Amin Saberi
Thin spanning trees and their algorithmic applications
Motivated by Jaeger's modular orientation conjecture, Goddyn asked the following question: 

A spanning tree of a graph G is called epsilon-thin if it contains at most an epsilon fraction of the edges of each cut in that graph. Is there a function f:(0,1)→
such that every f(epsilon)-edge-connected graph has an epsilon-thin spanning tree?
I will talk about our journey in search of such thin trees, their applications concerning traveling salesman problems, and unexpected connections to graph sparsification and the Kadison-Singer problem.

12:30 to 13:30 Lunch @ Wolfson Court
13:30 to 14:00 Haiyan Huang
Inferring gene-gene associations and gene networks beyond standard statistical models
With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to infer gene interactions has been a major challenge in systems biology. In this talk, I will discuss two methods of finding gene associations that differ in their considerations of how genes behave across the given samples. The first method applies to the case where the patterns of gene association may change or only exist in a subset of all the samples. The second method goes beyond pairwise gene relationships to higher level group interactions, but requiring similar gene behaviours across all the samples. We compare both methods to other popular approaches using simulated and real data, and demonstrate they lead to better general performance and capture important biological features in certain situations that are missed by the other methods.
14:00 to 14:30 Florian Markowetz
Understanding genetic interaction networks
Genes do not act in isolation, but rather in tight interaction networks. Maps of genetic interactions between pairs of genes are a powerful way to dissect these relationships. I will discuss two statistical approaches to better understand the functional content of genetic interaction networks and the mechanism underlying them.

First, I will present a method that adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guilt-by-association principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks like other enrichment methods annotate lists of genes.

Second, I will describe a method to understand genetic interactions based on high-dimensional phenotypes. I will present methodology we developed to test the hypothesis that complex relationships between a gene pair can be explained by the action of a third gene that modulates the interaction. Our approach to test this hypothesis builds on Nested Effects Models (NEMs), a probabilistic model tailored to inferring networks from gene perturbation data. We have extended NEMs with logical functions to model gene interactions and show in simulations and case studies that our approach can successfully infer modulators of genetic interactions and thus lead to a better understanding of an important feature of cellular organisation.

Related Links
14:30 to 15:00 Afternoon Tea & Posters
15:00 to 15:30 Michael Leung
A Weak Law for Moments of Pairwise-Stable Networks
We develop asymptotic theory for strategic network-formation models under the assumption that the econometrician observes a single large pairwise-stable network. Drawing on techniques from the literature on random graphs, we derive primitive restrictions on the model that establish a weak law of large numbers for a useful class of network moments. Under these restrictions, we show that the model generates realistic networks that are sparse and may contain "giant" connected subnetworks, two well-known properties of real-world social networks. The restrictions also conveniently suggest a new method to simulate counterfactual networks that avoids a well-known curse of dimensionality. Additionally, we characterize the identified set of structural parameters based on a tractable class of dyad-level network moments and construct consistent set estimators.

Related Links
15:30 to 16:00 Oleg Sofrygin
Estimation of Causal Effects in Network-Dependent Observational Data
Co-author: Mark J. van der Laan (University of California, Berkeley, CA)

We outline the framework of targeted maximum likelihood estimation (TMLE) in observational network data. Consider a dataset in which each observational unit is causally connected to other units via a known social or geographical network. For each unit we observe their baseline covariates, their exposure and their outcome, and we are interested in estimating the effect of a single time-point stochastic intervention. We propose a semi-parametric statistical model that allows for between-unit dependencies: First, unit-level exposure can depend on the baseline covariates of other connected units. Second, the unit-level outcome can depend on the baseline covariates and exposures of other connected units. We impose some restrictions on our model, e.g., assuming that the unit's exposure and outcome depend on other units as some known (but otherwise arbitrary) summary measures of fixed dimensionality. A practical application of our approach is demonstrated in a large-scale networ k simulation study that applies two newly developed R packages: simcausal and tmlenet. We also discuss some extensions of our work towards estimation in longitudinal data.
16:00 to 16:30 Dean Eckles
Statistical and causal inference in networks
I survey the state of methods for statistical and causal inference in networks. I present Fisherian randomization inference methods for testing for network interference (i.e., spillovers, exogenous peer effects) as an example. 
Wednesday 13th July 2016
09:00 to 09:45 Balazs Szegedy
On the graph limit approach to random regular graphs
Let G=G(n,d) denote the random d-regular graph on n vertices. A celebrated result by J. Friedman solves Alon's second eigenvalue conjecture saying that if d is fixed and n is large then G is close to be Ramanujan. Despite of significant effort, much less was known about the structure of the eigenvectors of G. We use a combination of graph limit theory and information theory to prove that every eigenvector of G (when normalized to have length equal to square root of n) has an entry distribution that is close to some Gaussian distribution in the weak topology. Our results also work in the more general setting of almost-eigenvectors. We hope our methods will lead to a general graph limit approach to a large class of problems on random regular graphs. Joint work with A. Backhausz.
09:45 to 10:30 Fabio Martinelli
Bootstrap percolation and kinetically constrained spin models: critical lengths and mixing time scales
Co-authors: Cristina Toninelli (Univ Paris VII Diderot), Rob Morris (IMPA)

In recent years, a great deal of progress has been made in understanding the behaviour of a particular class of monotone cellular automata, commonly known as bootstrap percolation. In particular, if one considers only two-dimensional automata, then we now have a fairly precise understanding of the typical evolution of these processes, starting from p-random initial conditions of infected sites. Given a bootstrap model, one can consider the associated kinetically constrained spin model in which the state (infected or healthy) of a vertex is resampled (independently) at rate 1 by tossing a p-coin if it could be infected in the next step by the bootstrap process, and remains in its current state otherwise. Here p is the probability of infection. The main interest in KCM’s stems from the fact that, as p → 0, they mimic some of the most striking features of the glass transition, a major and still largely open problem in condensed matter physics. In this talk, motivated by recent universality results for bootstrap percolation, I’ll discuss some “universality conjectures” concerning the growth of the (random) infection time of the origin in a KCM as p → 0. Joint project with R. Morris (IMPA) and C. Toninelli (Paris VII),
10:30 to 11:00 Morning Coffee
11:00 to 11:45 Svante Janson
Graph limits and entropy
The entropy of a graph limit, or a graphon, was essentially defined by Aldous (although in a different, equivalent formulation), who showed that if a graph limit W has entropy H, then the entropy of the distribution of the corresponding random graph G(n,W) is asymptotically n^2 H/2.

Consider a hereditary class of graphs Q. Then the number of graphs of order n in Q is asymptotically given by exp( n^2 H/2), where H is the maximum entropy of a graph limit that is a limit of graphs in Q. Moreover, if this entropy maximizing limit is unique, then a uniformly random graph of order n in Q converges in probability, as n tends to infinity, to this maximizing graph limit.

As an example, we discuss the entropy maximising graph limits for the class of string graphs.

This is based on joint works with Hamed Hatami and Balasz Szegedy, and with Andrew Uzzel.
11:45 to 12:30 Paul Neville Balister
Barrier coverage in thin strips
Suppose sensors are deployed randomly in a long thin strip, and suppose each sensor can detect objects within a fixed distance. We say that the sensors achieve barrier coverage if there is no path across the strip that a small object can follow that avoids detection by the sensors. We give fairly precise results on the probability that barrier coverage is achieved as a function of the range of the sensors, the height and length of the strip, and the number of sensors deployed. In particular, we show that the most likely obstruction - a rectangular region crossing the strip which is devoid of sensors - does not in general dominate the probability of failure of barrier coverage.
Joint work with Bela Bollobas and Amites Sarkar.
12:30 to 13:30 Lunch @ Wolfson Court
13:30 to 14:15 Vincent Tassion
Sharpness of the phase transition for Voronoi percolation in $\mathbb R^d$
Take a Poisson point process on $\mathbb R^d$ and consider its Voronoi tessellation. Colour each cell of the tessellation black with probability $p$ and white with probability $1-p$ independently of each other. This rocess undergoes a phase transition at a critical parameter $p_c(d)$: below $p_c(d)$ all the black connected components are bounded almost surely, and above $p_c$ there is an unbounded black connected component almost surely. In any dimension $d$ larger than 2, we prove that for $p
The talk is based on a joint work with H. Duminil-Copin and A. Raoufi. 
14:15 to 14:45 Mark Newman
Contact networks and the spread of epidemic disease
The structure of the networks of contact between individuals can have a profound effect on the rate and extent of disease epidemics. This talk will outline some of the fundamental mathematics connecting network structure with disease outcomes and describe a number of case studies of applications to infectious diseases, including influenza, HIV, Mycoplasma pneumonia, and SARS.  
14:45 to 15:15 Afternoon Tea & Posters
15:15 to 15:45 Alfred Hero
Continuum limits for minimal paths
Many  applications involve computing minimal paths over the nodes of a graph relative to a measure of pairwise node dissimilarity. These include minimal spanning trees in computer vision, shortest paths in image databases, or  non-dominated anti-chains in multi-objective database search. When the  nodes are random vectors and the dissimilarity is an increasing function of Euclidean distance these minimal paths can have continuum limits as the number of nodes approaches infinity. Such continuum limits can lead to low complexity diffusion approximations to  the solution of the combinatorial minimal path problem.
Thursday 14th July 2016
09:00 to 09:30 Sharmodeep Bhattacharyya
Spectral Clustering for Dynamic Stochastic Block Model
Co-author: Shirshendu Chatterjee (CUNY)

One of the most common and crucial aspect of many network data sets is the dependence of network link structure on time or other attributes. There is a long history of researchers proposing networks for dynamic time-evolving formation of networks. Most complex networks, starting from biological networks like genetic or neurological networks to social, co-authorship and citation networks are time-varying. This has led the researchers to study dynamic, time-evolving networks. In this work, we consider the problem of finding a common clustering structure in time-varying networks. We consider three simple extension of spectral clustering methods to dynamic settings and give theoretical justification that the spectral clustering methods produce consistent community detection for such dynamic networks. We also propose an extension of the static version of nonparametric latent variable models to the dynamic setting and use a special case of the model to justify the spectral clusteri ng methods. We show the validity of the theoretical results via simulations too and apply the clustering methods to real-world dynamic biological networks.
09:30 to 10:00 Arash Amini
Matched bipartite block model with covariates
Community detection in bipartite networks is challenging. Motivated by an application from biology, we consider a model which allows for matched communities in the bipartite setting, in addition to node covariates with information about the matching. We derive a simple fast algorithm for fitting the model based on variational inference ideas and show its effectiveness in simulations.
10:00 to 10:30 Patrick Wolfe
10:30 to 11:00 Morning Coffee
11:00 to 11:30 Sofia Olhede
Nonparametrics for networks
Relational data have become an important component of modern statistics. Networks, and weighted networks, are ubiquitous in modern applications such as disease dynamics, ecology, financial contagion, and neuroscience. The inference of networks is harder, in parts because the measure placed on the observables need to satisfy sets of permutation invariances, and most networks are very sparse, with most possible relations not present.

This talk will explore how to best construct nonparametric summaries of such objects, in such as way that the underlying statistical model of the observations is well described, and any estimators computable with scalable algorithms.

This is joint work with Pierre-Andre Maugis and Patrick Wolfe (UCL).
11:30 to 12:00 Elizaveta Levina
Estimating network edge probabilities by neighborhood smoothing
Co-authors: Yuan Zhang (Ohio State University), Ji Zhu (University of Michigan)

The problem of estimating probabilities of network edges from the observed adjacency matrix has important applications to predicting missing links and network denoising. It has usually been addressed by estimating the graphon, a function that determines the matrix of edge probabilities, but is ill-defined without strong assumptions on the network structure. Here we propose a novel computationally efficient method based on neighborhood smoothing to estimate the expectation of the adjacency matrix directly, without making the strong structural assumptions graphon estimation requires. The neighborhood smoothing method requires little tuning, has a competitive mean-squared error rate, and outperforms many benchmark methods on the task of link prediction in both simulated and real networks.

Related Links
12:00 to 12:30 Peter Bickel
Fitting block models with covariates:”Likelihood” methods ,sparsity and selection of the number of blocks
Co-authors: Purna Sarkar (U. of Texas,Austin), Soumendu Mukherjee (U C, Berkeley), Sharmodeep Bhattacharyya, (Oregon State University)

We introduce block models with edge and block covariates ,along the lines of Hoff, Handcock,Raftery (2002) ,specializing to covariate forms of the types proposed by Zhao, Levina,Zhu(2014) and generalizing that of Newman ,Clauset(2015)..WE study mean field variational fitting in this situation along the lines of Celisse,Daudin,Pierre( 2011 ) and B.,Choi,Chang,Zhang (2013).Using theory,examples, and simulations we argue that these methods ,possibly regularized lead to models with a sparse number of blocks. 
12:30 to 13:30 Lunch @ Wolfson Court
13:30 to 14:00 Bin Yu
Artificial neurons meet real neurons: pattern selectivity in V4 via deep learning
Co-authors: Yuansi Chen (UCB), Reza Abbasi Asl (UCB), Adam Bloniarz (UCB), Jack Gallant (UCB)

Vision in humans and in non-human primates is mediated by a constellation of hierarchically organized visual areas. One important area is V4, a large retinotopically-organized area located intermediate between primary visual cortex and high-level areas in the inferior temporal lobe. V4 neurons have highly nonlinear response properties. Consequently, it has been difficult to construct quantitative models that accurately describe how visual information is represented in V4. To better understand the filtering properties of V4 neurons we recorded from 71 well isolated cells stimulated with natural images. We fit predictive models of neuron spike rates using transformations of natural images learned by a convolutional neural network (CNN). The CNN was trained for image classification on the ImageNet dataset. To derive a model for each neuron, we first propagate each of the stimulus images forward to an inner layer of the CNN. We use the activations of the inner layer as the featu re (predictor) vector in a high dimensional regression, where the response rate of the V4 neuron is taken as the response vector. Thus, the final model for each neuron consists of a multilayer nonlinear transformation provided by the CNN, and one final linear layer of weights provided by regression. We find that models using the first two layers of three well-known CNNs provide better predictions of responses of V4 neurons than those obtained using a conventional Gabor-like wavelet model. To characterize the spatial and pattern selectivity of each V4 neuron, we both explicitly optimize the input image to maximize the predicted spike rate, and visualize the selected filters of the CNN. We also perform dimensionality reduction by sparse PCA to visualize the population of neurons. Finally, we show the stability of our analysis across the three CNNs, and conclude that the V4 neurons are tuned to a remarkable diversity of shapes such as curves, blobs, checkerboard patterns, and V1-like gratings. 
14:00 to 14:30 Thomas Nichols
Modelling Large- and Small-Scale Brain Networks
Investigations of the human brain with neuroimaging have recently seen a dramatic shift in focus, from "brain mapping", identifying brain regions related to particular functions, to connectivity or "connectomics", identifying networks of coordinated brain regions, and how these networks behave at rest and during tasks. In this presentation I will discuss two quite different approaches to modeling brain connectivity. In the first work, we use Bayesian time series methods to allow for time-varying connectivity. Non-stationarity connectivity methods typically use a moving-window approach, while this method poses a single generative model for all nodes, all time points. Known as a "Multiregression Dynamic Model" (MDM), it comprises an extension of a traditional Bayesian Network (or Graphical Model), by posing latent time-varying coefficients that implement a regression a given node on its parent nodes. Intended for a modest number of nodes (up to about 12), a MDM allows inference of the structure of the graph using closed form Bayes factors (conditional on a single estimated "discount factor", reflecting the balance of observation and latent variance. While originally developed for directed acyclic graphs, it can also accommodate directed (possibly cyclic) graphs as well. In the second work, we use mixtures of simple binary random graph models to account for complex structure in brain networks. In this approach, the network is reduced to a binary adjacency matrix. While this is invariably represents a loss of information, it avoids a Gaussianity assumption and allows the use of much larger graphs, e.g. with 100's of nodes. Daudin et al. (2008) proposed a "Erdos-Reyni Mixture Model", which assumes that, after an unknown number of latent node classes have been estimated, that connections arise as Bernoulli counts, homogeneously for each pair of classes. We extend this work to account for multisubject data (where edge data are now Binomially distributed), allowing

Related Links
14:30 to 15:00 Mitya Chklovskii
Natural neural networks: structure, dynamics and computation
The brain is a directed weighted graph with neurons as nodes and synapses as edges. The structure of the graph determines the dynamics of neural activity. At the same time, correlations in neural activity determine synaptic weights. The grand challenge of neuroscience is to construct a theory relating neural structure and dynamics to computation. I will present our progress towards this goal using an online matrix factorization framework.
15:00 to 15:30 Afternoon Tea & Posters
15:30 to 16:00 Alexandre Arenas
The physics of spreading processes in multilayer networks
The study of networks plays a crucial role in investigating the structure, dynamics, and function of a wide variety of complex systems in myriad disciplines. Despite the success of traditional network analysis, standard networks provide a limited representation of complex systems, which often include different types of relationships (i.e., “multiplexity”) among their constituent components and/or multiple interacting subsystems. Such structural complexity has a significant effect on both dynamics and function. Throwing away or aggregating available structural information can generate misleading results and be a major obstacle towards attempts to understand complex systems. The recent “multilayer” approach for modeling networked systems explicitly allows the incorporation of multiplexity and other features of realistic systems. On one hand, it allows one to couple different structural relationships by encoding them in a convenient mathematical object. On the other hand, it also allows one to couple different dynamical processes on top of such interconnected structures. The resulting framework plays a crucial role in helping achieve a thorough, accurate understanding of complex systems. The study of multilayer networks has also revealed new physical phenomena that remain hidden when using ordinary graphs, the traditional network representation. Here we survey progress towards attaining a deeper understanding of spreading processes on multilayer networks, and we highlight some of the physical phenomena related to spreading processes that emerge from multilayer structure.
16:00 to 16:30 Ginestra Bianconi
Message passing theory for percolation models on multiplex networks
 Multiplex networks describe a large variety of complex systems including infrastructures, transportation networks and biological systems.In presence of interdepedencies between the nodes the robustness of the system to random failures is determined by the mutually connected giant component (MCGC). Much progress on the emergence of the MCGC has been achieved  so far but  the characterization of the emergence of the MCGC is multiplex networks with link overlap in an arbitrary large number of layers has remained elusive. Here I will present a  message passing algorithm for characterizing the percolation transition in multiplex networks with link overlap and an arbitrary number of layers M and I will derive the equation for the order parameter in an ensembles of random multiplex networks. Specifically I will propose and compare two message passing algorithms, that generalize the algorithm widely used to study the percolation transition in multiplex networks without link overlap. The first algorithm describes a directed percolation transition and admits an epidemic spreading interpretation. The second algorithm describes the emergence of the mutually connected giant component, that is the percolation transition, but does not preserve the epidemic spreading interpretation. The phase diagrams for the percolation and directed percolation transition is obtained in simple representative cases. For the same multiplex network structure, in which the directed percolation transition has non-trivial tricritical points, the percolation transition has a discontinuous phase transition, with the exception of the trivial case in which all the layers completely overlap.
16:30 to 17:00 Tiago Peixoto
Efficient Bayesian inference of multi-scale network structures
A principled approach to characterize the hidden structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a popular choice for this task is the stochastic block model, where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. In this talk, we will present a nonparametric Bayesian inference framework based on a microcanonical formulation of the stochastic block model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference of large networks, but also reveal structures at multiple scales; 2. A very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges, but also with an unlimited number of groups. We show also how this approach can be used to sample group hierarchies from the posterior distribution, perform model selection, and how it can be easily be generalized to networks with edge covariates and  node annotations.
17:00 to 17:30 Mason Porter
Multilayer Networks and Applications
One of the most active areas of network science, with an explosion of publications during the last few years, is the study of "multilayer networks," in which heterogeneous types of entities can be connected via multiple types of ties that change in time. Multilayer networks can include multiple subsystems and "layers" of connectivity, and it is important to take multilayer features into account to try to improve our understanding of complex systems. In this talk, I'll give an introduction to multilayer networks and will discuss applications in areas such as transportation, sociology, neuroscience, and ecology.

Related Links
19:30 to 22:00 Formal Dinner at Trinity College
Friday 15th July 2016
09:00 to 09:30 Peter Morters
Spatial preferential attachment networks
We study a family of growing networks, in which new vertices are given a spatial position on the unit circle and are connected to existing vertices with a probability favouring short spatial distances and high degrees. In this model of a scale-free network with clustering we can independently tune the power law exponent τ of the degree distribution and the exponent δ at which the connection probability decreases with the distance of two vertices. We show that the network is robust if τ 2 + 1/(δ−1). This is the first instance of a scale-free network where robustness depends not only on its degree distribution but also on its clustering features. 

Joint work with Emmanuel Jacob (ENS Lyon).
09:30 to 10:00 Nelly Litvak
Ranking algorithms on directed configuration networks
Co-authors: Ningyuan Chen (Yale School of Management), Mariana Olvera-Cravioto (Columbia University)

We study a family of rankings, which includes Google's PageRank, on a directed configuration model. We show that the the rank of a randomly chosen vertex converges in distribution to a finite random variable that can be written as a linear combination of i.i.d. copies of the attracting endogenous solution to a stochastic fixed-point equation. We provide precise asymptotics for this limiting random variable. In particular, if the in-degree distribution in the directed configuration model has a power law distribution, then the limiting distribution of the rank also follows a power law with the same exponent. Such power law behaviour of ranking is well-known from empirical studies of real-life networks. Our asymptotic result gives remarkably good approximation for the complete ranking distribution on configuration networks of moderate size and on the directed graph of English Wikipedia.
10:00 to 10:30 Mike Steele
Twitter Event Networks and Surgery Constructions
The models considered here were motivated by graphs that one constructs by building edges according to Twitter "retweets" i.e. messages that are forwarded with comments. When one considers the largest connected component in such a graph then one finds empirically that some vertex has degree that grows linearly over time, but other vertex degrees grow at a rate that is algebraic and sublinear. A construction that uses surgery on multi-type branching processes gives us a one parameter class of dynamic graph models that captures the "superstar" phenomena that one find in the retweet graphs.
10:30 to 11:00 Morning Coffee
11:00 to 11:30 Catherine Matias
Statistical issues in the stochastic block model
I will try to present a state of the art about statistical results established in the stochastic block model together with some remaining challenges in the domain.
11:30 to 12:00 Nial Friel
Properties of Latent Variable Network Models
We derive properties of Latent Variable Models for networks, a broad class of models that includes the widely-used Latent Position Models. We characterise several features of interest, with particular focus on the degree distribution, clustering coefficient, average path length and degree correlations. We introduce the Gaussian Latent Position Model, and derive analytic expressions and asymptotic approximations for its network properties. We pay particular attention to one special case, the Gaussian Latent Position Model with Random Effects, and show that it can represent heavy-tailed degree distributions, positive asymptotic clustering coefficients and small-world behaviour that often occur in observed social networks. Finally, we illustrate the ability of the models to capture important features of real networks through several well known datasets.
12:00 to 12:30 Cristopher Moore
Information-theoretic bounds and phase transitions in community detection and high-dimensional clustering
Co-authors: Jess Banks (SFI/UC Berkeley), Jiaming Xu (Simons), Joe Neeman (UT Austin), Praneeth Netrapalli (MSR Cambridge)

Over the past decade, a rich interaction has grown up between statistical physics and statistical inference. In particular, physics often predicts phase transitions where, if a data set becomes too sparse or too noisy, it suddenly becomes impossible to find its underlying structure, or even to distinguish it from a “null model” with no structure at all. For instance, in the community detection problem in networks, there is a transition below which the vertices cannot be labeled better than chance, and the network cannot be distinguished from an Erdos-Renyi random graph. Similarly, in the clustering problem in Gaussian mixture models, it becomes impossible to find the clusters, or even to tell whether clusters exist, i.e., to distinguish the data from a single Gaussian.

Many of the physics predictions have been made rigorous, but there is still a very lively interchange between the fields, both about these transitions and about asymptotically optimal algorithms. In particular, while efficient message-passing and spectral algorithms are known that succeed down to the so-called Kesten-Stigum bound, in a number of problems we believe that it is information-theoretically possible (but perhaps not computationally feasible) to go further. I will give a friendly introduction to this field, and present some new results proving this conjecture.

This is joint work with Jess Banks, Joe Neeman, Praneeth Netrapalli, and Jiaming Xu.
12:30 to 13:30 Lunch @ Wolfson Court
13:30 to 14:00 Harrison Zhou
Some optimality results in network analysis and beyond
In this talk I will review some recent optimality results in graphon estimation and community detection and their extensions to structured linear models.
14:00 to 14:30 Karl Rohe
Network driven sampling; a critical threshold for design effects
Web crawling and respondent-driven sampling (RDS) are two types of network driven sampling techniques that are popular when it is difficult to contact individuals in the population of interest. This paper studies network driven sampling as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree, instead of a chain, allows for the sampled units to refer multiple future units into the sample. In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is $D$, then constructing an estimator from the novel design makes the variance of the estimator $D$ times greater than it would be under a simple random sample. Under certain assumptions on the referral tree, the design effect of network driven sampling has a critical threshold that is a function of the referral rate $m$ and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix $\lambda_2$. If $m 1/\lambda_2^2$, then the design effect grows with $n$ (i.e. the standard estimator is no longer $\sqrt{n}$-consistent; it converges at the slower rate of $\log_m \lambda_2$).
14:30 to 15:00 Carey Priebe
Repeated Motif Hierarchical Stochastic Blockmodels
Co-authors: Vince Lyzinski (Johns Hopkins University), Minh Tang (Johns Hopkins University), Avanti Athreya (Johns Hopkins University), Youngser Park (Johns Hopkins University), Joshua Vogelstein (Johns Hopkins University), Keith Levin (Johns Hopkins University)

Based on our methodology for community detection and community comparison in graphs (Lyzinski et al., 2015,, we formulate a model selection procedure for deciding whether a hierarchical stochastic blockmodel graph supports the conjecture of repeated motifs. Such a graph inference procedure promises to address a fundamental outstanding question regarding the atoms of neural computation (Marcus, Marblestone & Dean, 2014,
Related Links
15:00 to 15:30 Afternoon Tea & Posters
15:30 to 16:00 David Choi
Co-clustering of non-smooth graphons
Theoretical results are becoming known for community detection and clustering of networks; however, these results assume an idealized generative model that is unlikely to hold in many settings. Here we consider exploratory co-clustering of a bipartite network, where the rows and columns of the adjacency matrix are assumed to be samples from an arbitrary population. This is equivalent to assuming that the data is generated from a nonparametric model known as a graphon. We show that co-clusters found by any method can be extended to the row and column populations, or equivalently that the estimated blockmodel approximates a blocked version of the generative graphon, with generalization error bounded by n^{-1/2}. Analogous results are also shown for degree-corrected co-blockmodels and random dot product bipartite graphs, with error rates depending on the dimensionality of the latent variable space.
16:00 to 16:30 Purna Sarkar
Convex Relaxation for Community Detection with Covariates
Community detection in networks is an important problem in many applied areas. We investigate this in the presence of node covariates. Recently, an emerging body of theoretical work has been focused on  leveraging information from both the edges in the network and the node covariates to infer community memberships. However, in most parameter regimes, one of the sources of information provides enough information to infer the hidden clusters, thereby making the other source redundant. We show that when the network and the covariates carry ``orthogonal'' pieces of information about the cluster memberships, one can get asymptotically consistent clustering by using them both, while each of them fails individually. 

This is joint work with Bowei Yan, University of Texas at Austin.
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons