Graph limits and statistics
Monday 11th July 2016 to Friday 15th July 2016
09:00 to 09:35  Registration  
09:35 to 09:45  Welcome from John Toland (INI Director) and Organisers  INI 1  
09:45 to 10:30 
Cristina Toninelli (Université Pierre et Marie Curie Paris) Kinetically constrained spin models
We shall discuss kinetically constrained spin models (KCSM), a class ofinteracting particle systems which have been introduced in physics literature to model liquidglass and more general jamming transitions. Theevolution of KCSM follows a continuous time Markov process whoseelementary moves are birth and death of particles. The key feature isthat a move occurs only if the conguration satises a local constraint.These constraints induce the existence of clusters of blocked particles,the occurrence of several invariant measures and ergodicity breakingtransitions. We shall present some results concerning the identicationof the ergodic regime and the exponential convergence to equilibriumin this regime. After establishing a connection with monotone cellular automata of bootstrap percolation type, we shall conclude our talkwith a discussion of a particular KCSM for which the correspondingcellular automaton has a peculiar discontinuous percolation transition.

INI 1  
10:30 to 11:15 
Peter Csikvari (Massachusetts Institute of Technology); (Eötvös Loránd University) Statistical matching theory with a glimpse towards extremal regular graphs
In this talk we will survey some recent development on statistical properties
of matchings of very large and infinite graphs. The main goal of the talk is to
describe a few applications of a new concept called matching measure. These
applications include new results on the number of (perfect) matchings in large
girth graphs as well as simple new proofs of certain statistical physical
theorems. In particular, we will sketch a new proof of Schrijver's lower bound
for the number of perfect matchings of regular bipartite graphs. We will also
mention some extremal graph theoretic results about other graph parameters of
regular graphs. This talk is based on joint papers with various subsets of Miklos Abert, Peter E. Frenkel, Tamas Hubai and Gabor Kun. 
INI 1  
11:15 to 11:45  Morning Coffee  
11:45 to 12:30 
Oliver Riordan (University of Oxford) The phase transition in Achlioptas processes
The classical random graph process starts with a fixed set of n vertices and no edges. Edges are then added onebyone, uniformly at random. One of the most interesting features of this process, established by Erdős and Rényi more than 50 years ago, is the phase transition near n/2 edges, where a single `giant' component emerges from a sea of small components. This example serves as a starting point for understanding phase transitions in a wide variety of other contexts. Around 2000, Dimitris Achlioptas suggested an innocentsounding variation of the model: at each stage two edges are selected at random, but only one is added, the choice depending on (typically) the sizes of the components it would connect. This may seem like a small change, but these processes do not have the key independence property that underlies our understanding of the classical process. One can ask many questions about Achlioptas processes; the most interesting concern the phase transition: does the critical value change from n/2? Is the nature of the transition the same or not? I will describe a number of results on these questions from joint work with Lutz Warnke. 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:45 to 14:30 
Paul Smith (University of Cambridge) Towards universality in bootstrap percolation
Bootstrap percolation is a broad class of monotone cellular automata, which has links to the Glauber dynamics of the Ising model and other areas of statistical physics. Starting with random initial conditions, the question is to determine the threshold for complete occupation of the underlying graph. Until relatively recently, only nearestneighbour models (and relatively minor variants of these models) had been studied  and these are now very well understood. In this talk I will discuss a new `universality' theory for bootstrap percolation, which has emerged in the last few years. In particular, I will explain the classification of twodimensional models, give more precise results for socalled `critical' models (also in two dimensions), and talk about a new classification theorem for higher dimensional models. 
INI 1  
14:30 to 15:00 
Peter Orbanz (Columbia University) Subsampling, symmetry and averaging in networks
Consider a very large graphsay, the link graph of a large social network. Now invent a randomized algorithm that extracts a smaller subgraph. If we use the subgraph as sample data and perform statistical analysis on this sample, what can we learn about the underlying network? Clearly, that should depend on the subsampling algorithm. I show how the choice of algorithm defines a notion of (1) distributional invariance and (2) of averaging within a single large graph. Under suitable conditions, the resulting averages satisfy a law of large numbers, such that statistical inference from a single sample graph is indeed possible. From this algorithmic point of view, graphon models arise from a specific choice of sampling algorithm, various known pathologies of these models are explained as a selection bias. 
INI 1  
15:00 to 15:30  Afternoon Tea & Posters  
15:30 to 16:00 
Francois Caron (University of Oxford) Sparse and modular networks using exchangeable random measures
Statistical network modeling has focused on representing
the graph as a discrete structure, namely the adjacency
matrix, and considering the exchangeability of this array.
In such cases, it is well known that the graph is
necessarily either dense (the number of edges scales
quadratically with the number of nodes) or trivially
empty. Here, we instead consider representing the graph as a measure on the plane. For the associated definition of exchangeability, we rely on the Kallenberg representation theorem (Kallenberg, 1990). For certain choices of such exchangeable random measures underlying the graph construction, the network process is sparse with powerlaw degree distribution, and can accommodate an overlapping blockstructure. A Markov chain Monte Carlo algorithm is derived for efficient exploration of the posterior distribution and allows to recover the structure of a range of networks ranging from dense to sparse based on our flexible formulation. Joint work with Emily Fox and Adrien Todeschini http://arxiv.org/abs/1401.1137 http://arxiv.org/pdf/1602.02114 
INI 1  
16:00 to 16:30 
David Blei (Columbia University) Exponential Family Embeddings
Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this talk, I will describe exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of highdimensional data. As examples, we studied neural data with realvalued observations, count data from a market basket analysis, and ratings data from a movie recommendation system. We then extended this idea to networks, giving a novel type of "latent space" model. The main idea behind an EFEMB is to model each observation conditioned on a set of other observations. This set is called the context, and the way the context is defined is a modeling choice that depends on the problem. In language the context is the surrounding words; in neuroscience the context is closeby neurons; in market basket data the context is other items in the shopping cart; in networks the context is edges emanating from a node pair. Each type of embedding model defines the context, the exponential family of conditional distributions, and how the latent embedding vectors are shared across data. We infer the embeddings with a scalable algorithm based on stochastic gradient descent. We found exponential family embedding models to be more effective than other types of dimension reduction. They better reconstruct heldout data and find interesting qualitative structure. 
INI 1  
16:30 to 17:30  Welcome Wine Reception 
09:00 to 09:30 
David Aldous (University of California) A framework for imperfectly observed networks
Model a network as an edgeweighted graph, where the (unknown) weight $w_e$ of edge $e$ indicates the frequency of observed interactions, that is over time $t$ we observe a Poisson($t w_e$) number of interactions across edges $e$. How should we estimate some given statistic of the underlying network? This leads to wideranging and challenging problems.

INI 1  
09:30 to 10:00 
Richard Kenyon (Brown University) Can you see the sound of a drum?
We discuss planar networks and harmonic functions on
them, and in particular two reconstruction problems for "hidden"
networks:
1. To what extent can one reconstruct a network from boundary measurements? 2. Can one reconstruct conductances from a ``harmonic embedding" of the network? 
INI 1  
10:00 to 10:30 
Charles Radin (University of Texas at Austin) Emergent phases in constrained random networks
Structures are shown to emerge in large random networks, under variable constraints, when examined at large scale. They are automatically created by the coalescence of nodes into a small number of equivalence classes, which must cooperate to create structure in order to satisfy the constraints. The structures change gradually as constraints vary within a phase, and change sharply at phase transitions. This talk is based on joint work with Richard Kenyon, Kui Ren and Lorenzo Sadun. Related Links

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:30 
Nikhil Srivastava (University of California, Berkeley) Spectral Sparsification of Graphs
Modern graph algorithms increasingly access and manipulate
graphs via their Laplacian operators and other associated “continuous” objects,
rather than purely discretely. An
important primitive in this paradigm is spectral sparsification: being able to
approximate the Laplacian of a given graph by that of a graph with significantly
fewer edges. I will survey some of the key results in this area, drawing on
tools from random matrix theory, matrix analysis, and electrical network
theory.

INI 1  
11:30 to 12:00 
He Sun (University of Bristol) Partitioning WellClustered Graphs: Spectral Clustering Works!
We study variants of the widely used spectral clustering that partitions a graph into k clusters by (1) embedding the vertices of a graph into a lowdimensional space using the bottom eigenvectors of the Laplacian matrix, and (2) grouping the embedded points into k clusters via kmeans algorithms. We show that, for a wide class of graphs, spectral clustering gives a good approximation of the optimal clustering. While this approach was proposed in the early 1990s and has comprehensive applications, prior to our work similar results were known only for graphs generated from stochastic models. We also give a nearlylinear time algorithm for partitioning wellclustered graphs based on computing a matrix exponential andapproximate nearest neighbor data structures. Based on joint work with Richard Peng (Georgia Institute of Technology), and Luca Zanetti (University of Bristol). Reference: http://arxiv.org/abs/1411.2021 
INI 1  
12:00 to 12:30 
Amin Saberi (Stanford University) Thin spanning trees and their algorithmic applications
Motivated
by Jaeger's modular orientation conjecture, Goddyn asked the following
question: A spanning tree of a graph G is called epsilonthin if it contains at most an epsilon fraction of the edges of each cut in that graph. Is there a function f:(0,1)→ℤ such that every f(epsilon)edgeconnected graph has an epsilonthin spanning tree? I will talk about our journey in search of such thin trees, their applications concerning traveling salesman problems, and unexpected connections to graph sparsification and the KadisonSinger problem. Bio: http://stanford.edu/~saberi/bio.txt 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:30 to 14:00 
Haiyan Huang (University of California, Berkeley) Inferring genegene associations and gene networks beyond standard statistical models
With the advent of highthroughput technologies making largescale gene
expression data readily available, developing appropriate computational tools to
infer gene interactions has been a major challenge in systems biology. In this
talk, I will discuss two methods of finding gene associations that differ in
their considerations of how genes behave across the given samples. The first
method applies to the case where the patterns of gene association may change or
only exist in a subset of all the samples. The second method goes beyond
pairwise gene relationships to higher level group interactions, but requiring
similar gene behaviours across all the samples. We compare both methods to other
popular approaches using simulated and real data, and demonstrate they lead to
better general performance and capture important biological features in certain
situations that are missed by the other methods.

INI 1  
14:00 to 14:30 
Florian Markowetz (University of Cambridge) Understanding genetic interaction networks
Genes do not act in isolation, but rather in tight interaction networks. Maps
of genetic interactions between pairs of genes are a powerful way to dissect
these relationships. I will discuss two statistical approaches to better
understand the functional content of genetic interaction networks and the
mechanism underlying them. First, I will present a method that adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guiltbyassociation principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks like other enrichment methods annotate lists of genes. Second, I will describe a method to understand genetic interactions based on highdimensional phenotypes. I will present methodology we developed to test the hypothesis that complex relationships between a gene pair can be explained by the action of a third gene that modulates the interaction. Our approach to test this hypothesis builds on Nested Effects Models (NEMs), a probabilistic model tailored to inferring networks from gene perturbation data. We have extended NEMs with logical functions to model gene interactions and show in simulations and case studies that our approach can successfully infer modulators of genetic interactions and thus lead to a better understanding of an important feature of cellular organisation. Related Links

INI 1  
14:30 to 15:00  Afternoon Tea & Posters  
15:00 to 15:30 
Michael Leung (Massachusetts Institute of Technology) A Weak Law for Moments of PairwiseStable Networks
We develop asymptotic theory for strategic networkformation models under the
assumption that the econometrician observes a single large pairwisestable
network. Drawing on techniques from the literature on random graphs, we derive
primitive restrictions on the model that establish a weak law of large numbers
for a useful class of network moments. Under these restrictions, we show that
the model generates realistic networks that are sparse and may contain "giant"
connected subnetworks, two wellknown properties of realworld social networks.
The restrictions also conveniently suggest a new method to simulate
counterfactual networks that avoids a wellknown curse of dimensionality.
Additionally, we characterize the identified set of structural parameters based
on a tractable class of dyadlevel network moments and construct consistent set
estimators.
Related Links

INI 1  
15:30 to 16:00 
Oleg Sofrygin (University of California, Berkeley) Estimation of Causal Effects in NetworkDependent Observational Data
Coauthor: Mark J. van der Laan (University of
California, Berkeley, CA) We outline the framework of targeted maximum likelihood estimation (TMLE) in observational network data. Consider a dataset in which each observational unit is causally connected to other units via a known social or geographical network. For each unit we observe their baseline covariates, their exposure and their outcome, and we are interested in estimating the effect of a single timepoint stochastic intervention. We propose a semiparametric statistical model that allows for betweenunit dependencies: First, unitlevel exposure can depend on the baseline covariates of other connected units. Second, the unitlevel outcome can depend on the baseline covariates and exposures of other connected units. We impose some restrictions on our model, e.g., assuming that the unit's exposure and outcome depend on other units as some known (but otherwise arbitrary) summary measures of fixed dimensionality. A practical application of our approach is demonstrated in a largescale networ k simulation study that applies two newly developed R packages: simcausal and tmlenet. We also discuss some extensions of our work towards estimation in longitudinal data. 
INI 1  
16:00 to 16:30 
Dean Eckles (Massachusetts Institute of Technology) Statistical and causal inference in networks
I survey the state of methods for statistical and causal inference in networks. I present Fisherian randomization inference methods for testing for network interference (i.e., spillovers, exogenous peer effects) as an example. See https://arxiv.org/abs/1506.02084 
INI 1 
09:00 to 09:45 
Balazs Szegedy (University of Toronto) On the graph limit approach to random regular graphs
Let G=G(n,d) denote the random
dregular graph on n vertices. A celebrated result by J. Friedman
solves Alon's second eigenvalue conjecture saying that if d is fixed and n is
large then G is close to be Ramanujan. Despite of significant effort, much less
was known about the structure of the eigenvectors of G. We use a combination
of graph limit theory and information theory to prove that every
eigenvector of G (when normalized to have length equal to square root of n) has
an entry distribution that is close to some Gaussian distribution in the weak
topology. Our results also work in the more general setting of
almosteigenvectors. We hope our methods will lead to a general graph limit
approach to a large class of problems on random regular graphs. Joint work
with A. Backhausz.

INI 1  
09:45 to 10:30 
Fabio Martinelli (Università degli Studi Roma Tre) Bootstrap percolation and kinetically constrained spin models: critical lengths and mixing time scales
Coauthors: Cristina Toninelli (Univ Paris VII
Diderot), Rob Morris (IMPA) In recent years, a great deal of progress has been made in understanding the behaviour of a particular class of monotone cellular automata, commonly known as bootstrap percolation. In particular, if one considers only twodimensional automata, then we now have a fairly precise understanding of the typical evolution of these processes, starting from prandom initial conditions of infected sites. Given a bootstrap model, one can consider the associated kinetically constrained spin model in which the state (infected or healthy) of a vertex is resampled (independently) at rate 1 by tossing a pcoin if it could be infected in the next step by the bootstrap process, and remains in its current state otherwise. Here p is the probability of infection. The main interest in KCM’s stems from the fact that, as p → 0, they mimic some of the most striking features of the glass transition, a major and still largely open problem in condensed matter physics. In this talk, motivated by recent universality results for bootstrap percolation, I’ll discuss some “universality conjectures” concerning the growth of the (random) infection time of the origin in a KCM as p → 0. Joint project with R. Morris (IMPA) and C. Toninelli (Paris VII), 
INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:45 
Svante Janson (Uppsala Universitet) Graph limits and entropy
The entropy of a graph limit, or a graphon, was essentially defined by Aldous (although in a different, equivalent formulation), who showed that if a graph limit W has entropy H, then the entropy of the distribution of the corresponding random graph G(n,W) is asymptotically n^2 H/2. Consider a hereditary class of graphs Q. Then the number of graphs of order n in Q is asymptotically given by exp( n^2 H/2), where H is the maximum entropy of a graph limit that is a limit of graphs in Q. Moreover, if this entropy maximizing limit is unique, then a uniformly random graph of order n in Q converges in probability, as n tends to infinity, to this maximizing graph limit. As an example, we discuss the entropy maximising graph limits for the class of string graphs. This is based on joint works with Hamed Hatami and Balasz Szegedy, and with Andrew Uzzel. 
INI 1  
11:45 to 12:30 
Paul Neville Balister (University of Memphis) Barrier coverage in thin strips
Suppose sensors are deployed randomly in a long thin strip, and suppose each sensor can detect objects within a fixed distance. We say that the sensors achieve barrier coverage if there is no path across the strip that a small object can follow that avoids detection by the sensors. We give fairly precise results on the probability that barrier coverage is achieved as a function of the range of the sensors, the height and length of the strip, and the number of sensors deployed. In particular, we show that the most likely obstruction  a rectangular region crossing the strip which is devoid of sensors  does not in general dominate the probability of failure of barrier coverage. Joint work with Bela Bollobas and Amites Sarkar. 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:30 to 14:15 
Vincent Tassion (Université de Genève) Sharpness of the phase transition for Voronoi percolation in $\mathbb R^d$
Take a Poisson point process
on $\mathbb R^d$ and consider its Voronoi tessellation. Colour each cell of the tessellation black with probability $p$ and white with probability $1p$ independently of each other. This rocess undergoes a phase transition at a critical parameter $p_c(d)$: below $p_c(d)$ all the black connected components are bounded almost surely, and above $p_c$ there is an unbounded black connected component almost surely. In any dimension $d$ larger than 2, we prove that for $p The talk is based on a joint work with H. DuminilCopin and A. Raoufi. 
INI 1  
14:15 to 14:45 
Mark Newman (University of Michigan) Contact networks and the spread of epidemic disease
The structure of the networks of contact between individuals can have a
profound effect on the rate and extent of disease epidemics. This talk will
outline some of the fundamental mathematics connecting network structure with
disease outcomes and describe a number of case studies of applications to
infectious diseases, including influenza, HIV, Mycoplasma pneumonia, and SARS.

INI 1  
14:45 to 15:15  Afternoon Tea & Posters  
15:15 to 15:45 
Alfred Hero (University of Michigan) Continuum limits for minimal paths
Many applications
involve computing minimal paths over the nodes of a graph relative to a measure
of pairwise node dissimilarity. These include minimal spanning trees in
computer vision, shortest paths in image databases, or nondominated antichains in multiobjective
database search. When the nodes are
random vectors and the dissimilarity is an increasing function of Euclidean
distance these minimal paths can have continuum limits as the number of nodes
approaches infinity. Such continuum limits can lead to low complexity diffusion
approximations to the solution of the
combinatorial minimal path problem.

INI 1 
09:00 to 09:30 
Sharmodeep Bhattacharyya (Oregon State University) Spectral Clustering for Dynamic Stochastic Block Model
Coauthor: Shirshendu Chatterjee (CUNY) One of the most common and crucial aspect of many network data sets is the dependence of network link structure on time or other attributes. There is a long history of researchers proposing networks for dynamic timeevolving formation of networks. Most complex networks, starting from biological networks like genetic or neurological networks to social, coauthorship and citation networks are timevarying. This has led the researchers to study dynamic, timeevolving networks. In this work, we consider the problem of finding a common clustering structure in timevarying networks. We consider three simple extension of spectral clustering methods to dynamic settings and give theoretical justification that the spectral clustering methods produce consistent community detection for such dynamic networks. We also propose an extension of the static version of nonparametric latent variable models to the dynamic setting and use a special case of the model to justify the spectral clusteri ng methods. We show the validity of the theoretical results via simulations too and apply the clustering methods to realworld dynamic biological networks. 
INI 1  
09:30 to 10:00 
Arash Amini (University of California, Los Angeles) Matched bipartite block model with covariates
Community
detection in bipartite networks is challenging. Motivated by an application
from biology, we consider a model which allows for matched communities in the
bipartite setting, in addition to node covariates with information about the
matching. We derive a simple fast algorithm for fitting the model based on
variational inference ideas and show its effectiveness in simulations. 
INI 1  
10:00 to 10:30 
Patrick Wolfe (University College London) tba 
INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:30 
Sofia Olhede (University College London) Nonparametrics for networks
Relational data have become an important component of modern statistics.
Networks, and weighted networks, are ubiquitous in modern applications such as
disease dynamics, ecology, financial contagion, and neuroscience. The inference
of networks is harder, in parts because the measure placed on the observables
need to satisfy sets of permutation invariances, and most networks are very sparse,
with most possible relations not present. This talk will explore how to best construct nonparametric summaries of such objects, in such as way that the underlying statistical model of the observations is well described, and any estimators computable with scalable algorithms. This is joint work with PierreAndre Maugis and Patrick Wolfe (UCL). 
INI 1  
11:30 to 12:00 
Elizaveta Levina (University of Michigan) Estimating network edge probabilities by neighborhood smoothing
Coauthors: Yuan Zhang (Ohio State University), Ji
Zhu (University of Michigan) The problem of estimating probabilities of network edges from the observed adjacency matrix has important applications to predicting missing links and network denoising. It has usually been addressed by estimating the graphon, a function that determines the matrix of edge probabilities, but is illdefined without strong assumptions on the network structure. Here we propose a novel computationally efficient method based on neighborhood smoothing to estimate the expectation of the adjacency matrix directly, without making the strong structural assumptions graphon estimation requires. The neighborhood smoothing method requires little tuning, has a competitive meansquared error rate, and outperforms many benchmark methods on the task of link prediction in both simulated and real networks. Related Links

INI 1  
12:00 to 12:30 
Peter Bickel (University of California, Berkeley) Fitting block models with covariates:”Likelihood” methods ,sparsity and selection of the number of blocks
Coauthors: Purna Sarkar (U. of Texas,Austin),
Soumendu Mukherjee (U C, Berkeley), Sharmodeep Bhattacharyya,
(Oregon State University) We introduce block models with edge and block covariates ,along the lines of Hoff, Handcock,Raftery (2002) ,specializing to covariate forms of the types proposed by Zhao, Levina,Zhu(2014) and generalizing that of Newman ,Clauset(2015)..WE study mean field variational fitting in this situation along the lines of Celisse,Daudin,Pierre( 2011 ) and B.,Choi,Chang,Zhang (2013).Using theory,examples, and simulations we argue that these methods ,possibly regularized lead to models with a sparse number of blocks. 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:30 to 14:00 
Bin Yu (University of California, Berkeley) Artificial neurons meet real neurons: pattern selectivity in V4 via deep learning
Coauthors: Yuansi Chen (UCB), Reza Abbasi Asl
(UCB), Adam Bloniarz (UCB), Jack Gallant (UCB) Vision in humans and in nonhuman primates is mediated by a constellation of hierarchically organized visual areas. One important area is V4, a large retinotopicallyorganized area located intermediate between primary visual cortex and highlevel areas in the inferior temporal lobe. V4 neurons have highly nonlinear response properties. Consequently, it has been difficult to construct quantitative models that accurately describe how visual information is represented in V4. To better understand the filtering properties of V4 neurons we recorded from 71 well isolated cells stimulated with natural images. We fit predictive models of neuron spike rates using transformations of natural images learned by a convolutional neural network (CNN). The CNN was trained for image classification on the ImageNet dataset. To derive a model for each neuron, we first propagate each of the stimulus images forward to an inner layer of the CNN. We use the activations of the inner layer as the featu re (predictor) vector in a high dimensional regression, where the response rate of the V4 neuron is taken as the response vector. Thus, the final model for each neuron consists of a multilayer nonlinear transformation provided by the CNN, and one final linear layer of weights provided by regression. We find that models using the first two layers of three wellknown CNNs provide better predictions of responses of V4 neurons than those obtained using a conventional Gaborlike wavelet model. To characterize the spatial and pattern selectivity of each V4 neuron, we both explicitly optimize the input image to maximize the predicted spike rate, and visualize the selected filters of the CNN. We also perform dimensionality reduction by sparse PCA to visualize the population of neurons. Finally, we show the stability of our analysis across the three CNNs, and conclude that the V4 neurons are tuned to a remarkable diversity of shapes such as curves, blobs, checkerboard patterns, and V1like gratings. 
INI 1  
14:00 to 14:30 
Thomas Nichols (University of Warwick) Modelling Large and SmallScale Brain Networks
Investigations of the human brain with neuroimaging have recently seen a
dramatic shift in focus, from "brain mapping", identifying brain regions related
to particular functions, to connectivity or "connectomics", identifying networks
of coordinated brain regions, and how these networks behave at rest and during
tasks. In this presentation I will discuss two quite different approaches to
modeling brain connectivity. In the first work, we use Bayesian time series
methods to allow for timevarying connectivity. Nonstationarity connectivity
methods typically use a movingwindow approach, while this method poses a single
generative model for all nodes, all time points. Known as a "Multiregression
Dynamic Model" (MDM), it comprises an extension of a traditional Bayesian
Network (or Graphical Model), by posing latent timevarying coefficients that
implement a regression a given node on its parent nodes. Intended for a modest
number of nodes (up to about 12), a MDM allows inference of the structure of the
graph using closed form Bayes factors (conditional on a single estimated
"discount factor", reflecting the balance of observation and latent variance.
While originally developed for directed acyclic graphs, it can also accommodate
directed (possibly cyclic) graphs as well. In the second work, we use mixtures
of simple binary random graph models to account for complex structure in brain
networks. In this approach, the network is reduced to a binary adjacency matrix.
While this is invariably represents a loss of information, it avoids a
Gaussianity assumption and allows the use of much larger graphs, e.g. with 100's
of nodes. Daudin et al. (2008) proposed a "ErdosReyni Mixture Model", which
assumes that, after an unknown number of latent node classes have been
estimated, that connections arise as Bernoulli counts, homogeneously for each
pair of classes. We extend this work to account for multisubject data (where
edge data are now Binomially distributed), allowing
Related Links

INI 1  
14:30 to 15:00 
Mitya Chklovskii (Simons Foundation); (New York University) Natural neural networks: structure, dynamics and computation
The brain is a directed weighted graph with neurons as nodes
and synapses as edges. The structure of the graph determines the dynamics of
neural activity. At the same time, correlations in neural activity determine synaptic
weights. The grand challenge of neuroscience is to construct a theory relating neural
structure and dynamics to computation. I will present our progress towards this
goal using an online matrix factorization framework.

INI 1  
15:00 to 15:30  Afternoon Tea & Posters  
15:30 to 16:00 
Alexandre Arenas (Universitat Rovira i Virgili) The physics of spreading processes in multilayer networks
The study of networks plays a crucial role in investigating the structure, dynamics, and function
of a wide variety of complex systems in myriad disciplines. Despite the success of traditional
network analysis, standard networks provide a limited representation of complex systems,
which often include different types of relationships (i.e., “multiplexity”) among their
constituent components and/or multiple interacting subsystems. Such structural complexity
has a significant effect on both dynamics and function. Throwing away or aggregating available
structural information can generate misleading results and be a major obstacle towards
attempts to understand complex systems. The recent “multilayer” approach for modeling
networked systems explicitly allows the incorporation of multiplexity and other features of
realistic systems. On one hand, it allows one to couple different structural relationships by encoding them in a convenient mathematical object. On the other hand, it also allows one to
couple different dynamical processes on top of such interconnected structures. The resulting
framework plays a crucial role in helping achieve a thorough, accurate understanding of
complex systems. The study of multilayer networks has also revealed new physical phenomena
that remain hidden when using ordinary graphs, the traditional network representation.
Here we survey progress towards attaining a deeper understanding of spreading processes on
multilayer networks, and we highlight some of the physical phenomena related to spreading
processes that emerge from multilayer structure.

INI 1  
16:00 to 16:30 
Ginestra Bianconi (Queen Mary University of London) Message passing theory for percolation models on multiplex networks
Multiplex networks describe a large variety of complex systems including
infrastructures, transportation networks and biological systems.In presence of interdepedencies between the nodes the robustness of the system to random failures is determined by the mutually connected giant component (MCGC). Much progress on the emergence of the MCGC has been achieved so far but the characterization of the emergence of the MCGC is multiplex networks with link overlap in an arbitrary large number of layers has remained elusive. Here I will present a message passing algorithm for characterizing the
percolation transition in multiplex networks with link overlap and an arbitrary
number of layers M and I will derive the equation for the order parameter in an ensembles of random multiplex networks. Specifically I will propose and compare two message passing
algorithms, that generalize the algorithm widely used to study the percolation
transition in multiplex networks without link overlap. The first algorithm
describes a directed percolation transition and admits an epidemic spreading
interpretation. The second algorithm describes the emergence of the mutually
connected giant component, that is the percolation transition, but does not
preserve the epidemic spreading interpretation. The phase diagrams
for the percolation and directed percolation transition is obtained in simple
representative cases. For the same multiplex network
structure, in which the directed percolation transition has nontrivial
tricritical points, the percolation transition has a discontinuous phase
transition, with the exception of the trivial case in which all the layers
completely overlap.

INI 1  
16:30 to 17:00 
Tiago Peixoto (ISI Foundation); (Universität Bremen) Efficient Bayesian inference of multiscale network structures
A principled approach to characterize the hidden structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a popular choice for this task is the stochastic block model, where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. In this talk, we will present a nonparametric Bayesian inference framework based on a microcanonical formulation of the stochastic block model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference of large networks, but also reveal structures at multiple scales; 2. A very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges, but also with an unlimited number of groups. We show also how this approach can be used to sample group hierarchies from the posterior distribution, perform model selection, and how it can be easily be generalized to networks with edge covariates and node annotations. 
INI 1  
17:00 to 17:30 
Mason Porter (University of Oxford) Multilayer Networks and Applications
One of the most active areas of network science, with an explosion of
publications during the last few years, is the study of "multilayer networks,"
in which heterogeneous types of entities can be connected via multiple types of
ties that change in time. Multilayer networks can include multiple subsystems
and "layers" of connectivity, and it is important to take multilayer features
into account to try to improve our understanding of complex systems. In this
talk, I'll give an introduction to multilayer networks and will discuss
applications in areas such as transportation, sociology, neuroscience, and
ecology.
Related Links

INI 1  
19:30 to 22:00  Formal Dinner at Trinity College 
09:00 to 09:30 
Peter Morters (University of Bath) Spatial preferential attachment networks
We study a family of growing networks, in which new vertices are given a spatial position on the unit circle and are connected to existing vertices with a probability favouring short spatial distances and high degrees. In this model of a scalefree network with clustering we can independently tune the power law exponent τ of the degree distribution and the exponent δ at which the connection probability decreases with the distance of two vertices. We show that the network is robust if τ 2 + 1/(δ−1). This is the first instance of a scalefree network where robustness depends not only on its degree distribution but also on its clustering features. Joint work with Emmanuel Jacob (ENS Lyon). 
INI 1  
09:30 to 10:00 
Nelly Litvak (Universiteit Twente) Ranking algorithms on directed configuration networks
Coauthors: Ningyuan Chen (Yale School of
Management), Mariana OlveraCravioto (Columbia University) We study a family of rankings, which includes Google's PageRank, on a directed configuration model. We show that the the rank of a randomly chosen vertex converges in distribution to a finite random variable that can be written as a linear combination of i.i.d. copies of the attracting endogenous solution to a stochastic fixedpoint equation. We provide precise asymptotics for this limiting random variable. In particular, if the indegree distribution in the directed configuration model has a power law distribution, then the limiting distribution of the rank also follows a power law with the same exponent. Such power law behaviour of ranking is wellknown from empirical studies of reallife networks. Our asymptotic result gives remarkably good approximation for the complete ranking distribution on configuration networks of moderate size and on the directed graph of English Wikipedia. 
INI 1  
10:00 to 10:30 
Mike Steele (University of Pennsylvania) Twitter Event Networks and Surgery Constructions
The models considered here were motivated by graphs that one constructs by building edges according to Twitter "retweets" i.e. messages that are forwarded with comments. When one considers the largest connected component in such a graph then one finds empirically that some vertex has degree that grows linearly over time, but other vertex degrees grow at a rate that is algebraic and sublinear. A construction that uses surgery on multitype branching processes gives us a one parameter class of dynamic graph models that captures the "superstar" phenomena that one find in the retweet graphs.

INI 1  
10:30 to 11:00  Morning Coffee  
11:00 to 11:30 
Catherine Matias (CNRS (Centre national de la recherche scientifique)); (Université Pierre et Marie Curie Paris) Statistical issues in the stochastic block model
I will try to present a state of the art about statistical results established in the stochastic block model together with some remaining challenges in the domain. 
INI 1  
11:30 to 12:00 
Nial Friel (University College Dublin) Properties of Latent Variable Network Models
We derive properties of Latent Variable Models for networks, a broad class of
models that includes the widelyused Latent Position Models. We characterise several
features of interest, with particular focus on the degree distribution, clustering
coefficient, average path length and degree correlations. We introduce the Gaussian
Latent Position Model, and derive analytic expressions and asymptotic approximations
for its network properties. We pay particular attention to one special case,
the Gaussian Latent Position Model with Random Effects, and show that it can
represent heavytailed degree distributions, positive asymptotic clustering coefficients
and smallworld behaviour that often occur in observed social networks.
Finally, we illustrate the ability of the models to capture important features of real
networks through several well known datasets.

INI 1  
12:00 to 12:30 
Cristopher Moore (Santa Fe Institute) Informationtheoretic bounds and phase transitions in community detection and highdimensional clustering
Coauthors: Jess Banks (SFI/UC Berkeley), Jiaming
Xu (Simons), Joe Neeman (UT Austin), Praneeth Netrapalli
(MSR Cambridge) Over the past decade, a rich interaction has grown up between statistical physics and statistical inference. In particular, physics often predicts phase transitions where, if a data set becomes too sparse or too noisy, it suddenly becomes impossible to find its underlying structure, or even to distinguish it from a “null model” with no structure at all. For instance, in the community detection problem in networks, there is a transition below which the vertices cannot be labeled better than chance, and the network cannot be distinguished from an ErdosRenyi random graph. Similarly, in the clustering problem in Gaussian mixture models, it becomes impossible to find the clusters, or even to tell whether clusters exist, i.e., to distinguish the data from a single Gaussian. Many of the physics predictions have been made rigorous, but there is still a very lively interchange between the fields, both about these transitions and about asymptotically optimal algorithms. In particular, while efficient messagepassing and spectral algorithms are known that succeed down to the socalled KestenStigum bound, in a number of problems we believe that it is informationtheoretically possible (but perhaps not computationally feasible) to go further. I will give a friendly introduction to this field, and present some new results proving this conjecture. This is joint work with Jess Banks, Joe Neeman, Praneeth Netrapalli, and Jiaming Xu. 
INI 1  
12:30 to 13:30  Lunch @ Wolfson Court  
13:30 to 14:00 
Harrison Zhou (Yale University) Some optimality results in network analysis and beyond
In this talk I will review some recent optimality results in graphon
estimation and community detection and their extensions to structured linear
models.

INI 1  
14:00 to 14:30 
Karl Rohe (University of WisconsinMadison) Network driven sampling; a critical threshold for design effects
Web crawling and respondentdriven sampling (RDS) are two types of network driven sampling techniques that are popular when it is difficult to contact individuals in the population of interest. This paper studies network driven sampling as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree, instead of a chain, allows for the sampled units to refer multiple future units into the sample. In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is $D$, then constructing an estimator from the novel design makes the variance of the estimator $D$ times greater than it would be under a simple random sample. Under certain assumptions on the referral tree, the design effect of network driven sampling has a critical threshold that is a function of the referral rate $m$ and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix $\lambda_2$. If $m 1/\lambda_2^2$, then the design effect grows with $n$ (i.e. the standard estimator is no longer $\sqrt{n}$consistent; it converges at the slower rate of $\log_m \lambda_2$).

INI 1  
14:30 to 15:00 
Carey Priebe (Johns Hopkins University) Repeated Motif Hierarchical Stochastic Blockmodels
Coauthors: Vince Lyzinski (Johns Hopkins
University), Minh Tang (Johns Hopkins University), Avanti Athreya
(Johns Hopkins University), Youngser Park (Johns Hopkins
University), Joshua Vogelstein (Johns Hopkins University), Keith
Levin (Johns Hopkins University) Based on our methodology for community detection and community comparison in graphs (Lyzinski et al., 2015, http://arxiv.org/abs/1503.02115), we formulate a model selection procedure for deciding whether a hierarchical stochastic blockmodel graph supports the conjecture of repeated motifs. Such a graph inference procedure promises to address a fundamental outstanding question regarding the atoms of neural computation (Marcus, Marblestone & Dean, 2014, http://arxiv.org/abs/1410.8826). Related Links 
INI 1  
15:00 to 15:30  Afternoon Tea & Posters  
15:30 to 16:00 
David Choi (Carnegie Mellon University) Coclustering of nonsmooth graphons
Theoretical results are becoming known for community detection and clustering
of networks; however, these results assume an idealized generative model that is
unlikely to hold in many settings. Here we consider exploratory coclustering of
a bipartite network, where the rows and columns of the adjacency matrix are
assumed to be samples from an arbitrary population. This is equivalent to
assuming that the data is generated from a nonparametric model known as a
graphon. We show that coclusters found by any method can be extended to the row
and column populations, or equivalently that the estimated blockmodel
approximates a blocked version of the generative graphon, with generalization
error bounded by n^{1/2}. Analogous results are also shown for degreecorrected
coblockmodels and random dot product bipartite graphs, with error rates
depending on the dimensionality of the latent variable space.

INI 1  
16:00 to 16:30 
Purna Sarkar (University of Texas at Austin) Convex Relaxation for Community Detection with Covariates
Community detection in networks is an important problem in many applied areas. We investigate this in the presence of node covariates. Recently, an emerging body of theoretical work has been focused on leveraging information from both the edges in the network and the node covariates to infer community memberships. However, in most parameter regimes, one of the sources of information provides enough information to infer the hidden clusters, thereby making the other source redundant. We show that when the network and the covariates carry ``orthogonal'' pieces of information about the cluster memberships, one can get asymptotically consistent clustering by using them both, while each of them fails individually. This is joint work with Bowei Yan, University of Texas at Austin. 
INI 1 