# Seminars (SNA)

Videos and presentation materials from other INI events are also available.

Search seminar archive

Event When Speaker Title Presentation Material
SNAW01 11th July 2016
09:45 to 10:30
Cristina Toninelli Kinetically constrained spin models
We shall discuss kinetically constrained spin models (KCSM), a class ofinteracting particle systems which have been introduced in physics liter-ature to model liquid-glass and more general jamming transitions. Theevolution of KCSM follows a continuous time Markov process whoseelementary moves are birth and death of particles. The key feature isthat a move occurs only if the conguration satises a local constraint.These constraints induce the existence of clusters of blocked particles,the occurrence of several invariant measures and ergodicity breakingtransitions. We shall present some results concerning the identicationof the ergodic regime and the exponential convergence to equilibriumin this regime. After establishing a connection with monotone cellu-lar automata of bootstrap percolation type, we shall conclude our talkwith a discussion of a particular KCSM for which the correspondingcellular automaton has a peculiar discontinuous percolation transition.
SNAW01 11th July 2016
10:30 to 11:15
Peter Csikvari Statistical matching theory with a glimpse towards extremal regular graphs
In this talk we will survey some recent development on statistical properties of matchings of very large and infinite graphs. The main goal of the talk is to describe a few applications of a new concept called matching measure. These applications include new results on the number of (perfect) matchings in large girth graphs as well as simple new proofs of certain statistical physical theorems. In particular, we will sketch a new proof of Schrijver's lower bound for the number of perfect matchings of regular bipartite graphs. We will also mention some extremal graph theoretic results about other graph parameters of regular graphs.

This talk is based on joint papers with various subsets of Miklos Abert, Peter E. Frenkel, Tamas Hubai and Gabor Kun.
SNAW01 11th July 2016
11:45 to 12:30
Oliver Riordan The phase transition in Achlioptas processes
The classical random graph process starts with a fixed set of n vertices and no edges. Edges are then added one-by-one, uniformly at random. One of the most interesting features of this process, established by Erdős and Rényi more than 50 years ago, is the phase transition near n/2 edges, where a single giant' component emerges from a sea of small components. This example serves as a starting point for understanding phase transitions in a wide variety of other contexts. Around 2000, Dimitris Achlioptas suggested an innocent-sounding variation of the model: at each stage two edges are selected at random, but only one is added, the choice depending on (typically) the sizes of the components it would connect. This may seem like a small change, but these processes do not have the key independence property that underlies our understanding of the classical process. One can ask many questions about Achlioptas processes; the most interesting concern the phase transition: does the critical value change from n/2? Is the nature of the transition the same or not? I will describe a number of results on these questions from joint work with Lutz Warnke.
SNAW01 11th July 2016
13:45 to 14:30
Paul Smith Towards universality in bootstrap percolation
Bootstrap percolation is a broad class of monotone cellular automata, which has links to the Glauber dynamics of the Ising model and other areas of statistical physics. Starting with random initial conditions, the question is to determine the threshold for complete occupation of the underlying graph. Until relatively recently, only nearest-neighbour models (and relatively minor variants of these models) had been studied -- and these are now very well understood. In this talk I will discuss a new universality' theory for bootstrap percolation, which has emerged in the last few years. In particular, I will explain the classification of two-dimensional models, give more precise results for so-called critical' models (also in two dimensions), and talk about a new classification theorem for higher dimensional models.
SNAW01 11th July 2016
14:30 to 15:00
Peter Orbanz Subsampling, symmetry and averaging in networks
Consider a very large graph---say, the link graph of a large social network. Now invent a randomized algorithm that extracts a smaller subgraph. If we use the subgraph as sample data and perform statistical analysis on this sample, what can we learn about the underlying network? Clearly, that should depend on the subsampling algorithm. I show how the choice of algorithm defines a notion of (1) distributional invariance and (2) of averaging within a single large graph. Under suitable conditions, the resulting averages satisfy a law of large numbers, such that statistical inference from a single sample graph is indeed possible. From this algorithmic point of view, graphon models arise from a specific choice of sampling algorithm, various known pathologies of these models are explained as a selection bias.

SNAW01 11th July 2016
15:30 to 16:00
Francois Caron Sparse and modular networks using exchangeable random measures
Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, it is well known that the graph is necessarily either dense (the number of edges scales quadratically with the number of nodes) or trivially empty.
Here, we instead consider representing the graph as a measure on the plane. For the associated definition of exchangeability, we rely on the Kallenberg representation theorem (Kallenberg, 1990). For certain choices of such exchangeable random measures underlying the graph construction, the network process is sparse with power-law degree distribution, and can accommodate an overlapping block-structure.
A Markov chain Monte Carlo algorithm is derived for efficient exploration of the posterior distribution and allows to recover the structure of a range of networks ranging from dense to sparse based on our flexible formulation.

Joint work with Emily Fox and Adrien Todeschini
http://arxiv.org/abs/1401.1137
http://arxiv.org/pdf/1602.02114

SNAW01 11th July 2016
16:00 to 16:30
David Blei Exponential Family Embeddings
Word embeddings are a powerful approach for capturing semantic  similarity among terms in a vocabulary.  In this talk, I will describe exponential family embeddings, a class of methods that  extends the idea of word embeddings to other types of  high-dimensional data. As examples, we studied neural data with real-valued observations, count data from a market basket analysis, and ratings data from a movie recommendation system.  We then extended this idea to networks, giving a novel type of "latent space" model.  The main idea behind an EF-EMB is to model each observation conditioned on a set of other  observations.  This set is called the context, and the way the  context is defined is a modeling choice that depends on the problem.  In language the context is the surrounding words; in neuroscience  the context is close-by neurons; in market basket data the context is other items in the shopping cart; in networks the context is edges emanating from a node pair.  Each type of embedding model  defines the context, the exponential family of conditional  distributions, and how the latent embedding vectors are shared  across data. We infer the embeddings with a scalable algorithm based  on stochastic gradient descent.  We found exponential family embedding models to be more effective than other types of dimension reduction.  They better reconstruct held-out data and find interesting qualitative  structure.
SNAW01 12th July 2016
09:00 to 09:30
David Aldous A framework for imperfectly observed networks
Model a network as an edge-weighted graph, where the (unknown) weight $w_e$ of edge $e$ indicates the frequency of observed interactions, that is over time $t$ we observe a Poisson($t w_e$) number of interactions across edges $e$.  How should we estimate some given statistic of the underlying network?   This leads to wide-ranging and challenging problems.
SNAW01 12th July 2016
09:30 to 10:00
Richard Kenyon Can you see the sound of a drum?
We discuss planar networks and harmonic functions on them, and in particular two reconstruction problems for "hidden" networks:
1. To what extent can one reconstruct a network from boundary measurements?
2. Can one reconstruct conductances from a harmonic embedding" of the network?
SNAW01 12th July 2016
10:00 to 10:30
Charles Radin Emergent phases in constrained random networks
Structures are shown to emerge in large random networks, under variable
constraints, when examined at large scale. They are automatically created
by the coalescence of nodes into a small number of equivalence classes,
which must cooperate to create structure in order to satisfy the constraints.
The structures change gradually as constraints vary within a phase, and
change sharply at phase transitions.

This talk is based on joint work with Richard Kenyon, Kui Ren and Lorenzo Sadun.

SNAW01 12th July 2016
11:00 to 11:30
Nikhil Srivastava Spectral Sparsification of Graphs
Modern graph algorithms increasingly access and manipulate graphs via their Laplacian operators and other associated “continuous” objects, rather than purely discretely.  An important primitive in this paradigm is spectral sparsification: being able to approximate the Laplacian of a given graph by that of a graph with significantly fewer edges. I will survey some of the key results in this area, drawing on tools from random matrix theory, matrix analysis, and electrical network theory.
SNAW01 12th July 2016
11:30 to 12:00
He Sun Partitioning Well-Clustered Graphs: Spectral Clustering Works!
We study variants of the widely used spectral clustering that partitions a graph into k clusters by (1) embedding the vertices of a graph into a low-dimensional space using the bottom eigenvectors of the Laplacian matrix, and (2) grouping the embedded points into k clusters via k-means algorithms. We show that, for a wide class of  graphs, spectral clustering gives a good approximation of the optimal clustering. While this approach was proposed in the early 1990s and has comprehensive applications, prior to our work  similar results were known only for graphs generated from stochastic models.

We also give a nearly-linear time algorithm for partitioning well-clustered graphs based on  computing a matrix exponential andapproximate nearest neighbor data structures.

Based on joint work with Richard Peng (Georgia Institute of Technology), and Luca Zanetti (University of Bristol).

Reference:
http://arxiv.org/abs/1411.2021
SNAW01 12th July 2016
12:00 to 12:30
Amin Saberi Thin spanning trees and their algorithmic applications
Motivated by Jaeger's modular orientation conjecture, Goddyn asked the following question:

A spanning tree of a graph G is called epsilon-thin if it contains at most an epsilon fraction of the edges of each cut in that graph. Is there a function f:(0,1)→
such that every f(epsilon)-edge-connected graph has an epsilon-thin spanning tree?

I will talk about our journey in search of such thin trees, their applications concerning traveling salesman problems, and unexpected connections to graph sparsification and the Kadison-Singer problem.

Bio: http://stanford.edu/~saberi/bio.txt
SNAW01 12th July 2016
13:30 to 14:00
Haiyan Huang Inferring gene-gene associations and gene networks beyond standard statistical models
With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to infer gene interactions has been a major challenge in systems biology. In this talk, I will discuss two methods of finding gene associations that differ in their considerations of how genes behave across the given samples. The first method applies to the case where the patterns of gene association may change or only exist in a subset of all the samples. The second method goes beyond pairwise gene relationships to higher level group interactions, but requiring similar gene behaviours across all the samples. We compare both methods to other popular approaches using simulated and real data, and demonstrate they lead to better general performance and capture important biological features in certain situations that are missed by the other methods.
SNAW01 12th July 2016
14:00 to 14:30
Florian Markowetz Understanding genetic interaction networks
Genes do not act in isolation, but rather in tight interaction networks. Maps of genetic interactions between pairs of genes are a powerful way to dissect these relationships. I will discuss two statistical approaches to better understand the functional content of genetic interaction networks and the mechanism underlying them.

First, I will present a method that adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guilt-by-association principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks like other enrichment methods annotate lists of genes.

Second, I will describe a method to understand genetic interactions based on high-dimensional phenotypes. I will present methodology we developed to test the hypothesis that complex relationships between a gene pair can be explained by the action of a third gene that modulates the interaction. Our approach to test this hypothesis builds on Nested Effects Models (NEMs), a probabilistic model tailored to inferring networks from gene perturbation data. We have extended NEMs with logical functions to model gene interactions and show in simulations and case studies that our approach can successfully infer modulators of genetic interactions and thus lead to a better understanding of an important feature of cellular organisation.

SNAW01 12th July 2016
15:00 to 15:30
Michael Leung A Weak Law for Moments of Pairwise-Stable Networks
We develop asymptotic theory for strategic network-formation models under the assumption that the econometrician observes a single large pairwise-stable network. Drawing on techniques from the literature on random graphs, we derive primitive restrictions on the model that establish a weak law of large numbers for a useful class of network moments. Under these restrictions, we show that the model generates realistic networks that are sparse and may contain "giant" connected subnetworks, two well-known properties of real-world social networks. The restrictions also conveniently suggest a new method to simulate counterfactual networks that avoids a well-known curse of dimensionality. Additionally, we characterize the identified set of structural parameters based on a tractable class of dyad-level network moments and construct consistent set estimators.

SNAW01 12th July 2016
15:30 to 16:00
Oleg Sofrygin Estimation of Causal Effects in Network-Dependent Observational Data
Co-author: Mark J. van der Laan (University of California, Berkeley, CA)

We outline the framework of targeted maximum likelihood estimation (TMLE) in observational network data. Consider a dataset in which each observational unit is causally connected to other units via a known social or geographical network. For each unit we observe their baseline covariates, their exposure and their outcome, and we are interested in estimating the effect of a single time-point stochastic intervention. We propose a semi-parametric statistical model that allows for between-unit dependencies: First, unit-level exposure can depend on the baseline covariates of other connected units. Second, the unit-level outcome can depend on the baseline covariates and exposures of other connected units. We impose some restrictions on our model, e.g., assuming that the unit's exposure and outcome depend on other units as some known (but otherwise arbitrary) summary measures of fixed dimensionality. A practical application of our approach is demonstrated in a large-scale networ k simulation study that applies two newly developed R packages: simcausal and tmlenet. We also discuss some extensions of our work towards estimation in longitudinal data.
SNAW01 12th July 2016
16:00 to 16:30
Dean Eckles Statistical and causal inference in networks
I survey the state of methods for statistical and causal inference in networks. I present Fisherian randomization inference methods for testing for network interference (i.e., spillovers, exogenous peer effects) as an example.
See https://arxiv.org/abs/1506.02084
SNAW01 13th July 2016
09:00 to 09:45
Balazs Szegedy On the graph limit approach to random regular graphs
Let G=G(n,d) denote the random d-regular graph on n vertices. A celebrated result by J. Friedman solves Alon's second eigenvalue conjecture saying that if d is fixed and n is large then G is close to be Ramanujan. Despite of significant effort, much less was known about the structure of the eigenvectors of G. We use a combination of graph limit theory and information theory to prove that every eigenvector of G (when normalized to have length equal to square root of n) has an entry distribution that is close to some Gaussian distribution in the weak topology. Our results also work in the more general setting of almost-eigenvectors. We hope our methods will lead to a general graph limit approach to a large class of problems on random regular graphs. Joint work with A. Backhausz.
SNAW01 13th July 2016
09:45 to 10:30
Fabio Martinelli Bootstrap percolation and kinetically constrained spin models: critical lengths and mixing time scales
Co-authors: Cristina Toninelli (Univ Paris VII Diderot), Rob Morris (IMPA)

In recent years, a great deal of progress has been made in understanding the behaviour of a particular class of monotone cellular automata, commonly known as bootstrap percolation. In particular, if one considers only two-dimensional automata, then we now have a fairly precise understanding of the typical evolution of these processes, starting from p-random initial conditions of infected sites. Given a bootstrap model, one can consider the associated kinetically constrained spin model in which the state (infected or healthy) of a vertex is resampled (independently) at rate 1 by tossing a p-coin if it could be infected in the next step by the bootstrap process, and remains in its current state otherwise. Here p is the probability of infection. The main interest in KCM’s stems from the fact that, as p → 0, they mimic some of the most striking features of the glass transition, a major and still largely open problem in condensed matter physics. In this talk, motivated by recent universality results for bootstrap percolation, I’ll discuss some “universality conjectures” concerning the growth of the (random) infection time of the origin in a KCM as p → 0. Joint project with R. Morris (IMPA) and C. Toninelli (Paris VII),
SNAW01 13th July 2016
11:00 to 11:45
Svante Janson Graph limits and entropy
The entropy of a graph limit, or a graphon, was essentially defined by Aldous (although in a different, equivalent formulation), who showed that if a graph limit W has entropy H, then the entropy of the distribution of the corresponding random graph G(n,W) is asymptotically n^2 H/2.

Consider a hereditary class of graphs Q. Then the number of graphs of order n in Q is asymptotically given by exp( n^2 H/2), where H is the maximum entropy of a graph limit that is a limit of graphs in Q. Moreover, if this entropy maximizing limit is unique, then a uniformly random graph of order n in Q converges in probability, as n tends to infinity, to this maximizing graph limit.

As an example, we discuss the entropy maximising graph limits for the class of string graphs.

This is based on joint works with Hamed Hatami and Balasz Szegedy, and with Andrew Uzzel.
SNAW01 13th July 2016
11:45 to 12:30
Paul Neville Balister Barrier coverage in thin strips
Suppose sensors are deployed randomly in a long thin strip, and suppose each sensor can detect objects within a fixed distance. We say that the sensors achieve barrier coverage if there is no path across the strip that a small object can follow that avoids detection by the sensors. We give fairly precise results on the probability that barrier coverage is achieved as a function of the range of the sensors, the height and length of the strip, and the number of sensors deployed. In particular, we show that the most likely obstruction - a rectangular region crossing the strip which is devoid of sensors - does not in general dominate the probability of failure of barrier coverage.
Joint work with Bela Bollobas and Amites Sarkar.
SNAW01 13th July 2016
13:30 to 14:15
Vincent Tassion Sharpness of the phase transition for Voronoi percolation in $\mathbb R^d$
Take a Poisson point process on $\mathbb R^d$ and consider its Voronoi tessellation. Colour each cell of the tessellation black with probability $p$ and white with probability $1-p$ independently of each other. This rocess undergoes a phase transition at a critical parameter $p_c(d)$: below $p_c(d)$ all the black connected components are bounded almost surely, and above $p_c$ there is an unbounded black connected component almost surely. In any dimension $d$ larger than 2, we prove that for $p The talk is based on a joint work with H. Duminil-Copin and A. Raoufi. SNAW01 13th July 2016 14:15 to 14:45 Mark Newman Contact networks and the spread of epidemic disease The structure of the networks of contact between individuals can have a profound effect on the rate and extent of disease epidemics. This talk will outline some of the fundamental mathematics connecting network structure with disease outcomes and describe a number of case studies of applications to infectious diseases, including influenza, HIV, Mycoplasma pneumonia, and SARS. SNAW01 13th July 2016 15:15 to 15:45 Alfred Hero Continuum limits for minimal paths Many applications involve computing minimal paths over the nodes of a graph relative to a measure of pairwise node dissimilarity. These include minimal spanning trees in computer vision, shortest paths in image databases, or non-dominated anti-chains in multi-objective database search. When the nodes are random vectors and the dissimilarity is an increasing function of Euclidean distance these minimal paths can have continuum limits as the number of nodes approaches infinity. Such continuum limits can lead to low complexity diffusion approximations to the solution of the combinatorial minimal path problem. SNAW01 14th July 2016 09:00 to 09:30 Sharmodeep Bhattacharyya Spectral Clustering for Dynamic Stochastic Block Model Co-author: Shirshendu Chatterjee (CUNY) One of the most common and crucial aspect of many network data sets is the dependence of network link structure on time or other attributes. There is a long history of researchers proposing networks for dynamic time-evolving formation of networks. Most complex networks, starting from biological networks like genetic or neurological networks to social, co-authorship and citation networks are time-varying. This has led the researchers to study dynamic, time-evolving networks. In this work, we consider the problem of finding a common clustering structure in time-varying networks. We consider three simple extension of spectral clustering methods to dynamic settings and give theoretical justification that the spectral clustering methods produce consistent community detection for such dynamic networks. We also propose an extension of the static version of nonparametric latent variable models to the dynamic setting and use a special case of the model to justify the spectral clusteri ng methods. We show the validity of the theoretical results via simulations too and apply the clustering methods to real-world dynamic biological networks. SNAW01 14th July 2016 09:30 to 10:00 Arash Amini Matched bipartite block model with covariates Community detection in bipartite networks is challenging. Motivated by an application from biology, we consider a model which allows for matched communities in the bipartite setting, in addition to node covariates with information about the matching. We derive a simple fast algorithm for fitting the model based on variational inference ideas and show its effectiveness in simulations. SNAW01 14th July 2016 10:00 to 10:30 Patrick Wolfe tba SNAW01 14th July 2016 11:00 to 11:30 Sofia Olhede Nonparametrics for networks Relational data have become an important component of modern statistics. Networks, and weighted networks, are ubiquitous in modern applications such as disease dynamics, ecology, financial contagion, and neuroscience. The inference of networks is harder, in parts because the measure placed on the observables need to satisfy sets of permutation invariances, and most networks are very sparse, with most possible relations not present. This talk will explore how to best construct nonparametric summaries of such objects, in such as way that the underlying statistical model of the observations is well described, and any estimators computable with scalable algorithms. This is joint work with Pierre-Andre Maugis and Patrick Wolfe (UCL). SNAW01 14th July 2016 11:30 to 12:00 Elizaveta Levina Estimating network edge probabilities by neighborhood smoothing Co-authors: Yuan Zhang (Ohio State University), Ji Zhu (University of Michigan) The problem of estimating probabilities of network edges from the observed adjacency matrix has important applications to predicting missing links and network denoising. It has usually been addressed by estimating the graphon, a function that determines the matrix of edge probabilities, but is ill-defined without strong assumptions on the network structure. Here we propose a novel computationally efficient method based on neighborhood smoothing to estimate the expectation of the adjacency matrix directly, without making the strong structural assumptions graphon estimation requires. The neighborhood smoothing method requires little tuning, has a competitive mean-squared error rate, and outperforms many benchmark methods on the task of link prediction in both simulated and real networks. Related Links SNAW01 14th July 2016 12:00 to 12:30 Peter Bickel Fitting block models with covariates:”Likelihood” methods ,sparsity and selection of the number of blocks Co-authors: Purna Sarkar (U. of Texas,Austin), Soumendu Mukherjee (U C, Berkeley), Sharmodeep Bhattacharyya, (Oregon State University) We introduce block models with edge and block covariates ,along the lines of Hoff, Handcock,Raftery (2002) ,specializing to covariate forms of the types proposed by Zhao, Levina,Zhu(2014) and generalizing that of Newman ,Clauset(2015)..WE study mean field variational fitting in this situation along the lines of Celisse,Daudin,Pierre( 2011 ) and B.,Choi,Chang,Zhang (2013).Using theory,examples, and simulations we argue that these methods ,possibly regularized lead to models with a sparse number of blocks. SNAW01 14th July 2016 13:30 to 14:00 Bin Yu Artificial neurons meet real neurons: pattern selectivity in V4 via deep learning Co-authors: Yuansi Chen (UCB), Reza Abbasi Asl (UCB), Adam Bloniarz (UCB), Jack Gallant (UCB) Vision in humans and in non-human primates is mediated by a constellation of hierarchically organized visual areas. One important area is V4, a large retinotopically-organized area located intermediate between primary visual cortex and high-level areas in the inferior temporal lobe. V4 neurons have highly nonlinear response properties. Consequently, it has been difficult to construct quantitative models that accurately describe how visual information is represented in V4. To better understand the filtering properties of V4 neurons we recorded from 71 well isolated cells stimulated with natural images. We fit predictive models of neuron spike rates using transformations of natural images learned by a convolutional neural network (CNN). The CNN was trained for image classification on the ImageNet dataset. To derive a model for each neuron, we first propagate each of the stimulus images forward to an inner layer of the CNN. We use the activations of the inner layer as the featu re (predictor) vector in a high dimensional regression, where the response rate of the V4 neuron is taken as the response vector. Thus, the final model for each neuron consists of a multilayer nonlinear transformation provided by the CNN, and one final linear layer of weights provided by regression. We find that models using the first two layers of three well-known CNNs provide better predictions of responses of V4 neurons than those obtained using a conventional Gabor-like wavelet model. To characterize the spatial and pattern selectivity of each V4 neuron, we both explicitly optimize the input image to maximize the predicted spike rate, and visualize the selected filters of the CNN. We also perform dimensionality reduction by sparse PCA to visualize the population of neurons. Finally, we show the stability of our analysis across the three CNNs, and conclude that the V4 neurons are tuned to a remarkable diversity of shapes such as curves, blobs, checkerboard patterns, and V1-like gratings. SNAW01 14th July 2016 14:00 to 14:30 Thomas Nichols Modelling Large- and Small-Scale Brain Networks Investigations of the human brain with neuroimaging have recently seen a dramatic shift in focus, from "brain mapping", identifying brain regions related to particular functions, to connectivity or "connectomics", identifying networks of coordinated brain regions, and how these networks behave at rest and during tasks. In this presentation I will discuss two quite different approaches to modeling brain connectivity. In the first work, we use Bayesian time series methods to allow for time-varying connectivity. Non-stationarity connectivity methods typically use a moving-window approach, while this method poses a single generative model for all nodes, all time points. Known as a "Multiregression Dynamic Model" (MDM), it comprises an extension of a traditional Bayesian Network (or Graphical Model), by posing latent time-varying coefficients that implement a regression a given node on its parent nodes. Intended for a modest number of nodes (up to about 12), a MDM allows inference of the structure of the graph using closed form Bayes factors (conditional on a single estimated "discount factor", reflecting the balance of observation and latent variance. While originally developed for directed acyclic graphs, it can also accommodate directed (possibly cyclic) graphs as well. In the second work, we use mixtures of simple binary random graph models to account for complex structure in brain networks. In this approach, the network is reduced to a binary adjacency matrix. While this is invariably represents a loss of information, it avoids a Gaussianity assumption and allows the use of much larger graphs, e.g. with 100's of nodes. Daudin et al. (2008) proposed a "Erdos-Reyni Mixture Model", which assumes that, after an unknown number of latent node classes have been estimated, that connections arise as Bernoulli counts, homogeneously for each pair of classes. We extend this work to account for multisubject data (where edge data are now Binomially distributed), allowing Related Links SNAW01 14th July 2016 14:30 to 15:00 Mitya Chklovskii Natural neural networks: structure, dynamics and computation The brain is a directed weighted graph with neurons as nodes and synapses as edges. The structure of the graph determines the dynamics of neural activity. At the same time, correlations in neural activity determine synaptic weights. The grand challenge of neuroscience is to construct a theory relating neural structure and dynamics to computation. I will present our progress towards this goal using an online matrix factorization framework. SNAW01 14th July 2016 15:30 to 16:00 Alexandre Arenas The physics of spreading processes in multilayer networks The study of networks plays a crucial role in investigating the structure, dynamics, and function of a wide variety of complex systems in myriad disciplines. Despite the success of traditional network analysis, standard networks provide a limited representation of complex systems, which often include different types of relationships (i.e., “multiplexity”) among their constituent components and/or multiple interacting subsystems. Such structural complexity has a significant effect on both dynamics and function. Throwing away or aggregating available structural information can generate misleading results and be a major obstacle towards attempts to understand complex systems. The recent “multilayer” approach for modeling networked systems explicitly allows the incorporation of multiplexity and other features of realistic systems. On one hand, it allows one to couple different structural relationships by encoding them in a convenient mathematical object. On the other hand, it also allows one to couple different dynamical processes on top of such interconnected structures. The resulting framework plays a crucial role in helping achieve a thorough, accurate understanding of complex systems. The study of multilayer networks has also revealed new physical phenomena that remain hidden when using ordinary graphs, the traditional network representation. Here we survey progress towards attaining a deeper understanding of spreading processes on multilayer networks, and we highlight some of the physical phenomena related to spreading processes that emerge from multilayer structure. SNAW01 14th July 2016 16:00 to 16:30 Ginestra Bianconi Message passing theory for percolation models on multiplex networks Multiplex networks describe a large variety of complex systems including infrastructures, transportation networks and biological systems.In presence of interdepedencies between the nodes the robustness of the system to random failures is determined by the mutually connected giant component (MCGC). Much progress on the emergence of the MCGC has been achieved so far but the characterization of the emergence of the MCGC is multiplex networks with link overlap in an arbitrary large number of layers has remained elusive. Here I will present a message passing algorithm for characterizing the percolation transition in multiplex networks with link overlap and an arbitrary number of layers M and I will derive the equation for the order parameter in an ensembles of random multiplex networks. Specifically I will propose and compare two message passing algorithms, that generalize the algorithm widely used to study the percolation transition in multiplex networks without link overlap. The first algorithm describes a directed percolation transition and admits an epidemic spreading interpretation. The second algorithm describes the emergence of the mutually connected giant component, that is the percolation transition, but does not preserve the epidemic spreading interpretation. The phase diagrams for the percolation and directed percolation transition is obtained in simple representative cases. For the same multiplex network structure, in which the directed percolation transition has non-trivial tricritical points, the percolation transition has a discontinuous phase transition, with the exception of the trivial case in which all the layers completely overlap. SNAW01 14th July 2016 16:30 to 17:00 Tiago Peixoto Efficient Bayesian inference of multi-scale network structures A principled approach to characterize the hidden structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a popular choice for this task is the stochastic block model, where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. In this talk, we will present a nonparametric Bayesian inference framework based on a microcanonical formulation of the stochastic block model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference of large networks, but also reveal structures at multiple scales; 2. A very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges, but also with an unlimited number of groups. We show also how this approach can be used to sample group hierarchies from the posterior distribution, perform model selection, and how it can be easily be generalized to networks with edge covariates and node annotations. SNAW01 14th July 2016 17:00 to 17:30 Mason Porter Multilayer Networks and Applications One of the most active areas of network science, with an explosion of publications during the last few years, is the study of "multilayer networks," in which heterogeneous types of entities can be connected via multiple types of ties that change in time. Multilayer networks can include multiple subsystems and "layers" of connectivity, and it is important to take multilayer features into account to try to improve our understanding of complex systems. In this talk, I'll give an introduction to multilayer networks and will discuss applications in areas such as transportation, sociology, neuroscience, and ecology. Related Links SNAW01 15th July 2016 09:00 to 09:30 Peter Morters Spatial preferential attachment networks We study a family of growing networks, in which new vertices are given a spatial position on the unit circle and are connected to existing vertices with a probability favouring short spatial distances and high degrees. In this model of a scale-free network with clustering we can independently tune the power law exponent τ of the degree distribution and the exponent δ at which the connection probability decreases with the distance of two vertices. We show that the network is robust if τ 2 + 1/(δ−1). This is the first instance of a scale-free network where robustness depends not only on its degree distribution but also on its clustering features. Joint work with Emmanuel Jacob (ENS Lyon). SNAW01 15th July 2016 09:30 to 10:00 Nelly Litvak Ranking algorithms on directed configuration networks Co-authors: Ningyuan Chen (Yale School of Management), Mariana Olvera-Cravioto (Columbia University) We study a family of rankings, which includes Google's PageRank, on a directed configuration model. We show that the the rank of a randomly chosen vertex converges in distribution to a finite random variable that can be written as a linear combination of i.i.d. copies of the attracting endogenous solution to a stochastic fixed-point equation. We provide precise asymptotics for this limiting random variable. In particular, if the in-degree distribution in the directed configuration model has a power law distribution, then the limiting distribution of the rank also follows a power law with the same exponent. Such power law behaviour of ranking is well-known from empirical studies of real-life networks. Our asymptotic result gives remarkably good approximation for the complete ranking distribution on configuration networks of moderate size and on the directed graph of English Wikipedia. SNAW01 15th July 2016 10:00 to 10:30 Mike Steele Twitter Event Networks and Surgery Constructions The models considered here were motivated by graphs that one constructs by building edges according to Twitter "retweets" i.e. messages that are forwarded with comments. When one considers the largest connected component in such a graph then one finds empirically that some vertex has degree that grows linearly over time, but other vertex degrees grow at a rate that is algebraic and sublinear. A construction that uses surgery on multi-type branching processes gives us a one parameter class of dynamic graph models that captures the "superstar" phenomena that one find in the retweet graphs. SNAW01 15th July 2016 11:00 to 11:30 Catherine Matias Statistical issues in the stochastic block model I will try to present a state of the art about statistical results established in the stochastic block model together with some remaining challenges in the domain. SNAW01 15th July 2016 11:30 to 12:00 Nial Friel Properties of Latent Variable Network Models We derive properties of Latent Variable Models for networks, a broad class of models that includes the widely-used Latent Position Models. We characterise several features of interest, with particular focus on the degree distribution, clustering coefficient, average path length and degree correlations. We introduce the Gaussian Latent Position Model, and derive analytic expressions and asymptotic approximations for its network properties. We pay particular attention to one special case, the Gaussian Latent Position Model with Random Effects, and show that it can represent heavy-tailed degree distributions, positive asymptotic clustering coefficients and small-world behaviour that often occur in observed social networks. Finally, we illustrate the ability of the models to capture important features of real networks through several well known datasets. SNAW01 15th July 2016 12:00 to 12:30 Cristopher Moore Information-theoretic bounds and phase transitions in community detection and high-dimensional clustering Co-authors: Jess Banks (SFI/UC Berkeley), Jiaming Xu (Simons), Joe Neeman (UT Austin), Praneeth Netrapalli (MSR Cambridge) Over the past decade, a rich interaction has grown up between statistical physics and statistical inference. In particular, physics often predicts phase transitions where, if a data set becomes too sparse or too noisy, it suddenly becomes impossible to find its underlying structure, or even to distinguish it from a “null model” with no structure at all. For instance, in the community detection problem in networks, there is a transition below which the vertices cannot be labeled better than chance, and the network cannot be distinguished from an Erdos-Renyi random graph. Similarly, in the clustering problem in Gaussian mixture models, it becomes impossible to find the clusters, or even to tell whether clusters exist, i.e., to distinguish the data from a single Gaussian. Many of the physics predictions have been made rigorous, but there is still a very lively interchange between the fields, both about these transitions and about asymptotically optimal algorithms. In particular, while efficient message-passing and spectral algorithms are known that succeed down to the so-called Kesten-Stigum bound, in a number of problems we believe that it is information-theoretically possible (but perhaps not computationally feasible) to go further. I will give a friendly introduction to this field, and present some new results proving this conjecture. This is joint work with Jess Banks, Joe Neeman, Praneeth Netrapalli, and Jiaming Xu. SNAW01 15th July 2016 13:30 to 14:00 Harrison Zhou Some optimality results in network analysis and beyond In this talk I will review some recent optimality results in graphon estimation and community detection and their extensions to structured linear models. SNAW01 15th July 2016 14:00 to 14:30 Karl Rohe Network driven sampling; a critical threshold for design effects Web crawling and respondent-driven sampling (RDS) are two types of network driven sampling techniques that are popular when it is difficult to contact individuals in the population of interest. This paper studies network driven sampling as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree, instead of a chain, allows for the sampled units to refer multiple future units into the sample. In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is$D$, then constructing an estimator from the novel design makes the variance of the estimator$D$times greater than it would be under a simple random sample. Under certain assumptions on the referral tree, the design effect of network driven sampling has a critical threshold that is a function of the referral rate$m$and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix$\lambda_2$. If$m 1/\lambda_2^2$, then the design effect grows with$n$(i.e. the standard estimator is no longer$\sqrt{n}$-consistent; it converges at the slower rate of$\log_m \lambda_2$). SNAW01 15th July 2016 14:30 to 15:00 Carey Priebe Repeated Motif Hierarchical Stochastic Blockmodels Co-authors: Vince Lyzinski (Johns Hopkins University), Minh Tang (Johns Hopkins University), Avanti Athreya (Johns Hopkins University), Youngser Park (Johns Hopkins University), Joshua Vogelstein (Johns Hopkins University), Keith Levin (Johns Hopkins University) Based on our methodology for community detection and community comparison in graphs (Lyzinski et al., 2015, http://arxiv.org/abs/1503.02115), we formulate a model selection procedure for deciding whether a hierarchical stochastic blockmodel graph supports the conjecture of repeated motifs. Such a graph inference procedure promises to address a fundamental outstanding question regarding the atoms of neural computation (Marcus, Marblestone & Dean, 2014, http://arxiv.org/abs/1410.8826) . Related Links SNAW01 15th July 2016 15:30 to 16:00 David Choi Co-clustering of non-smooth graphons Theoretical results are becoming known for community detection and clustering of networks; however, these results assume an idealized generative model that is unlikely to hold in many settings. Here we consider exploratory co-clustering of a bipartite network, where the rows and columns of the adjacency matrix are assumed to be samples from an arbitrary population. This is equivalent to assuming that the data is generated from a nonparametric model known as a graphon. We show that co-clusters found by any method can be extended to the row and column populations, or equivalently that the estimated blockmodel approximates a blocked version of the generative graphon, with generalization error bounded by n^{-1/2}. Analogous results are also shown for degree-corrected co-blockmodels and random dot product bipartite graphs, with error rates depending on the dimensionality of the latent variable space. SNAW01 15th July 2016 16:00 to 16:30 Purna Sarkar Convex Relaxation for Community Detection with Covariates Community detection in networks is an important problem in many applied areas. We investigate this in the presence of node covariates. Recently, an emerging body of theoretical work has been focused on leveraging information from both the edges in the network and the node covariates to infer community memberships. However, in most parameter regimes, one of the sources of information provides enough information to infer the hidden clusters, thereby making the other source redundant. We show that when the network and the covariates carry orthogonal'' pieces of information about the cluster memberships, one can get asymptotically consistent clustering by using them both, while each of them fails individually. This is joint work with Bowei Yan, University of Texas at Austin. SNAW05 25th July 2016 10:00 to 11:00 Peter Hoff Bayesian Methods for Networks Statistical analysis of social network data presents many challenges: Realistic models often require a large number of parameters, yet maximum likelihood estimates for even the simplest models may be unstable. Furthermore, network data often exhibit non-standard statistical dependencies, and most network datasets lack any sort of replication. Statistical methods to address these issues have included random effects and latent variable models, and penalized likelihood methods. In this talk I will discuss how these approaches fit naturally within a Bayesian framework for network modeling. Additionally, we will discuss how standard Bayesian concepts such as exchangeability play a role in the development and interpretation of probability models for networks. Finally, some thoughts on the use of Bayesian methods for large-scale dynamic networks will be presented. SNAW05 25th July 2016 11:30 to 12:00 Alisa Kirichenko Function estimation on large graphs with missing data Co-author: Harry van Zanten (University of Amsterdam) There are various problems in statistics and machine learning that involve making an inference about a function on a graph. I will present a Bayesian approach to estimating a smooth function in the context of regression and classification problems on graphs. I will discuss the mathematical framework that allows to study the performance of nonparametric function estimation methods on large graphs. I will review theoretical results that show how to achieve asymptotically optimal Bayesian regularization under geometry conditions on the families of the graphs and the smoothness assumption on the true function. Both assumptions are formulated in terms of graph Laplacian. I will also discuss the case of "uniformly distributed" missing observations and investigate the generalization performance for various missing mechanisms. SNAW05 25th July 2016 12:00 to 12:30 Sergio Bacallado Bayesian sequential design in matrix factorisation models Co-author: Annie Marsden (University of Cambridge) Many problems in high-dimensional statistics rely on low-rank decompositions of matrices. Examples include matrix completion, recommender systems or collaborative filtering, and graph clustering or community detection. Most commonly, estimates are obtained by solving an optimisation problem through SDP relaxations, expectation maximisation, or projected gradient descent algorithms. Bayesian analogs of these procedures provide estimates of uncertainty, but these are rarely exploited in practice. In this talk, we explore how the posterior distribution in matrix factorisation models can be put to use in sequential design problems. Bayesian procedures such as Thompson sampling and the Bayesian UCB have been shown to achieve optimal regret in Multi-Arm Bandit problems. We present a simulation study supporting similar strategies in recommender systems. SNAW05 25th July 2016 13:30 to 14:00 Harry Crane Edge exchangeability: a new foundation for modeling network data Exchangeable models for vertex labeled graphs cannot replicate the large sample behaviors of sparsity and power law degree distributions observed in many network datasets. To address this issue, we introduce the principle of edge exchangeability, which is more natural for most applications and admits models for networks with sparse and/or power law structure. The vertices in an edge exchangeable network arrive in size-biased order according to their degree, further explaining why vertex exchangeability is an untenable assumption for many applications. Our discussion settles a longstanding question of statistical network modeling and presents a new framework within which to develop future theory and methods. Joint work with Walter Dempsey. SNAW05 25th July 2016 14:00 to 14:30 Cameron Freer Exchangeable constructions of countable structures Co-authors: Nathanael Ackerman (Harvard University), Diana Cai (University of Chicago), Alex Kruckman (UC Berkeley), Aleksandra Kwiatkowska (University of Bonn), Jaroslav Nesetril (Charles University in Prague), Rehana Patel (Franklin W. Olin College of Engineering), Jan Reimann (Penn State University) The Aldous-Hoover-Kallenberg theorem and the theory of graph limits connect three kinds of objects: sequences of finite graphs, random countably infinite graphs, and certain continuum-sized measurable "limit" objects (graphons). Graphons induce exchangeable countably infinite graphs via sampling, and all exchangeable graphs arise from a mixture of such sampling procedures -- a two-dimensional generalization of de Finetti's theorem. This naturally leads to the question of which countably infinite graphs (or other structures) can arise via an exchangeable construction. More formally, consider a random structure with a fixed countably infinite underlying set. The random structure is exchangeable when its joint distribution is invariant under permutations of the underlying set. For example, the countably infinite Erdős–Rényi graph is exchangeable; moreover, it is almost surely isomorphic to a particular graph, known as the Rado graph, and so we say that the Rado graph admits an exchangeable construction. On the other hand, one can show that no infinite tree admits an exchangeable construction. In joint work with Ackerman and Patel, we provide a necessary and sufficient condition for a countably infinite structure to admit an exchangeable construction. We also address related questions, such as what structures admit a unique exchangeable construction, and give examples involving graphs, directed graphs, and partial orders. Joint work with Nathanael Ackerman, Diana Cai, Alex Kruckman, Aleksandra Kwiatkowska, Jaroslav Nešetřil, Rehana Patel, and Jan Reimann. SNAW05 25th July 2016 15:30 to 16:00 Lise Getoor Statistical Relational Learning: Review and Recent Advances Statistical relational learning (SRL) is a subfield of machine learning that combines relational representations (from databases and AI) with probabilistic modeling techniques (most often graphical models)for modeling network data (typically richly structured multi-relational and multi-model networks). In this talk, I will briefly review some SRL modeling techniques, and then I will introduce hinge-loss Markov random fields (HL-MRFs), a new kind of probabilistic graphical model that supports scalable collective inference from richly structured data. HL-MRFs unify three different approaches to convex inference: LP approximations for randomized algorithms, local relaxations for probabilistic graphical models, and inference in soft logic. I will show that all three lead to the same inference objective. This makes inference in HL-MRFs highly scalable. Along the way, I will describe several successful applications of HL-MRFs and I will describe probabilistic soft logic, a declarative language for defining HL-MRFS. SNAW05 25th July 2016 16:00 to 16:30 Nial Friel Optimal Bayes estimators for block models In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or nodes of a network into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden markov models and block models for networks. However, due to the categorical nature of the clustering variables and the lack of scalable algorithms, summary tools that can interpret such samples are not available. We adopt a Bayesian decision theoretic approach to define an optimality criterion for clusterings, and propose a fast and context-independent greedy algorithm to find the best allocations. One important facet of our approach is that the optimal number of groups is automatically selected, thereby solving the clustering and the model-choice problems at the same time. We discuss the choice of loss functions to compare partitions, and show that our approach can accommodate a wide range of cases. Finally, we illustrate our approach on a variety of real-data applications for the stochastic block model and latent block model. This is joint work with Riccardo Rastelli. SNAW05 26th July 2016 09:00 to 09:30 Adrian Raftery Interlocking directorates in Irish companies using bipartite networks: a latent space approach Co-authors: Nial Friel (University College Dublin), Riccardo Rastelli (University College Dublin), Jason Wyse (Trinity College Dublin) We analyze the temporal bipartite network of the leading Irish com- panies and their directors from 2003 to 2013, encompassing the end of the Celtic Tiger boom and the ensuing nancial crisis in 2008. We focus on the evolution of company interlocks, whereby a company director simultaneously sits on two or more boards. We develop a novel statistical model for this dataset by embedding the positions of companies and directors in a latent space. A useful aspect of our modelling approach is that it accommodates the development of metrics which can be used to assess the extent of interlocking over time. SNAW05 26th July 2016 09:30 to 10:00 Pierre Latouche Goodness of fit of logistic models for random graphs We consider binary networks along with covariate information on the edges. In order to take these covariates into account, logistic-type models for random graphs are often considered. One of the main questions which arises in practice is to assess the goodness of fit of a model. To address this problem, we add a general term, related to the graphon function of W-graph models, to the logistic models. Such an extra term can be approximated from a blockwise constant function obtained using stochastic block models with increasing number of clusters. If the given network is fully explained by the covariates, then a sole block should be estimated from data. This framework allows to derive a testing procedure from a model based selection context. Bayes factors or posterior odds can then be used for decision making. Overall, the logistic model considered necessitates two types of variational approximations to derive the model selection approach. SNAW05 26th July 2016 10:00 to 10:30 Simon Lunagomez Network Models with Dynamic Vertex Sets Many models of dynamic network data assume that the set of nodes (social actors) remains constant over time. By contrast, we propose a framework that allows for models where both the set of nodes and the social ties connecting them change over time. We depart from the conventional setup where the distribution of the vertex set is a point mass, and instead model its evolution via a stochastic process. Our approach is modular, in the sense that joint distribution of the vertex sets over time can be modeled marginally and then the joint distribution of the edge sets can be specified, conditionally upon it. The conditional independence statements implied by our approach then mean that posterior sampling for the parameters corresponding to these factors (joint distribution of the vertex sets, joint distribution of the edge sets) can be performed separately. We illustrate our methodology via both simulation studies and data analysis. Joint work with Sofia Olhede and Patrick Wolfe. SNAW05 26th July 2016 11:30 to 12:00 Brendan Murphy Latent Space Stochastic Block Model for Social Networks A large number of statistical models have been proposed for social network analysis in recent years. In this paper, we propose a new model, the latent position stochastic block model, which extends and generalises both latent space model (Hoff et al., 2002) and stochastic block model (Nowicki and Snijders, 2001). The probability of an edge between two actors in a network depends on their respective class labels as well as latent positions in an unobserved latent space. The proposed model is capable of representing transitivity, clustering, as well as disassortative mixing. A Bayesian method with Markov chain Monte Carlo sampling is proposed for estimation of model parameters. Model selection is performed by directly estimating marginal likelihood for each model and models of different number of classes or dimensions of latent space can be compared. We apply the network model to one simulated network and two real social networks. SNAW05 26th July 2016 12:00 to 12:30 Konstantina Palla A Bayesian nonparametric model for sparse dynamic networks Co-authors: Francois Caron (Univ of OXford), Yee Whye Teh (Univ of Oxford) We propose a Bayesian nonparametric prior for time-varying networks. To each node of the network is associated a positive parameter, modeling the sociability of that nodes. Sociabilities are assumed to evolve over time, and are modeled via a dynamic point process model. The model is able to (a) capture smooth evolution of the interactions between nodes, allowing edges to appear/disappear over time (b) capture long term evolution of the sociabilities of the nodes (c) and yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying gamma process. We provide some theoretical insights into the model, describe a Hamiltonian Monte Carlo algorithm for efficieent exploration of the posterior distribution and present results on synthetic and real world dataset. SNAW05 26th July 2016 13:30 to 14:00 Tyler McCormick Multiresolution network models Social networks exhibit two key topological features: global sparsity and local density. That is, overall the propensity for interaction between any two randomly selected actors is infinitesimal, but for any given individual there is massive variability in the propensity to interact with others in the network. Further, the relevant scientific questions typically differ depending on the scale of analysis. In this talk, we propose a class of multiresolution statistical models that model network structures on multiple scales to enable inference about relevant population-level parameters. We capture global graph structure using a mixture over projective models that capture local graph structures. This approach is advantageous as it allows us to differentially invest modeling effort within subgraphs of high density, while maintaining a parsimonious structure between such subgraphs. We illustrate the utility of our method using simulation and data on household relations from Karnataka, India. This is joint work with Bailey Fosdick (CSU) and Ted Westling (UW). SNAW05 26th July 2016 14:00 to 14:30 Yee Whye Teh Bayesian Hierarchical Community Discovery Co-author: Charles Blundell (Google DeepMind) We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks. Our model is a tree-structured mixture of potentially exponentially many stochastic blockmodels. We describe a family of greedy agglomerative model selection algorithms whose worst case scales quadratically in the number of vertices of the network, but independent of the number of communities. Our algorithms are two orders of magnitude faster than the infinite relational model, achieving comparable or better accuracy. Related Links SNAW05 26th July 2016 14:30 to 15:00 Daniel Roy From Graphon to Graphex: Models and Estimators for Sparse Networks using Exchangeable Random Measures A major challenge for Bayesian network analysis is that we lack general nonparametric models of sparse graphs. To meet this challenge, we introduce and study the general class of random graphs defined by the exchangeability of their real-valued vertex labels, an idea inspired by a model due to Caron and Fox. A straightforward adaptation of a result by Kallenberg yields a representation theorem: every such random graph is characterized by three (potentially random) components: a nonnegative real I, an integrable function S : R+ to R+, and a symmetric measurable function W: R+^2 to [0,1] that satisfies several weak integrability conditions. We call the triple (I,S,W) a graphex, in analogy to graphons, which characterize the (dense) exchangeable graphs on the naturals. I will present some results about the structure and consistent estimation of these random graphs. This is joint work with Victor Veitch. SNAW05 27th July 2016 09:30 to 10:00 Gesine Reinert Estimating the number of communities in a network Community detection has been a main topic in the analysis of networks. While there exist a range of powerful and flexible methods for dividing a network into a specified number of communities, it is an open question how to determine exactly how many communities one should use. We answer this question based on a combination of methods from Bayesian analysis and statistical physics. We demonstrate the approach on a range of real-world examples with known community structure, finding that it is able to determine the number of communities correctly in every case. This is joint work with Mark Newman (University of Michigan) SNAW05 27th July 2016 10:00 to 10:30 Mike West Bayesian dynamic modelling of network flows I discuss Bayesian dynamic modelling for sequential analysis of network flow count data, linking two classes of models which allow fast, scalable and interpretable Bayesian inference. The first class involves sets of "decoupled" univariate state-space models for streaming count data, able to adaptively characterize and quantify network dynamics in real-time. These are then "recoupled" to define "emulation" of a second class of more structured, time-varying gravity models that allow closer and formal dissection of network dynamics and interactions among network nodes. Evolving internet flows on a defined network of web domains in e-commerce applications provide context, data and examples. Bayesian model monitoring theory defines a strategy for sequential model assessment and adaptation in cases of signaled departures of network flow data from model-based predictions. This work builds on the more general concepts and strategy of "decouple/recouple" for Bayesian model emulation. That is, we use decoupled, parallel and scalable analyses of a set of simpler and computationally efficient univariate models, then recouple- on a sound theoretical basis- to rebuild the larger multivariate process for more formal inferences. Co-authors: Xi Chen (Duke University), Kaoru Irie (University of Tokyo), David Banks (Duke University), Jewell Thomas (MaxPoint Interactive Inc.), and Rob Haslinger (The Sync Project). SNAW05 27th July 2016 11:30 to 12:00 Benjamin Bloem-Reddy Random walk models of networks: modeling and inferring complex dependence A signature of many network datasets is strong local dependence, a phenomenon that gives rise to frequently observed properties such as transitive triples, pendants, and structural heterogeneity. One difficulty in modeling such dependence is that the notion of locality may not be well-defined, and it is likely to be heterogeneous throughout the network. Furthermore, models that do not assume some form of conditional independence on the edges typically are intractable or too simplistic to serve as useful statistical models. We introduce a class of models, based on random walks, that allows the scale of dependence to vary; it is able to generate a range of network structures faithful to those observed in real data, and it admits tractable inference procedures. This is joint work with Peter Orbanz. SNAW05 27th July 2016 12:00 to 12:30 Diana Cai Edge-exchangeable graphs, sparsity, and power laws Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We characterize the class of edge exchangeable models with a paintbox construction, and we demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity and power laws. To do so, we outline a general framework for graph generative models; by contrast to the pioneering work of Caron and Fox (2014), models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure. Joint work with Trevor Campbell and Tamara Broderick. SNAW05 27th July 2016 14:00 to 14:30 Natalia Bochkina Selection of the Regularization Parameter in Graphical Models using Network Characteristics We study gene interaction networks using Gaussian graphical models that represent the underlying graph structure of conditional dependence between random variables determined by their partial correlation or precision matrix. In a high dimensional setting, the precision matrix is estimated using penalized likelihood by adding a penalization term which controls the amount of sparsity in the precision matrix and totally characterizes the complexity and structure of the graph. The most commonly used penalization term is the L1 norm of the precision matrix scaled by the regularization parameter which determines the tradeoff between sparsity of the graph and fit to the data. We propose several procedures to select the regularization parameter in the estimation of graphical models that focus on recovering reliably the appropriate network characteristic of the graph, and discuss their Bayesian interpretation. Performance is illustrated on simulated data as well as on colon tumor gene expression data. This work is extended to reconstructing a differential network between two paired groups of samples. This is joint work with with Adria Caballe Mestres (University of Edinburgh and Biomathematics and Statistics Scotland) and Claus Mayer (Biomathematics and Statistics Scotland). SNAW05 27th July 2016 14:30 to 15:00 Mikkel Schmidt Inference in the infinite relational model The infinite relational model and similar related non-parametric mixture models are very powerful for characterizing the structure in complex networks. But (approximate) inference can be challenging, especially for large networks, and both MCMC and variational inference is often hampered by local optima in the posterior distribution. These issues are discussed in the context of learning a high resolution parcellation of human brain structural connectivity network. SNAW05 27th July 2016 15:00 to 15:30 Sonia Petrone On asymptotic exchangeability and graph evolution Co-author: Sandra Fortini (Bocconi University, Milano) Reinforced processes are of interest in probability and in many related fields, and as preferential attachment schemes in graph modeling. In this talk, I will discuss some classes of reinforced processes with a random reinforcement. Randomly reinforced processes allow to extend the spectrum of application, to incorporate non stationary evolution, competition, selection and other forms of asymmetry. I will present some underlying theory and limit properties, including asymptotic exchangeability, and potential applications as weighted preferential attachment schemes in graph modeling. An interesting aspect is that, depending on the random weights' structure, a sub-region of the sample space may tend to dominate, possibly explaining sparsity phenomena in exchangeable graphs. SNA 28th July 2016 09:15 to 10:15 Alberto Caimo Bayesian ERGMs -- computational and modelling challenges Recent research in statistical social network analysis has demonstrated the advantages and effectiveness of Bayesian approaches to network data. In fact, Bayesian exponential random graph models (BERGMs) are becoming increasingly popular as techniques for modelling relational data in wide range of research areas. However, the applicability of these models in real-world settings is limited by computational complexity. In this seminar we review some of the most recent computational methods for estimating BERGMs as well as extended ERGM-based modelling frameworks for dynamic and heterogenous social networks. SNA 29th July 2016 11:00 to 12:00 James Marron Joint and Individual Variation Explained (JIVE) A major challenge in the age of Big Data is the integration of disparate data types into a data analysis. That is tackled here in the context of data blocks measured on a common set of experimental subjects. This data structure motivates the simultaneous exploration of the joint and individual variation within each data block. This is done here in a way that scales well to large data sets (with blocks of wildly disparate size), using principal angle analysis, careful formulation of the underlying linear algebra, and differing outputs depending on the analytical goals. Ideas are illustrated using mortality, cancer and neuroimaging data sets. This talk reveals several new challenges in network analysis, from a post-JIVE network analysis on the original data types, to an integration of network methods into the heart of the JIVE methodology. SNA 4th August 2016 14:00 to 15:00 Patrick Perry Fitting Hierarchical Models in Large-Scale Recommender Systems Early in the development of recommender systems, hierarchical models were recognized as a tool capable of combining content-based filtering (recommending based on item-specific attributes) with collaborative filtering (recommending based on preferences of similar users). However, as recently as the late 2000s, many authors deemed the computational costs required to fit hierarchical models to be prohibitively high for commercial-scale settings. This talk addresses the challenge of fitting a hierarchical model at commercial scale by proposing a moment-based procedure for estimating the parameters of a hierarchical model. This procedure has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency. It gives consistent parameter estimates, competitive prediction error performance, and substantial computational improvements. When applied to a large-scale recommender system application and compared to a standard maximum likelihood procedure, the method delivers competitive prediction performance while reducing computation time from hours to minutes. SNA 11th August 2016 14:00 to 15:00 Richard Gibbens Capacity bounds and robustness in multipath networks The recent developments of multipath data transport protocols allow end-systems to explore and share available resources within networks. Through dynamic load balancing over subflows these protocols ensure high levels of robustness to network failures and traffic overloads. In this talk we use fluid models to study the benefits that accrue when load is shared across subflows. We combine insights gained from the fluid models with a precise description of the capacity region for the network and show that our models of multipath protocols approach the boundary of the capacity region as the intensity of the offered traffic approaches a critical value. Additionally, we quantify the extent to which multipath protocols will make a network robust to unforeseen traffic mismatches and link failures. (Joint work with A. Bejan, R. Hancock and D. Towsley) SNA 17th August 2016 16:00 to 17:00 Peter Bickel Rothschild Lecture: From Small Data to Big Data and Back: Statistics and Data Science Modern statistics began with R.A.Fisher’s seminal work in the early twentieth century, with important predecessors such as Karl Pearson, contemporaries such as J.Neyman and successors such as A. Wald. The focus was on small data sets or summaries of larger ones. Analyses were based on simple models and given in terms of estimates, testing and confidence bounds. With the advent of big data, the size of datasets, their complexity and heterogeneity, and the lack of theory to build mechanistic probability models brought new issues to the fore. I will discuss some of these issues: 1. Computation 2. Prediction 3. Sparsity/dimension reduction4. Stability/robustness 5. Reduction to small data SNA 18th August 2016 14:00 to 15:00 Frank Granville Ball Epidemics on networks In this talk we consider two extensions of the standard stochastic epidemic SIR (Susceptible-Infected-Recovered) on a configuration model network. The first extension, which is joint work with Peter Neal (Lancaster University), incorporates casual contacts, i.e. with people chosen uniformly at random from the population. The second extension, which is joint work with Tom Britton (Stockholm University) and David Sirl (University of Nottingham), involves the spread of an epidemic on a random network model which allows for tunable clustering, degree correlation and degree distribution. For each model, approximating branching processes are used to obtain a threshold parameter, which determines whether or not an epidemic with few initial infectives can become established and lead to a major outbreak, and the (approximate) probability and relative final size of a major outbreak. For the model with casual contacts, an embedding argument is used to derive a central limit theorem for the size of a major epidemic; similar methods lead to the asymptotic variance of the giant component in a configuration model random graph. The theory is illustrated by numerical studies, which explore the impact of network properties on the outcome of an epidemic. SNA 22nd August 2016 13:30 to 14:30 Pieter Trapman Emerging epidemics on networks SNA 22nd August 2016 15:00 to 16:00 Carey Priebe Semiparametric spectral modeling of the Drosophila connectome SNA 23rd August 2016 10:00 to 11:00 Elizaveta Levina Interpretable Models for Prediction on Networks with Cohesion SNA 23rd August 2016 11:30 to 12:30 Yulia Gel The Role of Modern Social Media Data in Surveillance and Prediction of Infectious Diseases: from Time Series to Networks The prompt detection and forecasting of infectious diseases with rapid transmission and high virulence are critical in the effective defense against these diseases. Despite many promising approaches in modern surveillance methodology, the lack of observations for near real-time forecasting is still the key challenge obstructing operational prediction and control of disease dynamics. For instance, even CDC data for well monitored areas in USA are two weeks behind, as it takes time to confirm influenza like illness (ILI) as flu, while two weeks is a substantial time in terms of flu transmission. These limitations have ignited the recent interest in searching for alternative near real-time data sources on the current epidemic state and, in particular, in the wealth of health-related information offered by modern social media. For example, Google Flu Trends used flu-related searches to predict a future epidemiological state at a local level, and more recently, Twitter and Wikipedia have also proven to be a very valuable resource for a wide spectrum of public health applications. In this talk we will review capabilities and limitations of such social media data as early warning indicators of influenza dynamics in conjunction with traditional time series epidemiological models and with more recent random network approaches accounting for heterogeneous social interaction patterns. SNA 24th August 2016 10:00 to 11:00 Peter Bickel “Network modeling of topological domains using Hi-C data” Genome-wide chromosome conformation capture techniques such as Hi-C enable the generation of 3D genome contact maps and offer new pathways toward understanding the spatial organization of genome. It is widely recognized that chromosomes form domains of enriched interactions playing significant roles in gene regulation and development. In particular, it is of interest to identify densely interacting, contiguous regions at the sub-megabase scale known as topologically associating domains (TADs), which are believed to be conserved across cell types and even species. Although a few algorithms have been proposed to detect TADs, developing statistical frameworks capable of incorporating the hierarchical nature of TADs and known biological covariates remains a nascent field. We develop a network model that explicitly makes use of cell-type specific CTCF binding sites to detect multiscale domains. Our model leads to a likelihood objective that can be efficiently optimized via relaxation. We demonstrate the domains identified have desirable epigenetic features and compare them across different cell types. SNA 24th August 2016 11:30 to 12:30 Ernst C. Wit Network inference in genomics The whole concept of network inference in genomics has multiple meanings and interpretations. It can refer to “causal” or “topological” considerations, i.e., learning about functional relationships in the genomic system or to considerations about the structure of the overall genomic network. Moreover, the genomic network does not really exists and can refer to gene regulatory networks, cell signalling networks, metabolic networks etc. In this talk I aim to clarify a bit of the underlying genomics in order to motivate a hierarchy of 4 network inference strategies, starting at the single cell level and finishing at global structural network inference. It will involve stochastic and ordinary differential equation models, causal inference and graphical modelling. SNAW02 25th August 2016 10:00 to 10:40 Yi Yu Estimating whole brain dynamics using spectral clustering The estimation of time-varying networks for functional Magnetic Resonance Imaging (fMRI) data sets is of increasing importance and interest. In this work, we formulate the problem in a high-dimensional time series framework and introduce a data-driven method, namely Network Change Points Detection (NCPD), which detects change points in the network structure of a multivariate time series, with each component of the time series represented by a node in the network. NCPD is applied to various simulated data and a resting-state fMRI data set. This new methodology also allows us to identify common functional states within and across subjects. Finally, NCPD promises to offer a deep insight into the large-scale characterisations and dynamics of the brain. This is joint work with Ivor Cribben (Alberta School of Business). SNAW02 25th August 2016 10:40 to 11:00 Pariya Behrouzi Detecting Epistatic Selection in the Genome of RILs via a latent Gaussian Copula Graphical Model Recombinant Inbred Lines (RILs) derived from divergent parental lines can display extensive segregation distortion and long-range linkage disequilibrium (LD) between distant loci on same or different chromosomes. These genomic signatures are consistent with epistatic selection having acted on entire networks of interacting parental alleles during inbreeding. The reconstruction of these interaction networks from observations of pair-wise marker-marker correlations or pair-wise genotype frequency distortions is challenging as multiple testing approaches are under-powered and true long-range LD is difficult to distinguish from drift, particularly in small RIL panels. Here we develop an efficient method for reconstructing an underlying network of genomic signatures of high-dimensional epistatic selection from multi-locus genotype data. The network captures the conditionally dependent short- and long-range LD structure of RIL genomes and thus reveals "aberrant" marker-marker associations that are due to epistatic selection rather than gametic linkage. The network estimation relies on penalized Gaussian copula graphical models, which accounts for large number of markers p and small number of individuals n. A multi-core implementation of our algorithm makes it feasible to estimate the graph in high-dimensions (max markers ~ 3000). We demonstrate the efficiency of the proposed method on simulated datasets as well as on genotyping data in A.thaliana and Maize. SNAW02 25th August 2016 11:30 to 12:10 Ginestra Bianconi Multiplex networks Multiplex networks describe interacting systems where the same set of nodes are linked by different type of interactions. Multiplex networks include social networks, infrastructures and biological systems. Characterizing and modelling the structure of multiplex networks is fundamental for solving network inference problems. Here we will discuss recent results showing that multiplex networks encode more information than their single layers taken in isolation. In fact they are characterized by strongly correlated structures that reveal important statistical properties of the complex system that they describe. SNAW02 25th August 2016 12:10 to 12:30 Mirko Signorelli Modelling community structure in the Italian Parliament: a penalized inference approach In many parliamentary systems, bills can be proposed by a single parliamentarian, or cosponsored by a group of parliamentarians. In the latter case, bill cosponsorship defines a symmetric relation that can be taken as a measure of ideological agreement between parliamentarians. Political scientists have often analysed bill cosponsorship networks in the US Congress, assessing its community structure and the behaviour of minorities therein. In this talk, I will consider data on bill cosponsorship in the Italian Chamber of Deputies over the last 15 years. If compared to the US Congress, a distinguishing feature of the Italian Chamber is the presence of a large number of political groups: the primary purpose of the analysis is thus to infer the pattern of collaborations between these groups. We consider a stochastic blockmodel for edge-valued graphs that views bill cosponsorship as the result of a Poisson process, which explicitly depends on membership of parliamentary groups. As the number of model parameters increases quickly with the number of groups, we pursue a penalized likelihood approach to model estimation that enables us to infer a sparse reduced graph, which summarizes relations between parliamentary groups. Besides showing the effects of gender and geographic proximity on bill cosponsorship, the analysis points out the evolution from a highly polarized political arena, in which Deputies base collaborations on their identification with left or right-wing values, towards an increasingly fragmented Parliament, where a rigid separation of political groups into coalitions does not seem to hold any more, and collaborations beyond the perimeter of coalitions become possible. Joint work with Ernst Wit. Related links: https://arxiv.org/abs/1607.08743 (arXiv preprint) SNAW02 25th August 2016 13:30 to 14:10 Matteo Barigozzi Networks, Dynamic Factors, and the Volatility Analysis of High-Dimensional Financial Series Co-author: Marc Hallin (ECARES-ULB ) We consider weighted directed networks for analysing large panels of financial volatilities.For a given horizon$h$, the weight associated with edge$(i,j)$represents the$h$-step-ahead forecast error variance of variable$i$accounted for by variable$j$innovations. To challenge the curse of dimensionality, we decompose the panel into a factor (market) driven component and an idiosyncratic one modelled by means of a sparse VAR. Inversion of the VAR together with suitable identification restrictions, produce the estimated network, bymeans of which we can assess how systemic each firm is. An analysis of the U.S. stock market demonstrates the prominent role of Financial firms as source of contagion during the 2007-2008 crisis. SNAW02 25th August 2016 14:10 to 14:50 Daniele Durante Bayesian modeling of networks in complex business intelligence problems Co-authors: Sally Paganin (University of Padova, Dept. of Statistical Sciences), Bruno Scarpa (University of Padova, Dept. of Statistical Sciences), David B. Dunson (Duke University, Dept. of Statistical Science) Complex network data problems are increasingly common in many fields of application. Our motivation is drawn from strategic marketing studies monitoring customer choices of specific products, along with co-subscription networks encoding multiple purchasing behavior. Data are available for several agencies within the same insurance company, and our goal is to efficiently exploit co-subscription networks to inform targeted advertising of cross-sell strategies to currently mono-product customers. We address this goal by developing a Bayesian hierarchical model, which clusters agencies according to common mono-product customer choices and co-subscription networks. Within each cluster, we efficiently model customer behavior via a cluster-dependent mixture of latent eigenmodels. This formulation provides key information on mono-product customer choices and multiple purchasing behavior within each cluster, informing targeted cross-sell strategies. We develop simple algorithms for tractable inference, and assess performance in simulations and an application to business intelligence Related Links SNAW02 25th August 2016 14:50 to 15:30 Luca De Benedictis Implementing Propensity Score Matching with Network Data: The effect of GATT on bilateral trade Co-authors: Bruno Arpino, Alessandra Mattei Motivated by the evaluation of the causal effect of the General Agreement on Tariffs and Trade on bilateral international trade flows, we investigate the role of network structure in propensity score matching under the assumption of strong ignorability. We study the sensitivity of causal inference with respect to the presence of characteristics of the network in the set of confounders conditional on which strong ignorability is assumed to hold. We find that estimates of the average causal effect are highly sensitive to the presence of node-level network statistics in the set of confounders. Therefore, we argue that estimates may suffer from omitted variable bias when the relational dimension of units is ignored, at least in our application. SNAW02 25th August 2016 16:00 to 16:20 Nynke Niezink Modeling the dynamics of social networks and continuous actor attributes Co-authors: Tom Snijders (University of Groningen) Social networks and the characteristics of the actors who constitute these networks are not static; they evolve interdependently over time. People may befriend others with similar political opinions or change their own opinion based on that of their friends. The stochastic actor-oriented model is used to statistically model such dynamics. We will present an extension of this model for continuous dynamic actor characteristics. The method available until now assumed actor characteristics to be measured on an ordinal categorical scale, which yielded practical limitations for applied researchers. We now model the interdependent dynamics by a stochastic differential equation for the attribute evolution and a Markov chain model for the network evolution. Although the model is too complicated to calculate likelihoods or estimators in closed form, the stochastic evolution process can be easily simulated. Therefore, we estimate model parameters using the method of moments and the Robbins-Monro algorithm for stochastic approximation. We will illustrate the proposed method by a study of the relation between friendship and obesity, analyzing body mass index as continuous dynamic actor attribute. SNAW02 25th August 2016 16:20 to 17:00 Katherine McLaughlin Analysis of Networks with Missing Data with Application to the National Longitudinal Study of Adolescent Health Co-authors: Krista J. Gile (University of Massachusetts at Amherst), Mark S. Handcock (University of California, Los Angeles) It is common in the analysis of social network data to assume that it represents a census of the networked population of interest. Often the data result from sampling of the networked population via a known mechanism. However, most social network analysis ignores the problem of missing data by including only actors with complete observations. In this talk we address the modeling of networks with missing data, developing previous ideas in missing data, network modeling, and network sampling. We show the value of the mean value parametrization to study differences between modeling approaches. We also develop goodness-of-fit techniques to better understand model fit. The ideas are motivated by an analysis of a friendship network from the National Longitudinal Study of Adolescent Health. The work presented is by Krista J. Gile and Mark S. Handcock. SNAW02 26th August 2016 09:00 to 09:40 Alberto Roverato The Networked Partial Correlation and its Application to the Analysis of Genetic Interactions Genetic interactions confer robustness on cells in response to genetic perturbations. This often occurs through molecular buffering mechanisms that can be predicted using, among other features, the degree of coexpression between genes, commonly estimated through marginal measures of association such as Pearson or Spearman correlation coefficients. However, marginal correlations are sensitive to indirect effects and often partial correlations are used instead. Yet, partial correlations convey no information about the (linear) influence of the coexpressed genes on the entire multivariate system, which may be crucial to discriminate functional associations from genetic interactions. To address these two shortcomings, here we propose to use the edge weight derived from the covariance decomposition over the paths of the associated gene network. We call this new quantity the networked partial correlation and use it to analyze genetic interactions in yeast. More concretely, in its well-characterized leucine biosynthesis pathway and on a previously published data set of genome-wide quantitative genetic interaction profiles. In both cases, networked partial correlations substantially improve the identification of genetic interactions over classical coexpression measures. SNAW02 26th August 2016 09:40 to 10:20 Reza Mohammadi Bayesian modelling of Dupuytren disease using Gaussian copula graphical models Co-authors: Fentaw Abegaz (University of Liege, Belgium), Edwin van den Heuvel (Eindhoven University of Technology, The Netherlands), Ernst Wit (University of Groningen, The Netherlands) Dupuytren disease is a fibroproliferative disorder with unknown etiology that often progresses and eventually can cause permanent contractures of the affected fingers. In this talk, we provide a computationally efficient Bayesian framework to discover potential risk factors and investigate which fingers are jointly affected. Our Bayesian approach is based on Gaussian copula graphical models, which provide a way to discover the underlying conditional independence structure of variables in multivariate mixed data. In particular, we combine the semiparametric Gaussian copula with extended rank likelihood to analyse multivariate mixed data with arbitrary marginal distributions. For the structural learning, we construct a computationally efficient search algorithm using a trans-dimensional MCMC algorithm based on a birth-death process. In addition, to make our statistical method easily accessible to other researchers, we have implemented our method in C++ and provide an interface with R software as an R package BDgraph, which is freely available online. SNAW02 26th August 2016 10:20 to 10:40 Silvia Ioana Fierascu Applying network science to political problems. A conceptual and analytical framework for understanding and predicting corruption risks in business-political networks This short talk is a summary of my in-progress PhD dissertation, “The Network Phenomenon of State Capture. Network Dynamics, Unintended Consequences, and Political-Business Elite Relations in Hungary.” The thesis comes as a critique to the typical conceptual and methodological approaches in political science to studying institutionalized grand corruption. It proposes a novel conceptual and analytical framework, rooted in network theory and using network scientific research designs and methods, to better understand the complex case of a successful post-communist democracy turning hybrid regime - Hungary. To this end, I analyze the formation, evolution, and development of different types of business-political elite and organizational networks, and their effects on the quality of the state and the market, from the beginning of the democratic regime until today’s illiberal democracy. Using two large empirical datasets - interlocking directorates (business-political elite and organizational networks, 1990-2010) and corruption risks in issuer-winner public procurement networks (2009-2012), I model these multi-mode, dynamic, and projected networks using statistical methods for network data (e.g., comparisons to random models, motif detection, ergms) and machine learning algorithms (e.g., regression trees, random forests). In this presentation, I will showcase some of the main findings and future research plans. The study is part of a broader research agenda - building a robust conceptual and (network) analytical framework for a large cross-country analysis of corruption risks in public procurement, with data-driven and evidence-based policy recommendations. I will end the talk with highlighting some of the successes, challenges, and questions I have encountered in applying network science to better understand political problems. SNAW02 26th August 2016 10:40 to 11:00 Gwenael Leday Incorporating biological information into network inference using structured shrinkage High-throughput biotechnologies such as microarrays provide the opportunity to study theinterplay between molecular entities, which is central to the understanding of disease biology.The statistical description and analysis of this interplay is naturally carried out with Gaussiangraphical models in which nodes represent molecular variables and edges between them representinteractions. Inferring the edge set is, however, a challenging task as the number of parametersto estimate easily is much larger than the sample size. A conventional remedy is to regularize orpenalize the model likelihood. In network models, this is often done locally in the neighbourhoodof each node. However, estimation of the many regularization parameters is often dicult andcan result in large statistical uncertainties. We show how to combine local regularization withglobal shrinkage of the regularization parameters, via empirical Bayes (EB), to borrow strengthbetween nodes and improve inference. Furthermore, we show how one can use EB so the level ofregularization may dier across an arbitrary number of predened groups of interactions. Suchauxiliary information is often available in Biology. It is shown that accurate prior information cangreatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. SNAW02 26th August 2016 11:30 to 12:10 Juliane Manitz Source Estimation for Propagation Processes on Complex Networks with an Application to Delays in Public Transportation Systems Co-authors: Jonas Harbering (University of Göttingen), Marie Schmidt (Erasmus University Rotterdam), Thomas Kneib (University of Göttingen), Anita Schöbel (University of Göttingen) The correct identification of the source of a propagation process is crucial in many research fields. As a specific application, we onsider source estimation of delays in public transportation networks. We propose two approaches: a effective distance median and a backtracking method. The former is based on the structurally generic effective distance-based approach for the identification of infectious disease origins, and the latter is specifically designed for delay propagation. We examine the performance of both methods in simulation studies and in an application to the German railway system, and compare the results to a centrality-based approach for source detection. SNAW02 26th August 2016 12:10 to 12:30 Carsten Chong Contagion in Financial Systems: A Bayesian Network Approach We conduct a probabilistic analysis for a structural default model of interconnected financial institutions. For all possible network structures we characterize the joint default distribution of the system using Bayesian network methodologies. Particular emphasis is given to the treatment and consequences of cyclic financial linkages. We further demonstrate how Bayesian network theory can be applied to detect contagion channels within the financial network, to measure the systemic importance of selected entities on others, and to compute conditional or unconditional probabilities of default for single or multiple institutions. SNAW02 26th August 2016 13:30 to 14:10 Ben Parker Optimal Design of Experiments on Connected Units with Application to Social Networks Co-authors: Steven G. Gilmour and John Schormans When experiments are performed on social networks, it is difficult to justify the usual assumption of treatment-unit additivity, due to the connections between actors in the network. We investigate how connections between experimental units affect the design of experiments on those experimental units. Specifically, where we have unstructured treatments, whose effects propagate according to a linear network effects model which we introduce, we show that optimal designs are no longer necessarily balanced; we further demonstrate how experiments which do not take a network effect into account can lead to much higher variance than necessary and/or a large bias. We show the use of this methodology in a very wide range of experiments in agricultural trials, and crossover trials, as well as experiments on connected individuals in a social network. SNAW02 26th August 2016 14:10 to 14:50 Eric Laber On-line estimation of an optimal treatment allocation strategy for the control of white-nose syndrome in ba Co-authors: Nick J. Meyer (North Carolina State University), Brian J. Reich (North Carolina State University), Krishna Pacifici (North Carolina State University), John Drake (University of Georgia), Jaime Collazo (North Carolina State University) Emerging infectious diseases are responsible for high morbidity and mortality, economic damages to affected countries, and are a major vulnerability for global stability. Technological advances have made it possible to collect, curate, and access large amounts of data on the progression of an infectious disease. We derive a framework for using this data in real-time to inform disease management. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations for treatment. An optimal allocation strategy optimizes some cumulative outcome, e.g., the number of uninfected locations, the geographic footprint of the disease, or the cost of the epidemic. Estimation of an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference among locations, the number of possible allocations is exponential in the number of locations, and because disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian online estimator of the optimal allocation strategy that combines simulation-optimization with Thompson sampling. The proposed estimator performs favorably in simulation experiments. This work is motivated by and illustrated using data on the spread of white-nose syndrome a highly fatal infectious disease devastating bat populations in North America. Related Links SNAW02 26th August 2016 14:50 to 15:10 Steffen Lauritzen What COSTNET can do for and with you! Statistical Network Science is one of the hot topics of this moment, as evidenced by this Isaac Newton Programme. Network phenomena are widespread and a multitude of network data formats appear in a variety of applications. Often the applied scientist has only a limited awareness of the available modelling and inference techniques available. We will describe infectious disease phenomena, social network models and genetic regulatory networks COSTNET is a European Cooperation in Science and Technology funded initiative to bring together quantitative network modellers in Europe and beyond to collaborate between 2016 - 2020 on joint, capacity building projects to push the boundary of statistical network science. As part of the MC committee of this COST Action, I would like to invite interested parties to join and be part of this initiative. SNA 1st September 2016 14:00 to 15:00 Susan Holmes Microbial Community Networks in the Human Microbiome The study of the human microbiome involves the integration and estimation of trees and networks from 16S rRNA read counts. I will show several examples of double embedding where we use trees and networks and covariates of interest to establish associations between microbial communities and preterm birth, resilience to antibiotic treatment and batch effects. We enrich the standard ecological toolbox of ordination methods using generalized singular value decomposition with weights built from graphs and test the effect of covariates using generalizations of the Friedman-Rafsky tests. This talk contains joint work with Elizabeth Purdom, Joey McMurdie, Julia Fukuyama, Kris Sankaran and Lan Nguyen. SNA 8th September 2016 14:00 to 15:00 Malwina Luczak Extinction time for the weaker of two competing SIS epidemics We consider a simple stochastic model for the spread of a disease caused by two virus strains in a closed homogeneously mixing population of size N. In our model, the spread of each strain is described by the stochastic logistic SIS epidemic process in the absence of the other strain, and we assume that there is perfect cross-immunity between the two virus strains, that is, individuals infected by one strain are temporarily immune to re-infections and infections by the other strain. For the case where one strain has a strictly larger basic reproductive ratio than the other, and the stronger strain on its own is supercritical (that is, its basic reproductive ratio is larger than 1), we derive precise asymptotic results for the distribution of the time when the weaker strain disappears from the population, that is, its extinction time. We further consider what happens when the difference between the two reproductive ratios may tend to 0. In our proof, we set out an approach for establishing a long-term fluid limit approximation for a sequence of Markov chains in the vicinity of a stable fixed point of the limit drift equations. This is joint work with Fabio Lopes. SNA 15th September 2016 14:00 to 15:00 Mark Newman Inference and large-scale structure in networks Characterization of network structure as focused on two different scales: small-scale structure, represented by properties such as degrees, correlations, and clustering, and large-scale structure, which is most commonly presented in terms of modules and community detection. This talk will focus on large-scale structure, but with the aim of getting away from community structure, which is well-trodden ground, and looking at other forms. Working with generative models and a range of model-based inference techniques, I'll talk about overlapping communities, hierarchical structure, latent-space structure, ranking, and core-periphery structure, among others. SNA 29th September 2016 11:00 to 12:00 Lutz Warnke The phase transition in the random d-process One of the most interesting features of Erdös-Rényi random graphs is the percolation phase transition', where the global structure intuitively changes from only small components to a single giant component plus small ones. In this talk we discuss the percolation phase transition in the random d-process, which corresponds to a natural algorithmic model for generating random regular graphs that differs from the usual configuration model (starting with an empty graph on n vertices, the random d-process evolves by sequentially adding new random edges so that the maximum degree remains at most d). Our results on the phase transition solve a problem of Wormald from 1997, and verify a conjecture of Balinska and Quintas from 1990. Based on joint work with Nick Wormald. SNA 6th October 2016 14:00 to 15:00 Carey Priebe Limit theorems for eigenvectors of the normalized Laplacian for random graphs We prove a central limit theorem for the components of the eigenvectors corresponding to the$d\$ largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product graph. As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacia converge to multivariate normals and furthermore the mean and the covariance matrix of each row are functions of the associated vertex's block membership. Together with prior results for the eigenvectors of the adjacency matrix, we then compare, via the Chernoff information between multivariate normal distributions, how the choice of embedding method impacts subsequent inference. We demonstrate that neither embedding method dominates with respect to the inference task of recovering the latent block assignments.
SNA 13th October 2016
11:30 to 12:30
Svante Janson Component sizes in random graphs with given vertex degrees (the configuration model)
We consider uniform random graphs with given vertex degrees; these are conveniently generated by the configuration model. We discuss some results on the size of the components, in particular the largest one, from the fundamental results by Molloy and Reed (1995, 1998) on the phase transition to recent work on the barely supercritical case (joint work with Malwina Luczak and Remco van der Hofstad; still in progress).

SNA 20th October 2016
14:00 to 15:00
Peter Orbanz The role of invariance in learning from random graphs and structured data
Graphon models can be derived from the concept of exchangeability, which has long played an important role in (Bayesian) statistics. Exchangeability is, in turn, a special case of probabilistic invariance, or symmetry. This talk will be an attempt to explain, in as non-technical a manner as possible, why and how invariance is useful in statistics. I will cover some general results, discuss how different notions of exchangeability fit into the picture, and how invariance can be regarded as a consequence of assumptions on the process by which the data was sampled. All of this ultimately concerns the problem: What can we learn about an infinite random structure if only a finite sample from a single realization is observed?

SNA 27th October 2016
14:00 to 15:00
H. Vincent Poor Some Network Analysis Problems Motivated by the Smart Grid
Smart grid involves the imposition of an advanced cyber layer atop the physical layer of the electricity grid, in order to improve the efficiency, security and cost of electricity use and distribution, and to allow for greater decentralization of power generation and management.  This cyber-physical setting motivates a number of problems in network analysis, and this talk will briefly describe several of these problems together with approaches to solving them. These include competitive privacy in which multiple grid entities seek an optimal trade-off between privacy lost and utility gained from information sharing; distributed inference in which both the cyber and physical network topologies have roles to play in achieving consensus; real-time topology identification which helps in the mitigation of cascading failures; and attack construction which seeks an understanding of optimal strategies for attacking the grid in support of the design of effective countermeasures.

SNA 3rd November 2016
14:00 to 15:00
Ayalvadi Ganesh Steiner trees in the stochastic mean-field model of distance
Consider the complete graph on n nodes with iid exponential weights of unit mean on the edges. A number of properties of this model have been investigated including first passage percolation, diameter, minimum spanning tree, etc. In particular, Janson showed that the typical distance between two nodes scales as (log n)/n, whereas the diameter (maximum distance between any two nodes) scales as 3(log n)/n. Bollobas et al. showed that, for any fixed k, the weight of the Steiner tree connecting k typical nodes scales as (k-1)log n/n, which recovers Janson's result for k=2. We extend this result to show that the worst case k-Steiner tree, over all choices of k nodes, has weight scaling as (2k-1)log n/n.   Further, Janson derived the limiting distribution of the typical distance between two nodes. We refine the result of Bollobas et al. and present a perhaps surprising result in this direction for the typical Steiner tree which has implications for the limiting shape of the 3-Steiner tree.   This is joint work with Angus Davidson and Balint Toth.
SNA 9th November 2016
14:00 to 15:00
Vince Lyzinski Information Recovery in Shuffled Graphs via Graph Matching
In a number of methodologies for joint inference across graphs, it is assumed that an explicit vertex correspondence is a priori known across the vertex sets of the graphs. While this assumption is often reasonable, in practice these correspondences may be unobserved and/or errorfully observed, and graph matching—aligning a pair of graphs to minimize their edge disagreements—is used to align the graphs before performing subsequent inference. Herein, we explore the duality between the loss of mutual information due to an errorfully observed vertex correspondence and the ability of graph matching algorithms to recover the true correspondence across graphs. We then demonstrate the practical effect that graph shuffling—and matching—can have on subsequent inference, with examples from two sample graph hypothesis testing and joint graph clustering.
SNA 10th November 2016
10:00 to 11:00
Marta Zlatic Circuits principles of memory-based behavioral choice
Choosing which behavior to generate based on sensory inputs and previous experience is crucial for survival. To understand the circuit principles by which experience-driven behavioral choices are made it is essential to determine the architecture of networks that mediate these functions, and determine the causal relationships between the structural motifs and function. We combine three levels of analysis: i) circuit mapping with synaptic resolution; ii) physiological measurements of neural activity and iii) neural manipulation in freely behaving animals to dissect the logic of memory-based behavioral choice in Drosophila larva. In an EM volume that spans the entire nervous system we reconstructed a complete wiring diagram of the higher order parallel fiber system for associative learning, the Mushroom Body (MB), including the pathways from the conditioned (CS) and unconditioned sensory (US) neurons to the MB, and the patterns of interactions of MB output neurons with circuits that mediate innate responses to CS and US. Using calcium imaging and optogenetic manipulation of individual MB input and output neurons we elucidated the logic of punishment and reward encoding by the ensemble of dopaminergic MB input neurons and the logic by which the MB interacts with pathways for innate responses to olfactory stimuli in the larva brain, the Lateral Horn (LH). I will discuss key findings from these studies.

SNA 10th November 2016
11:00 to 12:00
Albert Cardona The synapse-level wiring diagram of a center for learning and memory, the insect mushroom body
Associating stimuli with positive or negative outcomes is essential for survival, but a complete wiring diagram of a higher-order circuit supporting associative memory has not been previously available. In this talk, I will discuss our findings--in collaboration with Marta Zlatic, Andreas Thum and Larry F. Abbott--on circuits for associative memory based on the full reconstruction at synaptic resolution of one such circuit, the insect mushroom body (MB), as mapped in the larva of Drosophila. Our results provide single-animal cross-hemispheric evidence that Kenyon cells (KCs)--the parallel fibers--integrate random combinations of stimuli, but we also found a set of KCs that are wired in an ordered way to relay signals from single projection neurons (PNs).   We show that combining these two kinds of KCs maximizes the dimension of the odor representation. Memories are formed when co-activation of KCs and modulatory neurons (mostly dopaminergic, DAN) alters the strength of KC synapses onto MB output neurons (MBONs) within discrete MB regions known as compartments.  We found a novel canonical circuit in these compartments with previously unpredicted connections: a reciprocal KC-to-DAN connection and a DAN-to-MBON connection, suggesting multiple potential sites of synaptic plasticity.  Counting of individual synaptic connections allowed us to assess the relative contributions of olfactory, non-olfactory, modulatory and extra-MB signals onto each MBON, providing evidence that MBONs contextualize associative memories with a variety of other information. We also found a stereotyped MBON-MBON circuit with reciprocal inhibition among MB compartments of opposing valences suggestive of competitive interactions that could enhance the selection of learned response. Finally, feedback connections from MBONs to neuromodulatory neurons provide a possible mechanism for existing memories to affect the formation of new ones. The complete circuit map of the MB should guide future functional studies of this center of learning and memory.

SNA 10th November 2016
14:00 to 15:00
Charlotte Deane Evaluating modules in molecular networks in light of annotation bias
Many complex systems can be represented as networks and the detection of novel functional modules in such networks has become an important step in many research areas.  In this talk I will describe a method for evaluating potential modules that overcomes annotation biases that often occur in networks. I will demonstrate its utility in the area of biological networks. In biological networks, in the absence of gold standard functional modules, functional annotations are often used to verify whether detected modules/communities have biological meaning. However, as I will show, the uneven distribution of functional annotations means that such evaluation methods favour communities of well-studied proteins. We propose a novel framework for the evaluation of communities as functional modules.  Our proposed framework, CommWalker, takes communities as inputs and evaluates them  in  their  local  network  environment  by  performing  short  random  walks.   We test CommWalker's ability to overcome annotation bias using input communities from four community detection methods on two protein interaction networks.  We find that modules accepted by CommWalker are similarly co-expressed as those accepted by current methods.  Crucially, CommWalker performs well not only in well-annotated regions, but also in regions otherwise obscured by poor annotation.   CommWalker community prioritization both faithfully captures well-validated communities, and identifies functional modules that may correspond to more novel biology. If there is time at the end of the seminar I will also discuss a method for network comparison that we have recently developed that is aimed at identifying common organizational principles in networks. The methodology is simple, intuitive and is applicable in a wide variety of settings ranging from the functional classiﬁcation of proteins to tracking the evolution of the world trade network.

SNA 17th November 2016
14:00 to 15:00
Patrick Flandrin Wavelets in the making — A publication network perspective
Publication networks are useful scientometric indicators for « drawing » a scientific landscape by unveiling individual and/or thematic interactions. From a historical as well as sociological perspective, an important issue is to capture the dynamics of the corresponding graphs, this being especially true when we face the emergence of a new paradigm that aggregates, around new concepts, activities that were previously scattered in different fields. In exact sciences, a typical example of such an emergence is provided by « wavelets », a mathematical object that generalizes Fourier analysis and has led to numerous scientific and technological applications since their introduction in the mid-80’s. Basing the analysis on ideas of heterogeneous networks, communities detection and flow diagrams, we will present some results in this direction, with the two-fold objective of enlightening the origins of what has become a standard tool and a well-recognized keyword, and of evidencing the new organizations of the scientific landscape that have followed.
Joint work with Pablo Jensen and Matteo Morini.

SNA 24th November 2016
14:00 to 15:00
Mohamed-Ali Belabbas The geometry of optimal experiment design for vector-valued Ornstein-Uhlenbeck processes.
Ornstein-Uhlenbeck processes are commonly used as models in engineering, biology and finance. Consider the estimation of the state of such processes from linear, noisy measurements; the Kalman filter is known to be the minimum mean square error estimator when the measurement noise is Gaussian. We address here how to design the measurements that minimize the error afforded by the Kalman filter. This problem of optimal experiment design, which is almost as old as the Kalman filter itself,  is however not convex. As a consequence, many ad hoc methods have been used over the years to solve it. We show in this talk how a geometric approach allows us to characterize and obtain the optimal designs exactly. This optimal design yields the lowest possible estimation error from linear measurements with a fixed signal to noise ratio.

SNA 1st December 2016
14:00 to 15:00
Simon Lunagomez Valid inference from non-ignorable network sampling mechanisms
Consider a population of individuals and a  network that encodes social connections among them.   We are interested in making inference on super-population estimands that are a function of both individuals' responses and of the network, from a sample. Neither the sampling frame nor the network are available. However, the sampling mechanism implicitly leverages the network to recruit individuals, thus partially revealing social interactions among the individuals in the sample, as well as their responses.   This is a common setting that arises, for instance, in epidemiology and healthcare, where samples from hard-to-reach populations are collected using link-tracing mechanisms, including respondent-driven sampling. Contrary to random sampling, the probability models of these network sampling mechanisms carry information about the estimands of interest, such as the incidence of certain diseases in the target population.   In this work, we study statistical properties of popular network sampling mechanisms. We formulate the estimation problem in terms of Rubin's inferential framework to explicitly   account for social network structure. We then identify key modeling elements that lead to inferences with good frequentist properties when dealing with data collected through non-ignorable network sampling mechanisms.   We demonstrate these methods on a study of the  incidence of HIV in Brazil.   Joint work with Edoardo Airoldi

SNAW04 12th December 2016
09:30 to 10:30
Tom Snijders Continuous-time statistical models for network panel data
For the statistical analysis of network panel data even with as little as 2 waves, it is very fruitful to use models that assume a continuous-time Markov network process, observed only at the moments of observation for the panel. This is analogous to the use of continuous-time models for classical (non-network) panel data proposed by Bergstrom, Singer, and others. For network data such an approach was proposed already by Coleman in 1964. The advantage of this approach is that it provides a simple way to represent the feedback that is inherent in network dynamics, and the model can be defined by just specifying the conditional probability of a tie change, given the current state of the network.

This approach is used in the Stochastic Actor-Oriented Model of Snijders (2001) and in the Longitudinal Exponential Random Graph Model of Snijders & Koskinen (2013). The first of these is actor-oriented, i.e., tie changes are modelled as choices by actors, which among their outgoing tie variables to toggle; the second is tie-oriented, i.e., tie changes are modelled as toggles of single tie variables. Both are generalized linear models for the (unobserved) continuous-time process, with all the practical modelling flexibility of such models. Estimation for panel data is more involved, requiring a simulation approach. Estimators have been developed along several lines, including Method of Moments, Generalized Method of Moments, Maximum Likelihood, and Bayesian, and are available in the R package RSiena. This package is widely applied in empirical social network studies in the social sciences.

This presentation treats the basic definition of the model and some of its extensions, e.g., co-evolution of multivariate networks. Some open problems, from a mathematical and from an applied perspective, will be mentioned.

References
• Ruth M. Ripley, Tom A.B. Snijders, Zsófia Boda, András Vörös, and Paulina Preciado, 2016. Manual for SIENA version 4.0. Oxford: University of Oxford, Department of Statistics; Nuffield College. http://www.stats.ox.ac.uk/siena/
• Tom A.B. Snijders, 2001. The statistical evaluation of social network dynamics. Sociological Methodology, 31, 361-395.
• Tom A.B. Snijders and Johan Koskinen, 2013. "Longitudinal Models". Chapter 11 (pp. 130-140) in D. Lusher, J. Koskinen, and G. Robins, Exponential Random Graph Models for Social Networks, Cambridge: Cambridge University Press.
• Tom A.B. Snijders, Gerhard G. van de Bunt, G. G., and Christian E.G. Steglich, 2010. Introduction to actor-based models for network dynamics. Social Networks, 32, 44–60.

SNAW04 12th December 2016
11:15 to 12:00
Eric Kolaczyk Dynamic causal networks with multi-scale temporal structure
Co-authors: Xinyu Kang (Boston University), Apratim Ganguly (Boston University)

I will discuss a novel method to model multivariate time series using dynamic causal networks. This method combines traditional multi-scale modeling and network based neighborhood selection, aiming at capturing the temporally local structure of the data while maintaining the sparsity of the potential interactions. Our multi-scale framework is based on recursive dyadic partitioning, which recursively partitions the temporal axis into finer intervals and allows us to detect local network structural changes at varying temporal resolutions. The dynamic neighborhood selection is achieved through penalized likelihood estimation, where the penalty seeks to limit the number of neighbors used to model the data. Theoretical and numerical results describing the performance of our method will be presented, as well as an application in computational neuroscience.
SNAW04 12th December 2016
13:30 to 14:30
Jennifer Chayes Graphons and Machine Learning: Modeling and Estimation of Sparse Massive Networks - Part I
There are numerous examples of sparse massive networks, in particular the Internet, WWW and online social networks.  How do we model and learn these networks?  In contrast to conventional learning problems, where we have many independent samples, it is often the case for these networks that we can get only one independent sample.  How do we use a single snapshot today to learn a model for the network, and therefore be able to predict a similar, but larger network in the future?  In the case of relatively small or moderately sized networks, it’s appropriate to model the network parametrically, and attempt to learn these parameters.  For massive networks, a non-parametric representation is more appropriate.  In this talk, we first review the theory of graphons, developed over the last decade to describe limits of dense graphs, and the more the recent theory describing sparse graphs of unbounded average degree, including power-law graphs.  We then show how to use these graphons as nonparametric models for sparse networks.  Finally, we show how to get consistent estimators of these non-parametric models, and moreover how to do this in a way that protects the privacy of individuals on the network.

Part I of this talk reviews the theory of graph convergence for dense and sparse graphs.  Part II uses the results of Part I to model and estimate sparse massive networks.
SNAW04 12th December 2016
14:30 to 15:30
Christian Borgs Graphons and Machine Learning: Modeling and Estimation of Sparse Massive Networks - Part II
There are numerous examples of sparse massive networks, in particular the Internet, WWW and online social networks.  How do we model and learn these networks?  In contrast to conventional learning problems, where we have many independent samples, it is often the case for these networks that we can get only one independent sample.  How do we use a single snapshot today to learn a model for the network, and therefore be able to predict a similar, but larger network in the future?  In the case of relatively small or moderately sized networks, it’s appropriate to model the network parametrically, and attempt to learn these parameters.  For massive networks, a non-parametric representation is more appropriate.  In this talk, we first review the theory of graphons, developed over the last decade to describe limits of dense graphs, and the more the recent theory describing sparse graphs of unbounded average degree, including power-law graphs.  We then show how to use these graphons as nonparametric models for sparse networks.  Finally, we show how to get consistent estimators of these non-parametric models, and moreover how to do this in a way that protects the privacy of individuals on the network.

Part I of this talk reviews the theory of graph convergence for dense and sparse graphs.  Part II uses the results of Part I to model and estimate sparse massive networks.
SNAW04 12th December 2016
16:00 to 16:45
Jeanette Janssen Recognizing graphs formed by spatial random processes
In many real life applications, network formation can be modelled using a spatial random graph model: vertices are embedded in a metric space S, and pairs of vertices are more likely to be connected if they are closer together in the space. A general geometric graph model that captures this concept is G(n,w), where w is a  symmetric "link probability" function from SxS to [0,1]. To guarantee the spatial nature of the random graph, we requite that this function has the property that, for fixed x in S, w(x,y) decreases as y is moved further away from x. The function w can be seen as the graph limit of the sequence G(n,w) as n goes to infinity.
We consider the question: given a large graph or sequence of graphs, how can we determine if they are likely the results of such a general geometric random graph process? Focusing on the one-dimensional (linear) case where S=[0,1], we define a graph parameter \Gamma and use the theory of graph limits to show that this parameter indeed measures the compatibility of the graph with a linear model.
This is joint work with Huda Chuangpishit, Mahya Ghandehari, Nauzer Kalyaniwalla, and Israel Rocha

SNAW04 13th December 2016
09:30 to 10:30
Tom Britton A network epidemic model with preventive rewiring: comparative analysis of the initial phase
Co-authors: Joan Saldana (Universitat de Girona), David Juher (Universitat de Girona)

This talk is concerned with stochastic SIR and SEIR epidemic models on random networks in which individuals may rewire away from infected neighbors at some rate ω (and reconnect to non-infectious individuals with probability α or else simply drop the edge if α=0), so-called preventive rewiring. The models are denoted SIR-ω and SEIR-ω, and we focus attention on the early stages of an outbreak, where we derive expression for the basic reproduction number R0 and the expected degree of the infectious nodes E(DI) using two different approximation approaches. The first approach approximates the early spread of an epidemic by a branching process, whereas the second one uses pair approximation. The expressions are compared with the corresponding empirical means obtained from stochastic simulations of SIR-ω and SEIR-ω epidemics on Poisson and scale-free networks. To appear in Bull Math Biol.
SNAW04 13th December 2016
11:15 to 12:00
Maria Deijfen Birds of a feather or opposites attract - effects in network modelling
We study properties of some standard network models when the population is split into two types and the connection pattern between the types is varied. The studied models are generalizations of the Erdös-Renyi graph, the configuration model and a preferential attachment graph. For the Erdös-Renyi graph and the configuration model, the focus is on the component structure. We derive expressions for the critical parameter, indicating when there is a giant component in the graph, and study the size of the largest component by aid of simulations. For the preferential attachment model, we analyze the degree distributions of the two types and derive explicit expressions for the degree exponents.

Joint work with Robert Fitzner (Eindhoven University of Technology).

SNAW04 13th December 2016
13:45 to 14:45
Susan Holmes Study of the dynamics of bacterial communities in the Human Microbiome
Co-authors: Kris Sankaran (Stanford), Julia Fukuyama (Stanford), Lan Nguyen (Stanford), Diana Proctor (Stanford), David Relman (Stanford), Sergio Bacallado (Cambridge), Boyu Ren (Stanford), Pratheepa Jeganathan (Stanford)

The human microbiome is a complex assembly of bacteria that are sensitive to many perturbations. In several longitudinal analyses we study perturbations of bacterial community networks over time. For this, we have developed specific tools for modeling the vaginal, intestinal and oral microbiomes under these different perturbations (pregnancy, hypo-salivation inducing medications and antibiotics are some examples).

A suite of statistical tools written in R based on a Bioconductor package (phyloseq) allows for easy normalization, visualization and statistical testing of the longitudinal multi-table data composed of 16sRNA reads combined with clinical data, transcriptomic and metabolomic profiles. Challenges we have had to address include information leaks, the heterogeneity of the data, multiplicity of choices during the analyses and validation of results.

Each different body site requires a different modeling strategy as some sites form tight communities easily modeled with Stochastic Block Models whereas others show more diverse assemblies that require complex latent variable models.

SNAW04 13th December 2016
14:45 to 15:30
Danielle Bassett Dynamic networks in the human brain
Each area of the human brain plays a unique role in processing information gleaned from the external world and in driving our responses to that external world via behavior. However, the brain is far from a set of disconnected building blocks. Instead, parts of the brain communicate with one another in complex spatiotemporal patterns that enable human behavior. Understanding this spatio-temporal complexity requires a paradigmatic shift in our conceptual approaches, empirical thrusts, and quantitative methods. In this talk, I will describe the recent use of tools from network science to understand the structure and function of the human brain. With these novel approaches, we can begin to characterize the connectome’’, a model representation of neurobiological data that encapsulates both constituent elements of the brain (network nodes) and their interactions with one another (network edges). In a critical innovation, we imbue network edges with temporal dependence to capture the dynamics of the ever-reconfiguring brain communication patterns that support cognition. I will recount the utility of dynamic network approaches in not only understanding, but also predicting individual differences in adaptive functions such as learning, and in delineating healthy versus diseased brain communication dynamics. An emerging frontier, dynamic network neuroscience provides a powerful new conceptual and mathematical framework with which to understand adaptive human capabilities because it embraces the inherently evolving, interconnected nature of neurophysiological phenomena underlying human thought.
SNAW04 13th December 2016
16:00 to 16:45
Stéphane Robin Detecting change-points in the structure of a network: Exact Bayesian inference
Joint work with Loïc Schwaller

We consider the problem of change-point detection in multivariate time-series, typically the expression of a set of genes, or the activity of a set of brain regions over time. We adopt the framework of graphical models to described the dependency between the series. We are interested in the situation where the graphical model is affected by abrupt changes throughout time. In the above examples, such changes correspond to gene or brain region rewiring.

We demonstrate that it is possible to perform exact Bayesian inference whenever one considers a simple class of undirected graphs called spanning trees as possible structures. We are then able to integrate on both the graph and segmentation spaces at the same time by combining classical dynamic programming with algebraic results pertaining to spanning trees. In particular, we show that quantities such as posterior distributions for change-points or posterior edge probabilities over time can efficiently be obtained.

We illustrate our results on both synthetic and experimental data arising from molecular biology and neuroscience.
SNAW04 13th December 2016
16:45 to 17:30
Harry Crane Markov process models for time-varying networks
Many models for dynamic networks, such as the preferential attachment model, describe evolution by sequential addition of vertices and/or edges. Such models are not suited to networks whose connectivity varies over time, as in social relationships and other kinds of temporally varying interactions. For modeling in this latter setting, I develop the general theory of exchangeable Markov processes for time-varying networks and discuss relevant consequences.
SNAW04 14th December 2016
09:30 to 10:30
Frank Granville Ball Epidemics on networks
In this talk we consider two extensions of the standard stochastic epidemic SIR (Susceptible-Infected-Recovered) on a configuration model network.  The first extension, which is joint work with Peter Neal (Lancaster University), incorporates casual contacts, i.e. with people chosen uniformly at random from the population.  The second extension, which is joint work with Tom Britton (Stockholm University) and David Sirl (University of Nottingham), involves the spread of an epidemic on a random network model which allows for tunable clustering,  degree correlation and degree distribution.  For each model, approximating branching processes are used to obtain a threshold parameter, which determines whether or not an epidemic with few initial infectives can become established and lead to a major outbreak, and the (approximate) probability and relative final size of a major outbreak.  For the model with casual contacts, an embedding argument is used to derive a central limit theorem for the size of a major epidemic; similar methods lead to the asymptotic variance of the giant component in a configuration model random graph.  The theory is illustrated by numerical studies, which explore the impact of network properties on the outcome of an epidemic.
SNAW04 14th December 2016
11:15 to 12:00
Veronica Vinciotti Sparse Gaussian graphical models for dynamic gene regulatory networks
Co-authors: Luigi Augugliaro (University of Palermo), Antonino Abbruzzo (University of Palermo), Ernst Wit (University of Groningen)

In this talk, I will present a factorial Gaussian graphical model for inferring dynamic gene regulatory networks from genomic high-throughput data. The model allows including dynamic-related equality constraints on the precision matrix as well as imposing sparsity constraints in the estimation procedure. I will discuss model selection and present an application on a high-resolution time-course microarray data from the Neisseria meningitidis bacterium, a causative agent of life-threatening infections such as meningitis. The methodology described in this paper is implemented in the R package sglasso, freely available from CRAN.
SNAW04 14th December 2016
14:00 to 14:45
Pierre Borgnat Networks as signals: Extraction of dynamical network structures
Joint work with Ronan Hamon (LIF, Marseille, France), P. Flandrin (CNRS, LP, ENS de Lyon, France) and C. Robardet (LIRIS, INSA de Lyon, France)
We have proposed  a new framework to track the structure of temporal networks, using a signal processing approach: the method is based on the duality between static networks and signals using a multidimensional scaling technique. For temporal networks, it enables a tracking of the network structure over time. To extract the most significant patterns of the networks and their activation over time, we use nonnegative matrix factorization of the temporal spectra. This framework, inspired by audio decomposition, allows transforming back these frequency patterns into networks, so as to highlight the evolution of the underlying structure of the network over time. The effectiveness of the method is first evidenced on a toy example, prior being used to study a temporal network of face-to-face contacts. The extraction of sub-networks highlights significant structures decomposed on time intervals.
SNAW04 14th December 2016
14:45 to 15:30
Marianna Pensky Non-parametric methods for the dynamic stochastic block model and the time-dependent graphon
The Dynamic Stochastic Block Model (DSBM) and the dynamic graphon are natural extensions of the, respectively, Stochastic Block Model and the graphon, from the time-independent to the time-dependent setting. The objective of the present talk is estimation of the tensor of the connection probabilities  when it is generated by the  DSBM  and the dynamic graphon. In particular, in the context of the DSBM, under very few simple non-parametric assumptions,  we derive a penalized least squares  estimator  and show that it satisfies an oracle inequality and also attains the minimax lower bounds for the risk.  We extend  those results to estimation in the context of the dynamic graphon. The estimators   are adaptive to the unknown number of blocks in the context of DSBM or of the smoothness of the graphon function.  The technique relies on the vectorization of the model and leads to to much simpler mathematical arguments  than the ones used previously in the stationary set up. In addition, all our results are non-asymptotic and allow a variety of  extensions.

SNAW04 14th December 2016
16:00 to 16:45
Sidney Resnick Multivariate power laws in a preferential attachment network model; model calibration
We begin with a review of the multivariate regular variation of in- and out-degree in a preferential attachment model. The problem can be approached in a variety of ways: (i) Multivariate Tauberian theory; (ii) Direct approach via asymptotics to get a limit measure; (iii) proving multivariate regular variation of the limiting mass function of normalized in- and out-degree. We then turn to model calibration comparing various information sources and methods. If a full history of network growth is available, full MLE implementation is possible and performs well on simulated data. If a single snapshot in time is all that is available, then approximate MLE can be used. Comparison with MLE and use of asymptotic methods relying on heavy tail estimators is also made and predictably there is a trade-off between robustness and accuracy. Methods generally perform well on simulated data but real data creates problems with model error and we can illustrate this with wikipedia data obtained from Konect.
SNAW04 14th December 2016
16:45 to 17:30
Peter Orbanz Random walk models of network formation
SNAW04 15th December 2016
09:30 to 10:30
Andrew Nobel Mining differential correlation
Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. The talk will describe a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). DCM is a special case of differential analysis for weighted networks, and is distinct from standard analyses of first order differential behavior, for example studies of differential expression.

The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative testing procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. The method does not assume that the sample or population correlation matrices are sparse, or have any particular structure.

I will present both simulation results and applications of DCM to genomics and brain imaging.  As time permits, I will also present a brief overview of some additional network related work being done with collaborators at UNC.

SNAW04 15th December 2016
11:15 to 12:00
Hanna Wallach Bayesian Poisson Tensor Decomposition for International Relations
Like their inhabitants, countries interact with one another: theyconsult, negotiate, trade, threaten, and fight. These interactions areseldom uncoordinated. Rather, they are connected by a fabric ofoverlapping communities, such as security coalitions, treaties, tradecartels, and military alliances. A single country can belong tomultiple communities, reflecting its many overlapping identities, andcan engage in both within- and between-community interactions,depending on the capacity in which it is acting. In this talk, I willintroduce two tensor decomposition models for modeling interactionevents of the form "country i took action a toward country j at timet." The first model (Bayesian Poisson CP decomposition) discoverscoherent threads of events, characterized by sender countries,receiver countries, action types, and time steps; the second model(Bayesian Poisson Tucker decomposition) discovers latentcountry--community memberships, including the number of latentcommunities, as well as directed community--community interactionnetworks that are specific to "topics" of similar action types. I willdemonstrate that these models infer interpretable latent structuresthat conform to and inform our knowledge of international relations.Many existing models for discrete data (such as networks and text) arespecial cases of these models, including infinite relational models,stochastic block models, and latent Dirichlet allocation. As a result,Bayesian Poisson tensor decomposition is a general framework foranalyzing and understanding discrete data sets in the social sciences.
SNAW04 15th December 2016
14:00 to 14:45
Sharad Goel The Effect of Recommendations on Network Structure
Co-authors: Jessica Su (Stanford University), Aneesh Sharma (Twitter)

Online social networks regularly offer users personalized, algorithmic suggestions of whom to connect to. Here we examine the aggregate effects of such recommendations on network structure, focusing on whether these recommendations increase the popularity of niche users or, conversely, those who are already popular. We investigate this issue by empirically and theoretically analyzing abrupt changes in Twitter's network structure around the mid-2010 introduction of its "Who to Follow" feature. We find that users across the popularity spectrum benefitted from the recommendations; however, the most popular users profited substantially more than average. We trace this "rich get richer" phenomenon to three intertwined factors. First, as is typical of network recommenders, the system relies on a "friend-of-friend"-style algorithm, which we show generally results in users being recommended proportional to their degree. Second, we find that the baseline growth rate of users is sublinear in degree. This mismatch between the recommender and the natural network dynamics thus alters the structural evolution of the network. Finally, we find that people are much more likely to respond positively to recommendations for popular users -- perhaps because of their greater name recognition -- further amplifying the cumulative advantage of well-known individuals.

SNAW04 15th December 2016
14:45 to 15:30
Peter Mörters The spread of infections on evolving scale-free networks
We study the contact process on a class of evolving scale-free networks, where each node updates its connections at independent random times. We give a rigorous mathematical proof that there is a transition between a phase where for all infection rates the infection survives for a long time, at least exponential in the network size, and a phase where for sufficiently small infection rates extinction occurs quickly, at most polynomially in the network size. The phase transition occurs when the power-law exponent crosses the value four. This behaviour is in contrast to that of the contact process on the corresponding static model, where there is no phase transition, as well as that of a classical mean-field approximation, which has a phase transition at power-law exponent three. The new observation behind our result is that temporal variability of networks can simultaneously increase the rate at which the infection spreads in the network, and decrease the time which the infection spends in metastable states.

This is joint work with Emmanuel Jacob (ENS Lyon).
SNAW04 15th December 2016
16:00 to 16:45
Catherine Matias Statistical clustering of temporal networks through a dynamic stochastic block model
Co-author: Vincent MIELE (CNRS / LBBE / Univ. Lyon 1)

Statistical node clustering in discrete time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the nodes groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edges values). Our approach,motivated by the importance of controlling for label switching issues across the different time steps, focuses on detecting groups characterized by a stable within group connectivity behavior. We study identifiability of themodel parameters, propose an inference procedure based on a variational expectation maximization algorithm as well as a model selection criterion to select for the number of groups. We carefully discuss our initialization strategy which plays an important role in the method and compare our procedure with exi sting ones on synthetic datasets. We also illustrate our approach on dynamic contact networks, one of encounters among high school students and two others on animal interactions. An implementation of the method is available as a R package called dynsbm.

SNAW04 16th December 2016
09:30 to 10:30
George Michailidis Vector Autoregressive based Network Models
Vector autoregressions represent a popular class of time series models that aim to capture temporal interconnections between temporally evolving
entities.  They have been widely used in macroeconomic and financial modeling and more recently they have found novel applications in functional genomics and neuroscience. In this presentation, we discuss modeling and estimation issues in the high dimensional setting under different constrains
on the transition matrices - sparsity, low rankness. We also provide extensions to multi-layer networks and illustrate the results with application​s
to financial stability monitoring and biological regulation.
SNAW04 16th December 2016
11:00 to 11:45
Miklos Racz Finding and hiding the seed
I will present an overview of recent developments in source detection and estimation in randomly growing graphs and diffusions on graphs. Can one detect the influence of the initial seed graph? How good are root-finding algorithms? Can one engineer messaging protocols that hide the source of a rumor? I will explore such questions in the talk. This is based on joint works with Sebastien Bubeck, Ronen Eldan, Elchanan Mossel, Jacob Richey, and Tselil Schramm.
SNAW04 16th December 2016
11:45 to 12:30
Codina Cotar Edge- and vertex-reinforced random walks with super-linear reinforcement on infinite graphs
Co-author: Debleena Thacker (Lund University)

In this talk we introduce a new simple but powerful general technique for the study of edge- and vertex-reinforced processes with super-linear reinforcement, based on the use of order statistics for the number of edge, respectively of vertex, traversals. The technique relies on upper bound estimates for the number of edge traversals, proved in a different context by Cotar and Limic [Ann. Appl. Probab. (2009)] for finite graphs with edge reinforcement. We apply our new method both to edge- and to vertex-reinforced random walks with super-linear reinforcement on arbitrary infinite connected graphs of bounded degree. We stress that, unlike all previous results for processes with super-linear reinforcement, we make no other assumption on the graphs.

For edge-reinforced random walks, we complete the results of Limic and Tarres [Ann. Probab. (2007)] and we settle a conjecture of Sellke [Technical Report 94-26, Purdue University (1994)] by showing that for any reciprocally summable reinforcement weight function w, the walk traverses a random attracting edge at all large times.

For vertex-reinforced random walks, we extend results previously obtained on Z by Volkov [Ann. Probab. (2001)] and by Basdevant, Schapira and Singh [Ann. Probab. (2014)], and on complete graphs by Benaim, Raimond and Schapira [ALEA (2013)]. We show that on any infinite connected graph of bounded degree, with reinforcement weight function w taken from a general class of reciprocally summable reinforcement weight functions, the walk traverses two random neighbouring attracting vertices at all large times.

SNAW04 16th December 2016
13:30 to 14:15
Fengnan Gao On the Statistical Estimation of the Preferential Attachment Network Model
The preferential attachment (PA) network is a popular way of modeling the social networks, the collaboration networks and etc. The PA network model is an evolving network model where new nodes keep coming in. When a new node comes in, it establishes only one connection with an existing node. The random choice on the existing node is via a multinomial distribution with probability weights based on a preferential function f on the degrees. f maps the natural numbers to the positive real line and is assumed apriori non-decreasing, which means the nodes with high degrees are more likely to get new connections, i.e. "the rich get richer". We proposed an estimator on f. We show, with techniques from branching process, our estimator is consistent. If f is affine, meaning f(k) = k + delta, it is well known that such a model leads to a power-law degree distribution. We proposed a maximum likelihood estimator for delta and establish a central limit result on the MLE of delta.  If f belongs to a parametric family no faster than linear, we show the MLE will also yield optimal performance with the asymptotic normality results.  We will also talk about the potential extensions of the model (with borrowed strength from nonparametric Bayesian statistics) and interesting applications.
This is joint work with Aad van der Vaart.
SNAW04 16th December 2016
14:15 to 15:00
Pavel Krivitsky Modeling and Simulation of Dynamic Networks using Egocentrically-Sampled Data
In spread of infections over a sexual or other contact networks, the timing of the contacts can be as important as their cross-sectional structure. However, their modeling and simulation is complicated by the difficulty of collecting data about these networks. Rather than the more traditional panel (repeated observations) or event data, in which all individuals are identified, these networks are often observed only in the form of an egocentric survey: a sample of individuals in the network reporting non-identifying demographic information (e.g., age, sex, race/ethnicity) about their contacts, as well as and contact history (e.g., start and end of past contacts).

This work develops a generalized method of moments approach to simulation and inference for dynamic networks models from such data by using the models' long-run properties, and proposes a network-size invariant parametrization to facilitate using these models to simulate populations with changing sizes and compositions.

These techniques are applied to egocentric data from the 1992 US National Health and Social Life Survey, and other applications are demonstrated as well, produced in collaboration Martina Morris and Steven Goodreau and others.
SNAW04 16th December 2016
15:30 to 16:15
Patrick Wolfe tba