The power of traditional genetic studies to identify the genetic determinants of disease is limited because complex disease traits depend on small incremental contributions from many loci. Integrative functional genomics is a relatively novel approach to this problem. The idea is to use hypotheses about the pathophysiological mechanisms underlying the disease under study to focus attention on a restricted collection of molecular pathways and the corresponding heritable molecular phenotypes. For each sampled individual, data are collected at the DNA sequence level, within or around candidate genes, as well as at the clinical phenotype and molecular phenotype levels.
We propose a general statistical framework for the design and analysis of functional genomics studies of this kind. Our approach uses a directed graph representation of a probability model of the problem, incorporating "intervention nodes" for formal reasoning about causes and effects of causes, as proposed by Dawid. Indeed, meaningful biological questions can often be formulated in terms of the effects of specific interventions, for example, the effect of blocking a certain receptor with a drug. Our approach involves mapping available biological knowledge onto the graph, using graph semantics and conditional independence reasoning to formulate meaningful scientific questions, and identifying appropriate experimental designs for answering those questions. Finally, the graph can be used as a computational framework for estimating the quantities of interest.
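To make the role of an intervention node concrete, the following is a minimal sketch in Python, not the model used in the study. It assumes a hypothetical four-variable graph: an unobserved confounder U, a genotype G, a molecular phenotype M (with parents G and U) and a clinical event Y (with parents M and U). All variable names and probabilities are invented for illustration. The intervention node F_M is either idle (`None`), in which case M follows its observational distribution given its parents, or set to a value, in which case M is forced to that value regardless of its parents.

```python
from itertools import product

# Hypothetical toy probabilities, chosen only for illustration.
P_U = {0: 0.5, 1: 0.5}   # unobserved confounder
P_G = {0: 0.7, 1: 0.3}   # genotype at a candidate locus
# P(M=1 | G=g, U=u), keyed by (g, u): molecular phenotype
P_M1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}
# P(Y=1 | M=m, U=u), keyed by (m, u): clinical event
P_Y1 = {(0, 0): 0.05, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.40}


def bern(p1, value):
    """Probability of a binary value when P(value=1) = p1."""
    return p1 if value == 1 else 1.0 - p1


def p_m(m, g, u, f_m):
    """P(M=m | G=g, U=u, F_M=f_m), where f_m=None is the idle regime."""
    if f_m is None:
        return bern(P_M1[(g, u)], m)      # observational regime
    return 1.0 if m == f_m else 0.0       # M is set by intervention


def joint(u, g, m, y, f_m=None):
    """Joint probability of (U, G, M, Y) under regime F_M=f_m."""
    return P_U[u] * P_G[g] * p_m(m, g, u, f_m) * bern(P_Y1[(m, u)], y)


def prob_y1(f_m=None, given_m=None):
    """P(Y=1 | F_M=f_m), optionally also conditioning on M=given_m."""
    num = den = 0.0
    for u, g, m, y in product((0, 1), repeat=4):
        if given_m is not None and m != given_m:
            continue
        p = joint(u, g, m, y, f_m)
        den += p
        if y == 1:
            num += p
    return num / den


if __name__ == "__main__":
    print("observational  P(Y=1 | M=1)  :", prob_y1(given_m=1))
    print("interventional P(Y=1 | F_M=1):", prob_y1(f_m=1))
    print("interventional P(Y=1 | F_M=0):", prob_y1(f_m=0))
```

In this toy model, conditioning on M=1 in the idle regime and forcing M=1 through F_M give different probabilities for Y, because U influences both M and Y. This is the distinction between observing a molecular phenotype and intervening on it that the graph with intervention nodes is meant to capture.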
We illustrate the method with our study of the effect of platelet sensitivity on thrombotic occlusive events.