An Isaac Newton Institute Workshop

Logic and Databases

Probabilities in Databases and in Logics

Author: Dan Suciu (University of Washington)

Abstract

This talk discusses two applications of probabilities in databases, and their corresponding problems in probabilistic logic. The first application is managing imprecise data: here each tuple in the database becomes a probabilistic event, and query evaluation reduces to the problem of computing the probability that a sentence is true. This is the model that has been used in "probabilistic databases" over the last twenty years. I will discuss several aspects of conjunctive query evaluation, including the data complexity (#P-complete) and a provably efficient approximation algorithms for top-k queries. The second application is to manage incompleteness: here the database is not known, but instead is learned through a set of observations, such as views or statistics. Examples include query answering using views, measuring information leakage, and cardinality estimation from data statistics. Under the Baysian approach to incompleteness a prior probability distribution is assumed on all possible databases and the set of observations are used to condition the prior. Query evaluation reduces to the problem of computing a conditional probability: in fact, for practical applications we need to compute the limit of this conditional probability when the domain size growth to infinity. I will describe an explicit method for computing this limit conditional probability for conjunctive queries, then present the complexity of various decision problems associated to it. This is joined work with Nilesh Dalvi.