
Timetable (LAAW02)

Logic and Databases

Monday 27th February 2006 to Friday 3rd March 2006

Monday 27th February 2006
08:30 to 09:30 Registration
09:30 to 11:00 The automaton approach to XML schema languages: from practice to theory INI 1
11:00 to 11:30 Coffee
11:30 to 12:30 Probabilities in databases and in logics I

This talk discusses two applications of probabilities in databases, and their corresponding problems in probabilistic logic. The first application is managing imprecise data: here each tuple in the database becomes a probabilistic event, and query evaluation reduces to the problem of computing the probability that a sentence is true. This is the model that has been used in "probabilistic databases" over the last twenty years. I will discuss several aspects of conjunctive query evaluation, including the data complexity (#P-complete) and a provably efficient approximation algorithm for top-k queries. The second application is managing incompleteness: here the database is not known, but is instead learned through a set of observations, such as views or statistics. Examples include query answering using views, measuring information leakage, and cardinality estimation from data statistics. Under the Bayesian approach to incompleteness, a prior probability distribution is assumed on all possible databases and the set of observations is used to condition the prior. Query evaluation reduces to the problem of computing a conditional probability; in fact, for practical applications we need to compute the limit of this conditional probability as the domain size grows to infinity. I will describe an explicit method for computing this limit conditional probability for conjunctive queries, then present the complexity of various decision problems associated with it. This is joint work with Nilesh Dalvi.
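
The tuple-independent model described above can be made concrete in a few lines. The following sketch is purely illustrative (it is brute-force possible-world enumeration, not the efficient algorithms discussed in the talk), and the relations and probabilities are invented for the example:

```python
from itertools import product

def query_probability(tuples, query):
    """Probability that `query` holds over a tuple-independent
    probabilistic database: sum the weights of all possible worlds
    in which the query is true."""
    total = 0.0
    for world in product([False, True], repeat=len(tuples)):
        p = 1.0
        present = set()
        for (t, prob), keep in zip(tuples, world):
            if keep:
                present.add(t)
                p *= prob
            else:
                p *= 1.0 - prob
        if query(present):
            total += p
    return total

def q(world):
    # Boolean conjunctive query:  exists x, y.  R(x) and S(x, y)
    return any(t[0] == "S" and ("R", t[1]) in world for t in world)

# Each tuple is present independently with the given probability
db = [(("R", 1), 0.5), (("R", 2), 0.3), (("S", 1, "a"), 0.8)]
print(query_probability(db, q))  # P(R(1)) * P(S(1,a)) = 0.4
```

Even this toy example hints at the complexity result: the sum ranges over exponentially many worlds, and for general conjunctive queries no better-than-#P evaluation is possible.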

12:30 to 13:30 Lunch at Churchill College
14:00 to 15:00 Foundations of Schema mappings I INI 1
15:00 to 15:30 Tea
15:30 to 16:15 V Sazonov (Liverpool)
Querying hyperset/web-like databases

A hyperset approach to Web-like (semistructured) databases and the query language Delta are presented in a simple, intuitive and reasonably precise way, aimed at a wider audience, demonstrating how this abstract view can be represented in concrete terms of set equations allowing cycles. The approach answers the question of how the expressive power of querying semistructured databases can be characterised, both in terms of descriptive complexity and through illustrative examples, such as defining in Delta the formal semantics of path expressions and of the known languages UnQL and UnCAL.

16:15 to 17:00 Concentration bounds for Markov processes of metafinite models

A formal model of discrete dynamical systems with probabilistic updates is presented. States of a system are metafinite models, and update rules are essentially Abstract State Machine rules augmented with a random function. Given an initial state, a sequence of updates generates a trajectory in the phase space of the system. It is shown that if the states in the trajectory satisfy certain locality conditions, then logically definable random variables of the states are concentrated around their expectation. That is, with high probability, their values are close to their average value.

17:00 to 17:45 XPath with transitive closure

We determine the expressivity of the expansion of Core XPath with a reflexive transitive closure operator. This expansion is known under the name "Regular XPath". It allows us to write expressions such as (child::*/child::*)+, meaning "give me all nodes which are an even number of steps below me", and filter expressions like "I have an even number of descendants". The transitive closure operator is not part of the XPath 1.0 or 2.0 standards, but is available in the XSLT processors SAXON and XALAN.

Regular XPath can be seen as a variant of Propositional Dynamic logic, with four programs (child, parent, left, right) and propositional variables corresponding to XML tag names. Core XPath corresponds to a fragment of PDL in which the Kleene star may only be applied to atomic programs.

Interestingly, in order to obtain a nice characterization of Regular XPath in terms of first order logic expanded with transitive closure, we need to add something more to Core XPath. This extra addition can be phrased in a number of different ways. One is the following: if P and Q are path expressions, then P=Q is a filter expression. P=Q is true at a node n iff there exists a node m which is reachable from n by both P and Q. At present we do not know if this extra addition is really needed.
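
The semantics of the closure operator can be illustrated with a small relational sketch, independent of any XPath processor: path expressions denote binary relations over tree nodes, and composition plus a least fixpoint give the meaning of (child::*/child::*)+. The tree below is invented for the example:

```python
def compose(r, s):
    """Relational composition: (a, c) whenever a -r-> b -s-> c."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def transitive_closure(r):
    """Least fixpoint: keep composing with r until nothing new appears."""
    closure = set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

# An invented tree, as a parent -> children map
children = {0: [1, 2], 1: [3], 2: [], 3: [4], 4: []}
child = {(p, c) for p, cs in children.items() for c in cs}

# (child::*/child::*)+ : compose child with itself, then close transitively
two_steps = compose(child, child)
even_below = transitive_closure(two_steps)
print(sorted(n for (m, n) in even_below if m == 0))  # [3]
```

Node 3 is the only node an even (positive) number of steps below the root 0: node 1 is one step down and node 4 three steps down.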

17:45 to 18:45 Beer and Wine Reception
Tuesday 28th February 2006
09:30 to 10:15 Cardinality-based semantics for consistent query answering: incremental and parameterized complexity

Consistent Query Answering (CQA) is the problem of computing from a database the answers to a query that are consistent with respect to certain integrity constraints that the database, as a whole, may fail to satisfy. Consistent answers have been characterized as those that are invariant under certain minimal forms of restoration of the consistency of the database.

In this paper we investigate algorithmic and complexity-theoretic issues of CQA under database repairs that minimally depart from the original database with respect to the cardinality of the symmetric difference. Repairs of this kind had been suggested in the literature, but no systematic study had been carried out. Here we obtain the first tight complexity bounds.

We also address, considering for the first time a dynamic scenario for CQA, the problem of the incremental complexity of CQA, which naturally arises when an originally consistent database becomes inconsistent after the execution of a sequence of update operations. Tight bounds on incremental complexity are provided for various semantics under denial constraints, e.g. (a) minimum tuple-based repairs with respect to cardinality, (b) minimal tuple-based repairs with respect to set inclusion, and (c) minimum numerical aggregation of attribute-based repairs. Fixed-parameter tractability is also investigated in this dynamic context, where the size of the update sequence becomes the relevant parameter.
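
A minimal brute-force sketch of the cardinality-based semantics (purely illustrative; the talk is about complexity bounds, not this naive algorithm): under denial constraints repairs only delete tuples, so we search for consistent subinstances obtained by deleting as few tuples as possible, and take as consistent answers those that hold in every repair. The key constraint and data below are invented:

```python
from itertools import combinations

def cardinality_repairs(db, consistent):
    """All consistent subinstances at minimum symmetric-difference
    cardinality from db (under denial constraints, repairs only delete)."""
    db = list(db)
    for k in range(len(db) + 1):  # delete as few tuples as possible
        repairs = [set(db) - set(rm)
                   for rm in combinations(db, k)
                   if consistent(set(db) - set(rm))]
        if repairs:
            return repairs
    return []

# Denial constraint (a key on R(name -> dept)):
# no two tuples agree on name but disagree on dept
def consistent(inst):
    return all(not (a[0] == b[0] and a[1] != b[1])
               for a in inst for b in inst)

db = {("alice", "cs"), ("alice", "math"), ("bob", "cs")}
reps = cardinality_repairs(db, consistent)

# Consistent answer to "which names occur?": true in every repair
names = set.intersection(*({t[0] for t in r} for r in reps))
print(sorted(names))  # ['alice', 'bob']
```

Here the conflict on "alice" yields two minimum repairs (drop either of her tuples); both names survive in every repair, so both are consistent answers, while neither department of "alice" is.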

10:15 to 11:00 Computational problems of data exchange

Data Exchange is the problem of inserting data structured under a source schema into a target schema of a different structure (possibly with integrity constraints), while reflecting the source data as accurately as possible. We study computational issues related to data exchange in the setting of Fagin, Kolaitis, and Popa (PODS'03). In particular, we are interested in computing a particular universal solution to a data exchange problem, called "the core". We present polynomial algorithms for computing the core of large classes of data exchange problems, solving relevant open problems of Fagin et al. Our results show that data exchange with cores is feasible in a very general framework. Furthermore, we use the technique of hypertree decompositions to derive improved algorithms for computing the core of a relational instance with labeled nulls, a problem we show to be fixed-parameter intractable with respect to the block size of the input instances. Finally, we show that computing cores is NP-hard in the presence of a system predicate NULL(x), which is true if x is a null value.

Part of this work is joint work with Alan Nash, UCSD.
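
The notion of core can be illustrated by brute force on toy instances (the polynomial algorithms of the talk are far more refined). A hedged sketch: the core is found here as a smallest subinstance into which the whole instance maps homomorphically, with constants fixed and labeled nulls free to move; the instance below is invented:

```python
from itertools import combinations, product

def homomorphism(src, dst, nulls):
    """Search for a homomorphism from src to dst: constants are fixed,
    labeled nulls may map to any value occurring in dst."""
    values = {v for t in dst for v in t[1:]}
    null_list = sorted({v for t in src for v in t[1:] if v in nulls})
    for assignment in product(values, repeat=len(null_list)):
        h = dict(zip(null_list, assignment))
        image = {(t[0],) + tuple(h.get(v, v) for v in t[1:]) for t in src}
        if image <= dst:
            return h
    return None

def core(instance, nulls):
    """Smallest subinstance that the whole instance maps into."""
    for k in range(1, len(instance) + 1):
        for sub in combinations(sorted(instance), k):
            if homomorphism(instance, set(sub), nulls) is not None:
                return set(sub)
    return instance

# R(a, N1) with labeled null N1 is redundant given the fact R(a, b):
# mapping N1 -> b retracts the instance onto its second tuple
inst = {("R", "a", "N1"), ("R", "a", "b")}
print(sorted(core(inst, nulls={"N1"})))  # [('R', 'a', 'b')]
```

The exponential search over null assignments and subinstances is exactly what the talk's polynomial algorithms avoid for the classes of data exchange problems they cover.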

11:00 to 11:30 Coffee
11:30 to 12:30 Probabilities in databases and in logics II INI 1
12:30 to 13:30 Lunch at Churchill College
14:00 to 15:00 Foundations of Schema mappings II INI 1
15:00 to 15:30 Tea
15:30 to 16:15 Context logic and tree update

Spatial logics have been used to describe properties of tree-like structures (Ambient Logic) and in a Hoare style to reason about dynamic updates of heap-like structures (Separation Logic). A natural question is whether the Ambient Logic can be used to reason about tree update (XML update). Calcagno, Zarfaty and I have shown that this is not possible. Instead, we had to fundamentally change the way we reason about structured data by introducing Context Logic. Local data update typically identifies the portion of data to be replaced, removes it, and inserts the new data in the same place. With Context Logic, we can reason about both data and this place of insertion (contexts). We have shown that Context Logic can be used to develop local Hoare reasoning about tree update, heap update (analogous to Separation Logic reasoning), and term rewriting (which escaped reasoning using Separation Logic).
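
The split/plug view of local update that motivates Context Logic can be sketched directly: a context is a tree with a single hole, and an update splits the tree at a location, then plugs new data into the hole. The representation below is an invented toy, not the authors' formalism:

```python
# A tree is (label, [children]); a context is a tree containing one hole.
HOLE = object()

def plug(ctx, subtree):
    """Insert subtree at the unique hole of the context."""
    if ctx is HOLE:
        return subtree
    label, kids = ctx
    return (label, [plug(k, subtree) for k in kids])

def split(tree, path):
    """Decompose tree at the given child-index path into (context, subtree),
    so that plug(context, subtree) == tree."""
    if not path:
        return HOLE, tree
    label, kids = tree
    i = path[0]
    ctx, sub = split(kids[i], path[1:])
    return (label, kids[:i] + [ctx] + kids[i + 1:]), sub

# Local update: split out the subtree at path [0], plug in new data
tree = ("a", [("b", []), ("c", [])])
ctx, old = split(tree, [0])
print(plug(ctx, ("d", [])))  # ('a', [('d', []), ('c', [])])
```

Reasoning about both components of the pair (context, subtree), rather than only about the data, is the shift the abstract describes.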

In this talk, I will motivate Context Logic using our local Hoare reasoning about tree update (XML update). However, my main goal is to explain some expressivity results. With our style of Hoare reasoning, certain so-called adjunct connectives play an essential role in describing the weakest preconditions (important for both the theory and tool development). These connectives can be eliminated in the tree case (without quantifiers). This result follows previous results on Separation Logic and Ambient Logic, by Lozes first and then Dawar, Gardner and Ghelli. Such results are interesting, since there is typically no simple translation from a formula to its adjunct-free equivalent and, in the case of the Ambient Logic with hiding, the task of finding the adjunct-free formulae is not computable.

Despite these elimination results, our intuition has always been that the adjunct connectives are essential for Hoare reasoning. The point is that the weakest preconditions are parametric with respect to a formula variable. We have recently developed inexpressibility results, which characterise what it means for a logic to be unable to express such parametric properties. We have not seen this style of parametric inexpressibility result before, and believe it is an important general concept that will be applicable to other styles of logical reasoning.

16:15 to 17:00 M Katz (Haifa)
Approximation logic and databases

The Logic of Approximation is an infinite-valued first order logic, designed to handle inexactness in scientific research. Truth-values of this logic, ranging in the closed interval [0,1], are 'degrees of error'. Thus 0 (no error) stands for absolute truth and 1 (maximal error) for absolute falsehood. The semantic rules for connectives and quantifiers are reversals of familiar rules of fuzzy logics. Unique to this logic is a notion of deduction based on the idea that errors in the conclusion can be minimized (to within arbitrary epsilon) if errors in the premiss are controlled (to within appropriate delta). In the past this logic was developed and applied by the author to geometry, quantum theory, measurement theory and utility theory. In the present paper it is to be applied to the study of relational databases. In particular we shall try to show how to handle, within the framework of the logic of approximation, various properties of specific databases derived from batteries of intelligence tests and from surveys of social attitudes.
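
As a rough illustration of the "reversal" of fuzzy rules (this is one plausible reading of the abstract, not the author's official semantics): where a truth-functional fuzzy logic takes conjunction as min over truth-values, an error semantics can take conjunction as max over errors, universal quantification as the supremum of errors over the domain, and so on:

```python
# Degrees of error in [0,1]: 0 = absolute truth, 1 = absolute falsehood.
# Assumed duals of familiar fuzzy rules (an illustration, not the paper's definitions).
def err_not(e):
    return 1.0 - e

def err_and(e1, e2):
    return max(e1, e2)  # a conjunction is as wrong as its worst conjunct

def err_or(e1, e2):
    return min(e1, e2)

def err_forall(errors):
    return max(errors)  # supremum of errors over the domain

def err_exists(errors):
    return min(errors)

# "every measured value is close to 0", error of each atom = |x| clipped to [0,1]
domain = [0.05, 0.1, 0.3]
print(err_forall(min(abs(x), 1.0) for x in domain))  # 0.3
```

The epsilon-delta notion of deduction mentioned in the abstract then amounts to continuity: small errors in the premiss force small errors in the conclusion.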

17:00 to 17:45 A normal form for singulary logic over physically realizable data models

We present a new normal form for first-order logic over bounded-degree structures which relies on the symmetry of labeled connections between elements of the structure. The key idea is to use vocabularies with only monadic predicate and monadic function symbols -- so called singulary logic [Church]. Our proof is effective and syntactic, not relying on games or other semantic characterizations. Linear-time computability of first-order queries becomes a direct corollary of this result. Included will be a discussion of why these data structures are the appropriate models for physically realizable databases.

[Church, A.] Introduction to Mathematical Logic, Princeton University Press, 1956.

Wednesday 1st March 2006
09:15 to 10:15 Queries on tree-structured data: Logical languages and complexity

In this talk, I will survey recent results on the theory -- and in particular complexity -- of a number of popular logical query languages for tree-structured data and XML, including XPath, conjunctive and first-order queries, and XQuery.

10:15 to 11:00 BPQL - A query language for business processes

This talk presents BPQL, a novel graphical query language for querying Business Processes, implemented as a set of cooperating web services. BPQL is based on an intuitive model of business processes, an abstraction of the emerging BPEL (Business Process Execution Language) standard. It allows users to query business processes visually, in a manner analogous to how such processes are typically specified, and can be employed in a distributed setting, where process components may be provided by distinct providers (peers).

The talk describes the query language as well as its underlying formal model. Special emphasis is given to the following subjects: (a) the analogy between the specification and querying of business processes, and (b) the use of graph grammars to represent potentially infinite query results by a finite and concise representation. The current implementation, using Active XML and Web services, is briefly described.

11:00 to 11:30 Coffee
11:30 to 12:30 Pebble tree walking automata and TC-logics on trees INI 1
12:30 to 13:30 Lunch at Churchill College
19:30 Conference Dinner in the New Senior Combination Room at Selwyn College
Thursday 2nd March 2006
09:15 to 10:15 The complexity of processing data streams and external memory data I INI 1
10:15 to 11:00 Ontology mediated data management

Ontologies provide a conceptualization of a domain of interest. Nowadays, they are typically represented in terms of Description Logics, and are seen as the key technology used to describe the semantics of information at various sites. The idea of using ontologies as a conceptual view over data repositories is becoming more and more popular. In order for this idea to become widespread in standard applications, it is fundamental that the conceptual layer through which the underlying data layer is accessed does not introduce a significant overhead in dealing with the data. Based on these observations, in recent years techniques to access data sources through a conceptual layer built on top of them have been developed, and the computational complexity of answering queries over ontologies, measured in the size of the underlying data, has been studied.

The talk is divided into two parts. The first part will present the general ideas underlying ontology mediated data management, and will discuss the tradeoff between the expressive power of the ontology language and efficiency in query processing. We will specifically address the possibility of delegating query processing over ontologies to database engines of increasing power (from relational, to linear recursion, to arbitrary recursion, to disjunctive Datalog). In the second part, we will provide a concrete example of a system for ontology mediated data management on top of a relational engine. The system is implemented in the prototype tool QuOnto.

The talk will be jointly given by Diego Calvanese and Giuseppe De Giacomo.

11:00 to 11:30 Coffee
11:30 to 12:30 Automatic verification of communicating data-aware web services

This talk presents recent work at UC San Diego on the verification of compositions of web service peers which interact asynchronously by exchanging messages. Each peer can be viewed as a data-aware reactive system. It has access to a local database and reacts to user input and incoming messages by performing various actions and sending messages. The reaction is described by queries over the database, internal state, user input and received messages. We consider two formalisms for specification of correctness properties of compositions, namely Linear Temporal First-Order Logic and Conversation Protocols. For both formalisms, we map the boundaries of verification decidability, showing that they include expressive classes of compositions and properties. We also address modular verification, in which the correctness of a composition is predicated on the properties of its environment. Finally, we present encouraging experimental results on verification of single peers (WAVE project). These suggest that automatic verification may be practically feasible for a large class of data-aware web services.

12:30 to 13:30 Lunch at Churchill College
14:00 to 15:00 Database normalization revisited: an information-theoretic approach INI 1
15:00 to 15:30 Tea
15:30 to 16:15 S Szeider (Durham)
On the clique-width of graphs

Clique-width is a graph parameter that measures in a certain sense the complexity of a graph. It is known that MSO_1 queries can be evaluated in polynomial time on classes of graphs with effectively bounded clique-width. Here MSO_1 denotes the fragment of Monadic Second Order Logic with second-order quantification on vertex sets. NP-hard problems like 3-colorability can be expressed as MSO_1 queries.

This talk will survey the concept of clique-width, its advantages and its limits. In particular, new NP-hardness results for the recognition of graphs with bounded clique-width will be presented; these results were recently obtained in joint work with M.R. Fellows, F. Rosamond, and U. Rotics.
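
Clique-width can be made concrete via its four graph-building operations: vertex creation with a label, disjoint union, adding all edges between two labels, and relabeling. The following sketch (an illustration, not taken from the talk) builds a triangle using only two labels, witnessing that cliques have clique-width 2:

```python
class Graph:
    """Vertex-labeled graph built by the four clique-width operations."""
    def __init__(self, vertex=None, label=None):
        self.labels = {vertex: label} if vertex is not None else {}
        self.edges = set()

    def union(self, other):  # disjoint union (vertex names must differ)
        g = Graph()
        g.labels = {**self.labels, **other.labels}
        g.edges = self.edges | other.edges
        return g

    def add_edges(self, i, j):  # eta_{i,j}: connect every label-i to every label-j vertex
        g = self.union(Graph())
        g.edges |= {frozenset((u, v))
                    for u, lu in g.labels.items() if lu == i
                    for v, lv in g.labels.items() if lv == j and u != v}
        return g

    def relabel(self, i, j):  # rho_{i->j}: rename label i to label j
        g = self.union(Graph())
        g.labels = {v: (j if l == i else l) for v, l in g.labels.items()}
        return g

# A triangle from the 2-label expression:
#   eta_{1,2}( rho_{2->1}( eta_{1,2}( 1(a) + 2(b) ) ) + 2(c) )
ab = Graph("a", 1).union(Graph("b", 2)).add_edges(1, 2).relabel(2, 1)
triangle = ab.union(Graph("c", 2)).add_edges(1, 2)
print(sorted(tuple(sorted(e)) for e in triangle.edges))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Relabeling both earlier vertices to label 1 before each new vertex arrives is what keeps the label count, and hence the clique-width, at 2 for arbitrarily large cliques.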

INI 1
16:15 to 17:00 Complete axiomatisation for PDLpath

PDLpath is a logic introduced in [1] to express path constraints on graphs of semistructured data. Its language contains one nominal r, which is true at the root node, standard PDL modalities, converse, and a new modality #, which stands for "some edge label". We assume that the language contains countably infinitely many atomic modalities or labels. We provide a complete axiomatisation of PDLpath and a new decidability proof for it (another proof is in [1]).

[1] N. Alechina, S. Demri and M. de Rijke. A modal perspective on path constraints. Journal of Logic and Computation, 6(3), 2003, pages 939-956.

INI 1
17:00 to 17:45 M Arenas (PUC Chile)
Locally consistent transformations and query answering in data exchange

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema. Given a source instance, there may be many solutions: target instances that satisfy the constraints of the data exchange problem. Previous work has identified two classes of desirable solutions: canonical universal solutions, and their cores. Query answering in data exchange amounts to rewriting a query over the target schema to another query that, over a materialized target instance, gives the result that is semantically consistent with the source. A basic question is then whether there exists a transformation sending a source instance into a solution over which target queries can be answered.

In this talk, we show that the answer is negative for many data exchange transformations that have structural properties similar to the canonical universal solution and the core. Namely, we prove that many such transformations preserve the "local" structure of the data. Using this notion, we further show that every target query rewritable over such a transformation cannot distinguish tuples whose neighborhoods in the source are similar. This gives us a first tool that helps check whether a query is rewritable.

This is joint work with Pablo Barcelo, Ronald Fagin and Leonid Libkin.

INI 1
Friday 3rd March 2006
09:15 to 10:15 The complexity of processing data streams and external memory data INI 1
10:15 to 11:00 Temporal logics for n-ary queries over XML documents INI 1
11:00 to 11:30 Coffee
11:30 to 12:30 Law enforcement on hypergraphs INI 1