Random Horn Formulas and Propagation Connectivity for Directed Hypergraphs

We consider the property that in a random definite Horn formula of size-3 clauses over n variables, where every such clause is included with probability p, there is a pair of variables for which forward chaining produces all other variables. We show that with high probability the property does not hold for p ≤ 1/(11n ln n), and with high probability the property holds for p ≥ (5 ln ln n)/(n ln n).


Introduction
Horn formulas are a subclass of CNF expressions, where every clause contains at most one unnegated variable. This class is tractable in the sense that many problems that are hard for CNF expressions in general are polynomially solvable for Horn formulas (such as satisfiability and equivalence). It is partly for this reason that Horn formulas are of basic importance in artificial intelligence and other areas. Random Horn formulas have been studied in [DBC01, DV06, Ist02, LMST09, MIDV07].
A Horn formula is definite if it consists of clauses containing exactly one unnegated variable. We consider definite Horn formulas with clauses of size 3, i.e., with clauses of the form (ā ∨ b̄ ∨ c), which can also be written as a, b → c. Here a and b form the body of the clause and c is the head of the clause. Implication between a definite Horn formula ϕ and a definite Horn clause C can be decided by forward chaining: mark the variables in the body of C and, while there is a clause in ϕ with all its body variables marked, mark its head variable as well. Then C is implied by ϕ iff its head gets marked.
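The forward chaining procedure described above can be sketched in a few lines. This is only an illustrative sketch: the encoding of a clause a, b → c as a tuple `(a, b, c)` and the names `forward_chain`, `implies` and `phi` are assumptions made for the example, not notation from the paper.

```python
from typing import Iterable, Set, Tuple

Clause = Tuple[str, str, str]  # (a, b, c) encodes the clause a, b -> c (assumed encoding)


def forward_chain(clauses: Iterable[Clause], start: Iterable[str]) -> Set[str]:
    """Return the set of variables marked by forward chaining from `start`."""
    clauses = list(clauses)
    marked = set(start)
    changed = True
    while changed:  # repeat until a full pass marks nothing new
        changed = False
        for a, b, c in clauses:
            if a in marked and b in marked and c not in marked:
                marked.add(c)
                changed = True
    return marked


def implies(clauses: Iterable[Clause], clause: Clause) -> bool:
    """Decide whether the formula implies the clause a, b -> c."""
    a, b, c = clause
    return c in forward_chain(clauses, (a, b))


phi = [("a", "b", "c"), ("b", "c", "d"), ("d", "e", "f")]
print(implies(phi, ("a", "b", "d")))  # True: a, b mark c, then b, c mark d
print(implies(phi, ("a", "b", "f")))  # False: e is never marked
```

The repeated full pass is quadratic in the worst case; it is chosen here for readability rather than efficiency.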
We consider random definite Horn formulas with clauses of size 3 over n variables, where every clause is included with probability p. It follows directly from the results of [LMST09] that p = (2 ln n)/n is a threshold probability for the following property: every pair of variables implies every other variable (see also [DBC01] for a related result).
In this paper we consider the property that some pair of variables implies every other variable. This property is closely related to the property of propagation connectivity for 3-uniform undirected hypergraphs, introduced recently by Berke and Onsjö [BO09a]. They consider a marking process like forward chaining, except that now a vertex can be marked if it is contained in an edge whose other two vertices are already marked. A 3-uniform undirected hypergraph is propagation connected if there is a pair of vertices such that the marking process, starting from that pair, marks every vertex. Berke and Onsjö showed that for p < 1/(n(log n)^2) random hypergraphs are asymptotically almost surely (a.a.s.) not propagation connected [BO09a] and for p > 1/(n(log n)^0.4) random hypergraphs are a.a.s. propagation connected [BO09b]. The first result proves a lower bound for the transition from random hypergraphs being a.a.s. not propagation connected to random hypergraphs being a.a.s. propagation connected, and the second result proves an upper bound for the transition. We use the terms lower and upper bound in a similar sense throughout the paper.
The Horn formula property mentioned above is equivalent to propagation connectivity for directed 3-uniform hypergraphs, where by a directed hypergraph we mean a hypergraph with each edge having a distinguished vertex called its head, and the other vertices called its body. (This is one of the possible definitions of a directed hypergraph. There are several other variants.) In the rest of the paper we use the terminology of propagation connectivity for directed hypergraphs instead of Horn formulas. We show that random directed 3-uniform hypergraphs for p ≤ 1/(11n ln n) are a.a.s. not propagation connected and for p ≥ (5 ln ln n)/(n ln n) are a.a.s. propagation connected. The proofs are based on two versions of the "fanning-out process" (see, e.g., [Kar90, JŁR00]). For the upper bound we start the process by exploring a subset of the vertices and finding a maximal degree pair within that subset.
For the undirected hypergraph version of the problem, Coja-Oghlan, Onsjö and Watanabe concurrently and independently proved lower and upper bounds, both of the order 1/(n ln n) [COOW10]. It appears that their argument can be adapted to yield order 1/(n ln n) lower and upper bounds in the directed case as well. The proofs we present here are simpler.
The lower and upper bounds are presented in Sections 3 and 4, respectively. In the closing section we mention a few open problems.

Preliminaries
We consider 3-uniform directed hypergraphs H with directed edges of the form u, v → w. The pair (u, v) is the body of the edge and w is the head of the edge. Note that the body is an unordered pair. The degree of a pair (u, v) is the number of vertices w that form an edge u, v → w with the pair. We refer to vertex w as a successor of (u, v). The (u, v)-propagation connected component (or simply (u, v)-component) of H is the set of vertices marked by the marking process starting with (u, v).
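To make these definitions concrete, the following sketch computes a (u, v)-component and tests propagation connectivity by trying every starting pair. The representation (vertices 0, ..., n−1, an edge u, v → w stored as a tuple `(u, v, w)`) and the function names are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def index_by_body(edges):
    """Map each unordered body pair {u, v} to the set of its successors."""
    heads = defaultdict(set)
    for u, v, w in edges:
        heads[frozenset((u, v))].add(w)
    return heads


def component(heads, u, v):
    """Return the (u, v)-component: all vertices marked starting from (u, v)."""
    marked = {u, v}
    pending = [frozenset((u, v))]  # pairs of marked vertices not yet examined
    while pending:
        pair = pending.pop()
        for w in heads.get(pair, ()):
            if w not in marked:
                # w is new: each pair (w, x) with x already marked must be examined
                pending.extend(frozenset((w, x)) for x in marked)
                marked.add(w)
    return marked


def is_propagation_connected(n, edges):
    """True if some pair of vertices has a component containing all n vertices."""
    heads = index_by_body(edges)
    return any(len(component(heads, u, v)) == n
               for u, v in combinations(range(n), 2))


edges = [(0, 1, 2), (1, 2, 3)]
print(sorted(component(index_by_body(edges), 0, 1)))  # [0, 1, 2, 3]
print(is_propagation_connected(4, edges))             # True
```

Each pair of marked vertices enters `pending` exactly once, at the moment the later of its two vertices is marked.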
The probability model where a random directed hypergraph is formed over the vertex set [n] = {1, ..., n} by including each edge u, v → w independently with probability p is denoted by DH(n, p). For any monotone increasing property of directed hypergraphs the probability that the property holds for a random directed hypergraph drawn from DH(n, p) is a monotone non-decreasing function of p (see [Bol01, Th. 2.1]).
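For experimentation, a sample from DH(n, p) can be drawn by flipping an independent coin for each of the n(n−1)(n−2)/2 possible edges. The sketch below is a direct, unoptimized illustration; the function name `sample_dh`, the tuple encoding `(u, v, w)` for the edge u, v → w, and the use of vertices 0, ..., n−1 instead of 1, ..., n are assumptions of this example.

```python
import random
from itertools import combinations


def sample_dh(n, p, rng):
    """Draw a directed 3-uniform hypergraph from DH(n, p) as a list of edges."""
    edges = []
    for u, v in combinations(range(n), 2):   # unordered body pair
        for w in range(n):                   # candidate head outside the body
            if w != u and w != v and rng.random() < p:
                edges.append((u, v, w))
    return edges


rng = random.Random(2024)  # fixed seed only to make runs repeatable
H = sample_dh(30, 0.05, rng)
print(len(H))  # number of sampled edges; expectation is 0.05 * 30 * 29 * 28 / 2 = 609
```

The loop takes time proportional to n^3, which is adequate for small experiments but not for large n.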
We use the following versions of the Chernoff bounds [JŁR00].

A lower bound
We first give a lower bound for probabilities p such that a random directed hypergraph from DH(n, p) is a.a.s. propagation connected.
Theorem 3.1 Let p ≤ 1/(11n ln n). In a random directed hypergraph from DH(n, p) a.a.s. every propagation connected component has size at most 11 ln n.
Proof: By monotonicity we may assume p = 1/(11n ln n). The following process is used to explore H ∈ DH(n, p). Start with two sets $A_0 = \{u, v\}$ and $B_0 = \emptyset$. The sets $A_i$ and $B_i$ represent the sets of discovered vertices and saturated pairs at iteration i, respectively. In iteration i, pick a pair $(u_i, v_i)$ of vertices from $A_{i-1}$ that is not in $B_{i-1}$, and find every edge $u_i, v_i \to w$ where $w \notin A_{i-1}$. Construct the set $A_i$ so that it contains all vertices in the set $A_{i-1}$ plus all vertices w, where w is the head of an edge that was found in step i. Construct the set $B_i$ by adding the pair $(u_i, v_i)$ to $B_{i-1}$. When every pair in $A_i$ is saturated, we have discovered all the vertices in the component, and from then on we put $A_j = A_i$, $B_j = B_i$ for every j > i.
We need to show that this process stabilizes after a small number of steps with high probability. Let V = [n] and define $X_i$ to be the number of successors, in $V \setminus A_{i-1}$, of the pair $(u_i, v_i)$ to be saturated. Each edge with body $(u_i, v_i)$ and head in $V \setminus A_{i-1}$ is in the hypergraph with probability p, independently of the presence or absence of any other edge. Furthermore each such edge is considered at most once in the process. Thus $X_i \in \mathrm{BIN}(n - |A_{i-1}|, p)$.
Let k = 11 ln n. If the process generates at least k vertices then this must happen in the first $\binom{k-1}{2}$ iterations. Thus the probability of generating at least k vertices is at most

$$\Pr\Big[\sum_{i=1}^{\binom{k-1}{2}} X_i \ge k - 2\Big]. \qquad (1)$$

Let $X_i^+ \in \mathrm{BIN}(n, p)$ and replace the upper limit in the summation (1) by $\binom{k}{2}$ for convenience. Then, noting that $\sum_{i=1}^{\binom{k}{2}} X_i^+ \in \mathrm{BIN}\big(\binom{k}{2} n, p\big)$ and as such has mean $\binom{k}{2} np$, the probability (1) can be upper bounded by a Chernoff bound. Using the values of p and k we note that $np\binom{k}{2} = (k-1)/2$, so the bound is small enough to imply the theorem by the union bound over all starting pairs (u, v). □
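As a rough check on why the constant 11 suffices, the calculation can be sketched as follows. It assumes the Chernoff bound in the form $\Pr[X \ge \mathrm{E}X + t] \le \exp\{-t^2/(2(\mathrm{E}X + t/3))\}$ for binomial X, one of the standard versions in [JŁR00]; whether this is exactly the version intended in the proof is an assumption, as are the shorthand symbols $\mu$ and $t$, and k is treated as exactly 11 ln n.

```latex
% Assumed setting: X ~ BIN(\binom{k}{2} n, p), p = 1/(11 n \ln n), k = 11 \ln n, so knp = 1.
\begin{align*}
\mu &= \binom{k}{2} n p = \frac{k-1}{2},
\qquad t = (k-2) - \mu = \frac{k-3}{2}, \\
\Pr[X \ge k-2] &\le \exp\Big\{-\frac{t^2}{2(\mu + t/3)}\Big\}
  = \exp\Big\{-\frac{3(k-3)^2}{8(2k-3)}\Big\}
  \le e^{27/32} \exp\Big\{-\frac{3k}{16}\Big\}
  = e^{27/32}\, n^{-33/16}.
\end{align*}
% Union bound over fewer than n^2/2 starting pairs: the failure probability is O(n^{-1/16}) = o(1).
```

Under these assumptions the exponent 33/16 exceeds 2 only slightly, which suggests why a constant near 11 is needed for this style of argument.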

An upper bound
In this section we give a sufficient condition for probabilities p such that a random directed hypergraph from DH(n, p) is a.a.s. propagation connected.
Theorem 4.1 For p ≥ (5 ln ln n)/(n ln n) a random directed hypergraph from DH(n, p) is a.a.s. propagation connected.
Proof: By monotonicity we may assume p = (5 ln ln n)/(n ln n). We use a modification of the process described above. First we consider all edges over the first n/4 vertices and find a highest-degree pair (u, v) in that subset. Starting from the successors of that pair we find a sufficiently large part of the rest of the component using a variant of the original process organized into phases as follows.
Let m = (ln n)/(ln ln n) and assume that we found a pair (u, v) with m successors $w_1, \ldots, w_m$ among the first n/4 vertices. Let $A_0 = \{w_1, \ldots, w_m\}$ be the initial set of discovered vertices and let $C_0$ be the (3/4)n vertices not considered so far, forming the initial set of available vertices. In iteration i of the new process we pick an arbitrary set $D_{i-1} \subseteq C_{i-1}$ of n/2 available vertices, and we find all edges $u, v \to w$, where $u, v \in A_{i-1}$ and $w \in D_{i-1}$. If there are at least m distinct successors in $D_{i-1}$ then let $A_i$ be any m of these and put $C_i = C_{i-1} \setminus A_i$. Otherwise let $A_j = A_{i-1}$ for every j ≥ i. We run this process for (ln n)(ln ln n) iterations.
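The bookkeeping behind these parameter choices can be written out as a short calculation. It is only a sketch: the symbol $T$ for the number of iterations is introduced here for convenience, and rounding of m and T to integers is ignored.

```latex
% T denotes the number of iterations (notation assumed for this sketch).
\begin{align*}
T &= (\ln n)(\ln\ln n), \qquad m = \frac{\ln n}{\ln\ln n}, \\
mT &= (\ln n)^2
  && \text{(vertices collected if every iteration succeeds)}, \\
|C_i| &= \tfrac{3}{4}n - im \;\ge\; \tfrac{3}{4}n - (\ln n)^2 \;\ge\; \tfrac{n}{2}
  && \text{for } i \le T \text{ and } n \text{ sufficiently large}.
\end{align*}
```

So a set $D_{i-1}$ of n/2 available vertices can always be chosen, and a fully successful run discovers $(\ln n)^2$ vertices beyond the starting pair.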
The following lemma, analogous to bounds for graphs (see [Bol01, Ch. 3]), gives a bound for the maximal degree of a pair in H ∈ DH(n, p). This lemma is stated for the smaller and simpler probability 1/(n ln n), but applies also to larger p by monotonicity.
Lemma 4.2 If p = 1/(n ln n), then the maximum degree of H ∈ DH(n, p) is a.a.s. at least (ln 4n)/(ln ln 4n).
Proof: Let d = (ln 4n)/(ln ln 4n) and let the random variable $Y_{ij}$ be the number of successors of the pair (i, j) in H. Then $Y_{ij} \in \mathrm{BIN}(n - 2, p)$ and, since we are dealing with directed edges, the variables $Y_{ij}$ are independent. Thus the probability that every degree is smaller than d is if n is sufficiently large. For the last inequality we used the fact that The expression on the right tends to infinity, since the first term tends to infinity, the second term is positive and the third term has a constant limit. □
We also use a version of a lemma of [BO09b] showing that a.a.s. every component is either small or contains every vertex. Such a statement holds for several probabilities p, but we state it here for p = (5 ln ln n)/(n ln n), as this property is not monotone. This lemma is similar to the gap theorem in [Kar90] and its proof is included for completeness.
Lemma 4.3 ([BO09b]) If p = (5 ln ln n)/(n ln n) then a.a.s. every propagation connected component has either size n or size less than (ln n)^2.
Proof: If a set of vertices is a propagation connected component then there can be no edges with body in the component and head outside. Thus the probability that there is a component of size k is at most $\binom{n}{k}(1 - p)^{\binom{k}{2}(n - k)}$. We show that this is small for every k with (ln n)^2 ≤ k < n. If k ≥ n/2 then, as $p\binom{k}{2} = \Omega(n \ln\ln n / \ln n)$, the probability is upper bounded by $\exp\{-\Omega(n \ln\ln n / \ln n)\}$. Else (ln n)^2 ≤ k < n/2 and the analogous calculation gives the upper bound $\exp\{-k(\ln k + p(k-1)(n-k)/2 - (\ln n + 1))\}$. Here n − k can be replaced by n/2, and then substituting the values of p and k we can lower bound ln k + p(k − 1)(n − k)/2 − (ln n + 1) by Ω(ln n). Since k ≥ (ln n)^2 we get an upper bound of the form $\exp\{-\Omega((\ln n)^3)\}$. □
Returning to the proof of Theorem 4.1, let us say that we are successful if we find a pair of degree m among the first n/4 vertices, we can run the iterative process for (ln n)(ln ln n) iterations, always finding m new vertices, and the event described in Lemma 4.3 occurs. In this case, after the last iteration we have found a component of size (ln n)^2, and by Lemma 4.3 the hypergraph is propagation connected.
The number $Z_i$ of edges found in the ith iteration has distribution $\mathrm{BIN}\big(\binom{m}{2}\frac{n}{2}, p\big)$. Since we saturate more than one pair in an iteration it is possible that the same vertex is discovered by more than one edge. The probability of such a conflict is at most (3)
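A short computation indicates why an iteration can be expected to yield about m new vertices. It assumes that $Z_i$ counts the edges with body among the m vertices of $A_{i-1}$ and head among the n/2 vertices of $D_{i-1}$, so that $Z_i \in \mathrm{BIN}\big(\binom{m}{2}\frac{n}{2}, p\big)$; rounding of m is ignored.

```latex
% Assumed: Z_i ~ BIN(\binom{m}{2} n/2, p), p = (5 \ln\ln n)/(n \ln n), m = (\ln n)/(\ln\ln n).
\begin{align*}
\mathrm{E}[Z_i]
  = \binom{m}{2}\cdot\frac{n}{2}\cdot\frac{5\ln\ln n}{n\ln n}
  = \frac{5}{4}\,(m-1)\cdot\frac{m\ln\ln n}{\ln n}
  = \frac{5}{4}\,(m-1).
\end{align*}
```

The mean exceeds m as soon as m ≥ 5, which leaves room for a concentration argument and for the loss caused by conflicts.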