On Sampling Colorings of Bipartite Graphs

We study the problem of efficiently sampling $k$-colorings of bipartite graphs. We show that a class of Markov chains cannot be used as efficient samplers. Precisely, we show that, for any $k$ with $6 \le k \le n^{1/3-\epsilon}$, $\epsilon > 0$ fixed, almost every bipartite graph on $n+n$ vertices is such that the mixing time of any Markov chain asymptotically uniform on its $k$-colorings is exponential in $n/k^2$ (if it is allowed to change the colors of only $O(n/k)$ vertices in a single transition step). This kind of exponential-time mixing is called torpid mixing. As a corollary, we show that there are (for every $n$) bipartite graphs on $2n$ vertices with $\Delta(G) = \Omega(\ln n)$ such that for every $k$, $6 \le k \le \Delta/(6 \ln \Delta)$, each member of a large class of chains mixes torpidly. We also show that these negative results hold true for $H$-colorings of bipartite graphs provided $H$ contains a spanning complete bipartite subgraph. We also present explicit examples of colorings ($k$-colorings or $H$-colorings) which admit 1-cautious chains that are ergodic and are shown to have exponential mixing time. While, for fixed $k$ or fixed $H$, such negative results are implied by the work of Cooper, Dyer and Frieze, our results are more general in that they allow $k$ or $H$ to vary with $n$.


Introduction
We consider the problem of efficiently sampling uniformly at random the $k$-colorings of a graph. This problem is related to the corresponding problem of counting the number of such structures, which arises in statistical physics (see the survey of [9]). The Markov chain Monte Carlo (MCMC) approach has been very successful in obtaining efficient samplers for this problem under various assumptions about the input graph. This naturally leads to the question of how far the MCMC approach can take us. In this paper, we present some negative results on obtaining efficient samplers for colorings based on Markov chains.
Several heuristics have been proposed and analyzed for efficiently sampling $k$-colorings [8,13,1,5,16]. The currently best known positive result is due to Vigoda [16], who showed that the chains associated with the so-called WSK heuristic and the Glauber dynamics both mix in polynomial time for arbitrary $G$ provided $k \ge (11/6)\Delta(G)$, where $\Delta(G)$ denotes the maximum degree of $G$. See [3,11,6,7] for further improvements on special classes of graphs.
For arbitrary $G$, the focus has been on $k \ge \Delta$ because, for $k < \Delta$, it is not even guaranteed that $G$ has a $k$-coloring and, moreover, determining if it has such a coloring is NP-complete. But a bipartite $G$ always has a $k$-coloring for $k \ge 2$, and a natural question is whether the MCMC approach succeeds for $k < \Delta$. In view of the previous successes, this appears likely. However, no such positive result has appeared so far and only some negative results have been obtained.
For example, recently, Luczak and Vigoda [10] showed that the WSK heuristic has exponential mixing time for a family of $\Delta$-regular (constant $\Delta$) bipartite graphs $G(k, n)$ when $3 \le k \le \Delta/(20 \log \Delta)$. Such chains with exponential mixing times are said to be torpidly mixing.
Similarly, the negative results of Cooper, Dyer and Frieze [2] on sampling $H$-colorings imply that for every fixed $k \ge 3$, there exist constants $\Delta = \Delta(k)$, $\delta = \delta(k) > 0$, $\tau = \tau(k) > 1$ and $\Delta$-regular bipartite graphs $G$ (on $2n$ vertices) such that for every $\delta n$-cautious ergodic chain $M$ which is uniform on $k$-colorings of $G$, its mixing time is at least $\tau^n$. Here, by a $\delta n$-cautious chain, we mean a chain which does not change the colors of more than $\delta n$ vertices in a single transition. But these results are true only for fixed values of $k$ and the arguments do not seem to carry over when $k$ is allowed to grow with $n$.
In this paper, we further strengthen these results by establishing that, for any $s = \Omega(\ln n)$ and any $k$, $6 \le k \le s/(6 \ln s)$, for almost every bipartite graph on $2n$ vertices with $\Delta(G) \approx s$, the mixing time of every member of a large class of chains for sampling $k$-colorings is exponentially large in $n$. This result is derived as a corollary of a more general statement (given below) that we prove. We use $C_k(G)$ to denote the set of all (labelled) $k$-colorings of $G$.
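Theorem 1.1 The following is true for every sufficiently large $n$ and for every $k$, $6 \le k \le n^{1/3-\epsilon}$, $\epsilon > 0$ fixed but arbitrary: Let $G \in \mathcal{G}(n, n, p)$ with $np \ge 6k(\ln k)$. Then, with probability at least $1 - e^{-n/(k^2 \ln n)}$, $G$ satisfies the following: For any $dn$-cautious ergodic Markov chain $M(G)$ which is asymptotically uniform on $C_k(G)$, its mixing time is $\Omega(e^{\gamma n})$.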
In this, by the phrase "$G \in \mathcal{G}(n, n, p)$", we mean a random bipartite graph formed between two sets $A$ and $B$ of equal size $n$. Each edge joining $A$ and $B$ is chosen independently with probability $p$. Note that $np$ is essentially the average degree of a vertex in $G$.
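The random model just described can be sketched in a few lines. This is only an illustration; the function names `random_bipartite_graph` and `average_degree` and the vertex labelling $0, \ldots, n-1$ on each side are assumptions made for the example and are not part of the paper.

```python
import random

def random_bipartite_graph(n, p, seed=None):
    """Sample G from G(n, n, p) with sides A = {0..n-1} and B = {0..n-1}.

    Returns the edge set as pairs (a, b) with a in A and b in B; each of
    the n*n potential edges is included independently with probability p.
    """
    rng = random.Random(seed)
    return {(a, b) for a in range(n) for b in range(n) if rng.random() < p}

def average_degree(edges, n):
    # Every edge adds one to a degree on each side, so the average degree
    # over all 2n vertices is |E| / n, which concentrates around n * p.
    return len(edges) / n
```

For example, `average_degree(random_bipartite_graph(200, 0.1, seed=1), 200)` should typically come out close to $np = 20$.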
Note that Theorem 1.1 allows $k$ to grow with $n$ and does not require it to be fixed. In fact, we can even suitably choose the random model so that $G$ is a random element drawn from specific classes (see Section 5 for more details). Thus, for example, we can assume that $G$ is a random graph of maximum degree $\approx s$ where $s$ is any suitably large value. Similarly, we can assume that $G = K_{n,n}$. In particular, any $(n/8k)$-cautious chain (this includes the well-known Glauber dynamics) mixes torpidly on $C_k(G)$ if $G = K_{n,n}$ and $k$ lies in the stated range (footnote (i)).
The overall approach of our proof is similar to that employed by Dyer, Frieze and Jerrum [4] and is based on showing that, almost surely, $C_k(G)$ is such that one can obtain an exponentially small upper bound on the conductance of any $dn$-cautious chain. Sections 3 and 4 contain the proof of this theorem.
It is easy to check that Theorem 1.1 holds even if we redefine $C_k(G)$ to be the set of those $k$-colorings of $G$ in which some color is not used on any vertex of some partite set ($A$ or $B$). For such colorings, even a chain as simple as Glauber dynamics is ergodic on any bipartite graph. See Section 6 for further discussion.
We also present some explicit examples of colorings for which the 1-cautious chain Glauber dynamics can be shown both to be ergodic and to require exponential mixing time. These explicit examples, presented in Section 7, are illustrations of the applicability of Theorem 1.1.
Footnote (i): Actually, for $K_{n,n}$, one can show, by slightly modifying an argument in [10], that Glauber dynamics is torpidly mixing for any

Generalizations to H-colorings
A careful study of the proof of Theorem 1.1 shows that the proof can be modified to prove a similar theorem for the more general scenario of sampling $H$-colorings of bipartite graphs. The details are given below.
Given graphs $G = (V, E)$ and $H = (U, F)$, an $H$-coloring of $G$ is a mapping $c : V \to U$ such that $(c(x), c(y))$ is an edge of $H$ whenever $(x, y)$ is an edge of $G$. The standard proper $k$-coloring is a $K_k$-coloring where $K_k$ is the simple complete graph on $k$ vertices. We use $C_H(G)$ to denote the set of all $H$-colorings of $G$. Note that if $H$ has at least one edge, then any bipartite $G$ admits an $H$-coloring, and if $H$ has no edges, then a bipartite $G$ admits an $H$-coloring if and only if $G$ also has no edges. Hence an $H$-coloring of a bipartite $G$ (for arbitrary $H$), or equivalently a starting point in $C_H(G)$, can be found in polynomial time. Using the arguments of the proof of Theorem 1.1, we can prove the following generalization:
Theorem 1.2 The following is true for every sufficiently large $n$ and for every $k$, $6 \le k \le n^{1/3-\epsilon}$, $\epsilon > 0$ fixed but arbitrary: Let $G \in \mathcal{G}(n, n, p)$ with $np \ge 6k(\ln k)$ and let $H$ be any simple (without self-loops) graph on $k$ vertices such that $H$ has a $K_{\lceil k/2 \rceil, \lfloor k/2 \rfloor}$ as a subgraph. Then, with probability at least $1 - e^{-n/(k^2 \ln n)}$, $G$ satisfies the following: For any $dn$-cautious ergodic Markov chain $M(G)$ which is asymptotically uniform on $C_H(G)$, its mixing time is $\Omega(e^{\gamma n})$.
A sketch of the proof of this theorem is given in Subsection 3.1. Note that $H$ is not required to be fixed, and $H$ and its size can vary with $n$. The results of [2] require $H$ to be fixed.
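The definition of an $H$-coloring given in this section can be checked mechanically. The following is a minimal sketch; the helper names `is_h_coloring` and `complete_graph_edges` and the representation of graphs as lists of vertex pairs are assumptions made for the illustration.

```python
def is_h_coloring(g_edges, h_edges, c):
    """Return True if c maps every edge of G onto an edge of H.

    g_edges, h_edges: iterables of (undirected) vertex pairs.
    c: dict from vertices of G to vertices of H.
    """
    # frozenset makes edges undirected; a loop (u, u) becomes {u}.
    h = {frozenset(e) for e in h_edges}
    return all(frozenset((c[x], c[y])) in h for x, y in g_edges)

def complete_graph_edges(k):
    # K_k on vertices 1..k without self-loops, so that a K_k-coloring
    # is exactly a proper k-coloring.
    return [(i, j) for i in range(1, k + 1) for j in range(i + 1, k + 1)]
```

With $H = K_3$ the check accepts exactly the proper 3-colorings; adding the pair `(1, 1)` to `h_edges` allows adjacent vertices to share color 1.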

Preliminaries
A discrete-time Markov chain $M$ over a finite space $\Omega$ and with transition matrix $P$ is said to be ergodic if there is a limiting distribution $\pi : \Omega \to [0, 1]$ such that for all $x, y \in \Omega$, $\lim_{t \to \infty} P^t(x, y) = \pi(y)$. Two conditions which together are sufficient to ensure ergodicity are (1) $M$ is aperiodic, i.e., $\gcd\{t : P^t(x, y) > 0\} = 1$ for all $x, y \in \Omega$, and (2) $M$ is irreducible, i.e., for each $x, y \in \Omega$, there is a finite $t$ such that $P^t(x, y) > 0$. Conversely, if $M$ is ergodic and is asymptotically uniform over $\Omega$, then $M$ is also irreducible and aperiodic. The mixing time $\tau(\epsilon)$, $\epsilon > 0$, of $M$ is the least integer $t > 0$ such that the $L_1$ distance between the $t$-th step distribution and $\pi$ is at most $2\epsilon$, irrespective of the starting point. For our negative results, it is enough to prove a lower bound on the value of $\tau(1/e)$.
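For a very small chain, the mixing time as defined above can be computed by brute force. The sketch below is illustrative only: it assumes the transition matrix is given as a list of rows, and the names `step`, `mixing_time` and the cutoff `t_max` are not from the paper.

```python
def step(dist, P):
    # One step of the chain: row vector times transition matrix.
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def mixing_time(P, pi, eps, t_max=10_000):
    """Least t > 0 such that, for every start x, ||P^t(x, .) - pi||_1 <= 2*eps."""
    n = len(P)
    # rows[x] holds the distribution after t steps when started at x.
    rows = [[1.0 if j == x else 0.0 for j in range(n)] for x in range(n)]
    for t in range(1, t_max + 1):
        rows = [step(r, P) for r in rows]
        worst = max(sum(abs(r[j] - pi[j]) for j in range(n)) for r in rows)
        if worst <= 2 * eps:
            return t
    return None  # not mixed within t_max steps
```

This direct computation is only feasible when $\Omega$ is tiny; for $\Omega = C_k(G)$ the state space is exponentially large, which is why the paper argues through conductance instead.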
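We use the following notion introduced in [4]. For a graph $G$, let $M(G)$ be any such chain (on $C_k(G)$) with transition matrix $P_G$. For any $b \le n$, we say that $M(G)$ is $b$-cautious if, for any two colorings $\sigma_1, \sigma_2 \in C_k(G)$, $P_G(\sigma_1, \sigma_2) > 0$ implies $h(\sigma_1, \sigma_2) \le b$. Here, $h(\sigma_1, \sigma_2)$ refers to their Hamming distance (the number of vertices on which $\sigma_1$ and $\sigma_2$ differ).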
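The notion of cautiousness (a chain never changes the colors of more than $b$ vertices in one transition) can be stated as a one-line check. In this sketch colorings are tuples of colors indexed by vertex, and the transition matrix is a dictionary keyed by pairs of colorings; both representations and the function names are assumptions made for the illustration.

```python
def hamming(s1, s2):
    # Number of vertices on which the two colorings differ.
    return sum(1 for x, y in zip(s1, s2) if x != y)

def is_b_cautious(P, b):
    """P: dict mapping (sigma1, sigma2) -> transition probability.

    The chain is b-cautious if every transition of positive probability
    changes the colors of at most b vertices.
    """
    return all(hamming(s1, s2) <= b for (s1, s2), pr in P.items() if pr > 0)
```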

Proof of Theorem 1.1
The proof is based on establishing an upper bound on the conductance of the chains for sampling $k$-colorings. Its major outline is similar to that employed in [4] and is given below.

Broad outline of the proof
We prove the theorem by showing that, almost surely, $G$ satisfies: For every $dn$-cautious ergodic chain $M(G)$, shortly $M$, with asymptotic uniform distribution $\pi$ on $\Omega = C_k(G)$ and whose transition matrix is $P$: there exists $S \subseteq C_k(G)$ with $0 < |S| \le 0.5|\Omega|$ and $\psi_S \le e^{-\gamma n}$ where
$$\psi_S = \frac{\sum_{i \in S, j \notin S} \pi(i) P(i, j)}{\pi(S)}.$$
The quantity $\psi_S$ is called the conductance of the set $S$. It then follows from a well-known fact [14,15] that $\tau(1/e) = \Omega(e^{\gamma n})$.
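The conductance of a set can be computed directly from its definition when the chain is small. The following sketch assumes states are numbered $0, \ldots, N-1$ and the transition matrix is a list of rows; the function name `conductance` and the 4-state example are illustrative and not taken from the paper.

```python
def conductance(P, pi, S):
    """psi_S = (sum over i in S, j not in S of pi[i] * P[i][j]) / pi(S)."""
    states = range(len(P))
    S = set(S)
    # Stationary probability flow leaving S in one step.
    flow = sum(pi[i] * P[i][j] for i in S for j in states if j not in S)
    return flow / sum(pi[i] for i in S)

# Two well-connected pairs {0, 1} and {2, 3} joined by a weak link 1 <-> 2.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.4, 0.1, 0.0],
    [0.0, 0.1, 0.4, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
pi = [0.25, 0.25, 0.25, 0.25]
psi = conductance(P, pi, {0, 1})
```

Here the only way out of $\{0, 1\}$ is the weak link, so `psi` is small; the proof below shows that an analogous bottleneck exists in $C_k(G)$ for every sufficiently cautious chain.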
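Before we proceed with the proof, we introduce some definitions and assumptions which will be used later. We introduce two specific $k$-tuples $\gamma$ and $\delta$ of reals which will be used in the definition of the set $S$. For even $k$, $\gamma_i = 2/k$ and $\delta_i = 0$ for $i \le k/2$, and $\gamma_i = 0$ and $\delta_i = 2/k$ for $i > k/2$. For odd $k$, $\gamma_i = 2/(k+1)$ and $\delta_i = 0$ for $i \le (k+1)/2$, and $\gamma_i = 0$ and $\delta_i = 2/(k-1)$ for $i > (k+1)/2$. We provide the proof for even $k$. For odd $k$, the arguments are almost identical.
Define $S_1$ to be the set of all $k$-colorings with part sizes corresponding to some $(\alpha, \beta)$ with $\|(\gamma, \delta) - (\alpha, \beta)\|_1 \le c_2$ where $(\gamma, \delta)$ and $c_2$ are defined before. Here, $\|(\gamma, \delta) - (\alpha, \beta)\|_1$ denotes the $L_1$-norm defined as
$$\|(\gamma, \delta) - (\alpha, \beta)\|_1 = \sum_{i \le k} |\gamma_i - \alpha_i| + \sum_{i \le k} |\delta_i - \beta_i|.$$
Note that this claim holds true with probability 1, i.e., the claim is true for every possible value of $G$.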
Without loss of generality, assume that $S = S_1$. For any chain $M$ with transition matrix $P$, the boundary $\partial S$ of $S$ is defined as $\partial S = \{i \in S : P(i, j) > 0 \text{ for some } j \notin S\}$. Since $\pi$ is uniform, it then follows (as noticed in [10]) that $\psi_S \le \pi(\partial S)/\pi(S) = |\partial S|/|S|$. Hence it suffices to show that, for almost every $G$, for every $dn$-cautious $M$, we have $|\partial S|/|S| = O(e^{-\gamma n})$.
For an arbitrary coloring $C$, let $(\alpha_C, \beta_C)$ denote its corresponding vector. Consider any two $k$-colorings $C$, $D$ of $G$. The proof of the following claim is given later.
We claim that for every ergodic $dn$-cautious chain $M(G)$, To see this, consider any arbitrary $C \in \partial S$. Then, there exists a $D \notin S$ such that $h(C, D) \le dn$ and hence As a result, it follows that ||(γ, δ) For otherwise, we will have which violates our assumption that $D \notin S$. Hence our claim is true.
Let $n_{max}(G)$ denote the number of $k$-colorings with part sizes corresponding to $(\gamma, \delta)$. Let $n_{bd}(G)$ denote the number of $k$-colorings with part sizes corresponding to some (α, It is shown later (Theorem 3.1) that, with probability at least

✷
It only remains to prove Theorem 3.1 stated formally below. Before we proceed with the proof of this theorem, we introduce and study some notions which are required in its derivation.
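For each pair $(\alpha, \beta)$ of two $k$-tuples of non-negative reals each summing to 1, let $C_G(\alpha, \beta)$ denote the set of all $k$-colorings of $G$ with part sizes $|V_i \cap A| = \alpha_i n$ and $|V_i \cap B| = \beta_i n$ for each $i \le k$. Let $N_G(\alpha, \beta)$ denote the expected number of such $k$-colorings. First, we estimate the expectation $N_G(\alpha, \beta)$ for each $(\alpha, \beta)$ and bound these expectations. We then apply Markov's inequality to get an upper bound on $n_{bd}(G)$. For a given $(\alpha, \beta)$, an assignment with these part sizes is a proper coloring exactly when none of the $n^2 \sum_{i \le k} \alpha_i \beta_i$ potential edges joining $V_i \cap A$ and $V_i \cap B$ is present, so note that
$$N_G(\alpha, \beta) = \binom{n}{\alpha_1 n, \ldots, \alpha_k n} \binom{n}{\beta_1 n, \ldots, \beta_k n} e^{a n \sum_{i \le k} \alpha_i \beta_i} \qquad (1)$$
where $a$ is defined to be $a = a(n, p) = n \ln(1 - p)$ (footnote (ii)). Note that $a \le 0$ and $|a| \ge np$ always. Using Stirling's approximation of factorials, it is easy to see that: For some large constant $D > 0$, where (1) and (2), we get
(ii) $(\gamma, \delta)$ is not a local maximum of $\phi$ (footnote (iv)).
Footnote (ii): $\ln \alpha$ denotes the logarithm of $\alpha$ to the natural base $e$.
Footnote (iii): Note that $\phi$ is a continuous function defined over the interior of its domain ($\alpha_k = 1 - \sum_{i<k} \alpha_i$ and $\beta_k = 1 - \sum_{i<k} \beta_i$) and is symmetrical between the values of $\alpha_i$, $\beta_i$, i.e., $\phi(\alpha, \beta) = \phi(\beta, \alpha)$. $\phi$ can be extended to the boundary by taking the limits which are obtained by defining $0 \ln 0 = \lim_{x \to 0} x \ln x = 0$. For our purposes, each $\alpha_i$ ($\beta_i$) is a non-negative integral multiple of $1/n$. This is because the sizes of the color classes are integer valued.
Footnote (iv): Statement (ii) is only for the sake of information; it is not required in the analysis of mixing time.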
Proof: See Section 4. ✷
For even $k$, $\phi(\gamma, \delta) = 2 \ln(k/2)$ and for odd $k$, $\phi(\gamma, \delta) = \ln(k + 1) + \ln(k - 1) - 2 \ln 2$. Using Theorem 3.2, we prove
Proof: (of Theorem 3.1) By definition, each $k$-coloring counted for $n_{max}(G)$ does not depend on any potential edge joining $A$ and $B$ and hence $n_{max}(G)$ is the same whatever the value of $G$ is. Hence Thus, by (3), $n_{max}(G) \ge e^{\phi(\gamma,\delta)n}/n^k$. Also, $D^k e^{\phi(\alpha,\beta)n}$ by (3) Hence, by Markov's inequality, with probability at least $1 - e^{-n/(k^2 \ln n)}$,
The proof of this generalization is essentially the same as the proof of Theorem 1.1. As before, we show that the conductance of the set $S$ (where $S$ is essentially the same as before) is exponentially small. We need to change the definition of $\phi(\alpha, \beta)$ as follows.
The definition of $\psi(\alpha, \beta)$ remains the same. Here, we assume that $H$ is a labeled graph on $k$ vertices $\{1, \ldots, k\}$. Because of the assumption that $H$ contains a $K_{k/2,k/2}$ subgraph, $n_{max}(G)$ remains the same. Also, the expectation of $n_{bd}(G)$ decreases because of the extra additive term, which takes negative values.
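The values of $\phi(\gamma, \delta)$ quoted in this section can be checked numerically under two explicit assumptions, both labeled in the code: (a) $\phi(\alpha, \beta)$ is the exponent obtained by applying Stirling's formula to the expected count $N_G(\alpha, \beta)$, that is, the entropy of $\alpha$ plus the entropy of $\beta$ plus $a \sum_i \alpha_i \beta_i$, and (b) $(\gamma, \delta)$ is the balanced split that gives the first $\lceil k/2 \rceil$ colors to $A$ and the remaining colors to $B$. The function names are hypothetical.

```python
import math

def entropy(v):
    # Natural-log entropy with the convention 0 ln 0 = 0.
    return -sum(x * math.log(x) for x in v if x > 0)

def phi(alpha, beta, a):
    # ASSUMED form of phi: entropy terms plus the (non-positive) edge penalty.
    return entropy(alpha) + entropy(beta) + a * sum(
        x * y for x, y in zip(alpha, beta)
    )

def gamma_delta(k):
    # ASSUMED balanced split: ceil(k/2) colors used only on A, the rest only on B.
    h = (k + 1) // 2
    gamma = [1 / h] * h + [0.0] * (k - h)
    delta = [0.0] * h + [1 / (k - h)] * (k - h)
    return gamma, delta
```

Because $\sum_i \gamma_i \delta_i = 0$, the value at $(\gamma, \delta)$ does not depend on $a$, which is consistent with the remark that $n_{max}(G)$ is the same for every $G$.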

Analysis of φ at (γ, δ)
Proof of Theorem 3.2 (i): Recall that $|a| \ge 6k \ln k$. We prove the statement for even $k$. For odd $k$, the arguments are similar. Define c Let $T_1$, $T_2$, $T_3$ denote (in that order) the three terms in the above expression. By our notation, $\sum_i \kappa_i \le 0$. Also, since $k \ge 6 \ge 2e$, $1 + \ln(2/k) \le 0$. Hence $T_1$ is non-negative.
In the second term, by our choice of $(\alpha, \beta)$, Divide the sum $\sum_i \tau_i \ln \tau_i$ into two sums $S_1$ (summing over $\tau_i \le 1/k^9$) and $S_2$ (summing over the remaining).
Proof of Theorem 3.2 (ii): We only consider the case of even $k$. To prove (ii), consider the point $(\gamma_\epsilon, \delta_\epsilon)$ defined for every small $\epsilon > 0$. It is defined as follows. For even $k$, $> \phi(\gamma, \delta)$ for all sufficiently small $\epsilon > 0$.
Hence $(\gamma, \delta)$ is not a local maximum of $\phi$.

Consequences
The following corollary shows that Theorem 1.1 holds even if our random $G$ is conditioned to satisfy some property which holds with a probability that is not too small. We also illustrate the immediate consequences of this corollary with some specific examples. Throughout, $k$ is a value such that $6 \le k \le n^{1/3-\epsilon}$. Also, $d$, $\gamma$ have the same meaning given in Theorem 1.1.
Definition: Let $\mathcal{P}$ denote an arbitrary property of bipartite graphs which is satisfied with positive probability by a random $G \in \mathcal{G}(n, n, p)$. Given $p = p(n)$, we define $s_p(\mathcal{P}, n)$ as the probability that $G$ satisfies $\mathcal{P}$. The random model $\mathcal{G}(n, n, p) \mid \mathcal{P}$ is defined to be the model obtained by conditioning $\mathcal{G}(n, n, p)$ on the event "$G \in \mathcal{G}(n, n, p)$ satisfies $\mathcal{P}$".
Corollary 5.1 Let $\mathcal{P}$ be any property such that there is a $p = p(n)$ with $np \ge 6k(\ln k)$ and $e^{-n/(k^2 \ln n)} = o(s_p(\mathcal{P}, n))$. Then, almost surely, $G_0 \in \mathcal{G}(n, n, p) \mid \mathcal{P}$ satisfies the claim of Theorem 1.1.
Corollary 5.2 For any $k$, $6 \le k = O(n^{1/3-\epsilon})$, for any $dn$-cautious ergodic Markov chain $M(K_{n,n})$ which is asymptotically uniform on $C_k(K_{n,n})$, its mixing time is $\Omega(e^{\gamma(k)n})$.
Proof: Let $\mathcal{P}$ denote the property that $G = K_{n,n}$ and choose $p = 1$, leading to $np = n$. Also, $s_p(\mathcal{P}, n) = 1$, satisfying the requirements of Corollary 5.1. ✷
Corollary 5.3 Let $G'$ denote any one of the random bipartite graphs $G_1$ and $G_2$ on $n + n$ vertices defined below. Then, almost surely, $G'$ satisfies: For any $dn$-cautious ergodic Markov chain
(ii) $G_2$ is random with $0.9s \le \Delta(G_2) \le 1.1s$, for any $s \ge 850(\ln n)$ and any $k$ where $6 \le k \le \min\{s/(6 \ln s), n^{1/3-\epsilon}\}$.
Proof: Let $G \in \mathcal{G}(n, n, p)$ be a random bipartite graph on $n + n$ vertices. (ii) Choose $p = s/n$. Then, $np \ge 6k(\ln k)$. Let $\mathcal{P}$ denote the property that $0.9s \le \Delta(G) \le 1.1s$ and let $G_2 \in \mathcal{G}(n, n, p) \mid \mathcal{P}$. For an arbitrary vertex $u$, by Chernoff-Hoeffding bounds (see [12]), satisfying the requirements of Corollary 5.1 (footnote (vi)). ✷

Ergodicity and cautiousness
Theorem 1.1 assumes that there are some $dn$-cautious ergodic chains on $C_k(G)$; otherwise there is no need to worry about mixing time. So it becomes important to study the conditions under which such ergodic chains exist. We answer this question partially by providing some positive answers and also some negative answers.
There are $O(1)$-cautious ergodic chains which are uniform on $C_k(K_{n,n})$ for $k \ge 3$. Glauber dynamics (GD), which tries to recolor a uniformly chosen vertex with a uniformly chosen new color, is one such example. It is easy to verify that this chain is ergodic on the $s$-colorings of any complete $l$-partite graph $G_l$, for any $s > l$. In particular, it is ergodic on $k$-colorings (for $k \ge 3$) of any complete bipartite graph. The same is true of some variants of Glauber dynamics (see [5]).
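One step of Glauber dynamics as described above can be sketched as follows. This is a common variant written under stated assumptions: the proposed color is drawn uniformly from all $k$ colors (so it may equal the current one), and a proposal that would create a monochromatic edge is rejected. The function name and the dictionary representation are illustrative.

```python
import random

def glauber_step(coloring, adj, k, rng):
    """One step of Glauber dynamics on proper k-colorings.

    coloring: dict vertex -> color in 1..k (assumed proper).
    adj: dict vertex -> iterable of neighbors.
    Picks a uniform vertex and a uniform color, and recolors the vertex
    only if the result is still proper; otherwise the chain stays put.
    """
    v = rng.choice(sorted(coloring))
    c = rng.randint(1, k)
    if all(coloring[u] != c for u in adj[v]):
        coloring = dict(coloring)  # copy so the input state is not mutated
        coloring[v] = c
    return coloring
```

Each step changes the color of at most one vertex, so this chain is 1-cautious in the sense of Section 2.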
But there exist some bipartite graphs for which GD is not ergodic (see Fact 6.1 for a more general statement). As noted in [10], the chain associated with the WSK algorithm is ergodic for any bipartite graph even though it is not $(n/k)$-cautious. However, if we restrict our sample space to those $k$-colorings of $G$ in which some color is not used on any vertex of some partite set ($A$ or $B$), then Glauber dynamics is ergodic for each $k \ge 3$ and for every bipartite graph. Also, Theorem 1.1 holds true for such colorings.
Also, a large value of $b$ alone cannot guarantee that some ergodic $b$-cautious chain can be designed for sampling uniformly from $C_k(G)$ for each bipartite $G$ and $k$. This is shown in the following proposition.
Footnote (vi): The lower bound on $s$ in Corollary 5.3 is close to optimal within a multiplicative factor of $O(\ln k)$. This is because if $s \le (6/11)k$ (or equivalently, $k \ge (11/6)s$), then for every $G$ with $\Delta(G) \le s$, the 1-cautious chain Glauber dynamics is ergodic and mixes rapidly, as shown by [16].
Proof: Let $b_1 = \lceil b/2 \rceil$ and $b_2 = \lfloor b/2 \rfloor$. $H$ has two partite sets $L$ and $R$ defined by $L = \{u_{i,j} : 1 \le i \le k, 1 \le j \le b_1\}$ and $R = \{v_{i,j} : 1 \le i \le k, 1 \le j \le b_2\}$. There is an edge between $u_{i,j}$ and $v_{i',j'}$ if and only if $i \ne i'$. $H$ is bipartite and has $n$ vertices. Consider a $k$-coloring $C$ of $H$ defined as $C(u_{i,j}) = C(v_{i,j'}) = i$ for any $i, j, j'$. For any $u_{i,j}$, if we try to change its color to some $i' \ne i$, then it forces each $v_{i',j'}$ to take a color $i'' \ne i'$. This further changes the colors of at least $\lceil b/2 \rceil$ vertices in $L$. Hence, at least $b$ vertices are forced to change their colors. Hence $h(C, C') \ge b$ for every $C' \ne C$. Thus, there is no $b'$-cautious ($b' < b$) chain irreducible on $C_k(H)$. ✷
Fact 6.1 does not weaken the statement of Theorem 1.1. It only shows that there are bipartite graphs for which less powerful chains cannot be ergodic. Theorem 3.1, on the other hand, looks at some graphs for which such less powerful but ergodic chains exist and shows that they do not mix rapidly. However, it is very likely that there are $O(n/k)$-cautious chains (on $k$-colorings) which are ergodic on all or almost all bipartite graphs.

Examples of colorings admitting cautious ergodic chains
In this section, we provide some explicit examples of colorings which admit ergodic chains that are 1-cautious and which can be shown (as in Theorem 1.1) to have exponential mixing time.

Example
Let $C_{1,k+1}(G)$ be the set of all $(k + 1)$-colorings of $G = (A \cup B, E)$ such that (i) color 1 is used only for vertices of $A$, (ii) color $k + 1$ is used only for vertices of $B$ and (iii) colors 2 through $k$ are used for vertices of both $A$ and $B$.
Let $H$ be the complete bipartite graph on $X \cup Y$, where $X = \{u_1, \ldots, u_k\}$ and $Y = \{v_1, \ldots, v_k\}$, minus the near-perfect matching $\{(u_2, v_2), \ldots, (u_k, v_k)\}$. It can be seen that sampling uniformly from $C_{1,k+1}(G)$ is equivalent to sampling uniformly from $C_H(G, X, Y)$ where This is seen from the following bijection which maps a $c \in C_H(G, X, Y)$ to the coloring One can derive negative results (similar to Theorem 1.1) on sampling from $C_H(G, X, Y)$. The proof is essentially the same as the proof of Theorem 1.1. For $i = 2, \ldots, k$, define $\alpha_i$ and and $\alpha_{k+1} = 0$. Note that $\beta_1$ and $\alpha_{k+1}$ are forced to be zero always. Now, as in Theorem 1.1, we can define $\gamma$ and $\delta$ suitably and show that the conductance of the set $S$ is exponentially small. We leave the details to the reader. This shows that Theorem 1.1 applies to the ergodic cautious chains for sampling from $C_{1,k+1}(G)$.
The result of applying the procedure CLOSURE($H$) (described in Section 3 of [2]) is a complete unlooped bipartite graph. Hence it follows from [2] (Theorem 3) that the 1-cautious chain Glauber dynamics is ergodic for all bipartite $G$ and is uniform on the space $C_{1,k+1}(G)$. The Glauber dynamics for an $H$-coloring is a 1-cautious chain which tries at each step to recolor at random a single randomly chosen vertex. By the aforesaid arguments, GD has exponential mixing time for $G \in \mathcal{G}(n, n, p)$.

Example
In general, for $0 \le a \le k$, let $C_{a,k}(G)$ denote the set of all $k$-colorings of $G = (A \cup B, E)$ such that (i) colors 1 through $a$ are used only for vertices of $A$, (ii) colors $k - a + 1$ through $k$ are used only for vertices of $B$ and (iii) colors $a + 1$ through $k - a$ are used for vertices of both $A$ and $B$.
Just as in Example 7.1, one can derive negative results on sampling using cautious chains. The proof is again essentially the same as the proof of Theorem 1.1. Glauber dynamics is ergodic for such $H$-colorings as well.

Example
Suppose $H$ is the complete graph plus just one loop. The preceding arguments also show that our torpid mixing results can be applied to $H$-colorings for this specific $H$. This is because if $H$ is the complete graph on $V = \{1, 2, \ldots, k\}$ with a self-loop on vertex 1, then any $H$-coloring $X_1, \ldots, X_k$ of $G = (A \cup B, E)$ can be thought of as a standard $(k + 1)$-coloring $Y_1, \ldots, Y_{k+1}$ of $G$ where $Y_1 = X_1 \cap A$, $Y_{k+1} = X_1 \cap B$ and $Y_i = X_i$ for $2 \le i \le k$. This mapping establishes a bijection between $C_H(G)$ and $C_{1,k+1}(G)$ defined before. Hence, the negative results of Theorem 1.2 apply to this specific $H$. Also, the Glauber dynamics chain is ergodic in this case (this follows from Theorem 2 of [2]).
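The correspondence in this example can be sanity-checked by brute force on a tiny graph: the number of $H$-colorings (complete graph on $\{1, \ldots, k\}$ with a loop at vertex 1) should equal the number of $(k+1)$-colorings in which color 1 appears only on $A$ and color $k+1$ only on $B$. The sketch below is illustrative; the function names and the small path used in the check are assumptions, and condition (iii) is read as "colors 2 through $k$ are allowed on both sides".

```python
from itertools import product

def count_loop_h_colorings(A, B, edges, k):
    """H = K_k on {1..k} plus a loop at 1: adjacent vertices may share only color 1."""
    verts = A + B
    total = 0
    for cols in product(range(1, k + 1), repeat=len(verts)):
        c = dict(zip(verts, cols))
        if all(c[x] != c[y] or c[x] == 1 for x, y in edges):
            total += 1
    return total

def count_restricted_colorings(A, B, edges, k):
    """Proper (k+1)-colorings with color 1 only on A and color k+1 only on B."""
    verts = A + B
    total = 0
    for cols in product(range(1, k + 2), repeat=len(verts)):
        c = dict(zip(verts, cols))
        sides_ok = all(c[b] != 1 for b in B) and all(c[a] != k + 1 for a in A)
        if sides_ok and all(c[x] != c[y] for x, y in edges):
            total += 1
    return total
```

The enumeration is exponential in the number of vertices and is meant only as a check of the bijection on very small instances.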

Conclusions
An interesting open problem is to design efficient samplers for $k$-colorings ($k \ge 3$) of bipartite graphs. An affirmative answer would lead to efficient randomized approximate counters for this case. The results of this paper indicate that such samplers may have to use tools other than Markov chains.
It seems very likely that the negative results obtained can be extended to $k$-colorings of $l$-colorable ($l \ge 3$) graphs where $l < k = O(n^{\delta})$ for some positive $\delta$ which depends only on $l$. Finding a $k$-coloring is NP-hard for any fixed $k \ge 3$ except for some special classes like bipartite graphs, chordal graphs, etc. Even if we ignore the complexity of finding a starting state, the previous remark shows that Markov chains are not likely to be useful as efficient samplers.
It is known that approximate counting and almost uniform sampling problems are polynomial-time reducible to each other provided $k > \Delta(G)$ (see [8]). However, it would be interesting to see if this can be extended to $k \le \Delta(G)$ for some special classes of graphs. For example, this reduction works also for bipartite $G$ and for any $k \ge 3$. We can show this by a simple modification of the arguments of [8]. But we skip the proof of this observation since we present only negative results on efficiently sampling colorings of bipartite graphs. A related question is: Are there other classes of graphs where the equivalence is true for $k \le \Delta$?
This completes the proof of Theorem 1.1. ✷
Proof: (of Claim 3.2) Let $C_i(A)$ denote the set of vertices in $A$ colored (by $C$) with $i$. $C_i(B)$ is similarly defined. We can write $h(C, D) = h_A(C, D) + h_B(C, D)$ where $h_A(C, D)$ and $h_B(C, D)$ denote respectively the contributions of $A$ and $B$ to $h(C, D)$. We use $\oplus$ for the symmetric difference operation on sets. The claim follows from the following lower bound on $h_A(C, D)$ and a similar one for $h_B(C, D)$.

Fact 6.1
For every two positive integers $k, b \ge 2$, there exists a bipartite graph $H$ on $n = kb$ vertices such that no $b'$-cautious ($b' < b$) chain is irreducible on $C_k(H)$. Hence no such chain is asymptotically uniform on $C_k(H)$.
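For very small $k$ and $b$, the construction used in the proof of Fact 6.1 can be checked exhaustively. The sketch below builds the graph with $\lceil b/2 \rceil$ vertices $u_{i,j}$ and $\lfloor b/2 \rfloor$ vertices $v_{i,j}$ for each $i \le k$, joins $u_{i,j}$ to $v_{i',j'}$ exactly when $i \ne i'$, and computes the smallest Hamming distance from the coloring $C$ (index $i$ gets color $i$) to any other proper $k$-coloring. The function names are hypothetical and the search is exponential, so it is only meant for tiny parameters.

```python
from itertools import product

def fact_graph(k, b):
    """Vertices ('u', i, j) and ('v', i, j); u(i, .) ~ v(i', .) iff i != i'."""
    b1, b2 = (b + 1) // 2, b // 2
    left = [("u", i, j) for i in range(1, k + 1) for j in range(b1)]
    right = [("v", i, j) for i in range(1, k + 1) for j in range(b2)]
    edges = [(u, v) for u in left for v in right if u[1] != v[1]]
    return left + right, edges

def min_distance_from_C(k, b):
    """Smallest Hamming distance from C to any other proper k-coloring."""
    verts, edges = fact_graph(k, b)
    idx = {v: t for t, v in enumerate(verts)}
    e = [(idx[x], idx[y]) for x, y in edges]
    base = tuple(v[1] for v in verts)  # the coloring C: index i gets color i
    best = None
    for cols in product(range(1, k + 1), repeat=len(verts)):
        if cols != base and all(cols[x] != cols[y] for x, y in e):
            d = sum(1 for s, t in zip(cols, base) if s != t)
            best = d if best is None else min(best, d)
    return best
```

If the returned distance is at least $b$, then no chain that changes fewer than $b$ vertices per step can leave $C$, which is the obstruction to irreducibility stated in the Fact.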