Convergence of some leader election algorithms

We start with a set of n players. With some probability P(n, k), we kill n − k players; the others stay alive, and we repeat with them. What is the distribution of the number X_n of phases (or rounds) before only one player is left? We present a probabilistic analysis of this algorithm under some conditions on the probability distributions P(n, k), including stochastic monotonicity and the assumption that roughly a fixed proportion α of the players survives in each round. We prove a kind of convergence in distribution for X_n − log_a n, where the base a = 1/α; as in many other similar problems there are oscillations and no true limit distribution, but suitable subsequences converge, and there is an absolutely continuous random variable Z such that the distribution of X_n can be approximated by Z + log_a n rounded up to the nearest integer. Applications of the general result include the leader election algorithm where players are eliminated by independent coin tosses, and a variation of the leader election algorithm proposed by W. R. Franklin. We study the latter algorithm further, including numerical results.


A general convergence theorem
We consider a general leader election algorithm of the following type: we are given some random procedure that, given any set of n ≥ 2 individuals, eliminates some (but not all) of them. If there is more than one survivor, we repeat the procedure with the set of survivors until only one (the winner) remains. We are interested in the (random) number X_n of rounds required if we start with n individuals. (We set X_1 = 0, and have X_n ≥ 1 for n ≥ 2.) We let N_k be the number of individuals remaining after round k; thus X_n := min{k : N_k = 1}, where we start with N_0 = n. For convenience we may suppose that we continue with infinitely many rounds in which nothing happens; thus N_k is defined for all k ≥ 0 and N_k = 1 for all k ≥ X_n.
We assume that the number Y_n of survivors from a set of n individuals has a distribution depending only on n. We have 1 ≤ Y_n ≤ n; we allow the possibility that Y_n = n, but we assume P(Y_n = n) < 1 for every n ≥ 2, so that we do not get stuck before selecting a winner. We further assume that, given the number of remaining individuals at the start of a new round, the number of survivors is independent of the previous history. In other words, the sequence (N_k)_{k≥0} is a Markov chain on {1, 2, . . .}, and X_n is the number of steps to absorption in 1. The transition probabilities of this Markov chain are, with Y_1 = 1,

P(i, j) := P(Y_i = j) = P(j survive out of a set of i). (1.1)

Note that P(i, j) = 0 if j > i, and P(i, i) < 1 for i > 1. Conversely, any Markov chain on {1, 2, . . .} with such P(i, j) can be regarded as a leader election algorithm in the generality just described.

We will in this paper treat leader election algorithms where, asymptotically, a fixed proportion is eliminated in each round. (Thus, we expect X_n to be of the order log n.) More precisely, we assume the following for Y_n, where we also repeat the key assumptions above. (Here and below, log n should be interpreted as some fixed positive number when n = 1.)

Condition 1.1. For every n ≥ 1, Y_n is a random variable such that 1 ≤ Y_n ≤ n, and P(Y_n = n) < 1 for n ≥ 2. Further:
(i) Y_n is stochastically increasing in n, i.e., P(Y_n ≤ k) ≥ P(Y_{n+1} ≤ k) for all n ≥ 1 and k ≥ 1. Equivalently, we may couple Y_n and Y_{n+1} such that Y_n ≤ Y_{n+1}.
Remark 1.2. If (1.2) or (1.3) holds for some sequence (δ_n), it holds for every larger sequence (δ_n) too; similarly, if δ_n = O((log n)^{−1−ε}) or (1.3) holds for some ε, it holds for every smaller ε too. Hence we may assume that (ii) and (iii) hold with the same ε > 0 and the same δ_n, and we may assume δ_n ≥ (log n)^{−1−ε}. In particular, this implies that δ_k = O(δ_n) when C^{−1} n ≤ k ≤ Cn, for each constant C.
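The general procedure just described can be simulated directly from any survivor distribution. The following Python sketch is our own illustration, not from the paper; the survivor rule shown is a toy example (Y_m uniform on {1, . . . , m}), for which the expected number of rounds works out to 1 + H_{n−1}.

```python
import random

def election_rounds(n, survivors, rng):
    """Simulate the general leader election: starting from n players,
    repeatedly apply the random elimination step until one remains.
    survivors(m, rng) draws Y_m, the number of survivors out of m
    players (1 <= Y_m <= m, with P(Y_m = m) < 1 for m >= 2).
    Returns X_n, the number of rounds used (X_1 = 0)."""
    rounds = 0
    while n > 1:
        n = survivors(n, rng)
        rounds += 1
    return rounds

# A toy survivor rule (not from the paper): Y_m uniform on {1, ..., m}.
def uniform_survivors(m, rng):
    return rng.randint(1, m)

rng = random.Random(42)
samples = [election_rounds(1000, uniform_survivors, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)   # close to 1 + H_999, about 8.48
```

Any of the concrete survivor rules discussed later (coin tossing, peaks of a random permutation) can be plugged in as `survivors`.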
The behaviour of the election algorithm is given by the recursion X_1 = 0 and

X_n =^d X_{Y_n} + 1, n ≥ 2, (1.5)

where we assume that (X_i)_{i=1}^n and Y_n are independent. We state a general convergence theorem for leader election algorithms of this type.
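The recursion (1.5) also gives a direct way to compute E X_n numerically: condition on Y_n and move the j = n term to the left-hand side. The following sketch (our own code) instantiates this with the fair-coin elimination rule discussed later in the examples, Y_n = W_n + n·1[W_n = 0] with W_n ~ Bi(n, p).

```python
from math import comb

def mean_rounds(N, p=0.5):
    """Iterate E X_n = 1 + sum_j P(Y_n = j) E X_j, moving the j = n term
    to the left-hand side, for the coin-toss rule
    Y_n = W_n + n*1[W_n = 0], W_n ~ Bi(n, p).
    Returns the list [_, E X_1, ..., E X_N]."""
    x = [0.0, 0.0]                      # index 0 unused; E X_1 = 0
    for n in range(2, N + 1):
        pmf = [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]
        pmf[n] += pmf[0]                # all tails: everyone survives
        s = 1.0 + sum(pmf[j] * x[j] for j in range(1, n))
        x.append(s / (1.0 - pmf[n]))
    return x

x = mean_rounds(100)
# x[2] = 2 exactly: with two players, each round is decisive w.p. 1/2.
```

The computed values oscillate around log_2 n, in line with the theorem below.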
We recall the definitions of the total variation distance d_TV and the Wasserstein distance d_W (also known as the Dudley, Fortet-Mourier or Kantorovich distance, or minimal L^1 distance); these are both metrics on spaces of probability distributions, but it is convenient to write also d_TV(X, Y) := d_TV(µ, ν) and d_W(X, Y) := d_W(µ, ν) for random variables X, Y with X ∼ µ and Y ∼ ν.
The total variation distance d_TV between (the distributions of) two arbitrary random variables X and Y is defined by

d_TV(X, Y) := sup_A |P(X ∈ A) − P(Y ∈ A)|. (1.6)

For integer-valued random variables, as is the case in our theorem, this is easily seen to be equivalent to

d_TV(X, Y) = (1/2) Σ_k |P(X = k) − P(Y = k)|. (1.7)

Further, for any distributions µ and ν,

d_TV(µ, ν) = inf_{(X,Y)} P(X ≠ Y), (1.8)

where the infimum is taken over all random vectors (X, Y) on a joint probability space with the given marginal distributions µ and ν. (In other words, over all couplings (X, Y) of µ and ν.) For integer-valued random variables, convergence in d_TV is equivalent to convergence in distribution, or equivalently, weak convergence of the corresponding distributions.

The Wasserstein distance d_W is defined only for probability distributions with finite expectation, and can be defined, in analogy with (1.8), by

d_W(µ, ν) = inf_{(X,Y)} E|X − Y|. (1.9)

There are several equivalent formulas. For example, for integer-valued random variables,

d_W(X, Y) = Σ_k |P(X ≤ k) − P(Y ≤ k)|. (1.10)

It is immediate from (1.8) and (1.9) that for integer-valued random variables X and Y (but not in general),

d_TV(X, Y) ≤ d_W(X, Y). (1.11)

It is well-known that d_W is a complete metric on the space of probability measures on R with finite expectation, and that convergence in d_W is equivalent to weak convergence plus convergence of the first absolute moment. All unspecified limits in this paper are as n → ∞.
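For integer-valued distributions, both metrics reduce to elementary sums over the support and can be computed directly from the probability mass functions, as in (1.7) and (1.10). A small Python sketch (our own helper functions):

```python
def d_tv(p, q):
    """(1.7): total variation distance between two integer pmfs,
    given as dicts {k: probability}."""
    ks = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in ks)

def d_w(p, q):
    """(1.10): Wasserstein distance as the sum over k of the absolute
    difference of the two cumulative distribution functions."""
    ks = set(p) | set(q)
    fp = fq = total = 0.0
    for k in range(min(ks), max(ks)):   # the cdfs agree (= 1) afterwards
        fp += p.get(k, 0.0)
        fq += q.get(k, 0.0)
        total += abs(fp - fq)
    return total
```

For example, point masses at 0 and 2 give d_tv = 1 but d_w = 2, illustrating that the inequality (1.11) can be strict.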
We thus do not have convergence in distribution as n → ∞, but the usual type of oscillations, with an asymptotic periodicity in log_{1/α} n and convergence in distribution along subsequences such that the fractional part {log_{1/α} n} converges. (This phenomenon is well-known for many other problems with integer-valued random variables, see for example [14, 10]; it happens frequently when the variance stays bounded.) This is illustrated in Figure 1.

Proof. We assume that δ_n are as in Remark 1.2.
Let q := sup_{n≥2} P(Y_n = n). Since each P(Y_n = n) < 1 and P(Y_n = n) → 0 by (iii), we have q < 1. Hence X_n is stochastically dominated by a sum of n − 1 geometric Ge(1 − q) random variables, and thus E X_n = O(n). In particular, E X_n < ∞ for every n.
Since the sequence (Y_n) is stochastically increasing, we may couple all Y_n such that Y_1 ≤ Y_2 ≤ . . . . If we consider starting our algorithm with different initial values, and use this coupling of (Y_n) in each round, we obtain a coupling of all X_n, n ≥ 1, such that X_{n+1} ≥ X_n a.s. for every n ≥ 1. We use these couplings of (Y_n) and (X_n) throughout the proof. Let

We extend b_n to real arguments by the same formula; thus, b_t = b_{⌊t⌋} for real t ≥ 1.

It follows by induction over
and thus b_n = O(1). In other words, we have shown

We now use the Wasserstein distance d_W. Since X_{n+1} ≥ X_n a.s., it is easily seen by (1.9)

and thus, for all n and m, (1.20)

Note also that (iii) implies

Then, for t ≥ 2/α, using (1.5), (1.20), (1.21), (1.3), and 1 ≤ Y_{⌊t⌋} ≤ t,

Hence, for any t ≥ 2/α,

Since d_W is a complete metric, there thus exists for every t > 0 a limiting distribution µ(t) such that, if Z(t) ∼ µ(t), then

In particular, (1.25)

(We find it more convenient to use the random variable Z(t) than its distribution µ(t).) Clearly, Z(αt) =^d Z(t), so the distribution µ(t) is a periodic function of log_{1/α} t. Hence, (1.24) can also be written, adding the explicit estimate obtained from (1.23), (1.26)

Note further that, for γ ≥ 1, by (1.22) and (1.20),

Replacing t by α^{−j} t and letting j → ∞, it follows from (1.24) that, for all t > 0 and γ ≥ 1,

Consequently, t → µ(t) = L(Z(t)) is continuous and Lipschitz in the Wasserstein metric. Define, for every real x,

for any t > 0 such that x + log_{1/α} t is an integer; since Z(t) is periodic in log_{1/α} t, this does not depend on the choice of t. Since X_{α^{−j} t} − log_{1/α}(α^{−j} t) + log_{1/α} t = X_{α^{−j} t} − j ∈ Z, the random variable Z(t) + log_{1/α} t is integer-valued for every t. It is easily seen that for integer-valued random variables Z_1 and Z_2, the total variation distance satisfies d_TV(Z_1, Z_2) ≤ d_W(Z_1, Z_2). Hence, for any x ∈ R and u ≥ 0, choosing t such that x + log_{1/α} t is an integer and letting γ = α^{−u}, which implies that x − u + log_{1/α}(γt) = x + log_{1/α} t ∈ Z, we obtain from the definition (1.28) and (1.27), (1.29)

Hence, F(x) is a continuous function of x.
We have shown that t → L(Z(t)) is continuous in the Wasserstein metric, and thus in the usual topology of weak convergence in the space P(R) of probability measures on R.
Furthermore, (1.26) and (1.28) show that, for any sequence k_n of integers, as n → ∞,

Since further the sequence X_n is increasing, it now follows from Janson [10, Lemma 4.6] that F is monotone, and thus a distribution function. By (1.29), the distribution is absolutely continuous and has a bounded density function F′(x). It is easy to see that (1.30), (1.12) and (1.13) are equivalent, see [10, Lemma 4.1]. The corresponding result in the Wasserstein distance follows from (1.26) because d_W(X_n, ⌈Z + log_{1/α} n⌉) = d_W(X_n − log_{1/α} n, Z(n)), e.g. by Remark 1.4 below; (1.14) then follows by (1.10). Finally, (1.26) implies (1.16), and (1.27) implies that φ is continuous, and Lipschitz on compact intervals.

Remark 1.4. As remarked above, Z(t) + log_{1/α} t is integer-valued. Moreover, for every integer k,

Hence, for every t > 0, Z(t) =^d ⌈Z + log_{1/α} t⌉ − log_{1/α} t. General families of random variables of this type are studied in [10]. In particular, [10, Theorem 2.3] shows how φ(t) := E Z(t) in Theorem 1.3 can be obtained from the characteristic function of the distribution F of Z.
Remark 1.5. The very slow convergence rate O((log t)^{−ε}) in (1.26) arises because we allow δ_n to tend to 0 slowly. In typical applications, δ_n = n^{−a} for some a > 0, and then better convergence rates can be obtained. We have, however, not pursued this.

Remark 1.6. Note that F and φ are influenced by the distribution of Y_n for small n > 2, for example Y_3 and Y_4; hence there is no hope for a nice explicit formula for F or φ depending only on asymptotic properties of Y_n.
Remark 1.7. The general problem of studying the number of steps until absorption at 1 of a decreasing Markov chain on {1, 2, . . . } appears in many other situations too, usually with quite different behaviour of Y n and X n . As examples we mention the recent papers studying random trees and coalescents by Drmota, Iksanov, Moehle and Roesler [3], Iksanov and Möhle [9] and Gnedin and Yakubovich [8]; in these papers the number killed in each round is much smaller than here and thus X n is larger, of the order n or n/ log n; moreover, after normalization X n has a stable limit law.

Extensions
We have assumed that we repeat the elimination step until only one player remains. As a generalization we may suppose that we stop when there are at most a players left, for some given number a.
Theorem 2.1. Consider the leader election algorithm described in Section 1, but stopping as soon as the number of remaining players is at most a, for some fixed a ≥ 1. Suppose that Condition 1.1 is satisfied. Then, the conclusions of Theorem 1.3 hold, for some F and φ that depend on the threshold a.
Proof. This generalization can be obtained from the version in Section 1 by replacing Y_n by

Suppose that Condition 1.1 holds for (Y_n). It is easily seen that Condition 1.1 then holds for (Y′_n) too, with the same α; for (ii), note that Condition 1.1(iii) implies that

Consequently, Theorem 1.3 applies to (Y′_n), and the result follows.
In this situation, it is also interesting to study the probability π_i(n) that the procedure ends with exactly i players, starting with n players; here i = 1, . . . , a and Σ_{i=1}^a π_i(n) = 1. We have a corresponding limit theorem for π_i(n).
Proof. A modification of the proof of Theorem 1.3, now taking x_n := π_i(n) and d_n := |x_{n+1} − x_n| and replacing the random X_t by

which is easily seen to satisfy the stated conditions. We omit the details.
More generally, there is a similar result on the probability that the process passes through a certain state; this is interesting also for the process in Section 1 with a = 1.

Theorem 2.3. Suppose that Condition 1.1 holds and that a ≥ 1 is given as above. Let π_i(n), i ≥ 1, be the probability that, starting with n players, there exists some round with exactly i survivors. Then
Proof. For i ≤ a, this π i (n) is the same as in Theorem 2.2, and for each i > a, this π i (n) is the same as in Theorem 2.2 if we replace a by i.
Remark 2.4. Another variation, which is natural in some problems, is to study a nonincreasing Markov chain on {0, 1, . . . } and ask for the number of steps to reach 0; in other words, the time until all players are killed. In this case, we thus assume that 0 ≤ Y_n ≤ n. This can obviously be transformed to our set-up on {1, 2, . . . } by increasing each integer by 1; in other words, we replace Y_n by Y′_n := Y_{n−1} + 1, n ≥ 2; we can interpret this as adding a dummy player that is never eliminated, and continuing until only the dummy remains. If Condition 1.1 holds for Y_n, except that Y_n = 0 is allowed and P(Y_1 = 0) > 0, then Condition 1.1 holds for Y′_n too, and thus our results hold also in this case, with X_n now defined as the number of steps until absorption in 0. (To be precise, X_n = X′_{n+1}, with (X′_n) corresponding to (Y′_n), since we add a dummy, but there is no difference between the asymptotics of X′_{n+1} and X′_n.)
Example 3.2 (a counterexample). The procedure in Example 3.1 is almost deterministic. In contrast, the very similar but completely deterministic Y_n = ⌊n/2⌋, n ≥ 2, does not satisfy Condition 1.1(ii). In this case, X_n = ⌊log_2 n⌋ and P(X_n ≤ k) = F(k − log_2 n) for all k ∈ Z and n ≥ 1, where

Except for the special rule when all throw tails, which guarantees that at least one player survives each round, the number Y_n of survivors in a round thus has a binomial distribution Bi(n, 1/2). More precisely, if W_n is the number of heads thrown,

Thus, conditions (ii) and (iii) in Theorem 1.3 are satisfied. Also the monotonicity condition (i) is satisfied, because if 1 ≤ k ≤ n − 1 (other cases are trivial), then

In this case, Prodinger [18], see also Fill, Mahmoud and Szpankowski [5], found an exact formula for the expectation E X_n and asymptotics of the form (1.16) with the explicit function

Fill, Mahmoud and Szpankowski [5] further found asymptotics of the distribution that can be written as (1.12) with

which thus is the distribution function of Z in this case. Second and higher moments are considered by Louchard and Prodinger [16]. Prodinger [18] also considered the possibility of stopping at a = 2 players, and showed (1.16) above in this case too, with an explicit formula for the function φ(t) (of the same type as (3.5) for a = 1).
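The monotonicity condition can also be checked numerically from the exact binomial probabilities. In the following sketch (our own code, with hypothetical function names), cdf_Y(n, k, p) evaluates P(Y_n ≤ k) for the rule Y_n = W_n + n·1[W_n = 0]:

```python
from math import comb

def cdf_Y(n, k, p=0.5):
    """P(Y_n <= k) for Y_n = W_n + n*1[W_n = 0], W_n ~ Bi(n, p);
    the outcome W_n = 0 is mapped to Y_n = n, so for k < n only
    j = 1, ..., k contribute."""
    if k >= n:
        return 1.0
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(1, k + 1))

def monotone(p, nmax=40):
    """Condition 1.1(i): P(Y_n <= k) >= P(Y_{n+1} <= k) for all n, k."""
    return all(cdf_Y(n, k, p) >= cdf_Y(n + 1, k, p)
               for n in range(1, nmax) for k in range(1, n + 1))
```

Here monotone(0.5) returns True, while monotone(0.3) returns False (violations occur already around n = 5), consistent with the failure of (i) for a biased coin with p < 1/2 discussed below.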
In fact, the results of [11] show that the conclusions (1.12) and (1.16) hold for all p ∈ (0, 1), for some functions F and φ explicitly given in [11]. (See also [16], where further higher moments are treated.) This suggests that Theorem 1.3 should hold more generally. Note that although Condition 1.1(i) does not hold for p < 1/2, the difference P(Y_{n+1} ≤ k) − P(Y_n ≤ k) is at most P(W_n = 0) = (1 − p)^n, and thus negative or exponentially small for large n. It seems likely that Theorem 1.3 can be extended to such cases by allowing a small error in Condition 1.1(i); this would then include this leader election algorithm with a biased coin for any p ∈ (0, 1). However, we have not pursued this.
It was left as an open question in [11] whether for each p ∈ (0, 1) the limit function F is monotone, and thus a distribution function, which means that there exists a random variable Z such that (1.13) holds. By the discussion above, Theorem 1.3 shows that this holds for p ≥ 1/2, but the case p < 1/2 is as far as we know still open. Cf. Remark 4.2 and Figure 6 below, which show that monotonicity fails in a related situation. Numerical experiments, based on [16, Prop. 3.1], indicate that F is monotone, at least for some choices of p < 1/2.
A further variation of Example 3.3 is to let the probability p depend on n. The case p = 1/n is studied by Lavault and Louchard [13]; in this case EY n is bounded and Condition 1.1 does not hold, so Theorem 1.3 does not apply.
Example 3.5. The special rule in Examples 3.3 and 3.4 for the exceptional case when all throw tails is of course necessary to prevent us from killing all players, but as we have seen, it complicates the analysis, especially for p < 1/2, when it destroys the stochastic monotonicity of Y_n. Note that this rule typically is invoked only towards the end of the algorithm, when only a few players are left. We regard the rule as an emergency exit, and it could be replaced by other special rules for this case. For example, an alternative would be to switch to some other algorithm that is fail-safe although in principle (for large n) slower; for our purpose this means that the present algorithm terminates, so we may describe this by letting Y_n = 1 in this case, i.e., (3.3) is replaced by Y_n = W_n + 1[W_n = 0] = max(W_n, 1). Note that for this version, Condition 1.1 holds for every p ∈ (0, 1), with α = p, so our theorems apply.
An equivalent way to treat this version is to add (as in Remark 2.4) a dummy, which is exempt from elimination, and to eliminate everyone else who throws a tail; we then stop when there are at most 2 players left (the dummy and, possibly, one real player). This is thus the version in Section 2, with a = 2 and Y_n = 1 + W_{n−1}, W_m ∼ Bi(m, p) (and starting with n + 1 players). Again, Condition 1.1 holds, with α = p, and the results in Section 2 apply. In particular, since invoking the special rule corresponds to eliminating everyone except the dummy, the probability that we have to invoke the special rule is the same as the probability that the dummy version ends with only the dummy, i.e. π_1(n + 1) in the notation of Theorem 2.2; the asymptotics of this probability are thus given by (2.1).
Example 3.6. Prodinger [17] (for p = 1/2) and Louchard and Prodinger [15] studied a version of Examples 3.3 and 3.4 where, as in Remark 2.4, we allow all players to be killed and let X_n be the time until that happens. Thus, in each round, each player tosses a coin and is killed with probability 1 − p (and we do not have any special rule). Additionally, there is a demon, who in each round kills one of the survivors (if any) with probability ν ∈ [0, 1]. We thus have the modification discussed in Remark 2.4, with

where W_n and I_ν are independent; thus also Y′_n = Y_{n−1} + 1 = max(W_{n−1} + 1 − I_ν, 1). Condition 1.1 holds, in the modification for absorption in 0, and thus our results apply. In fact, Louchard and Prodinger [15] show (1.12) and (1.16) for this problem with explicitly given F and φ; they further give an extension of (1.16) to higher moments.
As remarked in [17, 15], the special case ν = 1 is equivalent to approximate counting, and ν = 0 is equivalent to the cost of an unsuccessful search in a trie; in the latter case, X_n is simply the maximum of n i.i.d. geometric random variables, which can be treated by elementary methods.
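The ν = 0 case is easy to check by simulation against the elementary formula: each player's lifetime is geometric on {1, 2, . . .}, so P(X_n ≤ k) = (1 − p^k)^n. A sketch (our own code):

```python
import random

def rounds_until_extinct(n, p, rng):
    """nu = 0 (demon-free) case: every remaining player survives a round
    independently with probability p; return the number of rounds
    until no player is left."""
    rounds = 0
    while n > 0:
        n = sum(rng.random() < p for _ in range(n))
        rounds += 1
    return rounds

# Each player's lifetime is geometric, so P(X_n <= k) = (1 - p**k)**n.
rng = random.Random(7)
trials = [rounds_until_extinct(10, 0.5, rng) for _ in range(20000)]
emp = sum(t <= 4 for t in trials) / len(trials)
exact = (1 - 0.5**4) ** 10
```

With n = 10, p = 1/2 and k = 4, the exact value is (15/16)^10, about 0.524, and the empirical frequency agrees to within Monte Carlo error.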
Example 3.7. W.R. Franklin [7] proposed a leader election algorithm where the n players are arranged in a ring. Each player gets a random number; these are i.i.d. and, say, uniform on [0, 1]. (Since only the order of these numbers will matter, any continuous distribution will do; moreover, it is equivalent to let ξ_1, . . . , ξ_n be a random permutation of 1, . . . , n.) A player survives the first round if her random number is a peak; in other words, if ξ_1, . . . , ξ_n are i.i.d. random numbers, then player i survives if ξ_i ≥ ξ_{i−1} and ξ_i ≥ ξ_{i+1} (with indices taken modulo n). We may ignore the possibility that two numbers ξ_i and ξ_j are equal; hence we may as well require ξ_i > ξ_{i−1} and ξ_i > ξ_{i+1}.
In Franklin's algorithm, the survivors continue by comparing their original numbers in the same way with the nearest surviving players; this is repeated until a single winner remains. We have so far not been able to analyse this algorithm. It is easy to verify that even if we condition on the number m of survivors after the first round, the m! possible different orderings of the survivors do not appear with equal probabilities, which means that the algorithm is not of the recursive type studied in this paper. (For example, start with a ring of 8 players and condition on having 4 survivors (peaks) in the first round; the probability of getting 2 survivors in the second round is then 10/34, and not 1/3 as in the uniform case.) However, we can study a variation of Franklin's algorithm, where the survivors draw new random numbers in each round. This is an algorithm of the type studied in Section 1, with Y_n given by the number of peaks in a random permutation, regarded as a circular list. Note that there is always at least one peak (the maximum will always do), so Y_n ≥ 1 as required. It is easily seen that inserting a new player never decreases the number of peaks; hence Y_n is stochastically increasing in n. Further, we have Y_n = Σ_{i=1}^n I_i, where I_i := 1[ξ_i > max(ξ_{i−1}, ξ_{i+1})] is the indicator that player i survives (again, indices are taken modulo n). If n ≥ 3, then E I_i = 1/3 by symmetry, and thus E Y_n = n/3. In particular, (ii) holds with α = 1/3. Furthermore, I_i and I_j are independent unless |i − j| ≤ 2 (mod n), and similarly for sets of the indicators I_i, and it follows easily that E(Y_n − E Y_n)^6 = O(n^3). (Details might appear elsewhere.)

In comparison, for the variation with new random numbers each round, it is easily seen that the expected number of players after k rounds is (1/3)^k n + o(n), for any fixed k; in particular, after two rounds it is n/9 + o(n). Note that c_2 in (3.7) is slightly smaller than 1/9, and thus better in terms of performance.
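The variation with fresh numbers each round is straightforward to simulate; since only the relative order matters, a uniformly random permutation of 0, . . . , n − 1 can be used in each round. A sketch (our own code; the empirical mean of circular peaks for n = 30 serves as a check of E Y_n = n/3):

```python
import random

def circular_peaks(xs):
    """Count positions whose value exceeds both circular neighbours."""
    n = len(xs)
    return sum(xs[i] > xs[i - 1] and xs[i] > xs[(i + 1) % n]
               for i in range(n))

def franklin_variant_rounds(n, rng):
    """The variation with fresh random numbers each round: the survivors
    of a round are the circular peaks (at least one, the maximum)."""
    rounds = 0
    while n > 1:
        n = circular_peaks(rng.sample(range(n), n))
        rounds += 1
    return rounds

rng = random.Random(3)
mean_peaks = sum(circular_peaks(rng.sample(range(30), 30))
                 for _ in range(5000)) / 5000   # should be near 30/3 = 10
rounds = [franklin_variant_rounds(30, rng) for _ in range(2000)]
```

The average number of rounds for n = 30 is a little above log_3 30, in line with 1/α = 3.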
It might have been hoped that the true Franklin algorithm is asymptotically equivalent to the variation studied here, but the fact that c_2 ≠ 1/9 suggests that this is not the case. Nevertheless, we conjecture that Theorem 1.3 remains true for the Franklin algorithm, for some unknown α < 1/3.

Example 3.9. In both versions in Example 3.7, the players are arranged in a circle. Alternatively, the players may be arranged in a line. We use the same rules as above, but we have to specify when a player at the end (with only one neighbour) is a peak. There are two obvious possibilities:
(i) Never regard the first and last players as peaks. (Define ξ_0 = ξ_{n+1} = +∞.)
(ii) Regard them as peaks if ξ_1 > ξ_2 and ξ_n > ξ_{n−1}, respectively. (Define ξ_0 = ξ_{n+1} = −∞.)
In the first case, it is possible that there are no peaks, and thus we have to add an emergency exit as in Example 3.5. As in Example 3.7, there are two versions (for each of (i) and (ii)): we may use the same random numbers in all rounds, or we may draw new ones each round.
In the latter case, we are again in the situation of Section 1. In both cases (i) and (ii), the distribution of Y_n is related to the distribution in the circular case in Example 3.7. Indeed, if we start with a circular list of n + 1 numbers and eliminate the player with the largest number, then the remaining n numbers form a linear list, and the peaks in this list using version (i) are exactly the peaks other than the maximal one in the original circular list. Similarly, if we instead eliminate the player with the smallest number, then the peaks in the remaining list using version (ii) are exactly the peaks in the original list. Hence, if Z_n is the number of peaks in a random circular list of length n, then (i) yields Y_n =^d Z_{n+1} − 1 and (ii) yields Y_n =^d Z_{n+1}. In both cases, this implies that Condition 1.1 holds (provided we add a suitable emergency exit in case (i)), because it holds for Z_n. Consequently, Theorem 1.3 applies to both these linear versions of (the variation of) Franklin's algorithm, again with 1/α = 3.

First variation of the Franklin leader election algorithm. The linear case
We assume that the survivors draw new random numbers in each round and that they are arranged in a line. We use possibility (i) of Example 3.9. We start with a set of n players. We assign an ordinary (linear) permutation of {1, . . . , n}, chosen uniformly at random, to the set; all players corresponding to a peak stay alive, and the others are killed. If there are no peaks, we use the following emergency exit: a winner is chosen at random (this is assumed to have cost 0). Indeed, in the original game one deals with circular permutations, so there always exists at least one peak; here we approach the problem with an ordinary linear permutation.
What is the distribution of the number X n of phases (or rounds) before getting only one player? We will sometimes use the subscript l to distinguish these from the circular case discussed in Section 5.

The analysis
First of all, we know (Carlitz [2]; see also [6, Chapter 3]) that the pentavariate generating function (GF) of valleys (u_0), double rises (u_1), double falls (u′_1) and peaks (u_2) is given by

This gives the GF of the number of peaks, and hence the mean M and variance V of the number of peaks, for n ≥ 2 and n ≥ 4, respectively:

M(n) = (n − 2)/3, V(n) = (2n + 2)/45. (4.2)

This GF is also given in Carlitz [2]. Moreover, from [6, Chapter 9], we know that the distribution P is asymptotically Gaussian. This is also proved in Esseen [4] by probabilistic methods.

Let x(n) be the mean number of phases, E(X_n), starting with n players. As we shall see, the initial values are

Since (4.2) yields M(n) + 1 = (n + 1)/3 for n ≥ 2, we have (approximating by using this for n ≤ 1 too) that the mean number of players c(j) still alive after j phases is c(j) ≈ 3^{−j}(n + 1) − 1. (An induction easily yields the exact formula |c(j) − 3^{−j} n| < 1 for all j.) If we want c(j) = 1, this leads to the approximation

x(n) ≈ log_3(n + 1) − log_3 2.

We see from Theorem 1.3 (which applies by Example 3.9) that this is roughly correct, but the constant −log_3 2 has to be replaced by a periodic function φ(n).
Some values of P and Π are given in Tables 1 and 2. (We use in the tables and figures the subscript l to emphasise that we deal with the linear case.) Denoting the jth column of Π by π^{(j)}, we have

For j ≥ 2, it suffices to consider k ≥ 2 in (4.3), so we need only the matrix (P(n, k))_{n,k≥2}. Since P(n, k) = 0 if k > (n − 1)/2, this matrix is triangular, and so is P^{j−1}. But π^{(1)}(n) < 10^{−7} for n > 20, so numerically the significant columns of P^{j−1} are the first 20 columns. Also, we see the importance of the initial first column of Π. Moreover, for n > 75, P(n, k) is indistinguishable from the Gaussian limit. So we have used the expansion of the GF (4.1) for n ≤ 75 and the Gaussian limit afterwards in our numerical calculations. Of course we have

A plot of x(n) − log_3 n versus log_3 n is given in Figure 2 for n = 50, . . . , 500. The oscillations expected from (1.16) are clear.
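As an independent check of such numerical computations, the exact distribution of the number of interior peaks can also be generated by the classical recursion T(n, k) = (2k + 2) T(n−1, k) + (n − 2k) T(n−1, k−1) for the number of permutations of n elements with k interior peaks, avoiding the GF expansion. The sketch below (our own code) then iterates x(n) = 1 + Σ_k P(n, k) x(k); we count a round that ends with no peaks as one phase, followed by the zero-cost random selection (this convention is our assumption).

```python
from math import factorial

def linear_peak_rows(N):
    """rows[n][k] = number of permutations of n elements with k
    interior peaks, from T(n,k) = (2k+2) T(n-1,k) + (n-2k) T(n-1,k-1)."""
    rows = {1: [1]}
    for n in range(2, N + 1):
        prev = rows[n - 1]
        new = [0] * ((n - 1) // 2 + 1)
        for k, t in enumerate(prev):               # t = T(n-1, k)
            new[k] += (2 * k + 2) * t              # term (2k+2) T(n-1, k)
            if k + 1 < len(new):                   # term (n-2j) T(n-1, j-1), j = k+1
                new[k + 1] += (n - 2 * (k + 1)) * t
        rows[n] = new
    return rows

def x_mean(N):
    """x(n) = E X_n for the linear variation: survivors = interior peaks,
    a no-peak round counting as one phase ending in the random selection."""
    rows = linear_peak_rows(N)
    x = [0.0, 0.0]                        # x[1] = 0
    for n in range(2, N + 1):
        f = factorial(n)
        row = rows[n]
        s = 1.0 + (row[0] / f) * x[1]     # no peaks: one survivor chosen
        s += sum((row[k] / f) * x[k] for k in range(1, len(row)))
        x.append(s)
    return x

x = x_mean(200)
```

Small values come out as x(2) = x(3) = x(4) = 1 and x(5) = 17/15 under this convention, and x(n) − log_3 n oscillates around a value near −log_3 2.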
Recall that according to Theorem 1.3, there exists a limiting distribution function F(x) = F_l(x) (in a certain sense) for X_n. In Figure 3, we approximate this distribution function F(x) by plotting Λ(n, j) = P(X_n ≤ j) against j − log_3 n for n = 20, . . . , 500, cf. (1.12). We have also plotted a scaled Gumbel distribution; the fit is bad.
Similarly, in Figure 4 we show the probability Π(n, j), n = 150, . . . , 500, plotted against j − log 3 n. The fit with a Gaussian distribution is equally bad.
The few scattered points of both figures are actually due to small n and the propagation of the more erratic behaviour for n = 1, . . . , 40 shown in Figure 5.
So we observe the following facts:
(i) A first regime (n = 1, . . . , 40) creates some scattered points which almost look like two distributions.
(v) P is triangular, and so is P^j. Also, P^j(n, k) = Θ(1), with k = O(1), only if j = log_3 n + O(1).
(vi) As F(x) is absolutely continuous, we can derive, as in [14], modulo some uniform integrability conditions, all (periodic) moments of X_n, in particular x(n).
(vii) The effect of initial values is now clear. To illustrate this, we have changed the initial values to Π(0, 1) = Π(1, 0) = 1, which means that we add a cost 1 for the extra selection required when the algorithm terminates with no element left. This leads to Table 3. The equivalents of Figures 3, 4 and 5 are given in Figures 6, 7 and 8. Note that X_n is no longer stochastically monotone in n; we have by definition X_0 = 1 > 0 = X_1, and Table 3 shows other examples of non-monotonicity for small n. Moreover, Figure 6 shows that the non-monotonicity persists for large n; we clearly have convergence to a limit function, G(x) say, but the limit is not monotone and thus not a distribution function as in Figure 4 and, more generally, in Theorem 1.3.
Remark 4.2. Note that the example in (vii) and Figure 6 does not contradict Theorem 1.3 because X n now is defined with other initial values than in Theorem 1.3. Nevertheless, it is a warning that monotonicity of the limit should not be taken for granted in cases such as Example 3.4 with p < 1/2 where the monotonicity assumption of Theorem 1.3 is not satisfied.
With the usual machinery (see [14]), using this numerically computed ψ in (4.4)-(4.6), we compute m_1 + w_1(x), which fits quite well with the observed periodicities of x(n) − log_3 n in Figure 2; the comparison is given in Figure 9.

If we denote by P_c(n, k) the distribution of the number of peaks in the circular case and by P_l(n, k) the distribution in the linear case, we know by Example 3.9 that P_c(n, k) = P_l(n − 1, k − 1). It is easy to check that this leads to, for n ≥ 3 and n ≥ 5, respectively,

M(n) = n/3, V(n) = 2n/45.

(This is also easy to see probabilistically, by writing Y_n as the sum of the n indicators 1[player i is a peak], and noting that indicators at distance at least 3 are independent.) The initial values are now given by P(1, 1) = 1, P(2, 1) = 1, P(3, 1) = 1, Π(0, 0) = 1, Π(1, 0) = 1, Π(2, 1) = 1, Π(3, 1) = 1.
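The relation P_c(n, k) = P_l(n − 1, k − 1) and the moment formulas can be verified exactly by brute-force enumeration for a small n (our own check, here with n = 6):

```python
from itertools import permutations
from collections import Counter

def circ_peaks(xs):
    """Circular peaks: value exceeds both circular neighbours."""
    n = len(xs)
    return sum(xs[i] > xs[i - 1] and xs[i] > xs[(i + 1) % n]
               for i in range(n))

def lin_peaks(xs):
    """Version (i) of Example 3.9: the endpoints are never peaks."""
    return sum(xs[i - 1] < xs[i] > xs[i + 1]
               for i in range(1, len(xs) - 1))

n = 6
cc = Counter(circ_peaks(p) for p in permutations(range(n)))      # circular, n
cl = Counter(lin_peaks(p) for p in permutations(range(n - 1)))   # linear, n-1
# P_c(n, k) = P_l(n-1, k-1): the counts match up to the factor n!/(n-1)! = n.
```

The enumeration also reproduces the exact mean n/3 = 2 and variance 2n/45 = 4/15 for n = 6.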
The corresponding results are given in Tables 4 and 5. (We use a subscript c for the circular case.) A plot of Λ_c(n, j) versus j − log_3 n for n = 20, . . . , 500 is given in Figure 10. We see that there are fewer scattered points than in the linear case. Let us mention that the fits with Gumbel or Gaussian distributions are equally bad. A comparison of Λ_c(n, j) with Λ_l(n, j) is given in Figure 11. No numerical relation exists between the two distributions.
Π_c(n, j) versus j − log_3 n, for n = 1, . . . , 40, is plotted in Figure 12. These initial points are now less scattered than in the linear case.
The observed versus computed periodicities are given in Figure 13.
In conclusion, apart from numerical differences, the behaviour of our two variations is quite similar.
Note that the expected number of messages needed is asymptotically 2n log_3 n, as we use 2n messages per round. Franklin [7] gives an upper bound of 2n log_2 n.