Stochastic Analysis of the k-Server Problem on the Circle

We consider a stochastic version of the $k$-server problem in which $k$ servers move on a circle to satisfy stochastically generated requests. The requests are independent and identically distributed according to an arbitrary distribution on a circle, which is either discrete or continuous. The cost of serving a request is the distance that a server needs to move to reach the request. The goal is to minimize the steady-state expected cost induced by the requests. We study the performance of a greedy strategy, focusing, in particular, on its convergence properties and the interplay between the discrete and continuous versions of the process.


Introduction
The k-server problem, first posed by Manasse, McGeoch, and Sleator [6,7], is central to the study of online algorithms. Let k ≥ 2 be an integer and consider a metric space M = (M, d), where M is a set of points with |M| > k and d is the metric over M. A k-server problem is defined by a sequence of requests, r_1, r_2, …, where each r_i ∈ M. A k-server algorithm controls k servers that reside in, and move between, points of M. A request r_i is satisfied when the algorithm moves a server to that point. The algorithm must satisfy request r_i before it sees any future request. The cost of serving a sequence of requests is the sum of the distances that the servers traverse in order to satisfy these requests. The k-server problem models a number of important online computation problems such as paging and disk scheduling [2]. So far, most studies of the k-server problem have focused on competitive analysis, culminating in Koutsoupias and Papadimitriou's [5] seminal results establishing a worst-case competitive ratio of Θ(k) for the problem.

An alternative approach to the study of online algorithms is probabilistic analysis. Instead of taking a worst-case approach, we assume that the input (in this case, the sequence of requests) follows some distribution, and we analyze the expected cost of a given algorithm over this input distribution. As in the worst-case setting, the algorithm satisfies each request without information about the sequence of future requests, and we are interested in algorithms that are not tuned to a particular input distribution.

† Supported by CNRS grant DREI 21514.

The Stochastic Model
In this work we study a stochastic version of the k-server problem in which the k mobile servers move on the circle of circumference k, which we denote by T and identify with the interval [0, k). Throughout, we focus on the long-term behavior of the process induced by the following greedy algorithm: "at each step i, send the server that is closest to the request r_i." We analyze and compare two different models. In the continuous model, requests arrive anywhere on the circle T = [0, k) according to some distribution with a continuous density. In the discrete model, requests may arrive only at ℓ discrete stations, where k < ℓ < ∞; the stations are the points of T_ℓ = {0, k/ℓ, 2k/ℓ, …, (ℓ−1)k/ℓ}, equipped with the distance induced from T. We assume that the requests are independent and identically distributed (i.i.d.) according to a distribution supported either on T (in the continuous case) or on T_ℓ (in the discrete case); their common distribution is denoted by µ or µ_ℓ, respectively. For the sake of simplicity, we will always assume that µ has a strictly positive density with respect to Lebesgue measure on T and that µ_ℓ has positive mass at every point x ∈ T_ℓ.
In the continuous setting, we can model the process as a Markov chain (S_n)_{n≥0} evolving on the state space

Σ = {p = (p_0, p_1, …, p_{k−1}) ∈ T^k : p_0, p_1, …, p_{k−1} appear in this cyclic (clockwise) order on T}.

Here p_i corresponds to the position of the ith server. In the discrete setting, the servers move only within the ℓ stations, and accordingly the state space is Σ_ℓ = Σ ∩ T_ℓ^k. To avoid ambiguity, in the case when the two closest servers to a given request r_i are equidistant from r_i, we specify that the greedy algorithm moves the server which is to the left of the request (where the circle is oriented in the usual way). In addition, if two or more servers are at the same location (i.e., we have p_i = p_{i+1} = ⋯ = p_j = x for i < j, or p_i = p_{i+1} = ⋯ = p_{k−1} = p_0 = ⋯ = p_j = x for i > j), and the new request is closer to x than to any other server, we choose server j if the request is to the left of x and server i if it is to the right of x. Note that, under these assumptions, the relative order of the servers on the circle does not change, and that, if at some point all the servers are at different locations, then they will never share a location again in the future of the process.
The performance of the greedy algorithm is measured in terms of the total distance travelled by the servers to fulfill the requests. Let d_T(x, y) = min{|x − y|, k − |x − y|} denote the arc-length distance on T, and let d be the distance on T^k defined by

d(p, q) = Σ_{i=0}^{k−1} d_T(p_i, q_i).

Recalling that S_n denotes the position of the servers after satisfying the nth request, the cost of the nth request is given by C_n = d(S_{n−1}, S_n); since the greedy algorithm moves a single server per request, this is exactly the distance travelled by that server. We are interested in the asymptotic behavior of the average cost over a run of N requests, namely, (1/N) Σ_{n=1}^N C_n.
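As an illustration, the greedy dynamics and the running average cost are easy to simulate. The sketch below (in Python, with hypothetical function names) implements the continuous model on the circle of circumference k with uniform requests and estimates the long-run average cost; for k = 2 and uniform µ this estimate should approach the value 5/18 computed later in the paper.

```python
import random

def circle_dist(a, b, k):
    """Arc-length distance d_T on the circle T = [0, k)."""
    d = abs(a - b) % k
    return min(d, k - d)

def greedy_average_cost(k, n_requests, seed=0):
    """Run the greedy policy on i.i.d. uniform requests and return
    the average per-request cost (1/N) * sum_n C_n."""
    rng = random.Random(seed)
    servers = [i + 0.5 for i in range(k)]  # arbitrary distinct initial positions
    total = 0.0
    for _ in range(n_requests):
        r = rng.uniform(0.0, k)
        # greedy rule: the server closest to the request serves it
        j = min(range(k), key=lambda i: circle_dist(servers[i], r, k))
        total += circle_dist(servers[j], r, k)
        servers[j] = r
    return total / n_requests

# for k = 2 and uniform requests, theory gives a steady-state cost of 5/18
print(greedy_average_cost(2, 200_000))
```

Ties (a request equidistant from two servers) occur with probability zero under a continuous µ, so the simple argmin above ignores the left-server tie rule.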

Convergence to a Stationary Distribution
In the discrete setting, since requests may arrive at all stations with positive probability, the finite-state Markov chain (S_n)_{n≥0} has the following structure. Let Σ* = {p ∈ Σ : the positions p_0, …, p_{k−1} are pairwise distinct}, and Σ*_ℓ = Σ* ∩ T_ℓ^k; then every state in Σ_ℓ \ Σ*_ℓ is transient, while Σ*_ℓ is the unique closed communicating class, and it is aperiodic. As a consequence (see, for example, [9]), the Markov chain (S_n) has a unique stationary distribution, say π_ℓ, supported on Σ*_ℓ; the distribution of the chain (S_n) converges to π_ℓ exponentially fast; and the law of large numbers holds. In particular, the average cost over N steps converges almost surely to the mean cost under π_ℓ, as N → ∞.
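For small k and ℓ the discrete chain can be written down explicitly. The following sketch (with uniform µ_ℓ for concreteness; all names are illustrative) builds the transition matrix of the greedy chain on the states with distinct server positions and finds the stationary distribution by power iteration:

```python
from itertools import combinations

k, ell = 2, 8
stations = [k * i / ell for i in range(ell)]  # the stations T_ell

def cdist(a, b):
    d = abs(a - b) % k
    return min(d, k - d)

def serve(state, r):
    """One greedy move; ties go to the server on the left of the request."""
    best = min(cdist(p, r) for p in state)
    tied = [p for p in state if cdist(p, r) == best]
    mover = min(tied, key=lambda p: (r - p) % k)  # smallest clockwise gap = left server
    new = list(state)
    new[new.index(mover)] = r
    return tuple(sorted(new))

# states with pairwise distinct positions (coincident states are transient)
states = [tuple(sorted(c)) for c in combinations(stations, 2)]
index = {s: i for i, s in enumerate(states)}
n = len(states)

P = [[0.0] * n for _ in range(n)]
for s in states:
    for r in stations:  # uniform mu_ell: each station with probability 1/ell
        P[index[s]][index[serve(s, r)]] += 1.0 / ell

# power iteration: the distribution of S_n converges to pi_ell
v = [1.0 / n] * n
for _ in range(500):
    v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
pi = v
```

The vector `pi` is (numerically) a fixed point of P, illustrating the unique stationary distribution and the fast convergence asserted above.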
The main contribution of this work is to obtain similar results in the continuous case. Under the assumption that µ has a density bounded away from zero, we show in Theorem 1 that the process has a unique stationary distribution π, to which it converges exponentially fast.

Theorem 1
Assume that µ has a density with respect to the Lebesgue measure on T = [0, k) that is uniformly bounded away from 0. Then the Markov chain (S_n)_{n≥0} on Σ induced by the greedy algorithm has a unique stationary distribution π, and:

• for all n ≥ 1, sup_{A, s_0} |P(S_n ∈ A | S_0 = s_0) − π(A)| ≤ c_1 e^{−c_2 n}, the supremum being taken over all measurable sets A of Σ and all s_0 ∈ Σ;

• for all N ≥ 1 and all ε > 0, P(|(1/N) Σ_{i=1}^N C_i − E_π(C)| > ε) ≤ c_3 e^{−c_4(ε) N};

where the finite constants c_1, c_2, c_3, c_4(ε) may depend on k but not on n or N; c_2 > 0 and c_4 = c_4(ε) is strictly positive for all ε > 0; C_i = d(S_{i−1}, S_i) is the cost of the ith request; and E_π(C) is the expected value of the one-step cost for a stationary version of the chain.
In view of the well-known convergence results in [8] and the more recent large deviations estimates in [4], in order to prove the theorem it suffices to verify that the Markov chain fulfills Doeblin's condition; see [8, page 391]. The proof is given in Appendix A.
In addition to the results of the theorem, since the cost function is bounded by definition, the Doeblin (or uniform ergodicity) property implies finer asymptotic results for the convergence of the averaged cost (1/N) Σ_{i=1}^N C_i. In particular, it is asymptotically normal, and the convergence to normality can be further refined via an Edgeworth expansion. Moreover, the exponential bound on the excess-cost probability can be sharpened to a precise asymptotic formula along the lines of the Bahadur-Rao refinement of the classical large deviations principle [3]. See [4] for details.

Convergence of the Discrete Model to the Continuous One
When the number of stations ℓ in the discrete model is large, we naturally expect that it should be possible to approximate the continuous model by an appropriately defined version of the discrete one. A precise asymptotic version of this statement is established in the following proposition.

Proposition 1
Suppose that the discrete request distributions µ_ℓ satisfy µ_ℓ → µ weakly, as ℓ goes to infinity, for some continuous request distribution µ on T. Then the corresponding sequence {π_ℓ} of stationary distributions for the discrete chains induced by the µ_ℓ also satisfies π_ℓ → π weakly, as ℓ → ∞, where π is the stationary distribution of the continuous chain induced by µ.
Proof: If the servers are in position p ∈ Σ and a request arrives at site x ∈ T, then the new position of the servers after service is

Q(p, x) = (p_0, …, p_{i−1}, x, p_{i+1}, …, p_{k−1}),   (2.1)

where p_i is the server nearest to x (the left such server in case of equality). The continuity set of the map Q is exactly the set of points (p, x) such that there is a unique server p_i nearest to x. Now suppose the sequences {µ_ℓ} and {π_ℓ} are as in the statement of the proposition. According to Theorem 2 below (proof omitted from this version), the stationary measures π_ℓ converge weakly to π, and so do the corresponding expected costs. Note that the assumptions of the theorem are always satisfied when µ is diffuse and, in particular, when µ has a density. □

Theorem 2
Let C_Q ⊂ Σ × T denote the continuity set of the map Q defined in equation (2.1). Assume that the Markov chain (S_n)_{n≥0} associated with the request distribution µ has a unique invariant probability measure π, and that, for every probability measure π′ on T^k, (π′ ⊗ µ)((Σ × T) \ C_Q) = 0. Then π_ℓ → π weakly and E_{π_ℓ}(C) → E_π(C), as ℓ → ∞, where E_{π_ℓ}(C) and E_π(C) denote the expected costs under the stationary versions of the corresponding chains.
We provide further insight into the correspondence between the discrete and the continuous models by constructing a coupling between the corresponding chains. Consider the continuous model with a sequence of requests (r_n)_{n≥1} that are i.i.d. with continuous distribution µ on T. Let (S_n)_{n≥0} denote the associated Markov chain on Σ and let Θ_ℓ : T → T_ℓ be the discretization map defined by

Θ_ℓ(x) = (k/ℓ) ⌊ℓx/k⌋,

i.e., the map sending each point of T to the station at or immediately to its left. Similarly, let (r^ℓ_n)_{n≥1} = (Θ_ℓ(r_n))_{n≥1} denote the corresponding discretized requests, so that the sequence (r^ℓ_n) is also i.i.d., with discrete distribution µ_ℓ given by µ_ℓ(x) = µ([x, x + k/ℓ)) for x ∈ T_ℓ. We consider the Markov chain (S^ℓ_n) defined by the initial state S^ℓ_0 = Θ_ℓ(S_0) and the sequence of requests (r^ℓ_n)_{n≥1}. Note that S^ℓ_n is in general not the same as Θ_ℓ(S_n). We assume that ℓ is large enough that the image under Θ_ℓ of the initial positions of the k servers is a set of k distinct points.
As the number of stations ℓ goes to infinity, Θ_ℓ converges to the identity map and µ_ℓ converges weakly to µ, so we naturally expect that the Markov chains (S^ℓ_n) and (Θ_ℓ(S_n)) will also be "close" for large ℓ. In order to quantify this closeness, we examine the (de-)coupling time

T_ℓ = inf{n ≥ 1 : S^ℓ_n ≠ Θ_ℓ(S_n)}.

Our next result, whose proof will appear in the full version of this work, is the following:

Proposition 2
Let δ(ε) = inf{µ([x, x + ε)) : x ∈ T} for ε > 0, and assume that δ(k/ℓ) is strictly positive. Then the distribution of T_ℓ stochastically dominates a geometric distribution with parameter kδ(k/ℓ), i.e., P(T_ℓ > n) ≥ (1 − kδ(k/ℓ))^n for all n ≥ 0.
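The coupling is straightforward to set up in code. The sketch below (hypothetical names; Θ_ℓ is implemented as the floor-type map consistent with µ_ℓ(x) = µ([x, x + k/ℓ))) drives the continuous chain and its discretization by the same request stream and records the first time they disagree:

```python
import math
import random

def theta(x, k, ell):
    """Discretization map Theta_ell: send x to the station at or to its left."""
    return (k / ell) * math.floor(ell * x / k)

def cdist(a, b, k):
    d = abs(a - b) % k
    return min(d, k - d)

def serve(state, r, k):
    """Greedy move: the nearest server travels to the request."""
    j = min(range(len(state)), key=lambda i: cdist(state[i], r, k))
    state = list(state)
    state[j] = r
    return tuple(state)

def decoupling_time(k, ell, horizon, seed=0):
    """Drive (S_n) and (S^ell_n) by the same uniform requests; return the first
    n with S^ell_n != Theta_ell(S_n), or horizon + 1 if they stay coupled."""
    rng = random.Random(seed)
    S = tuple(i + 0.25 for i in range(k))      # arbitrary distinct initial positions
    Sd = tuple(theta(p, k, ell) for p in S)    # S^ell_0 = Theta_ell(S_0)
    for n in range(1, horizon + 1):
        r = rng.uniform(0.0, k)
        S = serve(S, r, k)
        Sd = serve(Sd, theta(r, k, ell), k)    # discretized request r^ell_n
        if Sd != tuple(theta(p, k, ell) for p in S):
            return n
    return horizon + 1
```

Running `decoupling_time` for increasing ℓ gives a quick empirical check that the chains typically stay coupled longer as the discretization gets finer.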

The Case of Two Servers and Uniform Distribution
In the case of uniformly distributed requests for k = 2 servers, we show that the greedy policy is optimal and that the steady-state (expected) cost is 5/18 = 0.2777…. Consider the greedy algorithm G and an arbitrary other algorithm A. Let C^G_n (resp., C^A_n) be the cost incurred by algorithm G (resp., algorithm A) when serving the nth request, and let Z^G_n (resp., Z^A_n) be the minimum distance between the servers immediately after serving the nth request (with Z_0 = Z^A_0 = Z^G_0 = d(S_0(1), S_0(2))).

Theorem 3
If the requests are independent and uniformly distributed on T, then the greedy algorithm is optimal in that, for any algorithm A,

lim inf_{N→∞} (1/N) Σ_{n=1}^N E[C^A_n] ≥ lim_{N→∞} (1/N) Σ_{n=1}^N E[C^G_n],

regardless of the initial position Z_0 of the two servers.
The optimality of the greedy policy for k = 2 follows from the fact that serving each new request with the closest server minimizes the cost to serve the request, while at the same time maximizing the distance between the servers, resulting in a better covering of the space. The proof will appear in the full version of this work.
Finally, we examine the cost distribution induced by the greedy algorithm. Let (Y_n)_{n≥0} denote the Markov chain defined by the clockwise distance between the two servers. Due to the rotation-invariance of the uniform distribution, the following simple recurrence formula holds for the process (Y_n)_{n≥0}:

Y_{n+1} = Y_n/2 + ε_{n+1},

where the (ε_n)_{n≥1} are i.i.d. Uniform[0, 1]. This is exactly the statistical setting of an autoregressive AR(1) process; see, e.g., [8]. In particular, under appropriate assumptions on the innovation sequence (ε_n)_{n≥1}, the Markov chain (Y_n)_{n≥0} has a unique invariant distribution and it can be shown to satisfy an array of classical limit theorems, including the strong law of large numbers, the central limit theorem, the law of the iterated logarithm, and so on. Here we establish some simple properties of the limiting distance Z (the limit in distribution of Y_n), and we compute the steady-state expected cost. By induction, for every n ≥ 1,

Y_n = Y_0/2^n + Σ_{i=1}^n ε_i/2^{n−i},

and, therefore, Y_n converges in distribution to Z = Σ_{j=0}^∞ ε'_j/2^j, where the (ε'_j)_{j≥0} are again i.i.d. Uniform[0, 1]. Hence, the characteristic function φ of Z satisfies the functional equation

φ(θ) = φ_U(θ) φ(θ/2),

where φ_U(θ) = (e^{iθ} − 1)/(iθ) is the characteristic function of the uniform distribution on [0, 1]. The expected cost in this case can be computed explicitly as

E_π(C) = E[(Z² − 2Z + 2)/4] = (E[Z²] − 2E[Z] + 2)/4,

and substituting the values of the first and second moments of Z, E[Z] = 1 and E[Z²] = 10/9, gives the expected cost 5/18. From the characteristic function of Z we can also obtain additional information. Noting that φ(θ) can be rewritten as the infinite product

φ(θ) = Π_{j=0}^∞ φ_U(θ/2^j),

it is easily verified that |θ|^m φ(θ) is integrable for every m ≥ 1, so that Z admits a density f which is infinitely differentiable and satisfies

f(x) = (1/2π) ∫ e^{−iθx} φ(θ) dθ.

Moreover, the derivative of f equals

f′(x) = (1/2π) ∫ (−iθ) e^{−iθx} φ(θ) dθ.

From these expressions we deduce that, for every x ∈ [0, 2], f(x) = f(2 − x) and that the function f′ vanishes at x = 1. A closed formula for the density appears difficult to obtain.
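The recurrence Y_{n+1} = Y_n/2 + ε_{n+1}, with ε uniform on [0, 1], and the resulting moments E[Z] = 1, E[Z²] = 10/9 and cost 5/18 can all be checked numerically. A minimal sketch, simulating the gap chain directly:

```python
import random

def simulate_gap_chain(n, seed=0):
    """Simulate Y_{m+1} = Y_m / 2 + eps_{m+1}, eps ~ Uniform[0, 1]:
    the clockwise gap between the two servers on the circle of circumference 2."""
    rng = random.Random(seed)
    y, ys = 1.0, []
    for _ in range(n):
        y = y / 2.0 + rng.random()
        ys.append(y)
    return ys

ys = simulate_gap_chain(400_000)
mean = sum(ys) / len(ys)                                    # should be near E[Z] = 1
second = sum(y * y for y in ys) / len(ys)                   # should be near E[Z^2] = 10/9
cost = sum((y * y - 2 * y + 2) / 4 for y in ys) / len(ys)   # should be near 5/18
```

Here (Y² − 2Y + 2)/4 is the conditional expected cost of the next request given the current gap Y, so averaging it along the trajectory estimates the steady-state cost.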

Open Problems
We are interested in the mean cost to serve a request when it arrives with distribution µ. When the k servers are in position p = (p_0, …, p_{k−1}), the mean cost to serve a new request is given by

c(p) = ∫_T min_{0≤i<k} d_T(x, p_i) µ(dx),

and the asymptotic cost is

C(µ, k) = ∫_Σ c(p) π(dp).

• Scaling the problem with respect to k, so that k servers move on a circle of circumference k, how does C(µ, k) behave as a function of k? Figure 1 gives simulation results for the uniform distribution, showing that C(uniform, k) is monotonically increasing with k.
• When k = 2, is it possible to characterize the steady-state distribution π when µ is not the uniform distribution?
• Which parts of the above analysis extend to the case of servers moving on the surface of a ball?
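The per-configuration mean cost c(p) above is easy to evaluate numerically for uniform µ. The sketch below (illustrative names; a plain Riemann sum) computes c(p); note that for equally spaced servers it gives the value 1/4 for every k, so the growth of C(uniform, k) seen in the simulations must come from the stationary spread of the configurations, not from the best-case configuration:

```python
def circle_dist(a, b, k):
    """Arc-length distance on the circle of circumference k."""
    d = abs(a - b) % k
    return min(d, k - d)

def mean_cost(servers, k, grid=20_000):
    """Approximate c(p) = int_T min_i d_T(x, p_i) mu(dx) for uniform mu
    by a midpoint Riemann sum; each grid point carries mass 1/grid."""
    total = 0.0
    for j in range(grid):
        x = k * (j + 0.5) / grid
        total += min(circle_dist(x, p, k) for p in servers)
    return total / grid

# equally spaced servers: each unit gap contributes an average distance of 1/4
for k in (2, 3, 5):
    p = [i + 0.5 for i in range(k)]
    print(k, mean_cost(p, k))
```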

A Proof of Theorem 1
Recall that a Markov chain with state space Σ and transition kernel P(·, ·) satisfies Doeblin's condition [8, page 391] if there exist a probability measure λ on Σ, an integer m ≥ 1, and ε ∈ (0, 1), such that P^m(p, S) ≥ ελ(S) for every p ∈ Σ and every measurable S ⊂ Σ.
We now define the probability measure λ that we will use. For δ < 1/8 arbitrary but fixed, define the intervals I_i = [i − δ, i + δ), i = 0, 1, …, k − 1, and let Y_0, Y_1, …, Y_{k−1} be independent random variables, with each Y_i distributed according to µ conditioned on I_i. Then λ is the probability measure on Σ induced by the joint distribution of Y_0, Y_1, …, Y_{k−1}, so that, in particular, for any set S of the form S = S_0 × S_1 × ⋯ × S_{k−1} ⊂ Σ, where each S_i ⊂ T:

λ(S) = Π_{i=0}^{k−1} µ(S_i ∩ I_i)/µ(I_i).   (A.1)

Notice that for the verification of the Doeblin condition it suffices to consider sets of the form S = S_0 × S_1 × ⋯ × S_{k−1} with S_i ⊂ I_i. To that end we need to show that, for any initial position p of the servers, there is a sequence of requests, of probability at least ελ(S), under which the greedy algorithm sends every server to a final position p′ ∈ S in exactly m moves. We will establish this by describing a procedure (defined by a sequence of requests) that leads the greedy algorithm to move the servers to S.
The basic steps of the procedure are the following:

1. Send each server i to the corresponding interval I_i;
2. Perform some additional moves until m − k moves have been made;
3. Send each server to its final position in k moves.
Step 2 is necessary, since Doeblin's condition requires that the greedy algorithm reach the final position in exactly m moves. Below, each step is described and analyzed in detail. First recall that p = (p_0, p_1, …, p_{k−1}), where each p_i is the position of the ith server, and define the clockwise "distance" function d̃(x, y) = (y − x) mod k, for x, y ∈ T. We also define:

• d̃_i = d̃(p_i, p_{i+1}), the clockwise distance between servers i and i + 1.
• A function A that maps two points of the circle to the arc between them: A(x, y) = {z ∈ T : d̃(x, z) < d̃(x, y)}, the clockwise arc from x to y.

Finally, let c > 0 be a constant such that the density f of µ with respect to the Lebesgue measure on T satisfies inf_{x∈T} f(x) ≥ c.

Lemma 1
For any initial position of the servers, the greedy algorithm will send each server i to I_i in at most 12k² − 8k moves with probability at least (c/4k)^{12k²−9k} (2δc/k)^k.
Proof: We describe a procedure, defined by a sequence of requests, under which the greedy algorithm sends each server i to I_i. The probability that this procedure is followed is at least (c/4k)^{12k²−9k} (2δc/k)^k.
Procedure P:
1.1. Send server 0 to I_0.
1.2. Send each server i, i ≠ 0, to I_i.
We describe and analyze each step separately.

Step 1.1.

Lemma 2
Step 1.1 sends server 0 into I_0 in at most 8k² − 4k moves, with probability at least (c/4k)^{8k²−4k−1} (2δc/k).

Proof: We define a potential function φ(p) ≥ 0, built from the clockwise distances d̃(p_0, 0) and d̃(p_i, p_0) and vanishing precisely when p_0 ∈ I_0; here 1_A denotes the indicator function of the event A. We will show the following fact:

Fact 3. At each move, φ(p) either becomes 0 with probability at least (2δc)/k and step 1.1 ends, or decreases by at least 1/4 with probability at least c/4k.
Move of type 1. The interval I_0 is entirely contained in the section of the circle served by server 0 according to the greedy algorithm, so server 0 will move inside I_0 with probability at least (2δc)/k.
Move of type 2. The gap d̃_ℓ is the maximum gap between two consecutive servers, so d̃_ℓ ≥ 1. Thus, there is probability at least (d̃_ℓ c)/4k ≥ c/4k that a request will fall in the highlighted arc A(p_ℓ + d̃_ℓ/4, p_ℓ + d̃_ℓ/2), decreasing the potential φ by at least d̃_ℓ/4 ≥ 1/4.

In order to show Fact 3, we consider the two types of moves of the procedure described earlier. In the case of a type-1 move, the probability that it is executed is (2δc)/k and step 1.1 terminates. For a type-2 move we consider the following two cases:

• ℓ ≠ 0: In this case, notice that the total length of the circle is k and the total number of gaps is k. Therefore the length of the largest gap d̃_ℓ is at least 1. Hence the probability of the move is at least c/4k, and the distance d̃(p_ℓ, p_0) decreases by at least 1/4 (with the rest of the distances remaining the same), resulting in a decrease of φ(p) by at least 1/4.

• ℓ = 0: Again using the previous reasoning, the move is performed with probability at least c/4k. If server 0 moves into I_0, φ(p) becomes zero (notice that the probability c/4k ≥ (2δc)/k for δ ≤ 1/8) and step 1.1 ends. Otherwise, assume that server 0 moves by t ≥ 1/4. Then d̃(p_0, 0) decreases by t and every other distance d̃(p_i, p_0) increases by t, so the value of φ(p) decreases by at least t ≥ 1/4.

From the above facts we deduce that the total number of moves performed in step 1.1 cannot be more than 8k² − 4k, and they all have probability at least c/4k except possibly for the last one, which has probability at least (2δc)/k. Hence the probability of the first server moving inside I_0 is at least (c/4k)^{8k²−4k−1} (2δc/k). □

Step 1.2.
To send each server i to I_i, we define the following recursive procedure:

Procedure P2(p_s, p_{s+1}, …, p_t):
  if there exists a j, s < j < t, such that p_j ∈ I_j then
    P2(p_s, p_{s+1}, …, p_j)
    P2(p_j, p_{j+1}, …, p_t)
  else
    find a good server j   {to be defined next}
    send server j to I_j   {using the procedure we describe next}
    P2(p_s, p_{s+1}, …, p_j)
    P2(p_j, p_{j+1}, …, p_t)
  end if

We can now define what we mean by a "good" server; informally, server j is good if there are no other servers between server j and the corresponding interval I_j, and the rest of the servers are all sufficiently far from I_j. An example is shown in Figure 3. Formally: a server j is good if, for every i ≠ j, either p_i ∈ A(p_j, j) and d̃(I_j, p_i) > 1 − 2δ (left type), or p_i ∈ A(j, p_j) and d̃(p_i, I_j) > 1 − 2δ (right type).

Lemma 3
In every call P2(p_s, p_{s+1}, …, p_t) in which no server j with s < j < t is already in its interval I_j, there exists a good server j with s < j < t.

Proof:
The proof is by induction on the difference t − s. The base cases t − s = 2 and t − s = 3 are easy to verify by enumerating all possible positions of the server(s) between s and t.

Assume now that the lemma holds for all differences r < t − s. We will show that it also holds for t − s. We consider three cases; in the last of them, similarly to the first case, we have that server t − 1 is a good server (of left type), and we select j = t − 1.

Fig. 3: An example of a good server of left type. There are no servers between server j (at position p_j) and the interval I_j, and the distance (to the right) of all the other servers from I_j is greater than 1 − 2δ.

□
Having proven Lemma 3, we can now describe the procedure that sends the good server j to I_j:

if j is a left-type good server then
  loop
    if p_j ∈ I_j then
      stop
    else if I_j ⊂ A(p_j, p_j + d̃_j/2) then
      send server j to I_j   {probability ≥ (2δc)/k}
    else
      send server j to A(p_j + d̃_j/4, p_j + d̃_j/2)   {probability ≥ c/4k}
    end if
  end loop
else {j is a right-type good server}
  loop
    if p_j ∈ I_j then
      stop
    else if I_j ⊂ A(p_j − d̃_{j−1}/2, p_j) then
      send server j to I_j   {probability ≥ (2δc)/k}
    else
      send server j to A(p_j − d̃_{j−1}/2, p_j − d̃_{j−1}/4)   {probability ≥ c/4k}
    end if
  end loop
end if

Notice that, at each move, either j enters I_j (with probability at least (2δc)/k) or moves by a distance of at least d̃_j/4 ≥ 1/4 (or d̃_{j−1}/4 ≥ 1/4) with probability at least c/4k and remains a good server. Since the distance between j and I_j cannot be more than k, the total number of moves required to move j into I_j is bounded by 4k, and their total probability is at least (c/4k)^{4k−1} (2δc/k). Also note that we can execute the procedure P2 for s = 0 and t = k (≡ 0 (mod k)) in order to send all the servers to their corresponding intervals.
Lemma 4
Step 1.2 sends every server i, i ≠ 0, into its interval I_i in at most 4k(k − 1) moves, with probability at least ((c/4k)^{4k−1} (2δc/k))^{k−1}.

Proof: The procedure sending the good server j to I_j is executed at most k − 1 times. Each execution requires at most 4k moves and takes place with probability at least (c/4k)^{4k−1} (2δc/k). □

Combining Lemmas 2 and 4 completes the proof of Lemma 1. □

Step 2.
For any possible initial configuration of the servers and any sequence of requests complying with step 1, the moves of step 2 can be performed. However, as mentioned above, in order to satisfy Doeblin's condition we must show that we can reach the final configuration in exactly m moves, whereas for step 1 we only gave an upper bound of 12k² − 8k on the total number of moves. Hence, additional moves may be required in order to reach exactly that bound. The additional moves that we allow are induced by requests that fall close to the intervals I_i. It is easy to see that, for any sequence of requests falling close to the intervals in this way, none of the servers moves far away from its interval, which allows them at the end to move close to their final positions. Each such request takes place with probability at least c/2, which is higher than that of any single request of step 1; consequently, the lower bound on the probability that we gave in step 1 holds even when fewer than 12k² − 8k moves are performed during step 1 and the rest are performed during the current step.
Step 3.
The last k requests send the servers to a final configuration in S = S_0 × S_1 × ⋯ × S_{k−1}. Specifically, every server i enters S_i ⊂ I_i, as argued in step 2.
Consider now the probability that the last k requests send each server i to S_i. The number of possible orders in which to send the servers is k!, and each order occurs with probability Π_{i=0}^{k−1} µ(S_i), so the total probability is at least

k! Π_{i=0}^{k−1} µ(S_i) = k! Π_{i=0}^{k−1} µ(S_i ∩ I_i) ≥ k! (2δc/k)^k λ(S),

using (A.1) together with the bound µ(I_i) ≥ 2δc/k (recall that S_i ⊂ I_i).
Taking into account all the moves of the three steps and the corresponding probabilities, we conclude that Doeblin's condition holds with m = 12k² − 7k and ε = k! (c/4k)^{12k²−9k} (2δc/k)^{2k}. □