K(µ) = L(∑_i A_i · (X_i ∘ B_i) + C)

In the running time analysis of the algorithm Find and of versions of it, limiting distributions appear as solutions of stochastic fixed point equations of the form X =_D ∑_i A_i (X_i ∘ B_i) + C on the space D of cadlag functions. The distribution of the D-valued process X is invariant under a random linear affine transformation of space and a random time change. We show the existence of solutions in some generality via the Weighted Branching Process. Finite exponential moments are connected to stochastic fixed point equations of supremum type, X =_D sup_i (A_i X_i + C_i), on the positive reals. Specifically we present a running time analysis of the m-median and adapted versions of Find. The finite dimensional distributions converge in L_1 and are continuous in the cylinder coordinates. We present the optimal adapted version in the sense of a low asymptotic average number of comparisons. The limit distribution of the optimal adapted version of Find is the point measure on the function [0, 1] ∋ t → 1 + min{t, 1 − t}.


Introduction
We consider fixed points of the operator

K(µ) = L(∑_i A_i · (X_i ∘ B_i) + C)

on the set D of cadlag functions on the unit interval I = [0, 1]. The rvs ((A_i, B_i)_i, C), X_j, j ∈ N, are independent and all X_j have the distribution µ. The rvs A_i, C, X_i take values in the set D and the rvs B_i take values in D↑, mapping the unit interval to itself and piecewise increasing. The B's serve as a random time change, the A's as a space transformation and the C's as a displacement.
The very first example of such fixed points appeared in the analysis of Find, as the limit distribution of the Find algorithm [10]. Grübel and Rösler used a specific representation of the iterates K^n by some rvs with nice properties and showed the convergence of these rvs in the Skorokhod metric on D to a limit, the fixed point. This approach works for the 2-version of Find, splitting into 2 sets, but not for the 3-version, splitting into 3.
Intuition suggests some convergence result also for the 3-version. In his thesis, D. Knof [13] [24] rigorously used the contraction method. The contraction method [22], [17] suggests finding a nice complete metric space such that the operator K is a (strict) contraction on it. (The space D with the Skorokhod metric is not complete.) Knof [24] introduced a suitable nice complete metric space. He studied the finite dimensional distributions of the D-valued process and showed convergence of all finite dimensional distributions. Since the family of finite dimensional distributions is consistent, there exists a projective limit, i.e. a probability measure µ on the product space [0, ∞]^I with the given finite dimensional distributions. Another fixed point argument, this time for a fixed point equation of max-type, showed µ is a probability measure on the product space [0, ∞)^I. Next he used the contractive branching structure of the underlying Weighted Branching Process (WBP) to verify a condition given by Dubins and Hahn [6], showing D has mass 1 for the outer measure of µ. Therefore the outer measure restricted to D is the limit distribution of Find.
Studying the details and the conditions used there gave us reasons to prove fixed point results for K by using the WBP [23] and to return somehow to rvs. We consider a larger class of operators, obtain stronger results than Knof and have fewer technical difficulties with right-continuous versions. With the forward view [25] we obtain pointwise convergence of appropriate rvs to the fixed point; with the backward view we obtain, via the stochastic order, convergence of the iterates of K. This interplay of the two points of view simplifies and extends the results of Knof. Using the monotonicity, respectively the absolute convergence in our setting, of the rvs R_n, we obtain under weak conditions the convergence to a limiting rv R and also continuity in mean. Another fixed point argument shows sup_t |R(t) − R_n(t)| →_n 0, and so we were able to circumvent Knof's difficulty of showing µ is a probability measure on D.
Interesting is the appearance of a fixed point equation of max-type in all the approaches, Grübel-Rösler, Knof and this paper, in order to analyze sup_t R(t). A fixed point equation of max-type is of the form X =_D sup_i (A_i X_i + C_i), where ((A_i, C_i)_i), X_j, j ∈ N, are positive real valued and independent rvs and all X_j have the same distribution as X. The first appearance of a stochastic fixed point equation of max-type is in the analysis of Find [10]. The positive rvs U, X_1, X_2 are independent, U is uniformly distributed on the unit interval, X ∼ X_1 ∼ X_2. The argument used was via the contraction method, similar to fixed point equations of sum type, using the contraction condition E ∑_i A_i^2 < 1. A more general and systematic approach was carried out by Neininger and Rüschendorf [18]. (See also [1] for some related work.) The basic condition is a contraction constant like E ∑_i |A_i|^p < 1 for some p > 0. There are more fixed points, even if the contraction condition fails [12]. (They show a nice connection between fixed points of max-type and sum-type.) If the weights A_i, C_i are all deterministic, we know explicitly the set of all solutions of the fixed point equation (2) [2]. In this paper we essentially use a result by Rüschendorf and Schopp [26] on finite exponential moments. Their approach relies on asymptotic estimates of higher moments and the Taylor expansion of the exponential function, Theorem 8.
We come now to the applications of K-fixed points, the optimal adapted version of Find. The algorithm Find or Quickselect [11] [14] finds the l-th smallest element out of n distinct reals. The procedure is:
• Choose a pivot element by some random procedure;
• Form the list of strictly smaller numbers and the list of strictly larger numbers;
• Recall the algorithm on the appropriate list till termination.
This version is called the 3-version. The 2-version of Find forms the list of elements strictly smaller than the pivot and the list of larger elements including the pivot.
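For concreteness, the three steps above can be sketched in code (our own illustrative Python transcription, not from the paper; the pivot is uniform and we count the n − 1 key comparisons of each splitting step):

```python
import random

def find3(xs, l, rng):
    """3-version of Find (Quickselect): return the l-th smallest (1-based)
    element of the distinct reals xs together with the number of key
    comparisons.  Sketch: the pivot is chosen uniformly at random."""
    comparisons = 0
    while True:
        pivot = rng.choice(xs)
        smaller = [x for x in xs if x < pivot]
        larger = [x for x in xs if x > pivot]
        comparisons += len(xs) - 1        # n - 1 comparisons to split
        rank = len(smaller) + 1           # rank of the pivot in xs
        if l == rank:
            return pivot, comparisons
        elif l < rank:
            xs = smaller                  # recall on the smaller list
        else:
            xs, l = larger, l - rank      # recall on the larger list

rng = random.Random(1)
xs = rng.sample(range(10000), 1001)
val, comps = find3(xs, 501, rng)
```

Each round discards at least the pivot, so the loop terminates on distinct inputs.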
The divide-and-conquer algorithm Find has a complete running time analysis for the 2-version [10] under the uniform distribution of the pivot. Grübel and Rösler took appropriate versions of the running time rvs, which converged in the Skorokhod topology on the space D. However this does not work with the 3-version. Moreover, it is even shown there that the standard 3-version with uniform distribution does not converge in the Skorokhod topology.
But maybe asking for a convergence result of random variables is not the right question. For practical purposes one is mainly interested in the distribution of the running time, for given l and n. As it turns out, we obtain a limiting distribution if the quotient l/n converges to some t. It seems natural to expect some continuous behavior in t of the distribution. As explained below, only the recursive structure (4), (5) is available for the one-dimensional distribution, and it determines the one-dimensional distribution. (The one-dimensional distributions are continuous in time with respect to weak convergence.) But that is not enough to construct a nice process Y = (Y_t)_{t∈I} with the right one-dimensional distributions.
We require the stronger recursive equation (6) for the distribution of the whole process (compare [10] [7]). That recursion provides a limiting distribution of a process Y with right continuous paths. Its one-dimensional distributions do the job for the running time, including the expected continuity of the one-dimensional distributions t → L(Y_t).
Our approach given here does exactly this. The Weighted Branching Process (WBP) is the suitable framework to construct (in a mechanical way) nice versions of the rvs in a general setting, such that the processes converge weakly as processes on D. The main workload in this approach is showing the right continuity of the limiting process Y. (Similar to showing continuous paths for Brownian motion.) But this property is essential for the continuity of certain functionals and for working with the limiting distribution. It turns out that the one-dimensional distributions of Y are continuous in time.
For the approximation of the Find versions X^n to the limit Y we use the Mallows distance for probability measures on D. The infimum is taken over all processes (X, Z) with one-dimensional distributions µ, ν on some probability space. In this distance we show all finite dimensional distributions of X^n converge to those of Y and all finite dimensional distributions are continuous in the cylinder coordinates. (This does not imply weak convergence on D, since we do not prove tightness or, equivalently, the convergence of T-distributions with T ⊂ I dense and countable.) Eickmeier and Rüschendorf [7] showed L_p-convergence of the processes, averaging the time with the Lebesgue measure, to a limiting process in L_p. This approach allows no conclusion on one-dimensional distributions at some fixed time t ∈ I or on the convergence of finite dimensional distributions. We could put the process to 0 on all rational time points and still have the same process in the L_p sense. Even if we choose appropriately weaker formulations, the continuity of the one-dimensional distributions remains questionable. Intuitively, if l/n converges to some t, then the asymptotic value we obtain should be independent of how l/n converges to t locally. Therefore we need smoothness of the path, e.g. the path is in D.
Our approach is also superior to the previous one by Grübel and Rösler [10], since we can deal with the 3-version and give a very general scheme for how to choose the right versions for more general processes.
In detail, let X_n(l) be the number of comparisons in order to find the l-th smallest out of n, which is proportional to the running time of the algorithm. The correct normalization turns out to be Y_n(l/n) = X_n(l)/n. If l = l_n is a sequence such that l_n/n converges to some t ∈ [0, 1], we want Y_n(l_n/n) to converge in distribution. For the 2-version the distribution of Y_n(l_n/n) will converge to the distribution of the Find process Y at t in the unit interval [10], and, as we will show, so does the 3-version. We intend to show convergence, pointwise and as a process, for many more versions of Find. (More precisely, for all reasonable versions known to us.) We include median versions [3], [9], [16], [20], picking the pivot as the median of m randomly chosen elements, or adapted versions [19], choosing the pivot element as the k-th smallest in the set of m, where k depends on n and l. Partial Quicksort [15] sorts the l smallest ones and provides a nice example with a non-positive cost rv C.
For all these versions, the running time X_n(l) in the 3-version of Find (the 2-version is analogous) satisfies the recursive equation

X_n(l) =_D 1[l < U_n(l)] X^1_{U_n(l)−1}(l) + 1[l > U_n(l)] X^2_{n−U_n(l)}(l − U_n(l)) + C_n(l).

Here U_n(l) is the rank of the pivot element and U_n(l) − 1 is the cardinality of the set of numbers strictly smaller than the pivot element. The rv C_n(l) gives the number of comparisons to find the pivot element and then to form the two lists. A lower bound on C_n(l) is n − 1 and a reasonable upper bound is n − 1 + (m choose 2).
The terms X^1_•, X^2_• correspond to the running times for the two lists, with an index for the size. The rvs X^i_1, X^j_2, i, j < n, are assumed to be independent. The distribution of the X^j_1 and X^j_2 is given recursively by the distribution of X_j.
With the correct normalization (3) we obtain

Y_n(l/n) =_D 1[l < U_n(l)] ((U_n(l)−1)/n) Y^1_{U_n(l)−1}(l/(U_n(l)−1)) + 1[l > U_n(l)] ((n−U_n(l))/n) Y^2_{n−U_n(l)}((l−U_n(l))/(n−U_n(l))) + C_n(l)/n.

Extend Y_n appropriately to a right continuous step function. We shall show, under appropriate conditions, Y_n converges weakly as a process to a process Y with cadlag paths. Further, all one-dimensional distributions converge. We obtain a fixed point equation for the distribution, since U_n/n converges weakly to some U:

Y(t) =_D 1[t < U] U Y^1(t/U) + 1[t > U] (1 − U) Y^2((t − U)/(1 − U)) + 1.

Here U, Y^1, Y^2 are independent and Y has the same distribution as Y^1, Y^2, with values in D.
What about solutions of (6)? The distribution of the original Find process [10] in the 2-version is the smallest positive fixed point of this distributional fixed point equation with U having a uniform distribution. This characterizes the distribution. An existence proof for the 3-version is given in Knof's PhD thesis [13] in a more general setting, and also in this paper. In the last section we include the convergence of the discrete 3-Find algorithm as an example.
The median m-version was treated by Grübel [9]. He provided pictures of the limiting distribution function and gave more details for the average running time, which becomes better compared to the original Find. It seems obvious that on average the algorithm will perform better if we can choose a pivot element which leaves a smaller list for the next round (at least with very high probability). This leads to adapted versions, choosing the k-th out of m random ones, such that for l < n/2 the rank of the pivot is a little bit larger than l, and for l > n/2 a little bit smaller, with high probability. Martínez, Panario and Viola [19] take that view for various m. They also provide asymptotically best adapted versions for the average.
We come now to asymptotically best adapted Find versions. Choose an m = m_n of the order ln n and the k = k_n(l)-th smallest within the m picked ones, where k/m is close to l/n. We shall show in Theorem 4 that (Y_n(t))_t converges to the deterministic process (1 + t ∧ (1 − t))_t. We can, for example, find the median out of n with asymptotically 3n/2 comparisons. This is the very best one can do within versions of Find. We need roughly n steps comparing every element to the first pivot element, and the remaining appropriate list has cardinality roughly n/2. For that list we need at least n/2 comparisons. This sums up to at least 3n/2 comparisons. And asymptotically that is enough. The point is that from the remaining list we have to find an element with a given very small rank or a very large rank. A recall of the adapted algorithm leaves us with a very small remaining list, which is asymptotically negligible.
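This adapted strategy can be sketched as follows (an illustrative Python sketch; the sample size n^0.6 and the overshoot 4·sd are our own ad hoc choices standing in for the paper's m_n of order ln n, tuned so the pivot rank lands just past l with high probability):

```python
import math
import random

def adapted_find(xs, l, rng):
    """Adapted Find (sketch): the pivot is the k-th smallest of a random
    sample, with k/m slightly above l/n for l/n <= 1/2 (slightly below
    otherwise), so the target falls just inside the kept sublist and the
    recursion dies out quickly.  Only splitting comparisons are counted."""
    comps = 0
    while len(xs) > 1:
        n = len(xs)
        m = min(n, max(1, int(n ** 0.6)))       # pivot sample size (ad hoc)
        sample = sorted(rng.sample(xs, m))
        t = l / n
        sd = math.sqrt(t * (1.0 - t) / m)       # sampling fluctuation scale
        delta = 4.0 * sd                         # small deliberate overshoot
        target = t + delta if t <= 0.5 else t - delta
        k = min(m - 1, max(0, int(target * m)))
        pivot = sample[k]
        smaller = [x for x in xs if x < pivot]
        larger = [x for x in xs if x > pivot]
        comps += n - 1                           # splitting cost
        rank = len(smaller) + 1
        if l == rank:
            return pivot, comps
        elif l < rank:
            xs = smaller
        else:
            xs, l = larger, l - rank
    return xs[0], comps

rng = random.Random(7)
n = 200_000
xs = rng.sample(range(10 * n), n)
median, comps = adapted_find(xs, n // 2, rng)
# comps / n should come out close to 1 + 1/2, i.e. well below the
# uniform-pivot average for the median
```
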
The given argument for the median holds for every l out of n. First we need n comparisons for the reduction, and then in the best case there remains a list of cardinality l ∧ (n − l) to be searched for an element of very small or very large rank. We need at least as many comparisons as the cardinality of the set. This sums up to asymptotically n + l ∧ (n − l) comparisons plus terms of smaller order than n.
Mathematically we show weak convergence of Y_n to the deterministic function t → 1 + min{t, 1 − t}. From this and the included result EY_n(t) →_n EY(t) we obtain well known results for the average number of comparisons, of the order n + l ∧ (n − l) + o(n) ([19] and the literature cited there). With arbitrarily high probability the distribution of Y_n(t) is arbitrarily close to that of Y(t), for fixed t and sufficiently large n. Notice that this is still an average argument and a weaker statement than the worst case behavior of Y_n of the algorithm. For worst case considerations, the algorithm PICK, developed by Blum, Floyd, Pratt, Rivest and Tarjan [21] [8], shows an upper bound of 5.4305...n comparisons.

Weighted Branching Process
We first describe a weighted branching tree. The underlying graph is the tree V = ∪_{n=0}^∞ N^n of all finite sequences of natural numbers (including the empty sequence ∅) with the directed edges (v, vi), v ∈ V, i ∈ N. Let (G, *) be a measurable semigroup with a grave ḡ (ḡ * g = ḡ = g * ḡ for all g ∈ G) and a neutral element g_0 (g_0 * g = g = g * g_0 for all g ∈ G). (To every semigroup we can add, in a measurable way, a grave and a neutral element.) Let (H, +) be another measurable semigroup with grave and neutral element and let G operate on H from the left, transitively and measurably (*_H : G × H → H is associative and measurable). Every edge (v, vi) carries a random length L_{v,vi} with values in the semigroup G and every vertex v a random toll (cost) C_v with values in H. We call (V, L, C, (G, *, H, +)) (respectively (V, L, (G, *))) a Weighted Branching Process (WBP) [23] (with or without cost function) if the rvs ((L_{v,vi})_{i∈N}, C_v), v ∈ V, are iid. For convenience and simplicity *_H = *, *(f, g) = f * g = fg, and we suppress the empty sequence ∅ and ω ∈ Ω where possible. Notice the semigroup action * need not be commutative and the order of the terms may be important.
Define the path weight L_v = L_{∅,v} of the path from ∅ to v (in the graph theoretical sense) recursively by L_∅ = g_0, the neutral element, and L_{vi} = L_v * L_{v,vi}. For the length L_{v,vw} of the (unique) path from v to vw we use the corresponding definition on the tree vV. More formally, L_{v,vwi} = L_{v,vw} * L_{vw,vwi}. (This definition is compatible with the previous one. If convenient we use L^v_w = L_{v,vw}.) In our specific setting, described in a moment, we will consider the sequence

R_n = ∑_{|v|<n} L_v * C_v,  n ∈ N.

The length |v| of a vertex v is the number of its components. The root ∅ has length |∅| = 0. If everything is positive, then R_n increases pointwise in n to a limit called R. This process is our main object of interest.
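A toy simulation may make the construction concrete (our own scalar example, not the Find setting: multiplicative path weights on G = (R_+, ·), H = (R_+, +), two children per vertex, iid Uniform(0, 0.4) edge lengths and toll C_v ≡ 1):

```python
import random

def backward_Rn(n, rng):
    """One realization of R_n = sum over |v| < n of L_v * C_v for a toy
    scalar WBP: two children per vertex, iid Uniform(0, 0.4) edge lengths,
    toll C_v = 1.  Computed via the backward recursion
    R_{n+1} = C + sum_i L_i * R_n^i on the subtrees."""
    if n == 0:
        return 0.0
    return 1.0 + sum(rng.uniform(0.0, 0.4) * backward_Rn(n - 1, rng)
                     for _ in range(2))

rng = random.Random(0)
samples = [backward_Rn(10, rng) for _ in range(300)]
mean = sum(samples) / len(samples)
# E R_n = sum_{k<n} (E sum_i A_i)^k = (1 - 0.4^n) / 0.6, close to 5/3
```

Here E ∑_i A_i = 0.4 < 1, so the means stay bounded and R_n converges, matching Lemma 2 below.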
The distribution of the limit will be a fixed point (the smallest positive one) of K (8) under suitable conditions.
Let D = D(I, R) be the space of right continuous functions on the unit interval I = [0, 1] with existing left limits. We shall also use D̄ = D(I, R̄), where R̄ denotes the reals extended by −∞, +∞ with the Alexandrov one-point compactification. We will use D_+ (D̄_+) for the positive functions in D (D̄) and D↑ for the subset of all functions f : I → I in D such that for every s ∈ [0, 1) the function f is increasing on [s, t) for some t > s.
The σ-field σ(D) is the Borel σ-field of the Skorokhod metric

d(f, g) = inf_λ (‖λ − id‖ ∨ ‖f − g ∘ λ‖),

where λ ranges over the bijective and (strictly) increasing (and continuous) functions from the unit interval to itself. We consider (D, σ(D)) as a subset of (R^T, B^T), T ⊂ I dense in I and containing 1. Let M(E) be the set of all probability measures on E. Define the map K from the set M(D) (or M(D̄_+)) into itself by

K(µ) = L(∑_i A_i · (X_i ∘ B_i) + C).

(We use the convention 0 · ∞ = 0 = ∞ · 0.) The rvs (A, B, C), X_i, i ∈ N, are independent and all X_i have the distribution µ. In more elaborate form, K(µ) is the distribution of t → ∑_i A_i(t) X_i(B_i(t)) + C(t). The symbol L denotes the distribution of a rv. The map K is assumed to be well defined.
We speak of the positive case if all rvs (length, cost and starting distribution) are positive (in D̄_+), allowing the value ∞. In the positive case the operator K (8) is always well defined, since a sum of positive measurable terms is pointwise well defined and measurable in [0, ∞].
We are interested in fixed points of K obtained by iteration of K starting with the point measure δ_0 on the function 0 ∈ D, identically 0 ∈ R. With an ad hoc notation the iterates of K applied to µ are K^n(µ) = K(K^{n−1}(µ)), n ∈ N. Let us introduce a suitable notation for the iterates. Actually the iteration of the operator K corresponds to the backward view [25] of a WBP with costs, which we will exploit now.
The semigroup structure (G, *) consists of the semigroup G = D × D↑ with the operation

(a_1, b_1) * (a_2, b_2) = (a_1 · (a_2 ∘ b_1), b_2 ∘ b_1).

Here ∘ denotes the composition and · the pointwise multiplication in D (respectively D̄_+ with the convention 0 · ∞ = 0 = ∞ · 0). The neutral element is (1, id), with id the identity. The grave corresponds to (0, 0). (More precisely (0, 0) is the right sided grave and will serve as left sided grave only after a suitable definition.) The semigroup G operates transitively on the value set H = D via (a, b) * h = a · (h ∘ b). Notice the operation * is not commutative and the order of the terms is important.
In our special situation the commutative operation + on R (R_+) extends pointwise to R^I (R_+^I), the operation * is bilinear in the first coordinate on G, and the operation * is bilinear on H in the sense

(a_1 + a_2, b) * c = (a_1, b) * c + (a_2, b) * c,  (a, b) * (c_1 + c_2) = (a, b) * c_1 + (a, b) * c_2,

for all a, a_1, a_2, b as above and c, c_1, c_2 ∈ H.
The connection to the operator K is given in general by the following lemma [25].
Lemma 1 If the terms are well defined, then the distribution of R_n is the n-th iterate K^n(δ_0). The sequence (R_n)_n satisfies the backward recursion [25]

R_{n+1} = C_∅ + ∑_i L_{∅,i} * R^i_n = C_∅ + ∑_i A_i · (R^i_n ∘ B_i),   (9)

where R^i_n denotes the R_n rv of the WBP on the subtree iV.

Proof: The sum R_{n+1} is well defined in the positive case and satisfies (9). We prove the distributional result on R_n by induction on n. The induction start n = 0 is true, since R_0 ≡ 0 has the distribution δ_0. For the induction step from n to n + 1 argue by the backward recursion, noticing that the R^i_n, i ∈ N, are independent of ((L_{∅,i})_i, C_∅). The distribution of the R^i_n is K^n(δ_0) by the induction hypothesis. □

The limit and continuity in mean
In the positive case the sequence K^n(δ_0) increases in stochastic order. (The stochastic order ≤_st on probability measures on R (D, D̄) is defined by µ ≤_st ν ⇔ ∫ f dµ ≤ ∫ f dν for all positive (coordinatewise) increasing functions f.) Then by general representation theory there should be an increasing sequence of rvs with the distributions K^n(δ_0). The clue is that our sequence (R_n)_n does the job. The limit of the R_n, which exists, should be a fixed point of K by the monotonicity of the operator.
We denote by ‖f‖ the supremum norm of a function f on [0, 1] and set A := ∑_i ‖A_i‖. We use R^v_n for the R_n rv on the tree vV.
Lemma 2 In the positive setting the rvs R_n converge pointwise to a rv R satisfying

R = C + ∑_i A_i · (R^i ∘ B_i),   (10)

where R^i is the pointwise limit of the R^i_n. The distribution of R is the smallest fixed point of K in stochastic order, provided R has paths in D̄.
In the general setting assume EA < 1 and E‖C‖ < ∞. Then sup_{n∈N} E‖R_n‖ < ∞ and E‖R − R_n‖ converges to 0.
Proof: By definition R_n(ω)(t) is increasing in n for every realization ω ∈ Ω and every t in the unit interval I. Therefore the limit R(ω)(t) exists, but might be infinite somewhere. We obtain equation (10) from the backward view equation (9) by going to the limit. Therefore R satisfies the fixed point equation (10) and, provided it is D̄_+-valued, is a fixed point of K.
Let ν be another fixed point of K. Then δ_0 ≤_st ν, K^n(δ_0) ≤_st K^n(ν) = ν for all n ∈ N, and in the limit L(R) ≤_st ν.

Now assume the additional assumptions, but for the positive case. The R_n : Ω × I → R̄_+ are measurable maps with respect to the product σ-field. (Notice the rvs A_i, B_i, C are all measurable with respect to the product σ-field.) Therefore E(R_n) : I → R̄_+ is measurable, and so is ‖ER_n‖. We will show by induction

‖ER_n‖ ≤ ∑_{k<n} (EA)^k E‖C‖

for n ∈ N. This follows from the recursion, using conditional independence,

ER_{n+1}(t) = EC(t) + E ∑_i A_i(t) R^i_n(B_i(t)) ≤ E‖C‖ + EA ‖ER_n‖.

This ends the induction.
An easy consequence, by iteration, is that sup_{n∈N} ‖ER_n‖ is finite. This implies R_n(t), n ∈ N_0, t ∈ I, is a uniformly integrable family. Therefore the limits R(t) can be added to the family and sup_t ER(t) is finite.
Now to the general case. Consider Ã_i := |A_i|, C̃ := |C| and the corresponding rvs R̃_n. All rvs are positive now, and the Lemma provides finiteness of R̃ and of its first moments. The argument is easy, observing |R_{n+1} − R_n| ≤ R̃_{n+1} − R̃_n pointwise for all ω ∈ Ω. For example, the sum ER = ∑_n (ER_{n+1} − ER_n) converges absolutely. We skip the details. □

We turn to showing continuity in mean of the limit R.
A process X = (X_t)_{t∈I} is called continuous in mean if the map t → EX_t is continuous.

Theorem 3 Under the above assumptions, the rv R is continuous in mean.
Proof: Under the above assumptions R_n, n ∈ N, is well defined and all sums converge absolutely a.e. For s, t ∈ I define e(s, t) = E|R(s) − R(t)| and the corresponding quantity a(s, t) for the input rvs. We obtain by the backward view (9) an estimate for e(s, t), valid for all y. The function x → ẽ(x) is increasing. Let ẽ(0+) be the right limit of ẽ at 0. Using the assumptions we obtain the estimate for fixed y; then let x tend to 0 and afterwards y → 0. This proves ẽ(0+) = 0 = ẽ(0) and consequently the continuity of e and the mean continuity of R. □

As a consequence, under the above assumptions the maps t → ER(t) and t → E|R(t)| are continuous. Further, the map t → L(R(t)) from the unit interval to the distributions is continuous in the Wasserstein metric l_1 and in the weak topology. The Wasserstein (Mallows) l_p-metric, p ≥ 1, [4] on distributions on R is given by l_p(µ, ν) = inf ‖X − Y‖_p, using the L_p-norm. The infimum is taken over all rvs X, Y with distributions µ respectively ν on some probability space. The infimum is attained for the rvs F^{−1}(U), G^{−1}(U), with U uniformly distributed on I and F, G the distribution functions of µ and ν. (Especially for l_1: l_1(µ, ν) = ∫ |F(x) − G(x)| dx.) l_p-convergence is equivalent to convergence in distribution together with convergence of the p-th moment.
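For the l_1-metric, the optimal coupling F^{−1}(U), G^{−1}(U) translates into a one-line empirical computation (a sketch with made-up sample data):

```python
def mallows_l1(xs, ys):
    """Empirical Mallows/Wasserstein l_1 distance between two samples of
    equal size: the quantile coupling F^{-1}(U), G^{-1}(U) is optimal, so
    sorting both samples and averaging |x_(i) - y_(i)| attains the infimum."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# distance between an empirical sample and the same sample shifted by 1
sample = list(range(10))
shifted = [x + 1 for x in sample]
print(mallows_l1(sample, shifted))   # 1.0
```

Shifting a distribution by a constant c moves it by exactly |c| in l_1, which the sorted-coupling computation reproduces.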

Values in D
Our next aim, and the main task, is to show R is a D-valued (D_+-valued) rv. We show the result under the additional
Condition F: A_i(t)A_j(t) = 0 for all t ∈ I and i ≠ j, almost everywhere.

Theorem 4 Assume condition F, EA^p =: c < 1 and E‖C‖^p < ∞ for some p ≥ 1.

Then R is uniformly p-integrable and R takes values in D. R is a fixed point of K. R_n converges a.e. exponentially fast to R in supremum norm. In the positive case the distribution of R is the smallest fixed point of K restricted to M(D_+) in stochastic order.
Proof: Let us start with the positive case. The R_n : Ω × I → R̄_+ are measurable functions with respect to the product σ-field. (Notice the rvs A_i, B_i, C are all measurable with respect to the product σ-field on Ω × I.) Let the map L : M(R̄_+) → M(R̄_+) be defined by

L(µ) = L(∑_i ‖A_i‖ X_i + ‖C‖),

where ((A_i)_i, C), X_j, j ∈ N, are independent and all X_j have the distribution µ. L is well defined, since everything is positive. With Theorem 4.1 of [17] we obtain via the contraction method a fixed point µ ∈ M(R_+) of L. This fixed point has a finite p-th moment. The solution is obtained by the iteration ν_n := L^n(δ_0) →_n µ and is therefore the smallest fixed point in stochastic order.
The sequence ν_n is increasing in stochastic order. Since equation (13) implies the distribution of ‖R_n‖ is dominated by L(‖R_{n−1}‖) in stochastic order, we obtain by induction that ‖R_n‖ is bounded in stochastic order by ν_n for all n. This implies ‖R_n‖ is bounded, and the limit ‖R‖ is bounded in stochastic order by µ and is p-integrable. In turn, the rvs R_n(t) and the limit R(t) are finite and p-integrable uniformly in t.
Now consider a_n = ‖ ‖R − R_n‖ ‖_{L_p} and obtain, using conditional independence, the contraction a_{n+1} ≤ c^{1/p} a_n. Since ‖R − R_n‖ decreases and converges to 0 in L_p, we conclude it decreases pointwise to 0 a.e. By the Markov inequality we obtain P(‖R − R_n‖ > e^{−λn}) ≤ e^{λpn} a_n^p, which is summable for e^{λp} c < 1. Using Borel-Cantelli we obtain exponentially fast convergence of R_n → R in supremum norm.
Since R_n has values in D_+ and converges uniformly to R, we obtain that the limit R has values in D_+ a.e. Consequently its distribution is a fixed point of K. It is the smallest one by the monotonicity of the operator K: K^n(δ_0) increases in stochastic order in n. □

Corollary 5 Under the assumptions of Theorem 3 and Theorem 4, P(R(t−) ≠ R(t)) = 0 holds for all t ∈ I.

Proof:
The Markov inequality provides P(|R(t) − R(s)| > ε) ≤ E|R(t) − R(s)|/ε →_{s↑t} 0 by continuity in mean. The process (R(t−))_t, with left continuous paths and existing right limits, has the same finite dimensional distributions as (R(t))_t. □
Exponential moments of the fixed point
We are interested in exponential moments of R_n(t) and R(t).

Lemma 6 Assume A ≤ 1 a.e. and that for some λ > 0 there exists a constant K < ∞ such that Ee^{a|C(t)| + aK(A(t)−1)} ≤ 1 for all 0 ≤ a ≤ λ and t ∈ I. Then Ee^{a|R_n(t)|} ≤ e^{aK} for all 0 ≤ a ≤ λ, t ∈ I and n ∈ N.
Proof: The proof runs by induction. The case n = 1 is easy, since R_1 = C and therefore Ee^{a|R_1(t)|} ≤ e^{aK} by the assumption. The induction step uses the backward recursion (9) and conditional independence. □

Theorem 7 Assume A < 1 a.e. and sup_t Ee^{λ|C(t)|} < ∞ for some λ > 0, and assume R(t) is finite a.e. for all t ∈ I. Then e^{aR_n(t)} converges to e^{aR(t)} ∈ L_1 in L_1-norm, for all 0 ≤ a < λ and t ∈ I.
The uniform integrability as stated above implies Ee^{aR_n(t)} →_{n→∞} Ee^{aR(t)} < ∞ for all 0 ≤ a ≤ a_1 and t ∈ I. The convergence of the moments implies L_1-convergence. Since a_1 can be arbitrary but strictly smaller than λ, we are done. □

Exponential moments of the supremum
Now we consider ‖R‖ and explain first the structure of the proof. If R satisfies the stochastic fixed point equation (10), then under condition F

‖R‖ ≤ sup_i ‖A_i‖ ‖R^i‖ + ‖C‖.

Our following results will rely on this estimate and a theorem [26] concerning stochastic fixed points of max-type. Define the operator L : M(R_+) → M(R_+) of max-type by

L(µ) = L(sup_i (Y_i X_i + Z_i)),

where (Y_i, Z_i)_i, X_j, j ∈ N, are independent and all X_j have distribution µ. The rvs Y_i, Z_i take values in R_+. The fixed point equation X =_D sup_i (Y_i X_i + Z_i) is called a stochastic fixed point equation of max-type.
For q > 1 define the moment bound f(q). For a, p > 0 let b = b(a, p) = (pea)^{−1/p}. Let M_q(R_+) be the set of probability measures on R_+ with finite q-th moment.
Theorem 8 (Rüschendorf-Schopp) Assume E ∑_i Y_i^q < 1 for all q ≥ q_0 and for some a, p > 0

lim sup_{q→∞} f(q)/q^{1/p} ≤ b.

Then L has a unique fixed point µ = L(X) in the set M_{q_0}(R_+), with ‖X‖_q ≤ f(q) and lim sup_{q→∞} ‖X‖_q / q^{1/p} ≤ b.

Remark
The condition E ∑_i Y_i^q < 1 for all q ≥ q_0 implies ∑_i Y_i ≤ 1 a.e.

Theorem 9 Assume condition F and the conditions of Theorem 8, using Y_i = ‖A_i‖ and Z_i = ‖C‖. Then ‖R‖ is dominated in stochastic order by the smallest fixed point of L.

Proof: The operator L is well defined. L is a monotone operator and therefore L^n(δ_0) increases to the smallest fixed point ν of L, which is the unique fixed point in M_{q_0}(R_+). By induction we conclude the distribution of ‖R_n‖ is dominated in stochastic order by L^n(δ_0), which in turn is dominated by ν = L(ν). In the positive case R_n is increasing to R and ‖R_n‖ is increasing to some rv, which has to be ‖R‖. This provides the existence and finiteness of R and ‖R‖. (The argument in the general case follows the same way, since the telescoping sum R = ∑_n (R_n − R_{n−1}) converges absolutely and uniformly pointwise.) The limit ‖R‖ of the ‖R_n‖ is dominated in stochastic order by ν. We obtain the statements by Theorem 8 and the remark after it. □

Example: Find. The maximum of Find satisfies the fixed point equation

X =_D max(U X_1, (1 − U) X_2) + 1,

where U is uniform on I. Then ‖U‖_q = (q + 1)^{−1/q} and, using some asymptotics, we can choose p = 1 and any a > 0. It follows that P(X > x) ≤ e^{−ax} for x sufficiently large. All exponential moments of ‖R‖ are finite.
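The max-type iteration L^n(δ_0) can be checked numerically (a population-dynamics sketch of our own, applied to the Find maximum equation above; the light tail is visible empirically):

```python
import random

def iterate_max_type(n_samples, n_iter, rng):
    """Population dynamics for the max-type fixed point equation
    X =_D max(U X_1, (1 - U) X_2) + 1, U uniform on [0, 1]:
    start from the point mass at 0 (the iteration L^n(delta_0))
    and resample; the empirical law approximates the smallest
    fixed point."""
    xs = [0.0] * n_samples
    for _ in range(n_iter):
        new = []
        for _ in range(n_samples):
            u = rng.random()
            new.append(max(u * rng.choice(xs),
                           (1.0 - u) * rng.choice(xs)) + 1.0)
        xs = new
    return xs

rng = random.Random(0)
xs = iterate_max_type(10_000, 30, rng)
mean = sum(xs) / len(xs)
tail = sum(x > 10.0 for x in xs) / len(xs)
# the empirical tail beyond 10 is tiny, consistent with an
# exponentially decaying tail of the fixed point
```

Since the operator is monotone and starts at δ_0, the iterates increase stochastically to the smallest fixed point, mirroring the proof of Theorem 9.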

Discrete approximations
We consider recursive equations in distribution of the form (15). This gives rise to operators K_n on distributions on D or D_+, defined for n ∈ N. Consider the sequence µ_n, n ∈ N, recursively given by these operators. (We do not need (without loss of generality) starting distributions up to n_0 and start the induction afterwards, since we put the noisy part into C.) Our aim is to show, under suitable conditions, µ_n converges to L(R). As the example of 3-Find indicates, we cannot do this on the level of rvs. So on the level of distributions we shall show convergence of all finite dimensional distributions, using the Mallows l_1-distance. This is however done by choosing appropriate rvs which converge in the L_1-sense and therefore also in probability. A general way to pick appropriate rvs is via the WBP. Let ((I_v, (A^v_i)_i, (B^v_i)_i), C_v), v ∈ V, be iid. The set H is N × D (respectively N × D_+) with the transitive operation as before. Define the edge length L_{v,vi} of the edge (v, vi) by (I_v, A^v_i, B^v_i) and the vertex weight C_v of the vertex v by C_v. Let L^v_w be the recursively defined path length, where we use L^v_∅ = (id, 1, id) with id the corresponding identity as the neutral element. Define R_m(n, ·) as before. Then, supposing µ_n is well defined, R_m(n, ·) converges for fixed n to some limiting rv R(n, ·) with distribution µ_n.
Notice that we partly use the same symbols for the new weighted branching process. The old one of the previous section is contained with an additional index ∞. We shall also use R(∞, ·) for the old R(·). If there is no confusion we will drop the ∞, as in A_i(t), B_i(t), C(t) and so on. Now to the basic conditions on the rvs.
Assumption Approx: By our assumptions the first term and the third converge to 0 as n → ∞. The fourth converges as well, using the e-notation of Theorem 3. Together we obtain:
• The sequence (d_n)_n is bounded.
Here we take m = 0. From the above inequality it is easy, by some epsilontics, to show that (d_n)_n is bounded.
Since d ∞ is a positive real and EA < 1 we conclude d ∞ = 0.
As a consequence, the family R(n, t), n ∈ N, t ∈ I, is uniformly integrable.
We come now to finite dimensional distributions. Since ∑_{s∈S} l_1(L(R(n, s)), L(R(s))) converges to 0 for all finite subsets S of I, we obtain convergence of the measures µ_n to µ in terms of finite dimensional distributions. Since both measures µ_n and µ live on D (D_+), the sequence µ_n converges weakly to µ. □

For an application of Theorem 10 we have to find good versions ((A_i(n, ·))_i, (B_i(n, ·))_i, C(n, ·)). As explained in the introduction, we will now consider processes which arise in the study of versions of the Find algorithm. Since we have a discrete recursive structure, in the mathematical sense (3), (5), we consider sequences Y_n(l/n), l = 0, . . ., n, of rvs satisfying Y_0 ≡ 0 and, for all n ∈ N, the recursion (16), where N is a natural number or infinity.
(This procedure is compatible, i.e. it does not change Y_n(l/n).) A more intuitive and explicit approach would be to require equation (16) as given only for l = 0, 1, . . ., n − 1, and to change the values of B^n_i(l/n) accordingly. We shall also consider the process equation (19). The process equation (19) gives a recursion involving also the marginal distributions. The distributional equations (18) (and also the Find algorithm) make no statement on the joint distributions. Notice that therefore the process equation (19) is stronger than (18). It is an extension (or specific version) of the Find recursion.
For exponential moments we follow the same route as for R_n and R, using condition F. The task boils down to recursive equations of max-type. All necessary tools and results for those are in the paper by Rüschendorf and Schopp [26].
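The max-type recursion X =_D sup_i (A_i X_i + C_i) on the positive reals can be explored by a simple population Monte Carlo. The sketch below uses illustrative deterministic coefficients A_1 = A_2 = 1/2 and C = 1 (an assumption chosen for demonstration, not the coefficients arising from Find); for this choice the iteration contracts to the degenerate fixed point 2.

```python
import random

def iterate_max_fixed_point(pool, a=(0.5, 0.5), c=1.0, rounds=200):
    """Population Monte Carlo for the max-type recursion
        X =_D sup_i (A_i X_i + C_i),
    here with two branches, deterministic weights a = (A_1, A_2) and
    cost c (illustrative choices, not the coefficients of the paper).
    Each round replaces the pool by fresh samples of the right-hand side."""
    for _ in range(rounds):
        pool = [max(a[0] * random.choice(pool) + c,
                    a[1] * random.choice(pool) + c)
                for _ in pool]
    return pool

pool = iterate_max_fixed_point([0.0] * 1000)
mean = sum(pool) / len(pool)   # contracts to the degenerate fixed point 2
```

For random coefficients (A_i) with EA < 1 the same resampling scheme approximates the non-degenerate fixed point distribution.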

Standard Find
We will now consider versions of Find with one pivot element. (It would suffice to consider a binary tree. We take L_i as the grave for i ≥ 3 and stay on V.) For simplicity we concentrate on one version of Find, the 3-Find. The analysis of the 2-Find is similar. (Appropriately chosen realizations of the 2-Find process will converge in the Skorokhod topology [10]. This is not true for 3-Find.) We are in the positive case.
As described in the introduction, the Find algorithm consists of two main steps:
- the choice of the pivot element;
- the splitting into two subsets.
Let C_{n,1}(l) be the number of comparisons needed for choosing the pivot element and let C_{n,2}(l) be the number of comparisons needed for the splitting. Let I_n(l) be the rank of the pivot element in the considered list.
Then the recursive formula for the number X_n(l) of comparisons needed to find the lth smallest out of n is

X_n(l) =_D C_n(l) + 1_{I_n(l)>l} X_{I_n(l)−1}(l) + 1_{I_n(l)<l} X̄_{n−I_n(l)}(l − I_n(l)),

where C_n(l) = C_{n,1}(l) + C_{n,2}(l) and X, X̄ denote independent copies. (In a rigorous proof one first has to show, by induction, that the number of comparisons for finding the lth smallest element of a list is a rv with a distribution depending only on l and on the size of the list, but not on the actually given input (=list). We skip this easy induction.) As normalization we choose Y_n(l/n) := X_n(l)/n. (Other normalizations of X_n(l) would do the job as well, see the discussion at equation (18).) We obtain the corresponding recursion for Y_n.
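The splitting recursion can be made concrete with a small sketch. The following Python quickselect with a uniformly random pivot (a hypothetical helper, not the paper's exact 3-Find) counts the key comparisons X_n(l): each splitting pass over s elements costs s − 1 comparisons (the C_{n,2} part; the pivot-selection cost C_{n,1} is zero here), and the recursion descends into the sublist containing the target rank.

```python
import random

def find_comparisons(items, l):
    """Quickselect sketch with a uniformly random pivot: returns the
    lth smallest element (1-based) together with the number of key
    comparisons X_n(l). Assumes distinct keys."""
    comparisons = 0
    items = list(items)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        larger = [x for x in items if x > pivot]
        comparisons += len(items) - 1        # splitting cost, one pass
        rank = len(smaller) + 1              # I_n(l), rank of the pivot
        if l == rank:
            return pivot, comparisons
        if l < rank:
            items = smaller                  # target lies left of the pivot
        else:
            items = larger                   # target lies right of the pivot
            l -= rank

n = 1000
data = random.sample(range(10 * n), n)       # distinct keys
value, cost = find_comparisons(data, n // 2)
# Y_n(l/n) = X_n(l)/n is the normalized cost of this run
```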

Corollary 11
The above version of the Find process Y_n converges weakly to a limit Y. All finite dimensional distributions converge and are uniformly integrable. The distribution of the limit Y is a fixed point of the operator K as given above.
Proof: We have to verify the assumptions of Theorem 10. The costs C are easily controlled. We give now only the estimates involving the A_1^n part; the argument for the A_2^n part follows the same line. (Notice the symmetry of the uniform distribution around 1/2 and, for the algorithm, that the lth smallest is the (n − l + 1)th largest.) We use the notation as above. Define l_n(t) = ⌊tn⌋ + 1 and estimate, for some ε > 0, the first condition. For the third condition argue, for ε > 0 and n sufficiently large, analogously. The last condition, lim_{x→∞} lim sup_n sup_t E Σ_i A_i^n(t) 1_{I_i^n(t)<x} = 0, of Theorem 10 is obviously satisfied, and Theorem 10 provides the statement. □ The last corollary covers also the standard 3-version. Notice that the choice of I_n does not depend on l.

Optimal adapted Find
Adapted versions of the Find algorithm take a smarter approach for selecting the pivot element. This time the choice of I_n(l) will depend on l. Choose a sequence m_n → ∞ of natural numbers and choose as pivot element the k_n(l)th smallest of m_n elements drawn uniformly at random. For simplicity only, we shall take m_n odd and draw the m_n elements without replacement. (Asymptotically, other settings are equivalent and can be treated along the same lines.) We give now one adapted, asymptotically optimal version in the sense of the average number of comparisons. Choose m_n odd of the order ln n, i.e. |m_n − ln n| ≤ 1. With any reasonable algorithm we obtain the bounds n − 1 ≤ C_n ≤ n + (m_n)². Let U_i, i ∈ N, be independent rvs with uniform distribution on the unit interval. Choose ε_n(l), n ∈ N, 1 ≤ l ≤ n, satisfying lim_{n→∞} sup_{1≤l≤n} ε_n(l) = 0 and lim_{n→∞} sup_l 1/(m_n (ε_n(l))²) = 0, such that the k_n(l) defined by 0 < k_n(l)/(m_n+1) − l/n =: ε_n(l) for l ≤ n/2 and 0 ≤ l/n − k_n(l)/(m_n+1) =: ε_n(l) for l > n/2 are natural numbers. Such values exist.
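One concrete (hypothetical) way to realize such a pivot choice in code: draw m_n ≈ ln n elements without replacement, m_n odd, and take the k_n(l)th smallest of the sample, with k_n(l)/(m_n + 1) placed slightly above l/n for l ≤ n/2 and slightly below otherwise. The rounding rule below is one admissible choice, not the paper's exact ε_n(l).

```python
import math
import random

def adapted_pivot(items, l):
    """Hypothetical sketch of the adapted pivot choice for target rank l
    (1-based) in a list of n distinct items: sample m_n ~ ln n elements
    without replacement (m_n odd) and return the k_n(l)th smallest of
    the sample, where k_n(l)/(m_n + 1) lies just above l/n for
    l <= n/2 and just below l/n otherwise."""
    n = len(items)
    m = max(1, round(math.log(n)))
    if m % 2 == 0:
        m += 1                               # m_n odd, |m_n - ln n| <= 1
    m = min(m, n)
    sample = sorted(random.sample(items, m))
    if l <= n / 2:
        k = min(m, math.ceil(l / n * (m + 1)) + 1)   # k_n(l)/(m+1) > l/n
    else:
        k = max(1, math.floor(l / n * (m + 1)) - 1)  # k_n(l)/(m+1) <= l/n
    return sample[k - 1]

pivot = adapted_pivot(list(range(1000)), 500)
```

Because the pivot tends to land just beyond the target rank, the recursion keeps the target near the boundary of the surviving sublist, which is what drives the 1 + min{t, 1 − t} limit below.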
Let I_n(l) be the rank of U = U_{k_n(l):m_n} in the sequence U_1, U_2, . . ., U_n. (This is the rank of the pivot element in the considered list.) The coefficients are as in (8), and the limiting fixed point equation (8) will have the parameters given below. We give now only the estimates involving the A_1^n part and l ≤ n/2; the argument for the A_2^n part follows the same line. (Notice again the symmetry of the uniform distribution around 1/2 and also the symmetry of the algorithm around n/2.) We will use E I_n(l) = k_n(l)(n+1)/(m_n+1). Define l_n(t) = ⌊tn⌋ + 1 and notice l_n(t) ≤ n/2 when t < 1/2.

(Analogously for D.) The σ-field B^T is the product σ-field of the Borel σ-fields B on the reals. The induced σ-field on D generated by B^T is σ(D) (Billingsley [5], Theorem 14.5). The σ-fields on D_+ and D_↑ are the induced ones. Let A = (A_i)_{i∈N}, B = (B_i)_{i∈N}, C be random variables. The A_i, C take values in the set D (respectively D_+), the rvs B_i take values in D_↑. The distribution of (A, B, C) is fixed throughout the paper.
(a, b) * c := a · (c ∘ b), with the usual multiplication on D. The grave acts by sending c to 0.
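A minimal sketch of this action on functions on [0, 1] (Python stand-ins for elements of D and D_↑, with hypothetical weights chosen only for illustration): the time change b is plugged into c, the space factor a multiplies the result, and acting twice reproduces the semigroup product on G.

```python
def act(a, b, c):
    """The action (a, b) * c = a · (c ∘ b): b is the time change,
    a the space transformation; a, c map [0, 1] to the reals and
    b maps [0, 1] into itself."""
    return lambda t: a(t) * c(b(t))

# illustrative weights (hypothetical choices)
a1, b1 = (lambda t: 2.0), (lambda t: t / 2)
a2, b2 = (lambda t: t), (lambda t: 1.0 - t)
c = lambda t: t + 1.0

# acting twice equals acting once with the semigroup product
# (a1, b1)(a2, b2) = (a1 · (a2 ∘ b1), b2 ∘ b1)
lhs = act(a1, b1, act(a2, b2, c))(0.5)
rhs = act(lambda t: a1(t) * a2(b1(t)), lambda t: b2(b1(t)), c)(0.5)
```

This compatibility of the action with the product is exactly what lets path weights L_v accumulate multiplicatively along the branches of the weighted branching process.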
and the vertex weight C_v. Notice that an element (a, b) of G corresponds to a map and acts as a change of time (b) and a transformation of space (a). Define the rvs

R_n := Σ_{|v|<n} L_v * C_v,  n ∈ N_0,

as the path weighted cost up to the (n − 1)th generation. The first values of (R_n)_n are, suppressing the ∅, … continuous at s for all s ∈ I. Since R is right continuous, the Monotone Convergence Theorem (E|R| < ∞) implies right continuity of the function (s, t) → E|R(s) − R(t)|.

Theorem 3 Assume EA < 1, EC < ∞ and the maps (s, t) → E Σ_i |A_i(s) − A_i(t)| and (s, t) → E|C(s) − C(t)| are continuous. Further assume lim_{y→0} lim_{x→0} E sup_{|s−t|>x} Σ_i … Define e(s, t) := E|R(s) − R(t)|, ã(x) := sup_{|s−t|≤x} a(s, t) and analogously c̃(x), ẽ(x). The symmetric functions a and c are continuous by assumption. They are uniformly continuous on the compact set I² and vanish on the diagonal. This implies the continuity of the functions ã and c̃.
Now to the remaining case G = D × D_↑. Consider Ã_i = |A_i|, C̃ = |C| instead of A_i, C. The above theorem provides R̃_n →_n R̃ a.e. uniformly in sup norm. The sum R = Σ_n (R_n − R_{n−1}) is absolutely convergent pointwise, since |R_n − R_{n−1}| ≤ R̃_n − R̃_{n−1} pointwise (for all ω ∈ Ω) and the ˜-rvs are finite. Consequently R has values in D and is a square integrable fixed point of K. □

Extend C_n similarly as step functions with values in D. Then equation (18) remains true pointwise in t ∈ [0, 1].
is arbitrarily small for y sufficiently small and then n sufficiently large. It remains to treat the term II. Define d_{m,n} := sup_{m<i<n} d_i and r_{m,n} := d_{0,m} sup_t E(Σ_i |A_i(n, t)| 1_{I_i(n,t)≤m}).
… are independent. We shall extend Y_n suitably to a map with values in D. Extend first the A_i^n, B_i^n, C_n appropriately (as step functions or by linear extension) to functions in D. Then define Y_n recursively pointwise in t ∈ [0, 1]. The limiting fixed point equation has the parameters

A_1(t) = 1_{t<1/2} t,  A_2(t) = 1_{t≥1/2} (1 − t),  B_1(t) ≡ 1,  B_2(t) ≡ 0.

Corollary 12 Corollary 11 holds for this adapted Find version. The limit Y is a degenerate process and takes the deterministic function t → 1 + min{t, 1 − t} as its value.

Proof: We verify the assumptions of Theorem 10.
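The degenerate limit 1 + min{t, 1 − t} can be seen in a back-of-the-envelope cost count. The sketch below is an idealization (an assumption, not the paper's exact algorithm): the pivot is taken to land at relative distance eps beyond the target rank, mirroring k_n(l)/(m_n+1) = l/n ± ε_n(l), and the O((ln n)²) pivot-selection cost per pass is ignored; each splitting pass over a sublist of size s contributes s − 1 comparisons.

```python
def adapted_select_cost(n, l, eps=0.05):
    """Idealized comparison count for adapted Find: the pivot lands at
    relative distance eps beyond the target rank (above it while the
    relative rank is <= 1/2, below it otherwise); pivot-selection
    costs are ignored."""
    cost, size, rank = 0, n, l
    while size > 1:
        cost += size - 1                     # one splitting pass
        step = max(1, int(eps * size))
        if rank <= size / 2:
            pivot = min(size, rank + step)   # pivot just above the target
            if pivot == rank:
                break
            size = pivot - 1                 # recurse into the left part
        else:
            pivot = max(1, rank - step)      # pivot just below the target
            if pivot == rank:
                break
            size, rank = size - pivot, rank - pivot   # right part
    return cost

# normalized costs for targets t*n; they approach 1 + min(t, 1 - t) + O(eps)
n = 100000
ratios = {t: adapted_select_cost(n, int(t * n), eps=0.01) / n
          for t in (0.25, 0.5, 0.9)}
```

After the first pass of n − 1 comparisons the target sits near the boundary of a sublist of size about min(t, 1 − t)·n; one more full pass over that sublist leaves only an eps-fraction, whence the total of about (1 + min{t, 1 − t})·n comparisons.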