Statistical properties of general Markov dynamical sources: applications to information theory

In (V), statistical properties of words generated by dynamical sources are studied using generalized Ruelle operators. The aim of this article is to enlarge the class of sources for which the results hold. First, we avoid the use of Grothendieck theory and Fredholm determinants; this allows dynamical sources that cannot be extended to a complex disk or that are not analytic. Second, we consider general Markov sources: the language generated by the source over an alphabet M is not necessarily M*.


Introduction
Statistical properties of words describe the asymptotic behavior (or laws) of parameters such as "most probable prefixes" or "coincidence probability". These analyses have many applications in the analysis of algorithms, pattern matching, the study of tries and the optimization of algorithms. Of course, statistical properties of words heavily depend on the way the words are produced.
In information theory, a source is a mechanism which emits symbols from an alphabet M (finite or countably infinite) to produce (infinite) words. The two simplest "classical" models are memoryless sources, where each symbol is emitted independently of the previous ones, and Markov chains, where the probability for a symbol to be emitted depends on a bounded part of the past. Sources encountered in practical situations are usually complex mechanisms, and one needs general models to study the statistical properties of emitted words (e.g. the distribution of the prefixes of a fixed length) and the parameters of the sources (e.g. entropy). In (V1), B. Vallée introduces a model of probabilistic dynamical source which is based upon dynamical systems theory. It covers classical source models (memoryless sources, some Markov chains) and some other processes with unbounded dependency on past history.
A probabilistic dynamical source consists of two parts: a dynamical system on the unit interval I = [0, 1], representing the mechanism which produces words, and a probability measure. More precisely, a dynamical source is defined by an alphabet M, a partition of I into intervals (I_m)_{m∈M}, a coding map σ equal to m on I_m, and a map T : I → I whose restrictions T|I_m are the branches. Let f be a probability density on I. Words on the alphabet M are produced in the following way: first, x ∈ I is chosen at random with respect to the probability of density f; second, the infinite word M(x) = (σ(x), σ(Tx), σ(T²x), ...) is emitted.
The main tool in the analysis of such sources is a "generating operator", the generalized Ruelle operator, which depends on a complex parameter s and acts on a suitable Banach space. To derive results about the source, this operator must have a simple dominant eigenvalue λ(s) defined for s in a neighborhood of the real axis. Thus some additional hypotheses on the mapping T are needed. For example, in the context of (V1), the branches T|I_m need to be real analytic with a holomorphic extension to a complex neighborhood of [0, 1] and complete (i.e. T(I_m) = I), and they need to satisfy a bounded distortion property (see (C,M,V)).
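The production of a word from a point x can be sketched in a few lines. The example below is only an illustration under stated assumptions: the binary shift T(x) = 2x mod 1 with σ(x) equal to the first binary digit of x is our choice (it corresponds to the unbiased memoryless source on two symbols), and the names `emit_prefix`, `sigma_dyadic` and `T_dyadic` are hypothetical. Exact rationals are used so that the orbit is not polluted by rounding.

```python
from fractions import Fraction


def emit_prefix(x, k, T, sigma):
    """Return the first k symbols (sigma(x), sigma(Tx), ..., sigma(T^(k-1) x))."""
    word = []
    for _ in range(k):
        word.append(sigma(x))  # symbol attached to the interval containing x
        x = T(x)               # shift: move to the next point of the orbit
    return word


# Illustrative system (an assumption, not taken from the paper):
# I_0 = [0, 1/2), I_1 = [1/2, 1], T(x) = 2x - sigma(x).
def sigma_dyadic(x):
    return 0 if x < Fraction(1, 2) else 1


def T_dyadic(x):
    return 2 * x - sigma_dyadic(x)


if __name__ == "__main__":
    # 5/8 = 0.101 in base 2
    print(emit_prefix(Fraction(5, 8), 3, T_dyadic, sigma_dyadic))
```

Any other pair (T, σ) can be passed to `emit_prefix`; only the dyadic example is fixed here.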
Such sources produce the set M* of all the words on the alphabet M. The analyticity of T makes it possible to use the powerful Grothendieck theory and Fredholm theory of operators on spaces of holomorphic functions. The aim of this work is to prove that the hypotheses of analyticity and completeness may be relaxed. We extend the results of (V1) to a larger class of P-Markov sources (see Definition 1). Our class contains various examples of interest such as Markov sources on a finite alphabet, Markov sources with finitely many images or Markov sources with large images (see Section 2.1 and Figure 1).
The dominant eigenvalue function s → λ(s) is involved in all the results of the paper. In particular, statistical properties of words emitted by the source depend on λ(s):
• the number B(x) of finite words whose probability is at least x satisfies B(x) ≃ −1/(λ′(1) x) if the source is not conjugate to some source with affine branches.
• let ℓ_k(x) be the probability of words having the same prefix of length k as x. This random variable follows asymptotically a log-normal law provided that the function s → log λ(s) is not affine.
• the random variable C(x, y), which is the length of the longest common prefix of the two words associated to x, y ∈ [0, 1], follows asymptotically a geometric law with ratio equal to λ(2) if x and y are drawn independently.
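The last item can be checked by hand in the simplest case. The sketch below assumes a memoryless source with symbol probabilities p_m and uniform density, for which λ(s) = Σ_m p_m^s and the fundamental measure of a word is the product of the probabilities of its symbols; these formulas are the classical ones for memoryless sources and are stated here as assumptions. The probability that two independent words share a prefix of length k is then the sum of the squared fundamental measures, which should equal λ(2)^k. The function names are hypothetical.

```python
from itertools import product
from math import isclose, prod


def lam(probs, s):
    """lambda(s) for a memoryless source: sum of p_m^s over the alphabet."""
    return sum(p ** s for p in probs)


def prefix_sum(probs, k, s):
    """Sum over all words w of length k of (probability of w)^s."""
    return sum(prod(word) ** s for word in product(probs, repeat=k))


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    k = 4
    # P(C >= k) = sum of squared fundamental measures = lambda(2)^k
    print(prefix_sum(probs, k, 2), lam(probs, 2) ** k)
```

The enumeration is exponential in k and is only meant as a check of the identity on small cases.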
These results, proven by B. Vallée for holomorphic dynamical sources, remain valid in our setting and are explicitly stated in the following main theorem. Before stating the main theorem, let us recall that two dynamical systems T, T̃ : I → I are conjugate if there exists a homeomorphism g of I such that g ∘ T = T̃ ∘ g. Roughly speaking, from a measurable dynamical point of view, if g is piecewise C^1 the systems are the same.
Theorem Consider a P-Markov source and a probability density f which is bounded and Lipschitz on each I_m, with uniformly bounded Lipschitz constants. Then there exists an analytic function s → λ(s) on a complex neighborhood of ℜ(s) ≥ 1 such that:
• Either there exist α > 1 and a sequence of integers (k_m)_{m∈M} such that the map T is conjugated to a piecewise affine map with slopes α^{k_m} on I_m, with the conjugacy C^{1+Lip} on each I_m. In that case, there exists A, B such that x .
• The variable C follows asymptotically a geometric law with ratio equal to λ(2) if x and y are drawn independently.
As an immediate corollary we can give an answer to Conjecture 2 of (V1).
Corollary Exceptional sources are those for which there exist α > 1 and a sequence of integers (k_m)_{m∈M} such that the map T is conjugated to a piecewise affine map (not necessarily complete) with slopes α^{k_m} on I_m, with the conjugacy C^{1+Lip} on each I_m.
As a consequence of the proof of main theorem, we solve Conjecture 1 of (V1) (see Remark 4).
Let us quickly present the strategy underlying the proof of the previous theorem. Important objects involved in the analysis of the sources are fundamental intervals: given a prefix h of length k ∈ N, the set of words starting with this prefix is an interval in [0, 1], the fundamental interval associated to h. Its measure (with respect to the probability density f) is denoted by u_h. It is not difficult to prove that all the studied quantities can be expressed in terms of the Dirichlet series of the fundamental measures: Λ_k(F, s) = Σ_{h∈L_k} u_h^s and Λ(F, s) = Σ_k Λ_k(F, s), where L_k is the set of prefixes of length k (Lemma 2.1). For P-Markov sources, these series define holomorphic functions of the variable s which admit a meromorphic extension to a half plane. Next we prove that these series can be expressed in terms of the generalized Ruelle operator. A careful study of the spectral properties of Ruelle operators is then used to describe the singularities of the Dirichlet series. Finally, parameters of the source are derived by means of "classical" techniques: Tauberian theorems and Mellin transforms. This last part is exactly the same as in (V1).
A further application is the analysis of the trie (or digital tree) structure. Tries are tree data structures widely used to implement searches in a dictionary. They are constructed from a finite set X = {w_1, ..., w_n} of words independently generated by a source. The nodes of the trie are used to manage the search in the dictionary, and each leaf contains a single word of the dictionary. Formally, given a finite alphabet M = {a_1, ..., a_r}, the trie associated to X is defined recursively by trie(X) = ⟨trie(X \ a_1), ..., trie(X \ a_r)⟩, where X \ a_i is the subset of X consisting of words which begin with a_i, with their first symbol a_i removed. The recursion is halted as soon as X contains fewer than 2 elements (see Figure 2). We are concerned with the standard parameters of trees: for example, size, path length, height. The structure of tries has been intensively studied in the setting of independent sources (see (Sz) for example). The analysis of trie structures has been done recently in the setting of complete holomorphic sources by J. Clément, P. Flajolet and B. Vallée ((C,F,V), (C)): roughly speaking, the expected values of the size, path length and height of tries can be expressed in terms of fundamental measures of the source and of Dirichlet series of fundamental measures. Thus the asymptotic behavior of these parameters is deduced from the spectral properties of some generalized Ruelle operators related to the source: some of these operators are defined over Banach spaces of functions of 4 variables. The definitions and spectral properties of these operators immediately extend to our setting.
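The recursive definition of trie(X) translates directly into code. The following is a minimal sketch, not the construction used in (C,F,V): finite strings stand in for infinite words, so we assume that the words are distinct and that none is a prefix of another. We also assume the usual conventions that the size counts internal nodes, the path length is the sum of the depths of the leaves, and the height is the largest leaf depth. All function names are hypothetical.

```python
def build_trie(words, alphabet):
    """trie(X): None if X is empty, the remaining suffix if X is a singleton,
    otherwise one subtrie per symbol of the alphabet."""
    if len(words) == 0:
        return None
    if len(words) == 1:
        return words[0]  # leaf holding a single word
    # X \ a: words beginning with a, with that first symbol removed
    return {a: build_trie([w[1:] for w in words if w[0] == a], alphabet)
            for a in alphabet}


def size(trie):
    """Number of internal nodes."""
    if not isinstance(trie, dict):
        return 0
    return 1 + sum(size(child) for child in trie.values())


def path_length(trie, depth=0):
    """Sum of the depths of all leaves."""
    if trie is None:
        return 0
    if not isinstance(trie, dict):
        return depth
    return sum(path_length(child, depth + 1) for child in trie.values())


def height(trie, depth=0):
    """Largest depth of a leaf."""
    if trie is None:
        return 0
    if not isinstance(trie, dict):
        return depth
    return max(height(child, depth + 1) for child in trie.values())


if __name__ == "__main__":
    t = build_trie(["aab", "abb", "bab"], "ab")
    print(size(t), path_length(t), height(t))
```

With words drawn from a simulated source, averaging these three functions over many samples gives an empirical counterpart of the expected values studied in the text.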
Theorem Let S be a P-Markov source. Denote by S^[n], P^[n], H^[n] the size, the path length and the height of a trie constructed over n independently drawn words of S. The asymptotic expected values (when n → ∞) of these parameters are given by E[S^[n]] ≃ n / h(S), E[P^[n]] ≃ n log n / h(S), E[H^[n]] ≃ 2 log n / |log c(S)|, where h(S) is the entropy of the source and c(S) is the coincidence probability of S.
The paper is organized as follows. In Section 2, we give precise definitions and statements of results. In Section 3, we analyze the parameters of the source assuming some spectral properties of generalized Ruelle operators associated to our sources. In Section 2.1 we consider some general classes of systems that satisfy our hypotheses and give some specific examples (in particular we exhibit a source that satisfies our hypotheses but does not admit a complex extension). Finally, Section 4 contains the proof of the spectral properties.
Acknowledgments: We are grateful to B. Vallée, P. Flajolet and J. Clément for interesting us in the theory of dynamical sources and for fruitful discussions. Many of these discussions were made possible thanks to partial financial support from the ALEA project.

Dynamical sources, intrinsic parameters and transfer operators
The following definition of dynamical sources extends that of B. Vallée. We try to give the minimal conditions ensuring that the generalized Ruelle operator associated to such a source is quasi-compact on a "natural" Banach space. We call these sources P-Markov dynamical sources (for positive Markov dynamical sources).
Definition 1 A dynamical P-Markov source is defined by the four following elements:
Remark 1 (see the definition of operators G_s in Section 2.2) The first part of condition (d2) is sufficient to have that the sum defining G_s Id converges uniformly. Because the source is not necessarily complete, it does not imply the second part of condition (d2).
Condition (d5) is a bit stronger than (d2); it implies that for all m ∈ M, there exists N ∈ N such that:
Remark 2 If the alphabet M is infinite then Condition (d2) is equivalent to: If the alphabet is finite then Condition (d2) is always satisfied.
[Figure panels: a Markov source; a non Markov source]
Such a source produces words on the alphabet M: to each x ∈ I we associate the infinite word M(x) = (σ(x), σ(Tx), σ(T²x), ...). We denote by L_k the subset of M^k of prefixes of length k that may be produced by the dynamical source. Note that in our setting, L_k may be a strict subset of M^k. For example in Figure 3, the word bc does not belong to L_2. In the following, each element of L_k will be identified with an inverse branch of T^k.
Note that P-Markov sources are a generalization of memoryless and classical Markov sources. Indeed, if the inverse branches h_m are affine (or equivalently if h′_m is constant) and complete (i.e. J_m = I) then the symbols emitted σ(x), σ(Tx), ... are independent (i.e. the source is memoryless). If the inverse branches are affine but not complete then the symbols emitted σ(x), σ(Tx), ... form a Markov chain (see Figure 4). We are now in position to express the positivity condition (d5).
Condition 1 For all m ∈ M, for all s > γ, there exists N ∈ N such that
This condition is related to the aperiodicity condition of classical Markov chains. Indeed, in the context of (infinite) Markov chains on an alphabet M, let P be the (infinite) transition matrix. Then for s = 1, Condition (2.2) is equivalent to the following: for all m ∈ M, there exists N ∈ N such that the infimum of the coefficients of the mth column of the matrix P^N is strictly positive. If the alphabet M is finite this is equivalent to: there exists N ∈ N such that all the coefficients of the matrix P^N are strictly positive (i.e. the Markov chain is aperiodic). This point of view is developed in Section 2.1.1 below.
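For a finite alphabet, the condition that some power P^N has only strictly positive coefficients can be tested mechanically. The sketch below is an illustration and not part of the paper: it works on the support (zero/non-zero pattern) of P and stops at Wielandt's classical bound (n − 1)² + 1, beyond which no new positive power can appear for an n × n matrix. The function name is hypothetical.

```python
def first_positive_power(P):
    """Return the smallest N such that all entries of P^N are > 0,
    or None if no such N exists. P is a square non-negative matrix."""
    n = len(P)
    pattern = [[x > 0 for x in row] for row in P]  # support of P
    power = pattern                                # support of P^N, N = 1
    # Wielandt: if some power is positive, P^((n-1)^2 + 1) already is.
    for N in range(1, (n - 1) ** 2 + 2):
        if all(all(row) for row in power):
            return N
        power = [[any(power[i][k] and pattern[k][j] for k in range(n))
                  for j in range(n)]
                 for i in range(n)]
    return None


if __name__ == "__main__":
    print(first_positive_power([[0.5, 0.5], [1.0, 0.0]]))  # positive at N = 2
    print(first_positive_power([[0.0, 1.0], [1.0, 0.0]]))  # periodic chain
```

Only the support matters here, so the same test applies to the matrices with entries p_ij^s for any real s.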
Let us give some examples of sources satisfying our hypothesis.

Examples of P-Markov sources.
It is straightforward that complete holomorphic sources with bounded distortion ((V1), (C,M,V)) are P-Markov dynamical sources.

Some examples.
Let us give some large classes of sources satisfying our hypotheses. The simplest class is given by finite aperiodic Markov maps. Let us recall that a Markov map (i.e. a dynamical system satisfying (d4)) is strongly aperiodic if there exists M ∈ N such that for any i, j ∈ M, for any n ≥ M, The strong aperiodicity condition is natural in the context of Markov maps (in some sense it means that the system is not decomposable). It may be rewritten in terms of inverse branches as: there exists M ∈ N such that for n ≥ M, for any i, j ∈ M, there exists h ∈ L_n with I_i ⊂ J_h and h(J_h) ⊂ I_j. Let us show that it suffices to ensure (d5) if the alphabet is finite, if the number of images is finite or if the system has large images.
Example 1 If M is finite and the system is strongly aperiodic then it defines a P-Markov source.
Indeed, the only point to verify is (d5). The aperiodicity condition implies that for all n ≥ M, all x ∈ I and m ∈ M, there exists h ∈ L_n with x ∈ J_h and I_h ⊂ I_m. Thus we have: Note that Markov chains on a finite alphabet may always be obtained from an affine dynamical source. Thus, aperiodic Markov chains are P-Markov sources.
Example 2 If the set {J_m / m ∈ M} is finite and the system is strongly aperiodic then it defines a P-Markov source provided (d2) and (d3) are satisfied. Indeed, let J_{i_1}, ..., J_{i_k} be the images of the system. The strong aperiodicity condition implies that for all n ≥ M, all m ∈ M and all j = 1, ..., k, there exists h_{i_j} ∈ L_n such that h_{i_j}(J_{i_j}) ⊂ I_m. Now, .
We say that a source has large images if
Example 3 If the source has large images and is strongly aperiodic then it defines a P-Markov source provided (d2) and (d3) are satisfied.
It suffices to remark that if the source has large images and is strongly aperiodic then there exist finitely many J_m whose union is I. Then the same argument as above shows that (d5) is satisfied.

A P-Markov source with small branches.
For This source is represented in Figure 5. So far, we have emphasized that our hypotheses allow various geometric behaviors of the branches; let us now give an example showing that relaxing the holomorphic extension hypothesis of (V1) is a substantial gain.

A P-Markov source with no extension on a complex neighborhood.
Consider the source whose alphabet is N * and inverse branches are given by where and C n is a constant defined by For all n ∈ N, the branch Hence this source does not satisfy condition (d2) in (V1): there is no complex neighborhood of [0, 1] on which all the h ′ n extend to a non vanishing function.Note that for any n, n and for n sufficiently large, It follows that there exists γ < 1 such that the series From previous inequalities, it results that for any x, y ∈ [0, 1], so that the source is a P-Markov dynamical source.

Intrinsic parameters and transfer operators.
Recall that a function f on a metric space (X, d) is Lipschitz if there exists L ≥ 0 such that for all x, y ∈ X, |f(x) − f(y)| ≤ L d(x, y). The smallest constant L satisfying this property is called the Lipschitz constant of f. The following definition introduces the notion of fundamental measures and the main parameters of the source ((V1)).

Definition 2 Fundamental measures and parameters of the source
Let f > 0 be a probability density on I which is bounded and Lipschitz on each I_m with bounded Lipschitz constants, and let F be its associated distribution function. The fundamental measures are: Let B(x) be the number of fundamental intervals whose measure is at least equal to x.
For k ∈ N*, ℓ_k is the random variable defined by
Finally, C is the random variable on I × I defined by
The Dirichlet series of fundamental measures are:
Lemma 2.1 (V1) The parameters of the source may be expressed in terms of Dirichlet series of fundamental measures:
In (V1), the asymptotic behavior of Dirichlet series is obtained from spectral properties of generalized Ruelle operators associated to some analytic sources satisfying L_k = M^k for all k. In this paper, we prove that generalized Ruelle operators associated to P-Markov sources have the same dominant spectral properties. We relate Dirichlet series to these operators in our setting, so the analysis of the parameters of the source remains valid. Generalized Ruelle operators 𝐆_s involve secants of inverse branches and are defined by We are going to prove that these operators are quasi-compact with a unique and simple dominant eigenvalue λ(s) that coincides with the dominant eigenvalue of the "classical" Ruelle operator G_s:
Recall that the spectrum Sp(P) of a linear operator P acting on a Banach space B is the set of complex numbers λ such that P − λ Id is not invertible. Such a spectral value λ may be either an eigenvalue (i.e. P − λ Id is not injective) or such that P − λ Id is not surjective. The spectral radius R(P) is the largest modulus of an element of Sp(P). If an operator P is compact, the elements of Sp(P) \ {0} are eigenvalues of finite multiplicity. An operator P is quasi-compact if there exists 0 < ε < R(P) such that the elements of Sp(P) \ B(0, ε) are eigenvalues of finite multiplicity. The smallest such ε is called the essential spectral radius.
Note that condition (d2) ensures that the operator G_s is well defined for ℜ(s) > γ on bounded functions. Condition (d2) together with the Taylor formula ensures that the operators 𝐆_s are well defined for ℜ(s) > γ on bounded functions. Also, it is easy to see that: where H_h is the secant function associated to h. In our setting, the relation between Dirichlet series and Ruelle operators is given by the following proposition.
Proof. − For any m ∈ M, we have: . Now, any h ∈ L_{k+1} may be uniquely written as h = h̃ ∘ h_m for some h̃ ∈ L_k and m ∈ M.
Our main theorem extends B. Vallée's results to P-Markov dynamical sources.
Theorem 2.3 Consider a dynamical P-Markov source.There exist λ(s) > 0, Φ(s) > 0 and 0 ≤ ρ(s) < 1 three analytic functions on a complex neighborhood of the half line {s ∈ R / s > γ} such that for any k ≥ 1, is the dominant eigenvalue of G s on a suitable functional space.
Λ(F, s) is analytic on R (s) > 1 and has a simple pole at s = 1.
The variable C follows asymptotically a geometric law. If λ′′(1) − λ′(1)² ≠ 0 then the variable log ℓ_k follows asymptotically a normal law. Moreover, λ′′(1) − λ′(1)² = 0 if and only if the map T is conjugated to a piecewise affine map with equal slopes, the conjugacy being C^{1+Lip} on each I_m.
Either 1 is the only pole of Λ(F, s) on ℜ(s) = 1, in that case or the map T is conjugated to a piecewise affine map with slopes of the form α^k, α > 1, k ∈ Z, with conjugacy C^{1+Lip} on each I_m. In that case, there exist A, B,
Theorem 2.3 is derived from dominant spectral properties of generalized real Ruelle operators. We will prove that these operators admit a unique maximal eigenvalue. To this aim, we use Birkhoff cones and projective metrics ((Bi1), (Bi2)). These techniques have been introduced in dynamical systems by P. Ferrero and B. Schmitt ((F,S)) and have been widely used by dynamicists to study Ruelle operators in many different situations. Here, we will use these techniques to prove that both operators 𝐆_s and G_s are quasi-compact and have a unique and simple dominant eigenvalue, for real s > γ. We will give the proofs for 𝐆_s; the proofs for G_s may be obtained in the same way. Even for the operators G_s, our setting is not covered by previous works (see for example (Bre), (M), (Sa)).
Of course the spectral properties of the operators 𝐆_s and G_s depend on the space on which they act.
Because the system is not assumed to be complete (i.e. we do not assume J_m = I for all m ∈ M), the operators 𝐆_s and G_s do not act on continuous functions. A function f is Lipschitz continuous on I_m if there exists a constant K_m > 0 such that for all x, y ∈ I_m, |f(x) − f(y)| ≤ K_m |x − y|. The smallest number K_m such that the above is satisfied is called the Lipschitz constant of f on I_m. Let It is easy to see (and will in fact follow from Lemma 3.2) that G_s (resp. 𝐆_s) acts on L_pw(I) (resp. L_pw(J)).
Theorem 2.4 For real s > γ, the operators 𝐆_s (resp. G_s) act on L_pw(J) (resp. L_pw(I)); they are quasi-compact and have a simple dominant eigenvalue. This dominant eigenvalue λ(s) is the same for 𝐆_s and G_s. The corresponding eigenvectors are strictly positive and belong to L_pw(J) (resp. L_pw(I)).
Remark 3 If the source were complete (i.e. J_m = I for all m) and the density function f were C^1 on I, then we could work with spaces of C^1 functions. In that case, 𝐆_s acts on the space C^1(I × I) of functions that are C^1 on I × I and G_s acts on the space C^1(I) of functions that are C^1 on I; they are quasi-compact and have a simple dominant eigenvalue. This dominant eigenvalue λ(s) is the same for 𝐆_s and G_s. The corresponding eigenvectors are strictly positive and belong to C^1(I × I) (resp. C^1(I)). The only change in our proof would be in the definition of the cone in Section 2.4 (see Remark 8).
We postpone the proof of Theorem 2.4 to the end of the paper (see Section 4). Let us show how to use it to get Theorem 2.3.
3 Analysis of the parameters of the source

Preliminary results
The following lemma is an easy application of the chain rule, (d3) and the fact that all h_m, Applying the integral Taylor formula at order 1 to h, the Taylor formula at order 1 to h′ and Lemma 3.1 gives: for all k ∈ N*, for all h The following lemma proves that the operators 𝐆_s, ℜ(s) > γ, satisfy a "Doeblin-Fortet" or "Lasota-Yorke" inequality. We are going to use a result by H. Hennion ((H)) to conclude that they are quasi-compact for some complex s, ℜ(s) > γ. We could also use it to conclude that the 𝐆_s are quasi-compact for real s > γ; it would then remain to prove that the dominant eigenvalue is unique and simple. This can be done "by hand" but we have preferred to give a self-contained argument proving at the same time the quasi-compactness and the dominant spectral property (see Section 4).
Lemma 3.2 For all s, R (s) = σ > γ, there exists K > 0 such that for all f ∈ L pw (J ), for all n ∈ N, Proof.− Let X = (x, x ′ ), Y = (y, y ′ ) belong to the same I m × I m .In that case, the sets {h / |h| = n and X ∈ J h × J h } and {h / |h| = n and Y ∈ J h × J h } are the same.We compute: (we have used (3.1)).This gives the result with K = σB(1 + Be σ ).
Let us state Hennion's theorem and show that we can apply it.

Theorem 3.3 ((H)) Let (B, ‖·‖) be a Banach space, let |·| be another norm on B and Q be an operator on (B, ‖·‖), with spectral radius R(Q). If Q satisfies: 1. Q is compact from (B, ‖·‖) into (B, |·|), 2. for all n ∈ N, there exist positive numbers R_n and r_n such that r = lim inf(r_n) then Q is quasi-compact and the essential spectral radius is less than r.
We will use this theorem with B = L_pw(J) and |·| the sup norm. According to Lemma 3.2, in order to apply Theorem 3.3, we have to prove that the operators 𝐆_s are compact from (L_pw(J), ‖·‖) into (L_pw(J), ‖·‖_∞).
In other words, consider a sequence (f_n)_{n∈N}, f_n ∈ L_pw(J) with ‖f_n‖ ≤ 1; we have to prove that there exists a subsequence n_k such that the sequence (𝐆_s f_{n_k}) converges for the sup norm ‖·‖_∞. This will follow from Remark 2.
Lemma 3.4 For all s such that R with f n ≤ 1, restricted to each I m × I m the functions f n are uniformly equicontinuous.We may apply Ascoli's theorem on each I m × I m and use a diagonal principle to find a subsequence n k such that the sequence f n k converges to some function f .Let us prove that G s f n k converges uniformly to G s f .Denote (this can be done because the convergence is uniform on each I m × I m and Q is finite).We have: In other words, G s f n k goes to G s f uniformly.Now the following result is a simple consequence of Theorem 3.3.For any s, R(s) denotes the spectral radius of G s .
The result follows.
To conclude the proof of Theorem 2.4, it remains to prove that for real s > γ, G s admits a unique simple dominant eigenvalue.We postpone this proof to section 4. Let us use Theorem 2.4 and Proposition 3.5 to obtain spectral properties of G s for complex parameters s.

Spectral properties for complex parameters s and properties of Dirichlet series
For real s > γ, by Theorem 2.4, we have that for any k ∈ N, f ∈ L_pw(J), where Π_s is the spectral projection on the maximal eigenvalue and S_s is an operator on L_pw(J) whose spectral radius is strictly less than λ(s) and such that S_s ∘ Π_s = Π_s ∘ S_s = 0. Now Proposition 2.2 gives: ) and ρ(s) the spectral radius of S_s over λ(s). Note that we have used that converges which follows from (d2). Thus we have proved (2.3) of Theorem 2.3 for real s. The fact that it holds on a complex neighborhood of s > γ follows from perturbation theory (see for example Kato (K)).
We now prove Proposition 8, Proposition 9 and Proposition 10 of (V1) in our context. Note that her proofs are based upon Fredholm determinant theory, thus we have to use other arguments. Also, some changes are due to the fact that we work with functions f that are continuous on each I_m but not on I. In particular, in general there does not exist x ∈ I such that f(x) = sup_I f.
Proposition 3.6 1. The function s → λ(s) is strictly decreasing along the real axis s > γ.
3. If R(s) = λ(σ) for s = σ + it then 𝐆_s has an eigenvalue λ = e^{ia} λ(σ), a ∈ R, that belongs to the spectrum of G_s.
To prove item 2, it suffices to remark that for Finally, if R(s) = λ(σ) then by Proposition 3.5, the operator 𝐆_s is quasi-compact and thus admits an eigenvalue λ = e^{ia} λ(σ) of modulus λ(σ). Let Ψ_s be such that 𝐆_s Ψ_s = λΨ_s and ψ_s(x) = Ψ_s(x, x). Then G_s ψ_s = λψ_s.
Let us study the spectral properties of G_s for ℜ(s) = 1. Let us remark that for any distribution F, we have Thus λ(1) = 1. In what follows, let ϕ_1 ∈ L_pw(I) denote the eigenfunction of G_1 corresponding to the maximal eigenvalue λ(1) = 1 and satisfying m(ϕ_1) = 1. Then the measure ν = ϕ_1 m is T-invariant.
Proposition 3.7 Let ℜ(s) = 1. The operator may behave in two different ways.
1. Either for all s ≠ 1 with ℜ(s) = 1, R(s) < 1 (the aperiodic case),
2. or the set of t ∈ R such that 1 belongs to the spectrum of G_{1+it} is of the form t_0 Z for some t_0 (the periodic case). In that case, the map T is conjugated to a piecewise affine map with slopes of the form α^k, α > 1, the conjugacy being C^{1+Lip} on each I_m. Moreover, there exists σ_0 < 1 such that on the
Proof. − Let s = 1 + it and assume that 1 belongs to the spectrum of G_{1+it}. Then using Proposition 3.6 we have that there exists Recall that the Lebesgue measure is invariant by G_1 so that As a consequence, inequality (3.4) must be an equality. Now, because of Theorem 2.4, 1 is simple as an eigenvalue of G_1. Thus, let x) , multiplying if necessary f_1 by some constant, we may assume that |µ| ≡ 1. Following B. Vallée's proof of Proposition 9, we obtain that for all m ∈ M, x ∈ J_m, (3.5) Conversely, let t be such that there exists a function µ satisfying (3.5) for all m In other words, we have proved that 1 belongs to the spectrum of G_{1+it} if and only if there exists a function µ satisfying (3.5) for all m ∈ M. This implies that the set of real t such that 1 belongs to the spectrum of It cannot accumulate at 0 because of the analyticity of s → λ(s) near s = 1. Thus it is of the form t_0 Z. There exists a real function θ ∈ L_pw(I) such that µ = e^{iθ} (recall that µ = f/f_1 ∈ L_pw(I)); take φ = exp(θ/t) and α = exp(2π/t). Equation (3.5) becomes: where k(x) ∈ Z and is constant on each I_m, and finally, equation (3.5) may be rewritten as: Now, we may find constants c_m and d_m, m ∈ M, such that the function
Let us prove the existence of a strip free of poles. There exists γ < σ_1 < 1 such that for any σ ∈ ]σ_1, 1[, the operator G_σ has no eigenvalue of modulus 1. Let σ_1 < σ_0 < 1 be such that δλ(σ) < 1 for all σ > σ_0. Let σ ∈ ]σ_0, 1[ and s = σ + iτ. Proposition 3.5 implies that either 𝐆_s is quasi-compact or R(s) < 1 (in this last case 1 does not belong to the spectrum of 𝐆_s). So assume that 𝐆_s is quasi-compact. If 1 is in the spectrum of 𝐆_s, then it is an eigenvalue of 𝐆_s (Theorem 3.3) and of G_s. There exists f ∈ L_pw(I) such that G_s(f) = f. Using that α^{i k t_0} = 1 for any integer k, one deduces that 1 is an eigenvalue of the operators G_{σ+i(τ+kt_0)} for any k ∈ Z. It follows that if there is no strip free of poles, then some of the points of the line ℜ(s) = 1 are accumulated by a sequence of poles of Λ(F, s). This is a contradiction since Λ(F, s) is a meromorphic function in a neighborhood of ℜ(s) = 1.
We now prove the log-convexity of s → λ(s). Such a property is necessary to study the random variable log ℓ_k.
Proposition 3.8 The function s → log λ(s) is convex. Either it is strictly convex or it is affine. In this last case, the map T is conjugated to a piecewise affine map with slopes all equal. The conjugacy is C^{1+Lip} on each I_m.
Proof. − We have to prove that for t ∈ [0, 1] and s > γ, where f_σ denotes a dominant eigenfunction of G_σ. We may normalize ψ to have sup_I ψ = 1. Consider a sequence x_n ∈ I such that ψ(x_n) → 1.
(3.8) follows from the Hölder inequality. Taking the limit when n → ∞ gives (3.6). Since λ is analytic, if equality holds in (3.6) for some s, s′, t then log λ is affine. In this last case, it remains to prove that the map T is conjugated to a piecewise affine map with slopes all equal.
Assume that log λ is affine; then there exists a < 1 such that λ(s) = a^{s−1}. Choose s, s′, t such that ts is a dominant eigenfunction of G_1. The Hölder inequality implies that As in the proof of Proposition 3.7, we use that G_1 leaves Lebesgue measure invariant to conclude that As a consequence, ψ ≡ 1 and equality holds in (3.7) for all x ∈ I. This implies that there exists a function k : Summing over h ∈ M and noting φ(x) = and then T satisfies a cocycle relation: (3.9) Following the end of the proof of Proposition 3.7, we conclude that T is conjugated to a piecewise affine map with slopes all equal to 1/a.
Remark 4 By the way, the cocycle argument used in the proofs of Propositions 3.7 and 3.8
Figure 7 shows relations between sources conjugated to piecewise affine sources. Let U(s) = log λ(s). With Propositions 3.6, 3.7, 3.8 the analysis of the parameters of the source done in Sections 7, 8, 9 of (V1) applies to our setting without any change. To conclude the proof of Theorem 2.3, it remains to verify that if the source is not log-affine then U′′(1) ≠ 0. This is necessary to apply Hwang's quasi-powers theorem and obtain the central limit theorem.
4 Spectral properties of real generalized Ruelle operators
The aim of this section is to prove Theorem 2.4. Let us recall definitions and properties of cones and projective metrics (see (L) or (L,S,V) for a complete presentation).

Cones and projective metrics
The theory of cones and projective metrics of G. Birkhoff (Bi1) is a powerful tool to study linear operators. P. Ferrero and B. Schmitt (F,S) applied it to estimate the correlation decay for random compositions of dynamical systems.
Definition 3 Let V be a vector space.A subset C ⊂ V which enjoys the following four properties is called a convex cone.
We now define the Hilbert metric on C : Definition 4 The distance d C ( f , g) between two points f , g in C is given by where α and β are defined as where we take α = 0 or β = ∞ when the corresponding sets are empty.
Remark 5 In the sequel we will use that β( f , g) = α(g, f ).
The distance d C is a pseudo-metric, because two elements can be at an infinite distance from each other, and it is a projective metric because any two proportional elements have a null distance.
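These properties can be observed numerically in a finite-dimensional stand-in. The sketch below is an illustration only: it takes the cone of vectors with positive entries in R^n, for which the Hilbert distance is the logarithm of the largest ratio f_i/g_i over the smallest, and it uses the classical statement of Birkhoff's theorem that a positive matrix contracts this distance by the factor tanh(Δ/4), where Δ is the diameter of the image of the cone (attained on pairs of columns). These formulas and the function names are assumptions of the example, not notation from the paper.

```python
import math


def hilbert_distance(f, g):
    """Hilbert projective distance between two vectors with positive entries."""
    ratios = [fi / gi for fi, gi in zip(f, g)]
    return math.log(max(ratios) / min(ratios))


def apply(matrix, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def birkhoff_coefficient(matrix):
    """tanh(Delta / 4), Delta being the largest distance between two columns."""
    columns = [list(col) for col in zip(*matrix)]
    delta = max(hilbert_distance(c, d) for c in columns for d in columns)
    return math.tanh(delta / 4)


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0]]
    f, g = [1.0, 2.0], [3.0, 1.0]
    print(hilbert_distance(f, [2 * x for x in f]))      # proportional vectors
    print(hilbert_distance(apply(A, f), apply(A, g)),
          birkhoff_coefficient(A) * hilbert_distance(f, g))
```

Vectors with a zero entry lie on the boundary of the cone and are at infinite distance from interior points; the sketch does not handle them.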
Given two linearly independent elements f, g ∈ C, consider the intersection of C with the two-dimensional vector space spanned by f and g. Its boundary is the union of two half lines ℓ_1, ℓ_2. The distance d_C(f, g) is the log of the cross-ratio of the four half lines ℓ_1, ℓ_2, f, g (see Figure 8).
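To make the definition concrete, the following minimal sketch computes the Hilbert metric in a finite-dimensional stand-in for the cone of positive functions, namely vectors of R^n with positive entries. The function name hilbert_distance and the sample vectors are illustrative assumptions, not objects from the paper. In this cone, α(f, g) is the smallest ratio g_i/f_i and β(f, g) the largest one.

```python
import math


def hilbert_distance(f, g):
    """Hilbert projective distance between two vectors with positive entries."""
    ratios = [gi / fi for fi, gi in zip(f, g)]
    # alpha: largest t such that g - t*f stays in the cone (smallest ratio);
    # beta: smallest t such that t*f - g stays in the cone (largest ratio).
    alpha, beta = min(ratios), max(ratios)
    return math.log(beta / alpha)


f = [1.0, 2.0, 4.0]
g = [2.0, 2.0, 2.0]

print(hilbert_distance(f, g))                    # log(2 / 0.5) = log 4
print(hilbert_distance(g, f))                    # same value: the distance is symmetric
print(hilbert_distance(f, [3 * x for x in f]))   # proportional vectors: 0.0
```

The last line illustrates why d_C is only a projective metric: it cannot distinguish f from any positive multiple of f.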
Remark 6 For example, if V is a space of real-valued functions and C_+ is the cone of positive functions, then an easy computation gives:
d_{C_+}(f, g) = log sup_{x, y} [ f(x) g(y) ] / [ f(y) g(x) ].

Fig. 8: Projective metric

Definition 5 Let V be a vector space and C ⊂ V a convex cone. A linear operator L : V → V is positive (with respect to C) if L(C) ⊂ C.

The next theorem, due to G. Birkhoff (Bi2), shows that every positive linear operator is a contraction, provided that the diameter of the image is finite.
Theorem 4.1 Let V be a vector space, C ⊂ V a convex cone (see definition above) and L : V → V a positive linear operator. Let d_C be the Hilbert metric associated to the cone C. If we denote
(apply Theorem 4.1 with L = Id). In particular, if

Theorem 4.1 alone is not completely satisfactory: given a cone C and its metric d_C, we need to relate the distance d_C with a suitable norm on V. The following lemma provides such a relation.
and let ℓ : C → R + be a homogeneous and order preserving function, i.e.
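Birkhoff's contraction property can be checked numerically in a toy setting. The sketch below assumes the usual form of the inequality, d_C(Lf, Lg) ≤ tanh(Δ/4) d_C(f, g), where Δ is the d_C-diameter of L(C). It also assumes the cone of positive vectors of R^3 and a hypothetical 3 × 3 matrix with positive entries standing in for the operator. For such a matrix, the diameter of the image is attained on pairs of columns (a known closed form, taken here as an assumption of the sketch).

```python
import math
import random


def hilbert_distance(f, g):
    """Hilbert projective distance between two vectors with positive entries."""
    ratios = [gi / fi for fi, gi in zip(f, g)]
    return math.log(max(ratios) / min(ratios))


def apply(matrix, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def image_diameter(matrix):
    """Projective diameter of the image of the positive cone: max over column pairs."""
    n = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n)]
    return max(hilbert_distance(c1, c2) for c1 in cols for c2 in cols)


# Hypothetical positive operator (illustration only).
L = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 2.0]]

delta = image_diameter(L)
rate = math.tanh(delta / 4)  # Birkhoff contraction coefficient, strictly below 1

rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    f = [rng.uniform(0.1, 10.0) for _ in range(3)]
    g = [rng.uniform(0.1, 10.0) for _ in range(3)]
    ratio = hilbert_distance(apply(L, f), apply(L, g)) / hilbert_distance(f, g)
    worst = max(worst, ratio)

print("contraction coefficient:", rate)
print("worst observed ratio:   ", worst)
```

Every observed ratio stays below the coefficient tanh(Δ/4), which is the mechanism exploited in the proof of Theorem 2.4 once a cone with finite image diameter has been found.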

Proof of Theorem 2.4
We deduce (using Lemma 4.2 with ‖·‖_∞ as homogeneous form) that the sequence of lines generated by (G_s^n f)_{n∈N} is a Cauchy sequence and converges to a line generated by an eigenvector Ψ_s. This eigenvector corresponds to an eigenvalue λ(s). On the other hand, we construct an eigenvector ν_s for the dual operator. Then Lemma 4.2 (applied with ν_s as homogeneous form) and equation (4.1) give that
goes to zero exponentially fast for any f ∈ C. Then we have to extend this result from the cone to the Banach space of piecewise Lipschitz functions (this is done using Lemma 4.9 below). The fact that (4.2) goes to zero exponentially fast implies that λ(s) is the unique dominant eigenvalue of G_s.
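The strategy sketched above has a familiar finite-dimensional analogue: iterating a positive matrix on a positive vector and renormalizing with a homogeneous, order-preserving form. In the sketch below, the 2 × 2 matrix and the choice of the sum of the entries as the form are assumptions made for illustration. They play the roles of G_s and of ν_s respectively and are not taken from the paper.

```python
import math


def hilbert_distance(f, g):
    """Hilbert projective distance between two vectors with positive entries."""
    ratios = [gi / fi for fi, gi in zip(f, g)]
    return math.log(max(ratios) / min(ratios))


def apply(matrix, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def normalize(v):
    """Rescale v so that the homogeneous form (here: sum of entries) equals 1."""
    total = sum(v)
    return [x / total for x in v]


# Hypothetical positive operator; its dominant eigenvalue is (5 + sqrt(5)) / 2.
L = [[2.0, 1.0],
     [1.0, 3.0]]

v = normalize([1.0, 0.25])
distances = []
for _ in range(40):
    w = normalize(apply(L, v))
    distances.append(hilbert_distance(v, w))  # projective distance between successive lines
    v = w

# Since sum(v) == 1, the form applied to L v estimates the dominant eigenvalue.
eigenvalue = sum(apply(L, v))

print("first distances:", distances[:4])
print("eigenvalue estimate:", eigenvalue)
```

The projective distances between successive iterates decay geometrically, so the lines they generate form a Cauchy sequence, and the limit direction is the dominant eigenvector.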
The following lemma proves the existence of a real positive eigenvalue for the dual operator of G s .
The corresponding eigenvector is indeed a measure. Recall that if V is a Banach space, its topological dual V′ is endowed with the weak topology, that is: a sequence (ν_n)_{n∈N} of elements of V′ converges to ν ∈ V′ if and only if for any f ∈ V, the sequence (ν_n(f))_{n∈N} converges to ν(f). Also, if L is a continuous linear operator on V then it defines a continuous linear operator L′ on V′ by: for ν ∈ V′ and f ∈ V, L′ν(f) = ν(Lf).

Lemma 4.3 There exists a measure ν_s on J and a positive number λ(s) such that for f ∈ L_pw(J), ν_s(G_s f) = λ(s) ν_s(f).

condition 4 below which controls the functions on the complement of a well chosen finite part Q of M. Let Q be a finite subset of M such that:

The existence of such a subset Q follows from (2.1). For a > 0, b > 0, let C_{a,b}(s) be the set of functions f on J such that:

Remark 8 As mentioned in Remark 3, if the source is complete, G_s acts on the space C^1(I × I) of functions that are C^1 on I × I. To get the dominant eigenvalue result on this space, it suffices to replace item 1 in the definition of the cone above by "f ∈ C^1(I × I)".

1. β > g(x, x′) / f(x, x′) for all (x, x′) ∈ J.
2. β > [e^{a d(X,Y)} g(y, y′) − g(x, x′)] / [e^{a d(X,Y)} f(y, y′) − f(x, x′)] := u(X, Y) for all (x, x′), (y, y′) ∈ J.

If (2.2) is satisfied for m ∈ Q and N, then (2.2) is also satisfied for m and kN for all k ∈ N*. Since Q is finite, we may take for M a common multiple.
Sublemma 4.8 Let M be given by (4.5). There exist constants K_1, K_2 such that for any k ≥ M, for any f ∈ C_{a,b}(s), for all X ∈ J,
K_2 ν_s(G_s^k f) ≤ G_s^k f(X) ≤ K_1 ν_s(G_s^k f).

The following lemma shows that any function in L_pw(J) may be pushed into the cone C_{a,b}(s).
Lemma 4.9 There exists K_3 > 0 satisfying: for any function f ∈ L_pw(J), there exists R(f) > 0 such that R(f) + f ∈ C_{a,b}(s) and R( f

(a) A finite or infinite countable alphabet M.
(b) A topological partition of I := [0, 1] into disjoint open intervals I_m, m ∈ M, i.e. I = ∪_{m∈M} I_m.
(c) A mapping σ which is constant and equal to m on each I_m.
(d) A mapping T whose restriction to each I_m is a C^2 bijection from I_m to T(I_m) = J_m.

1365-8050 © 2004 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

Fig. 2: An example of trie

(a) An alphabet M, finite or infinite countable.
(b) A topological partition of I := [0, 1] with disjoint open intervals I_m, m ∈ M, i.e. I = ∪_{m∈M} I_m, I_m = ]a_m, b_m[.
(c) A mapping σ which is constant and equal to m on each I_m.
(d) A mapping T whose restriction to each I_m is a C^2 bijection from I_m to T(I_m) = J_m.

Let h_m : J_m → I_m be the local inverse of T restricted to I_m. The mappings h_m satisfy the following conditions:
(d1) Contracting. There exist 0 < η_m ≤ δ_m < 1 for which η_m ≤ |h′_m(x)| ≤ δ_m for x ∈ J_m.
(d2) There exists γ < 1 such that for ℜ(s) > γ, the series Σ_{m∈M} 1_{J_m}(x) δ_m^s converges uniformly for x ∈ I and Σ_{m∈M} |I_m|^s converges.
(d3) Bounded distortion. There exists a constant A < +∞ such that for all m ∈ M and all x, y ∈ J_m,
(d4) Markov property. Each interval J_m is a union of some of the I_k's.
(d5) Positivity. See Condition 1 below.
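The way a source satisfying (a) to (d) emits a word can be sketched on the simplest example. The assumptions here are illustrative and not taken from the paper: a binary alphabet M = {0, 1}, the intervals I_0 = ]0, 1/2[ and I_1 = ]1/2, 1[, and the map T(x) = 2x mod 1, whose two branches are affine bijections onto ]0, 1[. The helper name emit_word is hypothetical. Exact rational arithmetic avoids rounding problems when iterating T.

```python
from fractions import Fraction


def emit_word(x, length):
    """Prefix (sigma(x), sigma(Tx), sigma(T^2 x), ...) of the word M(x).

    Assumed source: I_0 = ]0, 1/2[, I_1 = ]1/2, 1[, T(x) = 2x mod 1.
    """
    half = Fraction(1, 2)
    word = []
    for _ in range(length):
        if x in (0, half, 1):
            # Endpoints of the partition lie outside every open interval I_m.
            raise ValueError("orbit reached an endpoint of the partition")
        symbol = 0 if x < half else 1   # sigma is constant on each I_m
        word.append(symbol)
        x = 2 * x - symbol              # branch of T on the interval I_symbol
    return word


print(emit_word(Fraction(1, 3), 6))     # [0, 1, 0, 1, 0, 1]
print(emit_word(Fraction(5, 7), 6))     # [1, 0, 1, 1, 0, 1]
```

For this map the inverse branches h_0(x) = x/2 and h_1(x) = (x + 1)/2 are contractions with η_m = δ_m = 1/2, and each J_m = ]0, 1[ contains all the I_k's, so the Markov condition holds in its simplest (complete) form.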

Fig. 6: Spectrum of a quasi compact operator

Let L_pw(I) be the space of functions that are bounded and Lipschitz continuous on each I_m, with the supremum of the Lipschitz constants on the I_m's finite. Denote by J ⊂ I × I the union of all sets I_m × I_m and let L_pw(J) be the space of functions on J that are bounded and Lipschitz continuous on each I_m × I_m, with bounded Lipschitz constant. In both cases, Lip(f) will denote the sup of the Lipschitz constants on the I_m's or on the I_m × I_m's. These spaces are endowed with the norm:

Fig. 7: Exceptional sources

1. For all m ∈ M, ν_s(I_m × I_m) > 0.
2. For f ∈ C_{a,b}(s), m ∈ M, X ∈ I_m × I_m, f(X) ≤ e^{2a} ν_s(f) / ν_s(I_m × I_m).

Proof. To prove Item 1, remark that (2.2) and Taylor equality imply that for all m ∈ M, there exists N ∈ N such that inf_{X∈J} G_s^N 1_{I_m×I_m}(X) > 0. Now, ν_s(1_{I_m×I_m}) = λ(s)^{−N} ν_s(G_s^N 1_{I_m×I_m}) > 0. Item 2 follows from the definition of the cone (condition 3) by integrating with respect to ν_s on I_m × I_m.

Lemma 4.5 For any s > γ, for any δ < ξ < 1, there exist a_0 > 0, b_0 > 0 such that for all a ≥ a_0, b ≥ b_0 and for any k ∈ N*, G_s^k maps C_{a,b}(s) into C_{ξa,ξb}(s).

Proof. Because C_{ξa,ξb}(s) ⊂ C_{a,b}(s), it suffices to prove the lemma for k = 1. Let f ∈ C_{a,b}(s). For any m ∈ M, X = (x, x′), Y = (y, y′) ∈ I_m × I_m, we have to compare f(h_m(x), h_m(x′)) H^s_m(x, x′) with f(h_m(y), h_m(y′)) H^s_m(y, y′). Because f belongs to C_{a,b}(s) and, for each m ∈ M, h_m is a δ-contraction, we have: f(h_m(x), h_m(x′)) ≤ e^{aδ d(X,Y)} f(h_m(y), h_m(y′)). Moreover, H^s_m(x, x′) ≤ e^{sB d(X,Y)} H^s_m(y, y′). So, G_s f(X) ≤ e^{ξa d(X,Y)} G_s f(Y) provided a ≥ sB / (ξ − δ).

Now, let X ∈ I_m × I_m with m ∉ Q. Let c := inf_{m∈Q} ν_s(I_m × I_m); c > 0 because of Lemma 4.4 and the fact that Q is finite. We have:
G_s f(X) = Σ_m f(h_m(x), h_m(x′)) H^s_m(x, x′) ≤ (e^{2a}/c) ν_s(f) ‖G_s 1‖_∞ + b ν_s(f) sup_{x∈I} Σ_{m∉Q, x∈J_m} δ_m^s.
Now, we use that ν_s(f) = ν_s(G_s f) / λ(s) and, since sup_{x∈I} Σ_{m∉Q, x∈J_m} δ_m^s < λ(s) δ, we get:
G_s f(X) ≤ ν_s(G_s f) ( e^{2a} ‖G_s 1‖_∞ / (c λ(s)) + bδ ) ≤ bξ ν_s(G_s f)
provided b ≥ e^{2a} ‖G_s 1‖_∞ / (λ(s) c (ξ − δ)).

Lemma 4.6 Let a ≥ a_0, b ≥ b_0. There exists M such that for k ≥ M, the projective diameter ∆ of G_s^k C_{a,b}(s) in C_{a,b}(s) is finite: ∆ = sup_{f,g ∈ C_{a,b}(s)} d_{C_{a,b}(s)}(G_s^k f, G_s^k g) < ∞.

Proof. Let f, g ∈ C_{ξa,ξb}(s) and let β > 0. We have that β f − g ∈ C_{a,b}(s) if and only if:

Proof. From Lemma 4.5, we have that if f ∈ C_{a,b}(s) then G_s^p f ∈ C_{a,b}(s) for all p ∈ N. So, it suffices to prove the inequality for k = M. Since G_s^M f ∈ C_{a,b}(s), we have for all X ∈ J: G_s^M f(X) ≤ ν_s(G_s^M f) max[b, e^{2a}/c]. Now, using Sublemma 4.7, we find m_0 ∈ Q such that for X ∈ I_{m_0} × I_{m_0}, f(X) ≥ ε ν_s(f). Now, for all X ∈ J, G_s^M f(X) ≥ ε ν_s(f) G_s^M(1_{I_{m_0}×I_{m_0}})(X) ≥ ν_s(f) ε D. So the sublemma is proved with K_1 = max[b, e^{2a}/c] and K_2 = εD / λ(s)^M.

Let us conclude the proof of Lemma 4.6. Equations (4.3) and (4.4) together with Sublemma 4.8 give that for all f, g ∈ C_{a,b}(s) and k ≥ M, d_{C_{a,b}(s)}
we conclude that the projective diameter ∆ of G_s^k C_{a,b}(s) in C_{a,b}(s) is finite:

), is not done in this paper. The reader is referred to B. Vallée's paper.
Let us mention that the previous strategy, initially developed by B. Vallée, also has various important applications in the area of analysis of algorithms (especially for arithmetic algorithms), see (V2), (V3), (V4) for example. Finally, an important application of the asymptotic behavior of the parameters of P-Markov sources is the

We are now going to use Theorem 4.1 and Lemma 4.2 to prove Theorem 2.4. Recall that we already know from Section 3 that the operators G_s are quasi-compact for real s > γ. It remains to prove that they have a unique dominant eigenvalue; we prove it for G s and leave to the reader the proof for G s. Let us sketch how to use cones to obtain the dominant spectral properties. To obtain a unique dominant eigenvalue, it is sufficient to find a cone C and an integer k such that G_s^k maps C into itself and the diameter ∆