Analysis of algorithms is concerned with accurate estimates of complexity parameters of algorithms and aims at predicting the behaviour of a given algorithm run in a given environment. It develops general methods for obtaining closed-form formulae, asymptotic estimates, and probability distributions for combinatorial or probabilistic quantities, that are of interest in the optimization of algorithms. Interest is also placed on the methods themselves, whether combinatorial, probabilistic, or analytic. Combinatorial and statistical properties of discrete structures (strings, trees, tries, dags, graphs, and so on) as well as mathematical objects (e.g., continued fractions, polynomials, operators) that are relevant to the design of efficient algorithms are investigated.

Editors: Luc Devroye ; Michael Drmota ; Philippe Jacquet ; Wojciech Szpankowski

A word of length $n$ is rich if it contains $n$ nonempty palindromic factors. An infinite word is rich if all of its finite factors are rich. Baranwal and Shallit produced an infinite binary rich word with critical exponent $2+\sqrt{2}/2$ ($\approx 2.707$) and conjectured that this was the least possible critical exponent for infinite binary rich words (i.e., that the repetition threshold for binary rich words is $2+\sqrt{2}/2$). In this article, we give a structure theorem for infinite binary rich words that avoid $14/5$-powers (i.e., repetitions with exponent at least 2.8). As a consequence, we deduce that the repetition threshold for binary rich words is $2+\sqrt{2}/2$, as conjectured by Baranwal and Shallit. This resolves an open problem of Vesti for the binary alphabet; the problem remains open for larger alphabets.

The problem of estimating the number n of distinct keys of a large collection of N data is well known in computer science. A classical algorithm is the adaptive sampling (AS). n can be estimated by R2 J , where R is the final bucket size and J is the final depth at the end of the process. Several new interesting questions can be asked about AS (some of them were suggested by P.Flajolet and popularized by J.Lumbroso). The distribution of W = log(R2 J /n) is known, we rederive this distribution in a simpler way. We provide new results on the moments of J and W. We also analyze the final cache size R distribution. We consider colored keys: assume also that among the n distinct keys, m do have color K We show how to estimate p = m n. We study keys with some multiplicity : we provide a way to estimate the total number M of some color K keys among the total number N of keys. We consider the case where we know a priori the multiplicities but not the colors. There we want to estimate the total number of keys N. An appendix is devoted to the case where the hashing function provides bits with probability different from 1/2.

In a barter exchange market, agents bring items and seek to exchange their items with one another. Agents may agree to a k-way exchange involving a cycle of k agents. A barter exchange market can be represented by a digraph where the vertices represent items and the edges out of a vertex indicate the items that an agent is willing to accept in exchange for that item. It is known that the problem of finding a set of vertex-disjoint cycles with the maximum total number of vertices (MAX-SIZE-EXCHANGE) can be solved in polynomial time. We consider a barter exchange where each agent may bring multiple items, and items of the same agent are represented by vertices with the same color. A set of cycles is said to be tropical if for every color there is a cycle that contains a vertex of that color. We show that the problem of determining whether there exists a tropical set of vertex-disjoint cycles in a digraph (TROPICAL-EXCHANGE) is NP-complete and APX-hard. This is equivalent to determining whether it is possible to arrange an exchange of items among agents such that every agent trades away at least one item. TROPICAL-MAX-SIZE-EXCHANGE is a similar problem, where the goal is to find a set of vertex-disjoint cycles that contains the maximum number of vertices and also contains all of the colors in the graph. We show that this problem is likewise NP-complete and APX-hard. For the restricted case where there are at most two vertices of each color (corresponding to a restriction that […]

In a rooted tree, protected nodes are neither leaves nor parents of any leaves. They have some practical motivations, e.g., in organizational schemes, security models and social-network models. Protected node profile measures the number of protected nodes with the same distance from the root in rooted trees. For no rooted tree, protected node profile has been investigated so far. Here, we present the asymptotic expectations, variances, covariance and limiting bivariate distribution of protected node profile and non-protected internal node profile in random tries, an important data structure on words in computer science. Also we investigate the fraction of these expectations asymptotically. These results are derived by the methods of analytic combinatorics such as generating functions, Mellin transform, Poissonization and depoissonization, saddle point method and singularity analysis.

Every rational number p/q defines a rational base numeration system in which every integer has a unique finite representation, up to leading zeroes. This work is a contribution to the study of the set of the representations of integers. This prefix-closed subset of the free monoid is naturally represented as a highly non-regular tree. Its nodes are the integers, its edges bear labels taken in {0,1,...,p-1}, and its subtrees are all distinct. We associate with each subtree (or with its root n) three infinite words. The bottom word of n is the lexicographically smallest word that is the label of a branch of the subtree. The top word of n is defined similarly. The span-word of n is the digitwise difference between the latter and the former. First, we show that the set of all the span-words is accepted by an infinite automaton whose underlying graph is essentially the same as the tree itself. Second, we study the function that computes for all n the bottom word associated with n+1 from the one associated with n, and show that it is realised by an infinite sequential transducer whose underlying graph is once again essentially the same as the tree itself. An infinite word may be interpreted as an expansion in base p/q after the radix point, hence evaluated to a real number. If T is a subtree whose root is n, then the evaluations of the labels of the branches of T form an interval of $\mathbb{R}$. The length of this interval is called the span of n and is equal to the […]

Stanley lists the class of Dyck paths where all returns to the axis are of odd length as one of the many objects enumerated by (shifted) Catalan numbers. By the standard bijection in this context, these special Dyck paths correspond to a class of rooted plane trees, so-called Catalan-Stanley trees. This paper investigates a deterministic growth procedure for these trees by which any Catalan-Stanley tree can be grown from the tree of size one after some number of rounds; a parameter that will be referred to as the age of the tree. Asymptotic analyses are carried out for the age of a random Catalan-Stanley tree of given size as well as for the "speed" of the growth process by comparing the size of a given tree to the size of its ancestors.

In queuing theory, it is usual to have some models with a "reset" of the queue. In terms of lattice paths, it is like having the possibility of jumping from any altitude to zero. These objects have the interesting feature that they do not have the same intuitive probabilistic behaviour as classical Dyck paths (the typical properties of which are strongly related to Brownian motion theory), and this article quantifies some relations between these two types of paths. We give a bijection with some other lattice paths and a link with a continued fraction expansion. Furthermore, we prove several formulae for related combinatorial structures conjectured in the On-Line Encyclopedia of Integer Sequences. Thanks to the kernel method and via analytic combinatorics, we provide the enumeration and limit laws of these "lattice paths with catastrophes" for any finite set of jumps. We end with an algorithm to generate such lattice paths uniformly at random.

By a tight tour in a $k$-uniform hypergraph $H$ we mean any sequence of its vertices $(w_0,w_1,\ldots,w_{s-1})$ such that for all $i=0,\ldots,s-1$ the set $e_i=\{w_i,w_{i+1}\ldots,w_{i+k-1}\}$ is an edge of $H$ (where operations on indices are computed modulo $s$) and the sets $e_i$ for $i=0,\ldots,s-1$ are pairwise different. A tight tour in $H$ is a tight Euler tour if it contains all edges of $H$. We prove that the problem of deciding if a given $3$-uniform hypergraph has a tight Euler tour is NP-complete, and that it cannot be solved in time $2^{o(m)}$ (where $m$ is the number of edges in the input hypergraph), unless the ETH fails. We also present an exact exponential algorithm for the problem, whose time complexity matches this lower bound, and the space complexity is polynomial. In fact, this algorithm solves a more general problem of computing the number of tight Euler tours in a given uniform hypergraph.

Consider $ m $ copies of an irreducible, aperiodic Markov chain $ Y $ taking values in a finite state space. The asymptotics as $ m $ tends to infinity, of the first time from which on the trajectories of the $ m $ copies differ, have been studied by Szpankowski (1991) in the setting of tries. We use a different approach and model the $ m $ trajectories by a variant of the occupancy scheme, where we consider a nested sequence of boxes. This approach will enable us to extend the result to the case when the transition probabilities are random. We moreover use the same techniques to study the asymptotics as $ m $ tends to infinity of the time up to which we have observed all the possible trajectories of $ Y $ in random and nonrandom scenery.

The objective of this paper is to find in a setting of n sequential observations of objects a good online policy to select the k bestof these n uniquely rankable objects. This focus is motivated by the fact that it is hard to find closed form solutions of optimalstrategies for general k and n. Selection is without recall, and the idea is to investigate threshold functions which maintain allpresent information, that is thresholds which are functions of all selections made so far. Our main interest lies in the asymptoticbehaviour of these thresholds as n -> infinity and in the corresponding asymptotic performance of the threshold algorithm.

The partial sum of the states of a Markov chain or more generally a Markov source is asymptotically normally distributed under suitable conditions. One of these conditions is that the variance is unbounded. A simple combinatorial characterization of Markov sources which satisfy this condition is given in terms of cycles of the underlying graph of the Markov chain. Also Markov sources with higher dimensional alphabets are considered. Furthermore, the case of an unbounded covariance between two coordinates of the Markov source is combinatorically characterized. If the covariance is bounded, then the two coordinates are asymptotically independent. The results are illustrated by several examples, like the number of specific blocks in $0$-$1$-sequences and the Hamming weight of the width-$w$ non-adjacent form.

The new finite state machine package in the mathematics software system SageMath is presented and illustrated by many examples. Several combinatorial problems, in particular digit problems, are introduced, modeled by automata and transducers and solved using SageMath. In particular, we compute the asymptotic Hamming weight of a non-adjacent-form-like digit expansion, which was not known before.

In this paper we present an analysis of some generalization of the classic urn and balls model. In our model each urn has a fixed capacity and initially is filled with white balls. Black balls are added to the system of connected urns and gradually displace white balls. We show a general form of formulas for the expected numbers of black balls in a given urn and we analyze some special cases (parallel and serial configurations). We are mainly interested in a counterpart of the Coupon Collector Problem for the model considered. The primary motivation for our research is the formal analysis of the mix networks (introduced by D. Chaum) and its immunity to so-called flooding (blending) attacks.

The successive discrete structures generated by a sequential algorithm from random input constitute a Markov chain that may exhibit long term dependence on its first few input values. Using examples from random graph theory and search algorithms we show how such persistence of randomness can be detected and quantified with techniques from discrete potential theory. We also show that this approach can be used to obtain strong limit theorems in cases where previously only distributional convergence was known.

We introduce and study a combinatorial optimization problem motivated by the question in the title. In the simple case where you use all objects in your room equally often, we investigate asymptotics of the optimal time to clean up in terms of the number of objects in your room. In particular, we prove a logarithmic upper bound, solve an approximate version of this problem, and conjecture a precise logarithmic asymptotic.

Flood-it is a combinatorial game played on a colored graph G whose aim is to make the graph monochromatic using the minimum number of flooding moves, relatively to a fixed pivot. Free-Flood-it is a variant where the pivot can be freely chosen for each move of the game. The standard versions of Flood-it and Free-Flood-it are played on m ×n grids. In this paper we analyze the behavior of these games when played on other classes of graphs, such as d-boards, powers of cycles and circular grids. We describe polynomial time algorithms to play Flood-it on C2n (the second power of a cycle on n vertices), 2 ×n circular grids, and some types of d-boards (grids with a monochromatic column). We also show that Free-Flood-it is NP-hard on C2n and 2 ×n circular grids.

Let fm,n,h be the number of spanning forests with h edges in the complete bipartite graph Km,n. Kirchhoff\textquoterights Matrix Tree Theorem implies fm,n,m+n-1=mn-1 nm-1 when m ≥1 and n ≥1, since fm,n,m+n-1 is the number of spanning trees in Km,n. In this paper, we give an algorithm for computing fm,n,h for general m,n,h. We implement this algorithm and use it to compute all non-zero fm,n,h when m ≤50 and n ≤50 in under 2 days.

The classic Coupon-Collector Problem (CCP) is generalized. Only basic probability theory is used. Centerpiece rather is an algorithm that efficiently counts all k-element transversals of a set system.

Let T be a d-dimensional toroidal grid of n^d points. For a given range parameter ω, and a positive integer k ≤q d, we say that two points in T are mutually visible if they differ in at most k coordinates and are a distance at most ω apart, where distance is measured using the \ellₚ norm. We obtain a random d-dimensional line-of-sight graph G by placing a node at each point in T independently with some fixed probability p^* and connecting all pairs of mutually visible nodes. We prove an asymptotically tight connectivity result for this random graph.

In the running time analysis of the algorithm Find and versions of it appear as limiting distributions solutions of stochastic fixed points equation of the form X D = Sigma(i) AiXi o Bi + C on the space D of cadlag functions. The distribution of the D-valued process X is invariant by some random linear affine transformation of space and random time change. We show the existence of solutions in some generality via the Weighted Branching Process. Finite exponential moments are connected to stochastic fixed point of supremum type X D = sup(i) (A(i)X(i) + C-i) on the positive reals. Specifically we present a running time analysis of m-median and adapted versions of Find. The finite dimensional distributions converge in L-1 and are continuous in the cylinder coordinates. We present the optimal adapted version in the sense of low asymptotic average number of comparisons. The limit distribution of the optimal adapted version of Find is a point measure on the function [0, 1] there exists t -> 1 + mint, 1 - t.

We study a leader election protocol that we call the Swedish leader election protocol. This name comes from a protocol presented by L. Bondesson, T. Nilsson, and G. Wikstrand (2007). The goal is to select one among n > 0 players, by proceeding through a number of rounds. If there is only one player remaining, the protocol stops and the player is declared the leader. Otherwise, all remaining players flip a biased coin; with probability q the player survives to the next round, with probability p = 1 - q the player loses (is killed) and plays no further ... unless all players lose in a given round (null round), so all of them play again. In the classical leader election protocol, any number of null rounds may take place, and with probability 1 some player will ultimately be elected. In the Swedish leader election protocol there is a maximum number tau of consecutive null rounds, and if the threshold is attained the protocol fails without declaring a leader. In this paper, several parameters are asymptotically analyzed, among them: Success Probability, Number of rounds R-n, Number of null rounds T-n. This paper is a companion paper to Louchard, Martinez and Prodinger (2011) where De-Poissonization was used, together with the Mellin transform. While this works fine as far as it goes, there are limitations, in particular of a computational nature. The approach chosen here is similar to earlier efforts of the same authors - Louchard and Prodinger (2004,2005,2009). Identifying some […]

What today we call digital search tree (DST) is Coffman and Eve's sequence tree introduced in 1970. A digital search tree is a binary tree whose ordering of nodes is based on the values of bits in the binary representation of a node's key. In fact, a digital search tree is a digital tree in which strings (keys, words) are stored directly in internal nodes. The profile of a digital search tree is a parameter that counts the number of nodes at the same distance from the root. In this paper we concentrate on external profile, i.e., the number of external nodes at level k when n strings are sorted. By assuming that the n input strings are independent and follow a (binary) memoryless source the asymptotic behaviour of the average profile was determined by Drmota and Szpankowski (2011). The purpose of the present paper is to extend their analysis and to provide a precise analysis of variance of the profile. The main (technical) difference is that we have to deal with an inhomogeneous part in a proper functional-differential equations satisfied by the second moment and Poisson variance. However, we show that the variance is asymptotically of the same order as the expected value which implies concentration. These results are derived by methods of analytic combinatorics such as generating functions, Mellin transform, Poissonization, the saddle point method and singularity analysis.

We adapt a novel idea of Cichon's related to Approximate Counting to the present instance of Digital Search Trees, by using m (instead of one) such trees. We investigate the level polynomials, which have as coefficients the expected numbers of data on a given level, and the insertion costs. The level polynomials can be precisely described, thanks to formulae from q-analysis. The asymptotics of expectation and variance of the insertion cost are fairly standard these days and done with Rice's method.

Cuckoo hashing is a hash table data structure offering constant access time, even in the worst case. As a drawback, the construction fails with small, but practically significant probability. However, Kirsch et al. (2008) showed that a constant-sized additional memory, the so called stash, is sufficient to reduce the failure rate drastically. But so far, using a modified insertion procedure that demands additional running time to look for an admissible key is required. As a major contribution of this paper, we show that the same bounds on the failure probability hold even without this search process and thus, the performance increases. Second, we extend the analysis to simplified cuckoo hashing, a variant of the original algorithm offering increased performance. Further, we derive some explicit asymptotic approximations concerning the number of usual resp. bipartite graphs related to the data structures. Using these results, we obtain much more precise asymptotic expansions of the success rate. These calculations are based on a generating function approach and applying the saddle point method. Finally, we provide numerical results to support the theoretical analysis.

A partition of $x > 0$ of the form $x = \sum_i 2^{a_i}3^{b_i}$ with distinct parts is called a double-base expansion of $x$. Such a representation can be obtained using a greedy approach, assuming one can efficiently compute the largest \mbox{$\{2,3\}$-integer}, i.e., a number of the form $2^a3^b$, less than or equal to $x$. In order to solve this problem, we propose an algorithm based on continued fractions in the vein of the Ostrowski number system, we prove its correctness and we analyse its complexity. In a second part, we present some experimental results on the length of double-base expansions when only a few iterations of our algorithm are performed.

In the extensible bin packing problem we are asked to pack a set of items into a given number of bins, each with an original size. However, the original bin sizes can be extended if necessary. The goal is to minimize the total size of the bins. We consider the problem with unequal (original) bin sizes and give the complete analysis on a list scheduling algorithm (LS). Namely we present tight bounds of LS for every collection of original bin sizes and every number of bins. We further show better on-line algorithms for the two-bin case and the three-bin case. Interestingly, it is proved that the on-line algorithms have better competitive ratios for unequal bins than for equal bins. Some variants of the problem are also discussed.

For Dyck paths (nonnegative symmetric) random walks, the location of the first maximum within the first sojourn is studied. Generating functions and explicit resp. asymptotic expressions for the average are derived. Related parameters are also discussed.

We start with a set of $n$ players. With some probability $P(n,k)$, we kill $n-k$ players; the other ones stay alive, and we repeat with them. What is the distribution of the number $X_n$ of \emph{phases} (or rounds) before getting only one player? We present a probabilistic analysis of this algorithm under some conditions on the probability distributions $P(n,k)$, including stochastic monotonicity and the assumption that roughly a fixed proportion $\al$ of the players survive in each round. We prove a kind of convergence in distribution for $X_n - \log_{1/\!\alpha}(n)$; as in many other similar problems there are oscillations and no true limit distribution, but suitable subsequences converge, and there is an absolutely continuous random variable $Z$ such that $d\l(X_n, \lceil Z + \log_{1/\!\alpha} (n)\rceil\r) \to 0$, where $d$ is either the total variation distance or the Wasserstein distance. Applications of the general result include the leader election algorithm where players are eliminated by independent coin tosses and a variation of the leader election algorithm proposed by W.R. Franklin. We study the latter algorithm further, including numerical results.

Priority trees are a certain data structure used for priority queue administration. Under the model that all permutations of the numbers 1, . . . , n are equally likely to construct a priority tree of size n we study the following parameters in size-n trees: depth of a random node, number of right edges to a random node, and number of descendants of a random node. For all parameters studied we give limiting distribution results.

In this paper we consider the distribution of a pattern of interest in a binary random (d; k)-sequence generated by a Markov source. Such constrained sequences are frequently encountered in communication systems. Unlike the previous approach based on generating function we have chosen here to use Markov chain embedding techniques. By doing so, we get both previous results (sequence constrained up to the rth occurrence), and new ones (sequence constrained up to its end). We also provide in both cases efficient algorithms using basic linear algebra only. We then compare our numerical results to previous ones and finally propose several interesting extensions of our method which further illustrate the usefulness of our approach. That is to say higher order Markov chains, renewal occurrences rather than overlapping ones, heterogeneous models, more complex patterns of interest, and multistate trial sequences.

A binary sequence of zeros and ones is called a (d; k)-sequence if it does not contain runs of zeros of length either lessthan d or greater than k, where d and k are arbitrary, but fixed, non-negative integers and d < k. Such sequences find requires that (d; k)-sequences do not contain a specific pattern w. Therefore, distribution results concerning pattern occurrence in (d; k)-sequences are of interest. In this paper we study the distribution of the waiting time until the r-th occurrence of a pattern w in a random (d; k)-sequence generated by a Markov source. Numerical examples are also provided.

In words, generated by independent geometrically distributed random variables, we study the lth descent, which is, roughly speaking, the lth occurrence of a neighbouring pair ab with a>b. The value a is called the initial height, and b the end height. We study these two random variables (and some similar ones) by combinatorial and probabilistic tools. We find in all instances a generating function Ψ(v,u), where the coefficient of vjui refers to the jth descent (ascent), and i to the initial (end) height. From this, various conclusions can be drawn, in particular expected values. In the probabilistic part, a Markov chain model is used, which allows to get explicit expressions for the heights of the second descent. In principle, one could go further, but the complexity of the results forbids it. This is extended to permutations of a large number of elements. Methods from q-analysis are used to simplify the expressions. This is the reason that we confine ourselves to the geometric distribution only. For general discrete distributions, no such tools are available.

We consider questions concerning the tileability of orthogonal polygons with colored dominoes. A colored domino is a rotatable 2 × 1 rectangle that is partitioned into two unit squares, which are called faces, each of which is assigned a color. In a colored domino tiling of an orthogonal polygon P, a set of dominoes completely covers P such that no dominoes overlap and so that adjacent faces have the same color. We demonstrated that for simple layout polygons that can be tiled with colored dominoes, two colors are always sufficient. We also show that for tileable non-simple layout polygons, four colors are always sufficient and sometimes necessary. We describe an O(n) time algorithm for computing a colored domino tiling of a simple orthogonal polygon, if such a tiling exists, where n is the number of dominoes used in the tiling. We also show that deciding whether or not a non-simple orthogonal polygon can be tiled with colored dominoes is NP-complete.

The β-numeration, born with the works of Rényi and Parry, provides a generalization of the notions of integers, decimal numbers and rational numbers by expanding real numbers in base β, where β>1 is not an integer. One of the main differences with the case of numeration in integral base is that the sets which play the role of integers, decimal numbers and rational numbers in base β are not stable under addition or multiplication. In particular, a fractional part may appear when one adds or multiplies two integers in base β. When β is a Pisot number, which corresponds to the most studied case, the lengths of the finite fractional parts that may appear when one adds or multiplies two integers in base β are bounded by constants which only depend on β. We prove that, for any Perron number β, the set of finite or ultimately periodic fractional parts of the sum, or the product, of two integers in base β is finite. Additionally, we prove that it is possible to compute this set for the case of addition when β is a Parry number. As a consequence, we deduce that, when β is a Perron number, there exist bounds, which only depend on β, for the lengths of the finite fractional parts that may appear when one adds or multiplies two integers in base β. Moreover, when β is a Parry number, the bound associated with the case of addition can be explicitly computed.

We analyse the asymptotic behaviour in the mean of a non-commutative rational series, which originates from differential cryptanalysis, using tools from probability theory, and from analytic number theory. We derive a Fourier representation of a first-order summation function obtained by interpreting this rational series as a non-classical rational sequence via the octal numeration system. The method is applicable to a wide class of sequences rational with respect to a numeration system essentially under the condition that they admit a linear representation with nonnegative coefficients.

In a recent paper Broutin and Devroye (2005) have studied the height of a class of edge-weighted random trees.This is a class of trees growing in continuous time which includes many wellknown trees as examples. In this paper we derive a limit theorem for the internal path length for this class of trees.For the proof we extend a limit theorem in Neininger and Rüschendorf (2004) to recursive sequences of random variables with continuous time parameter.

Exponential bounds and tail estimates are derived for additive random recursive sequences, which typically arise as functionals of recursive structures, of random trees or in recursive algorithms. In particular they arise as parameters of divide and conquer type algorithms. We derive tail bounds from estimates of the Laplace transforms and of the moment sequences. For the proof we use some classical exponential bounds and some variants of the induction method. The paper generalizes results of Rösler (% \citeyearNPRoesler:91, % \citeyearNPRoesler:92) and % \citeNNeininger:05 on subgaussian tails to more general classes of additive random recursive sequences. It also gives sufficient conditions for tail bounds of the form \exp(-a t^p) which are based on a characterization of \citeNKasahara:78.