Analysis of Algorithms

Analysis of algorithms is concerned with accurate estimates of complexity parameters of algorithms and aims at predicting the behaviour of a given algorithm run in a given environment. It develops general methods for obtaining closed-form formulae, asymptotic estimates, and probability distributions for combinatorial or probabilistic quantities, that are of interest in the optimization of algorithms. Interest is also placed on the methods themselves, whether combinatorial, probabilistic, or analytic. Combinatorial and statistical properties of discrete structures (strings, trees, tries, dags, graphs, and so on) as well as mathematical objects (e.g., continued fractions, polynomials, operators) that are relevant to the design of efficient algorithms are investigated.


A note on limits of sequences of binary trees

Rudolf Grübel.
We discuss a notion of convergence for binary trees that is based on subtree sizes. In analogy to recent developments in the theory of graphs, posets and permutations we investigate some general aspects of the topology, such as a characterization of the set of possible limits and its structure as a metric space. For random trees the subtree size topology arises in the context of algorithms for searching and sorting when applied to random input, resulting in a sequence of nested trees. For these we obtain a structural result based on a local version of exchangeability. This in turn leads to a central limit theorem, with possibly mixed asymptotic normality.

Series acceleration formulas obtained from experimentally discovered hypergeometric recursions

Paul Levrie ; John Campbell.
In 2010, Kh. Hessami Pilehrood and T. Hessami Pilehrood introduced generating function identities used to obtain series accelerations for values of Dirichlet's $\beta$ function, via the Markov--Wilf--Zeilberger method. Inspired by these past results, together with related results introduced by Chu et al., we introduce a variety of hypergeometric recurrences. We prove these recurrences using the WZ method, and we apply these recurrences to obtain series acceleration identities. We introduce a family of summations generalizing a Ramanujan-type series for $\frac{1}{\pi^2}$ due to Guillera, and a family of summations generalizing an accelerated series for Catalan's constant due to Lupa\c{s}, and many related results.

A heuristic technique for decomposing multisets of non-negative integers according to the Minkowski sum

Luciano Margara.
We study the following problem. Given a multiset $M$ of non-negative integers, decide whether there exist and, in the positive case, compute two non-trivial multisets whose Minkowski sum is equal to $M$. The Minkowski sum of two multisets A and B is a multiset containing all possible sums of any element of A and any element of B. This problem was proved to be NP-complete when multisets are replaced by sets. This version of the problem is strictly related to the factorization of boolean polynomials that turns out to be NP-complete as well. When multisets are considered, the problem is equivalent to the factorization of polynomials with non-negative integer coefficients. The computational complexity of both these problems is still unknown. The main contribution of this paper is a heuristic technique for decomposing multisets of non-negative integers. Experimental results show that our heuristic decomposes multisets of hundreds of elements within seconds independently of the magnitude of numbers belonging to the multisets. Our heuristic can be used also for factoring polynomials in N[x]. We show that, when the degree of the polynomials gets larger, our technique is much faster than the state-of-the-art algorithms implemented in commercial software like Mathematica and MatLab.

Leaf multiplicity in a Bienaymé-Galton-Watson tree

Anna M. Brandenberger ; Luc Devroye ; Marcel K. Goh ; Rosie Y. Zhao.
This note defines a notion of multiplicity for nodes in a rooted tree and presents an asymptotic calculation of the maximum multiplicity over all leaves in a Bienaymé-Galton-Watson tree with critical offspring distribution $\xi$, conditioned on the tree being of size $n$. In particular, we show that if $S_n$ is the maximum multiplicity in a conditional Bienaymé-Galton-Watson tree, then $S_n = \Omega(\log n)$ asymptotically in probability and under the further assumption that ${\bf E}\{2^\xi\} < \infty$, we have $S_n = O(\log n)$ asymptotically in probability as well. Explicit formulas are given for the constants in both bounds. We conclude by discussing links with an alternate definition of multiplicity that arises in the root-estimation problem.

On non-adaptive majority problems of large query size

Dániel Gerbner ; Máté Vizer.
We are given $n$ balls and an unknown coloring of them with two colors. Our goal is to find a ball that belongs to the larger color class, or show that the color classes have the same size. We can ask sets of $k$ balls as queries, and the problem has different variants, according to what the answers to the queries can be. These questions has attracted several researchers, but the focus of most research was the adaptive version, where queries are decided sequentially, after learning the answer to the previous query. Here we study the non-adaptive version, where all the queries have to be asked at the same time.

Anti-power $j$-fixes of the Thue-Morse word

Marisa Gaetz.
Recently, Fici, Restivo, Silva, and Zamboni introduced the notion of a $k$-anti-power, which is defined as a word of the form $w^{(1)} w^{(2)} \cdots w^{(k)}$, where $w^{(1)}, w^{(2)}, \ldots, w^{(k)}$ are distinct words of the same length. For an infinite word $w$ and a positive integer $k$, define $AP_j(w,k)$ to be the set of all integers $m$ such that $w_{j+1} w_{j+2} \cdots w_{j+km}$ is a $k$-anti-power, where $w_i$ denotes the $i$-th letter of $w$. Define also $\mathcal{F}_j(k) = (2 \mathbb{Z}^+ - 1) \cap AP_j(\mathbf{t},k)$, where $\mathbf{t}$ denotes the Thue-Morse word. For all $k \in \mathbb{Z}^+$, $\gamma_j(k) = \min (AP_j(\mathbf{t},k))$ is a well-defined positive integer, and for $k \in \mathbb{Z}^+$ sufficiently large, $\Gamma_j(k) = \sup ((2 \mathbb{Z}^+ -1) \setminus \mathcal{F}_j(k))$ is a well-defined odd positive integer. In his 2018 paper, Defant shows that $\gamma_0(k)$ and $\Gamma_0(k)$ grow linearly in $k$. We generalize Defant's methods to prove that $\gamma_j(k)$ and $\Gamma_j(k)$ grow linearly in $k$ for any nonnegative integer $j$. In particular, we show that $\displaystyle 1/10 \leq \liminf_{k \rightarrow \infty} (\gamma_j(k)/k) \leq 9/10$ and $\displaystyle 1/5 \leq \limsup_{k \rightarrow \infty} (\gamma_j(k)/k) \leq 3/2$. Additionally, we show that $\displaystyle \liminf_{k \rightarrow \infty} (\Gamma_j(k)/k) = 3/2$ and $\displaystyle \limsup_{k \rightarrow \infty} (\Gamma_j(k)/k) = 3$.

The number of distinct adjacent pairs in geometrically distributed words

Margaret Archibald ; Aubrey Blecher ; Charlotte Brennan ; Arnold Knopfmacher ; Stephan Wagner ; Mark Ward.
A sequence of geometric random variables of length $n$ is a sequence of $n$ independent and identically distributed geometric random variables ($\Gamma_1, \Gamma_2, \dots, \Gamma_n$) where $\mathbb{P}(\Gamma_j=i)=pq^{i-1}$ for $1~\leq~j~\leq~n$ with $p+q=1.$ We study the number of distinct adjacent two letter patterns in such sequences. Initially we directly count the number of distinct pairs in words of short length. Because of the rapid growth of the number of word patterns we change our approach to this problem by obtaining an expression for the expected number of distinct pairs in words of length $n$. We also obtain the asymptotics for the expected number as $n \to \infty$.

The repetition threshold for binary rich words

James D. Currie ; Lucas Mol ; Narad Rampersad.
A word of length $n$ is rich if it contains $n$ nonempty palindromic factors. An infinite word is rich if all of its finite factors are rich. Baranwal and Shallit produced an infinite binary rich word with critical exponent $2+\sqrt{2}/2$ ($\approx 2.707$) and conjectured that this was the least possible critical exponent for infinite binary rich words (i.e., that the repetition threshold for binary rich words is $2+\sqrt{2}/2$). In this article, we give a structure theorem for infinite binary rich words that avoid $14/5$-powers (i.e., repetitions with exponent at least 2.8). As a consequence, we deduce that the repetition threshold for binary rich words is $2+\sqrt{2}/2$, as conjectured by Baranwal and Shallit. This resolves an open problem of Vesti for the binary alphabet; the problem remains open for larger alphabets.

The Adaptive sampling revisited

Matthew Drescher ; Guy Louchard ; Yvik Swan.
The problem of estimating the number n of distinct keys of a large collection of N data is well known in computer science. A classical algorithm is the adaptive sampling (AS). n can be estimated by R2 J , where R is the final bucket size and J is the final depth at the end of the process. Several new interesting questions can be asked about AS (some of them were suggested by P.Flajolet and popularized by J.Lumbroso). The distribution of W = log(R2 J /n) is known, we rederive this distribution in a simpler way. We provide new results on the moments of J and W. We also analyze the final cache size R distribution. We consider colored keys: assume also that among the n distinct keys, m do have color K We show how to estimate p = m n. We study keys with some multiplicity : we provide a way to estimate the total number M of some color K keys among the total number N of keys. We consider the case where we know a priori the multiplicities but not the colors. There we want to estimate the total number of keys N. An appendix is devoted to the case where the hashing function provides bits with probability different from 1/2.

Tropical Vertex-Disjoint Cycles of a Vertex-Colored Digraph: Barter Exchange with Multiple Items Per Agent

Timothy Highley ; Hoang Le.
In a barter exchange market, agents bring items and seek to exchange their items with one another. Agents may agree to a k-way exchange involving a cycle of k agents. A barter exchange market can be represented by a digraph where the vertices represent items and the edges out of a vertex indicate the items that an agent is willing to accept in exchange for that item. It is known that the problem of finding a set of vertex-disjoint cycles with the maximum total number of vertices (MAX-SIZE-EXCHANGE) can be solved in polynomial time. We consider a barter exchange where each agent may bring multiple items, and items of the same agent are represented by vertices with the same color. A set of cycles is said to be tropical if for every color there is a cycle that contains a vertex of that color. We show that the problem of determining whether there exists a tropical set of vertex-disjoint cycles in a digraph (TROPICAL-EXCHANGE) is NP-complete and APX-hard. This is equivalent to determining whether it is possible to arrange an exchange of items among agents such that every agent trades away at least one item. TROPICAL-MAX-SIZE-EXCHANGE is a similar problem, where the goal is to find a set of vertex-disjoint cycles that contains the maximum number of vertices and also contains all of the colors in the graph. We show that this problem is likewise NP-complete and APX-hard. For the restricted case where there are at most two vertices of each color (corresponding to a restriction that […]

Protected node profile of Tries

Mehri Javanian.
In a rooted tree, protected nodes are neither leaves nor parents of any leaves. They have some practical motivations, e.g., in organizational schemes, security models and social-network models. Protected node profile measures the number of protected nodes with the same distance from the root in rooted trees. For no rooted tree, protected node profile has been investigated so far. Here, we present the asymptotic expectations, variances, covariance and limiting bivariate distribution of protected node profile and non-protected internal node profile in random tries, an important data structure on words in computer science. Also we investigate the fraction of these expectations asymptotically. These results are derived by the methods of analytic combinatorics such as generating functions, Mellin transform, Poissonization and depoissonization, saddle point method and singularity analysis.

On subtrees of the representation tree in rational base numeration systems

Shigeki Akiyama ; Victor Marsault ; Jacques Sakarovitch.
Every rational number p/q defines a rational base numeration system in which every integer has a unique finite representation, up to leading zeroes. This work is a contribution to the study of the set of the representations of integers. This prefix-closed subset of the free monoid is naturally represented as a highly non-regular tree. Its nodes are the integers, its edges bear labels taken in {0,1,...,p-1}, and its subtrees are all distinct. We associate with each subtree (or with its root n) three infinite words. The bottom word of n is the lexicographically smallest word that is the label of a branch of the subtree. The top word of n is defined similarly. The span-word of n is the digitwise difference between the latter and the former. First, we show that the set of all the span-words is accepted by an infinite automaton whose underlying graph is essentially the same as the tree itself. Second, we study the function that computes for all n the bottom word associated with n+1 from the one associated with n, and show that it is realised by an infinite sequential transducer whose underlying graph is once again essentially the same as the tree itself. An infinite word may be interpreted as an expansion in base p/q after the radix point, hence evaluated to a real number. If T is a subtree whose root is n, then the evaluations of the labels of the branches of T form an interval of $\mathbb{R}$. The length of this interval is called the span of n and is equal to the […]

Growing and Destroying Catalan-Stanley Trees

Benjamin Hackl ; Helmut Prodinger.
Stanley lists the class of Dyck paths where all returns to the axis are of odd length as one of the many objects enumerated by (shifted) Catalan numbers. By the standard bijection in this context, these special Dyck paths correspond to a class of rooted plane trees, so-called Catalan-Stanley trees. This paper investigates a deterministic growth procedure for these trees by which any Catalan-Stanley tree can be grown from the tree of size one after some number of rounds; a parameter that will be referred to as the age of the tree. Asymptotic analyses are carried out for the age of a random Catalan-Stanley tree of given size as well as for the "speed" of the growth process by comparing the size of a given tree to the size of its ancestors.

Lattice paths with catastrophes

Cyril Banderier ; Michael Wallner.
In queuing theory, it is usual to have some models with a "reset" of the queue. In terms of lattice paths, it is like having the possibility of jumping from any altitude to zero. These objects have the interesting feature that they do not have the same intuitive probabilistic behaviour as classical Dyck paths (the typical properties of which are strongly related to Brownian motion theory), and this article quantifies some relations between these two types of paths. We give a bijection with some other lattice paths and a link with a continued fraction expansion. Furthermore, we prove several formulae for related combinatorial structures conjectured in the On-Line Encyclopedia of Integer Sequences. Thanks to the kernel method and via analytic combinatorics, we provide the enumeration and limit laws of these "lattice paths with catastrophes" for any finite set of jumps. We end with an algorithm to generate such lattice paths uniformly at random.

Tight Euler tours in uniform hypergraphs - computational aspects

Zbigniew Lonc ; Paweł Naroski ; Paweł Rzążewski.
By a tight tour in a $k$-uniform hypergraph $H$ we mean any sequence of its vertices $(w_0,w_1,\ldots,w_{s-1})$ such that for all $i=0,\ldots,s-1$ the set $e_i=\{w_i,w_{i+1}\ldots,w_{i+k-1}\}$ is an edge of $H$ (where operations on indices are computed modulo $s$) and the sets $e_i$ for $i=0,\ldots,s-1$ are pairwise different. A tight tour in $H$ is a tight Euler tour if it contains all edges of $H$. We prove that the problem of deciding if a given $3$-uniform hypergraph has a tight Euler tour is NP-complete, and that it cannot be solved in time $2^{o(m)}$ (where $m$ is the number of edges in the input hypergraph), unless the ETH fails. We also present an exact exponential algorithm for the problem, whose time complexity matches this lower bound, and the space complexity is polynomial. In fact, this algorithm solves a more general problem of computing the number of tight Euler tours in a given uniform hypergraph.

Asymptotics of the occupancy scheme in a random environment and its applications to tries

Silvia Businger.
Consider $ m $ copies of an irreducible, aperiodic Markov chain $ Y $ taking values in a finite state space. The asymptotics as $ m $ tends to infinity, of the first time from which on the trajectories of the $ m $ copies differ, have been studied by Szpankowski (1991) in the setting of tries. We use a different approach and model the $ m $ trajectories by a variant of the occupancy scheme, where we consider a nested sequence of boxes. This approach will enable us to extend the result to the case when the transition probabilities are random. We moreover use the same techniques to study the asymptotics as $ m $ tends to infinity of the time up to which we have observed all the possible trajectories of $ Y $ in random and nonrandom scenery.

Sequential selection of the k best out of nrankable objects

F. Thomas Bruss ; Guy Louchard.
The objective of this paper is to find in a setting of n sequential observations of objects a good online policy to select the k bestof these n uniquely rankable objects. This focus is motivated by the fact that it is hard to find closed form solutions of optimalstrategies for general k and n. Selection is without recall, and the idea is to investigate threshold functions which maintain allpresent information, that is thresholds which are functions of all selections made so far. Our main interest lies in the asymptoticbehaviour of these thresholds as n -> infinity and in the corresponding asymptotic performance of the threshold algorithm.

Variance and Covariance of Several Simultaneous Outputs of a Markov Chain

Sara Kropf.
The partial sum of the states of a Markov chain or more generally a Markov source is asymptotically normally distributed under suitable conditions. One of these conditions is that the variance is unbounded. A simple combinatorial characterization of Markov sources which satisfy this condition is given in terms of cycles of the underlying graph of the Markov chain. Also Markov sources with higher dimensional alphabets are considered. Furthermore, the case of an unbounded covariance between two coordinates of the Markov source is combinatorically characterized. If the covariance is bounded, then the two coordinates are asymptotically independent. The results are illustrated by several examples, like the number of specific blocks in $0$-$1$-sequences and the Hamming weight of the width-$w$ non-adjacent form.

Automata in SageMath---Combinatorics meet Theoretical Computer Science

Clemens Heuberger ; Daniel Krenn ; Sara Kropf.
The new finite state machine package in the mathematics software system SageMath is presented and illustrated by many examples. Several combinatorial problems, in particular digit problems, are introduced, modeled by automata and transducers and solved using SageMath. In particular, we compute the asymptotic Hamming weight of a non-adjacent-form-like digit expansion, which was not known before.

On the Dynamics of Systems of Urns

Marek Klonowski ; Jacek Cichoń ; Rafał Kapelko.
In this paper we present an analysis of some generalization of the classic urn and balls model. In our model each urn has a fixed capacity and initially is filled with white balls. Black balls are added to the system of connected urns and gradually displace white balls. We show a general form of formulas for the expected numbers of black balls in a given urn and we analyze some special cases (parallel and serial configurations). We are mainly interested in a counterpart of the Coupon Collector Problem for the model considered. The primary motivation for our research is the formal analysis of the mix networks (introduced by D. Chaum) and its immunity to so-called flooding (blending) attacks.

Persisting randomness in randomly growing discrete structures: graphs and search trees

Rudolf Grübel.
The successive discrete structures generated by a sequential algorithm from random input constitute a Markov chain that may exhibit long term dependence on its first few input values. Using examples from random graph theory and search algorithms we show how such persistence of randomness can be detected and quantified with techniques from discrete potential theory. We also show that this approach can be used to obtain strong limit theorems in cases where previously only distributional convergence was known.

How often should you clean your room?

Kimball Martin ; Krishnan Shankar.
We introduce and study a combinatorial optimization problem motivated by the question in the title. In the simple case where you use all objects in your room equally often, we investigate asymptotics of the optimal time to clean up in terms of the number of objects in your room. In particular, we prove a logarithmic upper bound, solve an approximate version of this problem, and conjecture a precise logarithmic asymptotic.

An algorithmic analysis of Flood-It and Free-Flood-It on graph powers

Uéverton dos Santos Souza ; Fábio Protti ; Maise Silva.
Flood-it is a combinatorial game played on a colored graph G whose aim is to make the graph monochromatic using the minimum number of flooding moves, relatively to a fixed pivot. Free-Flood-it is a variant where the pivot can be freely chosen for each move of the game. The standard versions of Flood-it and Free-Flood-it are played on m ×n grids. In this paper we analyze the behavior of these games when played on other classes of graphs, such as d-boards, powers of cycles and circular grids. We describe polynomial time algorithms to play Flood-it on C2n (the second power of a cycle on n vertices), 2 ×n circular grids, and some types of d-boards (grids with a monochromatic column). We also show that Free-Flood-it is NP-hard on C2n and 2 ×n circular grids.

Computing the number of h-edge spanning forests in complete bipartite graphs

Rebecca Stones.
Let fm,n,h be the number of spanning forests with h edges in the complete bipartite graph Km,n. Kirchhoff\textquoterights Matrix Tree Theorem implies fm,n,m+n-1=mn-1 nm-1 when m ≥1 and n ≥1, since fm,n,m+n-1 is the number of spanning trees in Km,n. In this paper, we give an algorithm for computing fm,n,h for general m,n,h. We implement this algorithm and use it to compute all non-zero fm,n,h when m ≤50 and n ≤50 in under 2 days.

Coupon collecting and transversals of hypergraphs

Marcel Wild ; Svante Janson ; Stephan Wagner ; Dirk Laurie.
The classic Coupon-Collector Problem (CCP) is generalized. Only basic probability theory is used. Centerpiece rather is an algorithm that efficiently counts all k-element transversals of a set system.

Connectivity for line-of-sight networks in higher dimensions

Luc Devroye ; Linda Farczadi.
Let T be a d-dimensional toroidal grid of n^d points. For a given range parameter ω, and a positive integer k ≤q d, we say that two points in T are mutually visible if they differ in at most k coordinates and are a distance at most ω apart, where distance is measured using the \ellₚ norm. We obtain a random d-dimensional line-of-sight graph G by placing a node at each point in T independently with some fixed probability p^* and connecting all pairs of mutually visible nodes. We prove an asymptotically tight connectivity result for this random graph.

The analysis of find and versions of it

Diether Knof ; Uwe Roesler.
In the running time analysis of the algorithm Find and versions of it appear as limiting distributions solutions of stochastic fixed points equation of the form X D = Sigma(i) AiXi o Bi + C on the space D of cadlag functions. The distribution of the D-valued process X is invariant by some random linear affine transformation of space and random time change. We show the existence of solutions in some generality via the Weighted Branching Process. Finite exponential moments are connected to stochastic fixed point of supremum type X D = sup(i) (A(i)X(i) + C-i) on the positive reals. Specifically we present a running time analysis of m-median and adapted versions of Find. The finite dimensional distributions converge in L-1 and are continuous in the cylinder coordinates. We present the optimal adapted version in the sense of low asymptotic average number of comparisons. The limit distribution of the optimal adapted version of Find is a point measure on the function [0, 1] there exists t -> 1 + mint, 1 - t.

The asymmetric leader election algorithm with Swedish stopping: a probabilistic analysis

Guy Louchard ; Helmut Prodinger.
We study a leader election protocol that we call the Swedish leader election protocol. This name comes from a protocol presented by L. Bondesson, T. Nilsson, and G. Wikstrand (2007). The goal is to select one among n > 0 players, by proceeding through a number of rounds. If there is only one player remaining, the protocol stops and the player is declared the leader. Otherwise, all remaining players flip a biased coin; with probability q the player survives to the next round, with probability p = 1 - q the player loses (is killed) and plays no further ... unless all players lose in a given round (null round), so all of them play again. In the classical leader election protocol, any number of null rounds may take place, and with probability 1 some player will ultimately be elected. In the Swedish leader election protocol there is a maximum number tau of consecutive null rounds, and if the threshold is attained the protocol fails without declaring a leader. In this paper, several parameters are asymptotically analyzed, among them: Success Probability, Number of rounds R-n, Number of null rounds T-n. This paper is a companion paper to Louchard, Martinez and Prodinger (2011) where De-Poissonization was used, together with the Mellin transform. While this works fine as far as it goes, there are limitations, in particular of a computational nature. The approach chosen here is similar to earlier efforts of the same authors - Louchard and Prodinger (2004,2005,2009). Identifying some […]

The Variance of the Profile in Digital Search Trees

Ramin Kazemi ; Mohammad Q. Vahidi-Asl.
What today we call digital search tree (DST) is Coffman and Eve's sequence tree introduced in 1970. A digital search tree is a binary tree whose ordering of nodes is based on the values of bits in the binary representation of a node's key. In fact, a digital search tree is a digital tree in which strings (keys, words) are stored directly in internal nodes. The profile of a digital search tree is a parameter that counts the number of nodes at the same distance from the root. In this paper we concentrate on external profile, i.e., the number of external nodes at level k when n strings are sorted. By assuming that the n input strings are independent and follow a (binary) memoryless source the asymptotic behaviour of the average profile was determined by Drmota and Szpankowski (2011). The purpose of the present paper is to extend their analysis and to provide a precise analysis of variance of the profile. The main (technical) difference is that we have to deal with an inhomogeneous part in a proper functional-differential equations satisfied by the second moment and Poisson variance. However, we show that the variance is asymptotically of the same order as the expected value which implies concentration. These results are derived by methods of analytic combinatorics such as generating functions, Mellin transform, Poissonization, the saddle point method and singularity analysis.

Digital search trees with m trees: Level polynomials and insertion costs

Helmut Prodinger.
We adapt a novel idea of Cichon's related to Approximate Counting to the present instance of Digital Search Trees, by using m (instead of one) such trees. We investigate the level polynomials, which have as coefficients the expected numbers of data on a given level, and the insertion costs. The level polynomials can be precisely described, thanks to formulae from q-analysis. The asymptotics of expectation and variance of the insertion cost are fairly standard these days and done with Rice's method.

A further analysis of Cuckoo Hashing with a Stash and Random Graphs of Excess r

Reinhard Kutzelnigg.
Cuckoo hashing is a hash table data structure offering constant access time, even in the worst case. As a drawback, the construction fails with small, but practically significant probability. However, Kirsch et al. (2008) showed that a constant-sized additional memory, the so called stash, is sufficient to reduce the failure rate drastically. But so far, using a modified insertion procedure that demands additional running time to look for an admissible key is required. As a major contribution of this paper, we show that the same bounds on the failure probability hold even without this search process and thus, the performance increases. Second, we extend the analysis to simplified cuckoo hashing, a variant of the original algorithm offering increased performance. Further, we derive some explicit asymptotic approximations concerning the number of usual resp. bipartite graphs related to the data structures. Using these results, we obtain much more precise asymptotic expansions of the success rate. These calculations are based on a generating function approach and applying the saddle point method. Finally, we provide numerical results to support the theoretical analysis.

Diophantine Approximation, Ostrowski Numeration and the Double-Base Number System

Valerie Berthe ; Laurent Imbert.
A partition of $x > 0$ of the form $x = \sum_i 2^{a_i}3^{b_i}$ with distinct parts is called a double-base expansion of $x$. Such a representation can be obtained using a greedy approach, assuming one can efficiently compute the largest \mbox{$\{2,3\}$-integer}, i.e., a number of the form $2^a3^b$, less than or equal to $x$. In order to solve this problem, we propose an algorithm based on continued fractions in the vein of the Ostrowski number system, we prove its correctness and we analyse its complexity. In a second part, we present some experimental results on the length of double-base expansions when only a few iterations of our algorithm are performed.

On-line extensible bin packing with unequal bin sizes

Deshi Ye ; Guochuan Zhang.
In the extensible bin packing problem we are asked to pack a set of items into a given number of bins, each with an original size. However, the original bin sizes can be extended if necessary. The goal is to minimize the total size of the bins. We consider the problem with unequal (original) bin sizes and give the complete analysis on a list scheduling algorithm (LS). Namely we present tight bounds of LS for every collection of original bin sizes and every number of bins. We further show better on-line algorithms for the two-bin case and the three-bin case. Interestingly, it is proved that the on-line algorithms have better competitive ratios for unequal bins than for equal bins. Some variants of the problem are also discussed.

Convergence of some leader election algorithms

Svante Janson ; Christian Lavault ; Guy Louchard.
We start with a set of $n$ players. With some probability $P(n,k)$, we kill $n-k$ players; the other ones stay alive, and we repeat with them. What is the distribution of the number $X_n$ of \emph{phases} (or rounds) before getting only one player? We present a probabilistic analysis of this algorithm under some conditions on the probability distributions $P(n,k)$, including stochastic monotonicity and the assumption that roughly a fixed proportion $\al$ of the players survive in each round. We prove a kind of convergence in distribution for $X_n - \log_{1/\!\alpha}(n)$; as in many other similar problems there are oscillations and no true limit distribution, but suitable subsequences converge, and there is an absolutely continuous random variable $Z$ such that $d\l(X_n, \lceil Z + \log_{1/\!\alpha} (n)\rceil\r) \to 0$, where $d$ is either the total variation distance or the Wasserstein distance. Applications of the general result include the leader election algorithm where players are eliminated by independent coin tosses and a variation of the leader election algorithm proposed by W.R. Franklin. We study the latter algorithm further, including numerical results.

Analysis of some parameters for random nodes in priority trees

Alois Panholzer.
Priority trees are a certain data structure used for priority queue administration. Under the model that all permutations of the numbers 1, . . . , n are equally likely to construct a priority tree of size n we study the following parameters in size-n trees: depth of a random node, number of right edges to a random node, and number of descendants of a random node. For all parameters studied we give limiting distribution results.

Waiting Time Distribution for Pattern Occurrence in a Constrained Sequence: an Embedding Markov Chain Approach

Gregory Nuel.
In this paper we consider the distribution of a pattern of interest in a binary random (d; k)-sequence generated by a Markov source. Such constrained sequences are frequently encountered in communication systems. Unlike the previous approach based on generating function we have chosen here to use Markov chain embedding techniques. By doing so, we get both previous results (sequence constrained up to the rth occurrence), and new ones (sequence constrained up to its end). We also provide in both cases efficient algorithms using basic linear algebra only. We then compare our numerical results to previous ones and finally propose several interesting extensions of our method which further illustrate the usefulness of our approach. That is to say higher order Markov chains, renewal occurrences rather than overlapping ones, heterogeneous models, more complex patterns of interest, and multistate trial sequences.

The location of the first maximum in the first sojourn of a Dyck path

Helmut Prodinger.
For Dyck paths (nonnegative symmetric) random walks, the location of the first maximum within the first sojourn is studied. Generating functions and explicit resp. asymptotic expressions for the average are derived. Related parameters are also discussed.

Waiting time distributions for pattern occurrence in a constrained sequence

Valeri T. Stefanov ; Wojciech Szpankowski.
A binary sequence of zeros and ones is called a (d; k)-sequence if it does not contain runs of zeros of length either lessthan d or greater than k, where d and k are arbitrary, but fixed, non-negative integers and d < k. Such sequences find requires that (d; k)-sequences do not contain a specific pattern w. Therefore, distribution results concerning pattern occurrence in (d; k)-sequences are of interest. In this paper we study the distribution of the waiting time until the r-th occurrence of a pattern w in a random (d; k)-sequence generated by a Markov source. Numerical examples are also provided.

A combinatorial and probabilistic study of initial and end heights of descents in samples of geometrically distributed random variables and in permutations

Guy Louchard ; Helmut Prodinger.
In words, generated by independent geometrically distributed random variables, we study the lth descent, which is, roughly speaking, the lth occurrence of a neighbouring pair ab with a>b. The value a is called the initial height, and b the end height. We study these two random variables (and some similar ones) by combinatorial and probabilistic tools. We find in all instances a generating function Ψ(v,u), where the coefficient of vjui refers to the jth descent (ascent), and i to the initial (end) height. From this, various conclusions can be drawn, in particular expected values. In the probabilistic part, a Markov chain model is used, which allows to get explicit expressions for the heights of the second descent. In principle, one could go further, but the complexity of the results forbids it. This is extended to permutations of a large number of elements. Methods from q-analysis are used to simplify the expressions. This is the reason that we confine ourselves to the geometric distribution only. For general discrete distributions, no such tools are available.

On the tileability of polygons with colored dominoes

Chris Worman ; Boting Yang.
We consider questions concerning the tileability of orthogonal polygons with colored dominoes. A colored domino is a rotatable 2 × 1 rectangle that is partitioned into two unit squares, which are called faces, each of which is assigned a color. In a colored domino tiling of an orthogonal polygon P, a set of dominoes completely covers P such that no dominoes overlap and so that adjacent faces have the same color. We demonstrated that for simple layout polygons that can be tiled with colored dominoes, two colors are always sufficient. We also show that for tileable non-simple layout polygons, four colors are always sufficient and sometimes necessary. We describe an O(n) time algorithm for computing a colored domino tiling of a simple orthogonal polygon, if such a tiling exists, where n is the number of dominoes used in the tiling. We also show that deciding whether or not a non-simple orthogonal polygon can be tiled with colored dominoes is NP-complete.

Arithmetics in β-numeration

Julien Bernat.
The β-numeration, born with the works of Rényi and Parry, provides a generalization of the notions of integers, decimal numbers and rational numbers by expanding real numbers in base β, where β>1 is not an integer. One of the main differences with the case of numeration in integral base is that the sets which play the role of integers, decimal numbers and rational numbers in base β are not stable under addition or multiplication. In particular, a fractional part may appear when one adds or multiplies two integers in base β. When β is a Pisot number, which corresponds to the most studied case, the lengths of the finite fractional parts that may appear when one adds or multiplies two integers in base β are bounded by constants which only depend on β. We prove that, for any Perron number β, the set of finite or ultimately periodic fractional parts of the sum, or the product, of two integers in base β is finite. Additionally, we prove that it is possible to compute this set for the case of addition when β is a Parry number. As a consequence, we deduce that, when β is a Perron number, there exist bounds, which only depend on β, for the lengths of the finite fractional parts that may appear when one adds or multiplies two integers in base β. Moreover, when β is a Parry number, the bound associated with the case of addition can be explicitly computed.

Asymptotic behaviour of a non-commutative rational series with a nonnegative linear representation

Philippe Dumas ; Helger Lipmaa ; Johan Wallén.
We analyse the asymptotic behaviour in the mean of a non-commutative rational series, which originates from differential cryptanalysis, using tools from probability theory, and from analytic number theory. We derive a Fourier representation of a first-order summation function obtained by interpreting this rational series as a non-classical rational sequence via the octal numeration system. The method is applicable to a wide class of sequences rational with respect to a numeration system essentially under the condition that they admit a linear representation with nonnegative coefficients.

Note on the weighted internal path length of b-ary trees

Ludger Rüschendorf ; Eva-Maria Schopp.
In a recent paper Broutin and Devroye (2005) have studied the height of a class of edge-weighted random trees.This is a class of trees growing in continuous time which includes many wellknown trees as examples. In this paper we derive a limit theorem for the internal path length for this class of trees.For the proof we extend a limit theorem in Neininger and Rüschendorf (2004) to recursive sequences of random variables with continuous time parameter.

Exponential bounds and tails for additive random recursive sequences

Ludger Rüschendorf ; Eva-Maria Schopp.
Exponential bounds and tail estimates are derived for additive random recursive sequences, which typically arise as functionals of recursive structures, of random trees or in recursive algorithms. In particular they arise as parameters of divide and conquer type algorithms. We derive tail bounds from estimates of the Laplace transforms and of the moment sequences. For the proof we use some classical exponential bounds and some variants of the induction method. The paper generalizes results of Rösler (% \citeyearNPRoesler:91, % \citeyearNPRoesler:92) and % \citeNNeininger:05 on subgaussian tails to more general classes of additive random recursive sequences. It also gives sufficient conditions for tail bounds of the form \exp(-a t^p) which are based on a characterization of \citeNKasahara:78.