Discrete Mathematics & Theoretical Computer Science |
The present volume collects the proceedings of the Aofa'12, 23st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms held at the Centre de Recherches Mathématiques at Université de Montreal, Canada, between June 18-22, 2012. The conference builds on the communities of the former series of conferences ``Mathematics and Computer Science'' and ``Analysis of Algorithms'', and aims at studying rigorously the combinatorial objects which appear in particular as data structures, algorithms, models of networks and as well as the essential ubiquitous combinatorial structures. The program committee selected submissions covering this wide range of topics. The regular papers were presented in 30-minute talks, and short abstracts in presentations of 15 minutes. They all appear in the present volume. The conference included five invited plenary lectures: - Michael Drmota (TU Vienna, Austria): Extremal parameters in critical and subcritical graph classes - Svante Janson (Uppsala, Sweden): Simply generated trees, conditioned Galton-Watson trees, random allocations and condensation - Amin Coja-Oghlan (Warwick, UK): Phase transitions and computational complexity - Claire Mathieu (Brown, USA): Algorithms for optimization over noisy data - Avi Wigderson (IAS, USA): Population Recovery via Partial Identification We thank the members of the steering and program committees for their involvement. We also thank the invited speakers, and the authors of the contributed papers. We express our gratitute to the people at CRM and especially Louis Pelletier for their invaluable help in making sure that the meeting went smoothly. We are also grateful to the editor-in-chief of DMTCS Jens Gustedt for his help in the compilation of the proceedings. Finally, our special thanks go to the sponsors of the conference for their contributions: CRM, Inria, and NSERC. Nicolas Broutin and Luc Devroye, Editors **Steering Committee** - Brigitte Chauvin (Versailles, France) - Luc Devroye (McGill, Canada) - Michael Drmota (TU Vienna, Austria) - Daniel Panario (Carleton, Canada) - Robert Sedgewick (Princeton, USA) - Wojciech Szpankowski (Purdue, USA) - Brigitte Vallée (Caen, France) **Program Committee** - Frédérique Bassino, Paris 13, France - Jit Bose, Carleton, Canada - Nicolas Broutin (co-chair), Inria, France - Philippe Chassaing, Nancy, France - Julien Clément, Caen, France - Luc Devroye (co-chair), McGill, Canada - Uriel Feige, Weizmann, Israel - Alan Frieze, Carnegie-Mellon, USA - Bernhard Gittenberger, TU Vienna, Austria - Hsien-Kuei Hwang, Academia Sinica, Taiwan - Philippe Jacquet, Inria, France - Colin McDiarmid, Oxford, UK - Mike Molloy, Toronto, Canada - Ralph Neininger, Frankfurt, Germany - Marc Noy, Barcelona, Spain - Daniel Panario, Carleton, Canada - Alois Panholzer, TU Vienna, Austria - Bruno Salvy, Inria, France - Mohit Singh, McGill, Canada - Brigitte Vallée, Caen, France - Mark Ward, Purdue, USA **Organization**- Nicolas Broutin - Luc Devroye
In a recently proposed graphical compression algorithm by Choi and Szpankowski (2009), the following tree arose in the course of the analysis. The root contains n balls that are consequently distributed between two subtrees according to a simple rule: In each step, all balls independently move down to the left subtree (say with probability $p$) or the right subtree (with probability 1-$p$). A new node is created as long as there is at least one ball in that node. Furthermore, a nonnegative integer $d$ is given, and at level $d$ or greater one ball is removed from the leftmost node before the balls move down to the next level. These steps are repeated until all balls are removed (i.e., after $n+d$ steps). Observe that when $d=∞$ the above tree can be modeled as a $\textit{trie}$ that stores $n$ independent sequences generated by a memoryless source with parameter $p$. Therefore, we coin the name $(n,d)$-tries for the tree just described, and to which we often refer simply as $d$-tries. Parameters of such a tree (e.g., path length, depth, size) are described by an interesting two-dimensional recurrence (in terms of $n$ and $d$) that – to the best of our knowledge – was not analyzed before. We study it, and show how much parameters of such a $(n,d)$-trie differ from the corresponding parameters of regular tries. We use methods of analytic algorithmics, from Mellin transforms to analytic poissonization.
Approximate counting is an algorithm that provides a count of a huge number of objects within an error tolerance. The first detailed analysis of this algorithm was given by Flajolet. In this paper, we propose a new analysis via the Poisson-Laplace-Mellin approach, a method devised for analyzing shape parameters of digital search trees in a recent paper of Hwang et al. Our approach yields a different and more compact expression for the periodic function from the asymptotic expansion of the variance. We show directly that our expression coincides with the one obtained by Flajolet. Moreover, we apply our method to variations of approximate counting, too.
We use death processes and embeddings into continuous time in order to analyze several urn models with a diminishing content. In particular we discuss generalizations of the pill's problem, originally introduced by Knuth and McCarthy, and generalizations of the well known sampling without replacement urn models, and OK Corral urn models.
The purpose of this article is to present a general method to find limiting laws for some renormalized statistics on random permutations. The model considered here is Ewens sampling model, which generalizes uniform random permutations. We describe the asymptotic behavior of a large family of statistics, including the number of occurrences of any given dashed pattern. Our approach is based on the method of moments and relies on the following intuition: two events involving the images of different integers are almost independent.
Let $k≥2$ be a fixed integer. Given a bounded sequence of real numbers $(a_n:n≥k)$, then for any sequence $(f_n:n≥1)$ of real numbers satisfying the divide-and-conquer recurrence $f_n = (k-mod(n,k))f_⌊n/k⌋+mod(n,k)f_⌈n/k⌉ + a_n, n ≥k$, there is a unique continuous periodic function $f^*:\mathbb{R}→\mathbb{R}$ with period 1 such that $f_n = nf^*(\log _kn)+o(n)$. If $(a_n)$ is periodic with period $k, a_k=0$, and the initial conditions $(f_i:1 ≤i ≤k-1)$ are all zero, we obtain a specific iterated function system $S$, consisting of $k$ continuous functions from $[0,1]×\mathbb{R}$ into itself, such that the attractor of $S$ is $\{(x,f^*(x)): 0 ≤x ≤1\}$. Using the system $S$, an accurate plot of $f^*$ can be rapidly obtained.
Many parameters of trees are additive in the sense that they can be computed recursively from the sum of the branches plus a certain toll function. For instance, such parameters occur very frequently in the analysis of divide-and-conquer algorithms. Here we are interested in the situation that the toll function is small (the average over all trees of a given size $n$ decreases exponentially with $n$). We prove a general central limit theorem for random labelled trees and apply it to a number of examples. The main motivation is the study of the number of subtrees in a random labelled tree, but it also applies to classical instances such as the number of leaves.
no abstract
In this paper, we study the shuffle operator on concurrent processes (represented as trees) using analytic combinatorics tools. As a first result, we show that the mean width of shuffle trees is exponentially smaller than the worst case upper-bound. We also study the expected size (in total number of nodes) of shuffle trees. We notice, rather unexpectedly, that only a small ratio of all nodes do not belong to the last two levels. We also provide a precise characterization of what ``exponential growth'' means in the case of the shuffle on trees. Two practical outcomes of our quantitative study are presented: (1) a linear-time algorithm to compute the probability of a concurrent run prefix, and (2) an efficient algorithm for uniform random generation of concurrent runs.
We prove a total variation approximation for the distribution of component vector of a weakly logarithmic random assembly. The proof demonstrates an analytic approach based on a comparative analysis of the coefficients of two power series.
Enumeration of planar lattice walks is a classical topic in combinatorics, at the cross-roads of several domains (e.g., probability, statistical physics, computer science). The aim of this paper is to propose a new approach to obtain some exact asymptotics for walks confined to the quarter plane.
This paper is devoted to the construction of Boltzmann samplers according to various distributions, and uses stochastic bias on the parameter of a Boltzmann sampler, to produce a sampler with a different distribution for the size of the output. As a significant application, we produce Boltzmann samplers for words defined by regular specifications containing shuffle operators and linear recursions. This sampler has linear complexity in the size of the output, where the complexity is measured in terms of real-arithmetic operations and evaluations of generating functions.
We study transversals in random trees with n vertices asymptotically as n tends to infinity. Our investigation treats the average number of transversals of fixed size, the size of a random transversal as well as the probability that a random subset of the vertex set of a tree is a transversal for the class of simply generated trees and for Pólya trees. The last parameter was already studied by Devroye for simply generated trees. We offer an alternative proof based on generating functions and singularity analysis and extend the result to Pólya trees.
We consider a generalization of the uniform word-based distribution for finitely generated subgroups of a free group. In our setting, the number of generators is not fixed, the length of each generator is determined by a random variable with some simple constraints and the distribution of words of a fixed length is specified by a Markov process. We show by probabilistic arguments that under rather relaxed assumptions, the good properties of the uniform word-based distribution are preserved: generically (but maybe not exponentially generically), the tuple we pick is a basis of the subgroup it generates, this subgroup is malnormal and the group presentation defined by this tuple satisfies a small cancellation condition.
We give simple probabilistic algorithms that approximately maximize the volume of overlap of two solid, i.e. full-dimensional, shapes under translations and rigid motions. The shapes are subsets of $ℝ^d$ where $d≥ 2$. The algorithms approximate with respect to an pre-specified additive error and succeed with high probability. Apart from measurability assumptions, we only require that points from the shapes can be generated uniformly at random. An important example are shapes given as finite unions of simplices that have pairwise disjoint interiors.
In the paper we discuss a technology based on Bernstein polynomials of asymptotic analysis of a class of binomial sums that arise in information theory. Our method gives a quick derivation of required sums and can be generalized to multinomial distributions. As an example we derive a formula for the entropy of multinomial distributions. Our method simplifies previous work of Jacquet, Szpankowski and Flajolet from 1999.
The space requirements of an $m$-ary search tree satisfies a well-known phase transition: when $m\leq 26$, the second order asymptotics is Gaussian. When $m\geq 27$, it is not Gaussian any longer and a limit $W$ of a complex-valued martingale arises. We show that the distribution of $W$ has a square integrable density on the complex plane, that its support is the whole complex plane, and that it has finite exponential moments. The proofs are based on the study of the distributional equation $ W \overset{\mathcal{L}}{=} \sum_{k=1}^mV_k^{\lambda}W_k$, where $V_1, ..., V_m$ are the spacings of $(m-1)$ independent random variables uniformly distributed on $[0,1]$, $W_1, ..., W_m$ are independent copies of W which are also independent of $(V_1, ..., V_m)$ and $\lambda$ is a complex number.
This paper sheds light on universal coding with respect to classes of memoryless sources over a countable alphabet defined by an envelope function with finite and non-decreasing hazard rate. We prove that the auto-censuring (AC) code introduced by Bontemps (2011) is adaptive with respect to the collection of such classes. The analysis builds on the tight characterization of universal redundancy rate in terms of metric entropy by Haussler and Opper (1997) and on a careful analysis of the performance of the AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of samples from discrete distributions with finite and non-decreasing hazard rate.
This paper develops an analytic theory for the study of some Pólya urns with random rules. The idea is to extend the isomorphism theorem in Flajolet et al. (2006), which connects deterministic balanced urns to a differential system for the generating function. The methodology is based upon adaptation of operators and use of a weighted probability generating function. Systems of differential equations are developed, and when they can be solved, they lead to characterization of the exact distributions underlying the urn evolution. We give a few illustrative examples.
We define the notion of $t$-free for locally restricted compositions, which means roughly that if such a composition contains a part $c_i$ and nearby parts are at least $t$ smaller, then $c_i$ can be replaced by any larger part. Two well-known examples are Carlitz and alternating compositions. We show that large parts have asymptotically geometric distributions. This leads to asymptotically independent Poisson variables for numbers of various large parts. Based on this we obtain asymptotic formulas for the probability of being gap free and for the expected values of the largest part and number distinct parts, all accurate to $o(1)$.
We consider the word collector problem, i.e. the expected number of calls to a random weighted generator before all the words of a given length in a language are generated. The originality of this instance of the non-uniform coupon collector lies in the, potentially large, multiplicity of the words/coupons of a given probability/composition. We obtain a general theorem that gives an asymptotic equivalent for the expected waiting time of a general version of the Coupon Collector. This theorem is especially well-suited for classes of coupons featuring high multiplicities. Its application to a given language essentially necessitates knowledge on the number of words of a given composition/probability. We illustrate the application of our theorem, in a step-by-step fashion, on four exemplary languages, whose analyses reveal a large diversity of asymptotic waiting times, generally expressible as $\kappa \cdot m^p \cdot (\log{m})^q \cdot (\log \log{m})^r$, for $m$ the number of words, and $p, q, r$ some positive real numbers.
We assign a uniform probability to the set consisting of partitions of a positive integer $n$ such that the multiplicity of each summand is less than a given number $d$ and we study the limiting distribution of the number of summands in a random partition. It is known from a result by Erdős and Lehner published in 1941 that the distributions of the length in random restricted $(d=2)$ and random unrestricted $(d \geq n+1)$ partitions behave very differently. In this paper we show that as the bound $d$ increases we observe a phase transition in which the distribution goes from the Gaussian distribution of the restricted case to the Gumbel distribution of the unrestricted case.
We consider Euclid’s gcd algorithm for two integers $(p, q)$ with $1 \leq p \leq q \leq N$, with the uniform distribution on input pairs. We study the distribution of the total cost of execution of the algorithm for an additive cost function $d$ on the set of possible digits, asymptotically for $N \to \infty$. For any additive cost of moderate growth $d$, Baladi and Vallée obtained a central limit theorem, and –in the case when the cost $d$ is lattice– a local limit theorem. In both cases, the optimal speed was attained. When the cost is non lattice, the problem was later considered by Baladi and Hachemi, who obtained a local limit theorem under an intertwined diophantine condition which involves the cost $d$ together with the “canonical” cost $c$ of the underlying dynamical system. The speed depends on the irrationality exponent that intervenes in the diophantine condition. We show here how to replace this diophantine condition by another diophantine condition, much more natural, which already intervenes in simpler problems of the same vein, and only involves the cost $d$. This “replacement” is made possible by using the additivity of cost $d$, together with a strong property satisfied by the Euclidean Dynamical System, which states that the cost $c$ is both “strongly” non additive and diophantine in a precise sense. We thus obtain a local limit theorem, whose speed is related to the irrationality exponent which intervenes in the new diophantine condition. […]
String complexity is defined as the cardinality of a set of all distinct words (factors) of a given string. For two strings, we define $\textit{joint string complexity}$ as the set of words that are common to both strings. We also relax this definition and introduce $\textit{joint semi-complexity}$ restricted to the common words appearing at least twice in both strings. String complexity finds a number of applications from capturing the richness of a language to finding similarities between two genome sequences. In this paper we analyze joint complexity and joint semi-complexity when both strings are generated by a Markov source. The problem turns out to be quite challenging requiring subtle singularity analysis and saddle point method over infinity many saddle points leading to novel oscillatory phenomena with single and double periodicities.
In this paper, we show that data streams can sometimes usefully be studied as random permutations. This simple observation allows a wealth of classical and recent results from combinatorics to be recycled, with minimal effort, as estimators for various statistics over data streams. We illustrate this by introducing RECORDINALITY, an algorithm which estimates the number of distinct elements in a stream by counting the number of $k$-records occurring in it. The algorithm has a score of interesting properties, such as providing a random sample of the set underlying the stream. To the best of our knowledge, a modified version of RECORDINALITY is the first cardinality estimation algorithm which, in the random-order model, uses neither sampling nor hashing.
Using a recursive approach, we obtain a simple exact expression for the $L^2$-distance from the limit in the classical limit theorem of Régnier (1989) for the number of key comparisons required by $\texttt{QuickSort}$. A previous study by Fill and Janson (2002) using a similar approach found that the $d_2$-distance is of order between $n^{-1} \log{n}$ and $n^{-1/2}$, and another by Neininger and Ruschendorf (2002) found that the Zolotarev $\zeta _3$-distance is of exact order $n^{-1} \log{n}$. Our expression reveals that the $L^2$-distance is asymptotically equivalent to $(2 n^{-1} \ln{n})^{1/2}$.
In a continuous-time setting, Fill (2012) proved, for a large class of probabilistic sources, that the number of symbol comparisons used by $\texttt{QuickSort}$, when centered by subtracting the mean and scaled by dividing by time, has a limiting distribution, but proved little about that limiting random variable $Y$—not even that it is nondegenerate. We establish the nondegeneracy of $Y$. The proof is perhaps surprisingly difficult.
Consider a pair of $\textit{interlacing regular convex polygons}$, each with $2(n + 2)$ vertices, which we will be referring to as $\textit{red}$ and $\textit{black}$ ones. One can place these vertices on the unit circle $|z | = 1$ in the complex plane; the vertices of the red polygon at $\epsilon^{2k}, k = 0, \ldots , 2n − 1$, of the black polygon at $\epsilon^{2k+1}, k = 0, \ldots , 2n − 1$; here $\epsilon = \exp(i \pi /(2n + 2))$. We assign to the vertices of each polygon alternating (within each polygon) signs. Note that all the pairwise intersections of red and black sides are oriented consistently. We declare the corresponding orientation positive.
In the paper, bike sharing systems with stations having a finite capacity are studied as stochastic networks. The inhomogeneity is modeled by clusters. We use a mean field limit to compute the limiting stationary distribution of the number of bikes at the stations. This method is an alternative to analytical methods. It can be used even if a closed form expression for the stationary distribution is out of reach as illustrated on a variant. Both models are compared. A practical conclusion is that avoiding empty or full stations does not improve overall performance.
In the paper "How to select a looser'' Prodinger was analyzing an algorithm where $n$ participants are selecting a leader by flipping <underline>fair</underline> coins, where recursively, the 0-party (those who i.e. have tossed heads) continues until the leader is chosen. We give an answer to the question stated in the Prodinger's paper – what happens if not a 0-party is recursively looking for a leader but always a party with a smaller cardinality. We show the lower bound on the number of rounds of the greedy algorithm (for <underline>fair</underline> coin).
We use probabilistic and combinatorial tools on strings to discover the average number of 2-protected nodes in tries and in suffix trees. Our analysis covers both the uniform and non-uniform cases. For instance, in a uniform trie with $n$ leaves, the number of 2-protected nodes is approximately 0.803$n$, plus small first-order fluctuations. The 2-protected nodes are an emerging way to distinguish the interior of a tree from the fringe.
Extending an idea of Suppakitpaisarn, Edahiro and Imai, a dynamic programming approach for computing digital expansions of minimal weight is transformed into an asymptotic analysis of minimal weight expansions. The minimal weight of an optimal expansion of a random input of length $\ell$ is shown to be asymptotically normally distributed under suitable conditions. After discussing the general framework, we focus on expansions to the base of $\tau$, where $\tau$ is a root of the polynomial $X^2- \mu X + 2$ for $\mu \in \{ \pm 1\}$. As the Frobenius endomorphism on a binary Koblitz curve fulfils the same equation, digit expansions to the base of this $\tau$ can be used for scalar multiplication and linear combination in elliptic curve cryptosystems over these curves.
Given a planar triangulation, a 3-orientation is an orientation of the internal edges so all internal vertices have out-degree three. Each 3-orientation gives rise to a unique edge coloring known as a $\textit{Schnyder wood}$ that has proven useful for various computing and combinatorics applications. We consider natural Markov chains for sampling uniformly from the set of 3-orientations. First, we study a "triangle-reversing'' chain on the space of 3-orientations of a fixed triangulation that reverses the orientation of the edges around a triangle in each move. We show that (i) when restricted to planar triangulations of maximum degree six, the Markov chain is rapidly mixing, and (ii) there exists a triangulation with high degree on which this Markov chain mixes slowly. Next, we consider an "edge-flipping'' chain on the larger state space consisting of 3-orientations of all planar triangulations on a fixed number of vertices. It was also shown previously that this chain connects the state space and we prove that the chain is always rapidly mixing.
Consider a countable alphabet $\mathcal{A}$. A multi-modular hidden pattern is an $r$-tuple $(w_1,\ldots , w_r)$, where each $w_i$ is a word over $\mathcal{A}$ called a module. The hidden pattern is said to occur in a text $t$ when the later admits the decomposition $t = v_0 w_1v_1 \cdots v_{r−1}w_r v_r$, for arbitrary words $v_i$ over $\mathcal{A}$. Flajolet, Szpankowski and Vallée (2006) proved via the method of moments that the number of matches (or occurrences) with a multi-modular hidden pattern in a random text $X_1\cdots X_n$ is asymptotically Normal, when $(X_n)_{n\geq1}$ are independent and identically distributed $\mathcal{A}$-valued random variables. Bourdon and Vallée (2002) had conjectured however that asymptotic Normality holds more generally when $(X_n)_{n\geq1}$ is produced by an expansive dynamical source. Whereas memoryless and Markovian sequences are instances of dynamical sources with finite memory length, general dynamical sources may be non-Markovian i.e. convey an infinite memory length. The technical difficulty to count hidden patterns under sources with memory is the context-free nature of these patterns as well as the lack of logarithm-and exponential-type transformations to rewrite the product of non-commuting transfer operators. In this paper, we address a case study in which we have successfully overpassed the aforementioned difficulties and which may illuminate how to address more general cases via auto-correlation operators. Our main result […]
In this paper infinite systems of functional equations in finitely or infinitely many random variables arising in combinatorial enumeration problems are studied. We prove sufficient conditions under which the combinatorial random variables encoded in the generating function of the system tend to a finite or infinite dimensional limiting distribution.
We give a unified treatment of the limit, as the size tends to infinity, of random simply generated trees, including both the well-known result in the standard case of critical Galton-Watson trees and similar but less well-known results in the other cases (i.e., when no equivalent critical Galton-Watson tree exists). There is a well-defined limit in the form of an infinite random tree in all cases; for critical Galton-Watson trees this tree is locally finite but for the other cases the random limit has exactly one node of infinite degree. The random infinite limit tree can in all cases be constructed by a modified Galton-Watson process. In the standard case of a critical Galton-Watson tree, the limit tree has an infinite "spine", where the offspring distribution is size-biased. In the other cases, the spine has finite length and ends with a vertex with infinite degree. A node of infinite degree in the limit corresponds to the existence of one node with very high degree in the finite random trees; in physics terminology, this is a type of condensation. In simple cases, there is one node with a degree that is roughly a constant times the number of nodes, while all other degrees are much smaller; however, more complicated behaviour is also possible. The proofs use a well-known connection to a random allocation model that we call balls-in-boxes, and we prove corresponding results for this model.