Frédérique Bassino ; Julien Clément ; Julien Fayolle ; Pierre Nicodème - Constructions for Clumps Statistics.

dmtcs:3563 - Discrete Mathematics & Theoretical Computer Science, January 1, 2008, DMTCS Proceedings vol. AI, Fifth Colloquium on Mathematics and Computer Science - Constructions for Clumps Statistics.

Authors: Frédérique Bassino; Julien Clément; Julien Fayolle; Pierre Nicodème

We consider a component of word statistics known as the clump: given a finite set of words, a clump is a maximal set of overlapping occurrences of these words in a text. This object was first studied by Schbath with the aim of counting the number of occurrences of words in random texts. Later work with a similar probabilistic approach used the Chen-Stein approximation for a compound Poisson distribution, where the number of clumps follows a law close to Poisson. There is currently no combinatorial counterpart to this approach, and we fill the gap here. We also provide a construction for the as yet unsolved problem of clumps of an arbitrary finite set of words. In contrast with the probabilistic approach, which provides only asymptotic results, the combinatorial method provides exact results that are useful when considering short sequences.
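
To make the definition concrete, the following is a minimal Python sketch, not the construction from the paper: it finds occurrences by brute-force scanning and groups them by interval merging, assuming that two occurrences overlap when they share at least one text position. The function names `find_occurrences` and `clumps` are hypothetical, chosen for illustration.

```python
def find_occurrences(text, words):
    """Return the sorted (start, end) spans, end exclusive, of every
    occurrence in `text` of a word from `words` (assumed non-empty words)."""
    spans = []
    for w in set(words):
        start = text.find(w)
        while start != -1:
            spans.append((start, start + len(w)))
            # Restart one position later so overlapping occurrences are found.
            start = text.find(w, start + 1)
    return sorted(spans)


def clumps(text, words):
    """Group occurrences into clumps.

    Assumption: an occurrence joins the current clump when it shares at
    least one text position with it. Returns a list of
    (start, end, number_of_occurrences) triples, end exclusive.
    """
    groups = []
    for start, end in find_occurrences(text, words):
        if groups and start < groups[-1][1]:
            # The new occurrence begins before the current clump ends.
            s, e, k = groups[-1]
            groups[-1] = (s, max(e, end), k + 1)
        else:
            # No shared position: a new clump starts here.
            groups.append((start, end, 1))
    return groups


if __name__ == "__main__":
    print(clumps("ababaxxaba", {"aba"}))
    print(clumps("aabab", {"aab", "ba"}))
```

With the single word `aba`, the text `ababaxxaba` contains three occurrences grouped into two clumps. In `abab`, the two occurrences of `ab` are adjacent but share no position, so under this assumption they form two separate clumps.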

Volume: DMTCS Proceedings vol. AI, Fifth Colloquium on Mathematics and Computer Science
Section: Proceedings
Published on: January 1, 2008
Imported on: May 10, 2017
Keywords: Words counting, formal language decomposition, generating functions, automata, [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], [INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM], [MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO]

Linked publications, datasets, and software

Source : ScholeXplorer IsRelatedTo ARXIV 1412.2587
Source : ScholeXplorer IsRelatedTo DOI 10.1089/cmb.2014.0173
Source : ScholeXplorer IsRelatedTo DOI 10.48550/arxiv.1412.2587
Source : ScholeXplorer IsRelatedTo PMC PMC4253314
Source : ScholeXplorer IsRelatedTo PMID 25393923
A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

3 Documents citing this article
