Pablo Barenbaum ; Verónica Becher ; Alejandro Deymonnaz ; Melisa Halsband ; Pablo Ariel Heiber - Efficient repeat finding in sets of strings via suffix arrays

dmtcs:597 - Discrete Mathematics & Theoretical Computer Science, May 6, 2013, Vol. 15 no. 2 -
Efficient repeat finding in sets of strings via suffix arrays

Authors: Pablo Barenbaum 1; Verónica Becher ORCID-iD1,2; Alejandro Deymonnaz 1; Melisa Halsband 1; Pablo Ariel Heiber 1,2

  • 1 Departamento de Computación [Buenos Aires]
  • 2 Consejo Nacional de Investigaciones Científicas y Técnicas [Buenos Aires]

We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n &log;m) time, where n is the the whole input size (the sum of the length of each string in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs.

Volume: Vol. 15 no. 2
Section: Discrete Algorithms
Published on: May 6, 2013
Accepted on: June 9, 2015
Submitted on: April 6, 2012
Keywords: [INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM]

1 Document citing this article

Consultation statistics

This page has been seen 327 times.
This article's PDF has been downloaded 334 times.