Discrete Mathematics & Theoretical Computer Science

- Department of Computer Science, Purdue University

For a given matrix of size $n \times m$ over a finite alphabet $\mathcal{A}$, a bicluster is a submatrix composed of selected columns and rows satisfying a certain property. In microarray analysis one searches for the largest biclusters in which the selected rows constitute the same string (pattern); in another formulation of the problem one tries to find a maximally dense submatrix. In a conceptually similar problem, namely the bipartite clique problem on graphs, one looks for the largest binary submatrix whose entries are all equal to 1. In this paper, we assume that the original matrix is generated by a memoryless source over a finite alphabet $\mathcal{A}$. We first consider the case where the selected biclusters are square submatrices and prove that with high probability (whp) the largest (square) bicluster having the same row-pattern is of size $\log_Q^2(nm)$, where $Q^{-1}$ is the (largest) probability of a symbol. We observe, however, that when we consider *any* submatrices (not just *square* submatrices), the largest area of a bicluster jumps to $An$ (whp), where $A$ is an explicitly computable constant. These findings complete some recent results concerning maximal biclusters and maximum balanced bicliques for random bipartite graphs.
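To make the two regimes concrete, here is a minimal brute-force sketch in Python. It is not the analysis used in the paper (which proves asymptotic results rather than searching). It assumes the matrix is stored as a list of rows, samples entries i.i.d. from a memoryless source, and treats a bicluster as a set of rows that read the same string on a set of columns. It then finds (i) the largest square bicluster and (ii) the largest-area bicluster of any shape. All function names (`random_matrix`, `largest_pattern_group`, `largest_square_bicluster`, `largest_bicluster_area`) are hypothetical. The exhaustive search over column subsets is exponential in $m$, so it is only practical for small matrices.

```python
# Illustrative sketch (hypothetical names): exhaustive search over column
# subsets, feasible only for small m. Not the method analysed in the paper.
import itertools
import math
import random
from collections import Counter


def random_matrix(n, m, alphabet, probs, seed=0):
    """n x m matrix whose entries are i.i.d. draws from `alphabet` with
    weights `probs`, i.e. a memoryless source."""
    rng = random.Random(seed)
    return [rng.choices(alphabet, weights=probs, k=m) for _ in range(n)]


def largest_pattern_group(matrix, cols):
    """Largest number of rows that read the same string on the columns `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in matrix)
    return max(counts.values())


def largest_square_bicluster(matrix):
    """Largest k such that some k rows share the same string on some k columns."""
    n, m = len(matrix), len(matrix[0])
    for k in range(min(n, m), 0, -1):          # try the biggest side first
        for cols in itertools.combinations(range(m), k):
            if largest_pattern_group(matrix, cols) >= k:
                return k
    return 0


def largest_bicluster_area(matrix):
    """Largest (#rows) * (#columns) over same-pattern submatrices of any shape.
    For a fixed column set, the best row set is the largest pattern group."""
    m = len(matrix[0])
    best = 0
    for k in range(1, m + 1):
        for cols in itertools.combinations(range(m), k):
            best = max(best, k * largest_pattern_group(matrix, cols))
    return best


if __name__ == "__main__":
    n, m = 60, 10                      # small sizes: the search is exponential in m
    alphabet, probs = "ab", [0.5, 0.5]
    M = random_matrix(n, m, alphabet, probs, seed=1)
    Q = 1 / max(probs)                 # Q^{-1} = largest symbol probability
    side = largest_square_bicluster(M)
    print("largest square bicluster:", side, "x", side)
    print("asymptotic side log_Q(nm):", round(math.log(n * m, Q), 2))
    print("largest area, any shape:", largest_bicluster_area(M))
```

The printed $\log_Q(nm)$ is the side length implied by the square-area result $\log_Q^2(nm)$. It is an asymptotic statement, so at these toy sizes it is only a rough reference point. The any-shape area, by contrast, can never fall below $n/|\mathcal{A}|$: by pigeonhole, a single column already contains some symbol at least $\lceil n/|\mathcal{A}|\rceil$ times. This is consistent with the linear growth $An$ stated in the abstract.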

Source: HAL:hal-01184220v1

Volume: DMTCS Proceedings vol. AD, International Conference on Analysis of Algorithms

Section: Proceedings

Published on: January 1, 2005

Imported on: May 10, 2017

Keywords: biclique, random matrix, two-dimensional patterns, bicluster, microarray data, [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], [INFO.INFO-DM] Computer Science [cs]/Discrete Mathematics [cs.DM], [MATH.MATH-CO] Mathematics [math]/Combinatorics [math.CO], [INFO.INFO-CG] Computer Science [cs]/Computational Geometry [cs.CG]

Funding:

- Source: OpenAIRE Graph
- *Combinatorial & Probabilistic Methods for Biol Sequences*; Funder: National Institutes of Health; Code: 5R01GM068959-04
- *Analytic Information Theory, Combinatorics, and Algorithmics: The Precise Redundancy and Related Problems*; Funder: National Science Foundation; Code: 0208709