DB - Publications (Single Publication)

Publication Details for Inproceedings "Algorithmic Complexity of Protein Identification: Combinatorics of Weighted Strings"

Back

New Search

Authors:	Mark Cieliebak, Thomas Erlebach, Zsuzsanna Liptak, Jens Stoye, Emo Welzl
Group:	Theory of Communication Networks
Type:	Inproceedings
Title:	Algorithmic Complexity of Protein Identification: Combinatorics of Weighted Strings
Year:	2001
Month:	September
Pub-Key:	CELSW01
Book Titel:	Combinatorics Of Searching, Sorting, And Coding
Abstract:	We investigate a problem from computational biology: Given a constant-size alphabet A with a weight function m mapping each letter to a positive real number, find an efficient data structure and query algorithm solving the following problem: For a weight M and k strings over A, return a list of strings which contain a substring with weight M. In the one-string version of the problem, we consider only one string and ask whether it has a substring with weight M. We allow preprocessing of the strings resp. the one string, and measure efficiency in two parameters: storage space required for the preprocessed data, and running time of the query algorithm for given M. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length of the input string (resp. strings). We describe several algorithms solving different variants of the problem: Lookup solves the one-string problem with O(n) space and O(n*(log log n)/log n) time; Interval solves the one-string problem with O(n) space in O(log n) time for alphabets of size 2. Both can be generalized to the multi-string problem. Finally, Cluster, which we only sketch, solves the one-string problem and can be adjusted for a space-time-tradeoff; its analysis, however, depends on combinatorial properties of weighted strings which we solve only in part.
Location:	Ischia Island, Italy
Resources:	[BibTeX]