printlogo
ETH Zuerich - Homepage
Computer Engineering and Networks Laboratory (TIK)
 

Publication Details for Techreport "Algorithmic Complexity of Protein Identification: Combinatorics of Weighted Strings"

 

 Back

 New Search

 

Authors: Mark Cieliebak, Thomas Erlebach, Zsuzsanna Liptak, Jens Stoye, Emo Welzl
Group: Theory of Communication Networks
Type: Techreport
Title: Algorithmic Complexity of Protein Identification: Combinatorics of Weighted Strings
Year: 2001
Month: August
Pub-Key: CELSW01tr
Number: 361
Institution: Department of Computer Science, ETH Zurich
Abstract: We investigate a problem from computational biology: Given a constant-size alphabet A with a weight function m mapping each letter to a positive real number, find an efficient data structure and query algorithm solving the following problem: For a weight M and a string s over A, decide whether s contains a substring with weight M (One-String Mass Finding Problem). If the answer is yes, then we may in addition require a witness, i.e. indices i, j such that the substring beginning at position i and ending at position j has weight M. We allow preprocessing of the string, and measure efficiency in two parameters: storage space required for the preprocessed data, and running time of the query algorithm for given M. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length of the input string. We present two efficient algorithms: Lookup solves the problem with O(n) space and O(n*log(log(n))/log n) time; Interval solves the problem for binary alphabets with O(n) space in O(log n) time. We sketch a third algorithm, Cluster, which can be adjusted for a space-time-tradeoff but for which we do not yet have a resource analysis. We introduce a function on weighted strings which is closely related to the analysis of algorithms for the One-String Mass Finding Problem: The number of different submasses of a weighted string. We present several properties of this function, including upper and lower bounds. Finally, we introduce two more general variants of the problem and sketch how algorithms may be extended for these variants.
Resources: [BibTeX]

 

 Back

 New Search