Pattern Discovery in Bioinformatics: Theory & Algorithms, 1st Edition (Hardback) book cover

Pattern Discovery in Bioinformatics

Theory & Algorithms, 1st Edition

By Laxmi Parida

Chapman and Hall/CRC

512 pages | 95 B/W Illus.

Purchasing Options:$ = USD
Hardback: 9781584885498
pub: 2007-07-04
$115.00
x
eBook (VitalSource) : 9780429143786
pub: 2007-07-04
from $28.98


FREE Standard Shipping!

Description

The computational methods of bioinformatics are being used more and more to process the large volume of current biological data. Promoting an understanding of the underlying biology that produces this data, Pattern Discovery in Bioinformatics: Theory and Algorithms provides the tools to study regularities in biological data.

Taking a systematic approach to pattern discovery, the book supplies sound mathematical definitions and efficient algorithms to explain vital information about biological data. It explores various data patterns, including strings, clusters, permutations, topology, partial orders, and boolean expressions. Each of these classes captures a different form of regularity in the data, providing possible answers to a wide range of questions. The book also reviews basic statistics, including probability, information theory, and the central limit theorem.

This self-contained book provides a solid foundation in computational methods, enabling the solution of difficult biological questions.

Reviews

"Pattern Discovery in Bioinformatics provides an important message to all bioinformatics students and researchers. In contrast to some books that present computational biology as a combination of a BLAST user manual and storytelling, Laxmi Parida's new book makes the crucial point that most bioinformatics success stories require algorithmic and statistical ingenuity. It is a comprehensive and excellent book on the subject and I recommend it to everyone who has an interest in bioinformatics."

-Pavel Pevzner, Ronald R. Taylor Professor of Computer Science, University of California, San Diego, USA

"Laxmi Parida has written a comprehensive and very readable treatment of the discovery and analysis of patterns. Not just another textbook on string theory or combinatorics, nor an arcane reference book on the design of algorithms, nor a source book of sequence analysis software, but a coherent treatment of the kinds of patterns that computational biologists discover at the genomic level, in nucleic acid sequence, in protein structure, in microarray data, and in other linear orders and more general formal structures. Eclectically drawing on the whole gamut of discrete mathematical domains, Parida has built on her own numerous contributions during a distinguished career in putting together her unique take on this rapidly growing field. She insists throughout on mathematical rigour but makes the material eminently approachable by careful development from first principles, clear motivation and exposition, and a wealth of interesting and illustrative examples and exercises ranging from easy to challenging. Any enterprising student or neophyte in this field will gain a clear advantage by working through this book and adopting the wide-ranging but precise and sophisticated attitude to the many new problems that are continually emerging as biological data become more diverse and complex, as well as numerous. And established researchers seeking to broaden their perspective will enjoy the leisurely and insightful presentation of both new and familiar aspects of currently relevant research techniques."

-David Sankoff, University of Ottawa, Canada

"Biology has been transformed from a data poor to a data rich field, with massive accumulation of disparate types of data, for example huge databases of sequences (DNA, RNA, or protein). This data allows important biological insights to be made, partly by finding patterns and motifs that are conserved across many individuals or species; there is now a huge biological literature reporting on such conserved patterns and motifs that have been found in biological datasets. In contrast to the area of pattern matching, the patterns and motifs are generally not known ahead of time, but must be identified or discovered from the data; this task is often very subtle and difficult because the patterns and motifs may be short, may be highly degenerate (containing wildcards and variable length elements), may be ordered differently in different genomes, and are generally hidden in that they make up a small fraction of the data. For particular biological applications, even the definition of a relevant pattern may be difficult to state clearly, or may be unresolved. This has lead to a wonderful opportunity to develop a rigorous mathematical and algorithmic theory of patterns and pattern discovery in biological data. Laxmi Parida has been a leader in this effort, making significant contributions to both deep theory and to practical methods in biological pattern discovery. Her book explores a large subset of that theory, motivated by biological sequence and network data. I believe this book will be viewed as a seminal contribution and hope it draws many new people into the area of rigorous pattern discovery in bioinformatics."

-Dan Gusfield, University of California, Davis, USA

Table of Contents

INTRODUCTION

Ubiquity of Patterns

Motivations Form Biology

The Need for Rigor

Who Is a Reader of This Book?

THE FUNDAMENTALS

BASIC ALGORITHMICS

Introduction

Graphs

Tree Problem 1: (Minimum Spanning Tree)

Tree Problem 2: (Steiner Tree)

Tree Problem 3: (Minimum Mutation Labeling)

Storing and Retrieving Elements

Asymptotic Functions

Recurrence Equations

NP-Complete Class of Problems

BASIC STATISTICS

Introduction

Basic Probability

The Bare Truth about Inferential Statistics

Summary

WHAT ARE PATTERNS?

Introduction

Common Thread

Pattern Duality

Irredundant Patterns

Constrained Patterns

When Is a Pattern Specification Non-Trivial?

Classes of Patterns

PATTERNS ON LINEAR STRINGS

MODELING THE STREAM OF LIFE

Introduction

Modeling a Biopolymer

Bernoulli Scheme

Markov Chain

Hidden Markov Model (HMM)

Comparison of the Schemes

Conclusion

STRING PATTERN SPECIFICATIONS

Introduction

Notation

Solid Patterns

Rigid Patterns

Extensible Patterns

Generalizations

ALGORITHMS AND PATTERN STATISTICS

Introduction

Discovery Algorithm

Pattern Statistics

Rigid Patterns

Extensible Patterns

Measure of Surprise

Applications

MOTIF LEARNING

Introduction: Local Multiple Alignment

Probabilistic Model: Motif Profile

The Learning Problem

Importance Measure

Algorithms to Learn a Motif Profile

An Expectation Maximization Framework

A Gibbs Sampling Strategy

Interpreting the Motif Profile in Terms of p

THE SUBTLE MOTIF

Introduction: Consensus Motif

Combinatorial Model: Subtle Motif

Distance between Motifs

Statistics of Subtle Motifs

Performance Score

Enumeration Schemes

A Combinatorial Algorithm

A Probabilistic Algorithm

A Modular Solution

Conclusion

PATTERNS ON META-DATA

PERMUTATION PATTERNS

Introduction

Notation

How Many Permutation Patterns?

Maximality

Parikh Mapping-Based Algorithm

Intervals

Intervals to PQ Trees

Applications

Conclusion

PERMUTATION PATTERN PROBABILITIES

Introduction

Unstructured Permutations

Structured Permutations

TOPOLOGICAL MOTIFS

Introduction

What Are Topological Motifs?

The Topological Motif

Compact Topological Motifs

The Discovery Method

Related Classical Problems

Applications

Conclusion

SET-THEORETIC ALGORITHMIC TOOLS

Introduction

Some Basic Properties of Finite Sets

Partial Order Graph G(S,E) of Sets

Boolean Closure of Sets

Consecutive (Linear) Arrangement of Set Members

Maximal Set Intersection Problem (maxSIP)

Minimal Set Intersection Problem (minSIP)

Multi-Sets

Adapting the Enumeration Scheme

EXPRESSION AND PARTIAL ORDER MOTIFS

Introduction

Extracting (monotone CNF) Boolean Expressions

Extracting Partial Orders

Statistics of Partial Orders

Redescriptions

Application: Partial Order of Expressions

Summary

REFERENCES

INDEX

Exercises appear at the end of every chapter.

About the Series

Chapman & Hall/CRC Mathematical and Computational Biology

Learn more…

Subject Categories

BISAC Subject Codes/Headings:
MAT029000
MATHEMATICS / Probability & Statistics / General
SCI008000
SCIENCE / Life Sciences / Biology / General
SCI010000
SCIENCE / Biotechnology