Combinatorial Pattern Matching Algorithms in Computational Biology Using Perl and R
Emphasizing the search for patterns within and between biological sequences, trees, and graphs, Combinatorial Pattern Matching Algorithms in Computational Biology Using Perl and R shows how combinatorial pattern matching algorithms can solve computational biology problems that arise in the analysis of genomic, transcriptomic, proteomic, metabolomic, and interactomic data. It implements the algorithms in Perl and R, two widely used scripting languages in computational biology.
The book provides a well-rounded explanation of traditional issues as well as an up-to-date account of more recent developments, such as graph similarity and search. It is organized around the specific algorithmic problems that arise when dealing with structures that are commonly found in computational biology, including biological sequences, trees, and graphs. For each of these structures, the author makes a clear distinction between problems that arise in the analysis of one structure and in the comparative analysis of two or more structures. He also presents phylogenetic trees and networks as examples of trees and graphs in computational biology.
This book supplies a comprehensive view of the whole field of combinatorial pattern matching from a computational biology perspective. Along with thorough discussions of each biological problem, it includes detailed algorithmic solutions in pseudo-code, full Perl and R implementation, and pointers to other software, such as those on CPAN and CRAN.
Table of Contents
Combinatorial Pattern Matching
A Motivating Example: Gene Prediction
SEQUENCE PATTERN MATCHING
Sequences in Mathematics
Sequences in Computer Science
Sequences in Computational Biology
Simple Pattern Matching in Sequences
Finding Words in Sequences
General Pattern Matching in Sequences
Finding Common Subsequences
TREE PATTERN MATCHING
Trees in Mathematics
Trees in Computer Science
Trees in Computational Biology
Simple Pattern Matching in Trees
Finding Paths in Unrooted Trees
Finding Paths in Rooted Trees
General Pattern Matching in Trees
Finding Common Subtrees
GRAPH PATTERN MATCHING
Graphs in Mathematics
Graphs in Computer Science
Graphs in Computational Biology
Simple Pattern Matching in Graphs
Finding Paths in Graphs
Finding Trees in Graphs
General Pattern Matching in Graphs
Finding Common Subgraphs
Appendix A: Elements of Perl
Overview of Perl
Perl Quick Reference Card
Appendix B: Elements of R
Overview of R
R Quick Reference Card
Bibliographic Notes appear at the end of each chapter.
Gabriel Valiente is Associate Professor in the Department of Software at the Technical University of Catalonia. His research interests include computational biology, bioinformatics, exact and approximate matching in graphs and patterns, and graph transformation.
I like the hands-on approach this book offers, and the very pedagogical structure it follows … . The book also has tons of examples, thoughtfully chosen and beautifully laid out … the book is very well-written and accessible, undoubtedly written by an author who takes great care in preparing his manuscripts and teaching about an area he enjoys working on.
—Anthony Labarre, SIGACT News, July 2012
This text provides a solid foundation to the field. It will work as a practical handbook for pattern matching applications in computational biology.
—Michael Goldberg, Computing Reviews, February 2010
… the book makes a clear distinction between problems that emerge in the analysis of the structure and in the comparative analysis of two or more structures. … Well-known computational biology tools that allow searching nucleotide and protein databases for local sequence alignment are based on CPM algorithms only. The techniques presented in this book go beyond that. … detailed algorithm solutions in pseudocode, full Perl and R implementation, and pointers to software and implementation are presented. This … is what makes Valiente’s effort unique. …
—Ernesto D’Avanzo, Computing Reviews, February 2010
… It is a well-sorted collection of pattern matching algorithms that are used to work with problems in computational biology. … You can find all of the sources on the author’s website, which come in handy when you actually want to use them, since you do not have to retype them. And there is an introduction to Perl as well as to R, showing how to decode DNA/RNA-triplets to amino acids and giving some basic overview over standard functions. … I certainly recommend this as an introduction and reference to some algorithms of pattern matching in computational biology. You actually learn how algorithms over the most important data types are designed in a straightforward, logical way. …
—Jannik Pewny, IACR Book Reviews, 2009
…after a few minutes of random browsing, I was left with a feeling of total appreciation of the book, admiration for Prof. Gabriel Valiente, and a realization that this book will be part of my fundamental library for me and my group from the moment it is published. There are so many good things to say that I do not know where to start. The organization is straightforward with major sections that extend from simple sequences to trees to graphs. … This parallel structure makes it easy to apply lessons used on the simplest object (sequences) to objects of medium (trees) and significant (graphs) difficulty. …a wonderful way to learn leveraging … The Perl is beautifully clear and the examples have already taught me how to improve my own code.
—Michael Levitt, Professor and Chair, Department of Structural Biology, Stanford University, California, USA
…Balancing a careful mixture of formal methods, programming, and examples, Gabriel Valiente has managed to harmoniously bridge languages and contents into a self-contained source of lasting influence. It is not difficult to predict that this book will be studied indifferently by the specialist of biology and computer science, helping each to walk a few steps toward the other. It will entice new generations of scholars to engage in its beautiful subject.
—From the Foreword, Alberto Apostolico, Professor, College of Computing, Georgia Tech, Atlanta, USA
Unlocks the power for R for Perl programmers, and vice versa. Reveals R to be a powerful and accessible tool for bioinformatics. The title is a mouthful, but the use of both R and Perl for bioinformatics is revealing.
—Steven Skiena, Professor, Department of Computer Science, Stony Brook University, New York, USA