An Easy-to-Use Research Tool for Algorithm Testing and Development
Before the SeqAn project, few implementations of sequence analysis algorithms were available, even for standard tasks. The algorithmic components that researchers needed were either unavailable or hard to access inside monolithic third-party software products. To address this, the developers of SeqAn created a comprehensive, easy-to-use, open-source C++ library of efficient algorithms and data structures for the analysis of biological sequences. Written by the founders of the project, Biological Sequence Analysis Using the SeqAn C++ Library covers the SeqAn library, its documentation, and the supporting infrastructure.
The first part of the book describes the general library design. It introduces biological sequence analysis problems, discusses the benefit of using software libraries, summarizes the design principles and goals of SeqAn, details the main programming techniques used in SeqAn, and demonstrates the application of these techniques in various examples. Focusing on the components provided by SeqAn, the second part explores basic functionality, sequence data structures, alignments, pattern and motif searching, string indices, and graphs. The last part illustrates applications of SeqAn to genome alignment, consensus sequence in assembly projects, suffix array construction, and more.
This handy book describes a user-friendly library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn supports not only the implementation of new algorithms but also the sound analysis and comparison of existing ones.
Table of Contents
THE SEQAN PROJECT
Sequences in Bioinformatics
Design of SeqAn
Contents of SeqAn
The C++ Programming Language
Global Function Interfaces
The Design in Examples
Example 1: Value Counting
Example 2: Locality-Sensitive Hashing
Containers and Values
Gaps Data Structures
Alignment Data Structures
Alignment Problems Overview
Exact Searching of Multiple Needles
Other Pattern Matching Problems
Seed-Based Motif Search
Multiple Sequence Motifs
Working with Indices
Enhanced Suffix Arrays
Aligning Sequences with LAGAN
The LAGAN Algorithm
Implementation of LAGAN
Multiple Alignment with Segments
Basic Statistical Indices for SeqAn
Statistical Indices and Biological Sequence Analysis
SeqAn Algorithms and Data Types
A BWT-Based Suffix Array Construction
Introduction to BWTWalk
The Main Idea of BWTWalk
SeqAn Implementation of BWTWalkFast
Containers with and without Fast Random Access
Co-founder of the SeqAn project, Andreas Gogol-Döring works at the Max Delbrück Center for Molecular Medicine in Berlin, Germany. He was previously a research associate in the Algorithmic Bioinformatics group in the Department of Computer Science at Freie Universität Berlin in Germany.
Co-founder of the SeqAn project, Knut Reinert is a professor and head of the Algorithmic Bioinformatics group in the Department of Computer Science at Freie Universität Berlin in Germany.