Introduction to Computational Proteomics

1st Edition

By Golan Yona

Chapman and Hall/CRC

767 pages | 22 Color Illus. | 207 B/W Illus.

Hardback: 9781584885559
pub: 2010-12-09
eBook (VitalSource): 9780429144202
pub: 2010-12-09

Description

Introduction to Computational Proteomics introduces the field of computational biology through a focused approach that tackles the different steps and problems involved with protein analysis, classification, and meta-organization. The book starts with the analysis of individual entities and works its way through the analysis of more complex entities, from protein families to interactions, cellular pathways, and gene networks.

The first part of the book presents methods for identifying the building blocks of the protein space, such as motifs and domains. It also describes algorithms for assessing similarity between proteins based on sequence and structure analysis as well as mathematical models, such as hidden Markov models and support vector machines, that are used to represent protein families and classify new instances.

The second part covers methods that investigate higher order structure in the protein space through the application of unsupervised learning algorithms, such as clustering and embedding. The book also explores the broader context of proteins. It discusses methods for analyzing gene expression data, predicting protein-protein interactions, elucidating cellular pathways, and reconstructing gene networks.

This book provides a coherent and thorough introduction to proteome analysis. It offers rigorous, formal descriptions, along with detailed algorithmic solutions and models. Each chapter includes problem sets from courses taught by the author at Cornell University and the Technion. Software downloads, data sets, and other material are available at biozon.org.

Table of Contents

PART I: THE BASICS

What Is Computational Proteomics?

The complexity of living organisms

Proteomics in the modern era

The main challenges in computational proteomics

Basic Notions in Molecular Biology

The cell structure of organisms

It all starts from the DNA

Proteins

From DNA to proteins

Protein folding—from sequence to structure

Evolution and relational classes in the protein space

Sequence Comparison

Alignment of sequences

Heuristic algorithms for sequence comparison

Probability and statistics of sequence alignments

Scoring matrices and gap penalties

Distance and pseudo-distance functions for proteins

Further reading

Conclusions

Appendix: performance evaluation

Appendix: basic concepts in probability

Multiple Sequence Alignment, Profiles, and Partial Order Graphs

Dynamic programming in N dimensions

Classical heuristic methods

MSA representation and scoring

Profile analysis

Iterative and progressive alignment

Transitive alignment

Partial order alignment

Further reading

Conclusions

Motif Discovery

Introduction

Model-based algorithms

Searching for good models: Gibbs sampling and MEME

Combinatorial approaches

Further reading

Conclusions

Appendix: the expectation-maximization algorithm

Markov Models of Protein Families

Introduction

Markov models

Main applications of hidden Markov models (the evaluation and decoding problems)

Learning HMMs from data

Higher order models, codes and compression

Variable order Markov models

Further reading

Conclusions

Classifiers and Kernels

Generative models vs discriminative models

Classifiers and discriminant functions

Applying SVMs to protein classification

Decision trees

Further reading

Conclusions

Appendix

Protein Structure Analysis

Introduction

Structure prediction—the protein folding problem

Structure comparison

Generalized sequence profiles—integrating secondary structure with sequence information

Further reading

Conclusions

Appendix

Protein Domains

Introduction

Domain detection

Learning domain boundaries from multiple features

Testing domain predictions

Multi-domain architectures

Further reading

Conclusions

Appendix

PART II: PUTTING ALL THE PIECES TOGETHER

Clustering and Classification

Introduction

Clustering methods

Vector-space clustering algorithms

Graph-based clustering algorithms

Collaborative clustering

Spectral clustering algorithms

Markovian clustering algorithms

Cluster validation and assessment

Clustering proteins

Further reading

Conclusions

Appendix

Embedding Algorithms and Vectorial Representations

Introduction

Structure preserving embedding

Maximal variance embeddings (PCA, SVD)

Distance preserving embeddings (MDS, random projections)

Manifold learning—topological embeddings (IsoMap, LLE, distributional scaling)

Setting the dimension of the host space

Vectorial representations

Further reading

Conclusions

Analysis of Gene Expression Data

Introduction

Microarrays

Analysis of individual genes

Pairwise analysis

Cluster analysis and class discovery

Enrichment analysis

Protein arrays

Further reading

Conclusions

Protein-Protein Interactions

Introduction

Experimental detection of protein interactions

Prediction of protein-protein interactions

Structure-based prediction, protein docking

Sequence-based inference (gene preservation, co-evolution, sequence signatures, and domain-based prediction)

Topological properties of interaction networks

Network motifs

Further reading

Conclusions

Appendices

Cellular Pathways

Introduction

Metabolic pathways

Pathway prediction

Pathway prediction from blueprints

Expression data and pathway analysis

Regulatory networks and modules

Pathway networks and the minimal cell

Further reading

Conclusions

Bayesian Belief Networks

Introduction

Computing the likelihood of observations

Probabilistic inference

Learning the parameters of a Bayesian network

Learning the structure of a Bayesian network

Further reading

Conclusions

References

Problems appear at the end of each chapter.

About the Author

Golan Yona is a senior scientist at Stanford University. He leads the Biozon project, a large-scale platform for integrating heterogeneous biological data, including DNA and protein sequences, structures, gene expression data, interactions, and pathways.

About the Series

Chapman & Hall/CRC Mathematical and Computational Biology

Subject Categories

BISAC Subject Codes/Headings:
MAT003000: MATHEMATICS / Applied
MAT029000: MATHEMATICS / Probability & Statistics / General
SCI008000: SCIENCE / Life Sciences / Biology / General
SCI010000: SCIENCE / Biotechnology