Next-Generation Sequencing Data Analysis

1st Edition

By Xinkun Wang

CRC Press

246 pages | 48 B/W Illus.

Hardback: 9781482217889
pub: 2016-02-24
eBook (VitalSource): 9780429160738
pub: 2016-04-06

A Practical Guide to the Highly Dynamic Area of Massively Parallel Sequencing

The development of genome and transcriptome sequencing technologies has led to a paradigm shift in life science research and disease diagnosis and prevention. Scientists are now able to see how human diseases and phenotypic changes are connected to DNA mutation, polymorphism, genome structure, and epigenomic abnormality. Next-Generation Sequencing Data Analysis shows how next-generation sequencing (NGS) technologies are applied to transform nearly all aspects of biological research.

The book walks readers through the multiple stages of NGS data generation and analysis in an easy-to-follow fashion. It covers every step in each stage: the planning stage (experimental design, sample processing, and sequencing strategy formulation); the early stage (base calling, read quality checks, and data preprocessing); the intermediate stage (mapping reads to a reference genome and normalization); and the more advanced stages specific to each application. All major applications of NGS are covered, including:

  • RNA-seq: mRNA-seq and small RNA-seq
  • Genotyping and variant discovery through genome re-sequencing
  • De novo genome assembly
  • ChIP-seq to study DNA–protein interaction
  • Methylated DNA sequencing on epigenetic regulation
  • Metagenome analysis through community genome shotgun sequencing

Before detailing the analytic steps for each of these applications, the book presents the ins and outs of the most widely used NGS platforms, with side-by-side comparisons of key technical aspects. This helps practitioners decide which platform to use for a particular project. The book also offers a perspective on the development of DNA sequencing technologies, from Sanger to future-generation sequencing technologies.

The book discusses concepts and principles that underlie each analytic step, along with software tools for implementation. It highlights key features of the tools while omitting tedious details to provide an easy-to-follow guide for practitioners in life sciences, bioinformatics, and biostatistics. In addition, references to detailed descriptions of the tools are given for further reading if needed. The accompanying website for the book provides step-by-step, real-world examples of how to apply the tools covered in the text to research projects. All the tools are freely available to academic users.
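To give a flavor of the early-stage steps named in the description (the FASTQ file format and base quality scores), here is a minimal Python sketch. It is not taken from the book or its accompanying website. It assumes four-line FASTQ records with Phred+33 quality encoding, and the function names, the in-memory example reads, and the mean-quality cutoff of 20 are all hypothetical choices made for illustration.

```python
from statistics import mean

# Assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding.
PHRED_OFFSET = 33


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(stripped), 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def phred_scores(qual, offset=PHRED_OFFSET):
    """Decode a quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Convert a Phred score Q into a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_filter(qual, min_mean_q=20):
    """Keep a read if its mean Phred score reaches the (hypothetical) cutoff."""
    return mean(phred_scores(qual)) >= min_mean_q


# Two made-up reads: one with uniformly high quality, one with uniformly low.
EXAMPLE = """@read1
ACGT
+
IIII
@read2
ACGT
+
!!!!
"""

for read_id, seq, qual in parse_fastq(EXAMPLE.splitlines()):
    status = "keep" if passes_filter(qual) else "drop"
    print(read_id, seq, f"mean Q = {mean(phred_scores(qual)):.1f}", status)
```

Filtering on mean read quality is only one simple preprocessing rule. Real pipelines commonly also trim adapters and low-quality read ends, which this sketch leaves out.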

Table of Contents

Introduction to Cellular and Molecular Biology

The Cellular System and the Code of Life

The Cellular Challenge

How Cells Meet the Challenge

Molecules in Cells

Intracellular Structures or Spaces

The Cell as a System

DNA Sequence: The Genome Base

The DNA Double Helix and Base Sequence

How DNA Molecules Replicate and Maintain Fidelity

How the Genetic Information Stored in DNA Is Transferred to Protein

The Genomic Landscape

DNA Packaging, Sequence Access, and DNA–Protein Interactions

DNA Sequence Mutation and Polymorphism

Genome Evolution

Epigenome and DNA Methylation

Genome Sequencing and Disease Risk

RNA: The Transcribed Sequence

RNA as the Messenger

The Molecular Structure of RNA

Generation, Processing, and Turnover of RNA as a Messenger

RNA Is More Than a Messenger

The Cellular Transcriptional Landscape

Introduction to Next-Generation Sequencing (NGS) and NGS Data Analysis

NGS Technologies: Ins and Outs

How to Sequence DNA: From First Generation to the Next

A Typical NGS Experimental Workflow

Ins and Outs of Different NGS Platforms

Biases and Other Adverse Factors That May Affect NGS Data Accuracy

Major Applications of NGS

Early-Stage NGS Data Analysis: Common Steps

Base Calling, FASTQ File Format, and Base Quality Score

NGS Data Quality Control and Preprocessing

Reads Mapping

Tertiary Analysis

Computing Needs for NGS Data Management and Analysis

NGS Data Storage, Transfer, and Sharing

Computing Power Required for NGS Data Analysis

Software Needs for NGS Data Analysis

Bioinformatics Skills Required for NGS Data Analysis

Application-Specific NGS Data Analysis

Transcriptomics by RNA-Seq

Principle of RNA-Seq

Experimental Design

RNA-Seq Data Analysis

RNA-Seq as a Discovery Tool

Small RNA Sequencing

Small RNA NGS Data Generation and Upstream Processing

Identification of Differentially Expressed Small RNAs

Functional Analysis of Identified Small RNAs

Genotyping and Genomic Variation Discovery by Whole Genome Resequencing

Data Preprocessing, Mapping, Realignment, and Recalibration

Single Nucleotide Variant (SNV) and Indel Calling

Structural Variant (SV) Calling

Annotation of Called Variants

Testing of Variant Association with Diseases or Traits

De novo Genome Assembly from NGS Reads

Genomic Factors and Sequencing Strategies for de novo Assembly

Assembly of Contigs


Assembly Quality Evaluation

Gap Closure

Limitations and Future Development

Mapping Protein–DNA Interactions with ChIP-Seq

Principle of ChIP-Seq

Experimental Design

Read Mapping, Peak Calling, and Peak Visualization

Differential Binding Analysis

Functional Analysis

Motif Analysis

Integrated ChIP-Seq Data Analysis

Epigenomics and DNA Methylation Analysis by NGS

DNA Methylation Sequencing Strategies

DNA Methylation Sequencing Data Analysis

Detection of Differentially Methylated Cytosines or Regions

Data Verification, Validation, and Interpretation

Metagenome Analysis by NGS

Experimental Design and Sample Preparation

Sequencing Approaches

Overview of Whole-Genome Shotgun (WGS) Metagenome Sequencing Data Analysis

Sequencing Data Quality Control and Preprocessing

Taxonomic Characterization of a Microbial Community

Functional Characterization of a Microbial Community

Comparative Metagenomic Analysis

Integrated Metagenomics Data Analysis Pipelines

Metagenomics Data Repositories

The Changing Landscape of NGS Technologies and Data Analysis

What Is Next for NGS?

The Changing Landscape of NGS

Rapid Evolution and Growth of Bioinformatics Tools for High-Throughput Sequencing Data Analysis

Standardization and Streamlining of NGS Analytic Pipelines

Parallel Computing

Cloud Computing

Appendix A: Common File Types Used in NGS Data Analysis

Appendix B: Glossary



About the Author

Dr. Xinkun "Sequen" Wang is the director of the NUSeq Core Facility and research associate professor in the Department of Biochemistry and Molecular Genetics at Northwestern University. He was previously an associate research professor of neurogenomics in the Higuchi Biosciences Center and Department of Pharmacology and Toxicology at the University of Kansas, where he was also the director of the Genomics Facility and Genome Sequencing Core. Dr. Wang’s research focuses on unraveling genomic changes that underlie neurodegeneration in brain aging and neurodegenerative diseases, such as Alzheimer’s disease.

Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General
SCIENCE / Life Sciences / Biology / General
SCIENCE / Biotechnology