1st Edition

Population Genomics with R

ISBN 9781138608184
Published May 5, 2020 by Chapman and Hall/CRC
394 Pages



Book Description

Population Genomics with R presents a multidisciplinary approach to the analysis of population genomics. The methods covered span topics from traditional population genetics to large-scale genomics with high-throughput sequencing data. Several dozen R packages are examined and integrated to provide a coherent software environment with a wide range of computational, statistical, and graphical tools. Small examples illustrate the basics, and published data serve as case studies. Readers are expected to have a basic knowledge of biology, genetics, and statistical inference methods. Graduate students and postdoctoral researchers will find resources to analyze their population genetic and genomic data and to help them design new studies.

The first four chapters review the basics of population genomics, data acquisition, and the use of R to store and manipulate genomic data. Chapter 5 treats the exploration of genomic data, an important issue when analyzing large data sets. The other five chapters cover linkage disequilibrium, population genomic structure, geographical structure, past demographic events, and natural selection. These chapters include supervised and unsupervised methods, admixture analysis, an in-depth treatment of multivariate methods, and advice on how to handle GIS data. The analysis of natural selection, a traditional issue in evolutionary biology, has seen a revival with modern population genomic data. All chapters include exercises. Supplemental materials are available online (http://ape-package.ird.fr/PGR.html).

Table of Contents

1. Introduction

Heredity, Genetics, and Genomics

Principles of Population Genomics


Genome Structures


Drift and Selection

R Packages and Conventions

Required Knowledge and Other Readings

2. Data Acquisition

Samples and Sampling Designs

How Much DNA in a Sample?

Degraded Samples

Sampling Designs

Low-Throughput Technologies

Genotypes From Phenotypes

DNA Cleavage Methods

Repeat Length Polymorphism

Sanger and Shotgun Sequencing

DNA Methylation and Bisulfite Sequencing

High-Throughput Technologies

DNA Microarrays

High-Throughput Sequencing

Restriction Site Associated DNA

RNA Sequencing

Exome Sequencing

Sequencing of Pooled Individuals

Designing a Study With HTS

The Future of DNA Sequencing

File Formats

Data Files

Archiving and Compression

Bioinformatics and Genomics

Processing Sanger Sequencing Data With sangerseqR

Read Mapping With Rsubread

Managing Read Alignments With Rsamtools

Simulation of High-Throughput Sequencing Data


3. Genomic Data in R

What is an R Data Object?

Data Classes for Genomic Data

The Class "loci" (pegas)

The Class "genind" (adegenet)

The Classes "SNPbin" and "genlight" (adegenet)

The Class "SnpMatrix" (snpStats)

The Class "DNAbin" (ape)

The Classes "XString" and "XStringSet" (Biostrings)

The Package SNPRelate

Data Input and Output

Reading Text Files

Reading Spreadsheet Files

Reading VCF Files

Reading PED and BED Files

Reading Sequence Files

Reading Annotation Files

Writing Files

Internet Databases

Managing Files and Projects


4. Data Manipulation

Basic Data Manipulation in R

Subsetting, Replacement, and Deletion

Commonly Used Functions

Recycling and Coercion

Logical Vectors

Memory Management


Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Human Genomes

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Bacterial Whole Genome Sequences

Metabarcoding of Fish Communities


5. Data Exploration and Summaries

Genotype and Allele Frequencies

Allelic Richness

Missing Data

Haplotype and Nucleotide Diversity

The Class "haplotype"

Haplotype and Nucleotide Diversity From DNA Sequences

Genetic and Genomic Distances

Theoretical Background

Hamming Distance

Distances From DNA Sequences

Distances From Allele Sharing

Distances From Microsatellites

Summary by Groups

Sliding Windows

DNA Sequences

Summaries With Genomic Positions

Package SNPRelate

Multivariate Methods

Matrix Decomposition


Singular Value Decomposition

Power Method and Random Matrices

Principal Component Analysis




Multidimensional Scaling

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Human Genomes

Influenza H1N1 Virus Sequences

Jaguar Microsatellites

Bacterial Whole Genome Sequences

Metabarcoding of Fish Communities


6. Linkage Disequilibrium and Haplotype Structure

Why Linkage Disequilibrium is Important?

Linkage Disequilibrium: Two Loci

Phased Genotypes

Theoretical Background

Implementation in pegas

Unphased Genotypes

More Than Two Loci

Haplotypes From Unphased Genotypes

The Expectation–Maximization Algorithm

Implementation in haplo.stats

Locus-Specific Imputation

Maps of Linkage Disequilibrium

Phased Genotypes With pegas



Case Studies

Complete Genomes of the Fruit Fly

Human Genomes

Jaguar Microsatellites


7. Population Genetic Structure

Hardy–Weinberg Equilibrium


Theoretical Background

Implementations in pegas and in mmod

Implementations in snpStats and in SNPRelate

Trees and Networks

Minimum Spanning Trees and Networks

Statistical Parsimony

Median Networks

Phylogenetic Trees

Multivariate Methods

Principles of Discriminant Analysis

Discriminant Analysis of Principal Components


Maximum Likelihood Methods

Bayesian Clustering


Likelihood Method

Principal Component Analysis of Coancestry

A Second Look at F-Statistics

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Jaguar Microsatellites


8. Geographical Structure

Geographical Data in R

Packages and Classes

Calculating Geographical Distances

A Third Look at F-Statistics

Hierarchical Components of Genetic Diversity

Analysis of Molecular Variance

Moran I and Spatial Autocorrelation

Spatial Principal Component Analysis

Finding Boundaries Between Populations

Spatial Ancestry (tess3r)

Bayesian Methods (Geneland)

Case Studies

Complete Genomes of the Fruit Fly

Human Genomes


9. Past Demographic Events

The Coalescent

The Standard Coalescent

The Sequential Markovian Coalescent

Simulation of Coalescent Data

Estimation of θ


Number of Alleles

Segregating Sites



Coalescent-Based Inference

Maximum Likelihood Methods

Analysis of Markov Chain Monte Carlo Outputs

Skyline Plots

Bayesian Methods

Heterochronous Samples

Site Frequency Spectrum Methods

The Stairway Method



Whole-Genome Methods (psmcr)

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences

Bacterial Whole Genome Sequences


10. Natural Selection

Testing Neutrality

Simple Tests

Selection in Protein-Coding Sequences

Selection Scans

A Fourth Look at F-Statistics

Association Studies (LEA)

Principal Component Analysis (pcadapt)

Scans for Selection With Extended Haplotypes

FST Outliers

Time-Series of Allele Frequencies

Case Studies

Mitochondrial Genomes of the Asiatic Golden Cat

Complete Genomes of the Fruit Fly

Influenza H1N1 Virus Sequences


A Installing R Packages

B Compressing Large Sequence Files

C Sampling of Alleles in a Population




Emmanuel Paradis is a senior researcher at the French Institute of Research for Development (IRD). His research focuses on evolutionary models and their applications. Developing and publishing software associated with his research has been an important part of his work for more than twenty years. He adopted R as his main software for data analysis in 2000 and has since published and maintained several packages, including ape since 2002 and pegas since 2009. He regularly gives workshops and training courses in several countries.


"The author has taken good care of including several important as well as emerging topics (data acquisition, next generation sequencing) that would be extremely useful for the readers. I suggest that this book be targeted to graduate students and researchers who have some background in basic genetics or are taking a graduate level population genetics course…The data acquisition chapter, descriptions of DNA sample quality, and file formats are the strengths. Case studies are very valuable and would provide more 'hands-on' training on working on specific population genetics problems."
~Santhosh Girirajan, Pennsylvania State University

"The strength of those chapters is to provide a global coverage of the field of population genetics based on a broad spectrum of statistical methods. The author proposes to deal with population genetic analyses in a unified programming framework that uses specific classes of the R packages ape/pegas and adegenet, and I was impressed by the work done."
~Olivier François, University Grenoble Alpes

"This book could serve as both a reference book and a textbook. Population genetics, applied bioinformatics, genomics, molecular ecology, and conservation genetic classes with a lab component at both undergraduate and graduate levels could teach from this text. Graduate students and possible postdocs in evolutionary biology and applied bioinformatics could use this as a reference. Additionally, government and non-profit organizations that process genetic samples for conservation and management purposes would find this instruction useful. …What this text offers is unique in that it is focused on practical steps to analyze data using already available programs that users can install…Given the variety of subjects and types of analyses, I think it could be a valuable resource for many students."
~Sarah Hendricks, San Diego Zoo Institute for Conservation Research