Computational Genomics with R
Computational Genomics with R provides a starting point for beginners in genomic data analysis and guides more advanced practitioners to sophisticated data analysis techniques in genomics. The book covers topics ranging from R programming to machine learning and statistics to the latest genomic data analysis techniques. The text provides accessible information and explanations, always with the genomics context in the background. It also contains practical, well-documented examples in R, so readers can analyze their own data by simply reusing the code presented. Because computational genomics is interdisciplinary, readers with different backgrounds need different starting points. For example, a biologist might skip the sections on basic genome biology and start with R programming, whereas a computer scientist might want to start with genome biology.
After reading the book:
- You will have the basics of R and be able to dive right into specialized uses of R for computational genomics such as using Bioconductor packages.
- You will be familiar with statistics, supervised and unsupervised learning techniques that are important in data modeling, and exploratory analysis of high-dimensional data.
- You will understand genomic intervals and operations on them that are used for tasks such as aligned read counting and genomic feature annotation.
- You will know the basics of processing and quality checking high-throughput sequencing data.
- You will be able to do sequence analysis, such as calculating GC content for parts of a genome or finding transcription factor binding sites.
- You will know about visualization techniques used in genomics, such as heatmaps, meta-gene plots, and genomic track visualization.
- You will be familiar with analysis of different high-throughput sequencing data sets, such as RNA-seq, ChIP-seq, and BS-seq.
- You will know basic techniques for integrating and interpreting multi-omics datasets.
Altuna Akalin is a group leader and head of the Bioinformatics and Omics Data Science Platform at the Berlin Institute of Medical Systems Biology, Max Delbrück Center, Berlin. He has been developing computational methods for analyzing and integrating large-scale genomics data sets since 2002. He has published an extensive body of work in this area. The framework for this book grew out of the yearly computational genomics courses he has been organizing and teaching since 2015.
Table of Contents
1. Introduction to Genomics
2. Introduction to R for Genomic Data Analysis
3. Statistics for Genomics
4. Exploratory Data Analysis with Unsupervised Machine Learning
5. Predictive Modeling with Supervised Machine Learning
6. Operations on Genomic Intervals and Genome Arithmetic
7. Quality Check, Processing and Alignment of High-throughput Sequencing Reads
8. RNA-seq Analysis
9. ChIP-seq analysis
10. DNA methylation analysis using bisulfite sequencing
11. Multi-omics Analysis
Dr. Altuna Akalin is a bioinformatics scientist and the head of the Bioinformatics and Omics Data Science Platform at the Berlin Institute of Medical Systems Biology, Max Delbrück Center in Berlin. He has been developing computational methods for analyzing and integrating large-scale genomics data sets since 2002. His research focuses on using machine learning and statistics to uncover patterns related to important biological variables such as disease state and type. He has lived in the USA, Norway, Turkey, Japan, and Switzerland to pursue research and education in computational genomics.
'This book provides a basic overview of computational tools developed in R for carrying out data analyses in genomics. It can be a valuable companion for anyone who wants to utilise the computational tools developed within the Bioconductor and R environments for education and research. This book’s main target audience are students of computational biology to get a first look at the diversity of machine learning methods. The book will also serve well biomedical researchers needing a guide to packages that can help them with the analysis of data that they encounter in their work.'
- Krzysztof Podgórski, International Statistical Review (2021) doi: 10.1111/insr.12453