Large biological data, which are often noisy and high-dimensional, have become increasingly prevalent in biology and medicine. There is a real need for good training in statistics, from data exploration through to analysis and interpretation. This book provides an overview of statistical and dimension reduction methods for high-throughput biological data, with a specific focus on data integration. It starts with some biological background and the key concepts underlying the multivariate methods, then covers an array of methods implemented using the mixOmics package in R.
- Provides a broad and accessible overview of methods for multi-omics data integration
- Covers a wide range of multivariate methods, each designed to answer specific biological questions
- Includes comprehensive visualisation techniques to aid in data interpretation
- Includes many worked examples and case studies using real data
- Includes reproducible R code for each multivariate method, using the mixOmics package
The book is suitable for researchers from a wide range of scientific disciplines wishing to apply these methods to obtain new and deeper insights into biological mechanisms and biomedical problems. The suite of tools introduced in this book will enable students and scientists to work at the interface between, and provide critical collaborative expertise to, biologists, bioinformaticians, statisticians and clinicians.
Table of Contents
I Modern biology and multivariate analysis
1. Multi-omics and biological systems
2. The cycle of analysis
3. Key multivariate concepts and dimension reduction in mixOmics
4. Choose the right method for the right question in mixOmics
II mixOmics under the hood
5. Projection to Latent Structures
6. Visualisation for data integration
7. Performance assessment in multivariate analyses
III mixOmics in action
8. mixOmics: get started
9. Principal Component Analysis (PCA)
10. Projection to Latent Structure (PLS)
11. Canonical Correlation Analysis (CCA)
12. PLS - Discriminant Analysis (PLS-DA)
13. N-data integration
14. P-data integration
15. Glossary of Terms
Dr Kim-Anh Lê Cao develops novel methods, software and tools to interpret big biological data and answer research questions efficiently. She is committed to statistical education to instill best analytical practice, has taught numerous statistical workshops for biologists, and leads collaborative projects in medicine, fundamental biology and microbiology. She has a mathematical engineering background and graduated with a PhD in Statistics from the Université de Toulouse, France. She then moved to Australia, first as a biostatistician consultant at QFAB Bioinformatics, then as a research group leader at the biomedical University of Queensland Diamantina Institute. She is currently Associate Professor in Statistical Genomics at the University of Melbourne. In 2019, Kim-Anh received the Australian Academy of Science’s Moran Medal for her contributions to Applied Statistics in multidisciplinary collaborations. She has been part of leadership programs for women in STEMM, including the international Homeward Bound, which culminated in a trip to Antarctica, and Superstars of STEM from Science & Technology Australia.
Zoe Welham completed a BSc in molecular biology and during this time developed a keen interest in the analysis of big data. She completed a Masters of Bioinformatics with a focus on the statistical integration of different omics data in bowel cancer. She is currently a PhD candidate at the Kolling Institute in Sydney where she is furthering her research into bowel cancer with a focus on integrating microbiome data with other omics to characterise early bowel polyps. Her research interests include bioinformatics and biostatistics for many areas of biology and disseminating that information to the general public through reader-friendly writing.
"This book was eagerly awaited both to bring together numerous research works published in recent years and to support the use of the Mixomics software which has become an essential tool for data integration and exploration when dealing with multiple types of high-dimensional biological data. It is the result of many years of research on cutting-edge developments in this domain as for sparsity. The book is very pleasant to read and well-structured around the different multivariate approaches. It is well documented with many recent references on the statistical methods and is very didactic through numerous examples accompanied by R codes and illustrations. It can be used by a large audience of statisticians and biologists to process, analyze, visualize, and interpret their multivariate microbiome and multi-omics data, but also as a basis for a course. I highly recommend this book."
- Philippe Bastien, Senior Research Associate - L'Oréal R&I
"The book belongs to the Computational Biology Series and presents a wide spectrum of modern methods of multivariate statistical analysis, integration and high-dimension reduction for biological data evaluated via the specialized R package. The neologism Omic is used as a root related to constellations of objects with biological information, for instance, in genomes and proteins—genomics and proteomics (in studying proteins expressed by cells and tissues), metabolic and transcription products—metabolomics and transcriptomics (in studying messenger RNA molecules expressed from the gens of an organism), or also in economics—Reaganomics, etc.
[...] Numerous links to the internet websites related to the considered methods of multi-omics data integration are suggested, particularly, the mixOmics project is described at the link http://www.mixOmics.org, and the package is available at Install |mixOmics. The developed methods and software are suitable not only for biologists and bioinformaticians students and researchers, but can be useful for solving computational and content problems in many other fields as well."
"This is an excellent book for computational biologists, bioinformaticians, statisticians, data scientists, and graduate students who work with high-throughput omics data. The book covers most fundamental concepts of multi-omics data integration, while focusing on their implementations through hands-on examples implemented in the mixOmics R package."
- Yuehua Cui, Michigan State University, Biometrics, September 2022