Compositional Data Analysis in Practice is a user-oriented practical guide to the analysis of data with the property of a constant sum, for example percentages adding up to 100%. Compositional data can give misleading results if regular statistical methods are applied, and are best analysed by first transforming them to logarithms of ratios. This book explains how this transformation affects the analysis, results and interpretation of this very special type of data. All aspects of compositional data analysis are considered: visualization, modelling, dimension-reduction, clustering and variable selection, with many examples in the fields of food science, archaeology, sociology and biochemistry, and a final chapter containing a complete case study using fatty acid compositions in ecology. The applicability of these methods extends to other fields such as linguistics, geochemistry, marketing, economics and finance.
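As a rough illustration of why logratios are preferred (this sketch is not taken from the book, whose examples use R), the following Python snippet uses a made-up three-part composition and hypothetical helper names. It shows that a percentage changes when a part is dropped and the rest is rescaled to 100%, while the log of a ratio between two parts stays the same. It assumes strictly positive parts; zeros need separate handling.

```python
import math

def closure(parts, total=100.0):
    """Rescale positive parts so that they sum to `total` (hypothetical helper)."""
    s = sum(parts)
    return [total * p / s for p in parts]

def pairwise_logratio(parts, i, j):
    """Log of the ratio between part i and part j."""
    return math.log(parts[i] / parts[j])

def clr(parts):
    """Centered logratio: each log-part minus the mean of all log-parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Made-up composition in percent (illustrative data only).
sample = [50.0, 30.0, 20.0]

# Subcomposition: drop the third part and rescale the rest to 100%.
sub = closure(sample[:2])

print("first part, full composition:", sample[0])
print("first part, subcomposition:  ", sub[0])
print("logratio, full composition:", pairwise_logratio(sample, 0, 1))
print("logratio, subcomposition:  ", pairwise_logratio(sub, 0, 1))
print("clr of full composition:", clr(sample))
```

The percentages depend on which parts happen to be included, whereas the logratio of the first two parts does not, which is the kind of stability that motivates the transformation.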
Table of Contents
What are compositional data, and why are they special?
Geometry and visualization of compositional data.
Properties and distributions of logratios.
Regression models involving compositional data.
Dimension reduction using logratio analysis.
Clustering of compositional data.
The problem of zeros, with some solutions.
Simplifying the task: variable selection.
Case study: Fatty acids of marine amphipods.
Appendix A: Theory of compositional data analysis.
Appendix B: Bibliography of compositional data analysis.
Appendix C: Computation of compositional data analysis.
Appendix D: Glossary of terms.
Appendix E: Epilogue.
Michael Greenacre is Professor of Statistics at the Universitat Pompeu Fabra, Barcelona, Spain, where he teaches a course, amongst others, on Data Visualization. He has authored and co-edited nine books and 80 journal articles and book chapters, mostly on correspondence analysis, the latest being Correspondence Analysis in Practice (Third Edition) in 2016. He has given short courses in fifteen countries to environmental scientists, sociologists, data scientists and marketing professionals, and has specialized in statistics in ecology and social science.
"(…This book) avoids cumbersome theoretical digressions and only presents to the reader the essential basic concepts for the application of CODA, using ratios and logratios that retain most of the original data structure and, subsequently, may lead to proper conclusions. … The simplification of the analysis and the straightforward interpretability of results is, clearly, one of the primary values of the publication. In addition, the emphasis on the general application of weights in the calculus of most of the operations and methodologies used throughout the book deserves a special mention.. … Altogether, the book and the easyCODA R package may represent a promising instrument for introducing CODA in the fat and oils field, where fatty acid compositions have been treated until now exclusively by classical multivariate techniques without considering their compositional structure. Predicting the future is risky, but the book may represent an essential instrument for CODA spreading since it represents just what many practitioners were expecting to initiate their experience in this promising new statistical field of compositional data analysis."
—A. Garrido Fernández in Grasas y Aceites – International Journal of Fats and Oils, July-September 2019
"…an interesting book, certainly controversial in some respects for scholars in the field. It has a strong data analytic focus and requires some background in multivariate analysis and biplot theory for a good understanding. It overemphasizes links to correspondence analysis at times, but is very well written and didactically nicely sliced into modules numbering exactly eight pages each. Most examples in the book are reproducible in the R environment. Finally, it will help the analyst to reflect on the use of weights, to the benefit of the analysis of compositional data."
—Jan Graffelman in the Biometrical Journal, March 2019
"This book provides a essential reference as a practical way to evaluate and interpret compositional data across a broad spectrum of disciplines in the life and natural sciences for both academia and industry. The book takes a prescribed approach starting with the definition of compositional data, the use of logratios for dimension reduction, clustering and variable selection issues along with several practical examples and a case study. The theory of compositional data analysis and computational aspects are included as Appendices.
This book can be used at the undergraduate level as part of a course in data analysis. At the graduate level, for research studies, this book is essential in understanding how to collect and interpret compositional data. Using the methods described in this book will help to avoid costly mistakes made from misinterpreting compositional data."
—Professor Eric Grunsky, Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, Ontario, Canada
"Clearly the best introduction to compositional data analysis"
—Professor John Bacon-Shone
"Compositional Data Analysis in Practice is a short book by Michael Greenacre that introduces the statistician to the analysis of data partitions adding to a constant total. These data appear frequently in biology, chemistry, sociology, and other areas. ...The book is organised in to 10 chapters, each of eight pages, with a final summary, which makes it easy to read and very didactic. Easy to follow examples are used throughout the book, analyzed with R packages. This book is short, which I find appealing for a fast introduction to the topic. It covers the important practical analytical problems and provides easy solutions with example code. I recommend it for those who need to use compositional data analysis, or require a study guide for courses on the topic."
- Victor Moreno in ISCB, June 2019