Proven Methods for Big Data Analysis
As big data has become standard in many application areas, challenges have arisen related to methodology and software development, including how to discover meaningful patterns in the vast amounts of data. Addressing these problems, Applied Biclustering Methods for Big and High-Dimensional Data Using R shows how to apply biclustering methods to find local patterns in a big data matrix.
The book presents an overview of data analysis using biclustering methods from a practical point of view. Real case studies in drug discovery, genetics, marketing research, biology, toxicity, and sports illustrate the use of several biclustering methods. References to technical details of the methods are provided for readers who wish to investigate the full theoretical background. All the methods are accompanied by R examples that show how to conduct the analyses. The examples, software, and other materials are available on a supplementary website.
Table of Contents
Introduction. From Cluster Analysis to Biclustering. Biclustering Methods: δ-biclustering and FLOC Algorithm. The xMotif Algorithm. The Bimax Algorithm. The Plaid Model. Spectral Biclustering. FABIA. Iterative Signature Algorithm. Ensemble Methods and Robust Solutions. Case Studies and Applications: Gene Expression Experiments in Drug Discovery. Biclustering Methods in Chemoinformatics and Molecular Modeling. Integrative Analysis of miRNA and mRNA Data. Enrichment of Gene Expression Modules using Multiple Factor Analysis and Biclustering. Ranking of Biclusters in Drug Discovery Experiments. HapFABIA: Biclustering for Detecting Identity by Descent. Overcoming Data Dimensionality Problems in Market Segmentation. Identification of Local Patterns in the NBA Performance Indicators. R Tools for Biclustering: The BiclustGUI Package. We R a Community: Including a New Package in BiclustGUI. Biclustering for Cloud Computing. The biclustGUI Shiny App. Bibliography. Index.
Adetayo Kasim is a senior research statistician at Durham University.
Ziv Shkedy is a professor in the Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat) in the Center for Statistics at the University of Hasselt.
Sebastian Kaiser is a professor in the Department of Statistics in the Faculty of Mathematics, Informatics and Statistics at Ludwig-Maximilians University of Munich.
Sepp Hochreiter is a professor and head of the Institute of Bioinformatics at Johannes Kepler University Linz.
Willem Talloen is a principal statistician at the Janssen Pharmaceutical Companies of Johnson & Johnson.
"One finds here not only the final results illustrated frequently in colour images, but also the main steps of calculations in R with a possibility to access freely the software located somewhere in the cloud. The book represents an interesting and useful initiative and a tremendous work for putting all this together. It is also a testimony how much data analysis methods have widened and deepened in recent years." ~International Society for Clinical Biostatistics
"Statisticians, software specialists, and people in application fields address methodology and software development for big data and high-dimensional data in contexts where local patterns in a large data matrix are of primary interest. They describe biclustering methods for finding such patterns, and use the open-source statistics software R. The topics include from cluster analysis to biclustering, ensemble methods and robust solutions, biclustering methods in chemoinformatics and molecular modeling, overcoming data dimensionality problems in market segmentation, and biclustering for cloud computing." ~ProtoView
"A key feature of this book is the focus on R tools, with an emphasis on building a fully-functional and user-friendly data analysis solution. All presented methods are integrated into an R package BiClustGUI, which provides user-friendly interface and allows for the addition of custom extensions to the implemented methods. The book’s website . . . provides a wealth of additional resources on biclustering, as well as links to all code implementation and data from the book. The chapter on cloud computing will be particularly useful to some readers, as it describes a portable implementation of a biclustering solution that integrates various previously introduced methods. In addition to replicating the results reported in the book, this section provides a blueprint for reproducible research." ~The American Statistician