Proven Methods for Big Data Analysis
As big data has become standard in many application areas, challenges have arisen related to methodology and software development, including how to discover meaningful patterns in the vast amounts of data. Addressing these problems, Applied Biclustering Methods for Big and High-Dimensional Data Using R shows how to apply biclustering methods to find local patterns in a big data matrix.
The book presents an overview of data analysis using biclustering methods from a practical point of view. Real case studies in drug discovery, genetics, marketing research, biology, toxicity, and sports illustrate the use of several biclustering methods. References to technical details of the methods are provided for readers who wish to investigate the full theoretical background. All the methods are accompanied by R examples that show how to conduct the analyses. The examples, software, and other materials are available on a supplementary website.
Table of Contents
Ziv Shkedy, Adetayo Kasim, Sepp Hochreiter, Sebastian Kaiser, and Willem Talloen
From Cluster Analysis to Biclustering
Dhammika Amaratunga, Javier Cabrera, Nolen Joy Perualila, Adetayo Kasim, and Ziv Shkedy
δ-biclustering and FLOC Algorithm
Adetayo Kasim, Sepp Hochreiter, and Ziv Shkedy
The xMotif Algorithm
Ewoud De Troyer, Dan Lin, Ziv Shkedy, and Sebastian Kaiser
The Bimax Algorithm
Ewoud De Troyer, Suzy Van Sanden, Ziv Shkedy, and Sebastian Kaiser
The Plaid Model
Ziv Shkedy, Ewoud De Troyer, Adetayo Kasim, Sepp Hochreiter, and Heather Turner
Adetayo Kasim, Setia Pramana, and Ziv Shkedy
Iterative Signature Algorithm
Adetayo Kasim and Ziv Shkedy
Ensemble Methods and Robust Solutions
Tatsiana Khamiakova, Sebastian Kaiser, and Ziv Shkedy
Case Studies and Applications
Gene Expression Experiments in Drug Discovery
Willem Talloen, Hinrich W.H. Göhlmann, Bie Verbist, Nolen Joy Perualila, Ziv Shkedy, Adetayo Kasim, and the QSTAR Consortium
Biclustering Methods in Chemoinformatics and Molecular Modeling
Nolen Joy Perualila, Ziv Shkedy, Aakash Chavan Ravindranath, Georgios Drakakis, Sonia Liggi, Andreas Bender, Adetayo Kasim, QSTAR Consortium, Willem Talloen, and Hinrich W.H. Göhlmann
Integrative Analysis of miRNA and mRNA Data
Tatsiana Khamiakova, Adetayo Kasim, and Ziv Shkedy
Enrichment of Gene Expression Modules using Multiple Factor Analysis and Biclustering
Nolen Joy Perualila, Ziv Shkedy, Dhammika Amaratunga, Javier Cabrera, and Adetayo Kasim
Ranking of Biclusters in Drug Discovery Experiments
Nolen Joy Perualila, Ziv Shkedy, Sepp Hochreiter, and Djork-Arne Clevert
HapFABIA: Biclustering for Detecting Identity by Descent
Overcoming Data Dimensionality Problems in Market Segmentation
Sebastian Kaiser, Sara Dolnicar, Katie Lazarevski, and Friedrich Leisch
Identification of Local Patterns in the NBA Performance Indicators
Ziv Shkedy, Rudradev Sengupta, and Nolen Joy Perualila
R Tools for Biclustering
The BiclustGUI Package
Ewoud De Troyer, Martin Otava, Jitao David Zhang, Setia Pramana, Tatsiana Khamiakova, Sebastian Kaiser, Martin Sill, Aedin Culhane, Daniel Gusenleitner, Pierre Gestraud, Gabor Csardi, Mengsteab Aregay, Sepp Hochreiter, Gunter Klambauer, Djork-Arne Clevert, Tobias Verbeke, Nolen Joy Perualila, Adetayo Kasim, and Ziv Shkedy
We R a Community: Including a New Package in BiclustGUI
Ewoud De Troyer
Biclustering for Cloud Computing
Rudradev Sengupta, Oswaldo Trelles, Oscar Torreno Tirado, and Ziv Shkedy
The biclustGUI Shiny App
Ewoud De Troyer, Rudradev Sengupta, Martin Otava, Jitao David Zhang, Sebastian Kaiser, Aedin Culhane, Daniel Gusenleitner, Pierre Gestraud, Gabor Csardi, Sepp Hochreiter, Gunter Klambauer, Djork-Arne Clevert, Nolen Joy Perualila, Adetayo Kasim, and Ziv Shkedy
Adetayo Kasim is a senior research statistician at Durham University.
Ziv Shkedy is a professor in the Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat) in the Center for Statistics at the University of Hasselt.
Sebastian Kaiser is a professor in the Department of Statistics in the Faculty of Mathematics, Informatics and Statistics at Ludwig-Maximilians University of Munich.
Sepp Hochreiter is a professor and head of the Institute of Bioinformatics at Johannes Kepler University Linz.
Willem Talloen is a principal statistician at the Janssen Pharmaceutical Companies of Johnson & Johnson.
"One finds here not only the final results illustrated frequently in colour images, but also the main steps of calculations in R with a possibility to access freely the software located somewhere in the cloud. The book represents an interesting and useful initiative and a tremendous work for putting all this together. It is also a testimony how much data analysis methods have widened and deepened in recent years." ~International Society for Clinical Biostatistics
"Statisticians, software specialists, and people in application fields address methodology and software development for big data and high-dimensional data in contexts where local patterns in a large data matrix are of primary interest. They describe biclustering methods for finding such patterns, and use the open-source statistics software R. The topics include from cluster analysis to biclustering, ensemble methods and robust solutions, biclustering methods in chemoinformatics and molecular modeling, overcoming data dimensionality problems in market segmentation, and biclustering for cloud computing." ~ProtoView
"A key feature of this book is the focus on R tools, with an emphasis on building a fully-functional and user-friendly data analysis solution. All presented methods are integrated into an R package BiClustGUI, which provides user-friendly interface and allows for the addition of custom extensions to the implemented methods. The book’s website . . . provides a wealth of additional resources on biclustering, as well as links to all code implementation and data from the book. The chapter on cloud computing will be particularly useful to some readers, as it describes a portable implementation of a biclustering solution that integrates various previously introduced methods. In addition to replicating the results reported in the book, this section provides a blueprint for reproducible research." ~The American Statistician