1st Edition

Gene Expression Data Analysis: A Statistical and Machine Learning Perspective

    378 Pages 70 B/W Illustrations
    by Chapman & Hall

    The development of high-throughput technologies in molecular biology over the last two decades has produced tremendous amounts of data. Microarrays and RNA sequencing are two widely used high-throughput technologies for simultaneously monitoring the expression patterns of thousands of genes. Data produced from such experiments are voluminous (both in dimensionality and number of instances) and evolving in nature. Analyzing such large volumes of data to identify interesting patterns relevant to a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a big challenge.

    Gene Expression Data Analysis: A Statistical and Machine Learning Perspective has been written with a multidisciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning, and statistical perspectives. Readers will be able to acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such methods, the book presents statistical and biological performance metrics that can be used in real life or in a simulated environment. It also covers a large number of benchmark algorithms, tools, systems, and repositories that are commonly used in analyzing gene expression data and validating results. This book will benefit students, researchers, and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge of statistical and machine-learning-based methods for analyzing gene expression data.

    Key Features:

    • An introduction to the Central Dogma of molecular biology and information flow in biological systems
    • A systematic overview of the methods for generating gene expression data
    • Background knowledge on statistical modeling and machine learning techniques
    • Detailed methodology of analyzing gene expression data with an example case study
    • Clustering methods for finding co-expression patterns from microarray, bulk RNA, and scRNA data
    • A large number of practical tools, systems, and repositories that are useful for computational biologists to create, analyze, and validate biologically relevant gene expression patterns
    • Suitable for multidisciplinary researchers and practitioners in computer science and biological sciences

    Table of Contents

    • Preface
    • Introduction: Introduction. Central Dogma. Measuring Gene Expression. Representation of Gene Expression Data. Gene Expression Data Analysis: Applications. Machine Learning. Statistical and Biological Evaluation. Gene Expression Analysis Approaches. Differential Coexpression Analysis. Differential Expression Analysis. Tools and Systems for Gene Expression Data Analysis. Contribution of This Book. Organization of This Book.
    • Information Flow in Biological Systems: Concept of Systems Theory. Complexity in Biological Systems. Central Dogma of Molecular Biology. Ambiguity in Central Dogma. Chapter Summary.
    • Gene Expression Data Generation: History of Gene Expression Data Generation. Low Throughput Methods. High Throughput Methods. Chapter Summary.
    • Statistical Foundations and Machine Learning: Introduction. Statistical Background. Background in Machine Learning. Chapter Summary.
    • Coexpression Analysis: Introduction. Gene Co-expression Analysis. Measures to Identify Coexpressed Patterns. Coexpression Analysis Using Clustering. Network Analysis for Coexpressed Patterns Finding. Chapter Summary and Recommendations.
    • Differential Expression Analysis: Introduction. Differential Expression (DE) of a Gene. Differential Expression Analysis (DEA). Biomarker Identification Using DEA: A Case Study. Chapter Summary and Recommendations.
    • Tools and Systems: Introduction. Systems Biology Tools. Gene Expression Data Analysis Tools. Visualization. Validation. Biological Validation. Chapter Summary and Concluding Remarks.
    • Concluding Remarks and Research Challenges: Concluding Remarks. Issues and Research Challenges.
    • Glossary
    • Index

    Biography

    Pankaj Barah is an assistant professor in Molecular Biology and Biotechnology at Tezpur University. He received his M.Sc. degree in Bioinformatics (2006) from the University of Madras in India and his Ph.D. in Computational Systems Biology (2013) from the Norwegian University of Science and Technology (NTNU), Trondheim, Norway. He worked as a bioinformatics scientist in the Division of Theoretical Bioinformatics at the German Cancer Research Center (DKFZ) in Heidelberg, Germany, from 2015 to 2017. His research areas include computational systems biology, bioinformatics, evolutionary systems biology, next-generation sequencing (NGS), big data analytics, and biological networks. He has authored 20 research articles, edited two books, and written five book chapters. He is a recipient of the Ramalingaswami Re-entry Fellowship from the Department of Biotechnology, Government of India. Dr. Barah is currently a member of the Indian National Young Academy of Sciences.

    Dhruba Kumar Bhattacharyya is a professor in Computer Science and Engineering at Tezpur University. He teaches machine learning, network security, cryptography, and computational biology in undergraduate, postgraduate, and Ph.D. classes at Tezpur University. Professor Bhattacharyya's research areas include machine learning, network security, and bioinformatics. He has published more than 280 research articles in leading international journals and peer-reviewed conference proceedings. Dr. Bhattacharyya has authored five technical reference books and edited nine technical volumes. Under his guidance, twenty students have completed their Ph.D. degrees in the areas of machine learning, bioinformatics, and network security. He is the PI of several major research grants, including the Centre of Excellence of the Ministry of HRD of the Government of India under FAST, instituted at Tezpur University. Professor Bhattacharyya is a Fellow of IETE and IE, India. He is also a Senior Member of IEEE. More details about Dr. Bhattacharyya can be found at http://agnigarh.tezu.ernet.in/_dkb/index.html.

    Jugal Kumar Kalita teaches computer science at the University of Colorado, Colorado Springs. He received M.S. and Ph.D. degrees in computer and information science from the University of Pennsylvania in Philadelphia in 1988 and 1990, respectively. Before that, he received an M.Sc. from the University of Saskatchewan in Saskatoon, Canada, in 1984 and a B.Tech. from the Indian Institute of Technology, Kharagpur, in 1982. His expertise is in artificial intelligence and machine learning, and in the application of machine learning techniques to network security, natural language processing, and bioinformatics. He has published 130 papers in journals and refereed conferences. He is the author of a book on Perl titled "On Perl: Perl for Students and Professionals". He is also a coauthor, with Dr. Dhruba K. Bhattacharyya, of a book titled "Network Anomaly Detection: A Machine Learning Perspective". He received the Chancellor's Award at the University of Colorado, Colorado Springs, in 2011, in recognition of lifelong excellence in teaching, research, and service. More details about Dr. Kalita can be found at http://www.cs.uccs.edu/_kalita.