1st Edition

Big Data in Omics and Imaging: Integrated Analysis and Causal Inference

By Momiao Xiong Copyright 2018
    766 Pages 40 B/W Illustrations
    by Chapman & Hall

    Big Data in Omics and Imaging: Integrated Analysis and Causal Inference addresses recent developments in integrated genomic, epigenomic, and imaging data analysis and causal inference in the big data era. Despite significant progress in dissecting the genetic architecture of complex diseases through genome-wide association studies (GWAS), genome-wide expression studies (GWES), and epigenome-wide association studies (EWAS), the overall contribution of the newly identified genetic variants is small, and a large fraction of genetic variants remains hidden. The etiology and causal chain of mechanisms underlying complex diseases remain elusive. It is time to bring big data, machine learning, and the causal revolution to bear on a new generation of genetic analysis, shifting the current paradigm from shallow association analysis to deep causal inference, and from genetic analysis alone to integrated omics and imaging data analysis, in order to unravel the mechanisms of complex diseases.

    FEATURES

    • Provides a natural extension and companion volume to Big Data in Omics and Imaging: Association Analysis, but can be read independently

    • Introduces causal inference theory to genomic, epigenomic, and imaging data analysis

    • Develops novel statistics for genome-wide causation studies and epigenome-wide causation studies

    • Bridges the gap between traditional association analysis and modern causation analysis

    • Uses combinatorial optimization methods and various causal models as a general framework for inferring multilevel omic and image causal networks

    • Presents statistical methods and computational algorithms for searching causal paths from genetic variant to disease

    • Develops causal machine learning methods that integrate causal inference and machine learning

    • Develops statistics for testing significant differences in directed edges, paths, and graphs, and for assessing causal relationships between two networks

    The book is designed for graduate students and researchers in genomics, epigenomics, medical imaging, bioinformatics, and data science. Topics covered include: mathematical formulation of causal inference, information geometry for causal inference, topology group and Haar measure, additive noise models, distance correlation, multivariate causal inference and causal networks, dynamic causal networks, multivariate and functional structural equation models, mixed structural equation models, causal inference with confounders, integer programming, deep learning and differential equations for wearable computing, genetic analysis of function-valued traits, RNA-seq data analysis, causal networks for genetic methylation analysis, gene expression and methylation deconvolution, cell-specific causal networks, deep learning for image segmentation and image analysis, imaging and genomic data analysis, and integrated multilevel causal genomic, epigenomic, and imaging data analysis.

    1. Genotype-Phenotype Network Analysis

    Undirected Graphs for Genotype Network

    Gaussian Graphical Model

    Alternating Direction Method of Multipliers for Estimation of Gaussian Graphical Model

    Coordinate Descent Algorithm and Graphical Lasso

    Multiple Graphical Models

    Directed Graphs and Structural Equation Models for Networks

    Directed Acyclic Graphs

    Linear Structural Equation Models

    Estimation Methods

    Sparse Linear Structural Equations

    Penalized Maximum Likelihood Estimation

    Penalized Two Stage Least Square Estimation

    Penalized Three Stage Least Square Estimation

    Functional Structural Equation Models for Genotype-Phenotype Networks

    Functional Structural Equation Models

    Group Lasso and ADMM for Parameter Estimation in the Functional Structural Equation Models

    Causal Calculus

    Effect Decomposition and Estimation

    Graphical Tools for Causal Inference in Linear SEMs

    Identification and Single-door Criterion

    Instrumental Variables

    Total Effects and Backdoor Criterion

    Counterfactuals and Linear SEMs

    Simulations and Real Data Analysis

    Simulations for Model Evaluation

    Application to Real Data Examples

    Appendix 1A

    Appendix 1B

    Exercises

    Figure Legend

     

    2. Causal Analysis and Network Biology

    Bayesian Networks as a General Framework for Causal Inference

    Parameter Estimation and Bayesian Dirichlet Equivalent Uniform Score for Discrete Bayesian Networks

    Structural Equations and Score Metrics for Continuous Causal Networks

    Multivariate SEMs for Generating Node Score Metrics

    Mixed SEMs for Pedigree-based Causal Inference

    Bayesian Networks with Discrete and Continuous Variables

    Two-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks

    Multiple Network Penalized Functional Logistic Regression Models for NGS Data

    Multi-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks

    Other Statistical Models for Quantifying Node Score Function

    Integer Programming for Causal Structure Learning

    Introduction

    Integer Linear Programming Formulation of DAG Learning

    Cutting Plane for Integer Linear Programming

    Branch and Cut Algorithm for Integer Linear Programming

    Sink Finding Primal Heuristic Algorithm

    Simulations and Real Data Analysis

    Simulations

    Real Data Analysis

    Figure Legend

    Software Package

    Appendix 2A Introduction to Smoothing Splines

    Smoothing Spline Regression for a Single Variable

    Smoothing Spline Regression for Multiple Variables

    Appendix 2B Penalized Likelihood Function for Jointly Observational and Interventional Data

    Exercises

    Figure Legend

     

    3. Wearable Computing and Genetic Analysis of Function-valued Traits

    Classification of Wearable Biosensor Data

    Introduction

    Functional Data Analysis for Classification of Time Course Wearable Biosensor Data

    Differential Equations for Extracting Features of the Dynamic Process and for Classification of Time Course Data

    Deep Learning for Physiological Time Series Data Analysis

    Association Studies of Function-Valued Traits

    Introduction

    Functional Linear Models with both Functional Response and Predictors for Association Analysis of Function-valued Traits

    Test Statistics

    Null Distribution of Test Statistics

    Power

    Real Data Analysis

    Association Analysis of Multiple Function-valued Traits

    Gene-gene Interaction Analysis of Function-Valued Traits

    Introduction

    Functional Regression Models

    Estimation of Interaction Effect Function

    Test Statistics

    Simulations

    Real Data Analysis

    Figure Legend

    Appendix 3.A Gradient Methods for Parameter Estimation in the Convolutional Neural Networks

    Multilayer Feedforward Pass

    Backpropagation Pass

    Convolutional Layer

    Exercises

     

    4. RNA-seq Data Analysis

    Normalization Methods on RNA-seq Data Analysis

    Gene Expression

    RNA Sequencing Expression Profiling

    Methods for Normalization

    Differential Expression Analysis for RNA-Seq Data

    Distribution-based Approach to Differential Expression Analysis

    Functional Expansion Approach to Differential Expression Analysis of RNA-Seq Data

    Differential Analysis of Allele Specific Expressions with RNA-Seq Data

    eQTL and eQTL Epistasis Analysis with RNA-Seq Data

    Matrix Factorization

    Quadratically Regularized Matrix Factorization and Canonical Correlation Analysis

    QRFCCA for eQTL and eQTL Epistasis Analysis of RNA-Seq Data

    Real Data Analysis

    Gene Co-expression Network and Gene Regulatory Networks

    Co-expression Network Construction with RNA-Seq Data by CCA and FCCA

    Graphical Gaussian Models

    Real Data Applications

    Directed Graph and Gene Regulatory Networks

    Hierarchical Bayesian Networks for Whole Genome Regulatory Networks

    Linear Regulatory Networks

    Nonlinear Regulatory Networks

    Dynamic Bayesian Network and Longitudinal Expression Data Analysis

    Single Cell RNA-Seq Data Analysis, Gene Expression Deconvolution and Genetic Screening

    Cell Type Identification

    Gene Expression Deconvolution and Cell Type-Specific Expression

    Figure Legend

    Software Package

    Appendix 4.1A Variational Bayesian Theory for Parameter Estimation and RNA-Seq Normalization

    Variational Methods for Expectation-Maximization (EM) Algorithm

    Variational Methods for Bayesian Learning

    Appendix 4.2A Log-linear Model for Differential Expression Analysis of the RNA-Seq Data with Negative Binomial Distribution

    Appendix 4.5A Derivation of ADMM Algorithm

    Appendix 4.5B Low Rank Representation Induced Sparse Structural Equation Models

    Appendix 4.6A Maximum Likelihood (ML) Estimation of Parameters for Dynamic Structural Equation Models

    Appendix 4.6B Generalized Least Squares Estimator of the Parameters in Dynamic Structural Equation Models

    Appendix 4.6C Proximal Algorithm for L1-Penalized Maximum Likelihood Estimation of Dynamic Structural Equation Model

    Appendix 4.6D Proximal Algorithm for L1-Penalized Generalized Least Square Estimation of Parameters in the Dynamic Structural Equation Models

    Appendix 4.7A Multikernel Learning and Spectral Clustering for Cell Type Identification

    Exercises

     

    5. Methylation Data Analysis

    DNA Methylation Analysis

    Epigenome-wide Association Studies (EWAS)

    Single-Locus Test

    Set-based Methods

    Epigenome-wide Causal Studies

    Introduction

    Additive Functional Model for EWCS

    Genome-wide DNA Methylation Quantitative Trait Locus (mQTL) Analysis

    Causal Networks for Genetic-Methylation Analysis

    Structural Equation Models with Scalar Endogenous Variables and Functional Exogenous Variables

    Functional Structural Equation Models with Functional Endogenous Variables and Scalar Exogenous Variables (FSEMS)

    Functional Structural Equation Models with both Functional Endogenous Variables and Exogenous Variables (FSEMF)

    Figure Legend

    Software Package

    Appendix 5A Biased and Unbiased Estimators of the HSIC

    Appendix 5B Asymptotic Null Distribution of Block-Based HSIC

    Exercises

     

    6. Imaging and Genomics

    Introduction

    Image Segmentation

    Unsupervised Learning Methods for Image Segmentation

    Supervised Deep Learning Methods for Image Segmentation

    Two- or Three-Dimensional Functional Principal Component Analysis for Image Data Reduction

    Formulation

    Integral Equation and Eigenfunctions

    Association Analysis of Imaging-Genomic Data

    Multivariate Functional Regression Models for Imaging-Genomic Data Analysis

    Multivariate Functional Regression Models for Longitudinal Imaging-Genetics Analysis

    Quadratically Regularized Functional Canonical Correlation Analysis for Gene-Gene Interaction Detection in Imaging-Genetic Studies

    Causal Analysis of Imaging-Genomic Data

    Sparse SEMs for Joint Causal Analysis of Structural Imaging and Genomic Data

    Sparse Functional Structural Equation Models for Phenotype and Genotype Networks

    Conditional Gaussian Graphical Models (CGGMs) for Structural Imaging and Genomic Data Analysis

    Time Series SEMs for Integrated Causal Analysis of fMRI and Genomic Data Models

    Reduced Form Equations

    Single Equation and Generalized Least Square Estimator

    Sparse SEMs and Alternating Direction Method of Multipliers

    Causal Machine Learning

    Figure Legend

    Software Package

    Appendix 6A Factor Graphs and Mean Field Methods for Prediction of Marginal Distribution

    Exercises

     

    7. From Association Analysis to Integrated Causal Inference

    Genome-wide Causal Studies

    Mathematical Formulation of Causal Analysis

    Basic Causal Assumptions

    Linear Additive SEMs with non-Gaussian Noise

    Information Geometry Approach

    Causal Inference on Discrete Data

    Multivariate Causal Inference and Causal Networks

    Markov Condition, Markov Equivalence, Faithfulness and Minimality

    Multilevel Causal Networks for Integrative Omics and Imaging Data Analysis

    Causal Inference with Confounders

    Causal Sufficiency

    Instrumental Variables

    Figure Legend

    Software Package

    Appendix 7A Approximation of log-likelihood Ratio for the LiNGAM

    Appendix 7B Orthogonality Conditions and Covariance

    Appendix 7C Equivalent Formulations of Orthogonality Conditions

    Appendix 7D K-L Distance in Backward Direction

    Appendix 7E Multiplicativity of Traces

    Appendix 7F Anisotropy and K-L Distance

    Appendix 7G Trace Method for Noise Linear Model

    Appendix 7H Characterization of Association

    Appendix 7I Algorithm for Sparse Trace Method

    Exercises

    Biography

    Momiao Xiong is a professor of Biostatistics at the University of Texas Health Science Center in Houston where he has worked since 1997. He received his PhD in 1993 from the University of Georgia.

    "I would like to recommend a new option in the library market, Big Data in Omics and Imaging: Integrated Analysis and Causal Inference, written by Momiao Xiong, a Professor of Biostatistics at the University of Texas Health Science Center in Houston. It is an extensive and comprehensive textbook on big data in biomedical sciences. Indeed, its contents is very valuable, because it concerns the analysis of large-scale datasets, which now regularly occur in computational biology and medicine, in particular in ‘omics’ problems... The book introduces in detail the currently developed statistical methods and software for big genomic and epi-genomic, wearable biosensors, computing, and image data analysis. It covers important topics in this area, such as: genotype-phenotype network analysis, causal analysis and network biology, wearable computing and genetic analysis of function-valued traits, RNA-seq data analysis, methylation data analysis, imaging, and genomics... It was really interesting and fascinating to go through the pages of the book. It would hold a very valuable position on the home shelf-book or university library; I warmly recommend the book."
    - Malgorzata Cwiklinska-Jurkowska, ISCB, December 2019

    "In his book, Professor Xiong introduces, discusses, and implements a rich variety of statistical tools that can be used to study large-scale features obtained from the human brain and genome, map neural and genetic signatures to behavioral and disease outcomes, and make causal enquiries into their relationships. The scope of the book is comprehensive, the concepts deep, and technicalities oftentimes mathematically heavy...the book discusses statistical concepts and devices that readers may find useful in studying general problems in human neuroscience and human genetics."
    - Oliver Y. Chén, Journal of the American Statistical Association, March 2020