Big Data in Omics and Imaging
Integrated Analysis and Causal Inference
Book Description
Big Data in Omics and Imaging: Integrated Analysis and Causal Inference addresses recent developments in integrated genomic, epigenomic, and imaging data analysis and causal inference in the big data era. Genome-wide association studies (GWAS), genome-wide expression studies (GWES), and epigenome-wide association studies (EWAS) have made significant progress in dissecting the genetic architecture of complex diseases. Despite this progress, the newly identified genetic variants contribute little overall, and a large fraction of genetic variants remains hidden. The etiology of complex diseases, and the causal chain of mechanisms underlying them, remain elusive. It is time to bring big data, machine learning, and the causal revolution to bear on a new generation of genetic analysis. That new generation should shift the current paradigm from shallow association analysis to deep causal inference, and from genetic analysis alone to integrated omics and imaging data analysis, in order to unravel the mechanisms of complex diseases.
FEATURES
 Provides a natural extension of, and companion volume to, Big Data in Omics and Imaging: Association Analysis, but can be read independently.
 Introduces causal inference theory to genomic, epigenomic, and imaging data analysis.
 Develops novel statistics for genome-wide causation studies and epigenome-wide causation studies.
 Bridges the gap between traditional association analysis and modern causation analysis.
 Uses combinatorial optimization methods and various causal models as a general framework for inferring multilevel omic and image causal networks.
 Presents statistical methods and computational algorithms for searching causal paths from genetic variants to disease.
 Develops causal machine learning methods that integrate causal inference and machine learning.
 Develops statistics for testing significant differences in directed edges, paths, and graphs, and for assessing causal relationships between two networks.
The book is designed for graduate students and researchers in genomics, epigenomics, medical imaging, bioinformatics, and data science. Topics covered include:
- the mathematical formulation of causal inference and information geometry for causal inference
- topological groups and Haar measure
- additive noise models and distance correlation
- multivariate causal inference, causal networks, and dynamic causal networks
- multivariate, functional, and mixed structural equation models
- causal inference with confounders and integer programming
- deep learning and differential equations for wearable computing
- genetic analysis of function-valued traits and RNA-seq data analysis
- causal networks for genetic-methylation analysis, gene expression and methylation deconvolution, and cell-specific causal networks
- deep learning for image segmentation and image analysis, and imaging and genomic data analysis
- integrated multilevel causal genomic, epigenomic, and imaging data analysis
Table of Contents
1. Genotype-Phenotype Network Analysis
Undirected Graphs for Genotype Network
Gaussian Graphical Model
Alternating Direction Method of Multipliers for Estimation of Gaussian Graphical Model
Coordinate Descent Algorithm and Graphical Lasso
Multiple Graphical Models
Directed Graphs and Structural Equation Models for Networks
Directed Acyclic Graphs
Linear Structural Equation Models
Estimation Methods
Sparse Linear Structural Equations
Penalized Maximum Likelihood Estimation
Penalized Two-Stage Least Squares Estimation
Penalized Three-Stage Least Squares Estimation
Functional Structural Equation Models for Genotype-Phenotype Networks
Functional Structural Equation Models
Group Lasso and ADMM for Parameter Estimation in the Functional Structural Equation Models
Causal Calculus
Effect Decomposition and Estimation
Graphical Tools for Causal Inference in Linear SEMs
Identification and Single-door Criterion
Instrumental Variables
Total Effects and Back-door Criterion
Counterfactuals and Linear SEMs
Simulations and Real Data Analysis
Simulations for Model Evaluation
Application to Real Data Examples
Appendix 1A
Appendix 1B
Exercises
Figure Legend
2. Causal Analysis and Network Biology
Bayesian Networks as a General Framework for Causal Inference
Parameter Estimation and Bayesian Dirichlet Equivalent Uniform Score for Discrete Bayesian Networks
Structural Equations and Score Metrics for Continuous Causal Networks
Multivariate SEMs for Generating Node Score Metrics
Mixed SEMs for Pedigree-based Causal Inference
Bayesian Networks with Discrete and Continuous Variables
Two-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks
Multiple Network Penalized Functional Logistic Regression Models for NGS Data
Multiclass Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks
Other Statistical Models for Quantifying Node Score Function
Integer Programming for Causal Structure Learning
Introduction
Integer Linear Programming Formulation of DAG Learning
Cutting Plane for Integer Linear Programming
Branch and Cut Algorithm for Integer Linear Programming
Sink Finding Primal Heuristic Algorithm
Simulations and Real Data Analysis
Simulations
Real Data Analysis
Figure Legend
Software Package
Appendix 2A Introduction to Smoothing Splines
Smoothing Spline Regression for a Single Variable
Smoothing Spline Regression for Multiple Variables
Appendix 2B Penalized Likelihood Function for Jointly Observational and Interventional Data
Exercises
Figure Legend
3. Wearable Computing and Genetic Analysis of Function-valued Traits
Classification of Wearable Biosensor Data
Introduction
Functional Data Analysis for Classification of Time Course Wearable Biosensor Data
Differential Equations for Extracting Features of the Dynamic Process and for Classification of Time Course Data
Deep Learning for Physiological Time Series Data Analysis
Association Studies of Function-Valued Traits
Introduction
Functional Linear Models with both Functional Response and Predictors for Association Analysis of Function-valued Traits
Test Statistics
Null Distribution of Test Statistics
Power
Real Data Analysis
Association Analysis of Multiple Function-valued Traits
Gene-Gene Interaction Analysis of Function-Valued Traits
Introduction
Functional Regression Models
Estimation of Interaction Effect Function
Test Statistics
Simulations
Real Data Analysis
Figure Legend
Appendix 3A Gradient Methods for Parameter Estimation in the Convolutional Neural Networks
Multilayer Feedforward Pass
Backpropagation Pass
Convolutional Layer
Exercises
4. RNA-seq Data Analysis
Normalization Methods for RNA-seq Data Analysis
Gene Expression
RNA Sequencing Expression Profiling
Methods for Normalization
Differential Expression Analysis for RNA-seq Data
Distribution-based Approach to Differential Expression Analysis
Functional Expansion Approach to Differential Expression Analysis of RNA-seq Data
Differential Analysis of Allele-Specific Expression with RNA-seq Data
eQTL and eQTL Epistasis Analysis with RNA-seq Data
Matrix Factorization
Quadratically Regularized Matrix Factorization and Canonical Correlation Analysis
QRFCCA for eQTL and eQTL Epistasis Analysis of RNA-seq Data
Real Data Analysis
Gene Coexpression Network and Gene Regulatory Networks
Coexpression Network Construction with RNA-seq Data by CCA and FCCA
Graphical Gaussian Models
Real Data Applications
Directed Graph and Gene Regulatory Networks
Hierarchical Bayesian Networks for Whole Genome Regulatory Networks
Linear Regulatory Networks
Nonlinear Regulatory Networks
Dynamic Bayesian Network and Longitudinal Expression Data Analysis
Single-Cell RNA-seq Data Analysis, Gene Expression Deconvolution and Genetic Screening
Cell Type Identification
Gene Expression Deconvolution and Cell Type-Specific Expression
Figure Legend
Software Package
Appendix 4.1A Variational Bayesian Theory for Parameter Estimation and RNA-seq Normalization
Variational Methods for the Expectation-Maximization (EM) Algorithm
Variational Methods for Bayesian Learning
Appendix 4.2A Log-linear Model for Differential Expression Analysis of the RNA-seq Data with Negative Binomial Distribution
Appendix 4.5A Derivation of ADMM Algorithm
Appendix 4.5B Low Rank Representation Induced Sparse Structural Equation Models
Appendix 4.6A Maximum Likelihood (ML) Estimation of Parameters for Dynamic Structural Equation Models
Appendix 4.6B Generalized Least Squares Estimator of the Parameters in Dynamic Structural Equation Models
Appendix 4.6C Proximal Algorithm for L1-Penalized Maximum Likelihood Estimation of Dynamic Structural Equation Models
Appendix 4.6D Proximal Algorithm for L1-Penalized Generalized Least Squares Estimation of Parameters in the Dynamic Structural Equation Models
Appendix 4.7A Multikernel Learning and Spectral Clustering for Cell Type Identification
Exercises
5. Methylation Data Analysis
DNA Methylation Analysis
Epigenome-wide Association Studies (EWAS)
Single-Locus Test
Set-based Methods
Epigenome-wide Causal Studies
Introduction
Additive Functional Model for EWCS
Genome-wide DNA Methylation Quantitative Trait Locus (mQTL) Analysis
Causal Networks for Genetic-Methylation Analysis
Structural Equation Models with Scalar Endogenous Variables and Functional Exogenous Variables
Functional Structural Equation Models with Functional Endogenous Variables and Scalar Exogenous Variables (FSEMS)
Functional Structural Equation Models with both Functional Endogenous Variables and Exogenous Variables (FSEMF)
Figure Legend
Software Package
Appendix 5A Biased and Unbiased Estimators of the HSIC
Appendix 5B Asymptotic Null Distribution of Block-Based HSIC
Exercises
6. Imaging and Genomics
Introduction
Image Segmentation
Unsupervised Learning Methods for Image Segmentation
Supervised Deep Learning Methods for Image Segmentation
Two- or Three-Dimensional Functional Principal Component Analysis for Image Data Reduction
Formulation
Integral Equation and Eigenfunctions
Association Analysis of Imaging-Genomic Data
Multivariate Functional Regression Models for Imaging-Genomic Data Analysis
Multivariate Functional Regression Models for Longitudinal Imaging-Genetics Analysis
Quadratically Regularized Functional Canonical Correlation Analysis for Gene-Gene Interaction Detection in Imaging-Genetic Studies
Causal Analysis of Imaging-Genomic Data
Sparse SEMs for Joint Causal Analysis of Structural Imaging and Genomic Data
Sparse Functional Structural Equation Models for Phenotype and Genotype Networks
Conditional Gaussian Graphical Models (CGGMs) for Structural Imaging and Genomic Data Analysis
Time Series SEMs for Integrated Causal Analysis of fMRI and Genomic Data Models
Reduced Form Equations
Single Equation and Generalized Least Squares Estimator
Sparse SEMs and Alternating Direction Method of Multipliers
Causal Machine Learning
Figure Legend
Software Package
Appendix 6A Factor Graphs and Mean Field Methods for Prediction of Marginal Distribution
Exercises
7. From Association Analysis to Integrated Causal Inference
Genome-wide Causal Studies
Mathematical Formulation of Causal Analysis
Basic Causal Assumptions
Linear Additive SEMs with Non-Gaussian Noise
Information Geometry Approach
Causal Inference on Discrete Data
Multivariate Causal Inference and Causal Networks
Markov Condition, Markov Equivalence, Faithfulness and Minimality
Multilevel Causal Networks for Integrative Omics and Imaging Data Analysis
Causal Inference with Confounders
Causal Sufficiency
Instrumental Variables
Figure Legend
Software Package
Appendix 7A Approximation of Log-Likelihood Ratio for the LiNGAM
Appendix 7B Orthogonality Conditions and Covariance
Appendix 7C Equivalent Formulations of Orthogonality Conditions
Appendix 7D ML Distance in Backward Direction
Appendix 7E Multiplicativity of Traces
Appendix 7F Anisotropy and KL Distance
Appendix 7G Trace Method for Noise Linear Model
Appendix 7H Characterization of Association
Appendix 7I Algorithm for Sparse Trace Method
Exercises
Author(s)
Biography
Momiao Xiong is a professor of Biostatistics at the University of Texas Health Science Center in Houston where he has worked since 1997. He received his PhD in 1993 from the University of Georgia.
Reviews
"I would like to recommend a new option in the library market, Big Data in Omics and Imaging: Integrated Analysis and Causal Inference, written by Momiao Xiong, a Professor of Biostatistics at the University of Texas Health Science Center in Houston. It is an extensive and comprehensive textbook on big data in biomedical sciences. Indeed, its contents is very valuable, because it concerns the analysis of large-scale datasets, which now regularly occur in computational biology and medicine, in particular in ‘omics’ problems... The book introduces in detail the currently developed statistical methods and software for big genomic and epigenomic, wearable biosensors, computing, and image data analysis. It covers important topics in this area, such as: genotype-phenotype network analysis, causal analysis and network biology, wearable computing and genetic analysis of function-valued traits, RNA-seq data analysis, methylation data analysis, imaging, and genomics... It was really interesting and fascinating to go through the pages of the book. It would hold a very valuable position on the home shelf-book or university library; I warmly recommend the book."
 Malgorzata Cwiklinska-Jurkowska, ISCB, December 2019
"In his book, Professor Xiong introduces, discusses, and implements a rich variety of statistical tools that can be used to study large-scale features obtained from the human brain and genome, map neural and genetic signatures to behavioral and disease outcomes, and make causal enquiries into their relationships. The scope of the book is comprehensive, the concepts deep, and technicalities oftentimes mathematically heavy...the book discusses statistical concepts and devices that readers may find useful in studying general problems in human neuroscience and human genetics."
 Oliver Y. Chén, Journal of the American Statistical Association, March 2020