1st Edition

Big Data in Omics and Imaging, Two Volume Set

Edited By

Momiao Xiong



ISBN 9780367002183
Published June 19, 2018 by Chapman and Hall/CRC
1404 Pages

USD $199.95




Book Description

FEATURES

Bridges the gap between traditional statistical methods and computational tools for small genetic and epigenetic data analysis and modern advanced statistical methods for big data

Provides tools for high dimensional data reduction

Discusses search algorithms for model and variable selection, including randomization algorithms, proximal methods and matrix subset selection

Provides real-world examples and case studies

Will have an accompanying website with R code

Provides a natural extension and companion volume to Big Data in Omics and Imaging: Association Analysis, but can be read independently

Introduces causal inference theory to genomic, epigenomic and imaging data analysis

Develops novel statistics for genome-wide causation studies and epigenome-wide causation studies

Bridges the gap between traditional association analysis and modern causation analysis

Uses combinatorial optimization methods and various causal models as a general framework for inferring multilevel omic and image causal networks

Presents statistical methods and computational algorithms for searching causal paths from genetic variants to disease

Develops causal machine learning methods integrating causal inference and machine learning

Develops statistics for testing significant differences in directed edges, paths, and graphs, and for assessing causal relationships between two networks

The book is designed for graduate students and researchers in genomics, bioinformatics, and data science. It represents the paradigm shift in genetic studies of complex diseases: from shallow to deep genomic analysis, from low-dimensional to high-dimensional and from multivariate to functional data analysis with next-generation sequencing (NGS) data, and from homogeneous populations to heterogeneous population and pedigree data analysis. Topics covered include advanced matrix theory, convex optimization algorithms, generalized low rank models, functional data analysis techniques, deep learning principles and machine learning methods for modern association, interaction, pathway and network analysis of rare and common variants, biomarker identification, and disease risk and drug response prediction.

Table of Contents

Mathematical Foundation

Sparsity-Inducing Norms, Dual Norms and Fenchel Conjugate

Subdifferential

Definition of Subgradient

Subgradients of Differentiable Functions

Calculus of Subgradients

Proximal Methods

Introduction

Basics of Proximal Methods

Properties of the Proximal Operator

Proximal Algorithms

Computing the Proximal Operator

Matrix Calculus

Derivative of a Function with Respect to a Vector

Derivative of a Function with Respect to a Matrix

Derivative of a Matrix with Respect to a Scalar

Derivative of a Matrix with Respect to a Matrix or a Vector

Derivative of a Vector Function of a Vector

Chain Rules

Widely Used Formulae

Functional Principal Component Analysis (FPCA)

Principal Component Analysis (PCA)

Basic Mathematical Tools for Functional Principal Component Analysis

Unsmoothed Functional Principal Component Analysis

Smoothed Principal Component Analysis

Computations for the Principal Component Function and the Principal Component Score

Canonical Correlation Analysis

Linkage Disequilibrium

Concepts of Linkage Disequilibrium

Measures of Two-locus Linkage Disequilibrium

Linkage Disequilibrium Coefficient D

Normalized Measure of Linkage Disequilibrium

Correlation Coefficient r

Composite Measure of Linkage Disequilibrium

The Relationship Between the Measure of LD and Physical Distance

Haplotype Reconstruction

Clark’s Algorithm

EM Algorithm

Bayesian and Coalescence-based Methods

Multi-locus Measures of Linkage Disequilibrium

Mutual Information Measure of LD

Multi-Information and Multi-locus Measure of LD

Joint Mutual Information and a Measure of LD between a Marker and a Haplotype Block or Between Two Haplotype Blocks

Interaction Information

Conditional Interaction Information

Normalized Multi-Information

Distribution of Estimated Mutual Information, Multi-information and Interaction Information

Canonical Correlation Analysis Measure for LD between Two Genomic Regions

Association Measure between Two Genomic Regions Based on CCA

Relationship between Canonical Correlation and Joint Information

Software Package

Association Studies for Qualitative Traits

Population-based Association Analysis for Common Variants

Introduction

The Hardy-Weinberg Equilibrium

Genetic Models

Odds Ratio

Single Marker Association Analysis

Multi-marker Association Analysis

Population-based Multivariate Association Analysis for Next-generation Sequencing

Multivariate Group Tests

Score Tests and Logistic Regression

Application of Score Tests for Association of Rare Variants

Variance-component Score Statistics and Logistic Mixed Effects Models

Population-based Functional Association Analysis for Next-generation Sequencing

Introduction

Functional Principal Component Analysis for Association Test

Smoothed Functional Principal Component Analysis for Association Test

Software Package

Association Studies for Quantitative Traits

Fixed Effect Model for a Single Trait

Introduction

Genetic Effects

Linear Regression for a Quantitative Trait

Multiple Linear Regression for a Quantitative Trait

Gene-based Quantitative Trait Analysis

Functional Linear Model for a Quantitative Trait

Canonical Correlation Analysis for Gene-based Quantitative Trait Analysis

Kernel Approach to Gene-based Quantitative Trait Analysis

Kernel and RKHS

Covariance Operator and Dependence Measure

Simulations and Real Data Analysis

Power Evaluation

Application to Real Data Examples

Software Package

Multiple Phenotype Association Studies

Pleiotropic Additive and Dominance Effects

Multivariate Marginal Regression

Models

Estimation of Genetic Effects

Test Statistics

Linear Models for Multiple Phenotypes and Multiple Markers

Multivariate Multiple Linear Regression Models

Multivariate Functional Linear Models for Gene-based Genetic Analysis of Multiple Phenotypes

Canonical Correlation Analysis for Gene-based Genetic Pleiotropic Analysis

Multivariate Canonical Correlation Analysis (CCA)

Kernel CCA

Functional CCA

Quadratically Regularized Functional CCA

Dependence Measure and Association Tests of Multiple Traits

Principal Component for Phenotype Dimension Reduction

Principal Component Analysis

Kernel Principal Component Analysis

Quadratically Regularized PCA or Kernel PCA

Other Statistics for Pleiotropic Genetics Analysis

Sum of Squared Score Test

Unified Score-based Association Test (USAT)

Combining Marginal Tests

FPCA-based Kernel Measure Test of Independence

Connection between Statistics

Simulations and Real Data Analysis

Type 1 Error Rate and Power Evaluation

Application to Real Data Example

Software Package

Family-based Association Analysis

Genetic Similarity and Kinship Coefficients

Kinship Coefficients

Identity Coefficients

Relation between Identity Coefficients and Kinship Coefficient

Estimation of Genetic Relations from the Data

Genetic Covariance between Relatives

Assumptions and Genetic Models

Analysis for Genetic Covariance between Relatives

Mixed Linear Model for a Single Trait

Genetic Random Effect

Mixed Linear Model for Quantitative Trait Association Analysis

Estimating Variance Components

Hypothesis Test in Mixed Linear Models

Mixed Linear Models for Quantitative Trait Analysis with Sequencing Data

Mixed Functional Linear Models for Sequence-based Quantitative Trait Analysis

Mixed Functional Linear Models (Type 1)

Mixed Functional Linear Models (Type 2: Functional Variance Component Models)

Multivariate Mixed Linear Model for Multiple Traits

Multivariate Mixed Linear Model

Maximum Likelihood Estimate of Variance Components

REML Estimate of Variance Components

Heritability

Heritability Estimation for a Single Trait

Heritability Estimation for Multiple Traits

Family-based Association Analysis for Qualitative Trait

The Generalized T Test with Families and Additional Population Structures

Collapsing Method

CMC with Families

The Functional Principal Component Analysis and Smooth Functional Principal Component Analysis with Families

Software Package

Interaction Analysis

Measures of Gene-gene and Gene-environment Interaction for Qualitative Trait

Binary Measure of Gene-gene and Gene-environment Interaction

Disequilibrium Measure of Gene-gene and Gene-environment Interaction

Information Measure of Gene-gene and Gene-environment Interaction

Measure of Interaction between Gene and Continuous Environment

Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Common Variants

Relative Risk and Odds-ratio-based Statistics for Testing Interaction between Gene and Discrete Environment

Disequilibrium-based Statistics for Testing Gene-gene Interaction

Information-based Statistics for Testing Gene-Gene Interaction

Haplotype-Odds Ratio and Tests for Gene-Gene Interaction

Multiplicative Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

Information Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

Real Example

Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Next-generation Sequencing Data

Multiple Logistic Regression Model for Gene-Gene Interaction Analysis

Functional Logistic Regression Model for Gene-Gene Interaction Analysis

Statistics for Testing Interaction between Two Genomic Regions

Statistics for Testing Gene-gene and Gene-Environment Interaction for Quantitative Traits

Genetic Models for Epistasis Effects of Quantitative Traits

Regression Model for Interaction Analysis with Quantitative Traits

Functional Regression Model for Interaction Analysis with a Quantitative Trait

Functional Regression Model for Interaction Analysis with Multiple Quantitative Traits

Multivariate and Functional Canonical Correlation as a Unified Framework for Testing Gene-Gene and Gene-Environment Interaction for both Qualitative and Quantitative Traits

Data Structure of CCA for Interaction Analysis

CCA and Functional CCA

Kernel CCA

Software Package

Machine Learning, Low Rank Models and Their Application to Disease Risk Prediction and Precision Medicine

Logistic Regression

Two Class Logistic Regression

Multiclass Logistic Regression

Parameter Estimation

Test Statistics

Network Penalized Two-class Logistic Regression

Network Penalized Multiclass Logistic Regression

Fisher’s Linear Discriminant Analysis

Fisher’s Linear Discriminant Analysis for Two Classes

Multi-class Fisher’s Linear Discriminant Analysis

Connections between Linear Discriminant Analysis, Optimal Scoring and Canonical Correlation Analysis (CCA)

Support Vector Machine

Introduction

Linear Support Vector Machines

Nonlinear SVM

Penalized SVMs

Low Rank Approximation

Quadratically Regularized PCA

Generalized Regularization

Generalized Canonical Correlation Analysis (CCA)

Quadratically Regularized Canonical Correlation Analysis

Sparse Canonical Correlation Analysis

Sparse Canonical Correlation Analysis via a Penalized Matrix Decomposition

Inverse Regression (IR) and Sufficient Dimension Reduction

Sufficient Dimension Reduction (SDR) and Sliced Inverse Regression (SIR)

Sparse SDR

Software Package

Genotype-Phenotype Network Analysis

Undirected Graphs for Genotype Network

Gaussian Graphical Model

Alternating Direction Method of Multipliers for Estimation of Gaussian Graphical Model

Coordinate Descent Algorithm and Graphical Lasso

Multiple Graphical Models

Directed Graphs and Structural Equation Models for Networks

Directed Acyclic Graphs

Linear Structural Equation Models

Estimation Methods

Sparse Linear Structural Equations

Penalized Maximum Likelihood Estimation

Penalized Two Stage Least Square Estimation

Penalized Three Stage Least Square Estimation

Functional Structural Equation Models for Genotype-Phenotype Networks

Functional Structural Equation Models

Group Lasso and ADMM for Parameter Estimation in the Functional Structural Equation Models

Causal Calculus

Effect Decomposition and Estimation

Graphical Tools for Causal Inference in Linear SEMs

Identification and Single-door Criterion

Instrumental Variables

Total Effects and Backdoor Criterion

Counterfactuals and Linear SEMs

Simulations and Real Data Analysis

Simulations for Model Evaluation

Application to Real Data Examples 

Causal Analysis and Network Biology

Bayesian Networks as a General Framework for Causal Inference

Parameter Estimation and Bayesian Dirichlet Equivalent Uniform Score for Discrete Bayesian Networks

Structural Equations and Score Metrics for Continuous Causal Networks

Multivariate SEMs for Generating Node Score Metrics

Mixed SEMs for Pedigree-based Causal Inference

Bayesian Networks with Discrete and Continuous Variables

Two-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks

Multiple Network Penalized Functional Logistic Regression Models for NGS Data

Multi-class Network Penalized Logistic Regression for Learning Hybrid Bayesian Networks

Other Statistical Models for Quantifying Node Score Function

Integer Programming for Causal Structure Learning

Introduction

Integer Linear Programming Formulation of DAG Learning

Cutting Plane for Integer Linear Programming

Branch and Cut Algorithm for Integer Linear Programming

Sink Finding Primal Heuristic Algorithm

Simulations and Real Data Analysis

Simulations

Real Data Analysis

Smoothing Spline Regression for a Single Variable

Smoothing Spline Regression for Multiple Variables

Wearable Computing and Genetic Analysis of Function-valued Traits

Classification of Wearable Biosensor Data

Introduction

Functional Data Analysis for Classification of Time Course Wearable Biosensor Data

Differential Equations for Extracting Features of the Dynamic Process and for Classification of Time Course Data

Deep Learning for Physiological Time Series Data Analysis

Association Studies of Function-Valued Traits

Introduction

Functional Linear Models with both Functional Response and Predictors for Association Analysis of Function-valued Traits

Test Statistics

Null Distribution of Test Statistics

Power

Real Data Analysis

Association Analysis of Multiple Function-valued Traits

Gene-gene Interaction Analysis of Function-Valued Traits

Introduction

Functional Regression Models

Estimation of Interaction Effect Function

Test Statistics

Simulations

Real Data Analysis

Networks

Multilayer Feedforward Pass

Backpropagation Pass

Convolutional Layer

RNA-seq Data Analysis

Normalization Methods on RNA-seq Data Analysis

Gene Expression

RNA Sequencing Expression Profiling

Methods for Normalization

Differential Expression Analysis for RNA-Seq Data

Distribution-based Approach to Differential Expression Analysis

Functional Expansion Approach to Differential Expression Analysis of RNA-Seq Data

Differential Analysis of Allele Specific Expressions with RNA-Seq Data

eQTL and eQTL Epistasis Analysis with RNA-Seq Data

Matrix Factorization

Quadratically Regularized Matrix Factorization and Canonical Correlation Analysis

QRFCCA for eQTL and eQTL Epistasis Analysis of RNA-Seq Data

Real Data Analysis

Gene Co-expression Network and Gene Regulatory Networks

Co-expression Network Construction with RNA-Seq Data by CCA and FCCA

Graphical Gaussian Models

Real Data Applications

Directed Graph and Gene Regulatory Networks

Hierarchical Bayesian Networks for Whole Genome Regulatory Networks

Linear Regulatory Networks

Nonlinear Regulatory Networks

Dynamic Bayesian Network and Longitudinal Expression Data Analysis

Single Cell RNA-Seq Data Analysis, Gene Expression Deconvolution and Genetic Screening

Cell Type Identification

Gene Expression Deconvolution and Cell Type-Specific Expression

Normalization

Variational Methods for Expectation-Maximization (EM) Algorithm

Variational Methods for Bayesian Learning

Methylation Data Analysis

DNA Methylation Analysis

Epigenome-wide Association Studies (EWAS)

Single-Locus Test

Set-based Methods

Epigenome-wide Causal Studies

Introduction

Additive Functional Model for EWCS

Genome-wide DNA Methylation Quantitative Trait Locus (mQTL) Analysis

Causal Networks for Genetic-Methylation Analysis

Structural Equation Models with Scalar Endogenous Variables and Functional Exogenous Variables

Functional Structural Equation Models with Functional Endogenous Variables and Scalar Exogenous Variables (FSEMS)

Functional Structural Equation Models with both Functional Endogenous Variables and Exogenous Variables (FSEMF)

Imaging and Genomics

Introduction

Image Segmentation

Unsupervised Learning Methods for Image Segmentation

Supervised Deep Learning Methods for Image Segmentation

Two- or Three-Dimensional Functional Principal Component Analysis for Image Data Reduction

Formulation

Integral Equation and Eigenfunctions

Association Analysis of Imaging-Genomic Data

Multivariate Functional Regression Models for Imaging-Genomic Data Analysis

Multivariate Functional Regression Models for Longitudinal Imaging-Genetics Analysis

Quadratically Regularized Functional Canonical Correlation Analysis for Gene-Gene Interaction Detection in Imaging-Genetic Studies

Causal Analysis of Imaging-Genomic Data

Sparse SEMs for Joint Causal Analysis of Structural Imaging and Genomic Data

Sparse Functional Structural Equation Models for Phenotype and Genotype Networks

Conditional Gaussian Graphical Models (CGGMs) for Structural Imaging and Genomic Data Analysis

Time Series SEMs for Integrated Causal Analysis of fMRI and Genomic Data

Models

Reduced Form Equations

Single Equation and Generalized Least Square Estimator

Sparse SEMs and Alternating Direction Method of Multipliers

Causal Machine Learning

From Association Analysis to Integrated Causal Inference

Genome-wide Causal Studies

Mathematical Formulation of Causal Analysis

Basic Causal Assumptions

Linear Additive SEMs with non-Gaussian Noise

Information Geometry Approach

Causal Inference on Discrete Data

Multivariate Causal Inference and Causal Networks

Markov Condition, Markov Equivalence, Faithfulness and Minimality

Multilevel Causal Networks for Integrative Omics and Imaging Data Analysis

Causal Inference with Confounders

Causal Sufficiency

Instrumental Variables

...

Editor(s)

Biography

Momiao Xiong is a professor of Biostatistics at the University of Texas Health Science Center in Houston where he has worked since 1997. He received his PhD in 1993 from the University of Georgia.