Big Data in Omics and Imaging

Association Analysis, 1st Edition

By Momiao Xiong

Chapman and Hall/CRC

668 pages | 60 Color Illus. | 3 B/W Illus.

Purchasing Options (prices in USD):
Hardback: ISBN 9781498725781, published 2017-12-13, $125.00
eBook (VitalSource): ISBN 9781315370507, published 2017-12-01, from $28.98

Description

Big Data in Omics and Imaging: Association Analysis addresses recent developments in association analysis and machine learning for both population and family genomic data in the sequencing era. It is unique in presenting both hypothesis testing and data mining approaches to holistically dissecting the genetic structure of complex traits and to designing efficient strategies for precision medicine. The general frameworks for association analysis and machine learning developed in the text can be applied to genomic, epigenomic and imaging data.

FEATURES

Bridges the gap between traditional statistical methods and computational tools for small-scale genetic and epigenetic data analysis and modern advanced statistical methods for big data

Provides tools for high dimensional data reduction

Discusses search algorithms for model and variable selection, including randomization algorithms, proximal methods and matrix subset selection

Provides real-world examples and case studies

Will have an accompanying website with R code

The book is designed for graduate students and researchers in genomics, bioinformatics, and data science. It represents the paradigm shift in genetic studies of complex diseases: from shallow to deep genomic analysis, from low-dimensional to high-dimensional data, from multivariate to functional data analysis with next-generation sequencing (NGS) data, and from homogeneous populations to heterogeneous population and pedigree data analysis. Topics covered include advanced matrix theory, convex optimization algorithms, generalized low rank models, functional data analysis techniques, deep learning principles, and machine learning methods for modern association, interaction, pathway and network analysis of rare and common variants, biomarker identification, and disease risk and drug response prediction.

Table of Contents

Mathematical Foundation

Sparsity-Inducing Norms, Dual Norms and Fenchel Conjugate

Subdifferential

Definition of Subgradient

Subgradients of Differentiable Functions

Calculus of Subgradients

Proximal Methods

Introduction

Basics of Proximal Methods

Properties of the Proximal Operator

Proximal Algorithms

Computing the Proximal Operator

Matrix Calculus

Derivative of a Function with Respect to a Vector

Derivative of a Function with Respect to a Matrix

Derivative of a Matrix with Respect to a Scalar

Derivative of a Matrix with Respect to a Matrix or a Vector

Derivative of a Vector Function of a Vector

Chain Rules

Widely Used Formulae

Functional Principal Component Analysis (FPCA)

Principal Component Analysis (PCA)

Basic Mathematical Tools for Functional Principal Component Analysis

Unsmoothed Functional Principal Component Analysis

Smoothed Principal Component Analysis

Computations for the Principal Component Function and the Principal Component Score

Canonical Correlation Analysis

Exercises

Appendix

Linkage Disequilibrium

Concepts of Linkage Disequilibrium

Measures of Two-locus Linkage Disequilibrium

Linkage Disequilibrium Coefficient D

Normalized Measure of Linkage Disequilibrium

Correlation Coefficient r

Composite Measure of Linkage Disequilibrium

The Relationship Between the Measure of LD and Physical Distance

Haplotype Reconstruction

Clark’s Algorithm

EM Algorithm

Bayesian and Coalescence-based Methods

Multi-locus Measures of Linkage Disequilibrium

Mutual Information Measure of LD

Multi-Information and Multi-locus Measure of LD

Joint Mutual Information and a Measure of LD between a Marker and a Haplotype Block or Between Two Haplotype Blocks

Interaction Information

Conditional Interaction Information

Normalized Multi-Information

Distribution of Estimated Mutual Information, Multi-information and Interaction Information

Canonical Correlation Analysis Measure for LD between Two Genomic Regions

Association Measure between Two Genomic Regions Based on CCA

Relationship between Canonical Correlation and Joint Information

Software Package

Bibliographical Notes

Appendices

Exercises

Association Studies for Qualitative Traits

Population-based Association Analysis for Common Variants

Introduction

The Hardy-Weinberg Equilibrium

Genetic Models

Odds Ratio

Single Marker Association Analysis

Multi-marker Association Analysis

Population-based Multivariate Association Analysis for Next-generation Sequencing

Multivariate Group Tests

Score Tests and Logistic Regression

Application of Score Tests for Association of Rare Variants

Variance-component Score Statistics and Logistic Mixed Effects Models

Population-based Functional Association Analysis for Next-generation Sequencing

Introduction

Functional Principal Component Analysis for Association Test

Smoothed Functional Principal Component Analysis for Association Test

Software Package

Appendices

Exercises

Association Studies for Quantitative Traits

Fixed Effect Model for a Single Trait

Introduction

Genetic Effects

Linear Regression for a Quantitative Trait

Multiple Linear Regression for a Quantitative Trait

Gene-based Quantitative Trait Analysis

Functional Linear Model for a Quantitative Trait

Canonical Correlation Analysis for Gene-based Quantitative Trait Analysis

Kernel Approach to Gene-based Quantitative Trait Analysis

Kernel and RKHS

Covariance Operator and Dependence Measure

Simulations and Real Data Analysis

Power Evaluation

Application to Real Data Examples

Software Package

Appendices

Exercises

Multiple Phenotype Association Studies

Pleiotropic Additive and Dominance Effects

Multivariate Marginal Regression

Models

Estimation of Genetic Effects

Test Statistics

Linear Models for Multiple Phenotypes and Multiple Markers

Multivariate Multiple Linear Regression Models

Multivariate Functional Linear Models for Gene-based Genetic Analysis of Multiple Phenotypes

Canonical Correlation Analysis for Gene-based Genetic Pleiotropic Analysis

Multivariate Canonical Correlation Analysis (CCA)

Kernel CCA

Functional CCA

Quadratically Regularized Functional CCA

Dependence Measure and Association Tests of Multiple Traits

Principal Component for Phenotype Dimension Reduction

Principal Component Analysis

Kernel Principal Component Analysis

Quadratically Regularized PCA or Kernel PCA

Other Statistics for Pleiotropic Genetics Analysis

Sum of Squared Score Test

Unified Score-based Association Test (USAT)

Combining Marginal Tests

FPCA-based Kernel Measure Test of Independence

Connection between Statistics

Simulations and Real Data Analysis

Type 1 Error Rate and Power Evaluation

Application to Real Data Example

Software Package

Appendices

Exercises

Family-based Association Analysis

Genetic Similarity and Kinship Coefficients

Kinship Coefficients

Identity Coefficients

Relation between Identity Coefficients and Kinship Coefficients

Estimation of Genetic Relations from the Data

Genetic Covariance between Relatives

Assumptions and Genetic Models

Analysis for Genetic Covariance between Relatives

Mixed Linear Model for a Single Trait

Genetic Random Effect

Mixed Linear Model for Quantitative Trait Association Analysis

Estimating Variance Components

Hypothesis Test in Mixed Linear Models

Mixed Linear Models for Quantitative Trait Analysis with Sequencing Data

Mixed Functional Linear Models for Sequence-based Quantitative Trait Analysis

Mixed Functional Linear Models (Type 1)

Mixed Functional Linear Models (Type 2: Functional Variance Component Models)

Multivariate Mixed Linear Model for Multiple Traits

Multivariate Mixed Linear Model

Maximum Likelihood Estimate of Variance Components

REML Estimate of Variance Components

Heritability

Heritability Estimation for a Single Trait

Heritability Estimation for Multiple Traits

Family-based Association Analysis for Qualitative Trait

The Generalized T² Test with Families and Additional Population Structures

Collapsing Method

CMC with Families

The Functional Principal Component Analysis and Smooth Functional Principal Component Analysis with Families

Software Package

Exercises

Interaction Analysis

Measures of Gene-gene and Gene-environment Interaction for Qualitative Trait

Binary Measure of Gene-gene and Gene-environment Interaction

Disequilibrium Measure of Gene-gene and Gene-environment Interaction

Information Measure of Gene-gene and Gene-environment Interaction

Measure of Interaction between Gene and Continuous Environment

Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Common Variants

Relative Risk and Odds-ratio-based Statistics for Testing Interaction between Gene and Discrete Environment

Disequilibrium-based Statistics for Testing Gene-gene Interaction

Information-based Statistics for Testing Gene-Gene Interaction

Haplotype-Odds Ratio and Tests for Gene-Gene Interaction

Multiplicative Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

Information Measure-based Statistics for Testing Interaction between Gene and Continuous Environment

Real Example

Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Next-generation Sequencing Data

Multiple Logistic Regression Model for Gene-Gene Interaction Analysis

Functional Logistic Regression Model for Gene-gene Interaction Analysis

Statistics for Testing Interaction between Two Genomic Regions

Statistics for Testing Gene-gene and Gene-Environment Interaction for Quantitative Traits

Genetic Models for Epistasis Effects of Quantitative Traits

Regression Model for Interaction Analysis with Quantitative Traits

Functional Regression Model for Interaction Analysis with a Quantitative Trait

Functional Regression Model for Interaction Analysis with Multiple Quantitative Traits

Multivariate and Functional Canonical Correlation as a Unified Framework for Testing Gene-Gene and Gene-Environment Interaction for both Qualitative and Quantitative Traits

Data Structure of CCA for Interaction Analysis

CCA and Functional CCA

Kernel CCA

Software Package

Appendices

Exercises

Machine Learning, Low Rank Models and Their Application to Disease Risk Prediction and Precision Medicine

Logistic Regression

Two Class Logistic Regression

Multiclass Logistic Regression

Parameter Estimation

Test Statistics

Network Penalized Two-class Logistic Regression

Network Penalized Multiclass Logistic Regression

Fisher’s Linear Discriminant Analysis

Fisher’s Linear Discriminant Analysis for Two Classes

Multi-class Fisher’s Linear Discriminant Analysis

Connections between Linear Discriminant Analysis, Optimal Scoring and Canonical Correlation Analysis (CCA)

Support Vector Machine

Introduction

Linear Support Vector Machines

Nonlinear SVM

Penalized SVMs

Low Rank Approximation

Quadratically Regularized PCA

Generalized Regularization

Generalized Canonical Correlation Analysis (CCA)

Quadratically Regularized Canonical Correlation Analysis

Sparse Canonical Correlation Analysis

Sparse Canonical Correlation Analysis via a Penalized Matrix Decomposition

Inverse Regression (IR) and Sufficient Dimension Reduction

Sufficient Dimension Reduction (SDR) and Sliced Inverse Regression (SIR)

Sparse SDR

Software Package

Appendices

Exercises

About the Author

Momiao Xiong is a professor in the Department of Biostatistics, University of Texas School of Public Health, and a regular member of the Genetics & Epigenetics (G&E) Graduate Program at The University of Texas MD Anderson Cancer Center, UTHealth Graduate School of Biomedical Science.

About the Series

Chapman & Hall/CRC Mathematical and Computational Biology

Subject Categories

BISAC Subject Codes/Headings:
MAT029000: MATHEMATICS / Probability & Statistics / General
SCI008000: SCIENCE / Life Sciences / Biology / General
SCI010000: SCIENCE / Biotechnology