Free Shipping (6-12 Business Days)
Big Data in Omics and Imaging: Association Analysis addresses recent developments in association analysis and machine learning for both population and family genomic data in the sequencing era. It is unique in that it presents both hypothesis testing and data mining approaches to holistically dissecting the genetic structure of complex traits and to designing efficient strategies for precision medicine. The general frameworks for association analysis and machine learning developed in the text can be applied to genomic, epigenomic and imaging data.
FEATURES
Bridges the gap between the traditional statistical methods and computational tools for small genetic and epigenetic data analysis and the modern advanced statistical methods for big data
Provides tools for high dimensional data reduction
Discusses search algorithms for model and variable selection, including randomization algorithms, proximal methods and matrix subset selection
Provides real-world examples and case studies
Will have an accompanying website with R code
The book is designed for graduate students and researchers in genomics, bioinformatics, and data science. It represents the paradigm shift in genetic studies of complex diseases: from shallow to deep genomic analysis, from low-dimensional to high-dimensional data, from multivariate to functional data analysis with next-generation sequencing (NGS) data, and from homogeneous populations to heterogeneous population and pedigree data analysis. Topics covered are: advanced matrix theory, convex optimization algorithms, generalized low rank models, functional data analysis techniques, deep learning principles, and machine learning methods for modern association, interaction, pathway and network analysis of rare and common variants, biomarker identification, and disease risk and drug response prediction.
Mathematical Foundation
Sparsity-Inducing Norms, Dual Norms and Fenchel Conjugate
Subdifferential
Definition of Subgradient
Subgradients of Differentiable Functions
Calculus of Subgradients
Proximal Methods
Introduction
Basics of Proximal Methods
Properties of the Proximal Operator
Proximal Algorithms
Computing the Proximal Operator
Matrix Calculus
Derivative of a Function with Respect to a Vector
Derivative of a Function with Respect to a Matrix
Derivative of a Matrix with Respect to a Scalar
Derivative of a Matrix with Respect to a Matrix or a Vector
Derivative of a Vector Function of a Vector
Chain Rules
Widely Used Formulae
Functional Principal Component Analysis (FPCA)
Principal Component Analysis (PCA)
Basic Mathematical Tools for Functional Principal Component Analysis
Unsmoothed Functional Principal Component Analysis
Smoothed Principal Component Analysis
Computations for the Principal Component Function and the Principal Component Score
Canonical Correlation Analysis
Exercises
Appendix
Linkage Disequilibrium
Concepts of Linkage Disequilibrium
Measures of Two-locus Linkage Disequilibrium
Linkage Disequilibrium Coefficient D
Normalized Measure of Linkage Disequilibrium
Correlation Coefficient r
Composite Measure of Linkage Disequilibrium
The Relationship Between the Measure of LD and Physical Distance
Haplotype Reconstruction
Clark’s Algorithm
EM Algorithm
Bayesian and Coalescence-based Methods
Multi-locus Measures of Linkage Disequilibrium
Mutual Information Measure of LD
Multi-Information and Multi-locus Measure of LD
Joint Mutual Information and a Measure of LD between a Marker and a Haplotype Block or Between Two Haplotype Blocks
Interaction Information
Conditional Interaction Information
Normalized Multi-Information
Distribution of Estimated Mutual Information, Multi-information and Interaction Information
Canonical Correlation Analysis Measure for LD between Two Genomic Regions
Association Measure between Two Genomic Regions Based on CCA
Relationship between Canonical Correlation and Joint Information
Software Package
Bibliographical Notes
Appendices
Exercises
Association Studies for Qualitative Traits
Population-based Association Analysis for Common Variants
Introduction
The Hardy-Weinberg Equilibrium
Genetic Models
Odds Ratio
Single Marker Association Analysis
Multi-marker Association Analysis
Population-based Multivariate Association Analysis for Next-generation Sequencing
Multivariate Group Tests
Score Tests and Logistic Regression
Application of Score Tests for Association of Rare Variants
Variance-component Score Statistics and Logistic Mixed Effects Models
Population-based Functional Association Analysis for Next-generation Sequencing
Introduction
Functional Principal Component Analysis for Association Test
Smoothed Functional Principal Component Analysis for Association Test
Software Package
Appendices
Exercises
Association Studies for Quantitative Traits
Fixed Effect Model for a Single Trait
Introduction
Genetic Effects
Linear Regression for a Quantitative Trait
Multiple Linear Regression for a Quantitative Trait
Gene-based Quantitative Trait Analysis
Functional Linear Model for a Quantitative Trait
Canonical Correlation Analysis for Gene-based Quantitative Trait Analysis
Kernel Approach to Gene-based Quantitative Trait Analysis
Kernel and RKHS
Covariance Operator and Dependence Measure
Simulations and Real Data Analysis
Power Evaluation
Application to Real Data Examples
Software Package
Appendices
Exercises
Multiple Phenotype Association Studies
Pleiotropic Additive and Dominance Effects
Multivariate Marginal Regression
Models
Estimation of Genetic Effects
Test Statistics
Linear Models for Multiple Phenotypes and Multiple Markers
Multivariate Multiple Linear Regression Models
Multivariate Functional Linear Models for Gene-based Genetic Analysis of Multiple Phenotypes
Canonical Correlation Analysis for Gene-based Genetic Pleiotropic Analysis
Multivariate Canonical Correlation Analysis (CCA)
Kernel CCA
Functional CCA
Quadratically Regularized Functional CCA
Dependence Measure and Association Tests of Multiple Traits
Principal Component for Phenotype Dimension Reduction
Principal Component Analysis
Kernel Principal Component Analysis
Quadratically Regularized PCA or Kernel PCA
Other Statistics for Pleiotropic Genetics Analysis
Sum of Squared Score Test
Unified Score-based Association Test (USAT)
Combining Marginal Tests
FPCA-based Kernel Measure Test of Independence
Connection between Statistics
Simulations and Real Data Analysis
Type 1 Error Rate and Power Evaluation
Application to Real Data Example
Software Package
Appendices
Exercises
Family-based Association Analysis
Genetic Similarity and Kinship Coefficients
Kinship Coefficients
Identity Coefficients
Relation between Identity Coefficients and Kinship Coefficient
Estimation of Genetic Relations from the Data
Genetic Covariance between Relatives
Assumptions and Genetic Models
Analysis for Genetic Covariance between Relatives
Mixed Linear Model for a Single Trait
Genetic Random Effect
Mixed Linear Model for Quantitative Trait Association Analysis
Estimating Variance Components
Hypothesis Test in Mixed Linear Models
Mixed Linear Models for Quantitative Trait Analysis with Sequencing Data
Mixed Functional Linear Models for Sequence-based Quantitative Trait Analysis
Mixed Functional Linear Models (Type 1)
Mixed Functional Linear Models (Type 2: Functional Variance Component Models)
Multivariate Mixed Linear Model for Multiple Traits
Multivariate Mixed Linear Model
Maximum Likelihood Estimate of Variance Components
REML Estimate of Variance Components
Heritability
Heritability Estimation for a Single Trait
Heritability Estimation for Multiple Traits
Family-based Association Analysis for Qualitative Trait
The Generalized T Test with Families and Additional Population Structures
Collapsing Method
CMC with Families
The Functional Principal Component Analysis and Smooth Functional Principal Component Analysis with Families
Software Package
Exercise
Interaction Analysis
Measures of Gene-gene and Gene-environment Interaction for Qualitative Trait
Binary Measure of Gene-gene and Gene-environment Interaction
Disequilibrium Measure of Gene-gene and Gene-environment Interaction
Information Measure of Gene-gene and Gene-environment Interaction
Measure of Interaction between Gene and Continuous Environment
Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Common Variants
Relative Risk and Odds-ratio-based Statistics for Testing Interaction between Gene and Discrete Environment
Disequilibrium-based Statistics for Testing Gene-gene Interaction
Information-based Statistics for Testing Gene-Gene Interaction
Haplotype-Odds Ratio and Tests for Gene-Gene Interaction
Multiplicative Measure-based Statistics for Testing Interaction between Gene and Continuous Environment
Information Measure-based Statistics for Testing Interaction between Gene and Continuous Environment
Real Example
Statistics for Testing Gene-gene and Gene-Environment Interaction for Qualitative Trait with Next-generation Sequencing Data
Multiple Logistic Regression Model for Gene-Gene Interaction Analysis
Functional Logistic Regression Model for Gene-Gene Interaction Analysis
Statistics for Testing Interaction between Two Genomic Regions
Statistics for Testing Gene-gene and Gene-Environment Interaction for Quantitative Traits
Genetic Models for Epistasis Effects of Quantitative Traits
Regression Model for Interaction Analysis with Quantitative Traits
Functional Regression Model for Interaction Analysis with a Quantitative Trait
Functional Regression Model for Interaction Analysis with Multiple Quantitative Traits
Multivariate and Functional Canonical Correlation as a Unified Framework for Testing Gene-Gene and Gene-Environment Interaction for both Qualitative and Quantitative Traits
Data Structure of CCA for Interaction Analysis
CCA and Functional CCA
Kernel CCA
Software Package
Appendices
Exercise
Machine Learning, Low Rank Models and Their Application to Disease Risk Prediction and Precision Medicine
Logistic Regression
Two Class Logistic Regression
Multiclass Logistic Regression
Parameter Estimation
Test Statistics
Network Penalized Two-class Logistic Regression
Network Penalized Multiclass Logistic Regression
Fisher’s Linear Discriminant Analysis
Fisher’s Linear Discriminant Analysis for Two Classes
Multi-class Fisher’s Linear Discriminant Analysis
Connections between Linear Discriminant Analysis, Optimal Scoring and Canonical Correlation Analysis (CCA)
Support Vector Machine
Introduction
Linear Support Vector Machines
Nonlinear SVM
Penalized SVMs
Low Rank Approximation
Quadratically Regularized PCA
Generalized Regularization
Generalized Canonical Correlation Analysis (CCA)
Quadratically Regularized Canonical Correlation Analysis
Sparse Canonical Correlation Analysis
Sparse Canonical Correlation Analysis via a Penalized Matrix Decomposition
Inverse Regression (IR) and Sufficient Dimension Reduction
Sufficient Dimension Reduction (SDR) and Sliced Inverse Regression (SIR)
Sparse SDR
Software Package
Appendices
Exercises
Biography
Momiao Xiong is a professor in the Department of Biostatistics, University of Texas School of Public Health, and a regular member of the Genetics & Epigenetics (G&E) Graduate Program at The University of Texas MD Anderson Cancer Center, UTHealth Graduate School of Biomedical Science.
"This is a fantastic book intensively focusing on the mathematical underpinnings of modern genome-wide association studies (GWAS). It serves well for senior graduate students in applied mathematics, computer science, and statistics who are interested in building a solid mathematical understanding of GWAS. Backgrounds of advanced mathematics and genetics are expected. It can also be used as a handbook for professionals to quickly check mathematical contexts of GWAS approaches and tools. This book is especially helpful for the latest generation of statistical geneticists who are pursuing academic career paths."
~Journal of the American Statistical Association, Jing Su (Wake Forest School of Medicine)
We offer free standard shipping on every order across the globe.