Statisticians have responded to the need to test hundreds or thousands of genomics hypotheses simultaneously by developing novel empirical Bayes methods that combine advantages of traditional Bayesian and frequentist statistics. Techniques for estimating the local false discovery rate assign probabilities of differential gene expression, genetic association, etc. without requiring subjective prior distributions. This book brings these methods to scientists while keeping the mathematics at an elementary level. Readers will learn the fundamental concepts behind local false discovery rates, preparing them to analyze their own genomics data and to critically evaluate published genomics research.
* dice games and exercises, including one using interactive software, for teaching the concepts in the classroom
* examples focusing on gene expression and on genetic association data and briefly covering metabolomics data and proteomics data
* gradual introduction to the mathematical equations needed
* how to choose between different methods of multiple hypothesis testing
* how to convert the output of genomics hypothesis testing software to estimates of local false discovery rates
* guidance through the minefield of current criticisms of p values
* material on non-Bayesian prior p values and posterior p values not previously published
Table of Contents
1. Basic probability and statistics
Hypothesis tests and p values
2. Introduction to likelihood
Likelihood function defined
Odds and probability: What’s the difference?
Bayesian uses of likelihood
3. False discovery rates
Local false discovery rate
Global and local false discovery rates
Computing the LFDR estimate
Exercises (L4; A-B)
4. Simulating and analyzing gene expression data
Simulating gene expression with dice
Effects and Estimates (E&E)
Under the hood: normal distributions
Exercises (C-E; G1-G4)
5. Variations in dimension and data
Subclasses and superclasses
Medium number of features
6. Correcting bias in estimates of the false discovery rate
Why correct the bias in estimates of the false discovery rate?
A misleading estimator of the false discovery rate
Corrected and re-ranked estimators of the local false discovery rate
Application to gene expression data analysis
7. The L value: An estimated local false discovery rate to replace a p value
What if I only have one p value? Am I doomed?
The L value to the rescue!
The multiple-test L value
8. Maximum likelihood and applications
Non-Bayesian uses of likelihood
Empirical Bayes uses of likelihood
Appendix A. Generalized Bonferroni correction derived from conditional compatibility
A non-Bayesian approach to testing single and multiple hypotheses
Appendix B. How to choose a method of hypothesis testing
Guidelines for scientists performing statistical hypothesis tests
David R. Bickel is an Associate Professor in the Department of Biochemistry, Microbiology and Immunology of the University of Ottawa and a Core Member of the Ottawa Institute of Systems Biology. Since 2011, he has been teaching classes focused on the statistical analysis of genomics data. While working as a biostatistician in academia and industry, he has published new statistical methods for analyzing genomics data in leading statistics and bioinformatics journals. He is also investigating the foundations of statistical inference. For recent activity, see davidbickel.com or follow him at @DavidRBickel (Twitter).