1st Edition

Statistics and Machine Learning Methods for EHR Data: From Data Extraction to Data Analytics

    327 Pages 73 B/W Illustrations
    by Chapman & Hall

    328 Pages 73 B/W Illustrations
    by Chapman & Hall

    The use of Electronic Health Records (EHR)/Electronic Medical Records (EMR) data is becoming more prevalent for research. However, analysis of this type of data has many unique complications arising from how the data are collected and processed and from the types of questions that can be answered. Drawing on the authors' many years of practical experience, this book covers many important topics related to using EHR/EMR data for research, including data extraction, cleaning, processing, analysis, inference, and prediction. The book carefully evaluates and compares standard statistical models and approaches with machine learning and deep learning methods, and reports unbiased comparison results for these methods in predicting clinical outcomes based on EHR data.

    Key Features:

    • Written based on hands-on experience of contributors from multidisciplinary EHR research projects, which include methods and approaches from statistics, computing, informatics, data science and clinical/epidemiological domains.
    • Documents detailed experience with EHR data extraction, cleaning and preparation.
    • Provides a broad view of statistical approaches and machine learning prediction models to deal with the challenges and limitations of EHR data.
    • Considers the complete cycle of EHR data analysis.

    EHR/EMR analysis requires close collaboration among statisticians, informaticians, data scientists and clinical/epidemiological investigators. This book reflects that multidisciplinary perspective.


    About the Editors

    List of Contributors

    1. Introduction: Use of EHR Data for Scientific Discoveries—Challenges and Opportunities
    Hulin Wu

    2. EHR Project Management
    Yashar Talebi and Ashraf Yaseen

    3. EHR Databases and Data Management: Data Query and Extraction
    Gen Zhu, Vi K. Ly, Michael Gonzalez, Leqing Wu, Hulin Wu, and Ashraf Yaseen

    4. EHR Data Cleaning
    Yashar Talebi, Han Feng, Yuefan Huang, and Vahed Maroufy

    5. EHR Data Pre-Processing and Preparation
    Duo Yu, Xueying Wang, and Hulin Wu

    6. EHR Missing Data Issues
    Chenguang Zhang, Vahed Maroufy, Baojiang Chen, and Hulin Wu

    7. Causal Inference and Analysis for EHR Data
    Stacia DeSantis, Momiao Xiong, Jose-Miguel Yamal, Gen Zhu, Duo Yu, Xueying Wang, Chenguang Zhang, and Vi K. Ly

    8. EHR Data Exploration, Analysis and Predictions: Statistical Models and Methods
    Gen Zhu, Frances Brito, Stacia M DeSantis, and Vahed Maroufy

    9. Neural Network and Deep Learning Methods for EHR Data
    Duo Yu, Ashraf Yaseen, and Xi Luo

    10. EHR Data Analytics and Predictions: Machine Learning Methods
    Yuxuan Gu, Yuefan Huang, Vi Ly, Ashraf Yaseen, and Hongyu Miao

    11. Use of EHR Data for Research: Future
    Hulin Wu



    • Hulin Wu, PhD, is the endowed Betty Wheless Trotter Professor and Chair of the Department of Biostatistics & Data Science, School of Public Health (SPH), University of Texas Health Science Center at Houston (UTHealth). Dr. Wu also holds a joint appointment as Professor at the UTHealth School of Biomedical Informatics. Dr. Wu received BS and MS training in engineering and a PhD in statistics. He has many years of experience in developing novel statistical methods, mathematical models and informatics tools for biomedical data analysis and modeling. He is the Founding Director of the Center for Big Data in Health Sciences (CBD-HS) and directs the EHR research working group at UTHealth SPH.

    • Jose-Miguel Yamal is a tenured Associate Professor in the Department of Biostatistics & Data Science and a member of the Coordinating Center for Clinical Trials at UTHealth School of Public Health. Dr. Yamal has extensive experience in clinical trials, including work with data coordinating centers and service on Data Safety Monitoring Boards for clinical trials in stroke and traumatic brain injury. He has also contributed to statistical methodology for classification problems with nested data as well as to machine learning applications.

    • Ashraf Yaseen is an Assistant Professor of Data Science at the School of Public Health, UTHealth. He has extensive experience in database design, implementation and management, machine learning, and high-performance computing. In his current research work, Dr. Yaseen is exploring big data integration and deep learning technologies in electronic health records to address clinical and public health questions.

    • Vahed Maroufy, PhD, is an Assistant Professor in the Department of Biostatistics & Data Science, UTHealth School of Public Health. Dr. Maroufy received MSc and PhD training in statistics and has experience in applied and theoretical statistics, including geometry of statistical models, mixture models, Bayesian inference, predictive models using EHR data, and analysis of genetic data in cancer research.

    'This book should make it to the bookshelf of anyone involved in data preparation and statistical analysis for EHR research.'

    - Madan G. Kandu, Journal of Biopharmaceutical Statistics, Vol 31, No 4

    'To conclude, this book provides a strong basis for handling real-world data from EHR and will be useful both for the beginner and for more advanced researchers.'

    - Sébastien Bailly, International Society for Clinical Biostatistics, 72, 2021