1st Edition

Handbook of Sharing Confidential Data: Differential Privacy, Secure Multiparty Computation, and Synthetic Data

    376 Pages, 40 B/W Illustrations
    by Chapman & Hall

    Statistical agencies, research organizations, companies, and other data stewards that seek to share data with the public face a challenging dilemma. They need to protect the privacy and confidentiality of data subjects and their attributes while providing data products that are useful for their intended purposes. In an age when information on data subjects is available from a wide range of data sources, as are the computational resources to obtain that information, this challenge is increasingly difficult. The Handbook of Sharing Confidential Data helps data stewards understand how tools from the data confidentiality literature (specifically, synthetic data, formal privacy, and secure computation) can be used to manage trade-offs between disclosure risk and data usefulness.

    Key features:

    • Provides overviews of the potential and the limitations of synthetic data, differential privacy, and secure computation.

    • Offers an accessible introduction to differential privacy from both methodological and practical perspectives.

    • Presents perspectives from both computer science and statistical science for addressing data confidentiality and privacy.

    • Describes genuine applications of synthetic data, formal privacy, and secure computation to help practitioners implement these approaches.

    The handbook is accessible to both researchers and practitioners who work with confidential data. It requires familiarity with basic concepts from probability and data analysis.

    1. Introduction
    Jörg Drechsler, Daniel Kifer, Jerome Reiter and Aleksandra Slavkovic
    Part 1. The Big Picture
    2. Protecting Confidential Data through Non-Statistical Methods
    Lars Vilhuber
    3. 21st Century Statistical Disclosure Limitation: Motivations and Challenges
    John M. Abowd and Michael B. Hawes
    Part 2. Formal Privacy Techniques
    4. Review of Popular Algorithms for Differential Privacy
    Ninghui Li and Tianhao Wang
    5. Privacy Implications of Practical Model Design Choices
    Audra McMillan
    6. Query answering for tabular data
    Ryan McKenna
    7. Machine learning with differential privacy
    Anand D. Sarwate
    8. Statistical Inference and Differential Privacy
    Jordan Awan and Ruobin Gong
    9. Systems Issues in Formally Private Systems
    Philip Leclerc and Pavel Zhuravlev
    Part 3. Synthetic Data
    10. Synthetic Data
    Trivellore Raghunathan
    11. Methods for Synthetic Data Generation
    Joshua Snoke and Satkartar K. Kinney
    12. Validation Services for Confidential Data
    Gary Benedetto, Rolando A. Rodríguez, Jordan Stanley and Evan Totty
    Part 4. Secure Multiparty Computation
    13. Privacy-Preserving Distributed Computation
    Jonathan Katz
    14. Differential Privacy and Cryptography
    Xi He
    15. Overview of Secure Multi-Party Computation Applications in Health Research and Social Sciences
    Liina Kamm and Dan Bogdanov
    Part 5. Use Cases
    16. Differential Privacy Implementations
    Matthew Graham, Andrew Foote, Lee Tucker and Hubert Janicki
    17. Synthpop: a tool to enable more flexible use of sensitive data within the Scottish Longitudinal Study
    Chris Dibben, Gillian M. Raab, Beata Nowok, Lee Williamson and Lynne Adair
    18. Safe Data Technologies: Safely Expanding Access to Administrative Tax Data
    Claire Bowen, Leonard E. Burman, Robert McClelland and Aaron R. Williams
    19. Secure Federated Learning: Integrated Statistical Modeling for Healthcare Applications
    Xiaoqian Jiang, Jihoon Kim, Tsung-Ting Kuo and Lucila Ohno-Machado


    Biography

    Jörg Drechsler is head of the Department for Statistical Methods at the Institute for Employment Research in Nuremberg, Germany, and Professor of Statistical Science at the Institute for Statistics at the Ludwig-Maximilians-Universität in Munich. He is also an Associate Research Professor in the Joint Program in Survey Methodology at the University of Maryland. His main research interests are data confidentiality and nonresponse in surveys. He is a fellow of the International Statistical Institute. He received his PhD in Social Science from the University of Bamberg and his Habilitation in Statistics from the Ludwig-Maximilians-Universität in Munich.

    Daniel Kifer is a Professor of Computer Science at Penn State University. He has published extensively on technical approaches for privacy and confidentiality, with work spanning attack algorithms, novel methods for disclosure avoidance, statistical analysis of perturbed data, and automated tools for catching implementation errors. In 2016-2017, Kifer spent his sabbatical at the U.S. Census Bureau and helped design the disclosure avoidance system used for the 2020 Decennial Census. Kifer obtained his bachelor's degrees in mathematics and computer science at New York University and his Ph.D. at Cornell.

    Jerome Reiter is a Professor of Statistical Science at Duke University. His primary research areas include methods for protecting data confidentiality, for handling missing values, and for integrating data across multiple sources. He has worked extensively on theory, methods, and applications for synthetic data. He is a Fellow of the Institute of Mathematical Statistics and the American Statistical Association. He received a PhD in statistics from Harvard University and his undergraduate degree from Duke University.

    Aleksandra (Seša) Slavkovic is a Professor of Statistics & Public Health Sciences, Dorothy Foehr Huck and J. Lloyd Huck Chair in Data Privacy and Confidentiality, and Associate Dean for Research in the Eberly College of Science at Penn State. Her research focuses on methodological developments in data privacy and confidentiality in the context of small- and large-scale surveys and of health, genomic, and network data, including work on differential privacy and broad data access that offers guarantees of accurate statistical inference needed to support reliable science and policy. She is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. She received her PhD (2004) and M.S. (2001) in Statistics, and a Master of Human-Computer Interaction (1999), from Carnegie Mellon University. She received her B.A. in Psychology from Duquesne University (1996).