Guide to the De-Identification of Personal Health Information

1st Edition

By Khaled El Emam

Auerbach Publications

414 pages | 32 B/W Illus.

Hardback: 9781466579064
pub: 2013-05-06

Description

Offering compelling practical and legal reasons why de-identification should be one of the main approaches to protecting patients’ privacy, the Guide to the De-Identification of Personal Health Information outlines a proven, risk-based methodology for the de-identification of sensitive health information. It situates and contextualizes this risk-based methodology and provides a general overview of its steps.

The book supplies a detailed case for why de-identification is important as well as best practices to help you pinpoint when it is necessary to apply de-identification in the disclosure of personal health information. It also:

  • Outlines practical methods for de-identification
  • Describes how to measure re-identification risk
  • Explains how to reduce the risk of re-identification
  • Includes proofs and supporting reference material
  • Focuses only on transformations proven to work on health information—rather than covering all possible approaches, whether they work in practice or not

Rated the top systems and software engineering scholar worldwide by The Journal of Systems and Software, Dr. El Emam is one of only a handful of individuals worldwide qualified to de-identify personal health information for secondary use under the HIPAA Privacy Rule Statistical Standard. In this book Dr. El Emam explains how we can make health data more accessible—while protecting patients’ privacy and complying with current regulations.

Reviews

By arguing persuasively for the use of de-identification as a privacy-enhancing tool, and setting out a practical methodology for the use of de-identification techniques and re-identification risk measurement tools, this book provides a valuable and much needed resource for all data custodians who use or disclose personal health information for secondary purposes. Doubly enabling, privacy-enhancing tools like these, that embrace privacy by design, will ensure the continued availability of personal health information for valuable secondary purposes that benefit us all.

—Dr. Ann Cavoukian, Information and Privacy Commissioner, Ontario, Canada

Table of Contents

Introduction

Primary and Secondary Purposes

The Spectrum of Risk for Data Access

Managing Risk

What Is De-identification?

Learning Something New

The Status Quo

Safe Harbor-Compliant Data Can Have a High Risk of Re-identification

The Adversary Knows Who Is in the Data

The Data Set Is Not a Random Sample from the U.S. Population

Other Fields Can Be Used for Re-identification

Moving Forward beyond Safe Harbor

Why We Wrote This Book

References

THE CASE FOR DE-IDENTIFYING PERSONAL HEALTH INFORMATION

Permitted Disclosures, Consent, and De-identification of PHI

Common Data Flows

The Need for De-identification

Permitted Uses and Disclosures of Health Information

Uses of Health Information by an Agent

Disclosing Identifiable Data When Permitted

References

The Impact of Consent

Differences between Consenters and Non-Consenters in Clinical Trials

The Impact of Consent on Observational Studies

Impact on Recruitment

Impact on Bias

Impact on Cost

Impact on Time

References

Data Breach Notifications

Benefits and Costs of Breach Notification

Cost of Data Breach Notifications to Custodian

Data Breach Trends

The Value of Health Data

Financial Information in the Health Records

Financial Value of Health Records

Medical Identity Theft

Monetizing Health Records through Extortion

References

Peeping and Snooping

Examples of Peeping

Information and Privacy Commissioners Orders

Ontario

HO-002

HO-010

HR06-53

HI-050013-1

Alberta

Investigation Report H2011-IR-004

IPC Investigation (Report Not Available)

Saskatchewan

H-2010-001

References

Unplanned but Legitimate Uses and Disclosures

Unplanned Uses by Governments

Data Sharing for Research Purposes

Open Government

Open Data for Research

Unplanned Uses and Disclosures by Commercial Players

Competitions

References

Public Perception and Privacy Protective Behaviors

References

Alternative Methods for Data Access

Remote Access

On-Site Access

Remote Execution

Remote Queries

Secure Computation

Summary

References

UNDERSTANDING DISCLOSURE RISKS

Scope, Terminology, and Definitions

Perspective on De-identification

Original Data and DFs

Unit of Analysis

Types of Data

Relational Data

Transactional Data

Sequential Data

Trajectory Data

Graph Data

The Notion of an Adversary

Types of Variables

Directly Identifying Variables

Indirectly Identifying Variables (Quasi-identifiers)

Sensitive Variables

Other Variables

Equivalence Classes

Aggregate Tables

References

Frequently Asked Questions about De-identification

Can We Have Zero Risk?

Will All DFs Be Re-identified in the Future?

Is a Data Set Identifiable If a Person Can Find His or Her Record?

Can De-identified Data Be Linked to Other Data Sets?

Doesn’t Differential Privacy Already Provide the Answer?

A Methodology for Managing Re-identification Risk

Re-identification Risk versus Re-identification Probability

Re-identification Risk for Public Files

Managing Re-identification Risk

References

Definitions of Identifiability

Definitions

Common Framework for Assessing Identifiability

References

Data Masking Methods

Suppression

Randomization

Irreversible Coding

Reversible Coding

Reversible Coding, HIPAA, and the Common Rule

Other Techniques That Do Not Work Well

Constraining Names

Adding Noise

Character Scrambling

Character Masking

Truncation

Encoding

Summary

References

Theoretical Re-identification Attacks

Background Knowledge of the Adversary

Re-identification Attacks

Example of a Linking Attack on Relational Data

Example of a Linking Attack on Transaction Data

Example of a Linking Attack on Sequential Data

Example of a Linking Attack on Trajectory Data

Example of a Linking Attack Based on Semantic Information

References

MEASURING RE-IDENTIFICATION RISK

Measuring the Probability of Re-identification

Simple and Derived Metrics

Simple Risk Metrics: Prosecutor and Journalist Risk

Measuring Prosecutor Risk

Measuring Journalist Risk

Applying the Derived Metrics and Decision Rules

Relationship among Metrics

References

Measures of Uniqueness

Uniqueness under Prosecutor Risk

Uniqueness under Journalist Risk

Summary

References

Modeling the Threat

Characterizing the Adversaries

Attempting a Re-identification Attack

Plausible Adversaries

An Internal Adversary

An External Adversary

What Are the Quasi-identifiers?

Sources of Data

Correlated and Inferred Variables

References

Choosing Metric Thresholds

Choosing the α Threshold

Choosing the τ and λ Thresholds

Choosing the Threshold for Marketer Risk

Choosing among Thresholds

Thresholds and Incorrect Re-identification

References

PRACTICAL METHODS FOR DE-IDENTIFICATION

De-identification Methods

Generalization

Principles

Optimal Lattice Anonymization (OLA)

Tagging

Records to Suppress

Suppression Methods

Overview

Fast Local Cell Suppression

Available Tools

Case Study: De-identification of the BORN Registry

General Parameters

Attack T1

Attack T2

Attack T3

Summary of Risk Assessment and De-identification

References

Practical Tips

Disclosed Files Should Be Samples

Disclosing Multiple Samples

Creating Cohorts

Cohort Defined on Quasi-identifiers Only

Cohort Defined on a Non-Quasi-identifier

Cohort Defined on Non-Quasi-identifiers and Quasi-identifiers

Impact of Data Quality

Publicizing Re-identification Risk Assessment

Adversary Power

Levels of Adversary Background Knowledge

De-identification in the Context of a Data Warehouse

References

END MATTER

An Analysis of Historical Breach Notification Trends

Methods

Definitions

Breach Lists

Original Data Sources

Sponsors of Lists

Data Quality

Estimating the Number of Disclosed Breaches

Data Collection

Interrater Agreement

Results

Discussion

Summary of Main Results

Post Hoc Analysis

References

Methods of Attack for Maximum Journalist Risk

Method of Attack 1

Method of Attack 2

Method of Attack 3

How Many Friends Do We Have?

References

Cell Size Precedents

References

The Invasion of Privacy Construct

6B Dimensions

Sensitivity of the Data

Potential Injury to Consumers

Appropriateness of Consent

General Information on Mitigating Controls

Introduction

Origins of the MCI

Subject of Assessment: Data Requestor versus Data Recipient

Applicability of the MCI

Structure of the MCI

Scoring

Which Practices to Rate

Third-Party versus Self-Assessment

Scoring the MCI

Interpreting the MCI Questions

General Justifications for Time Intervals

Practical Requirements

Remediation

Controlling Access, Disclosure, Retention, and Disposition of Personal Data

Safeguarding Personal Data

Ensuring Accountability and Transparency in the Management of Personal Data

Assessing Motives and Capacity

Dimensions

Motives to Re-identify the Data

Capacity to Re-identify the Data

Invasion of Privacy

Sensitivity of the Data

Potential Injury to Patients

Appropriateness of Consent

Index

About the Author

Dr. El Emam holds the Canada Research Chair in Electronic Health Information at the University of Ottawa and is an Associate Professor in the Faculty of Medicine at the university. In 2003 and 2004, he was ranked as the top systems and software engineering scholar worldwide by The Journal of Systems and Software based on his research on measurement and quality evaluation and improvement. He is a senior scientist at the Children’s Hospital of Eastern Ontario Research Institute and leads the multi-disciplinary Electronic Health Information Laboratory (EHIL) team.

Dr. El Emam is one of only a handful of individuals worldwide known to be qualified to de-identify personal health information for secondary use under the HIPAA Privacy Rule Statistical Standard. He is also a world-renowned expert in health information privacy and the head of the Electronic Health Information Laboratory (www.ehealthinformation.ca), which conducts cutting-edge research in this area. He has been de-identifying data since 2004, has a large following, and speaks extensively on the topic.

He has edited two books and written one, and has contributed chapters to a number of others.

Subject Categories

BISAC Subject Codes/Headings:
BUS070080
BUSINESS & ECONOMICS / Industries / Service Industries
COM032000
COMPUTERS / Information Technology
COM053000
COMPUTERS / Security / General
MED002000
MEDICAL / Administration