Data Warehousing for Biomedical Informatics is a step-by-step guide to designing and building an enterprise-wide data warehouse across a biomedical or healthcare institution, using a four-iteration lifecycle and a standardized design pattern. It enables you to quickly implement a fully scalable generic data architecture that supports your organization’s clinical, operational, administrative, financial, and research data. By following the guidelines in this book, you will be able to progress through the Alpha, Beta, and Gamma versions and fully implement your first production release in about a year.
The Alpha version allows you to implement just enough of the basic design pattern to illustrate its core capabilities while loading a small sample of data for demonstration purposes. This gives everyone involved an easy way to visualize the new warehouse paradigm by examining a core subset of the working system. You can finish the Alpha version, also referred to as the proof-of-concept, in as little as 3-4 weeks.
The Beta version, which can be completed in about 2-3 months, adds required functionality and much more data. It allows you to get the full warehouse up and running quickly to facilitate longer-term planning, user and support team training, and setup of the operational environment. The Gamma version, which is a fully functional system, though still lacking data, can be implemented in about 3-4 months. About one year after starting, you will be ready to launch Release 1.0 as a complete and secure data warehouse.
Biomedical Data Warehousing
Nature of Biomedical Data
Nature of Warehoused Data
Business Requirements
Functional Requirements
Never-Finished Warehouse
Organizational Readiness
Implementation Strategy
SECTION I ALPHA VERSION
Dimensional Data Modeling
Evolution of Data Warehouses
The Star Schema
Transposing Dimensional Schema
Anticipating Dimensions
Affinity Analysis
Understanding Source Data
Implicit versus Explicit Data
Semantic Layers
Information Artifacts
Biomedical Context
Clinical Picture
Ontological Levels
Epistemological Levels
Conclusions
Biomedical Warehouse
Biomedical Star
Biomedical Facts
Master Dimensions
Reference Dimensions
Almanac Dimensions
Analysis Dimensions
Control Dimensions
Requirements Alignment
Star Dimension Design Pattern
Structure of a Dimension
Master Data: Definition Tables
Slowly Changing Dimensions
Source Keys: Context and Reference Tables
Fact Participation: Group and Bridge Tables
Interconnections: Hierarchy Tables
Connecting to Facts
Dimension Navigation
Loading Alpha Version
Throw-Away Code
Selecting and Preparing Sources
Generating Surrogate Keys
Simple Dimensions and Facts
Recap of Simple ETLs
Complicated Dimensions and Facts
Finalizing Alpha Structures
V&V of Alpha Version
SECTION II BETA VERSION
Completing the Design
Unit of Measure
Metadata Mappings
Control Dimensions
Reinitializing the Warehouse
Data Sourcing
Source Mapping Challenges
Dimensionalizing Facts
Sourcing Your Data
Generalizing ETL Workflows
Standardizing Source Data
Source Data Intake Jobs
SDI Design Pattern
Source Data Consolidation
External versus Internal Sourcing
Single Point of Function
ETL "Pipes"
Metadata Transformation
Data Control Pipe
Wide versus Deep Data
ETL Reference Pipe
Metadata Transformation
Reference Composite
Resolve References
Unresolved References
Reference Entries
Alias Entries
Bridges and Groups
Hierarchy Entries
Fiat Hierarchies
Natural Hierarchies
ETL Definition Pipe
Processing Complexities
Example Master Loads
Insert New Definitions
New Orphans
Orphan Auto-Adoption
Definition Change Processing
Building SCD Transaction Sets
Applying Transactions to Dimensions
Performance Concerns
ETL Fact Pipe
Metadata Transformation
Bridges and Groups
Build Facts
Finalize Dimensions
Set Control Dimensions
Insert Fact Values
Superseding Facts
Finalizing Beta
Audit Trail Facts
Datafeed Dimension
Verification and Validation
Preparing for Gamma
SECTION III GAMMA VERSION
Finalizing ETL Workflows
Alternatively Sourced Keys
Sourced Metadata
Standard Data Editing
Value-Level UOM
Undetermined Dimensionality
ETL Transactions
Target States
Superseded Facts
Continuous Functional Evolution
Establishing Data Controls
Finalizing Warehouse Design
Redaction Control Settings
Data Monitoring
Surrogate Merges
Security Controls
Implementing Dataset Controls
Warehouse Support Team
Building out the Data
Minimize Data Seams
Shifting toward Metrics
Populating Metric Values
Populating Control Values
Populating Displays
Delivering Data
Warehousing Use Cases
Privacy-Oriented Usage Profiles
Metadata Browsing
Cohort Identification
Fact Count Queries
Timeline Generation
Business Intelligence
Alternative Data Views
Finalizing Gamma
Business Requirements
Technical Challenges
Functional Challenges
Going Live
SECTION IV RELEASE 1.0
Knowledge Synthesis
Fact Counts
Derivative Data
Timeline Analysis
Statistical Analyses
Statistical Process Control
Semantic Annotation
Data Governance
Organizing for Governance
Governance Opportunities
Index
Biography
Richard E. Biehl is an information technology consultant with 37 years of experience, specializing in logical and physical data architectures, quality management, and strategic planning for the application of information technology. His research interests include semantic interoperability in biomedical data and the integration of chaos and complexity theories into the systems engineering of healthcare. Dr. Biehl holds a PhD in applied management and decision science and an MS in educational change and technology innovation from Walden University, Minneapolis, Minnesota. He is certified as a Six Sigma Black Belt (CSSBB) and a Software Quality Engineer (CSQE) by the American Society for Quality (ASQ), Milwaukee, Wisconsin. Dr. Biehl is a visiting instructor at the University of Central Florida (UCF), Orlando, Florida, in the College of Engineering and Computer Science (CECS), where he teaches quality and systems engineering in the Industrial Engineering and Management Systems (IEMS) Department.