1st Edition

Data Warehousing for Biomedical Informatics

By Richard E. Biehl, Copyright 2016
    656 Pages, 141 B/W Illustrations
    Published by Auerbach Publications

    Data Warehousing for Biomedical Informatics is a step-by-step how-to guide for designing and building an enterprise-wide data warehouse across a biomedical or healthcare institution, using a four-iteration lifecycle and a standardized design pattern. It enables you to quickly implement a fully scalable, generic data architecture that supports your organization's clinical, operational, administrative, financial, and research data. By following the guidelines in this book, you will be able to progress through the Alpha, Beta, and Gamma versions and then fully implement your first production release, all in about a year.

    The Alpha version implements just enough of the basic design pattern to illustrate its core capabilities, while loading a small sample of data for demonstration purposes. This gives everyone involved an easy way to visualize the new warehouse paradigm by examining a core subset of the working system. You can finish the Alpha version, also referred to as the proof of concept, in as little as 3-4 weeks.

    The Beta version, which can be completed in about 2-3 months, adds required functionality and much more data. It gets the full warehouse up and running quickly to support longer-term planning, user and support team training, and setup of the operational environment. The Gamma version, a fully functional system that still lacks data, can be implemented in about 3-4 months. About one year after starting, you will be ready to launch Release 1.0 as a complete and secure data warehouse.

    Biomedical Data Warehousing
    Nature of Biomedical Data
    Nature of Warehoused Data
    Business Requirements
    Functional Requirements
    Never-Finished Warehouse
    Organizational Readiness
    Implementation Strategy

    SECTION I ALPHA VERSION

    Dimensional Data Modeling
    Evolution of Data Warehouses
    The Star Schema
    Transposing Dimensional Schema
    Anticipating Dimensions
    Affinity Analysis

    Understanding Source Data
    Implicit versus Explicit Data
    Semantic Layers
    Information Artifacts
    Biomedical Context
    Clinical Picture
    Ontological Levels
    Epistemological Levels
    Conclusions

    Biomedical Warehouse
    Biomedical Star
    Biomedical Facts
    Master Dimensions
    Reference Dimensions
    Almanac Dimensions
    Analysis Dimensions
    Control Dimensions
    Requirements Alignment

    Star Dimension Design Pattern
    Structure of a Dimension
    Master Data: Definition Tables
    Slowly Changing Dimensions
    Source Keys: Context and Reference Tables
    Fact Participation: Group and Bridge Tables
    Interconnections: Hierarchy Tables
    Connecting to Facts
    Dimension Navigation

    Loading Alpha Version
    Throw-Away Code
    Selecting and Preparing Sources
    Generating Surrogate Keys
    Simple Dimensions and Facts
    Recap of Simple ETLs
    Complicated Dimensions and Facts
    Finalizing Alpha Structures
    V&V of Alpha Version

    SECTION II BETA VERSION

    Completing the Design
    Unit of Measure
    Metadata Mappings
    Control Dimensions
    Reinitializing the Warehouse

    Data Sourcing
    Source Mapping Challenges
    Dimensionalizing Facts
    Sourcing Your Data

    Generalizing ETL Workflows
    Standardizing Source Data
    Source Data Intake Jobs
    SDI Design Pattern
    Source Data Consolidation
    External versus Internal Sourcing
    Single Point of Function
    ETL "Pipes"
    Metadata Transformation
    Data Control Pipe
    Wide versus Deep Data

    ETL Reference Pipe
    Metadata Transformation
    Reference Composite
    Resolve References
    Unresolved References
    Reference Entries
    Alias Entries
    Bridges and Groups
    Hierarchy Entries
    Fiat Hierarchies
    Natural Hierarchies

    ETL Definition Pipe
    Processing Complexities
    Example Master Loads
    Insert New Definitions
    New Orphans
    Orphan Auto-Adoption
    Definition Change Processing
    Building SCD Transaction Sets
    Applying Transactions to Dimensions
    Performance Concerns

    ETL Fact Pipe
    Metadata Transformation
    Bridges and Groups
    Build Facts
    Finalize Dimensions
    Set Control Dimensions
    Insert Fact Values
    Superseding Facts

    Finalizing Beta
    Audit Trail Facts
    Datafeed Dimension
    Verification and Validation
    Preparing for Gamma

    SECTION III GAMMA VERSION

    Finalizing ETL Workflows
    Alternatively Sourced Keys
    Sourced Metadata
    Standard Data Editing
    Value-Level UOM
    Undetermined Dimensionality
    ETL Transactions
    Target States
    Superseded Facts
    Continuous Functional Evolution

    Establishing Data Controls
    Finalizing Warehouse Design
    Redaction Control Settings
    Data Monitoring
    Surrogate Merges
    Security Controls
    Implementing Dataset Controls
    Warehouse Support Team

    Building out the Data
    Minimize Data Seams
    Shifting toward Metrics
    Populating Metric Values
    Populating Control Values
    Populating Displays

    Delivering Data
    Warehousing Use Cases
    Privacy-Oriented Usage Profiles
    Metadata Browsing
    Cohort Identification
    Fact Count Queries
    Timeline Generation
    Business Intelligence
    Alternative Data Views

    Finalizing Gamma
    Business Requirements
    Technical Challenges
    Functional Challenges
    Going Live

    SECTION IV RELEASE 1.0

    Knowledge Synthesis
    Fact Counts
    Derivative Data
    Timeline Analysis
    Statistical Analyses
    Statistical Process Control
    Semantic Annotation

    Data Governance
    Organizing for Governance
    Governance Opportunities

    Index

    Biography

Richard E. Biehl is an information technology consultant with 37 years of experience, specializing in logical and physical data architectures, quality management, and strategic planning for the application of information technology. His research interests include semantic interoperability in biomedical data and the integration of chaos and complexity theories into the systems engineering of healthcare. Dr. Biehl holds a PhD in applied management and decision science and an MS in educational change and technology innovation from Walden University, Minneapolis, Minnesota. He is certified by the American Society for Quality (ASQ), Milwaukee, Wisconsin, as a Six Sigma Black Belt (CSSBB) and a Software Quality Engineer (CSQE). Dr. Biehl is a visiting instructor at the University of Central Florida (UCF), Orlando, Florida, in the College of Engineering and Computer Science (CECS), where he teaches quality and systems engineering in the Industrial Engineering and Management Systems (IEMS) Department.