1st Edition

Methods in Medical Informatics
Fundamentals of Healthcare Programming in Perl, Python, and Ruby

ISBN 9781138374416
Published September 18, 2018 by Chapman and Hall/CRC
413 Pages

USD $74.95


Book Description

Too often, healthcare workers are led to believe that medical informatics is a complex field that can only be mastered by teams of professional programmers. This is simply not the case. With just a few dozen simple algorithms, easily implemented with open source programming languages, you can fully utilize the medical information contained in clinical and research datasets. The common computational tasks of medical informatics are accessible to anyone willing to learn the basics.

Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby demonstrates that biomedical professionals with fundamental programming knowledge can master any kind of data collection. Providing you with access to data, nomenclatures, and programming scripts and languages that are all free and publicly available, this book:

  • Describes the structure of data sources used, with instructions for downloading
  • Includes a clearly written explanation of each algorithm
  • Offers equivalent scripts in Perl, Python, and Ruby, for each algorithm
  • Shows how to write short, quickly learned scripts, using a minimal selection of commands
  • Teaches basic informatics methods for retrieving, organizing, merging, and analyzing data sources
  • Provides case studies that detail the kinds of questions that biomedical scientists can ask and answer with public data and an open source programming language

Requiring no more than a working knowledge of Perl, Python, or Ruby, Methods in Medical Informatics will have you writing powerful programs in just a few minutes. Within its chapters, you will find descriptions of the basic methods and implementations needed to complete many of the projects you will encounter in your biomedical career.

Table of Contents

Chapter 1 Parsing and Transforming Text Files
Peeking into Large Files
Paging through Large Text Files
Extracting Lines that Match a Regular Expression
Changing Every File in a Subdirectory
Counting the Words in a File
Making a Word List with Occurrence Tally
Using Printf Formatting Style
Chapter 2 Utility Scripts
Random Numbers
Converting Non-ASCII to Base64 ASCII
Creating a Universally Unique Identifier
Splitting Text into Sentences
One-Way Hash on a Name
One-Way Hash on a File
A Prime Number Generator
Chapter 3 Viewing and Modifying Images
Viewing a JPEG Image
Converting between Image Formats
Batch Conversions
Drawing a Graph from List Data
Drawing an Image Mashup
Chapter 4 Indexing Text
Zipf Distribution of a Text File
Preparing a Concordance
Extracting Phrases
Preparing an Index
Comparing Texts Using Similarity Scores
Chapter 5 The National Library of Medicine’s Medical Subject Headings (MeSH)
Determining the Hierarchical Lineage for MeSH Terms
Creating a MeSH Database
Reading the MeSH Database
Creating an SQLite Database for MeSH
Reading the SQLite MeSH Database
Chapter 6 The International Classification of Diseases
Creating the ICD Dictionary
Building the ICD-O (Oncology) Dictionary
Chapter 7 SEER: The Cancer Surveillance, Epidemiology, and End Results Program
Parsing the SEER Data Files
Finding the Occurrences of All Cancers in the SEER Data Files
Finding the Age Distributions of the Cancers in the SEER Data Files
Chapter 8 OMIM: The Online Mendelian Inheritance in Man
Collecting the OMIM Entry Terms
Finding Inherited Cancer Conditions
Chapter 9 PubMed
Building a Large Text Corpus of Biomedical Information
Creating a List of Doublets from a PubMed Corpus
Downloading Gene Synonyms from PubMed
Downloading Protein Synonyms from PubMed
Chapter 10 Taxonomy
Finding a Taxonomic Hierarchy
Finding the Restricted Classes of Human Infectious Pathogens
Chapter 11 Developmental Lineage Classification and Taxonomy of Neoplasms
Building the Doublet Hash
Scanning the Literature for Candidate Terms
Adding Terms to the Neoplasm Classification
Determining the Lineage of Every Neoplasm Concept
Chapter 12 U.S. Census Files
Total Population of the United States
Stratified Distribution for the U.S. Census
Adjusting for Age
Chapter 13 Centers for Disease Control and Prevention Mortality Files
Death Certificate Data
Obtaining the CDC Data Files
How Death Certificates Are Represented in Data Records
Ranking, by Number of Occurrences, Every Condition in the CDC Mortality Files
Chapter 14 Autocoding
A Neoplasm Autocoder
Chapter 15 Text Scrubber for Deidentifying Confidential Text
Chapter 16 Web Pages and CGI Scripts
Grabbing Web Pages
CGI Script for Searching the Neoplasm Classification
Chapter 17 Image Annotation
Inserting a Header Comment
Extracting the Header Comment in a JPEG Image File
Inserting IPTC Annotations
Extracting Comment, EXIF, and IPTC Annotations
Dealing with DICOM
Finding DICOM Images
DICOM-to-JPEG Conversion
Chapter 18 Describing Data with Data, Using XML
Parsing XML
Resource Description Framework (RDF)
Dublin Core Metadata
Insert an RDF Document into an Image File
Insert an Image File into an RDF Document
RDF Schema
Visualizing an RDF Schema with GraphViz
Obtaining GraphViz
Converting a Data Structure to GraphViz
Chapter 19 Case Study: Emphysema Rates
Chapter 20 Case Study: Cancer Occurrence Rates
Chapter 21 Case Study: Germ Cell Tumor Rates across Ethnicities
Chapter 22 Case Study: Ranking the Death-Certifying Process, by State
Chapter 23 Case Study: Data Mashups for Epidemics
Tally of Coccidioidomycosis Cases by State
Creating the Map Mashup
Chapter 24 Case Study: Sickle Cell Rates
Chapter 25 Case Study: Site-Specific Tumor Biology
Anatomic Origins of Mesotheliomas
Mesothelioma Records in the SEER Data Sets
Graphic Representation
Chapter 26 Case Study: Bimodal Tumors
Chapter 27 Case Study: The Age of Occurrence of Precancers
Epilogue for Healthcare Professionals and Medical Scientists
Learn One or More Open Source Programming Languages
Don’t Agonize Over Which Language You Should Choose
Learn Algorithms
Unless You Are a Professional Programmer, Relax and Enjoy Being a Newbie
Do Not Delegate Simple Programming Tasks to Others
Break Complex Tasks into Simple Methods and Algorithms
Write Fast Scripts
Concentrate on the Questions, Not the Answers

How to Acquire Ruby
How to Acquire Perl
How to Acquire Python
How to Acquire RMagick
How to Acquire SQLite
How to Acquire the Public Data Files Used in This Book
Other Publicly Available Files, Data Sets, and Utilities


As subspecialty board certification in clinical informatics has finally become a reality, Jules Berman’s Methods in Medical Informatics could not be more timely. This well-written and informative text combines Dr. Berman’s expertise in programming with his vast knowledge of publicly available data sets and everyday healthcare programming needs to result in a book which … should become a staple in health informatics education programs as well as a standard addition to the personal libraries of informaticists.
—Alexis B. Carter, Journal of Pathology Informatics, October 2011

This book provides an introduction to processing clinical and population health data using rigorous methods and widely available, low cost, but very capable tools. The inclusion of the three leading dynamic programming languages broadens the appeal … bridges the gap from programming instruction to dealing with specialized medical data, making it possible to teach a relevant programming course in a biomedical environment. I would have loved to have a copy of this when I was teaching introductory programming for medical informatics.
—Professor James H. Harrison, Jr., Director of Clinical Informatics, University of Virginia

… presents students and professionals in the healthcare field (who have some working knowledge of the open-source programming languages Perl, Python, or Ruby) with instruction for applying basic informatics algorithms to medical data sets. He [the author] provides algorithm scripts for each of the languages, along with step-by-step explanations of the algorithms used for retrieving, organizing, merging, and analyzing such data sources as the National Cancer Institute’s Surveillance Epidemiology and End Results project, the National Library of Medicine’s PubMed service, the mortality records of the US Centers for Disease Control and Prevention, the US Census, and the Online Mendelian Inheritance in Man data set on inherited conditions.
SciTech Book News, February 2011
