
Methods in Medical Informatics

Fundamentals of Healthcare Programming in Perl, Python, and Ruby, 1st Edition

By Jules J. Berman

Chapman and Hall/CRC

413 pages

Paperback: 9781138374416 (pub: 2018-09-18)
Hardback: 9781439841822 (pub: 2010-09-22)
eBook (VitalSource): 9780429189388 (pub: 2010-09-22)


Too often, healthcare workers are led to believe that medical informatics is a complex field that can only be mastered by teams of professional programmers. This is simply not the case. With just a few dozen simple algorithms, easily implemented with open source programming languages, you can fully utilize the medical information contained in clinical and research datasets. The common computational tasks of medical informatics are accessible to anyone willing to learn the basics.

Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby demonstrates that biomedical professionals with fundamental programming knowledge can master any kind of data collection. Providing you with access to data, nomenclatures, and programming scripts and languages that are all free and publicly available, this book:

  • Describes the structure of data sources used, with instructions for downloading
  • Includes a clearly written explanation of each algorithm
  • Offers equivalent scripts in Perl, Python, and Ruby, for each algorithm
  • Shows how to write short, quickly learned scripts, using a minimal selection of commands
  • Teaches basic informatics methods for retrieving, organizing, merging, and analyzing data sources
  • Provides case studies that detail the kinds of questions that biomedical scientists can ask and answer with public data and an open source programming language

Requiring no more than a working knowledge of Perl, Python, or Ruby, Methods in Medical Informatics will have you writing powerful programs in just a few minutes. Within its chapters, you will find descriptions of the basic methods and implementations needed to complete many of the projects you will encounter in your biomedical career.


As subspecialty board certification in clinical informatics has finally become a reality, Jules Berman’s Methods in Medical Informatics could not be more timely. This well-written and informative text combines Dr. Berman’s expertise in programming with his vast knowledge of publicly available data sets and everyday healthcare programming needs to result in a book which … should become a staple in health informatics education programs as well as a standard addition to the personal libraries of informaticists.

—Alexis B. Carter, Journal of Pathology Informatics, October 2011

This book provides an introduction to processing clinical and population health data using rigorous methods and widely available, low cost, but very capable tools. The inclusion of the three leading dynamic programming languages broadens the appeal … bridges the gap from programming instruction to dealing with specialized medical data, making it possible to teach a relevant programming course in a biomedical environment. I would have loved to have a copy of this when I was teaching introductory programming for medical informatics.

—Professor James H. Harrison, Jr., Director of Clinical Informatics, University of Virginia

… presents students and professionals in the healthcare field (who have some working knowledge of the open-source programming languages Perl, Python, or Ruby) with instruction for applying basic informatics algorithms to medical data sets. He [the author] provides algorithm scripts for each of the languages, along with step-by-step explanations of the algorithms used for retrieving, organizing, merging, and analyzing such data sources as the National Cancer Institute’s Surveillance Epidemiology and End Results project, the National Library of Medicine’s PubMed service, the mortality records of the US Centers for Disease Control and Prevention, the US Census, and the Online Mendelian Inheritance in Man data set on inherited conditions.

SciTech Book News, February 2011

Table of Contents


Chapter 1 Parsing and Transforming Text Files

Peeking into Large Files

Paging through Large Text Files

Extracting Lines that Match a Regular Expression

Changing Every File in a Subdirectory

Counting the Words in a File

Making a Word List with Occurrence Tally

Using Printf Formatting Style

Chapter 2 Utility Scripts

Random Numbers

Converting Non-ASCII to Base64 ASCII

Creating a Universally Unique Identifier

Splitting Text into Sentences

One-Way Hash on a Name

One-Way Hash on a File

A Prime Number Generator

Chapter 3 Viewing and Modifying Images

Viewing a JPEG Image

Converting between Image Formats

Batch Conversions

Drawing a Graph from List Data

Drawing an Image Mashup

Chapter 4 Indexing Text

ZIPF Distribution of a Text File

Preparing a Concordance

Extracting Phrases

Preparing an Index

Comparing Texts Using Similarity Scores


Chapter 5 The National Library of Medicine’s Medical Subject Headings (MeSH)

Determining the Hierarchical Lineage for MeSH Terms

Creating a MeSH Database

Reading the MeSH Database

Creating an SQLite Database for MeSH

Reading the SQLite MeSH Database

Chapter 6 The International Classification of Diseases

Creating the ICD Dictionary

Building the ICD-O (Oncology) Dictionary

Chapter 7 SEER: The Cancer Surveillance, Epidemiology, and End Results Program

Parsing the SEER Data Files

Finding the Occurrences of All Cancers in the SEER Data Files

Finding the Age Distributions of the Cancers in the SEER Data Files

Chapter 8 OMIM: The Online Mendelian Inheritance in Man

Collecting the OMIM Entry Terms

Finding Inherited Cancer Conditions

Chapter 9 PubMed

Building a Large Text Corpus of Biomedical Information

Creating a List of Doublets from a PubMed Corpus

Downloading Gene Synonyms from PubMed

Downloading Protein Synonyms from PubMed

Chapter 10 Taxonomy

Finding a Taxonomic Hierarchy

Finding the Restricted Classes of Human Infectious Pathogens

Chapter 11 Developmental Lineage Classification and Taxonomy of Neoplasms

Building the Doublet Hash

Scanning the Literature for Candidate Terms

Adding Terms to the Neoplasm Classification

Determining the Lineage of Every Neoplasm Concept

Chapter 12 U.S. Census Files

Total Population of the United States

Stratified Distribution for the U.S. Census

Adjusting for Age

Chapter 13 Centers for Disease Control and Prevention Mortality Files

Death Certificate Data

Obtaining the CDC Data Files

How Death Certificates Are Represented in Data Records

Ranking, by Number of Occurrences, Every Condition in the CDC Mortality Files


Chapter 14 Autocoding

A Neoplasm Autocoder


Chapter 15 Text Scrubber for Deidentifying Confidential Text

Chapter 16 Web Pages and CGI Scripts

Grabbing Web Pages

CGI Script for Searching the Neoplasm Classification

Chapter 17 Image Annotation

Inserting a Header Comment

Extracting the Header Comment in a JPEG Image File

Inserting IPTC Annotations

Extracting Comment, EXIF, and IPTC Annotations

Dealing with DICOM

Finding DICOM Images

DICOM-to-JPEG Conversion

Chapter 18 Describing Data with Data, Using XML

Parsing XML

Resource Description Framework (RDF)

Dublin Core Metadata

Insert an RDF Document into an Image File

Insert an Image File into an RDF Document

RDF Schema

Visualizing an RDF Schema with GraphViz

Obtaining GraphViz

Converting a Data Structure to GraphViz


Chapter 19 Case Study: Emphysema Rates

Chapter 20 Case Study: Cancer Occurrence Rates

Chapter 21 Case Study: Germ Cell Tumor Rates across Ethnicities

Chapter 22 Case Study: Ranking the Death-Certifying Process, by State

Chapter 23 Case Study: Data Mashups for Epidemics

Tally of Coccidioidomycosis Cases by State

Creating the Map Mashup

Chapter 24 Case Study: Sickle Cell Rates

Chapter 25 Case Study: Site-Specific Tumor Biology

Anatomic Origins of Mesotheliomas

Mesothelioma Records in the SEER Data Sets

Graphic Representation

Chapter 26 Case Study: Bimodal Tumors

Chapter 27 Case Study: The Age of Occurrence of Precancers

Epilogue for Healthcare Professionals and Medical Scientists

Learn One or More Open Source Programming Languages

Don’t Agonize Over Which Language You Should Choose

Learn Algorithms

Unless You Are a Professional Programmer, Relax and Enjoy Being a Newbie

Do Not Delegate Simple Programming Tasks to Others

Break Complex Tasks into Simple Methods and Algorithms

Write Fast Scripts

Concentrate on the Questions, Not the Answers


How to Acquire Ruby

How to Acquire Perl

How to Acquire Python

How to Acquire RMagick

How to Acquire SQLite

How to Acquire the Public Data Files Used in This Book

Other Publicly Available Files, Data Sets, and Utilities

About the Author

Jules Berman, Ph.D., M.D., received two bachelor of science degrees (mathematics and earth sciences) from MIT, a Ph.D. in pathology from Temple University, and an M.D. from the University of Miami School of Medicine. His postdoctoral research was conducted at the National Cancer Institute. His medical residency in pathology was completed at the George Washington University School of Medicine. He became board certified in anatomic pathology and in cytopathology, and served as the chief of Anatomic Pathology, Surgical Pathology and Cytopathology at the Veterans Administration (VA) Medical Center in Baltimore, Maryland.

While at the Baltimore VA, Dr. Berman held appointments at the University of Maryland Medical Center and at the Johns Hopkins Medical Institutions. In 1998, he became the program director for pathology informatics in the Cancer Diagnosis Program at the U.S. National Cancer Institute. In 2006, he became president of the Association for Pathology Informatics. Over the course of his career, he has written, as first author, more than 100 publications, including five books in the field of medical informatics. Today, Dr. Berman is a full-time freelance writer.

About the Series

Chapman & Hall/CRC Mathematical and Computational Biology


Subject Categories

BISAC Subject Codes/Headings:
MATHEMATICS / Probability & Statistics / General
SCIENCE / Life Sciences / Biology / General
SCIENCE / Biotechnology