1st Edition

The Data Preparation Journey: Finding Your Way with R

By Martin Hugh Monkman
Copyright 2024
    236 Pages 13 Color & 26 B/W Illustrations
    by Chapman & Hall


    The Data Preparation Journey: Finding Your Way With R introduces the principles of data preparation within a systematic approach that follows a typical data science or statistical workflow. With that context, readers will work through practical solutions to problems in data using R, the statistical and data science programming language. These solutions include examples of complex real-world data, adding greater context and exposing the reader to greater technical challenges. This book focuses on the Import, Tidy, and Transform steps. It also demonstrates how “Visualise” is an important part of Exploratory Data Analysis, a strategy for identifying potential problems with the data prior to cleaning.

    This book is designed for readers with a working knowledge of data manipulation functions in R or other programming languages. It is suitable for academics for whom analyzing data is crucial, businesses that make decisions based on insights gleaned from customer interaction data, and public servants who use data to inform policy and program decisions. The principles and practices described within The Data Preparation Journey apply regardless of the context.

    Key Features:

    • Includes R package containing the code and data sets used in the book
    • Comprehensive examples of data preparation from a variety of disciplines
    • Defines the key principles of data preparation, from access to publication

    1. Introduction

    2. Foundations

    3. Data documentation

    4. Importing data

    5. Importing data: plain-text files

    6. Importing data: Excel

    7. Importing data: statistical software

    8. Importing data: PDF files

    9. Data from web sources

    10. Linking to relational databases

    11. Exploration and validation strategies

    12. Cleaning techniques

    13. Recap


    Martin Monkman is a Senior Manager at MNP, and a Course Instructor at the University of Victoria Continuing Studies’ Business Intelligence and Data Analytics program. Prior to joining MNP, Martin had a long career at BC Stats, the provincial statistics agency in British Columbia, Canada, including a decade with the job title “Provincial Statistician”. Martin has Bachelor of Science and Master of Arts degrees in Geography from the University of Victoria, and he has been a member of the Statistical Society of Canada since 2022.