1st Edition

Research Software Engineering with Python: Building software that makes research possible

    528 Pages 55 Color Illustrations
    by Chapman & Hall

    Writing and running software is now as much a part of science as telescopes and test tubes, but most researchers are never taught how to do either well. As a result, it takes them longer to accomplish simple tasks than it should, and it is harder for them to share their work with others than it needs to be.

    This book introduces the concepts, tools, and skills that researchers need to get more done in less time and with less pain. Based on the practical experiences of its authors, who collectively have spent several decades teaching software skills to scientists, it covers everything graduate-level researchers need to automate their workflows, collaborate with colleagues, ensure that their results are trustworthy, and publish what they have built so that others can build on it. The book assumes only a basic knowledge of Python as a starting point, and shows readers how it, the Unix shell, Git, Make, and related tools can give them more time to focus on the research they actually want to do.

    Research Software Engineering with Python can be used as the main text in a one-semester course or for self-guided study. A running example shows how to organize a small research project step by step; over a hundred exercises give readers a chance to practice these skills themselves, while a glossary defining over two hundred terms will help readers find their way through the terminology. All of the material can be re-used under a Creative Commons license, and all royalties from sales of the book will be donated to The Carpentries, an organization that teaches foundational coding and data science skills to researchers worldwide.

    0.1 The Big Picture
    0.2 Intended Audience 
    0.3 What You Will Learn
    0.4 Using this Book
    0.5 Contributing and Re-Use 
    0.6 Acknowledgments 

    Getting Started 
    1.1 Project Structure 
    1.2 Downloading the Data 
    1.3 Installing the Software 
    1.4 Summary 
    1.5 Exercises 
    1.6 Key Points 

    The Basics of the Unix Shell 
    2.1 Exploring Files and Directories 
    2.2 Moving Around 
    2.3 Creating New Files and Directories 
    2.4 Moving Files and Directories 
    2.5 Copying Files and Directories 
    2.6 Deleting Files and Directories 
    2.7 Wildcards 
    2.8 Reading the Manual 
    2.9 Summary 
    2.10 Exercises 
    2.11 Key Points 

    Building Tools with the Unix Shell 
    3.1 Combining Commands 
    3.2 How Pipes Work 
    3.3 Repeating Commands on Many Files 
    3.4 Variable Names 
    3.5 Redoing Things 
    3.6 Creating New Filenames Automatically 
    3.7 Summary
    3.8 Exercises 
    3.9 Key Points 

    Going Further with the Unix Shell
    4.1 Creating New Commands 
    4.2 Making Scripts More Versatile 
    4.3 Turning Interactive Work into a Script 
    4.4 Finding Things in Files 
    4.5 Finding Files 
    4.6 Configuring the Shell 
    4.7 Summary 
    4.8 Exercises
    4.9 Key Points 

    Building Command-Line Tools with Python
    5.1 Programs and Modules
    5.2 Handling Command-Line Options 
    5.3 Documentation 
    5.4 Counting Words 
    5.5 Pipelining 
    5.6 Positional and Optional Arguments 
    5.7 Collating Results 
    5.8 Writing Our Own Modules 
    5.9 Plotting 
    5.10 Summary 
    5.11 Exercises 
    5.12 Key Points

    Using Git at the Command Line 
    6.1 Setting Up 
    6.2 Creating a New Repository 
    6.3 Adding Existing Work 
    6.4 Describing Commits 
    6.5 Saving and Tracking Changes 
    6.6 Synchronizing with Other Repositories 
    6.7 Exploring History 
    6.8 Restoring Old Versions of Files 
    6.9 Ignoring Files
    6.10 Summary
    6.11 Exercises 
    6.12 Key Points 

    Going Further with Git 
    7.1 What’s a Branch? 
    7.2 Creating a Branch 
    7.3 What Curve Should We Fit?
    7.4 Verifying Zipf’s Law 
    7.5 Merging 
    7.6 Handling Conflicts 
    7.7 A Branch-Based Workflow
    7.8 Using Other People’s Work 
    7.9 Pull Requests 
    7.10 Handling Conflicts in Pull Requests 
    7.11 Summary 
    7.12 Exercises 
    7.13 Key Points 

    Working in Teams 
    8.1 What is a Project? 
    8.2 Include Everyone 
    8.3 Establish a Code of Conduct
    8.4 Include a License
    8.5 Planning
    8.6 Bug Reports 
    8.7 Labeling Issues 
    8.8 Prioritizing 
    8.9 Meetings 
    8.10 Making Decisions 
    8.11 Make All This Obvious to Newcomers 
    8.12 Handling Conflict 
    8.13 Summary 
    8.14 Exercises
    8.15 Key Points

    Automating Analyses with Make
    9.1 Updating a Single File 
    9.2 Managing Multiple Files 
    9.3 Updating Files When Programs Change 
    9.4 Reducing Repetition in a Makefile 
    9.5 Automatic Variables 
    9.6 Generic Rules 
    9.7 Defining Sets of Files 
    9.8 Documenting a Makefile 
    9.9 Automating Entire Analyses
    9.10 Summary 
    9.11 Exercises 
    9.12 Key Points

    Configuring Programs
    10.1 Configuration File Formats 
    10.2 Matplotlib Configuration 
    10.3 The Global Configuration File 
    10.4 The User Configuration File 
    10.5 Adding Command-Line Options 
    10.6 A Job Control File 
    10.7 Summary 
    10.8 Exercises
    10.9 Key Points 

    Testing Software
    11.1 Assertions 
    11.2 Unit Testing 
    11.3 Testing Frameworks 
    11.4 Testing Floating-Point Values 
    11.5 Integration Testing 
    11.6 Regression Testing 
    11.7 Test Coverage 
    11.8 Continuous Integration
    11.9 When to Write Tests 
    11.10 Summary 
    11.11 Exercises
    11.12 Key Points 

    Handling Errors 
    12.1 Exceptions 
    12.2 Writing Useful Error Messages
    12.3 Testing Error Handling 
    12.4 Reporting Errors 
    12.5 Summary 
    12.6 Exercises 
    12.7 Key Points 

    Tracking Provenance 
    13.1 Data Provenance 
    13.2 Code Provenance 
    13.3 Summary 
    13.4 Exercises 
    13.5 Key Points 

    Creating Packages with Python 
    14.1 Creating a Python Package 
    14.2 Virtual Environments 
    14.3 Installing a Development Package 
    14.4 What Installation Does
    14.5 Distributing Packages 
    14.6 Documenting Packages 
    14.7 Software Journals 
    14.8 Summary 
    14.9 Exercises 
    14.10 Key Points 

    15.1 Why We Wrote This Book 

    A Solutions 
    B Learning Objectives 
    C Key Points 
    D Project Tree 
    E Working Remotely 
    F Writing Readable Code 
    G Documenting Programs 
    H YAML
    I Anaconda 
    J Glossary 
    K References 


    Dr. Damien B. Irving is a post-doctoral researcher in climate science at the University of New South Wales who lives in Hobart, Tasmania. With a strong interest in data science education and open/reproducible research, Damien is involved in The Carpentries community as an instructor, lesson author, and Regional Coordinator for Australia, is an Associate Editor with the Journal of Open Research Software, and is currently the Global Coordinator for the Research Bazaar, a worldwide festival promoting the digital literacy emerging at the center of modern research.

    Dr. Kate L. Hertweck is a scientist and educator who endeavors to uphold core values like diversity/equity/inclusion, accessibility of information, and learning over knowing. They currently lead training and community efforts to support biomedical researchers at Fred Hutchinson Cancer Research Center in Seattle, Washington. Kate is an instructor and trainer for The Carpentries and has also participated in that group's lesson development/maintenance and community governance.

    Dr. Luke Johnston is a diabetes epidemiologist working at the Steno Diabetes Center Aarhus in Denmark. He is passionate about educating researchers on modern computing tools and skills, having taught many Carpentry workshops as well as creating and instructing several intensive courses teaching computing skills and analytic reproducibility to diabetes researchers. When he isn't teaching or doing research, he is building software tools to automate common research workflows and tasks.

    Dr. Joel Ostblom is a post-doctoral teaching fellow in the Master of Data Science program at the University of British Columbia in Vancouver, B.C. He has co-created or led the development of several courses and workshops at the University of Toronto and the University of British Columbia. Joel cares deeply about spreading data literacy and excitement over programmatic data analysis, which is reflected in his contributions to open source projects and data science learning resources.

    Dr. Charlotte Wickham is a data scientist and educator who teaches in the Statistics Department at Oregon State University and operates her own consulting and training business. She loves to help people build their data superpowers in the R programming language. She currently lives in Corvallis, Oregon, but originally hails from New Zealand.

    Dr. Greg Wilson is a programmer and educator based in Toronto, Ontario, and was the co-founder and first Executive Director of Software Carpentry. A member of the Python Software Foundation, Greg has written or edited over a dozen books and received ACM SIGSOFT's Influential Educator Award in 2020.