1st Edition

Research Software Engineering with Python
Building software that makes research possible




  • Available for pre-order. Item will ship after July 13, 2021
ISBN 9780367698324
July 13, 2021 Forthcoming by Chapman and Hall/CRC
532 Pages 55 Color Illustrations

USD $59.95




Book Description

Writing and running software is now as much a part of science as telescopes and test tubes, but most researchers are never taught how to do either well. As a result, it takes them longer to accomplish simple tasks than it should, and it is harder for them to share their work with others than it needs to be.

This book introduces the concepts, tools, and skills that researchers need to get more done in less time and with less pain. Based on the practical experiences of its authors, who collectively have spent several decades teaching software skills to scientists, it covers everything graduate-level researchers need to automate their workflows, collaborate with colleagues, ensure that their results are trustworthy, and publish what they have built so that others can build on it. The book assumes only a basic knowledge of Python as a starting point, and shows readers how Python, the Unix shell, Git, Make, and related tools can give them more time to focus on the research they actually want to do.

Research Software Engineering with Python can be used as the main text in a one-semester course or for self-guided study. A running example shows how to organize a small research project step by step; over a hundred exercises give readers a chance to practice these skills themselves, while a glossary defining over two hundred terms will help readers find their way through the terminology. All of the material can be re-used under a Creative Commons license, and all royalties from sales of the book will be donated to The Carpentries, an organization that teaches foundational coding and data science skills to researchers worldwide.

Table of Contents

Welcome 
0.1 The Big Picture 
0.2 Intended Audience 
0.3 What You Will Learn
0.4 Using this Book
0.5 Contributing and Re-Use 
0.6 Acknowledgments 

Getting Started 
1.1 Project Structure 
1.2 Downloading the Data 
1.3 Installing the Software 
1.4 Summary 
1.5 Exercises 
1.6 Key Points 


The Basics of the Unix Shell 
2.1 Exploring Files and Directories 
2.2 Moving Around 
2.3 Creating New Files and Directories 
2.4 Moving Files and Directories 
2.5 Copying Files and Directories 
2.6 Deleting Files and Directories 
2.7 Wildcards 
2.8 Reading the Manual 
2.9 Summary 
2.10 Exercises 
2.11 Key Points 


Building Tools with the Unix Shell 
3.1 Combining Commands 
3.2 How Pipes Work 
3.3 Repeating Commands on Many Files 
3.4 Variable Names 
3.5 Redoing Things 
3.6 Creating New Filenames Automatically 
3.7 Summary
3.8 Exercises 
3.9 Key Points 

Going Further with the Unix Shell
4.1 Creating New Commands 
4.2 Making Scripts More Versatile 
4.3 Turning Interactive Work into a Script 
4.4 Finding Things in Files 
4.5 Finding Files 
4.6 Configuring the Shell 
4.7 Summary 
4.8 Exercises 
4.9 Key Points 


Building Command-Line Tools with Python
5.1 Programs and Modules
5.2 Handling Command-Line Options 
5.3 Documentation 
5.4 Counting Words 
5.5 Pipelining 
5.6 Positional and Optional Arguments 
5.7 Collating Results 
5.8 Writing Our Own Modules 
5.9 Plotting 
5.10 Summary 
5.11 Exercises 
5.12 Key Points

Using Git at the Command Line 
6.1 Setting Up 
6.2 Creating a New Repository 
6.3 Adding Existing Work 
6.4 Describing Commits 
6.5 Saving and Tracking Changes 
6.6 Synchronizing with Other Repositories 
6.7 Exploring History 
6.8 Restoring Old Versions of Files 
6.9 Ignoring Files
6.10 Summary
6.11 Exercises 
6.12 Key Points 

Going Further with Git 
7.1 What’s a Branch? 
7.2 Creating a Branch 
7.3 What Curve Should We Fit?
7.4 Verifying Zipf’s Law 
7.5 Merging 
7.6 Handling Conflicts 
7.7 A Branch-Based Workflow
7.8 Using Other People’s Work 
7.9 Pull Requests 
7.10 Handling Conflicts in Pull Requests 
7.11 Summary 
7.12 Exercises 
7.13 Key Points 

Working in Teams 
8.1 What is a Project? 
8.2 Include Everyone 
8.3 Establish a Code of Conduct
8.4 Include a License
8.5 Planning
8.6 Bug Reports 
8.7 Labeling Issues 
8.8 Prioritizing 
8.9 Meetings 
8.10 Making Decisions 
8.11 Make All This Obvious to Newcomers 
8.12 Handling Conflict 
8.13 Summary 
8.14 Exercises
8.15 Key Points

Automating Analyses with Make
9.1 Updating a Single File 
9.2 Managing Multiple Files 
9.3 Updating Files When Programs Change 
9.4 Reducing Repetition in a Makefile 
9.5 Automatic Variables 
9.6 Generic Rules 
9.7 Defining Sets of Files 
9.8 Documenting a Makefile 
9.9 Automating Entire Analyses
9.10 Summary 
9.11 Exercises 
9.12 Key Points

Configuring Programs
10.1 Configuration File Formats 
10.2 Matplotlib Configuration 
10.3 The Global Configuration File 
10.4 The User Configuration File 
10.5 Adding Command-Line Options 
10.6 A Job Control File 
10.7 Summary 
10.8 Exercises
10.9 Key Points 

Testing Software
11.1 Assertions 
11.2 Unit Testing 
11.3 Testing Frameworks 
11.4 Testing Floating-Point Values 
11.5 Integration Testing 
11.6 Regression Testing 
11.7 Test Coverage 
11.8 Continuous Integration
11.9 When to Write Tests 
11.10 Summary 
11.11 Exercises
11.12 Key Points 

Handling Errors 
12.1 Exceptions 
12.2 Writing Useful Error Messages
12.3 Testing Error Handling 
12.4 Reporting Errors 
12.5 Summary 
12.6 Exercises 
12.7 Key Points 

Tracking Provenance 
13.1 Data Provenance 
13.2 Code Provenance 
13.3 Summary 
13.4 Exercises 
13.5 Key Points 

Creating Packages with Python 
14.1 Creating a Python Package 
14.2 Virtual Environments 
14.3 Installing a Development Package 
14.4 What Installation Does
14.5 Distributing Packages 
14.6 Documenting Packages 
14.7 Software Journals 
14.8 Summary 
14.9 Exercises 
14.10 Key Points 

Finale 
15.1 Why We Wrote This Book 

Appendix 
A Solutions 
B Learning Objectives 
C Key Points 
D Project Tree 
E Working Remotely 
F Writing Readable Code 
G Documenting Programs 
H YAML
I Anaconda 
J Glossary 
K References 
Index 


Author(s)

Biography

Dr. Damien B. Irving is a post-doctoral researcher in climate science at the University of New South Wales and lives in Hobart, Tasmania. With a strong interest in data science education and open/reproducible research, Damien is involved in The Carpentries community as an instructor, lesson author, and Regional Coordinator for Australia; is an Associate Editor with the Journal of Open Research Software; and is currently the Global Coordinator for the Research Bazaar, a worldwide festival promoting the digital literacy emerging at the center of modern research.

Dr. Kate L. Hertweck is a scientist and educator who endeavors to uphold core values like diversity/equity/inclusion, accessibility of information, and learning over knowing. They currently lead training and community efforts to support biomedical researchers at Fred Hutchinson Cancer Research Center in Seattle, Washington. Kate is an instructor and trainer for The Carpentries and has also participated in that group's lesson development/maintenance and community governance.

Dr. Luke Johnston is a diabetes epidemiologist working at the Steno Diabetes Center Aarhus in Denmark. He is passionate about educating researchers on modern computing tools and skills, having taught many Carpentry workshops as well as creating and instructing several intensive courses teaching computing skills and analytic reproducibility to diabetes researchers. When he isn't teaching or doing research, he is building software tools to automate common research workflows and tasks.

Dr. Joel Ostblom is a post-doctoral teaching fellow in the Master's of Data Science program at the University of British Columbia in Vancouver, B.C. He has co-created or led the development of several courses and workshops at the University of Toronto and the University of British Columbia. Joel cares deeply about spreading data literacy and excitement over programmatic data analysis, which is reflected in his contributions to open source projects and data science learning resources.

Dr. Charlotte Wickham is a data scientist and educator who teaches in the Statistics Department at Oregon State University and runs her own consulting and training business. She loves to help people build their data superpowers in the R programming language. She currently lives in Corvallis, Oregon, but originally hails from New Zealand.

Dr. Greg Wilson is a programmer and educator based in Toronto, Ontario. He was the co-founder and first Executive Director of Software Carpentry, and currently leads the instructor training program at RStudio PBC. A member of the Python Software Foundation, Greg has written or edited over a dozen books and received ACM SIGSOFT's Influential Educator Award in 2020.