Data scientists are experts at analyzing, modeling, and visualizing data, but at one point or another they have all struggled to collaborate with, or deliver their work to, the people and systems that matter. Born out of the agile software movement, DevOps is a set of practices, principles, and tools that help software engineers reliably deploy work to production. This book takes the lessons of DevOps and applies them to creating and delivering production-grade data science projects in Python and R.
This book’s first section explores how to build data science projects that deploy to production with no frills or fuss. Its second section covers the rudiments of administering a server, including Linux, application, and network administration. The final section demystifies the concerns of enterprise IT/Administration, making it possible for data scientists to communicate and collaborate with their organization’s security, networking, and administration teams.
Key Features:
• Start-to-finish labs take readers through creating projects that follow DevOps best practices and building a server-based environment to work on and deploy them.
• Provides an appendix of cheatsheets so that readers will never be without the reference they need to remember a Git, Docker, or Command Line command.
• Distills what a data scientist needs to know about Docker, APIs, CI/CD, Linux, DNS, SSL, HTTP, Auth, and more.
• Written specifically to address the concerns of a data scientist who wants to take their Python or R work to production.
There are countless books on creating data science work that is correct. This book, on the other hand, goes further: it is aimed at data scientists who want their work to be more than merely accurate and who want to deliver work that matters.
Welcome!
Introduction
I DevOps Lessons for Data Science
Chapter 1 Environments as Code
Chapter 2 Data Project Architecture
Chapter 3 Databases and Data APIs
Chapter 4 Logging and Monitoring
Chapter 5 Deployments and Code Promotion
Chapter 6 Demystifying Docker
II IT/Admin for Data Science
Chapter 7 The Cloud
Chapter 8 The Command Line
Chapter 9 Linux Administration
Chapter 10 Application Administration
Chapter 11 Server Resources and Scaling
Chapter 12 Computer Networks
Chapter 13 Domains and DNS
Chapter 14 SSL/TLS and HTTPS
III Enterprise-Grade Data Science
Chapter 15 Enterprise Networking
Chapter 16 Auth in Enterprise
Chapter 17 Compute at Enterprise Scale
Chapter 18 Package Management in the Enterprise
Appendices
A Technical Detail: Auth Technologies
B Technical Detail: Load Balancers
C Lab Map
D Cheatsheets
Biography
Alex leads the Solutions Engineering team at Posit (formerly RStudio). In that role, he has advised hundreds of organizations of all sizes and levels of sophistication on creating production-grade open-source data science environments. Before coming to Posit, he was a data scientist and data science team lead and worked in politics, consulting, and healthcare.