1st Edition

DevOps for Data Science

By Alex Gold Copyright 2024
    282 Pages 38 Color & 1 B/W Illustrations
    by Chapman & Hall

    Data scientists are experts at analyzing, modelling, and visualizing data but, at one point or another, have all encountered difficulties in collaborating with or delivering their work to the people and systems that matter. Born out of the agile software movement, DevOps is a set of practices, principles, and tools that help software engineers reliably deploy work to production. This book takes the lessons of DevOps and applies them to creating and delivering production-grade data science projects in Python and R.

    This book’s first section explores how to build data science projects that deploy to production with no frills or fuss. Its second section covers the rudiments of administering a server, including Linux, application, and network administration. Its final section demystifies the concerns of enterprise IT/Administration, making it possible for data scientists to communicate and collaborate with their organization’s security, networking, and administration teams.

    Key Features:

    • Start-to-finish labs take readers through creating projects that meet DevOps best practices and creating a server-based environment to work on and deploy them.
    • Provides an appendix of cheatsheets so that readers will never be without the reference they need to remember a Git, Docker, or Command Line command.
    • Distills what a data scientist needs to know about Docker, APIs, CI/CD, Linux, DNS, SSL, HTTP, Auth, and more.
    • Written specifically to address the concern of a data scientist who wants to take their Python or R work to production.

    There are countless books on creating data science work that is correct. This book, on the other hand, aims to go beyond this. It is targeted at data scientists who want their work to be more than merely accurate and who want to deliver work that matters.

    Welcome!

    Introduction

    I DevOps Lessons for Data Science


    Chapter 1 Environments as Code
    Chapter 2 Data Project Architecture
    Chapter 3 Databases and Data APIs
    Chapter 4 Logging and Monitoring
    Chapter 5 Deployments and Code Promotion
    Chapter 6 Demystifying Docker

    II IT/Admin for Data Science

    Chapter 7 The Cloud
    Chapter 8 The Command Line
    Chapter 9 Linux Administration
    Chapter 10 Application Administration
    Chapter 11 Server Resources and Scaling
    Chapter 12 Computer Networks
    Chapter 13 Domains and DNS
    Chapter 14 SSL/TLS and HTTPS

    III Enterprise-Grade Data Science

    Chapter 15 Enterprise Networking
    Chapter 16 Auth in Enterprise
    Chapter 17 Compute at Enterprise Scale
    Chapter 18 Package Management in the Enterprise

    Appendices

    A Technical Detail: Auth Technologies
    B Technical Detail: Load Balancers
    C Lab Map
    D Cheatsheets

    Biography

    Alex leads the Solutions Engineering team at Posit (formerly RStudio). In that role, he has advised hundreds of organizations of all sizes and levels of sophistication on creating production-grade open-source data science environments. Before coming to Posit, he was a data scientist and data science team lead, working in politics, consulting, and healthcare.