1st Edition

A Tour of Data Science: Learn R and Python in Parallel

By Nailong Zhang Copyright 2021
    216 Pages 25 B/W Illustrations
    by Chapman & Hall

    A Tour of Data Science: Learn R and Python in Parallel covers the fundamentals of data science, including programming, statistics, optimization, and machine learning, in a single short book. It does not try to cover everything; instead, it teaches the key concepts and topics in data science. It also covers two of the most popular programming languages used in data science, R and Python, in one source.

    Key features:

    • Allows you to learn R and Python in parallel
    • Covers statistics, programming, optimization, and predictive modelling, as well as the popular data manipulation tools data.table and pandas
    • Provides a concise and accessible presentation
    • Includes machine learning algorithms implemented from scratch, including linear regression, lasso, ridge, logistic regression, and gradient boosting trees

    The book will appeal to data scientists, statisticians, quantitative analysts, and others who want to learn programming with R and Python from a data science perspective.

    Assumptions about the reader’s background
    Book overview 

    Introduction to R/Python Programming 

    Variable and Type
    Control flows
    Some built-in data structures 
    Revisit of variables 
    Object-oriented programming (OOP) in R/Python 

    More on R/Python Programming 
    Work with R/Python scripts 
    Debugging in R/Python 
    Embarrassingly parallelism in R/Python 
    Evaluation strategy
    Speed up with C/C++ in R/Python
    A first impression of functional programming
    Miscellaneous

    data.table and pandas
    Get started with data.table and pandas 
    Indexing & selecting data 
    Group by 

    Random Variables, Distributions & Linear Regression 
    A refresher on distributions 
    Inversion sampling & rejection sampling 
    Joint distribution & copula 
    Fit a distribution 
    Confidence interval
    Hypothesis testing 
    Basics of linear regression 
    Ridge regression 

    Optimization in Practice
    Gradient descent 
    General purpose minimization tools in R/Python 
    Linear programming 

    Machine Learning - A gentle introduction 
    Supervised learning 
    Gradient boosting machine 
    Unsupervised learning 
    Reinforcement learning 
    Deep Q-Networks 
    Computational differentiation 


    Nailong Zhang is lead Data Scientist at Mass Mutual Life Insurance Company.