A Tour of Data Science: Learn R and Python in Parallel covers the fundamentals of data science, including programming, statistics, optimization, and machine learning, in a single short book. It does not try to cover everything; instead, it teaches the key concepts and topics in data science. It also covers two of the most popular programming languages used in data science, R and Python, in one source.
- Allows you to learn R and Python in parallel
- Covers statistics, programming, optimization, and predictive modelling, as well as the popular data manipulation tools data.table and pandas
- Provides a concise and accessible presentation
- Includes machine learning algorithms implemented from scratch: linear regression, lasso, ridge, logistic regression, gradient boosting trees, etc.
This book will appeal to data scientists, statisticians, quantitative analysts, and others who want to learn programming with R and Python from a data science perspective.
Table of Contents
Assumptions about the reader’s background
Introduction to R/Python Programming
Variable and Type
Some built-in data structures
Revisit of variables
Object-oriented programming (OOP) in R/Python
More on R/Python Programming
Work with R/Python scripts
Debugging in R/Python
Embarrassingly parallelism in R/Python
Speed up with C/C++ in R/Python
A first impression of functional programming
Miscellaneous
data.table and pandas
Get started with data.table and pandas
Indexing & selecting data
Random Variables, Distributions & Linear Regression
A refresher on distributions
Inversion sampling & rejection sampling
Joint distribution & copula
Fit a distribution
Basics of linear regression
Optimization in Practice
General purpose minimization tools in R/Python
Machine Learning - A gentle introduction
Gradient boosting machine
Nailong Zhang is lead Data Scientist at Mass Mutual Life Insurance Company.