Data Science and Analytics with Python

1st Edition

By Jesus Rogel-Salazar

Chapman and Hall/CRC

400 pages | 25 B/W Illus.

Purchasing Options: $ = USD
Paperback: 9781498742092
pub: 2017-08-16
$61.95
Hardback: 9781138043176
pub: 2017-12-26
$125.00
eBook (VitalSource): 9781315151670
pub: 2018-02-05
from $29.98


Description

Data Science and Analytics with Python is designed for practitioners in data science and data analytics in both academic and business environments. The aim is to present the reader with the main concepts used in data science using tools developed in Python, such as scikit-learn, pandas, NumPy, and others. The use of Python is of particular interest, given its recent popularity in the data science community. The book can be used by seasoned programmers and newcomers alike.

The book is organized so that individual chapters are sufficiently independent of each other for the reader to use them comfortably as a reference. The book discusses what data science and analytics are, from the point of view of the process and results obtained. Important features of Python are also covered, including a Python primer. The basic elements of machine learning, pattern recognition, and artificial intelligence that underpin the algorithms and implementations used in the rest of the book also appear in the first part of the book.

Regression analysis using Python, clustering techniques, and classification algorithms are covered in the second part of the book. Hierarchical clustering, decision trees, and ensemble techniques are also explored, along with dimensionality reduction techniques and recommendation systems. The support vector machine algorithm and the kernel trick are discussed in the last part of the book.

About the Author

Dr. Jesús Rogel-Salazar is a Lead Data Scientist with experience in the field working for companies such as AKQA, IBM Data Science Studio, Dow Jones and others. He is a visiting researcher at the Department of Physics at Imperial College London, UK and a member of the School of Physics, Astronomy and Mathematics at the University of Hertfordshire, UK. He obtained his doctorate in physics at Imperial College London for work on quantum atom optics and ultra-cold matter. He has held a position as senior lecturer in mathematics as well as a consultant in the financial industry since 2006. He is the author of the book Essential Matlab and Octave, also published by CRC Press. His interests include mathematical modelling, data science, and optimization in a wide range of applications including optics, quantum mechanics, data journalism, and finance.

Reviews

For advanced students and professionals in data science and data analytics, this work provides an excellent introduction to the main concepts of data analytics using tools developed in Python. The popularity and open source nature of Python make it an excellent choice for developing analytic models using add-on tools such as SciKit-learn, Numpy, and others. The book does not assume a working knowledge of Python and provides a thorough introductory chapter. The other chapters can be read independently of one another, making the text a valuable resource for readers interested in a specific area of data analytics. The book's design is user-friendly as well; wide margins allow for taking notes while reading. This space also contains summary notes of the material, making it easy to scan for specific concepts. The material covered includes machine learning and pattern recognition, various regression techniques, classification algorithms, decision tree and hierarchical clustering, and dimensionality reduction. Though this text is not recommended for those just getting started with computer programming, it would make an excellent tool for readers who wish to add Python to their programming language repertoire while developing models or analyzing data.

D. B. Mason, Albright College, CHOICE, June 2018

Table of Contents

The Trials and Tribulations of a Data Scientist

Data? Science? Data Science!

The Data Scientist: A Modern Jackalope

Data Science Tools

From Data to Insight: the Data Science Workflow

Python: For Something Completely Different

Why Python? Why not?!

First Slithers with Python

Control Flow

Computation and Data Manipulation

Pandas to the rescue

Plotting and visualising: Matplotlib

The Machine that Goes "Ping": Machine Learning and Pattern Recognition

Recognising Patterns

Artificial Intelligence and Machine Learning

Data is good, but other things are also needed

Learning, Predicting and Classifying

Machine Learning and Data Science

Feature selection

Bias, Variance and Regularisation: A Balancing Act

Some Useful Measures: Distance and Similarity

Beware the Curse of Dimensionality

Scikit-learn is our Friend

Training and Testing

Cross-validation

The Relationship Conundrum: Regression

Relationships between variables: Regression

Multivariate Linear Regression

Ordinary Least Squares

Brain and Body: Regression with one variable

Logarithmic transformation

Making the Task Easier: Standardisation and Scaling

Polynomial Regression

Variance-Bias Trade-Off

Shrinkage: LASSO and Ridge

Jackalopes and Hares: Clustering

Clustering

Clustering with k-means

Summary

Unicorns and Horses: Classification

Classification

Classification with KNN

Classification with Logistic Regression

Classification with Naïve Bayes

Decisions, Decisions: Hierarchical Clustering, Decision Trees and Ensemble Techniques

Hierarchical Clustering

Decision Trees

Ensemble Techniques

Ensemble Techniques in Action

Less is More: Dimensionality Reduction

Dimensionality Reduction

Principal Component Analysis

Singular Value Decomposition

Recommendation Systems

Kernel Tricks under the Sleeve: Support Vector Machines

Support Vector Machines and Kernel Methods

Pipelines in Scikit-learn

About the Author

Dr. Jesús Rogel-Salazar is a Lead Data Scientist at IBM Data Science Studio and visiting researcher at the Department of Physics at Imperial College London, UK. He is also a member of the School of Physics, Astronomy and Mathematics at the University of Hertfordshire, UK. He obtained his doctorate in Physics at Imperial College London for work on quantum atom optics and ultra-cold matter. He has held a position as senior lecturer in mathematics as well as a consultant and data scientist in the financial industry since 2006. He is the author of the book “Essential Matlab and Octave”, also published with CRC Press. His interests include mathematical modelling, data science and optimisation in a wide range of applications including optics, quantum mechanics, data journalism and finance.

About the Series

Chapman & Hall/CRC Data Mining and Knowledge Discovery Series

Subject Categories

BISAC Subject Codes/Headings:
COM021030: COMPUTERS / Database Management / Data Mining
COM051010: COMPUTERS / Programming Languages / General
MAT000000: MATHEMATICS / General