This book aims to increase the visibility of data science as practiced in the real world, which differs from what you learn from a typical textbook. Many aspects of day-to-day data science work are almost absent from conventional statistics, machine learning, and data science curricula. Yet these activities account for a considerable share of the time and effort of data professionals in industry. Based on industry experience, this book outlines real-world scenarios and discusses pitfalls that data science practitioners should avoid. It also covers the big data cloud platform and the art of data science, such as soft skills. The authors use R as the primary tool and provide code for both R and Python.
This book is for readers who want to explore possible career paths and eventually become data scientists. It comprehensively introduces various data science fields, the soft and programming skills used in data science projects, and potential career paths. Traditional data-related practitioners such as statisticians, business analysts, and data analysts will find this book helpful in expanding their skills for future data science careers. Undergraduate and graduate students from analytics-related areas will find this book beneficial for learning real-world data science applications. Non-mathematical readers will appreciate the reproducibility of the companion R and Python code.
• It covers both technical and soft skills.
• It has a chapter dedicated to the big data cloud environment. For industry applications, the practice of data science is often in such an environment.
• It is hands-on. We provide the data and repeatable R and Python code in notebooks. Readers can repeat the analysis in the book using the data and code provided. We also suggest that readers modify the notebooks to perform analyses with their own data and problems, if possible. The best way to learn data science is to do it!
1. Introduction
2. Soft Skills for Data Scientists
3. Introduction to The Data
4. Big Data Cloud Platform
5. Data Pre-processing
6. Data Wrangling
7. Model Tuning Strategy
8. Measuring Performance
9. Regression Models
10. Regularization Methods
11. Tree-Based Methods
12. Deep Learning
Appendix A. Handling Large Local Data
Appendix B. R code for data simulation
"If you want to use Data Science to have a practical impact on businesses (either as a current employee or someone looking to build a career here), this book is an amazing way to get started. "Data Science Practitioner's Guide to Data Science" offers a refreshing perspective. It emphasizes practical skills and real-world problem-solving over theoretical knowledge. This guide covers everything from technical and soft skills, including project management and communication. If you want to elevate your skills and make a meaningful impact, I highly recommend this book."
- Mike Clarke, Director of Product Management, Shopify
"As a data scientist with nearly two decades of experience, I highly recommend this book. Amidst the myriad publications in the constantly evolving field of data science, "Practitioner's Guide to Data Science" distinguishes itself as an indispensable resource for both newcomers and seasoned professionals. The authors adeptly merge the technical aspects of data science with practical guidance on career development and soft skills, resulting in a well-rounded approach to the subject. The book is precise, meticulously organized, and easy to follow. The book encompasses a wide range of topics, from linear regression and deep learning to data imputation and cloud environments. It also thoroughly explores the data science project cycle, including common pitfalls to avoid, ensuring readers are well-prepared to confidently tackle real-world projects. Additionally, the book delves into the data science job family, providing valuable insights into various roles and career trajectories. With its comprehensive approach and emphasis on practical applications, "Practitioner's Guide to Data Science" serves as a very useful guide for anyone aiming to excel in this dynamic field, whether they are learning new concepts or refreshing their knowledge."
- Tianran Li, Director of Data Science, Coupang
"As a 20+ year practitioner with experience building high-performing data science teams, I strongly recommend this book to anyone aspiring to start or grow their career in data science. The readers have practical access to R and Python notebooks to explore independently. At the same time, they can review the data science project cycle and familiarize themselves with common pitfalls. For example, great code alone will not make a successful data scientist, but understanding how to manage the entire project to ensure adoption and business value creation is a differentiating factor. The most common question I get from my mentees is about making choices and tradeoffs as they start and build their careers. In this book, the authors have done a great job discussing the different roles within data science and organizational structures that can help candidates select roles that align best with their strengths and facilitate their career aspirations."
- Elpida Ormanidou, Analytics and Insights Vice President, PetSmart
"Lin and Li have written an excellent book on data science. As the title implies, it is designed for practitioners, and combines very practical guidance on applications with sample R and Python code, as well as providing theoretical underpinnings of a wide variety of data science methods. Both authors combine solid academic credentials with practical experience in leading data science organizations, such as Google and Amazon. I found Chapters 1 and 2 to be particularly unique for data science books. While most such texts provide some degree of introduction to the topic, in Chapter 1 Lin and Li provide much more depth, for example by discussing the different types of data science roles available in business and industry. Chapter 2, on soft skills needed by data scientists, provides some of the most important information that future data scientists will need, in my opinion. For example, it discusses common mistakes that are made in data science projects, such as poor problem formulation and the use of the wrong data to develop models. While most people tend to think of data quality as a 'data are right' problem, the 'right data' question is just as important, but often overlooked. I strongly recommend this book for those planning careers in data science."
- Roger Hoerl, Associate Professor of Statistics, Union College
"Practitioner's Guide to Data Science" is a comprehensive resource that bridges the gap between theory and practice in data science. Drawing from their extensive industry experience, authors Hui Lin and Ming Li provide invaluable insights into real-world applications, career development, and the importance of soft skills. With hands-on exercises and practical scenarios, this book is an essential read for anyone looking to navigate and excel in the dynamic field of data science."
- Todd Pearson, North America Commercial Data Science and Engineering Lead, Corteva Agriscience