1st Edition

A Computational Approach to Statistical Learning

    374 Pages
    by Chapman & Hall

    376 Pages
    by Chapman & Hall

    A Computational Approach to Statistical Learning gives a novel introduction to predictive modeling by focusing on the algorithmic and numeric motivations behind popular statistical methods. The text contains annotated code for over 80 original reference functions. These functions provide minimal working implementations of common statistical learning algorithms. Every chapter concludes with a fully worked-out application that illustrates predictive modeling tasks using a real-world dataset.





    The text begins with a detailed analysis of linear models and ordinary least squares. Subsequent chapters explore extensions such as ridge regression, generalized linear models, and additive models. The second half focuses on the use of general-purpose algorithms for convex optimization and their application to tasks in statistical learning. Models covered include the elastic net, dense neural networks, convolutional neural networks (CNNs), and spectral clustering. A unifying theme throughout the text is the use of optimization theory in the description of predictive models, with a particular focus on the singular value decomposition (SVD). Through this theme, the computational approach motivates and clarifies the relationships between various predictive models.
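
    As a rough illustration of how the SVD can link two of the models named above, the sketch below fits a single-predictor regression with and without a ridge penalty. It is not taken from the book, whose reference functions are written in R; the function name single_predictor_fit, the penalty value, and the toy data are all hypothetical. With one predictor column x, the SVD is trivial (singular value d = ||x||, left singular vector u = x / d), and the ridge penalty lam simply replaces the factor 1/d used by ordinary least squares with d / (d^2 + lam).

```python
import math


def single_predictor_fit(x, y, lam=0.0):
    """Ridge coefficient for one predictor, written in SVD terms.

    Hypothetical helper for illustration only; lam=0 gives ordinary
    least squares.
    """
    # For a single column x, the SVD is x = u * d * 1,
    # with d = ||x|| and u = x / d.
    d = math.sqrt(sum(xi * xi for xi in x))
    u = [xi / d for xi in x]
    # Project the response onto the left singular vector.
    uty = sum(ui * yi for ui, yi in zip(u, y))
    # The penalty only rescales the singular value: d / (d^2 + lam).
    return d / (d * d + lam) * uty


# Made-up toy data, not from the book.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

beta_ols = single_predictor_fit(x, y)             # lam = 0: least squares
beta_ridge = single_predictor_fit(x, y, lam=5.0)  # shrunk toward zero
print(beta_ols, beta_ridge)
```

    In this one-column case the ridge estimate is the least-squares estimate multiplied by d^2 / (d^2 + lam), so it always lies between zero and the unpenalized coefficient.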





    Taylor Arnold is an assistant professor of statistics at the University of Richmond. His work at the intersection of computer vision, natural language processing, and digital humanities has been supported by multiple grants from the National Endowment for the Humanities (NEH) and the American Council of Learned Societies (ACLS). His first book, Humanities Data in R, was published in 2015.





    Michael Kane is an assistant professor of biostatistics at Yale University. He is the recipient of grants from the National Institutes of Health (NIH), DARPA, and the Bill and Melinda Gates Foundation. His R package bigmemory won the Chambers prize for statistical software in 2010.





    Bryan Lewis is an applied mathematician and author of many popular R packages, including irlba, doRedis, and threejs.



    Matrix Methods. Direct solutions to linear systems. Iterative linear model solutions. Iteratively reweighted least squares. Blockwise techniques. Convex optimization. Quasi-Newton and gradient descent. Interior point method. Proximal algorithms. Coordinate descent. Active sets and path solutions. Other techniques. Expectation maximization. Model featurization. Neighborhood prediction. Spectral learning. Stochastic techniques.


    "As best as I can determine, ‘A Computational Approach to Statistical Learning’ (CASL) is unique among R books devoted to statistical learning and data science. Other popular texts…cover much of the same ground, and include extensive R code implementing statistical models. What makes CASL different is the unifying mathematical structure underlying the presentation and the focus on the computations themselves…CASL’s great strengths are the use of linear algebra to provide a coherent, unifying mathematical framework for explaining a wide class of models, a lucid writing style that appeals to geometric intuition, clear explanations of many details that are mostly glossed over in more superficial treatments, the inclusion of historical references, and R code that is tightly integrated into the text. The R code is extensive, concise without being opaque, and in many cases, elegant. The code illustrates R’s advantages for developing statistical algorithms as well as its power to present versatile and compelling visualizations…CASL ought to appeal to anyone working in data science or machine learning seeking a sophisticated understanding of both the theoretical basis and efficient algorithms underlying a modern approach to computational statistics."
    ~Joe Rickert, RStudio

    "Machine learning books tend to come in three types: those that focus on theory and the underlying mathematics, those that develop well-known algorithms ‘from scratch’ to illustrate how they work, and those that take a ‘hands-on’ approach and apply methods from standard libraries to real data. This book has the perfect balance of all three... The book is very well written and suitable for both self-study and as a course text."
    ~Stanley E. Lazic, Prioris.ai Inc, Canada, Journal of the Royal Statistical Society, Series A (Statistics in Society), July 2021