1st Edition

Linear Algebra With Machine Learning and Data

By Crista Arangala Copyright 2023
    310 Pages 130 B/W Illustrations
    by Chapman & Hall

    This book takes a deep dive into several key linear algebra subjects as they apply to data analytics and data mining. It takes a case study approach in which each case is grounded in a real-world application.

    This text is meant to be used for a second course in applications of Linear Algebra to Data Analytics, with a supplemental chapter on Decision Trees and their applications in regression analysis. The material falls into two distinct but overlapping data analytics categories: clustering and interpolation.

    Knowledge of mathematical techniques related to data analytics and exposure to interpretation of results within a data analytics context are particularly valuable for students studying undergraduate mathematics. Each chapter of this text takes the reader through several relevant case studies using real-world data.

    All data sets, as well as Python and R syntax, are provided to the reader through links to GitHub documentation. Following each chapter is a short exercise set in which students are encouraged to use technology to apply their expanding knowledge of linear algebra to data analytics.

    A basic knowledge of the concepts in a first Linear Algebra course is assumed; however, an overview of key concepts is presented in the Introduction and as needed throughout the text.



    1. Graph Theory

    1.1 Basic Terminology

    1.2 The Power of the Adjacency Matrix

    1.3 Eigenvalues and Eigenvectors as Key Players

    1.4 CASE STUDY: Applications in Sport Ranking

    1.5 CASE STUDY: Gerrymandering

    1.6 Exercises

    2. Stochastic Processes

    2.1 Markov Chain Basics

    2.2 Hidden Markov Models

    2.2.1 The Likelihood Problem

    2.2.2 The Decoding Problem

    2.2.3 The Learning Problem

    2.3 CASE STUDY: Spread of Infectious Disease

    2.4 CASE STUDY: Text Analysis and Autocorrect

    2.5 CASE STUDY: Tweets and Time Series

    2.6 Exercises

    3. SVD and PCA

    3.1 Vector and Inner Product Spaces

    3.2 Singular Values

    3.3 Singular Value Decomposition

    3.4 Compression of Data Using Principal Component Analysis (PCA)

    3.5 PCA, Covariance, and Correlation

    3.6 Linear Discriminant Analysis

    3.7 CASE STUDY: Digital Humanities

    3.8 CASE STUDY: Facial Recognition Using PCA and LDA

    3.9 Exercises

    4. Interpolation

    4.1 Lagrange Interpolation

    4.2 Orthogonal Families of Polynomials

    4.3 Newton’s Divided Difference

    4.3.1 Newton’s interpolation via divided difference

    4.3.2 Newton’s interpolation via the Vandermonde matrix

    4.4 Chebyshev interpolation

    4.5 Hermite interpolation

    4.6 Least Squares Regression

    4.7 CASE STUDY: Chebyshev Polynomials and Cryptography

    4.8 CASE STUDY: Racial Disparities in Marijuana Arrests

    4.9 CASE STUDY: Interpolation in Higher Education Data

    4.10 Exercises

    5. Optimization and Learning Techniques for Regression

    5.1 Basics of Probability Theory

    5.2 Introduction to Matrix Calculus

    5.2.1 Matrix Differentiation

    5.2.2 Matrix Integration

    5.3 Maximum Likelihood Estimation

    5.4 Gradient Descent Method

    5.5 Introduction to Neural Networks

    5.5.1 The Learning Process

    5.5.2 Sigmoid Activation Functions

    5.5.3 Radial Activation Functions

    5.6 CASE STUDY: Handwriting Digit Recognition

    5.7 CASE STUDY: Poisson Regression and COVID Counts

    5.8 Exercises

    6. Decision Trees and Random Forests

    6.1 Decision Trees

    6.1.1 Decision Trees Regression

    6.2 Regression Trees

    6.3 Random Decision Trees and Forests

    6.4 CASE STUDY: Entropy of Wordle

    6.5 CASE STUDY: Bird Call Identification

    6.6 Exercises

    7. Random Matrices and Covariance Estimate

    7.1 Introduction to Random Matrices

    7.2 Stability

    7.3 Gaussian Orthogonal Ensemble

    7.4 Gaussian Unitary Ensemble

    7.5 Gaussian Symplectic Ensemble

    7.6 Random Matrices and the Relationship to the Covariance

    7.7 CASE STUDY: Finance and Brownian Motion

    7.8 CASE STUDY: Random Matrices in Gene Interaction

    7.9 Exercises

    8. Sample Solutions to Exercises

    8.1 Chapter 1

    8.2 Chapter 2

    8.3 Chapter 3

    8.4 Chapter 4

    8.5 Chapter 5

    8.6 Chapter 6

    8.7 Chapter 7

    Github Links

    Bibliography

    Index


    Dr. Crista Arangala is Professor of Mathematics and Chair of the Department of Mathematics and Statistics at Elon University in North Carolina. She has taught and conducted research in a variety of fields, including inverse problems, applied partial differential equations, applied linear algebra, mathematical modeling, and service learning education. She runs a traveling science museum with her Elon University students in Kerala, India. Dr. Arangala was chosen to be a Fulbright Scholar in 2014 as a visiting lecturer at the University of Colombo, where she continued her projects in inquiry learning in Linear Algebra and began working with a modeling team focusing on Dengue fever research. Dr. Arangala has published several textbooks that employ inquiry learning techniques, including Exploring Linear Algebra: Labs and Projects with MATLAB® and Mathematical Modeling: Branching Beyond Calculus.