1st Edition

An Introduction to Optimization with Applications in Machine Learning and Data Analytics

By Jeffrey Paul Wheeler Copyright 2024
    473 Pages 94 B/W Illustrations
    by Chapman & Hall

    473 Pages 94 B/W Illustrations
    by Chapman & Hall

    The primary goal of this text is a practical one. Equipping students with enough knowledge and creating an independent research platform, the author strives to prepare students for professional careers. Providing students with a marketable skill set requires topics from many areas of optimization. The initial goal of this text is to develop a marketable skill set for mathematics majors as well as for students of engineering, computer science, economics, statistics, and business. Optimization reaches into many different fields.

    This text provides a balance where one is needed. Mathematics optimization books are often too heavy on theory without enough applications; texts aimed at business students are often strong on applications, but weak on math. The book represents an attempt at overcoming this imbalance for all students taking such a course.

    The book contains many practical applications but also explains the mathematics behind the techniques, including stating definitions and proving theorems. Optimization techniques are at the heart of the first spam filters, are used in self-driving cars, play a great role in machine learning, and can be used in such places as determining a batting order in a Major League Baseball game. Additionally, optimization has seemingly limitless other applications in business and industry. In short, knowledge of this subject offers an individual both a very marketable skill set for a wealth of jobs as well as useful tools for research in many academic disciplines.

    Many of the problems rely on using a computer. Microsoft’s Excel is most often used, as this is common in business, but Python and other languages are considered. The consideration of other programming languages permits experienced mathematics and engineering students to use MATLAB® or Mathematica, and the computer science students to write their own programs in Java or Python.

    I Preliminary Matters                                                             

    1  Preamble                                                                                                               





    1.3.2 Contents                                                                                   

          1.4 One Semester Course Material        


    2 The Language of Optimization                                                               

    2.1 Basic Terms Defined                                                                    
    2.2 When a Max or Min is not in the Set                                            

    2.3 Solving an Optimization Problem                                                   

    2.4 Algorithms and Heuristics                                                             



    2.7 Keywords                                                                                    

    2.8 Exercises                                                                                     

    3 Computational Complexity                                                                        

    3.1 Arithmetic Complexity                                                               

    3.2 Asymptotic Notation                                                                  

    3.3 Intractability

    3.4 Complexity Classes                                                                    


    3.4.2        Time, Space, and Big O Notation

    3.4.3        The Complexity Class P

    3.4.4        The Complexity Class NP

    3.4.5        Utility of Complexity Classes

    3.5       For Further Study

    3.6       Keywords

    3.7       Exercises

    4      Algebra Review                                                                                                      

    4.1       Systems of Linear Inequalities in Two Variables - Geometric Solutions

    4.1.1        Examples

    4.2       Solving Systems of Linear Equations Using Linear Algebra

    4.2.1        Gauss-Jordan Elimination

    4.2.2        Gaussian Elimination compared with Gauss-Jordan Elimination

    4.3       Linear Algebra Basics

    4.3.1        Matrices and Their Multiplication

    4.3.2        Identity Matrices, Inverses, and Determinants of Matrices

    4.3.3        Solving Systems of Linear Equations via Cramer’s Rule       

    4.3.4        Vector and Matrix Norms

    4.3.5        Vector Spaces

    4.4       Matrix Properties Important to Optimization

    4.4.1        Eigenvalues

    4.4.2        Unimodular Matrices

    4.5       Keywords

    4.6       Exercises

    5       Matrix Factorization                                                                                         

    5.1       LU Factorization

    5.2       Cholesky Decomposition

    5.3       Orthogonality

    5.4       Orthonormal Matrices

    5.5       The Gram-Schmidt Process

    5.6       QR Factorization

    5.7       Keywords

    5.8       For Further Study

    5.9       Exercises

    II          Linear Programming                                                        

    6      Linear Programming                                                                                         

    6.1       A Geometric Approach to Linear Programming in Two Dimen-


    6.1.1        Example

    6.1.2        Summary

    6.1.3        Keywords

    6.2      ≤

      The Simplex Method: Maximization Problems with Problem Constraints of the form

    6.2.1        Introduction

    6.2.2        Slack Variables

    6.2.3        The Method

    6.2.4        Summary

    6.2.5        Keywords

    6.3       The Dual: Minimization with Problem Constraints of the form

    6.3.1        How it Works

    6.3.2        Why it Works

    6.3.3        Keywords

    6.4       The Big M Method: Maximum and Minimization Problems ≥ ≤

      with Mixed Constraints ( , , or =)

    6.4.1        Maximization Problems with the Big M Method

    6.4.2        Minimization Problems with the Big M Method

    6.5       Summary of Applying the Simplex Method to Linear Program-

    ming Problems

    6.6       Degeneracy and Cycling in the Simplex Method

    6.7       Exercises

    7       Sensitivity Analysis                                                                                         

    7.1       Motivation

    7.2       An Excel Example

    7.2.1        Solver’s Answer Report

    7.2.2        Solver’s Sensitivity Report

    7.2.3        Solver’s Limits Report

    7.3       Exercises

    8      Integer Linear Programming                                                                     

    8.1       Introduction

    8.2       Dakin’s Branch and Bound

    8.3       Gomory Cut-Planes

    8.4       For Further Study

    8.5       Exercises

    III Nonlinear (Geometric) Programming

    4      Calculus Review                                                                                                  

    4.1       Derivatives and Continuity

    4.2       Taylor Series for Functions of a Single Variable

    4.3       Newton’s Method

    4.4       Vectors

    4.5       Partial Derivatives

    4.6       The Taylor Series of a Function of Two Variables

    4.7       Exercises

    5       A Calculus Approach to Nonlinear Programming                       

    5.1       Using Derivatives to Find Extrema of Functions of a Single Variable

    5.2       Calculus and Extrema of Multivariable Functions

    5.3       Exercises

    6      Constrained Nonlinear Programming: Lagrange Multipliers and the KKT Conditions                                                                             

    6.1       Lagrange Multipliers

    6.2       The KKT Conditions

    6.3       Exercises

    7       Optimization involving Quadratic Forms                                           

    7.1       Quadratic Forms

    7.2       Definite and Semidefinite Matrices and Optimization

    7.3       The Role of Eigenvalues in Optimization

    7.4       Keywords

    7.5       Exercises

    8      Iterative Methods                                                                                              

    8.1       Newton’s Method for Optimization

    8.1.1        Single-Variable Newton’s Method for Optimization

    8.1.2        Multivariable Newton’s Method for Optimization

    8.2       Steepest Descent (or Gradient Descent)

    8.2.1        Generalized Reduced Gradient

    8.3       Additional Geometric Programming Techniques

    8.4       Exercises

    9      Derivative-Free Methods                                                                               

    9.1       The Arithmetic Mean-Geometric Mean Inequality (AGM)

    9.2       Weight Finding Algorithm for the AGM

    9.3       The AGM, Newton’s Method, and Reduced Gradient Com- pared

    9.4       Exercises

    10   Search Algorithms                                                                                            

    10.1    Evolutionary Algorithms

    10.2    Ant Foraging Optimization

    II    Convexity and the Fundamental Theorem of Lin- ear Programming                                                                       

    11     Important Sets for Optimization                                                             

    11.1    Special Linear Combinations

    11.2    Special Sets

    11.3    Special Properties of Sets

    11.4    Special Objects

    11.5    Exercises

    12    The Fundamental Theorem of Linear Programming                  

    12.1    The Finite Basis Theorem

    12.2    The Fundamental Theorem of Linear Programming

    12.3    For Further Study

    12.4    Exercises

    13    Convex Functions                                                                                             

    13.1    Convex Functions of a Single Variable

    13.2    Concave Functions

    13.3    Graphs of Convex and Concave Functions

    13.4    Multivariable Convex Functions

    13.5    Mathematical Results from Convexity

    13.6    Exercises

    14    Convex Optimization                                                                                      

    Jourdain Lamperski

    14.1    Convex optimization and applications

    14.2    Duality

    14.3    Subgradient descent

    14.4    Exercises

    III      Combinatorial Optimization                                           

    15    An Introduction to Combinatorics                                                         

    15.1    Introduction

    15.2    The Basic Tools of Counting

    15.2.1    When We Add, Subtract, Multiply or Divide

    15.2.2    Permutations and Combinations

    15.3    The Binomial Theorem and Binomial Coefficients

    15.3.1    Pascal’s Triangle

    15.3.2    Binomial Coefficients

    15.3.3    The Binomial Theorem

    15.3.4    Another Counting Argument

    15.3.5    The Multinomial Theorem

    15.4    Counting When Objects are Indistinguishable

    15.4.1    Permutations with Indistinguishable Objects

    15.4.2    Summary of Basic Counting Techniques

    15.5    The Pigeonhole Principle

    15.6    Exercises

    16    An Introduction to Graph Theory                                                          

    16.1    Basic Definitions

    16.2    Special Graphs

    16.2.1    Empty Graphs and the Trivial Graph

    16.2.2    Walks, Trails, Paths, and Cycles

    16.2.3    Trees

    16.2.4    Complete and Bipartite Graphs

    16.3    Vertex and Edge Cuts

    16.3.1    Graph Connectivity

    16.3.2    Notation for Removing a Vertex or Edge from a Graph     

    16.3.3    Cut Vertices and Vertex Cuts

    16.3.4    Edge Cuts and Bridges

    16.4    Some Useful and Interesting Results

    16.5    Exercises

    17    Network Flows                                                                                                    

    17.1    Basic Definitions

    17.2    Maximum Flows and Cuts

    17.3    The Dinitz-Edmonds-Karp-Ford-Fulkerson Algorithm

    17.4    Max Flow as a Linear Programming Problem

    17.5    Application to a Major League Baseball Pennant Race

    17.6    Exercises

    18   Minimum-Weight Spanning Trees and Shortest Paths               

    18.1    Weighted Graphs and Spanning Trees

    18.2    Minimum-Weight Spanning Trees

    18.2.1    Kruskal’s Algorithm

    18.2.2    Prim’s Method

    18.2.3    Kruskal’s and Prim’s Compared

    18.3    Shortest Paths

    18.3.1    Dijkstra’s Algorithm

    18.3.2    A Linear Programming Approach to Shortest Paths

    18.4    Exercises

    19    Network Modeling and the Transshipment Problem                  

    19.1    Introduction of the Problem

    19.2    The Guarantee of Integer Solutions in Network Flow Problems     

    19.3    Exercises

    20   The Traveling Salesperson Problem                                                       

    20.1    History of the Traveling Salesperson Problem

    20.2    The Problem

    20.3    Heuristic Solutions

    20.3.1    Nearest Neighbor

    20.3.2    Insertion Algorithms      Closest Insertion      Cheapest Insertion

    20.3.3    The Geometric Heuristic

    20.3.4    2-Opt and 3-Opt

    20.3.5    Christophides Algorithm

    20.4    Concorde

    20.5    For Further Study

    20.6    Exercises

    IV Optimization for Data Analytics and Machine Learning                                                                                

    21    Probability                                                                                                             

    21.1    Introduction

    21.2    Set Theory

    21.2.1    The Vocabulary of Sets and Sample Spaces

    21.2.2    The Algebra of Sets

    21.3    Foundations of Probability

    21.3.1    Borel Sets

    21.3.2    The Axioms and Basic Properties of Probability

    21.4    Conditional Probability

    21.4.1    Naive Probability

    21.4.2    Conditional Probability

    21.4.3    Bayes’ Theorem      Bayesian Spam Filters

    21.4.4    Independence

    21.5    Random Variables and Distributions

    21.5.1    Random Variables

    21.5.2    Probability Mass and Probability Density Functions      Expectation and Variance of a Discrete Random Variable

    21.5.3    Some Discrete Random Variable Probability Distributions      The Binomial Distribution      The Geometric Distribution      The Negative Binomial Distribution      The Hypergeometric Distribution

    21.6    Exercises

    22   Regression Analysis via Least Squares                                                 

    John McKay and Suren Jayasuria

    22.1    Introduction

    22.2    Formulation

    22.3    Linear Least Squares

    22.3.1    Pseudo-Inverse

    22.3.2    Brief Discussion of Probabilistic Interpretation

    22.4    Regularized Linear Least Squares

    23   Forecasting                                                                                                           

    Joseph “Nico” Gabriel

    23.1    Smoothing

    23.1.1    Exponential Smoothing

    23.1.2    Trends

    23.1.3    Seasonality

    23.2    Stationary Data and Differencing

    23.2.1    Autocorrelation

    23.3    ARIMA models

    23.3.1    Autoregressive models

    23.3.2    Moving Average models

    23.3.3    ARIMA model structure

    23.4    Partial

    23.4.1    Goodness-of-fit metrics

    23.5    Exercises

    24   Introduction to Machine Learning                                                         

    Suren Jayasuria and John McKay

    24.1    Introduction

    24.2    Nearest Neighbors

    24.3    Support Vector Machines

    24.4    Neural Networks

    24.4.1    Artificial Neural Networks

    24.4.2    Exercises

    A Techniques of Proof                                                                                          

    A.1      Introduction to Propositional Logic

    A.2      Direct Proofs

    A.3      Indirect Proofs

    A.3.1      Proving the Contrapositive

    A.3.2      Proof by Contradiction

    A.4      The Principle of Mathematical Induction

    A.5      Exercises

    B     Useful Tools from Analysis and Topology                                          

    B.1      Definitions

    B.2      Useful Theorems





    List of Notation


    Jeffrey Paul Wheeler earned his PhD in Combinatorial Number Theory from the University of Memphis by extending what had been a conjecture of Erdős on the integers to finite groups. He has published, given talks at numerous schools, and twice been a guest of Trinity College at the University of Cambridge. He has taught mathematics at Miami University (Ohio), the University of Tennessee-Knoxville, the University of Memphis, Rhodes College, the University of Pittsburgh, Carnegie Mellon University, and Duquesne University. He has received numerous teaching awards and is currently in the Department of Mathematics at the University of Pittsburgh. He also occasionally teaches for Pitt’s Computer Science Department and the College of Business Administration. Dr. Wheeler’s Optimization course was one of the original thirty to participate in the Mathematical Association of America’s NSF-funded PIC Math program.