1st Edition
An Introduction to Optimization with Applications in Machine Learning and Data Analytics
The primary goal of this text is a practical one: to equip students with enough knowledge to begin independent research and to prepare them for professional careers. Providing students with a marketable skill set requires topics from many areas of optimization, so the text serves mathematics majors as well as students of engineering, computer science, economics, statistics, and business. Optimization reaches into many different fields.
This text provides a balance where one is needed. Mathematics optimization books are often heavy on theory with too few applications, while texts aimed at business students are often strong on applications but weak on mathematics. This book attempts to overcome that imbalance for all students taking such a course.
The book contains many practical applications but also explains the mathematics behind the techniques, including stating definitions and proving theorems. Optimization techniques are at the heart of the first spam filters, are used in self-driving cars, play a central role in machine learning, and can be used in such places as determining a batting order in a Major League Baseball game. Additionally, optimization has seemingly limitless other applications in business and industry. In short, knowledge of this subject offers an individual both a very marketable skill set for a wealth of jobs and useful tools for research in many academic disciplines.
Many of the problems rely on using a computer. Microsoft's Excel is used most often, as it is common in business, but Python and other languages are also considered. The consideration of other programming languages permits experienced mathematics and engineering students to use MATLAB® or Mathematica, and computer science students to write their own programs in Java or Python.
I Preliminary Matters
1 Preamble
1.1 Introduction
1.2 Software
1.3 About this Book
1.3.1 Presentation
1.3.2 Contents
1.4 One Semester Course Material
1.5 Acknowledgements
2 The Language of Optimization
2.1 Basic Terms Defined
2.2 When a Max or Min is not in the Set
2.3 Solving an Optimization Problem
2.4 Algorithms and Heuristics
2.5 Runtime of an Algorithm or a Heuristic
2.6 For Further Study
2.7 Keywords
2.8 Exercises
3 Computational Complexity
3.1 Arithmetic Complexity
3.2 Asymptotic Notation
3.3 Intractability
3.4 Complexity Classes
3.4.1 Introduction
3.4.2 Time, Space, and Big O Notation
3.4.3 The Complexity Class P
3.4.4 The Complexity Class NP
3.4.5 Utility of Complexity Classes
3.6 Keywords
3.7 Exercises
4 Algebra Review
4.1 Systems of Linear Inequalities in Two Variables - Geometric Solutions
4.1.1 Examples
4.2 Solving Systems of Linear Equations Using Linear Algebra
4.2.1 Gauss-Jordan Elimination
4.2.2 Gaussian Elimination compared with Gauss-Jordan Elimination
4.3.1 Matrices and Their Multiplication
4.3.2 Identity Matrices, Inverses, and Determinants of Matrices
4.3.3 Solving Systems of Linear Equations via Cramer’s Rule
4.3.4 Vector and Matrix Norms
4.3.5 Vector Spaces
4.4 Matrix Properties Important to Optimization
4.4.1 Eigenvalues
4.4.2 Unimodular Matrices
4.5 Keywords
4.6 Exercises
5 Matrix Factorization
5.1 LU Factorization
5.3 Orthogonality
5.6 QR Factorization
5.7 Keywords
5.9 Exercises
II Linear Programming
6 Linear Programming
6.1 A Geometric Approach to Linear Programming in Two Dimensions
6.1.1 Example
6.1.2 Summary
6.1.3 Keywords
6.2 The Simplex Method: Maximization Problems with Problem Constraints of the form ≤
6.2.1 Introduction
6.2.2 Slack Variables
6.2.3 The Method
6.2.4 Summary
6.2.5 Keywords
6.3 The Dual: Minimization with Problem Constraints of the form ≥
6.3.1 How it Works
6.3.2 Why it Works
6.3.3 Keywords
6.4 The Big M Method: Maximization and Minimization Problems with Mixed Constraints (≤, ≥, or =)
6.4.1 Maximization Problems with the Big M Method
6.4.2 Minimization Problems with the Big M Method
6.5 Summary of Applying the Simplex Method to Linear Programming Problems
6.6 Degeneracy and Cycling in the Simplex Method
6.7 Exercises
7 Sensitivity Analysis
7.1 Motivation
7.2 An Excel Example
7.2.1 Solver’s Answer Report
7.2.2 Solver’s Sensitivity Report
7.2.3 Solver’s Limits Report
7.3 Exercises
8 Integer Linear Programming
8.1 Introduction
8.5 Exercises
III Nonlinear (Geometric) Programming
4 Calculus Review
4.1 Derivatives and Continuity
4.2 Taylor Series for Functions of a Single Variable
4.3 Newton’s Method
4.4 Vectors
4.6 The Taylor Series of a Function of Two Variables
4.7 Exercises
5 A Calculus Approach to Nonlinear Programming
5.1 Using Derivatives to Find Extrema of Functions of a Single Variable
5.2 Calculus and Extrema of Multivariable Functions
5.3 Exercises
6 Constrained Nonlinear Programming: Lagrange Multipliers and the KKT Conditions
6.3 Exercises
7 Optimization involving Quadratic Forms
7.1 Quadratic Forms
7.2 Definite and Semidefinite Matrices and Optimization
7.3 The Role of Eigenvalues in Optimization
7.4 Keywords
7.5 Exercises
8 Iterative Methods
8.1 Newton’s Method for Optimization
8.1.1 Single-Variable Newton’s Method for Optimization
8.1.2 Multivariable Newton’s Method for Optimization
8.2 Steepest Descent (or Gradient Descent)
8.2.1 Generalized Reduced Gradient
8.3 Additional Geometric Programming Techniques
8.4 Exercises
9 Derivative-Free Methods
9.1 The Arithmetic Mean-Geometric Mean Inequality (AGM)
9.2 Weight Finding Algorithm for the AGM
9.3 The AGM, Newton's Method, and Reduced Gradient Compared
9.4 Exercises
10 Search Algorithms
10.2 Ant Foraging Optimization
II Convexity and the Fundamental Theorem of Linear Programming
11 Important Sets for Optimization
11.1 Special Linear Combinations
11.2 Special Sets
11.3 Special Properties of Sets
11.4 Special Objects
11.5 Exercises
12 The Fundamental Theorem of Linear Programming
12.2 The Fundamental Theorem of Linear Programming
12.3 For Further Study
12.4 Exercises
13 Convex Functions
13.1 Convex Functions of a Single Variable
13.2 Concave Functions
13.3 Graphs of Convex and Concave Functions
13.4 Multivariable Convex Functions
13.5 Mathematical Results from Convexity
13.6 Exercises
14 Convex Optimization
Jourdain Lamperski
14.1 Convex optimization and applications
14.2 Duality
14.3 Subgradient descent
14.4 Exercises
III Combinatorial Optimization
15 An Introduction to Combinatorics
15.1 Introduction
15.2 The Basic Tools of Counting
15.2.1 When We Add, Subtract, Multiply or Divide
15.2.2 Permutations and Combinations
15.3 The Binomial Theorem and Binomial Coefficients
15.3.1 Pascal’s Triangle
15.3.2 Binomial Coefficients
15.3.3 The Binomial Theorem
15.3.4 Another Counting Argument
15.3.5 The Multinomial Theorem
15.4 Counting When Objects are Indistinguishable
15.4.1 Permutations with Indistinguishable Objects
15.4.2 Summary of Basic Counting Techniques
15.6 Exercises
16 An Introduction to Graph Theory
16.1 Basic Definitions
16.2 Special Graphs
16.2.1 Empty Graphs and the Trivial Graph
16.2.2 Walks, Trails, Paths, and Cycles
16.2.3 Trees
16.2.4 Complete and Bipartite Graphs
16.3 Vertex and Edge Cuts
16.3.1 Graph Connectivity
16.3.2 Notation for Removing a Vertex or Edge from a Graph
16.3.3 Cut Vertices and Vertex Cuts
16.3.4 Edge Cuts and Bridges
16.4 Some Useful and Interesting Results
16.5 Exercises
17 Network Flows
17.1 Basic Definitions
17.3 The Dinitz-Edmonds-Karp-Ford-Fulkerson Algorithm
17.4 Max Flow as a Linear Programming Problem
17.5 Application to a Major League Baseball Pennant Race
17.6 Exercises
18 Minimum-Weight Spanning Trees and Shortest Paths
18.1 Weighted Graphs and Spanning Trees
18.2 Minimum-Weight Spanning Trees
18.2.1 Kruskal’s Algorithm
18.2.2 Prim’s Method
18.2.3 Kruskal’s and Prim’s Compared
18.3 Shortest Paths
18.3.1 Dijkstra’s Algorithm
18.3.2 A Linear Programming Approach to Shortest Paths
18.4 Exercises
19 Network Modeling and the Transshipment Problem
19.1 Introduction of the Problem
19.2 The Guarantee of Integer Solutions in Network Flow Problems
19.3 Exercises
20 The Traveling Salesperson Problem
20.1 History of the Traveling Salesperson Problem
20.2 The Problem
20.3 Heuristic Solutions
20.3.1 Nearest Neighbor
20.3.2 Insertion Algorithms
20.3.2.1 Closest Insertion
20.3.2.2 Cheapest Insertion
20.3.3 The Geometric Heuristic
20.3.4 2-Opt and 3-Opt
20.3.5 Christofides Algorithm
20.4 Concorde
20.5 For Further Study
20.6 Exercises
IV Optimization for Data Analytics and Machine Learning
21 Probability
21.1 Introduction
21.2 Set Theory
21.2.1 The Vocabulary of Sets and Sample Spaces
21.2.2 The Algebra of Sets
21.3 Foundations of Probability
21.3.1 Borel Sets
21.3.2 The Axioms and Basic Properties of Probability
21.4.1 Naive Probability
21.4.2 Conditional Probability
21.4.3 Bayes’ Theorem
21.4.3.1 Bayesian Spam Filters
21.4.4 Independence
21.5 Random Variables and Distributions
21.5.1 Random Variables
21.5.2 Probability Mass and Probability Density Functions
21.5.2.1 Expectation and Variance of a Discrete Random Variable
21.5.3 Some Discrete Random Variable Probability Distributions
21.5.3.1 The Binomial Distribution
21.5.3.2 The Geometric Distribution
21.5.3.3 The Negative Binomial Distribution
21.5.3.4 The Hypergeometric Distribution
21.6 Exercises
22 Regression Analysis via Least Squares
John McKay and Suren Jayasuria
22.1 Introduction
22.2 Formulation
22.3 Linear Least Squares
22.3.1 Pseudo-Inverse
22.3.2 Brief Discussion of Probabilistic Interpretation
22.4 Regularized Linear Least Squares
23 Forecasting
Joseph “Nico” Gabriel
23.1 Smoothing
23.1.1 Exponential Smoothing
23.1.2 Trends
23.1.3 Seasonality
23.2 Stationary Data and Differencing
23.2.1 Autocorrelation
23.3 ARIMA models
23.3.1 Autoregressive models
23.3.2 Moving Average models
23.3.3 ARIMA model structure
23.4 Partial
23.4.1 Goodness-of-fit metrics
23.5 Exercises
24 Introduction to Machine Learning
Suren Jayasuria and John McKay
24.1 Introduction
24.2 Nearest Neighbors
24.4 Neural Networks
24.4.1 Artificial Neural Networks
24.4.2 Exercises
A Techniques of Proof
A.1 Introduction to Propositional Logic
A.2 Direct Proofs
A.3 Indirect Proofs
A.3.1 Proving the Contrapositive
A.3.2 Proof by Contradiction
A.4 The Principle of Mathematical Induction
A.5 Exercises
B Useful Tools from Analysis and Topology
B.1 Definitions
B.2 Useful Theorems
Bibliography
Notation
Biography
Jeffrey Paul Wheeler earned his PhD in Combinatorial Number Theory from the University of Memphis by extending to finite groups what had been a conjecture of Erdős on the integers. He has published, given talks at numerous schools, and twice been a guest of Trinity College at the University of Cambridge. He has taught mathematics at Miami University (Ohio), the University of Tennessee-Knoxville, the University of Memphis, Rhodes College, the University of Pittsburgh, Carnegie Mellon University, and Duquesne University. He has received numerous teaching awards and is currently in the Department of Mathematics at the University of Pittsburgh. He also occasionally teaches for Pitt's Computer Science Department and the College of Business Administration. Dr. Wheeler's Optimization course was one of the original thirty to participate in the Mathematical Association of America's NSF-funded PIC Math program.