1st Edition
An Introduction to Optimization with Applications in Machine Learning and Data Analytics
The primary goal of this text is a practical one: to equip students with enough knowledge to begin independent research and to prepare them for professional careers. Providing students with a marketable skill set requires topics from many areas of optimization, so the text serves mathematics majors as well as students of engineering, computer science, economics, statistics, and business. Optimization reaches into many different fields.
This text provides a balance where one is needed. Mathematics optimization books are often heavy on theory with too few applications, while texts aimed at business students are often strong on applications but weak on mathematics. This book attempts to overcome that imbalance for all students taking such a course.
The book contains many practical applications but also explains the mathematics behind the techniques, including stating definitions and proving theorems. Optimization techniques are at the heart of the first spam filters, are used in self-driving cars, play a central role in machine learning, and can be used in such places as determining a batting order in a Major League Baseball game. Additionally, optimization has seemingly limitless other applications in business and industry. In short, knowledge of this subject offers an individual both a very marketable skill set for a wealth of jobs and useful tools for research in many academic disciplines.
Many of the problems rely on using a computer. Microsoft's Excel is used most often, as it is common in business, but Python and other languages are also considered. The consideration of other programming languages permits experienced mathematics and engineering students to use MATLAB® or Mathematica, and computer science students to write their own programs in Java or Python.
I Preliminary Matters
1 Preamble
1.1 Introduction
1.2 Software
1.3 About this Book
1.3.1 Presentation
1.3.2 Contents
1.4 One Semester Course Material
1.5 Acknowledgements
2 The Language of Optimization
2.1 Basic Terms Defined
2.2 When a Max or Min is not in the Set
2.3 Solving an Optimization Problem
2.4 Algorithms and Heuristics
2.5 Runtime of an Algorithm or a Heuristic
2.6 For Further Study
2.7 Keywords
2.8 Exercises
3 Computational Complexity
3.1 Arithmetic Complexity
3.2 Asymptotic Notation
3.3 Intractability
3.4 Complexity Classes
3.4.1 Introduction
3.4.2 Time, Space, and Big O Notation
3.4.3 The Complexity Class P
3.4.4 The Complexity Class NP
3.4.5 Utility of Complexity Classes
3.6 Keywords
3.7 Exercises
4 Algebra Review
4.1 Systems of Linear Inequalities in Two Variables - Geometric Solutions
4.1.1 Examples
4.2 Solving Systems of Linear Equations Using Linear Algebra
4.2.1 Gauss-Jordan Elimination
4.2.2 Gaussian Elimination compared with Gauss-Jordan Elimination
4.3.1 Matrices and Their Multiplication
4.3.2 Identity Matrices, Inverses, and Determinants of Matrices
4.3.3 Solving Systems of Linear Equations via Cramer’s Rule
4.3.4 Vector and Matrix Norms
4.3.5 Vector Spaces
4.4 Matrix Properties Important to Optimization
4.4.1 Eigenvalues
4.4.2 Unimodular Matrices
4.5 Keywords
4.6 Exercises
5 Matrix Factorization
5.1 LU Factorization
5.3 Orthogonality
5.6 QR Factorization
5.7 Keywords
5.9 Exercises
II Linear Programming
6 Linear Programming
6.1 A Geometric Approach to Linear Programming in Two Dimensions
6.1.1 Example
6.1.2 Summary
6.1.3 Keywords
6.2 The Simplex Method: Maximization Problems with Problem Constraints of the form ≤
6.2.1 Introduction
6.2.2 Slack Variables
6.2.3 The Method
6.2.4 Summary
6.2.5 Keywords
6.3 The Dual: Minimization with Problem Constraints of the form ≥
6.3.1 How it Works
6.3.2 Why it Works
6.3.3 Keywords
6.4 The Big M Method: Maximization and Minimization Problems with Mixed Constraints (≤, ≥, or =)
6.4.1 Maximization Problems with the Big M Method
6.4.2 Minimization Problems with the Big M Method
6.5 Summary of Applying the Simplex Method to Linear Programming Problems
6.6 Degeneracy and Cycling in the Simplex Method
6.7 Exercises
7 Sensitivity Analysis
7.1 Motivation
7.2 An Excel Example
7.2.1 Solver’s Answer Report
7.2.2 Solver’s Sensitivity Report
7.2.3 Solver’s Limits Report
7.3 Exercises
8 Integer Linear Programming
8.1 Introduction
8.5 Exercises
III Nonlinear (Geometric) Programming
4 Calculus Review
4.1 Derivatives and Continuity
4.2 Taylor Series for Functions of a Single Variable
4.3 Newton’s Method
4.4 Vectors
4.6 The Taylor Series of a Function of Two Variables
4.7 Exercises
5 A Calculus Approach to Nonlinear Programming
5.1 Using Derivatives to Find Extrema of Functions of a Single Variable
5.2 Calculus and Extrema of Multivariable Functions
5.3 Exercises
6 Constrained Nonlinear Programming: Lagrange Multipliers and the KKT Conditions
6.3 Exercises
7 Optimization involving Quadratic Forms
7.1 Quadratic Forms
7.2 Definite and Semidefinite Matrices and Optimization
7.3 The Role of Eigenvalues in Optimization
7.4 Keywords
7.5 Exercises
8 Iterative Methods
8.1 Newton’s Method for Optimization
8.1.1 Single-Variable Newton’s Method for Optimization
8.1.2 Multivariable Newton’s Method for Optimization
8.2 Steepest Descent (or Gradient Descent)
8.2.1 Generalized Reduced Gradient
8.3 Additional Geometric Programming Techniques
8.4 Exercises
9 Derivative-Free Methods
9.1 The Arithmetic Mean-Geometric Mean Inequality (AGM)
9.2 Weight Finding Algorithm for the AGM
9.3 The AGM, Newton's Method, and Reduced Gradient Compared
9.4 Exercises
10 Search Algorithms
10.2 Ant Foraging Optimization
II Convexity and the Fundamental Theorem of Linear Programming
11 Important Sets for Optimization
11.1 Special Linear Combinations
11.2 Special Sets
11.3 Special Properties of Sets
11.4 Special Objects
11.5 Exercises
12 The Fundamental Theorem of Linear Programming
12.2 The Fundamental Theorem of Linear Programming
12.3 For Further Study
12.4 Exercises
13 Convex Functions
13.1 Convex Functions of a Single Variable
13.2 Concave Functions
13.3 Graphs of Convex and Concave Functions
13.4 Multivariable Convex Functions
13.5 Mathematical Results from Convexity
13.6 Exercises
14 Convex Optimization
Jourdain Lamperski
14.1 Convex optimization and applications
14.2 Duality
14.3 Subgradient descent
14.4 Exercises
III Combinatorial Optimization
15 An Introduction to Combinatorics
15.1 Introduction
15.2 The Basic Tools of Counting
15.2.1 When We Add, Subtract, Multiply or Divide
15.2.2 Permutations and Combinations
15.3 The Binomial Theorem and Binomial Coefficients
15.3.1 Pascal’s Triangle
15.3.2 Binomial Coefficients
15.3.3 The Binomial Theorem
15.3.4 Another Counting Argument
15.3.5 The Multinomial Theorem
15.4 Counting When Objects are Indistinguishable
15.4.1 Permutations with Indistinguishable Objects
15.4.2 Summary of Basic Counting Techniques
15.6 Exercises
16 An Introduction to Graph Theory
16.1 Basic Definitions
16.2 Special Graphs
16.2.1 Empty Graphs and the Trivial Graph
16.2.2 Walks, Trails, Paths, and Cycles
16.2.3 Trees
16.2.4 Complete and Bipartite Graphs
16.3 Vertex and Edge Cuts
16.3.1 Graph Connectivity
16.3.2 Notation for Removing a Vertex or Edge from a Graph
16.3.3 Cut Vertices and Vertex Cuts
16.3.4 Edge Cuts and Bridges
16.4 Some Useful and Interesting Results
16.5 Exercises
17 Network Flows
17.1 Basic Definitions
17.3 The Dinitz-Edmonds-Karp-Ford-Fulkerson Algorithm
17.4 Max Flow as a Linear Programming Problem
17.5 Application to a Major League Baseball Pennant Race
17.6 Exercises
18 Minimum-Weight Spanning Trees and Shortest Paths
18.1 Weighted Graphs and Spanning Trees
18.2 Minimum-Weight Spanning Trees
18.2.1 Kruskal’s Algorithm
18.2.2 Prim’s Method
18.2.3 Kruskal’s and Prim’s Compared
18.3 Shortest Paths
18.3.1 Dijkstra’s Algorithm
18.3.2 A Linear Programming Approach to Shortest Paths
18.4 Exercises
19 Network Modeling and the Transshipment Problem
19.1 Introduction of the Problem
19.2 The Guarantee of Integer Solutions in Network Flow Problems
19.3 Exercises
20 The Traveling Salesperson Problem
20.1 History of the Traveling Salesperson Problem
20.2 The Problem
20.3 Heuristic Solutions
20.3.1 Nearest Neighbor
20.3.2 Insertion Algorithms
20.3.2.1 Closest Insertion
20.3.2.2 Cheapest Insertion
20.3.3 The Geometric Heuristic
20.3.4 2-Opt and 3-Opt
20.3.5 Christofides Algorithm
20.4 Concorde
20.5 For Further Study
20.6 Exercises
IV Optimization for Data Analytics and Machine Learning
21 Probability
21.1 Introduction
21.2 Set Theory
21.2.1 The Vocabulary of Sets and Sample Spaces
21.2.2 The Algebra of Sets
21.3 Foundations of Probability
21.3.1 Borel Sets
21.3.2 The Axioms and Basic Properties of Probability
21.4.1 Naive Probability
21.4.2 Conditional Probability
21.4.3 Bayes’ Theorem
21.4.3.1 Bayesian Spam Filters
21.4.4 Independence
21.5 Random Variables and Distributions
21.5.1 Random Variables
21.5.2 Probability Mass and Probability Density Functions
21.5.2.1 Expectation and Variance of a Discrete Random Variable
21.5.3 Some Discrete Random Variable Probability Distributions
21.5.3.1 The Binomial Distribution
21.5.3.2 The Geometric Distribution
21.5.3.3 The Negative Binomial Distribution
21.5.3.4 The Hypergeometric Distribution
21.6 Exercises
22 Regression Analysis via Least Squares
John McKay and Suren Jayasuria
22.1 Introduction
22.2 Formulation
22.3 Linear Least Squares
22.3.1 Pseudo-Inverse
22.3.2 Brief Discussion of Probabilistic Interpretation
22.4 Regularized Linear Least Squares
23 Forecasting
Joseph “Nico” Gabriel
23.1 Smoothing
23.1.1 Exponential Smoothing
23.1.2 Trends
23.1.3 Seasonality
23.2 Stationary Data and Differencing
23.2.1 Autocorrelation
23.3 ARIMA models
23.3.1 Autoregressive models
23.3.2 Moving Average models
23.3.3 ARIMA model structure
23.4 Partial
23.4.1 Goodness-of-fit metrics
23.5 Exercises
24 Introduction to Machine Learning
Suren Jayasuria and John McKay
24.1 Introduction
24.2 Nearest Neighbors
24.4 Neural Networks
24.4.1 Artificial Neural Networks
24.4.2 Exercises
A Techniques of Proof
A.1 Introduction to Propositional Logic
A.2 Direct Proofs
A.3 Indirect Proofs
A.3.1 Proving the Contrapositive
A.3.2 Proof by Contradiction
A.4 The Principle of Mathematical Induction
A.5 Exercises
B Useful Tools from Analysis and Topology
B.1 Definitions
B.2 Useful Theorems
Bibliography
Notation
Biography
Jeffrey Paul Wheeler earned his PhD in Combinatorial Number Theory from the University of Memphis by extending to finite groups what had been a conjecture of Erdős on the integers. He has published, given talks at numerous schools, and twice been a guest of Trinity College at the University of Cambridge. He has taught mathematics at Miami University (Ohio), the University of Tennessee-Knoxville, the University of Memphis, Rhodes College, the University of Pittsburgh, Carnegie Mellon University, and Duquesne University. He has received numerous teaching awards and is currently in the Department of Mathematics at the University of Pittsburgh. He also occasionally teaches for Pitt's Computer Science Department and the College of Business Administration. Dr. Wheeler's Optimization course was one of the original thirty to participate in the Mathematical Association of America's NSF-funded PIC Math program.