Iterative algorithms often rely on approximate evaluation techniques, which may include statistical estimation, computer simulation or functional approximation. This volume presents methods for the study of approximate iterative algorithms, providing tools for the derivation of error bounds and convergence rates, and for the optimal design of such algorithms. Techniques of functional analysis are used to derive analytical relationships between approximation methods and convergence properties for general classes of algorithms. This work provides the necessary background in functional analysis and probability theory. Extensive applications to Markov decision processes are presented.

This volume is intended for mathematicians, engineers and computer scientists, who work on learning processes in numerical analysis and are involved with optimization, optimal control, decision analysis and machine learning.

"This is an excellent book on dynamic programming and Markov decision processes. Dynamic programming, invented by the late Richard Bellman, has created a new field of optimality and approximation theory. The author has divided his book into three parts: I: Mathematical background with 8 chapters, II: General theory of approximate iterative algorithms with 3 chapters, and III: Application to Markov decision processes with 6 chapters. […] The author has elaborated the theory in the application to online parameter estimation and exploration schedule."

*Nirode C. Mohanty (Huntington Beach), Zentralblatt MATH 1297-1*

"Many real-life processes and program optimization tasks require approximations for their analysis and execution, as well asbeing recursive and requiring multiple iterations to achieve workable approximations. This rather dense and mathematically beautiful text provides the nexcessary background for the construction and development of algorithms to handle such tasks. […] Thorough and mathematically rigorous throughout, the book will be useful to both pure mathematicians and programmers working in diverse fields from error analysis to machine learning."

*2014 Ringgold, Inc., Portland, OR, USA*

**1 Introduction **

**PART I**

**Mathematical background **

**2 Real analysis and linear algebra **

2.1 Definitions and notation

2.1.1 Numbers, sets and vectors

2.1.2 Logical notation

2.1.3 Set algebra

2.1.4 The supremum and infimum

2.1.5 Rounding off

2.1.6 Functions

2.1.7 Sequences and limits

2.1.8 Infinite series

2.1.9 Geometric series

2.1.10 Classes of real valued functions

2.1.11 Graphs

2.1.12 The binomial coefficient

2.1.13 Stirling’s approximation of the factorial

2.1.14 L’Hôpital’s rule

2.1.15 Taylor’s theorem

2.1.16 The lp norm

2.1.17 Power means

2.2 Equivalence relationships

2.3 Linear algebra

2.3.1 Matrices

2.3.2 Eigenvalues and spectral decomposition

2.3.3 Symmetric, Hermitian and positive definite matrices

2.3.4 Positive matrices

2.3.5 Stochastic matrices

2.3.6 Nonnegative matrices and graph structure

**3 Background – measure theory **

3.1 Topological spaces

3.1.1 Bases of topologies

3.1.2 Metric space topologies

3.2 Measure spaces

3.2.1 Formal construction of measures

3.2.2 Completion of measures

3.2.3 Outer measure

3.2.4 Extension of measures

3.2.5 Counting measure

3.2.6 Lebesgue measure

3.2.7 Borel sets

3.2.8 Dynkin system theorem

3.2.9 Signed measures

3.2.10 Decomposition of measures

3.2.11 Measurable functions

3.3 Integration

3.3.1 Convergence of integrals

3.3.2 Lp spaces

3.3.3 Radon-Nikodym derivative

3.4 Product spaces

3.4.1 Product topologies

3.4.2 Product measures

3.4.3 The Kolmogorov extension theorem

**4 Background – probability theory **

4.1 Probability measures – basic properties

4.2 Moment generating functions (MGF) and cumulant generating functions (CGF)

4.2.1 Moments and cumulants

4.2.2 MGF and CGF of independent sums

4.2.3 Relationship of the CGF to the normal distribution

4.2.4 Probability generating functions

4.3 Conditional distributions

4.4 Martingales

4.4.1 Stopping times

4.5 Some important theorems

4.6 Inequalities for tail probabilities

4.6.1 Chernoff bounds

4.6.2 Chernoff bound for the normal distribution

4.6.3 Chernoff bound for the gamma distribution

4.6.4 Sample means

4.6.5 Some inequalities for bounded random variables

4.7 Stochastic ordering

4.7.1 MGF ordering of the gamma and exponential distribution

4.7.2 Improved bounds based on hazard functions

4.8 Theory of stochastic limits

4.8.1 Covergence of random variables

4.8.2 Convergence of measures

4.8.3 Total variation norm

4.9 Stochastic kernels

4.9.1 Measurability of measure kernels

4.9.2 Continuity of measure kernels

4.10 Convergence of sums

4.11 The law of large numbers

4.12 Extreme value theory

4.13 Maximum likelihood estimation

4.14 Nonparametric estimates of distributions

4.15 Total variation distance for discrete distributions

**5 Background – stochastic processes **

5.1 Counting processes

5.1.1 Renewal processes

5.1.2 Poisson process

5.2 Markov processes

5.2.1 Discrete state spaces

5.2.2 Global properties of Markov chains

5.2.3 General state spaces

5.2.4 Geometric ergodicity

5.2.5 Spectral properties of Markov chains

5.3 Continuous-time Markov chains

5.3.1 Birth and death processes

5.4 Queueing systems

5.4.1 Queueing systems as birth and death processes

5.4.2 Utilization factor

5.4.3 General queueing systems and embedded Markov chains

5.5 Adapted counting processes

5.5.1 Asymptotic behavior

5.5.2 Relationship to adapted events

**6 Functional analysis **

6.1 Metric spaces

6.1.1 Contractive mappings

6.2 The Banach fixed point theorem

6.2.1 Stopping rules for fixed point algorithms

6.3 Vector spaces

6.3.1 Quotient spaces

6.3.2 Basis of a vector space

6.3.3 Operators

6.4 Banach spaces

6.4.1 Banach spaces and completeness

6.4.2 Linear operators

6.5 Norms and norm equivalence

6.5.1 Norm dominance

6.5.2 Equivalence properties of norm equivalence classes

6.6 Quotient spaces and seminorms

6.7 Hilbert spaces

6.8 Examples of Banach spaces

6.8.1 Finite dimensional spaces

6.8.2 Matrix norms and the submultiplicative property

6.8.3 Weighted norms on function spaces

6.8.4 Span seminorms

6.8.5 Operators on span quotient spaces

6.9 Measure kernels as linear operators

6.9.1 The contraction property of stochastic kernels

6.9.2 Stochastic kernels and the span seminorm

**7 Fixed point equations **

7.1 Contraction as a norm equivalence property

7.2 Linear fixed point equations

7.3 The geometric series theorem

7.4 Invariant transformations of fixed point equations

7.5 Fixed point algorithms and the span seminorm

7.5.1 Approximations in the span seminorm

7.5.2 Magnitude of fixed points in the span seminorm

7.6 Stopping rules for fixed point algorithms

7.6.1 Fixed point iteration in the span seminorm

7.7 Perturbations of fixed point equations

**8 The distribution of a maximum **

8.1 General approach

8.2 Bounds on ¯M based on MGFs

8.2.1 Sample means

8.2.2 Gamma distribution

8.3 Bounds for varying marginal distributions

8.3.1 Example

8.4 Tail probabilities of maxima

8.4.1 Extreme value distributions

8.4.2 Tail probabilities based on Boole’s inequality

8.4.3 The normal case

8.4.4 The gamma(α, λ) case

8.5 Variance mixtures based on random sample sizes

8.6 Bounds for maxima based on the first two moments

8.6.1 Stability

**PART II**

**General theory of approximate iterative algorithms **

**9 Background – linear convergence **

9.1 Linear convergence

9.2 Construction of envelopes – the nonstochastic case

9.3 Construction of envelopes – the stochastic case

9.4 A version of l’Hôpital’s rule for series

**10 A general theory of approximate iterative algorithms (AIA) **

10.1 A general tolerance model

10.2 Example: a preliminary model

10.3 Model elements of an AIA

10.3.1 Lipschitz kernels

10.3.2 Lipschitz convolutions

10.4 A classification system for AIAs

10.4.1 Relative error model

10.5 General inequalities

10.5.1 Hilbert space models of AIAs

10.6 Nonexpansive operators

10.6.1 Application of general inequalities to nonexpansive AIAs

10.6.2 Weakly contractive AIAs 216

10.6.3 Examples

10.6.4 Stochastic approximation (Robbins-Monro algorithm)

10.7 Rates of convergence for AIAs

10.7.1 Monotonicity of the Lipschitz kernel

10.7.2 Case I – strongly contractive models with nonvanishing bounds

10.7.3 Case II – rapidly vanishing approximation error

10.7.4 Case III – approximation error decreasing at contraction rate

10.7.5 Case IV – Approximation error greater than contraction rate

10.7.6 Case V – Contraction rates approaching 1

10.7.7 Adjustments for relative error models

10.7.8 A comparison of Banach space and Hilbert space models

10.8 Stochastic approximation as a weakly contractive algorithm

10.9 Tightness of algorithm tolerance

10.10 Finite bounds

10.10.1 Numerical example

10.11 Summary of convergence rates for strongly contractive models

**11 Selection of approximation schedules for coarse-to-fine AIAs **

11.1 Extending the tolerance model

11.1.1 Comparison model for tolerance schedules

11.1.2 Regularity conditions for the computation function

11.2 Main result

11.3 Examples of cost functions

11.4 A general principle for AIAs

**PART III**

**Application to Markov decision processes **

**12 Markov decision processes (MDP) – background **

12.1 Model definition

12.2 The optimal control problem

12.2.1 Adaptive control policies

12.2.2 Optimal control policies

12.3 Dynamic programming and linear operators

12.3.1 The dynamic programming operator (DPO)

12.3.2 Finite horizon dynamic programming

12.3.3 Infinite horizon problem

12.3.4 Classes of MDP

12.3.5 Measurability of the DPO

12.4 Dynamic programming and value iteration

12.4.1 Value iteration and optimality

12.5 Regret and ε-optimal solutions

12.6 Banach space structure of dynamic programming

12.6.1 The contraction property

12.6.2 Contraction properties of the DPO

12.6.3 The equivalence of uniform convergence and contraction for the DPO

12.7 Average cost criterion for MDP

**13 Markov decision processes – value iteration **

13.1 Value iteration on quotient spaces

13.2 Contraction in the span seminorm

13.2.1 Contraction properties of the DPO

13.3 Stopping rules for value iteration

13.4 Value iteration in the span seminorm

13.5 Example: M/D/1/K queueing system

13.6 Efficient calculation of |||QJ |||SP

13.7 Example: M/D/1/K system with optimal control of service capacity

13.8 Policy iteration

13.9 Value iteration for the average cost optimization

**14 Model approximation in dynamic programming – general theory **

14.1 The general inequality for MDPs

14.2 Model distance

14.3 Regret

14.4 A comment on the approximation of regret

14.5 Example

**15 Sampling based approximation methods **

15.1 Modeling maxima

15.1.1 Nonuniform sample allocation: Dependence on qmin, and the ‘Curse of the Supremum Norm’

15.1.2 Some queueing system examples

15.1.3 Truncated geometric model

15.1.4 M/G/1/K queueing model

15.1.5 Restarting schemes

15.2 Continuous state/action spaces

15.3 Parametric estimation of MDP models

**16 Approximate value iteration by truncation **

16.1 Truncation algorithm

16.2 Regularity conditions for tolerance-cost model

16.2.1 Suboptimal orderings

16.3 Example

**17 Grid approximations of MDPs with continuous state/action spaces **

17.1 Discretization methods

17.2 Complexity analysis

17.3 Application of approximation schedules

**18 Adaptive control of MDPs**

18.1 Regret bounds for adaptive policies

18.2 Definition of an adaptive MDP

18.3 Online parameter estimation

18.4 Exploration schedule

Bibliography

Subject index

- COM037000
- COMPUTERS / Machine Theory
- MAT003000
- MATHEMATICS / Applied
- TEC008000
- TECHNOLOGY & ENGINEERING / Electronics / General