# Constrained Markov Decision Processes

## Preview

## Book Description

This book provides a unified approach for the study of constrained Markov decision processes with a finite state space and unbounded costs. Unlike the single controller case considered in many other books, the author considers a single controller with several objectives, such as minimizing delays and loss, probabilities, and maximization of throughputs. It is desirable to design a controller that minimizes one cost objective, subject to inequality constraints on other cost objectives. This framework describes dynamic decision problems arising frequently in many engineering fields. A thorough overview of these applications is presented in the introduction.

The book is then divided into three sections that build upon each other.

The first part explains the theory for the finite state space. The author characterizes the set of achievable expected occupation measures as well as performance vectors, and identifies simple classes of policies among which optimal policies exist. This allows the reduction of the original dynamic into a linear program. A Lagranian approach is then used to derive the dual linear program using dynamic programming techniques.

In the second part, these results are extended to the infinite state space and action spaces. The author provides two frameworks: the case where costs are bounded below and the contracting framework.

The third part builds upon the results of the first two parts and examines asymptotical results of the convergence of both the value and the policies in the time horizon and in the discount factor. Finally, several state truncation algorithms that enable the approximation of the solution of the original control problem via finite linear programs are given.

## Table of Contents

INTRODUCTION

Examples of Constrained Dynamic Control Problems

On Solution Approaches for CMDPs with Expected Costs

Other Types of CMDPs

Cost Criteria and Assumptions

The Convex Analytical Approach and Occupation Measures

Linear Programming and Lagrangian Approach for CMDPs

About the Methodology

The Structure of the Book

PART ONE: FINITE MDPS

MARKOV DECISION PROCESSES

The Model

Cost Criteria and the Constrained Problem

Some Notation

The Dominance of Markov Policies

THE DISCOUNTED COST

Occupation Measure and the Primal LP

Dynamic Programming and Dual LP: the Unconstrained Case

Constrained Control: Lagrangian Approach

The Dual LP

Number of Randomizations

THE EXPECTED AVERAGE COST

Occupation Measure and the Primal LP

Equivalent Linear Program

The Dual Program

Number of Randomizations

FLOW AND SERVICE CONTROL IN A SINGLE-SERVER QUEUE

The Model

The Lagrangian

The Original Constrained Problem

Structure of Randomization and Implementation Issues

On Coordination Between Controllers

Open Questions

PART TWO: INFINITE MDPS

MDPS WITH INFINITE STATE AND ACTION SPACES

The Model

Cost Criteria

Mixed Policies, and Topologic Structures

The Dominance of Markov Policies

Aggregation of States

Extra Randomization in the Policies

Equivalent Quasi-Markov Model and Quasi-Markov Policies

THE TOTAL COST: CLASSIFICATION OF MDPS

Transient and Absorbing MDPs

MDPs With Uniform Lyapunov Functions

Equivalence of MDP With Unbounded and bounded costs

Properties of MDPs With Uniform Lyapunov Functions

Properties for Fixed Initial Distribution

Examples of Uniform Lyapunov Functions

Contracting MDPs

THE TOTAL COST: OCCUPATION MEASURES AND THE PRIMAL LP

Occupation Measure

Continuity of Occupation Measures

More Properties of MDPs

Characterization of Achievable Sets of Occupation Measure

Relation Between Cost and Occupation Measure

Dominating Classes of Policies

Equivalent Linear Program

The Dual Program

THE TOTAL COST: DYNAMIC AND LINEAR PROGRAMMING

Non-Constrained Control: Dynamic and Linear Programming

Superharmonic Functions and Linear Programming

Set of Achievable Costs

Constrained Control: Lagrangian Approach

The Dual LP

State Truncation

A Second LP Approach for Optimal Mixed Policies

More on Unbound Costs

THE DISCOUNTED COST

The Equivalent Total Cost Model

Occupation Measure and LP

Non-negative Immediate Cost

Weak Contracting Assumptions and Lyapunov Functions

Example: Flow and Service Control

THE EXPECTED AVERAGE COST

Occupation Measures

Completeness Properties of Stationary Policies

Relation Between Cost and Occupation Measure

Dominating Classes of Policies

Equivalent Linear Program

The Dual Program

The Contracting Framework

Other Conditions for the Uniform Integrability

The Case of Uniform Lyapunov Conditions

EXPECTED AVERAGE COST: DYNAMIC PROGRAMMING AND LP

The Non-Constrained Case: Optimality Inequality

Non-Constrained Control: Cost Bounded Below

Dynamic Programming and Uniform Lyapunov Function

Super-Harmonic Functions and Linear Programming

Set of Achievable Costs

Constrained Control: Lagrangian Approach

The Dual LP

A Second LP Approach for Optimal Mixed Policies

PART THREE: ASYMPTOTIC METHODS AND APPROXIMATIONS

SENSITIVITY ANALYSIS

Introduction

Approximation of the Values

Approximation and Robustness of the Policies

CONVERGENCE OF DISCOUNTED CONSTRAINED MDPS

Convergence in the Discount Factor

Convergence to the Expected Average Cost

The Case of Uniform Lyapunov Function

CONVERGENCE AS THE HORIZON TENDS TO INFINITY

The Discounted Cost

The Expected Average Cost: Stationary Policies

The Expected Average Cost: General Policies

STATE TRUNCATION AND APPROXIMATION

The Approximating sets of States

Scheme I: the Total Cost

Scheme II: the Total Cost

Scheme III: the Total Cost

The Expected Average Cost

Infinite MDPs: on the Number of Randomizations

APPENDIX: CONVERGENCE OF PROBABILITY MEASURES

REFERENCES

LIST OF SYMBOLS AND NOTATION

INDEX

## Author(s)

### Biography

Altman, Eitan

## Reviews

"…an outstanding addition to the MDPs literature and it complements…".

tstanding addition to the MDPs literature and it complements…".