1st Edition

Statistical Reinforcement Learning
Modern Machine Learning Approaches

ISBN 9781439856895
Published March 16, 2015 by Chapman and Hall/CRC
206 Pages 114 B/W Illustrations



Book Description

Reinforcement learning (RL) is a mathematical framework for developing computer agents that can learn optimal behavior by relating generic reward signals to their past actions. With numerous successful applications in business intelligence, plant control, and gaming, the RL framework is ideal for decision making in unknown environments with large amounts of data.

Supplying an up-to-date and accessible introduction to the field, Statistical Reinforcement Learning: Modern Machine Learning Approaches presents fundamental concepts and practical algorithms of statistical reinforcement learning from the modern machine learning viewpoint. It covers various types of RL approaches, including model-based and model-free approaches, policy iteration, and policy search methods.

  • Covers the range of reinforcement learning algorithms from a modern perspective
  • Lays out the associated optimization problems for each reinforcement learning scenario covered
  • Provides thought-provoking statistical treatment of reinforcement learning algorithms

The book covers approaches recently introduced in the data mining and machine learning fields to provide a systematic bridge between RL and data mining/machine learning researchers. It presents state-of-the-art results, including dimensionality reduction in RL and risk-sensitive RL. Numerous illustrative examples are included to help readers understand the intuition and usefulness of reinforcement learning techniques.

This book is an ideal resource for graduate-level students in computer science and applied statistics programs, as well as researchers and engineers in related fields.

Table of Contents

Introduction to Reinforcement Learning
Reinforcement Learning
Mathematical Formulation
Structure of the Book
     Model-Free Policy Iteration
     Model-Free Policy Search
     Model-Based Reinforcement Learning


Policy Iteration with Value Function Approximation
Value Functions
     State Value Functions
     State-Action Value Functions
Least-Squares Policy Iteration
     Immediate-Reward Regression
     Model Selection

Basis Design for Value Function Approximation
Gaussian Kernels on Graphs
     MDP-Induced Graph
     Ordinary Gaussian Kernels
     Geodesic Gaussian Kernels
     Extension to Continuous State Spaces
     Geodesic Gaussian Kernels
     Ordinary Gaussian Kernels
     Graph-Laplacian Eigenbases
     Diffusion Wavelets
Numerical Examples
     Robot-Arm Control
     Robot-Agent Navigation

Sample Reuse in Policy Iteration
Off-Policy Value Function Approximation
     Episodic Importance Weighting
     Per-Decision Importance Weighting
     Adaptive Per-Decision Importance Weighting
Automatic Selection of Flattening Parameter
     Importance-Weighted Cross-Validation
Sample-Reuse Policy Iteration
Numerical Examples
     Inverted Pendulum
     Mountain Car

Active Learning in Policy Iteration
Efficient Exploration with Active Learning
     Problem Setup
     Decomposition of Generalization Error
     Estimation of Generalization Error
     Designing Sampling Policies
Active Policy Iteration
     Sample-Reuse Policy Iteration with Active Learning
Numerical Examples

Robust Policy Iteration
Robustness and Reliability in Policy Iteration
Least Absolute Policy Iteration
Numerical Examples
Possible Extensions
     Huber Loss
     Pinball Loss
     Deadzone-Linear Loss
     Chebyshev Approximation
     Conditional Value-At-Risk


Direct Policy Search by Gradient Ascent
Gradient Approach
     Gradient Ascent
     Baseline Subtraction for Variance Reduction
     Variance Analysis of Gradient Estimators
Natural Gradient Approach 
     Natural Gradient Ascent
Application in Computer Graphics: Artist Agent
     Sumie Painting
     Design of States, Actions, and Immediate Rewards
     Experimental Results

Direct Policy Search by Expectation-Maximization
Expectation-Maximization Approach
Sample Reuse
     Episodic Importance Weighting
     Per-Decision Importance Weight
     Adaptive Per-Decision Importance Weighting
     Automatic Selection of Flattening Parameter
     Reward-Weighted Regression with Sample Reuse
Numerical Examples

Policy-Prior Search
Policy Gradients with Parameter-Based Exploration 
     Policy-Prior Gradient Ascent
     Baseline Subtraction for Variance Reduction
     Variance Analysis of Gradient Estimators
     Numerical Examples
Sample Reuse in Policy-Prior Search 
     Importance Weighting
     Variance Reduction by Baseline Subtraction
     Numerical Examples


Transition Model Estimation
Conditional Density Estimation
     Regression-Based Approach
     ε-Neighbor Kernel Density Estimation
     Least-Squares Conditional Density Estimation
Model-Based Reinforcement Learning
Numerical Examples
     Continuous Chain Walk
     Humanoid Robot Control

Dimensionality Reduction for Transition Model Estimation
Sufficient Dimensionality Reduction
Squared-Loss Conditional Entropy
     Conditional Independence
     Dimensionality Reduction with SCE
     Relation to Squared-Loss Mutual Information
Numerical Examples
     Artificial and Benchmark Datasets 
     Humanoid Robot





Masashi Sugiyama received his bachelor of engineering, master of engineering, and doctor of engineering degrees in computer science from the Tokyo Institute of Technology, Japan. In 2001, he was appointed assistant professor at the Tokyo Institute of Technology, and he was promoted to associate professor in 2003. He moved to the University of Tokyo as professor in 2014.

He received an Alexander von Humboldt Foundation Research Fellowship and conducted research at the Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006, he received a European Commission Program Erasmus Mundus Scholarship and conducted research at the University of Edinburgh, Scotland. He received the Faculty Award from IBM in 2007 for his contribution to machine learning under non-stationarity, the Nagao Special Researcher Award from the Information Processing Society of Japan in 2011, and the Young Scientists’ Prize from the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology for his contribution to the density-ratio paradigm of machine learning.

His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control. He published Density Ratio Estimation in Machine Learning (Cambridge University Press, 2012) and Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation (MIT Press, 2012).


This book by Prof. Masashi Sugiyama covers the range of reinforcement learning algorithms from a fresh, modern perspective. With a focus on the statistical properties of estimating parameters for reinforcement learning, the book relates a number of different approaches across the gamut of learning scenarios.... It is a contemporary and welcome addition to the rapidly growing machine learning literature. Both beginner students and experienced researchers will find it to be an important source for understanding the latest reinforcement learning techniques.
—Daniel D. Lee, GRASP Laboratory, School of Engineering and Applied Science, University of Pennsylvania