PhD Thesis: Learning from Delayed Rewards

The following is a link to my PhD thesis "Learning from Delayed Rewards", Cambridge, 1989. Unfortunately, the original electronic version is long lost, and this version has been scanned in from a photocopy.

Download .pdf version

The thesis introduces the notion of reinforcement learning as learning to control a Markov Decision Process by incremental dynamic programming, and describes a range of algorithms for doing this, including Q-learning, for which a sketch of a proof of convergence is given.
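
As a rough illustration of the one-step Q-learning update (this is not code from the thesis), here is a minimal Python sketch of the tabular form on a small chain task. The task itself, the function name `q_learning`, the epsilon-greedy exploration, and all parameter values are assumptions made for the example.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP.

    States are 0..n_states-1 and actions are 0 (left) and 1 (right).
    Entering the last state gives reward 1 and ends the episode;
    every other transition gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # One row per state, one estimated action value per action.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so every episode terminates
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])

            # Deterministic transition; moving left from state 0 stays put.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Target: immediate reward plus discounted value of the best
            # action in the next state (zero once the episode has ended).
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if s == goal:
                break
    return q
```

Calling `q_learning()` returns the learned table of action values; a greedy policy is read off by taking, in each state, the action with the larger value.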

A brief account of how, when, and why I wrote it is here: Reinforcement Learning: some history