Inverse Reinforcement Learning

Learn about deep Q-learning and inverse reinforcement learning in the context of video games.

While the field of deep learning is independent of reinforcement learning methods such as the Q-learning algorithm, a powerful combination of the two approaches was used to train algorithms to play arcade games at near-human level (Mnih et al. 2013, "Playing Atari with Deep Reinforcement Learning," https://arxiv.org/abs/1312.5602).

Deep Q-learning

A major insight of this research was to apply a deep neural network to generate vector representations directly from the raw pixels of the video game, rather than trying to hand-engineer explicit features of the "state of the game"; this neural network serves as the Q-function for the RL algorithm.
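
To make this concrete, here is a minimal sketch of such a pixels-to-Q-values network in PyTorch. The layer sizes loosely follow the convolutional architecture reported in this line of work (a stack of four 84x84 grayscale frames as input), but the class name `QNetwork` and the exact dimensions here are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps a stack of raw game frames to one Q-value per action."""

    def __init__(self, num_actions: int, in_channels: int = 4):
        super().__init__()
        # Convolutional layers learn a vector representation of the raw
        # pixels; no hand-engineered "state of the game" features are needed.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        # For 84x84 inputs, the convolutional output flattens to 64 * 7 * 7.
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512),
            nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, in_channels, 84, 84), pixel values scaled to [0, 1]
        return self.head(self.features(frames))
```

A single forward pass returns Q-values for every action at once, so choosing the greedy action, or taking the max over actions in the learning target, costs only one network evaluation.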

Another key development was a technique called experience replay, wherein the history of states (here, pixels from video frames in a game), actions, and rewards is stored in a fixed-length list and repeatedly re-sampled at random; during play, actions are chosen with some stochastic chance of a non-optimal action, using the epsilon-greedy approach described above. The result is that value-function updates are averaged over many samples of the same data, and correlations between consecutive samples (which could otherwise confine the algorithm to a limited region of the solution space) are broken. Further, this "deep" Q-learning algorithm is implemented off-policy, which avoids the potential circular feedback of generating optimal samples with a jointly optimized policy function.
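
The sketch below illustrates these mechanics, again in PyTorch and under stated assumptions: the names `ReplayBuffer`, `epsilon_greedy`, and `td_update` are hypothetical, and the use of a separate frozen target network is one common way the off-policy update is realized in this family of algorithms, not necessarily the exact mechanism of the cited paper.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F


class ReplayBuffer:
    """Fixed-length store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experience is evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling reuses each transition many times and
        # breaks the correlation between consecutive video frames.
        return random.sample(self.buffer, batch_size)


def epsilon_greedy(q_net, state, num_actions: int, epsilon: float) -> int:
    # With probability epsilon, take a random (possibly non-optimal) action.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())


def td_update(q_net, target_net, optimizer, transitions, gamma: float = 0.99):
    # Collate a list of sampled transition tuples into batched tensors.
    states, actions, rewards, next_states, dones = zip(*transitions)
    states = torch.stack(states)
    next_states = torch.stack(next_states)
    actions = torch.tensor(actions, dtype=torch.int64)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # Q(s, a) for the actions that were actually taken in the replayed data.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Off-policy target: the max over actions comes from a separate, frozen
    # target network rather than the policy currently being optimized.
    with torch.no_grad():
        next_max = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_max

    loss = F.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In use, `target_net` would be synchronized with the learning network only periodically (for example, via `target_net.load_state_dict(q_net.state_dict())`), so the update targets change slowly even as `q_net` is optimized at every step.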
