Inverse Reinforcement Learning
Learn about deep Q-learning and inverse reinforcement learning in the context of video games.
While the field of deep learning is independent of reinforcement learning methods such as the Q-learning algorithm, the two were combined to powerful effect in research on playing video games directly from raw screen pixels, an approach that became known as deep Q-learning.
Deep Q-learning
A major insight in this research was to apply a deep neural network to generate vector representations from the raw pixels of the video game rather than trying to explicitly represent some features of the “state of the game”; this neural network is the Q-network, which outputs an estimated Q-value for each possible action given the current state.
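As a concrete illustration, the sketch below shows what such a Q-network might look like, assuming PyTorch and the 84x84, four-frame input used in the original Atari work; the class name and layer sizes are illustrative assumptions, not something prescribed by this lesson.

```python
# A minimal convolutional Q-network sketch (assumes PyTorch is installed).
# Input: a stack of 4 grayscale 84x84 game screens; output: one Q-value per action.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_actions: int, n_frames: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4),  # 84x84 -> 20x20
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),        # 20x20 -> 9x9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),        # 9x9 -> 7x7
            nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512),
            nn.ReLU(),
            nn.Linear(512, n_actions),  # one Q-value estimate per action
        )

    def forward(self, screens: torch.Tensor) -> torch.Tensor:
        # screens: (batch, n_frames, 84, 84), pixel values scaled to [0, 1]
        return self.head(self.features(screens))

# Example: Q-values for a single (dummy) stacked-frame observation.
q_net = QNetwork(n_actions=6)
dummy_frames = torch.zeros(1, 4, 84, 84)
print(q_net(dummy_frames).shape)  # torch.Size([1, 6])
```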
Another key development was a technique called experience replay, wherein the history of states (here, pixels from video frames in a game), actions, and rewards is stored in a fixed-length list and repeatedly re-sampled at random, while actions are chosen with some stochastic possibility of a non-optimal move using the epsilon-greedy approach described above. The result is that the value-function updates are averaged over many samples of the same data, and correlations between consecutive samples (which could make the algorithm explore only a limited set of the solution space) are broken. Further, this “deep” Q-network is typically paired with a second, periodically refreshed target network that is used to compute the update targets, which stabilizes training.
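The replay memory and epsilon-greedy selection can be sketched in a few lines of plain Python; the names ReplayMemory and epsilon_greedy below are illustrative placeholders rather than anything defined in this lesson.

```python
# Sketch of a fixed-length replay memory and epsilon-greedy action selection.
import random
from collections import deque

class ReplayMemory:
    """Fixed-length buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions drop off automatically

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Random sampling breaks correlations between consecutive frames.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def epsilon_greedy(q_values, epsilon: float) -> int:
    """With probability epsilon take a random (possibly non-optimal) action;
    otherwise take the action with the highest estimated Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```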
The figure above illustrates an overview of deep Q-learning.
Putting these pieces together, the deep Q-learning algorithm involves the following steps (a short code sketch tying them together appears after the list):
Step 1: Create a list to store samples of (current state, action, reward, next state) as a “replay memory.”
Step 2: Randomly initialize the weights in the neural network representing the Q-function.
Step 3: For a certain number of gameplay sequences, initialize a starting game screen (pixels) and a transformation of this input (such as the last four screens). This “window” of fixed-length history is important because otherwise, the Q-network would need to accommodate arbitrarily sized input (very long or very short sequences of game screens), and this restriction makes it easier to apply a convolutional neural network to the problem.
Step 4: For a certain number of steps (screens) in the game, use epsilon-greedy sampling to choose the next action given the current screen and the rewards estimated through the Q-function.
Step 5: After updating the state, save this transition of (current state, action, reward, next state) into the replay memory.
Step 6: Choose random sets of (current state, action, reward, next state) transitions from the replay memory and compute their target values (reward plus discounted future Q-value) using the Q-function, then update the Q-network’s weights with a gradient step on the difference between these targets and the network’s current predictions.
...
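Putting steps 1 through 6 into code, a minimal end-to-end sketch might look like the following. The toy environment, the linear Q-function, the target-network refresh schedule, and every hyperparameter value here are illustrative assumptions rather than part of the algorithm description above; a practical agent would use a convolutional Q-network like the one sketched earlier and a real game emulator.

```python
# End-to-end deep Q-learning sketch (steps 1-6), using NumPy only.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 8, 2                 # toy "game": 8 one-hot states, 2 actions
GAMMA, ALPHA, EPSILON = 0.95, 0.05, 0.1    # illustrative hyperparameters

def toy_env_step(state: int, action: int):
    """Hypothetical stand-in for a game emulator: action 1 moves right, 0 moves left;
    reaching the rightmost state yields a reward of 1 and ends the episode."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def one_hot(s: int) -> np.ndarray:
    v = np.zeros(N_STATES)
    v[s] = 1.0
    return v

def q_values(w: np.ndarray, s: int) -> np.ndarray:
    return w @ one_hot(s)                  # linear Q-function: one value per action

replay_memory = []                                             # Step 1
weights = rng.normal(scale=0.01, size=(N_ACTIONS, N_STATES))   # Step 2
target_weights = weights.copy()            # slowly updated copy used for targets

for episode in range(200):                 # Step 3: many gameplay sequences
    state = 0
    for step in range(50):
        # Step 4: epsilon-greedy action selection using the current Q-function.
        if rng.random() < EPSILON:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q_values(weights, state)))
        next_state, reward, done = toy_env_step(state, action)

        # Step 5: store the transition in the replay memory (kept fixed-length).
        replay_memory.append((state, action, reward, next_state, done))
        if len(replay_memory) > 10_000:
            replay_memory.pop(0)
        state = next_state

        # Step 6: sample random transitions and take a gradient step on the
        # squared difference between predicted and target Q-values.
        if len(replay_memory) >= 32:
            batch_idx = rng.integers(len(replay_memory), size=32)
            for s, a, r, s2, d in (replay_memory[i] for i in batch_idx):
                target = r if d else r + GAMMA * np.max(q_values(target_weights, s2))
                error = q_values(weights, s)[a] - target
                weights[a] -= ALPHA * error * one_hot(s)   # gradient of 0.5 * error**2

        if done:
            break

    if episode % 20 == 0:
        target_weights = weights.copy()    # periodically refresh the target copy

print("Learned Q-values (rows = states, columns = actions):")
print(np.round(weights.T, 2))
```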