Adversarial Learning and Imitation

Learn about the generative adversarial imitation learning algorithm.

Given a set of expert observations (of a driver or a champion video game player, say), we would like to find a reward function that assigns a high reward to an agent that matches expert behavior and a low reward to agents that do not. At the same time, we want to choose the agent's policy $\pi$ under this reward function to be as informative as possible, maximizing entropy while preferring expert over non-expert choices. We'll show how both goals are achieved through an algorithm known as generative adversarial imitation learning (GAIL), published by Ho and Ermon in 2016 (https://doi.org/10.48550/arXiv.1606.03476).

GAIL

In the following, instead of a reward function, we use a cost function to match the conventions of the referenced literature on this topic; the cost function is simply the negative of the reward function.
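To make the cost-function view concrete, here is a minimal sketch, not the actual GAIL training loop, assuming hypothetical 2-D state-action features where expert pairs cluster around $(+1, +1)$ and non-expert pairs around $(-1, -1)$. A logistic-regression discriminator $D(s, a)$ is trained to separate the two, and the surrogate cost $c(s, a) = \log D(s, a)$ is then low for expert-like pairs and high otherwise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D state-action features (placeholders for real rollouts):
# expert pairs cluster near (+1, +1), current-policy pairs near (-1, -1).
expert = rng.normal(loc=1.0, scale=0.3, size=(200, 2))
policy = rng.normal(loc=-1.0, scale=0.3, size=(200, 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic-regression discriminator D(s, a): trained to output ~1 for
# policy samples and ~0 for expert samples.
X = np.vstack([policy, expert])
y = np.concatenate([np.ones(len(policy)), np.zeros(len(expert))])
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(200):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

def cost(sa):
    # Surrogate cost c(s, a) = log D(s, a): low (very negative) for
    # expert-like pairs, near zero for pairs flagged as non-expert.
    return np.log(sigmoid(sa @ w + b) + 1e-8)

# Expert-like pairs incur lower cost than non-expert-like pairs.
print(cost(np.array([1.0, 1.0])) < cost(np.array([-1.0, -1.0])))  # True
```

In the full algorithm, the discriminator and the policy are updated in alternation: the policy is optimized (e.g., with a policy-gradient method) against this learned cost, while the discriminator is retrained to keep telling the two sample sets apart.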

Putting these constraints together, we get (Ho and Ermon, 2016):
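Based on the referenced paper, the combined problem is the maximum causal entropy inverse reinforcement learning objective, which can be written as:

$$
\mathrm{IRL}(\pi_E) = \operatorname*{arg\,max}_{c \in \mathcal{C}} \left( \min_{\pi \in \Pi} -H(\pi) + \mathbb{E}_{\pi}[c(s, a)] \right) - \mathbb{E}_{\pi_E}[c(s, a)]
$$

where $H(\pi)$ is the causal entropy of the policy $\pi$, $\pi_E$ is the expert policy, and $\mathcal{C}$ is the set of candidate cost functions. The inner minimization finds a high-entropy policy with low cost, while the outer maximization seeks a cost function under which the expert outperforms every such policy.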
