Training and Results

Learn how to train the actor-critic network and visualize the results.

We'll cover the following

To train the actor-critic network, we apply a loss function that tries to classify expert (observation, action) pairs as 00s and agent (observation, action) pairs as 11s. When the agent learns to generate high quality (observation, action) pairs that resemble the expert, the discriminator will have increasing difficulty distinguishing between samples from the agent and expert and will assign agent samples a label of 00 as well:

Get hands-on with 1400+ tech skills courses.