Training and Results

Learn how to train the actor-critic network and visualize the results.

We'll cover the following...

To train the actor-critic network, we apply a loss function that tries to classify expert (observation, action) pairs as 00s and agent (observation, action) pairs as 11s. When the agent learns to generate high quality (observation, action) pairs that resemble the expert, the discriminator will have increasing difficulty distinguishing ...