Actors and Actor-Critics
Learn about policy search and actor-critic schemes.
We'll cover the following
Policy search
So far, we have focused on finding a value function, and we derived from this the greedy policy as the action that leads to the state with the largest return. The value function can be seen as a critic that adapts the policy. Another approach, especially when using function approximators, is to consider a parameterized policy directly and search for good policy parameters. Such an approach is called an actor. We need to find parameters that maximize the payoff. We illustrated such a setting in this
Get hands-on with 1400+ tech skills courses.