A paradigm of Reinforcement Learning that relies only on a Policy Function and/or a Value Function, learned directly from interaction with the RL Environment, and does not attempt to learn the Environment's dynamics through an internal Environment Model.
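As a rough illustration of the idea, the sketch below uses tabular Q-learning, one common model-free method, on a toy five-state chain. The environment (`step`), the constants, and the hyperparameters are all hypothetical and chosen only for this example. The point to notice is that the agent stores nothing but a table of action values and derives its policy from that table; it only ever samples transitions from `step` and never builds or queries a model of them.

```python
import random

N_STATES = 5       # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment. The agent only samples from it, never models it."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q, state):
    """Policy derived from the value table: pick the highest-valued action."""
    return max(ACTIONS, key=lambda a: q[state][a])


def epsilon_greedy(q, state, epsilon, rng):
    """Behaviour policy: explore with probability epsilon, or when values are tied."""
    if rng.random() < epsilon or q[state][0] == q[state][1]:
        return rng.choice(ACTIONS)
    return greedy(q, state)


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: the only learned object is the action-value table q."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target built from a sampled transition,
            # not from a learned transition or reward model.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = train()
    for s in range(N_STATES - 1):
        print(s, "right" if greedy(q, s) == 1 else "left", [round(v, 3) for v in q[s]])
```

A model-based agent would, by contrast, also fit an approximation of `step` itself (next-state and reward predictions) and could plan against that approximation instead of relying only on sampled experience.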