A policy is a mapping from states to actions that defines the agent's behaviour. In the deterministic case it is written formally as \(a = \pi(s)\). A policy can also be stochastic, giving the probability of each action in a given state:

\[\pi(a|s) = \mathbb{P}[A_t = a | S_t = s]\]
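
To make the two definitions concrete, here is a minimal sketch in Python. The two-state, two-action problem, the state and action names, the probabilities, and the helper functions `act_deterministic` and `act_stochastic` are all illustrative assumptions and do not come from the text above.

```python
import random

# Hypothetical toy problem: names and numbers are illustrative only.
STATES = ["low_battery", "high_battery"]
ACTIONS = ["recharge", "search"]

# Deterministic policy a = pi(s): exactly one action per state.
deterministic_policy = {
    "low_battery": "recharge",
    "high_battery": "search",
}

# Stochastic policy pi(a|s) = P[A_t = a | S_t = s]:
# one probability distribution over actions per state.
stochastic_policy = {
    "low_battery": {"recharge": 0.9, "search": 0.1},
    "high_battery": {"recharge": 0.2, "search": 0.8},
}


def act_deterministic(state):
    """Return the single action the deterministic policy assigns to `state`."""
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    """Sample an action from pi(.|state) using the given random source."""
    dist = stochastic_policy[state]
    actions = list(dist.keys())
    probs = list(dist.values())
    return rng.choices(actions, weights=probs, k=1)[0]
```

Calling `act_deterministic("low_battery")` always returns the same action, whereas repeated calls to `act_stochastic("low_battery")` return `"recharge"` most of the time and `"search"` occasionally, in proportion to the probabilities in the table. A deterministic policy can be seen as the special case in which one action has probability 1 in each state.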
