An environment model is a model that predicts what an RL Environment will do next. As we deal with Markov Decision Processes, we can model the dynamics of the environment my learning the state transitions and reward function. These are formally defined below.

\[P^a_{ss'} = \mathbb{P}[S_{t+1} = s' | S_t = a, A_t = a]\]

\[R^a_s = \mathbb{E}[R_{t+1} | S_t = s, A_t = a]\]