Exploration and Exploitation


A fundamental problem in reinforcement learning is whether to explore, trying new actions in search of states or strategies with higher reward, or to exploit existing knowledge of strategies and states to collect the reward signal we already know how to obtain. How do we combine new and existing experience to safely explore the environment and discover better rewards?
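The simplest setting where this tradeoff appears is the multi-armed bandit, and the classic baseline is the epsilon-greedy rule: with probability epsilon take a random action (explore), otherwise take the action with the best estimated reward (exploit). The sketch below is illustrative, not from the original note; the function name, the Gaussian reward model, and the epsilon value are all assumptions chosen for the example.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a toy multi-armed bandit (illustrative sketch).

    With probability epsilon we explore (pull a random arm); otherwise
    we exploit the arm with the highest estimated mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # number of pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates

estimates = epsilon_greedy_bandit([0.2, 0.5, 0.9])
```

Even this tiny example shows the tension: setting epsilon to 0 risks locking onto a mediocre arm forever, while a large epsilon wastes pulls on arms already known to be bad.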

Created: 2021-11-13

Emacs 26.1 (Org mode 9.5)