Behavior search algorithms propose new behaviors, which are tested in an environment, and learn from the resulting feedback.
The following table gives an overview of the behavior search methods that are provided by BOLeRo.
| Behavior search name | Use case |
|---|---|
| Black-box Search | Policy search; the behavior is treated as a black box |
| Monte Carlo Reinforcement Learning | Reinforcement learning; uses a behavior derived from the value function |
A BlackBoxSearch combines an Optimizer and a BlackBoxBehavior for direct policy search.
The optimizer does not need to know anything about the behavior except its
number of parameters and the performance in the environment to do direct policy
search.
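To illustrate this separation of concerns, here is a minimal, self-contained sketch that does not use BOLeRo. The classes `LinearBehavior`, `RandomSearchOptimizer` and `BlackBoxSearchSketch`, their methods, and the toy `evaluate` objective are all hypothetical stand-ins, not BOLeRo's API. The optimizer is a fixed-step (1+1) random search, chosen only for brevity. The point is that it sees nothing but the number of parameters and a scalar feedback value.

```python
import random
from typing import List


class LinearBehavior:
    """Hypothetical black-box behavior: output = sum(w_i * x_i)."""

    def __init__(self, n_inputs: int) -> None:
        self.params = [0.0] * n_inputs

    def get_n_params(self) -> int:
        return len(self.params)

    def set_params(self, params: List[float]) -> None:
        self.params = list(params)

    def act(self, inputs: List[float]) -> float:
        return sum(w * x for w, x in zip(self.params, inputs))


class RandomSearchOptimizer:
    """Hypothetical optimizer: fixed-step (1+1) random search.

    It only knows the number of parameters and the scalar feedback."""

    def __init__(self, n_params: int, sigma: float = 0.3, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.sigma = sigma
        self.best = [0.0] * n_params
        self.best_feedback = float("-inf")
        self.candidate = list(self.best)

    def get_next_parameters(self) -> List[float]:
        # Perturb the best parameters found so far with Gaussian noise.
        self.candidate = [p + self.rng.gauss(0.0, self.sigma) for p in self.best]
        return self.candidate

    def set_evaluation_feedback(self, feedback: float) -> None:
        # Keep the candidate only if it performed better (higher is better).
        if feedback > self.best_feedback:
            self.best_feedback = feedback
            self.best = list(self.candidate)


class BlackBoxSearchSketch:
    """Hypothetical search: connects an optimizer and a behavior."""

    def __init__(self, behavior, optimizer) -> None:
        self.behavior = behavior
        self.optimizer = optimizer

    def get_next_behavior(self):
        self.behavior.set_params(self.optimizer.get_next_parameters())
        return self.behavior

    def set_evaluation_feedback(self, feedback: float) -> None:
        self.optimizer.set_evaluation_feedback(feedback)


def evaluate(behavior: LinearBehavior) -> float:
    """Hypothetical environment: rewards mapping x to 2 * x0 - x1."""
    samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    return -sum((behavior.act(x) - (2 * x[0] - x[1])) ** 2 for x in samples)


behavior = LinearBehavior(n_inputs=2)
optimizer = RandomSearchOptimizer(behavior.get_n_params())
search = BlackBoxSearchSketch(behavior, optimizer)
for episode in range(500):
    candidate = search.get_next_behavior()
    search.set_evaluation_feedback(evaluate(candidate))

print(optimizer.best, optimizer.best_feedback)
```

Because `BlackBoxSearchSketch` only forwards parameter vectors and feedback values, any optimizer that provides the same two methods could be swapped in without touching the behavior.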
MonteCarloRL is an epsilon-soft on-policy Monte Carlo control
algorithm. It is a model-free reinforcement learning method. The
epsilon-greedy policy that is used during the learning process is derived
from the state-action value function Q. Q is estimated from the experience of
previous episodes. This implementation can only handle discrete state and
action spaces.
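The sketch below illustrates this kind of algorithm on a hypothetical five-state chain with two actions. It is not BOLeRo's MonteCarloRL implementation. The environment, the function names, the first-visit averaging, the discount `gamma=1.0`, and the hyperparameters are assumptions chosen for illustration. Q is stored in a table indexed by (state, action), which is why only discrete state and action spaces are handled.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical environment: reward -1 per step until state 4 is reached."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    return next_state, -1.0, next_state == N_STATES - 1


def epsilon_greedy(Q, state: int, epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy (hence epsilon-soft) action selection derived from Q."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])  # random tie-break


def monte_carlo_control(n_episodes=2000, epsilon=0.1, gamma=1.0, max_steps=50, seed=0):
    rng = random.Random(seed)
    Q: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for _ in range(n_episodes):
        # 1) Generate an episode with the current epsilon-greedy policy (on-policy).
        episode: List[Tuple[int, int, float]] = []
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = next_state
            if done:
                break
        # 2) First-visit Monte Carlo: average the return that follows the
        #    first occurrence of each (state, action) pair in the episode.
        first_visit: Dict[Tuple[int, int], int] = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if first_visit[(s, a)] == t:
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]  # incremental mean
        # 3) Policy improvement is implicit: the next episode acts
        #    epsilon-greedily with respect to the updated Q.
    return Q


Q = monte_carlo_control()
greedy_policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(greedy_policy)
```

Since the policy stays epsilon-soft, every action keeps a nonzero probability, so Q estimates for all state-action pairs continue to receive samples.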
The following table gives an overview of the contextual behavior search methods that are provided by BOLeRo.
| Behavior search name | Use case |
|---|---|
| ContextualBlackBoxSearch | Contextual policy search; the behavior is treated as a black box |
A ContextualBlackBoxSearch combines a ContextualOptimizer and a BlackBoxBehavior for contextual policy search.
The contextual optimizer does not need to know anything about the behavior
except its number of parameters and the performance in the environment to do
contextual policy search.
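As an illustration, the following sketch implements contextual policy search with a simple reward-weighted regression (RWR) update over a linear-Gaussian upper-level policy. RWR is a swapped-in technique chosen for brevity, and BOLeRo's contextual optimizers may use different algorithms. The scalar context, the `reward` function, the class `RWRContextualOptimizer` and its methods, the weighting scheme (`beta`, range normalization), and the fixed exploration noise `sigma` are all hypothetical. The optimizer sees only the context, the single behavior parameter `theta`, and the scalar reward. This plain weighting works here because the best achievable reward is zero for every context; in general, a context-dependent baseline would typically be needed.

```python
import math
import random
from typing import List, Tuple


def reward(context: float, theta: float) -> float:
    """Hypothetical environment: for context s the best parameter is 0.5 + 2 * s."""
    return -(theta - (0.5 + 2.0 * context)) ** 2


class RWRContextualOptimizer:
    """Hypothetical contextual optimizer using reward-weighted regression.

    Upper-level policy: theta ~ N(a + b * s, sigma**2) for a scalar context s.
    """

    def __init__(self, sigma: float = 0.3, beta: float = 10.0, seed: int = 0) -> None:
        self.a, self.b = 0.0, 0.0  # linear map from context to behavior parameter
        self.sigma = sigma         # fixed exploration noise (an assumption)
        self.beta = beta           # sharpness of the reward weighting
        self.rng = random.Random(seed)

    def get_next_parameters(self, context: float) -> float:
        return self.rng.gauss(self.a + self.b * context, self.sigma)

    def update(self, batch: List[Tuple[float, float, float]]) -> None:
        """Weighted least-squares fit of theta ~ a + b * s on (context, theta, reward)."""
        rewards = [r for _, _, r in batch]
        r_max = max(rewards)
        scale = (r_max - min(rewards)) or 1.0
        # Exponentiated, range-normalized rewards; the best sample gets weight 1.
        w = [math.exp(self.beta * (r - r_max) / scale) for r in rewards]
        sw = sum(w)
        s_bar = sum(wi * s for wi, (s, _, _) in zip(w, batch)) / sw
        t_bar = sum(wi * t for wi, (_, t, _) in zip(w, batch)) / sw
        cov = sum(wi * (s - s_bar) * (t - t_bar) for wi, (s, t, _) in zip(w, batch))
        var = sum(wi * (s - s_bar) ** 2 for wi, (s, _, _) in zip(w, batch))
        if var > 1e-12:
            self.b = cov / var
        self.a = t_bar - self.b * s_bar


env_rng = random.Random(1)
optimizer = RWRContextualOptimizer()
for iteration in range(50):
    batch = []
    for _ in range(50):
        s = env_rng.uniform(-1.0, 1.0)             # context provided by the environment
        theta = optimizer.get_next_parameters(s)  # parameter for the black-box behavior
        batch.append((s, theta, reward(s, theta)))
    optimizer.update(batch)

print(optimizer.a, optimizer.b)
```

The learned pair `(a, b)` is a policy over behavior parameters rather than a single parameter vector, so a new context can be mapped to parameters without further search.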