bolero.controller.Controller

class bolero.controller.Controller(config=None, environment=None, behavior_search=None, **kwargs)[source]

A controller implements the communication between learning components.

Controllers organize communication between Environment and BehaviorSearch. The code should neither depend on the environment nor on the behavior search algorithm so that we can reuse a controller for as many scenarios as possible.
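This mediation can be sketched with minimal stand-ins for the two components. All classes and method names below are illustrative stubs, not bolero's actual interfaces; the point is only the communication pattern the controller implements:

```python
# Minimal sketch of how a controller mediates between an environment and a
# behavior search.  All classes and names here are illustrative stand-ins,
# NOT bolero's actual interfaces.

class StubEnvironment:
    """Toy one-step environment: feedback is the negative squared input."""
    def reset(self):
        self.done = False
    def set_inputs(self, values):
        self.feedback = -values[0] ** 2
        self.done = True
    def is_evaluation_done(self):
        return self.done
    def get_feedback(self):
        return [self.feedback]

class StubBehaviorSearch:
    """Toy search: tries the candidates 2.0, 1.0, 0.0 in turn."""
    def __init__(self):
        self.candidates = [2.0, 1.0, 0.0]
        self.i = -1
    def get_next_behavior(self):
        self.i += 1
        # A "behavior" here is just a callable emitting one control value.
        return lambda: [self.candidates[self.i]]
    def set_evaluation_feedback(self, feedbacks):
        self.last_return = sum(feedbacks)

def run_episode(environment, behavior_search):
    """What a controller's episode does, reduced to its essentials:
    it knows neither component's internals, only their interfaces."""
    behavior = behavior_search.get_next_behavior()
    environment.reset()
    feedbacks = []
    while not environment.is_evaluation_done():
        environment.set_inputs(behavior())
        feedbacks.extend(environment.get_feedback())
    behavior_search.set_evaluation_feedback(feedbacks)
    return sum(feedbacks)

env, bs = StubEnvironment(), StubBehaviorSearch()
returns = [run_episode(env, bs) for _ in range(3)]
print(returns)
```

Because `run_episode` touches only the two interfaces, the same loop works for any environment/behavior-search pair, which is exactly the reuse the paragraph above describes.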

The controller subsection of the configuration dictionary may contain the following parameters:

  • n_episodes (int) - number of episodes that will be executed by learn()
  • record_inputs (bool) - store control signal trajectories (outputs of behaviors) of each episode in self.inputs_
  • record_outputs (bool) - store outputs of environment (inputs for behaviors) for each episode in self.outputs_
  • accumulate_feedbacks (bool) - store only the sum of each episode's feedbacks (so an episode yields a scalar return) instead of all individual feedbacks
  • record_contexts (bool) - store context vectors of each episode in self.contexts_ (only available for contextual environments)
  • n_episodes_before_test (int) - the upper-level policy will be evaluated after every n_episodes_before_test episodes
  • finish_after_convergence (bool) - finish learning as soon as either the environment or the behavior search reports convergence, even if the maximum number of episodes has not been reached yet
  • verbose (bool) - print information to stdout
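A configuration dictionary carrying these parameters in the 'Controller' subsection might look as follows; all values are illustrative:

```python
# Illustrative configuration dictionary for a Controller.  The 'Controller'
# subsection carries the parameters listed above; the values are examples.
config = {
    "Controller": {
        "n_episodes": 200,              # run learn() for 200 episodes
        "record_inputs": True,          # keep behavior outputs in inputs_
        "record_outputs": False,        # discard environment outputs
        "accumulate_feedbacks": True,   # each episode yields a scalar return
        "n_episodes_before_test": 50,   # evaluate every 50 episodes
        "finish_after_convergence": False,
        "verbose": True,
    }
}
print(sorted(config["Controller"]))
```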
Parameters:
config : dict

Configuration dictionary for the controller. The environment and the behavior search can either be specified in this dictionary or passed as arguments. In addition, parameters that configure the controller can be passed here in the 'Controller' subsection.

environment : Environment

Environment in which we will execute behaviors and learn

behavior_search : BehaviorSearch, optional (default: None)

Behavior search that evolves the behavior in the environment

kwargs : dict

Additional controller parameters

__init__(config=None, environment=None, behavior_search=None, **kwargs)[source]
episode(meta_parameter_keys=(), meta_parameters=())[source]

Execute one learning episode.

Parameters:
meta_parameter_keys : array-like, shape = (n_meta_parameters,)

Meta parameter keys

meta_parameters : array-like, shape = (n_meta_parameters,)

Meta parameter values

Returns:
accumulated_feedback : float or array-like, shape = (n_feedbacks,)

Feedback(s) of the episode
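Whether episode() returns a scalar or the full feedback array is governed by the accumulate_feedbacks option described above; the reduction itself is just a sum. A minimal sketch (not bolero's code):

```python
import numpy as np

def accumulate(feedbacks, accumulate_feedbacks=True):
    """Sketch of the episode-return reduction, not bolero's implementation.

    With accumulation the episode return is a single scalar; otherwise the
    per-step feedbacks are passed through unchanged."""
    feedbacks = np.asarray(feedbacks, dtype=float)
    return float(np.sum(feedbacks)) if accumulate_feedbacks else feedbacks

print(accumulate([1.0, -0.5, 2.0]))         # scalar return
print(accumulate([1.0, -0.5, 2.0], False))  # per-step feedbacks
```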

episode_with(behavior, meta_parameter_keys=[], meta_parameters=[], record=True)[source]

Execute a behavior in the environment.

Parameters:
behavior : Behavior

Fixed behavior that will be executed in the environment

meta_parameter_keys : list, optional (default: [])

Meta parameter keys

meta_parameters : list, optional (default: [])

Meta parameter values

record : bool, optional (default: True)

If activated, record the feedbacks and trajectories of this episode

Returns:
feedbacks : array, shape (n_steps,)

Feedback for each step in the environment

get_args()

Get parameters for this estimator.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

learn(meta_parameter_keys=(), meta_parameters=())[source]

Learn the behavior.

Parameters:
meta_parameter_keys : list

Meta parameter keys

meta_parameters : list

Meta parameter values

Returns:
feedback_history : array, shape (n_episodes or less, dim_feedback)

Feedbacks for each episode. If is_behavior_learning_done is True before n_episodes episodes have been executed, feedback_history is shorter than n_episodes.
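The early-stopping behavior described above can be sketched as a loop that checks a convergence hook before each episode. The loop below is an illustrative stand-in, not bolero's implementation; run_one_episode and the convergence check are toy components:

```python
# Sketch of learn()'s main loop with finish_after_convergence: stop as soon
# as convergence is reported, even before n_episodes is reached.
# run_one_episode and is_behavior_learning_done are illustrative stand-ins.

def learn(n_episodes, run_one_episode, is_behavior_learning_done,
          finish_after_convergence=True):
    feedback_history = []
    for _ in range(n_episodes):
        if finish_after_convergence and is_behavior_learning_done():
            break  # converged early: feedback_history stays short
        feedback_history.append(run_one_episode())
    return feedback_history

# Toy components: each episode returns a decreasing cost; we call the
# search "converged" once the latest return drops below 0.1.
returns = []
def run_one_episode():
    returns.append(1.0 / (len(returns) + 1))
    return returns[-1]
def is_behavior_learning_done():
    return bool(returns) and returns[-1] < 0.1

history = learn(100, run_one_episode, is_behavior_learning_done)
print(len(history))  # fewer than the requested 100 episodes
```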

trajectories_

Inputs to the environment (outputs of the behavior)