bolero.optimizer.ContextualOptimizer

class bolero.optimizer.ContextualOptimizer[source]

Common interface for (contextual) optimizers.

This is a simple derivative-free parameter optimizer.

__init__()

x.__init__(…) initializes x; see help(type(x)) for signature

best_policy()[source]

Return current best estimate of contextual policy.

Returns:
policy : UpperLevelPolicy

Best estimate of upper-level policy

get_args()

Get parameters for this estimator.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_desired_context()[source]

Choose the desired context for the next evaluation.

Returns:
context : ndarray-like, default=None

The context in which the next rollout shall be performed. If None, the environment may select the next context without any preferences.

get_next_parameters(params)[source]

Get next individual/parameter vector for evaluation.

Parameters:
params : array_like, shape (n_params,)

Parameter vector; it is modified in place to hold the next parameters to evaluate
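The note that the vector "will be modified" suggests the new parameters are written into the caller's buffer rather than returned. A minimal sketch of that calling convention, assuming a hypothetical `ToyOptimizer` that only mimics the method signature and is not part of bolero:

```python
import random


class ToyOptimizer:
    """Hypothetical stand-in that mimics get_next_parameters(); not part of bolero."""

    def __init__(self, n_params, seed=0):
        self.n_params = n_params
        self.rng = random.Random(seed)

    def get_next_parameters(self, params):
        # Write the new sample into the caller's buffer instead of returning it.
        for i in range(self.n_params):
            params[i] = self.rng.gauss(0.0, 1.0)


opt = ToyOptimizer(n_params=3)
params = [0.0] * 3               # buffer owned by the caller
result = opt.get_next_parameters(params)
# `params` now holds the sample; `result` is None because nothing is returned.
```

Reusing one buffer across iterations avoids allocating a new vector per rollout, at the cost that the caller must copy `params` if an earlier sample is still needed.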

init(n_params, n_context_dims)[source]

Initialize optimizer.

Parameters:
n_params : int

Dimension of the parameter vector

n_context_dims : int

Number of dimensions of the context space

is_behavior_learning_done()[source]

Check if the optimization is finished.

Returns:
finished : bool

True if the learning of a behavior is finished

set_context(context)[source]

Set context of next evaluation.

Note that the context set here is not necessarily the one that was requested by get_desired_context().

Parameters:
context : array-like, shape (n_context_dims,)

The context in which the next rollout will be performed
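The exchange between get_desired_context() and set_context() can be pictured as a request followed by a confirmation. The sketch below is illustrative only: `ToyContextualOptimizer` and `environment_context` are hypothetical names that mimic the two methods and are not part of bolero.

```python
class ToyContextualOptimizer:
    """Hypothetical stand-in mimicking the two context methods; not part of bolero."""

    def __init__(self):
        self.context = None

    def get_desired_context(self):
        # No preference: let the environment pick the context.
        return None

    def set_context(self, context):
        # Store the context that will actually be used for the next rollout.
        self.context = list(context)


def environment_context(desired, fallback):
    """Hypothetical environment logic: honour a request, else use its own choice."""
    return fallback if desired is None else desired


opt = ToyContextualOptimizer()
desired = opt.get_desired_context()                      # request (may be None)
actual = environment_context(desired, fallback=[0.5, -1.0])
opt.set_context(actual)                                  # confirmation
```

Because the environment has the final say, an optimizer should base its update on the context passed to set_context(), not on the one it requested.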

set_evaluation_feedback(rewards)[source]

Set feedback for the parameter vector.

Parameters:
rewards : list of float

Feedback for each step or for the whole episode, depending on the problem
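Taken together, the methods suggest a learning loop along the following lines. This is a sketch under stated assumptions rather than bolero's own controller: `RandomSearchOptimizer` and `rollout` are hypothetical, the toy sums the rewards into an episode return, and its best_policy() returns a plain function instead of an UpperLevelPolicy.

```python
import random


class RandomSearchOptimizer:
    """Hypothetical toy optimizer following the interface above; not part of bolero."""

    def __init__(self, n_episodes=50, seed=0):
        self.n_episodes = n_episodes
        self.rng = random.Random(seed)

    def init(self, n_params, n_context_dims):
        self.n_params = n_params
        self.n_context_dims = n_context_dims
        self.best_params = [0.0] * n_params
        self.best_return = float("-inf")
        self.current = list(self.best_params)
        self.context = None
        self.episode = 0

    def get_desired_context(self):
        return None  # no preference

    def set_context(self, context):
        self.context = list(context)

    def get_next_parameters(self, params):
        # Perturb the best known vector and write it into the caller's buffer.
        for i in range(self.n_params):
            self.current[i] = self.best_params[i] + self.rng.gauss(0.0, 0.5)
            params[i] = self.current[i]

    def set_evaluation_feedback(self, rewards):
        # Assumption of this toy: the episode return is the sum of the rewards.
        episode_return = sum(rewards)
        if episode_return > self.best_return:
            self.best_return = episode_return
            self.best_params = list(self.current)
        self.episode += 1

    def is_behavior_learning_done(self):
        return self.episode >= self.n_episodes

    def best_policy(self):
        best = list(self.best_params)
        return lambda context: best  # this toy ignores the context


def rollout(params, context):
    """Hypothetical one-step problem: reward peaks when params equal the context."""
    return [-sum((p - c) ** 2 for p, c in zip(params, context))]


opt = RandomSearchOptimizer(n_episodes=200)
opt.init(n_params=2, n_context_dims=2)
params = [0.0, 0.0]
context = [1.0, -1.0]
while not opt.is_behavior_learning_done():
    opt.get_desired_context()          # toy returns None, so we choose the context
    opt.set_context(context)
    opt.get_next_parameters(params)    # fills `params` in place
    opt.set_evaluation_feedback(rollout(params, context))
policy = opt.best_policy()
```

The call order (context, parameters, feedback) mirrors the method descriptions above; a real contextual optimizer would additionally use the stored context when updating its policy.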