
bolero.optimizer.CCMAESOptimizer

class bolero.optimizer.CCMAESOptimizer(initial_params=None, variance=1.0, covariance=None, n_samples_per_update=None, context_features=None, baseline_degree=2, gamma=0.0001, log_to_file=False, log_to_stdout=False, random_state=None, **kwargs)[source]

Contextual Covariance Matrix Adaptation Evolution Strategy.

This contextual version of CMAESOptimizer inherits the properties of the original algorithm. More information on the algorithm can be found in the original publication [1]. A major advantage over C-REPS is that it adapts the step size quickly.

C-CMA-ES internally models the context-dependent baseline with polynomial ridge regression. The degree of the polynomial can be configured. The regularization coefficient is shared with the upper-level policy. Context features are only used for the upper-level policy.
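
A minimal construction sketch, assuming a five-dimensional parameter vector and the default context features; all keyword arguments used here are documented under Parameters below:

    import numpy as np
    from bolero.optimizer import CCMAESOptimizer

    # Quadratic reward baseline and shared regularization for the
    # baseline and the upper-level policy.
    opt = CCMAESOptimizer(initial_params=np.zeros(5), variance=0.1,
                          baseline_degree=2, gamma=1e-4)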

Parameters:
initial_params : array-like, shape (n_params,)

Initial parameter vector.

variance : float, optional (default: 1.0)

Initial exploration variance.

covariance : array-like, optional (default: None)

Either a diagonal (with shape (n_params,)) or a full covariance matrix (with shape (n_params, n_params)). A full covariance can contain information about the correlation of variables.

n_samples_per_update : int, optional

Number of samples that will be used to update the policy. Defaults to 4 + int(3 * log(n_params + n_context_dims)) * (1 + 2 * n_context_dims), as shown in the sketch after this parameter list.

context_features : string or callable, optional (default: None)

(Nonlinear) feature transformation for the context, which will be used to learn the upper-level policy.

baseline_degree : int, optional (default: 2)

Degree of the polynomial features that will be used to estimate the context-dependent reward baseline.

gamma : float, optional (default: 1e-4)

Regularization parameter for baseline and upper-level policy.

log_to_file : bool or string, optional (default: False)

Log results to the given file; it will be located in $BL_LOG_PATH.

log_to_stdout : bool, optional (default: False)

Log to standard output.

random_state : int, optional

Seed for the random number generator.
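
As an illustration of the default population size stated for n_samples_per_update above, a small sketch (the function name is only a placeholder):

    import numpy as np

    def default_n_samples_per_update(n_params, n_context_dims):
        # Default stated above:
        # 4 + int(3*log(n_params + n_context_dims)) * (1 + 2 * n_context_dims)
        return 4 + int(3 * np.log(n_params + n_context_dims)) * (1 + 2 * n_context_dims)

    print(default_n_samples_per_update(10, 2))  # 4 + int(3*log(12)) * 5 = 39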

References

[1] Abdolmaleki, A.; Price, B.; Lau, N.; Reis, L. P.; Neumann, G. Contextual Covariance Matrix Adaptation Evolution Strategies.
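
A usage sketch that ties the methods documented below together; the uniform context distribution and the quadratic objective are assumptions made purely for illustration:

    import numpy as np
    from bolero.optimizer import CCMAESOptimizer

    n_params, n_context_dims = 3, 1
    opt = CCMAESOptimizer(initial_params=np.zeros(n_params), variance=1.0,
                          random_state=0)
    opt.init(n_params, n_context_dims)

    rng = np.random.RandomState(0)
    params = np.empty(n_params)
    for _ in range(1000):
        # get_desired_context() returns None, so the context is drawn
        # by the environment (here: uniformly at random).
        context = rng.uniform(-1.0, 1.0, n_context_dims)
        opt.set_context(context)
        opt.get_next_parameters(params)  # fills params in place
        reward = -np.sum((params - context) ** 2)  # hypothetical objective
        opt.set_evaluation_feedback([reward])

    policy = opt.best_policy()  # best estimate of the upper-level policy
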
__init__(initial_params=None, variance=1.0, covariance=None, n_samples_per_update=None, context_features=None, baseline_degree=2, gamma=0.0001, log_to_file=False, log_to_stdout=False, random_state=None, **kwargs)[source]
best_policy()[source]

Return current best estimate of contextual policy.

Returns:
policy : UpperLevelPolicy

Best estimate of upper-level policy
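
A short query sketch; calling the returned policy with a context as below assumes the UpperLevelPolicy interface is callable, which is not stated in this section:

    policy = opt.best_policy()
    context = np.array([0.5])
    # Assumption: the policy can be evaluated for a context; explore=False
    # would return the mean parameters without exploration noise.
    mean_params = policy(context, explore=False)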

get_args()

Get parameters for this estimator.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_desired_context()[source]

C-CMA-ES does not actively select the context.

Returns:
context : None

C-CMA-ES does not have any preference

get_next_parameters(params, explore=True)[source]

Get next individual/parameter vector for evaluation.

Parameters:
params : array_like, shape (n_params,)

Parameter vector; it will be modified in place.

explore : bool, optional (default: True)

Whether exploration should be turned on for the next evaluation.

init(n_params, n_context_dims)[source]

Initialize optimizer.

Parameters:
n_params : int

Number of parameters.

n_context_dims : int

Number of dimensions of the context space.

is_behavior_learning_done()[source]

Check if the optimization is finished.

Returns:
finished : bool

Is the learning of a behavior finished?

set_context(context)[source]

Set context of next evaluation.

Parameters:
context : array-like, shape (n_context_dims,)

The context in which the next rollout will be performed

set_evaluation_feedback(rewards)[source]

Set feedback values for the parameter vector.

Parameters:
rewards : list of float

Feedback values for each step or for the episode, depending on the problem.