`bolero.optimizer`.CCMAESOptimizer

class bolero.optimizer.CCMAESOptimizer(initial_params=None, variance=1.0, covariance=None, n_samples_per_update=None, context_features=None, baseline_degree=2, gamma=0.0001, log_to_file=False, log_to_stdout=False, random_state=None, **kwargs)

Contextual Covariance Matrix Adaptation Evolution Strategy.

This contextual version of CMAESOptimizer inherits the properties of the original algorithm. More information on the algorithm can be found in the original publication [1]. A major advantage over C-REPS is that it adapts the step size quickly.

C-CMA-ES internally models the context-dependent baseline with polynomial ridge regression. The degree of the polynomial can be configured (baseline_degree). The regularization coefficient (gamma) is shared between the baseline and the upper-level policy. Context features (context_features) are only used for the upper-level policy, not for the baseline.
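To make the baseline model more concrete, the following is a minimal sketch of polynomial ridge regression over contexts, assuming only NumPy. It is not bolero's implementation: the function names `polynomial_features`, `fit_baseline` and `predict_baseline` are hypothetical, and the arguments `degree` and `gamma` merely mirror the roles of `baseline_degree` and `gamma` described above.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_features(contexts, degree):
    """Expand contexts of shape (n_samples, n_context_dims) into all
    monomials up to the given degree, including a constant column."""
    contexts = np.asarray(contexts, dtype=float)
    n_samples, n_dims = contexts.shape
    columns = [np.ones(n_samples)]  # degree-0 (bias) term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_dims), d):
            columns.append(np.prod(contexts[:, list(idx)], axis=1))
    return np.column_stack(columns)


def fit_baseline(contexts, rewards, degree=2, gamma=1e-4):
    """Ridge regression: solve (Phi^T Phi + gamma * I) w = Phi^T r."""
    Phi = polynomial_features(contexts, degree)
    A = Phi.T.dot(Phi) + gamma * np.eye(Phi.shape[1])
    b = Phi.T.dot(np.asarray(rewards, dtype=float))
    return np.linalg.solve(A, b)


def predict_baseline(weights, contexts, degree=2):
    """Evaluate the fitted baseline at the given contexts."""
    return polynomial_features(contexts, degree).dot(weights)


# Toy data (made up for illustration): the reward depends quadratically
# on a one-dimensional context.
contexts = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
rewards = 1.0 - 2.0 * contexts[:, 0] + 3.0 * contexts[:, 0] ** 2

weights = fit_baseline(contexts, rewards, degree=2, gamma=1e-4)
# Subtracting the baseline leaves the context-independent part of the reward.
advantages = rewards - predict_baseline(weights, contexts, degree=2)
```

In this toy case the fitted weights are close to the coefficients used to generate the rewards, so the advantages are close to zero. A larger `gamma` shrinks the weights and makes the fit smoother at the cost of bias.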

Parameters:

initial_params : array-like, shape (n_params,): Initial parameter vector.
variance : float, optional (default: 1.0): Initial exploration variance.
covariance : array-like, optional (default: None): Either a diagonal (with shape (n_params,)) or a full covariance matrix (with shape (n_params, n_params)). A full covariance can contain information about the correlation of variables.
n_samples_per_update : int, optional (default: None): Number of samples that will be used to update a policy. If None, the default is 4 + int(3 * log(n_params + n_context_dims)) * (1 + 2 * n_context_dims).
context_features : string or callable, optional (default: None): (Nonlinear) feature transformation for the context, which will be used to learn the upper-level policy.
baseline_degree : int, optional (default: 2): Degree of the polynomial features that will be used to estimate the context-dependent reward baseline.
gamma : float, optional (default: 1e-4): Regularization parameter for baseline and upper-level policy.
log_to_file : bool or string, optional (default: False): Log results to the given file, which will be located in $BL_LOG_PATH.
log_to_stdout : bool, optional (default: False): Log to standard output.
random_state : int, optional (default: None): Seed for the random number generator.
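As a worked example of the default for n_samples_per_update, the documented formula can be evaluated directly. This is a small sketch, assuming that `log` denotes the natural logarithm; the helper name `default_n_samples_per_update` is hypothetical and not part of bolero.

```python
import math


def default_n_samples_per_update(n_params, n_context_dims):
    """Evaluate the documented default:
    4 + int(3 * log(n_params + n_context_dims)) * (1 + 2 * n_context_dims)
    """
    # Assumption: natural logarithm.
    base = int(3 * math.log(n_params + n_context_dims))
    return 4 + base * (1 + 2 * n_context_dims)


print(default_n_samples_per_update(2, 1))   # 13
print(default_n_samples_per_update(10, 2))  # 39
```

The second factor grows linearly with the number of context dimensions, so problems with larger context spaces collect more samples before each update.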

References

[1] Abdolmaleki, A.; Price, B.; Lau, N.; Paulo Reis, L.; Neumann, G. Contextual Covariance Matrix Adaptation Evolution Strategies.

__init__(initial_params=None, variance=1.0, covariance=None, n_samples_per_update=None, context_features=None, baseline_degree=2, gamma=0.0001, log_to_file=False, log_to_stdout=False, random_state=None, **kwargs)

best_policy()

Return the current best estimate of the contextual policy.

Returns:

policy : UpperLevelPolicy: Best estimate of the upper-level policy.

get_args()

Get parameters for this estimator.

Returns:

params : mapping of string to any: Parameter names mapped to their values.

get_desired_context()

C-CMA-ES does not actively select the context.

Returns:

context : None: C-CMA-ES does not have any preference.

get_next_parameters(params, explore=True)

Get next individual/parameter vector for evaluation.

Parameters:

params : array_like, shape (n_params,): Parameter vector, which will be modified in place.
explore : bool, optional (default: True): Whether exploration is turned on for the next evaluation.

init(n_params, n_context_dims)

Initialize optimizer.

Parameters:

n_params : int: Number of parameters.
n_context_dims : int: Number of dimensions of the context space.

is_behavior_learning_done()

Check if the optimization is finished.

Returns:

finished : bool: Is the learning of a behavior finished?

set_context(context)

Set context of next evaluation.

Parameters:

context : array-like, shape (n_context_dims,): The context in which the next rollout will be performed.

set_evaluation_feedback(rewards)

Set feedback for the parameter vector.

Parameters:

rewards : list of float: Feedback for each step or for the whole episode, depending on the problem.
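The methods documented above suggest an evaluation loop of the following shape: initialize, set a context, request parameters, evaluate, and report feedback. The sketch below shows one plausible call order and assumes only NumPy. The driver `run_contextual_optimization` and the class `RandomSearchStub` are hypothetical; the stub only imitates the method names and is not C-CMA-ES, so the example runs without bolero installed.

```python
import numpy as np


def run_contextual_optimization(optimizer, objective, sample_context,
                                n_params, n_context_dims, n_episodes):
    """Drive any object that exposes the methods documented above."""
    optimizer.init(n_params, n_context_dims)
    params = np.empty(n_params)
    rewards = []
    for _ in range(n_episodes):
        if optimizer.is_behavior_learning_done():
            break
        # C-CMA-ES returns None here, so the environment picks the context.
        context = optimizer.get_desired_context()
        if context is None:
            context = sample_context()
        optimizer.set_context(context)
        optimizer.get_next_parameters(params)  # fills `params` in place
        reward = objective(params, context)
        optimizer.set_evaluation_feedback([reward])
        rewards.append(reward)
    return rewards


class RandomSearchStub:
    """Hypothetical stand-in with the same method names (NOT C-CMA-ES)."""

    def __init__(self, random_state=0):
        self.rng = np.random.RandomState(random_state)

    def init(self, n_params, n_context_dims):
        self.n_params = n_params

    def get_desired_context(self):
        return None

    def set_context(self, context):
        self.context = np.asarray(context, dtype=float)

    def get_next_parameters(self, params, explore=True):
        params[:] = self.rng.randn(self.n_params) if explore else 0.0

    def set_evaluation_feedback(self, rewards):
        self.last_return = float(np.sum(rewards))

    def is_behavior_learning_done(self):
        return False


# Made-up objective: parameters should match the (scalar) context.
def objective(params, context):
    return -float(np.sum((params - context[0]) ** 2))


context_rng = np.random.RandomState(42)
rewards = run_contextual_optimization(
    RandomSearchStub(), objective,
    sample_context=lambda: context_rng.uniform(-1.0, 1.0, size=1),
    n_params=3, n_context_dims=1, n_episodes=5)
```

With bolero installed, a CCMAESOptimizer instance could presumably be passed in place of the stub, since the driver relies only on the method names listed on this page.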
