bolero.optimizer.CREPSOptimizer
bolero.optimizer.CREPSOptimizer(initial_params=None, variance=None, covariance=None, epsilon=2.0, min_eta=1e-08, train_freq=25, n_samples_per_update=100, context_features=None, gamma=0.0001, bounds=None, log_to_file=False, log_to_stdout=False, random_state=None, **kwargs)
Contextual Relative Entropy Policy Search.
Use C-REPS as a black-box contextual optimizer: it learns an upper-level distribution \(\pi(\boldsymbol{\theta}|\boldsymbol{s})\) which selects weights \(\boldsymbol{\theta}\) for the objective function. At the moment, \(\pi(\boldsymbol{\theta}|\boldsymbol{s})\) is assumed to be a multivariate Gaussian distribution whose mean is a linear function of nonlinear features of the context \(\boldsymbol{s}\). C-REPS constrains each learning update so that the KL divergence between successive distributions stays below the threshold \(\epsilon\).
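The shape of such an upper-level distribution can be sketched in a few lines of plain Python. This is only an illustration and not the library's implementation: the quadratic feature map, the weight matrix `W`, the diagonal covariance, and the helper names `quadratic_features`, `policy_mean`, and `sample_theta` are all assumptions made for the example.

```python
import random


def quadratic_features(s):
    """Hypothetical nonlinear feature map phi(s) = [1, s, s^2] for a scalar context."""
    return [1.0, s, s * s]


def policy_mean(W, s):
    """Mean of pi(theta | s), computed as W^T phi(s); W has one row per feature."""
    phi = quadratic_features(s)
    n_params = len(W[0])
    return [sum(phi[i] * W[i][j] for i in range(len(phi)))
            for j in range(n_params)]


def sample_theta(W, std, s, rng):
    """Draw theta ~ N(W^T phi(s), diag(std^2)); diagonal covariance is assumed."""
    return [rng.gauss(m, sd) for m, sd in zip(policy_mean(W, s), std)]


# Illustrative numbers only: 3 features, 2 parameters.
W = [[0.5, -1.0],
     [2.0, 0.0],
     [0.0, 1.0]]
std = [0.1, 0.1]
rng = random.Random(0)

print(policy_mean(W, 2.0))             # [4.5, 3.0]
print(sample_theta(W, std, 2.0, rng))  # a random draw around that mean
```

The mean stays linear in `W` even though it is nonlinear in the context, which is what makes a Gaussian of this form convenient to refit from weighted samples.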
This contextual version of REPSOptimizer inherits the properties of the original algorithm. More information on the algorithm can be found in the original publication [1].
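To give a feel for what the KL bound \(\epsilon\) does, the following sketch reweights a batch of sampled returns so that the reweighted sample distribution stays within \(\epsilon\) of the uniform one. It is a deliberately simplified illustration under stated assumptions: it ignores the context (and any context-dependent baseline), it finds the temperature by bisection rather than by whatever solver the library uses, and the helpers `reps_weights`, `kl_from_uniform`, and `solve_eta` are hypothetical names. The argument names `epsilon` and `min_eta` merely echo the constructor arguments for readability.

```python
import math


def reps_weights(returns, eta):
    """Sample weights proportional to exp(R_i / eta), normalized to sum to 1."""
    r_max = max(returns)  # subtract the maximum for numerical stability
    w = [math.exp((r - r_max) / eta) for r in returns]
    total = sum(w)
    return [wi / total for wi in w]


def kl_from_uniform(p):
    """KL(p || uniform) over the sample set: sum_i p_i * log(N * p_i)."""
    n = len(p)
    return sum(pi * math.log(n * pi) for pi in p if pi > 0.0)


def solve_eta(returns, epsilon, min_eta=1e-8, max_eta=1e8, n_iter=200):
    """Bisect in log space for the smallest eta whose weights keep KL <= epsilon."""
    lo, hi = math.log(min_eta), math.log(max_eta)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if kl_from_uniform(reps_weights(returns, math.exp(mid))) > epsilon:
            lo = mid  # update too greedy: raise the temperature
        else:
            hi = mid  # within the bound: try a lower temperature
    return math.exp(hi)


# Illustrative returns for 100 samples, best around the middle of the batch.
returns = [-((i - 50) / 10.0) ** 2 for i in range(100)]
epsilon = 2.0
eta = solve_eta(returns, epsilon)
weights = reps_weights(returns, eta)
print(eta, kl_from_uniform(weights))
```

A small \(\epsilon\) forces a high temperature and nearly uniform weights (a cautious update), while a large \(\epsilon\) lets the weights concentrate on the best samples.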
Parameters: |
|
---|
References
[1] Kupcsik, A.; Deisenroth, M.P.; Peters, J.; Loh, A.P.; Vadakkepat, P.; Neumann, G. Model-based contextual policy search for data-efficient generalization of robot skills. Artificial Intelligence 247, 2017.
__init__(initial_params=None, variance=None, covariance=None, epsilon=2.0, min_eta=1e-08, train_freq=25, n_samples_per_update=100, context_features=None, gamma=0.0001, bounds=None, log_to_file=False, log_to_stdout=False, random_state=None, **kwargs)
best_policy()
Return the current best estimate of the contextual policy.
Returns: |
|
---|
get_args()
Get parameters for this estimator.
Returns: |
|
---|
get_desired_context()
C-REPS does not actively select the context.
Returns: |
|
---|
get_next_parameters(params, explore=True)
Get the next individual/parameter vector for evaluation.
Parameters: |
|
---|
init(n_params, n_context_dims)
Initialize the optimizer.
Parameters: |
|
---|
is_behavior_learning_done()
Check if the optimization is finished.
Returns: |
|
---|