bolero.optimizer.REPSOptimizer

class bolero.optimizer.REPSOptimizer(initial_params=None, variance=1.0, covariance=None, epsilon=2.0, min_eta=1e-08, train_freq=25, n_samples_per_update=100, bounds=None, log_to_file=False, log_to_stdout=False, random_state=None)

Relative Entropy Policy Search (REPS) as Optimizer.
Use REPS as a black-box optimizer: learn an upper-level distribution \(\pi(\boldsymbol{\theta})\) which selects weights \(\boldsymbol{\theta}\) for the objective function. At the moment, \(\pi(\boldsymbol{\theta})\) is assumed to be a multivariate Gaussian distribution whose mean and covariance (governing exploration) are learned. REPS constrains the learning updates such that the KL divergence between the old and the new distribution stays below a threshold epsilon. More details can be found in the original publication [1].
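Sketched from the formulation in [1] (notation adapted, so treat the exact symbols as illustrative): each update maximizes the expected return of the new search distribution subject to a bound on its KL divergence from the old one,

\[\max_{\pi} \int \pi(\boldsymbol{\theta}) R(\boldsymbol{\theta}) \, d\boldsymbol{\theta} \quad \text{s.t.} \quad \int \pi(\boldsymbol{\theta}) \log \frac{\pi(\boldsymbol{\theta})}{\pi_{old}(\boldsymbol{\theta})} \, d\boldsymbol{\theta} \leq \epsilon, \qquad \int \pi(\boldsymbol{\theta}) \, d\boldsymbol{\theta} = 1.\]

In the sample-based setting this leads to reweighting the evaluated samples \(\boldsymbol{\theta}_i\) with weights proportional to \(\exp(R_i / \eta)\), where the temperature \(\eta\) (bounded below by min_eta here) is obtained from the dual problem; the Gaussian is then refit to the weighted samples.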
Abdolmaleki et al. [2] state that “the episodic REPS algorithm uses a sample based approximation of the KL-bound, which needs a lot of samples in order to be accurate. Moreover, a typical problem of REPS is that the entropy of the search distribution decreases too quickly, resulting in premature convergence.”
Parameters:

initial_params : array-like, shape (n_params,), optional (default: None)
    Initial parameter vector (mean of the search distribution).
variance : float, optional (default: 1.0)
    Initial exploration variance.
covariance : array-like, optional (default: None)
    Initial covariance of the search distribution.
epsilon : float, optional (default: 2.0)
    Maximum admissible KL divergence between the old and the new distribution.
min_eta : float, optional (default: 1e-08)
    Lower bound on the temperature eta to avoid numerical problems.
train_freq : int, optional (default: 25)
    Number of evaluated samples between distribution updates.
n_samples_per_update : int, optional (default: 100)
    Number of samples used for each distribution update.
bounds : array-like, shape (n_params, 2), optional (default: None)
    Lower and upper bounds for each parameter.
log_to_file : bool or string, optional (default: False)
log_to_stdout : bool, optional (default: False)
random_state : int or RandomState, optional (default: None)
    Seed for the random number generator.
References

[1] Peters, J.; Muelling, K.; Altun, Y. Relative Entropy Policy Search. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.

[2] Abdolmaleki, A.; Lioutikov, R.; Lau, N.; Paulo Reis, L.; Peters, J.; Neumann, G. Model-Based Relative Entropy Stochastic Search. Advances in Neural Information Processing Systems 28, 2015.
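Below is a minimal usage sketch of the black-box optimization loop described above. It assumes the standard bolero Optimizer interface, in particular init(n_params) and set_evaluation_feedback(feedbacks), which are not shown in this excerpt; the toy objective is hypothetical.

import numpy as np
from bolero.optimizer import REPSOptimizer


def objective(x):
    # Hypothetical toy objective: higher is better, optimum at (0.5, 0.5)
    return -np.sum((x - 0.5) ** 2)


n_params = 2
opt = REPSOptimizer(variance=0.1, epsilon=2.0, train_freq=25,
                    n_samples_per_update=100, random_state=0)
opt.init(n_params)  # assumed part of the bolero Optimizer interface

params = np.empty(n_params)
for _ in range(1000):
    opt.get_next_parameters(params)                    # next sample to evaluate
    opt.set_evaluation_feedback([objective(params)])   # assumed feedback method

print(opt.get_best_parameters())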
__init__(initial_params=None, variance=1.0, covariance=None, epsilon=2.0, min_eta=1e-08, train_freq=25, n_samples_per_update=100, bounds=None, log_to_file=False, log_to_stdout=False, random_state=None)

get_args()

Get parameters for this estimator.
Returns:

params : dict
    Parameter names mapped to their values.
get_best_parameters()

Get the best parameters.
Returns:

best_params : array-like, shape (n_params,)
    Best parameters found so far.
get_next_parameters(params, explore=True)

Return parameter vector that shall be evaluated next.
Parameters:

params : array-like, shape (n_params,)
    Parameter vector; will be overwritten with the next parameters to evaluate.
explore : bool, optional (default: True)
    Whether exploration in parameter space is enabled.
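Continuing the hypothetical example above, explore=False is assumed here to return the current mean of the search distribution without exploration noise, which is convenient for a deterministic evaluation of the learned parameters:

eval_params = np.empty(n_params)
opt.get_next_parameters(eval_params, explore=False)  # assumption: mean, no exploration noise
print(objective(eval_params))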