
bolero.behavior_search.MonteCarloRL

class bolero.behavior_search.MonteCarloRL(action_space, gamma=0.9, epsilon=0.1, convergence_threshold=1e-3, random_state=None)[source]

Tabular Monte Carlo is a model-free reinforcement learning method.

This implements the epsilon-soft on-policy Monte Carlo control algorithm shown on page 120 of “Reinforcement Learning: An Introduction” (Sutton and Barto, 2nd edition, http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf). Both the action space and the state space must be discrete for this implementation.
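
The epsilon-soft policy selects the greedy action with probability 1 - epsilon and a uniformly random action otherwise. A minimal sketch of this selection rule over a tabular Q-function; the helper below is purely illustrative and not part of the bolero API:

    import numpy as np

    def epsilon_greedy_action(Q, state, action_space, epsilon, random_state):
        # Illustrative only -- Q is assumed to be a dict mapping
        # (state, action) pairs to estimated returns.
        if random_state.rand() < epsilon:
            # explore: pick a uniformly random action
            return action_space[random_state.randint(len(action_space))]
        # exploit: pick the action with the highest estimated return
        values = [Q.get((state, a), 0.0) for a in action_space]
        return action_space[int(np.argmax(values))]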

Parameters:
action_space : list

Actions that the agent can select from

gamma : float, optional (default: 0.9)

Discount factor for the discounted infinite horizon model

epsilon : float, optional (default: 0.1)

Exploration probability for epsilon-greedy policy

convergence_threshold : float, optional (default: 1e-3)

Learning stops once the maximum change of the value function between iterations falls below this threshold

random_state : int or RandomState, optional (default: None)

Seed for the random number generator or RandomState object.
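
A possible end-to-end usage, sketched under the assumption that bolero's Controller drives the learning; make_discrete_environment is a hypothetical stand-in for a concrete Environment with discrete states and actions:

    from bolero.behavior_search import MonteCarloRL
    from bolero.controller import Controller

    # hypothetical factory for an Environment with discrete states
    # and actions -- replace with your own environment
    env = make_discrete_environment()

    bs = MonteCarloRL(action_space=[0, 1, 2, 3], gamma=0.9, epsilon=0.1,
                      random_state=0)
    controller = Controller(environment=env, behavior_search=bs,
                            n_episodes=1000)
    feedbacks = controller.learn()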

__init__(action_space, gamma=0.9, epsilon=0.1, convergence_threshold=1e-3, random_state=None)[source]
get_args()

Get parameters for this estimator.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_behavior_from_results(result_path)

Recover search state from file.

Parameters:
result_path : string

path in which to search for the stored search state

get_best_behavior()[source]

Return the best behavior found so far.

Returns:
behavior : Behavior

mapping from input to output
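
A sketch of executing the returned behavior for one step, assuming bolero's array-based Behavior interface (set_inputs, can_step, step, get_outputs); bs is the search object from the sketch above and the state value 0 is a placeholder:

    import numpy as np

    behavior = bs.get_best_behavior()

    inputs = np.array([0.0])  # current discrete state (placeholder)
    outputs = np.empty(1)     # will receive the selected action

    behavior.set_inputs(inputs)
    if behavior.can_step():
        behavior.step()
    behavior.get_outputs(outputs)
    action = int(outputs[0])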

get_next_behavior()[source]

Obtain next behavior for evaluation.

Returns:
behavior : Behavior

mapping from input to output

init(n_inputs, n_outputs)[source]

Initialize the behavior search.

Parameters:
n_inputs : int

number of inputs of the behavior

n_outputs : int

number of outputs of the behavior

is_behavior_learning_done()[source]

Check if the value function converged.

Returns:
finished : bool

Is the learning of a behavior finished?

set_evaluation_feedback(feedbacks)[source]

Set feedback for the last behavior.

Parameters:
feedbacks : list of float

feedback for each step or for the whole episode, depending on the problem
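
Taken together, init, get_next_behavior, set_evaluation_feedback, and is_behavior_learning_done form the loop that a Controller would otherwise run. A minimal manual sketch; run_episode is a hypothetical helper that rolls out a behavior in your environment and returns one reward per step:

    from bolero.behavior_search import MonteCarloRL

    bs = MonteCarloRL(action_space=[0, 1, 2, 3], random_state=0)
    bs.init(n_inputs=1, n_outputs=1)  # one discrete state in, one action out

    while not bs.is_behavior_learning_done():
        behavior = bs.get_next_behavior()
        feedbacks = run_episode(behavior)  # hypothetical rollout helper
        bs.set_evaluation_feedback(feedbacks)

    best = bs.get_best_behavior()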

write_results(result_path)

Store current search state.

Parameters:
result_path : string

path in which the state should be stored
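
write_results and get_behavior_from_results form a save/restore pair. A sketch of the round trip; the directory path is arbitrary and is created here so it exists before writing:

    import os

    result_path = "/tmp/mc_results"
    os.makedirs(result_path, exist_ok=True)

    bs.write_results(result_path)  # persist the current search state
    # later, e.g. in a new process:
    bs.get_behavior_from_results(result_path)  # recover the search state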