
bolero.behavior_search.MonteCarloRL

class bolero.behavior_search.MonteCarloRL(action_space, gamma=0.9, epsilon=0.1, convergence_threshold=1e-3, random_state=None)[source]

Tabular Monte Carlo is a model-free reinforcement learning method.

This implements the epsilon-soft on-policy Monte Carlo control algorithm shown on page 120 of “Reinforcement Learning: An Introduction” (Sutton and Barto, 2nd edition, http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf). Both the action space and the state space must be discrete for this implementation.
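
The epsilon-soft policy selects the greedy action with probability 1 - epsilon and a uniformly random action otherwise. A minimal sketch of this selection rule over a tabular Q-function; the helper below is purely illustrative and not part of the bolero API:

    import numpy as np

    def epsilon_greedy_action(Q, state, action_space, epsilon, random_state):
        # Illustrative only -- Q is assumed to be a dict mapping
        # (state, action) pairs to estimated returns.
        if random_state.rand() < epsilon:
            # explore: pick a uniformly random action
            return action_space[random_state.randint(len(action_space))]
        # exploit: pick the action with the highest estimated return
        values = [Q.get((state, a), 0.0) for a in action_space]
        return action_space[int(np.argmax(values))]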

Parameters:
action_space : list

Actions that the agent can select from

gamma : float, optional (default: 0.9)

Discount factor for the discounted infinite horizon model

epsilon : float, optional (default: 0.1)

Exploration probability for epsilon-greedy policy

convergence_threshold : float, optional (default: 1e-3)

Learning stops once the maximum change of the value function between iterations falls below this threshold

random_state : int or RandomState, optional (default: None)

Seed for the random number generator or RandomState object.
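
A possible end-to-end usage, sketched under the assumption that bolero's Controller drives the learning; make_discrete_environment is a hypothetical stand-in for a concrete Environment with discrete states and actions:

    from bolero.behavior_search import MonteCarloRL
    from bolero.controller import Controller

    # hypothetical factory for an Environment with discrete states
    # and actions -- replace with your own environment
    env = make_discrete_environment()

    bs = MonteCarloRL(action_space=[0, 1, 2, 3], gamma=0.9, epsilon=0.1,
                      random_state=0)
    controller = Controller(environment=env, behavior_search=bs,
                            n_episodes=1000)
    feedbacks = controller.learn()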

__init__(action_space, gamma=0.9, epsilon=0.1, convergence_threshold=1e-3, random_state=None)[source]
get_args()

Get parameters for this estimator.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_behavior_from_results(result_path)

Recover search state from file.

Parameters:
result_path : string

path in which to search for the stored search state

get_best_behavior()[source]

Return the best behavior found so far.

Returns:
behavior : Behavior

mapping from input to output
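
A sketch of executing the returned behavior for one step, assuming bolero's array-based Behavior interface (set_inputs, can_step, step, get_outputs); bs is the search object from the sketch above and the state value 0 is a placeholder:

    import numpy as np

    behavior = bs.get_best_behavior()

    inputs = np.array([0.0])  # current discrete state (placeholder)
    outputs = np.empty(1)     # will receive the selected action

    behavior.set_inputs(inputs)
    if behavior.can_step():
        behavior.step()
    behavior.get_outputs(outputs)
    action = int(outputs[0])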

get_next_behavior()[source]

Obtain next behavior for evaluation.

Returns:
behavior : Behavior

mapping from input to output

init(n_inputs, n_outputs)[source]

Initialize the behavior search.

Parameters:
n_inputs : int

number of inputs of the behavior

n_outputs : int

number of outputs of the behavior

is_behavior_learning_done()[source]

Check if the value function converged.

Returns:
finished : bool

Is the learning of a behavior finished?

set_evaluation_feedback(feedbacks)[source]

Set feedback for the last behavior.

Parameters:
feedbacks : list of float

feedback for each step or for the whole episode, depending on the problem
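
Taken together, init, get_next_behavior, set_evaluation_feedback, and is_behavior_learning_done form the loop that a Controller would otherwise run. A minimal manual sketch; run_episode is a hypothetical helper that rolls out a behavior in your environment and returns one reward per step:

    from bolero.behavior_search import MonteCarloRL

    bs = MonteCarloRL(action_space=[0, 1, 2, 3], random_state=0)
    bs.init(n_inputs=1, n_outputs=1)  # one discrete state in, one action out

    while not bs.is_behavior_learning_done():
        behavior = bs.get_next_behavior()
        feedbacks = run_episode(behavior)  # hypothetical rollout helper
        bs.set_evaluation_feedback(feedbacks)

    best = bs.get_best_behavior()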

write_results(result_path)

Store current search state.

Parameters:
result_path : string

path in which the state should be stored
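
write_results and get_behavior_from_results form a save/restore pair. A sketch of the round trip; the directory path is arbitrary and is created here so it exists before writing:

    import os

    result_path = "/tmp/mc_results"
    os.makedirs(result_path, exist_ok=True)

    bs.write_results(result_path)  # persist the current search state
    # later, e.g. in a new process:
    bs.get_behavior_from_results(result_path)  # recover the search state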