
bolero.environment.Catapult

class bolero.environment.Catapult(segments=10, catapult_pos=array([ 0., 0.]), velocity_penalty=0.1, context_distribution=None, context_interval=(2, 10), random_state=None, verbose=0)[source]

Catapult environment, a benchmark for contextual policy search.

In this benchmark problem, the agent controls a catapult that shoots at specific target positions (the contexts) on a one-dimensional surface. The agent sets the parameters of the shot (velocity and angle of the catapult), and the environment simulates the shot. The position where the object hits the ground is not communicated to the agent. Instead, the agent is told only the cost of the trial, defined as cost = -abs(hit_position - target_position) - velocity_penalty * v, where target_position is the respective context, v is the velocity of the shot, and velocity_penalty is configurable. Thus, this environment defines a contextual policy search problem.
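The cost formula above can be written out as a small function. This is a minimal sketch of the formula as stated in the description, not the library's implementation; the function name catapult_cost and the example numbers are made up for illustration.

```python
def catapult_cost(hit_position, target_position, v, velocity_penalty=0.1):
    """Cost of one shot, following the formula given in the description."""
    miss_distance = abs(hit_position - target_position)
    return -miss_distance - velocity_penalty * v


# Illustrative values: a shot that lands 0.5 short of a target at 6.0,
# launched with velocity 8.0.
print(catapult_cost(hit_position=5.5, target_position=6.0, v=8.0))
```

With this definition, a perfect hit still incurs the velocity term, so among shots that reach the target the slower one scores better.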

See also

Bruno Castro da Silva, George Konidaris, Andrew Barto, “Active Learning of Parameterized Skills”, ICML 2014

Parameters:
segments : sequence of tuples or int, optional (default: 10)

Definition of the surface onto which the catapult throws. If an integer is passed, a surface consisting of the given number of segments is created randomly. Alternatively, the segments can be explicitly provided as a sequence of pairs, e.g. [(0.0, 0.0), (2.0, 1.0), (10.0, 1.0)]. Each element of the sequence defines one point of the surface. It is assumed that the elements of the sequence are ordered according to their first component. The surface is created by linearly connecting all points of the segments sequence.

catapult_pos : array-like, shape (2,), optional (default: (0, 0))

The x and y positions at which the catapult is placed.

velocity_penalty : float>=0, optional (default: 0.1)

A factor which controls how strongly large velocities are penalized in the cost function. Larger values correspond to stronger penalties.

context_distribution : distribution from scipy.stats, optional (default: None)

The distribution from which the contexts (goal positions for the catapult) are drawn. If None is given, a uniform distribution on the interval [2, 10] based on the random_state is used.

context_interval : tuple, optional (default: (2, 10))

Interval of the target position

random_state : RandomState or int, optional (default: None)

Random number generator or seed

verbose : int, optional (default: 0)

Verbosity level
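Two of the parameters above lend themselves to a short illustration: how a segments sequence defines a surface by linear interpolation, and how a target position can be drawn uniformly from context_interval. The sketch below uses only the Python standard library. The helper names surface_height and sample_context are hypothetical, random.Random stands in for the RandomState mentioned above, and clamping outside the first and last point is an assumption rather than documented behavior.

```python
import random
from bisect import bisect_right


def surface_height(segments, x):
    """Height at x of the surface that linearly connects the given points."""
    xs = [point[0] for point in segments]
    ys = [point[1] for point in segments]
    # Assumption: outside the defined range the surface stays flat.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def sample_context(rng, context_interval=(2, 10)):
    """Draw one target position uniformly from context_interval."""
    low, high = context_interval
    return rng.uniform(low, high)


# The example surface from the segments description.
segments = [(0.0, 0.0), (2.0, 1.0), (10.0, 1.0)]
print(surface_height(segments, 1.0))  # halfway up the initial slope
print(surface_height(segments, 5.0))  # on the flat part

# A seeded generator makes the sequence of contexts reproducible.
rng = random.Random(0)
contexts = [sample_context(rng) for _ in range(5)]
print(contexts)
```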

__init__(segments=10, catapult_pos=array([ 0., 0.]), velocity_penalty=0.1, context_distribution=None, context_interval=(2, 10), random_state=None, verbose=0)[source]
get_args()

Get parameters for this environment.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_feedback(noisy=False)[source]

The reward of the last roll-out.

get_maximum_feedback(context)[source]

Returns the maximal feedback obtainable in given context.

get_num_context_dims()[source]

Get number of context dimensions.

get_num_inputs()[source]

Get number of inputs (desired state).

get_num_outputs()[source]

Get number of outputs (actual state).

get_outputs(values)[source]

Get current outputs.

is_evaluation_done()[source]

Test if time is over.

request_context(context=None)[source]

Request that a specific context is used.

Parameters:
context : ndarray, default=None

The requested context to be used in the next rollout. If None (the default), the environment selects the next context.

Returns:
context : ndarray

The actual context used in the next rollout. This environment accepts all external requests.

reset()[source]

Reset the catapult environment.

set_inputs(values)[source]

Set desired velocity and angle.

step_action()[source]

Execute step perfectly.
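To show how the methods listed above might fit together in a single rollout, here is a self-contained sketch built around a toy stand-in class. ToyCatapult is hypothetical and is not the bolero implementation: it assumes flat ground at height 0, a catapult at x = 0, no air resistance, an angle in radians, and inputs ordered as (velocity, angle). The call order is likewise an assumption inferred from the method descriptions.

```python
import math

GRAVITY = 9.81  # assumed gravitational acceleration in m/s^2


class ToyCatapult:
    """Hypothetical stand-in that mimics the documented method names."""

    def __init__(self, velocity_penalty=0.1, context_interval=(2, 10)):
        self.velocity_penalty = velocity_penalty
        self.context_interval = context_interval
        self.context = None
        self.inputs = None
        self.hit_position = None
        self.done = False

    def request_context(self, context=None):
        if context is None:
            # Assumed fallback: interval midpoint instead of a random draw.
            low, high = self.context_interval
            context = 0.5 * (low + high)
        self.context = context
        return self.context

    def reset(self):
        self.inputs = None
        self.hit_position = None
        self.done = False

    def set_inputs(self, values):
        # Assumed order: (velocity, angle in radians).
        self.inputs = tuple(values)

    def step_action(self):
        v, angle = self.inputs
        # Range of a projectile on flat ground without drag.
        self.hit_position = v ** 2 * math.sin(2.0 * angle) / GRAVITY
        self.done = True

    def is_evaluation_done(self):
        return self.done

    def get_feedback(self):
        v, _ = self.inputs
        miss_distance = abs(self.hit_position - self.context)
        return -miss_distance - self.velocity_penalty * v


env = ToyCatapult()
target = env.request_context(5.0)
env.reset()
# At 45 degrees on flat ground, this velocity lands on the target.
velocity = math.sqrt(GRAVITY * target)
env.set_inputs((velocity, math.pi / 4))
while not env.is_evaluation_done():
    env.step_action()
feedback = env.get_feedback()
print(env.hit_position, feedback)
```

In this toy setting the shot hits the target, so the feedback consists only of the velocity term. The real environment uses the configured surface and may return contexts and feedback as arrays.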