one_step

catapult

class CatapultExample(m, g_M, k_M, x_M, g_V, k_V, x_V)[source]

Bases: object

For calculating the quantities of the ‘illustrative example’ in [1]

See also

[1] F. Muratore, M. Gienger, J. Peters, “Assessing Transferability from Simulation to Reality for Reinforcement Learning”, PAMI, 2021

Constructor

est_expec_return(th, n_M, n_V)[source]

Calculate the estimated expected return for a given policy parameter.

Parameters:
  • th – policy parameter

  • n_M – number of Mars samples

  • n_V – number of Venus samples

Returns:

value of the estimated expected return

opt_est_expec_return(n_M, n_V)[source]

Calculate the optimal objective function value.

Parameters:
  • n_M – number of Mars samples

  • n_V – number of Venus samples

Returns:

optimal value of the estimated expected return

opt_policy_param(n_M, n_V)[source]

Compute the optimal policy parameter.

Parameters:
  • n_M – number of Mars samples

  • n_V – number of Venus samples

Returns:

optimal policy parameter
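The exact objective is defined in the paper; as a hypothetical sketch, suppose each planet contributes a quadratic objective and the estimator mixes the two by sample counts. Then the optimal policy parameter has a closed form (the per-planet constants `k_M`, `x_M`, `k_V`, `x_V` below are placeholder values, not the paper's):

```python
# Hypothetical sketch: assume each planet p contributes a quadratic objective
# J_p(th) = -k_p * (th - x_p)**2 and the estimator mixes them by sample counts.
def est_expec_return(th, n_M, n_V, k_M=1.0, x_M=0.5, k_V=2.0, x_V=1.5):
    """Estimated expected return as a sample-count-weighted mixture."""
    w_M = n_M / (n_M + n_V)
    w_V = n_V / (n_M + n_V)
    return w_M * (-k_M * (th - x_M) ** 2) + w_V * (-k_V * (th - x_V) ** 2)


def opt_policy_param(n_M, n_V, k_M=1.0, x_M=0.5, k_V=2.0, x_V=1.5):
    """Closed-form maximizer of the weighted quadratic mixture.

    Setting the derivative of the mixture to zero yields a weighted mean
    of the per-planet optima; the normalization by n_M + n_V cancels.
    """
    return (n_M * k_M * x_M + n_V * k_V * x_V) / (n_M * k_M + n_V * k_V)
```

With `n_V = 0` all weight is on Mars and the optimum collapses to `x_M`; mixing samples from both planets pulls the optimum between the two per-planet optima.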

class CatapultSim(max_steps: int, example_config: bool)[source]

Bases: SimEnv, Serializable

In this special environment, the action is equal to the policy parameter. Therefore, it only makes sense to use it in combination with a linear policy that has a single constant feature.

Constructor

Parameters:
  • max_steps – maximum number of simulation steps

  • example_config – configuration for the ‘illustrative example’ in the journal

property act_space

Get the space of the actions.

property domain_param: dict

Get the environment’s domain parameters. If there are none, this method should return an empty dict. The domain parameters are synonymous with the parameters used by the simulator to run the physics simulation (e.g., masses, extents, or friction coefficients). They must include all parameters that can be randomized, but there might also be additional parameters that depend on the domain parameters.

classmethod get_nominal_domain_param() dict[source]

Get the nominal (default) domain parameters.

Note

This function is used to check which domain parameters exist.

property init_space

Get the initial state space.

name: str = 'cata'

property obs_space

Get the space of the observations (agent’s perception of the environment).

render(mode: RenderMode, render_step: int = 1)[source]

Visualize one time step of the simulation. The base version prints to console when the state exceeds its boundaries.

Parameters:
  • mode – render mode: console, video, or both

  • render_step – interval for rendering

reset(init_state: Optional[ndarray] = None, domain_param: Optional[dict] = None) ndarray[source]

Reset the environment to its initial state and optionally set different domain parameters.

Parameters:
  • init_state – set explicit initial state if not None. Must match init_space if any.

  • domain_param – set explicit domain parameters if not None

Return obs:

initial observation of the state.

property state_space

Get the space of the states (used for describing the environment).

step(act: ndarray) tuple[source]

Perform one time step of the simulation or on the real-world device. When a terminal condition is met, the reset function is called.

Note

This function is responsible for limiting the actions, i.e., it has to call limit_act().

Parameters:

act – action to be taken in the step

Return obs:

current observation of the environment

Return reward:

reward depending on the selected reward function

Return done:

indicates whether the episode has ended

Return env_info:

contains diagnostic information about the environment

property task

Get the task describing what the agent should do in the environment.
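Since the action is the policy parameter and the environment is effectively one-step, a "rollout" reduces to a single `reset`/`step` pair. A minimal stand-in illustrating that interaction pattern (the class and its quadratic reward below are placeholders for illustration, not CatapultSim's actual implementation):

```python
import numpy as np


class OneStepCatapultStandIn:
    """Stand-in mimicking the one-step semantics of CatapultSim."""

    def __init__(self, max_steps: int = 1):
        self.max_steps = max_steps
        self._step_cnt = 0
        self.state = np.zeros(1)

    def reset(self, init_state=None):
        self._step_cnt = 0
        self.state = np.zeros(1) if init_state is None else np.asarray(init_state)
        return self.state

    def step(self, act: np.ndarray) -> tuple:
        self._step_cnt += 1
        self.state = np.asarray(act)  # the action directly sets the state
        reward = float(-np.sum(self.state ** 2))  # placeholder reward
        done = self._step_cnt >= self.max_steps
        return self.state, reward, done, {}


env = OneStepCatapultStandIn()
obs = env.reset()
obs, reward, done, info = env.step(np.array([0.3]))  # one step ends the episode
```

This is why the docstring restricts the policy class: with a linear policy over a single constant feature, the policy output (and hence the action) is exactly the policy parameter.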

rosenbrock

class RosenSim[source]

Bases: SimEnv, Serializable

This environment wraps the Rosenbrock function to use it as a test case for Pyrado algorithms.

Constructor

property act_space

Get the space of the actions.

property domain_param: dict

Get the environment’s domain parameters. If there are none, this method should return an empty dict. The domain parameters are synonymous with the parameters used by the simulator to run the physics simulation (e.g., masses, extents, or friction coefficients). They must include all parameters that can be randomized, but there might also be additional parameters that depend on the domain parameters.

classmethod get_nominal_domain_param() dict[source]

Get the nominal (default) domain parameters.

Note

This function is used to check which domain parameters exist.

property init_space

Get the initial state space.

name: str = 'rosen'

property obs_space

Get the space of the observations (agent’s perception of the environment).

render(mode: RenderMode, render_step: int = 1)[source]

Visualize one time step of the simulation. The base version prints to console when the state exceeds its boundaries.

Parameters:
  • mode – render mode: console, video, or both

  • render_step – interval for rendering

reset(init_state: Optional[ndarray] = None, domain_param: Optional[dict] = None) ndarray[source]

Reset the environment to its initial state and optionally set different domain parameters.

Parameters:
  • init_state – set explicit initial state if not None. Must match init_space if any.

  • domain_param – set explicit domain parameters if not None

Return obs:

initial observation of the state.

property state_space

Get the space of the states (used for describing the environment).

step(act: ndarray) tuple[source]

Perform one time step of the simulation or on the real-world device. When a terminal condition is met, the reset function is called.

Note

This function is responsible for limiting the actions, i.e., it has to call limit_act().

Parameters:

act – action to be taken in the step

Return obs:

current observation of the environment

Return reward:

reward depending on the selected reward function

Return done:

indicates whether the episode has ended

Return env_info:

contains diagnostic information about the environment

property task: OptimProxyTask

Get the task describing what the agent should do in the environment.
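The wrapped objective is the standard Rosenbrock function (assuming the usual coefficients a = 1, b = 100; an optimization-proxy task would reward its negated value):

```python
import numpy as np


def rosenbrock(x) -> float:
    """Standard n-dimensional Rosenbrock function with a=1, b=100.

    The global minimum is 0, attained at x = (1, ..., 1).
    """
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2))
```

Its long, curved, nearly flat valley makes it a classic stress test for optimizers, which is what makes it useful as a cheap test case for algorithms.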

two_dim_gaussian

class TwoDimGaussian[source]

Bases: SimEnv, Serializable

A toy model with a complex 2-dim Gaussian posterior as described in [1]. This environment can be interpreted as a zero-step environment. We use the domain parameters to capture the parameters of the Gaussian.

See also

[1] G. Papamakarios, D. Sterratt, I. Murray, “Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows”, AISTATS, 2019

Constructor

property act_space

Get the space of the actions.

static calc_constants(dp)[source]

property constants

property domain_param: dict

Get the environment’s domain parameters. If there are none, this method should return an empty dict. The domain parameters are synonymous with the parameters used by the simulator to run the physics simulation (e.g., masses, extents, or friction coefficients). They must include all parameters that can be randomized, but there might also be additional parameters that depend on the domain parameters.

classmethod get_nominal_domain_param() dict[source]

Get the nominal (default) domain parameters.

Note

This function is used to check which domain parameters exist.

property init_space

Get the initial state space.

log_prob(trajectory, params)[source]

Calculate the log-probability for a pair of states and domain parameters. The implementation is not elegant, but it can be used to compute the exact posterior probability of a rollout when that is of interest.

name: str = '2dg'

property obs_space

Get the space of the observations (agent’s perception of the environment).

render(mode: RenderMode, render_step: int = 1)[source]

Visualize one time step of the simulation. The base version prints to console when the state exceeds its boundaries.

Parameters:
  • mode – render mode: console, video, or both

  • render_step – interval for rendering

reset(init_state: Optional[ndarray] = None, domain_param: Optional[dict] = None) ndarray[source]

Resetting the environment generates the single state of interest. This environment can be interpreted as a zero-step environment, because the generated state does not depend on an action.

property state_space

Get the space of the states (used for describing the environment).

step(act: Optional[ndarray] = None) tuple[source]

Perform one time step of the simulation or on the real-world device. When a terminal condition is met, the reset function is called.

Note

This function is responsible for limiting the actions, i.e., it has to call limit_act().

Parameters:

act – action to be taken in the step

Return obs:

current observation of the environment

Return reward:

reward depending on the selected reward function

Return done:

indicates whether the episode has ended

Return env_info:

contains diagnostic information about the environment

property task: OptimProxyTask

Get the task describing what the agent should do in the environment.
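The generative model from [1] is commonly stated as follows: the five domain parameters define the mean and covariance of a 2-dim Gaussian, and one "observation" is four i.i.d. draws flattened into an 8-dim vector. A sketch under that assumption (treat the exact parameterization as an assumption about this environment, not a confirmed detail of it):

```python
import numpy as np


def simulate(theta: np.ndarray, rng=None) -> np.ndarray:
    """Draw one 8-dim observation from the 2-dim Gaussian toy model.

    theta[:2] sets the mean; theta[2], theta[3] set the marginal scales
    (squared to stay positive); tanh(theta[4]) sets the correlation.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = theta[:2]
    s1, s2 = theta[2] ** 2, theta[3] ** 2
    rho = np.tanh(theta[4])
    cov = np.array([[s1 ** 2, rho * s1 * s2],
                    [rho * s1 * s2, s2 ** 2]])
    draws = rng.multivariate_normal(mean, cov, size=4)  # four i.i.d. samples
    return draws.reshape(-1)  # flatten to an 8-dim observation


x = simulate(np.array([0.7, -2.9, -1.0, -0.9, 0.6]), rng=np.random.default_rng(0))
```

Because the scales enter squared and the correlation through tanh, several parameter settings produce the same Gaussian, which is what makes the posterior multi-modal and a good inference test case.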

Module contents