one_step
catapult
- class CatapultExample(m, g_M, k_M, x_M, g_V, k_V, x_V)[source]
Bases:
object
For calculating the quantities of the ‘illustrative example’ in [1].
See also
- [1] F. Muratore, M. Gienger, J. Peters, “Assessing Transferability from Simulation to Reality for Reinforcement Learning”, PAMI, 2021
Constructor
- est_expec_return(th, n_M, n_V)[source]
Calculate the estimated expected return for a given policy parameter and numbers of samples.
- Parameters:
th – policy parameter
n_M – number of Mars samples
n_V – number of Venus samples
- Returns:
value of the estimated expected return
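A minimal usage sketch of CatapultExample. The keyword values below (mass m, gravities g_M/g_V, spring stiffnesses k_M/k_V, and target positions x_M/x_V for Mars and Venus) are illustrative assumptions, as is the import path, which is inferred from the module layout.

```python
from pyrado.environments.one_step.catapult import CatapultExample

# All numeric values are illustrative assumptions: projectile mass, gravity,
# spring stiffness, and target position for Mars (_M) and Venus (_V)
example = CatapultExample(m=1.0, g_M=3.71, k_M=1000.0, x_M=0.5, g_V=8.87, k_V=3000.0, x_V=1.5)

# Estimate the expected return for policy parameter th, mixing
# 2 Mars samples and 8 Venus samples (arbitrary sample counts)
ret = example.est_expec_return(th=0.8, n_M=2, n_V=8)
print(ret)
```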
- class CatapultSim(max_steps: int, example_config: bool)[source]
Bases:
SimEnv, Serializable
In this special environment, the action is equal to the policy parameter. Therefore, it only makes sense to use it in combination with a linear policy that has a single constant feature.
Constructor
- Parameters:
max_steps – maximum number of simulation steps
example_config – configuration for the ‘illustrative example’ in the journal
- property act_space
Get the space of the actions.
- property domain_param: dict
Get the environment’s domain parameters. If there are none, this method should return an empty dict. The domain parameters are synonymous with the parameters used by the simulator to run the physics simulation (e.g., masses, extents, or friction coefficients). They must include all parameters that can be randomized, but there might also be additional parameters that depend on the domain parameters.
- classmethod get_nominal_domain_param() dict [source]
Get the nominal (default) domain parameters.
Note
This function is used to check which domain parameters exist.
- property init_space
Get the initial state space.
- name: str = 'cata'
- property obs_space
Get the space of the observations (agent’s perception of the environment).
- render(mode: RenderMode, render_step: int = 1)[source]
Visualize one time step of the simulation. The base version prints to console when the state exceeds its boundaries.
- Parameters:
mode – render mode: console, video, or both
render_step – interval for rendering
- reset(init_state: Optional[ndarray] = None, domain_param: Optional[dict] = None) ndarray [source]
Reset the environment to its initial state and optionally set different domain parameters.
- Parameters:
init_state – explicit initial state to set if not None; must be an element of init_space if one is defined
domain_param – set explicit domain parameters if not None
- Return obs:
initial observation of the state.
- property state_space
Get the space of the states (used for describing the environment).
- step(act: ndarray) tuple [source]
Perform one time step of the simulation or of the real-world device. When a terminal condition is met, the reset function is called.
Note
This function is responsible for limiting the actions, i.e., it has to call limit_act().
- Parameters:
act – action to be taken in the step
- Return obs:
current observation of the environment
- Return reward:
reward depending on the selected reward function
- Return done:
indicates whether the episode has ended
- Return env_info:
contains diagnostic information about the environment
- property task
Get the task describing what the agent should do in the environment.
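A minimal rollout sketch for CatapultSim. Since the action equals the policy parameter, one step completes an episode; the constructor arguments and the action value below are illustrative assumptions.

```python
import numpy as np
from pyrado.environments.one_step.catapult import CatapultSim

env = CatapultSim(max_steps=1, example_config=False)

# Inspect the nominal domain parameters, i.e. the ones that can be randomized
print(CatapultSim.get_nominal_domain_param())

# One-step rollout: the action is the policy parameter itself
obs = env.reset()
obs, reward, done, env_info = env.step(np.array([0.5]))  # arbitrary action value
```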
rosenbrock
- class RosenSim[source]
Bases:
SimEnv, Serializable
This environment wraps the Rosenbrock function to use it as a test case for Pyrado algorithms.
Constructor
- property act_space
Get the space of the actions.
- property domain_param: dict
Get the environment’s domain parameters. If there are none, this method should return an empty dict. The domain parameters are synonymous with the parameters used by the simulator to run the physics simulation (e.g., masses, extents, or friction coefficients). They must include all parameters that can be randomized, but there might also be additional parameters that depend on the domain parameters.
- classmethod get_nominal_domain_param() dict [source]
Get the nominal (default) domain parameters.
Note
This function is used to check which domain parameters exist.
- property init_space
Get the initial state space.
- name: str = 'rosen'
- property obs_space
Get the space of the observations (agent’s perception of the environment).
- render(mode: RenderMode, render_step: int = 1)[source]
Visualize one time step of the simulation. The base version prints to console when the state exceeds its boundaries.
- Parameters:
mode – render mode: console, video, or both
render_step – interval for rendering
- reset(init_state: Optional[ndarray] = None, domain_param: Optional[dict] = None) ndarray [source]
Reset the environment to its initial state and optionally set different domain parameters.
- Parameters:
init_state – explicit initial state to set if not None; must be an element of init_space if one is defined
domain_param – set explicit domain parameters if not None
- Return obs:
initial observation of the state.
- property state_space
Get the space of the states (used for describing the environment).
- step(act: ndarray) tuple [source]
Perform one time step of the simulation or of the real-world device. When a terminal condition is met, the reset function is called.
Note
This function is responsible for limiting the actions, i.e., it has to call limit_act().
- Parameters:
act – action to be taken in the step
- Return obs:
current observation of the environment
- Return reward:
reward depending on the selected reward function
- Return done:
indicates whether the episode has ended
- Return env_info:
contains diagnostic information about the environment
- property task: OptimProxyTask
Get the task describing what the agent should do in the environment.
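A minimal sketch for RosenSim, assuming the action acts as the query point at which the Rosenbrock function is evaluated, so the reward reflects the (negated) function value; the import path is inferred from the module layout.

```python
import numpy as np
from pyrado.environments.one_step.rosenbrock import RosenSim

env = RosenSim()
obs = env.reset()

# Assumption: the action is the query point. (1, 1) is the global minimum of
# the 2-dim Rosenbrock function, so the reward should be maximal there.
obs, reward, done, env_info = env.step(np.array([1.0, 1.0]))
print(reward, done)
```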
two_dim_gaussian
- class TwoDimGaussian[source]
Bases:
SimEnv, Serializable
A toy model with a complex 2-dim Gaussian posterior as described in [1]. This environment can be interpreted as a zero-step environment. We use the domain parameters to capture the parameters of the Gaussian.
See also
- [1] G. Papamakarios, D. Sterratt, I. Murray, “Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows”, AISTATS, 2019
Constructor
- property act_space
Get the space of the actions.
- property constants
- property domain_param: dict
Get the environment’s domain parameters. If there are none, this method should return an empty dict. The domain parameters are synonymous with the parameters used by the simulator to run the physics simulation (e.g., masses, extents, or friction coefficients). They must include all parameters that can be randomized, but there might also be additional parameters that depend on the domain parameters.
- classmethod get_nominal_domain_param() dict [source]
Get the nominal (default) domain parameters.
Note
This function is used to check which domain parameters exist.
- property init_space
Get the initial state space.
- log_prob(trajectory, params)[source]
Calculate the log-probability for a pair of states and domain parameters. This is admittedly inelegant, but it can be used to compute the probability of a rollout when the exact posterior probability is of interest (see the sketch at the end of this section).
- name: str = '2dg'
- property obs_space
Get the space of the observations (agent’s perception of the environment).
- render(mode: RenderMode, render_step: int = 1)[source]
Visualize one time step of the simulation. The base version prints to console when the state exceeds its boundaries.
- Parameters:
mode – render mode: console, video, or both
render_step – interval for rendering
- reset(init_state: Optional[ndarray] = None, domain_param: Optional[dict] = None) ndarray [source]
Reset the environment, generating the single state of interest. Because this state does not depend on an action, the environment can be interpreted as a zero-step environment.
- property state_space
Get the space of the states (used for describing the environment).
- step(act: Optional[ndarray] = None) tuple [source]
Perform one time step of the simulation or of the real-world device. When a terminal condition is met, the reset function is called.
Note
This function is responsible for limiting the actions, i.e., it has to call limit_act().
- Parameters:
act – action to be taken in the step
- Return obs:
current observation of the environment
- Return reward:
reward depending on the selected reward function
- Return done:
indicates whether the episode has ended
- Return env_info:
contains diagnostic information about the environment
- property task: OptimProxyTask
Get the task describing what the agent should do in the environment.
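A minimal sketch for TwoDimGaussian tying reset(), step(), and log_prob() together: one zero-step rollout is generated under the nominal domain parameters, and its exact log-posterior is then evaluated. The argument formats passed to log_prob are assumptions based on the signature above, as is the import path.

```python
from pyrado.environments.one_step.two_dim_gaussian import TwoDimGaussian

env = TwoDimGaussian()
nominal = TwoDimGaussian.get_nominal_domain_param()

# Zero-step rollout: reset() generates the single state of interest,
# and step() returns the corresponding observation (no action needed)
env.reset(domain_param=nominal)
obs, reward, done, env_info = env.step()

# Exact log-posterior of the observed state under the nominal domain
# parameters (assumed argument formats: state array, then parameter values)
log_p = env.log_prob(obs, list(nominal.values()))
print(log_p)
```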