mujoco

base

class MujocoSimEnv(model_path: str, frame_skip: int = 1, dt: Optional[float] = None, max_steps: int = inf, task_args: Optional[dict] = None)[source]

Bases: SimEnv, ABC, Serializable

Base class for MuJoCo environments. Uses Serializable to facilitate proper serialization.

Constructor

Parameters:

model_path – path to the MuJoCo xml model config file
frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction, e.g., dict(fwd_rew_weight=1.)
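
The relation between frame_skip and dt described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper name effective_dt and the example model time step of 0.002 s are assumptions made only for this sketch.

```python
from typing import Optional


def effective_dt(model_timestep: float, frame_skip: int = 1, dt: Optional[float] = None) -> float:
    """Return the time step size implied by the parameter description (hypothetical helper)."""
    if frame_skip < 1:
        raise ValueError("frame_skip must be a positive integer")
    if dt is not None:
        # An explicit dt overrides the value derived from the model config file.
        return dt
    # Default: the time step from the config file multiplied by the number of frame skips.
    return model_timestep * frame_skip


# Assumed example values, not taken from any particular model file.
default_dt = effective_dt(model_timestep=0.002, frame_skip=4)
explicit_dt = effective_dt(model_timestep=0.002, frame_skip=4, dt=0.01)
```

With these assumed numbers, holding each action for 4 frames yields a default step of 0.008 s, while the explicit value 0.01 s is returned unchanged.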

abstract property act_space: Space: Get the space of the actions.

configure_viewer()[source]: Configure the camera when the viewer is initialized. self.camera_config must be set before this method is called.

property domain_param: dict: Get the environment’s domain parameters. If there are none, this method should return an empty dict. The domain parameters are synonymous with the parameters used by the simulator to run the physics simulation (e.g., masses, extents, or friction coefficients). This must include all parameters that can be randomized, but there might also be additional parameters that depend on the domain parameters.
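
To make the idea of randomizable domain parameters concrete, here is a small sketch that perturbs a dict of nominal values. It is not the library's randomization machinery: the function sample_domain_param, the parameter names, and the numbers are all hypothetical.

```python
import random


def sample_domain_param(nominal: dict, rel_spread: float = 0.1, seed: int = 0) -> dict:
    """Draw each parameter uniformly from +/- rel_spread around its (positive) nominal value."""
    rng = random.Random(seed)
    return {
        name: rng.uniform(value * (1.0 - rel_spread), value * (1.0 + rel_spread))
        for name, value in nominal.items()
    }


# Made-up nominal values; real environments define their own names and numbers.
nominal = dict(mass=1.0, friction=0.5)
sampled = sample_domain_param(nominal, rel_spread=0.1, seed=42)
```

A dict like sampled has the same keys as the nominal one, which is the shape that the domain_param argument of reset() is documented to accept.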

property init_space: Space: Get the initial state space.

abstract property obs_space: Space: Get the space of the observations (agent’s perception of the environment).

render(mode: RenderMode = RenderMode(text=False, video=False, render=False), render_step: int = 1)[source]

Visualize one time step of the simulation. The base version prints to console when the state exceeds its boundaries.

Parameters:

mode – render mode: console, video, or both
render_step – interval for rendering

reset(init_state: Optional[ndarray] = None, domain_param: Optional[dict] = None) → ndarray[source]

Reset the environment to its initial state and optionally set different domain parameters.

Parameters:

init_state – set explicit initial state if not None. Must match init_space if any.
domain_param – set explicit domain parameters if not None

Return obs:

initial observation of the state.

abstract property state_space: Space: Get the space of the states (used for describing the environment).

step(act: ndarray) → tuple[source]

Perform one time step of the simulation or on the real-world device. When a terminal condition is met, the reset function is called.

Note

This function is responsible for limiting the actions, i.e., it has to call limit_act().

Parameters: act – action to be taken in the step
Return obs: current observation of the environment
Return reward: reward depending on the selected reward function
Return done: indicates whether the episode has ended
Return env_info: contains diagnostic information about the environment
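
The reset()/step() contract documented above can be exercised with a simple rollout loop. The sketch below uses a hypothetical stand-in class, ToyEnv, with scalar states and actions so that it runs without MuJoCo; the real environments exchange ndarrays, and the toy dynamics and reward are assumptions for illustration only.

```python
class ToyEnv:
    """Hypothetical stand-in that only mimics the documented reset()/step() signatures."""

    def __init__(self, max_steps: int = 3, act_bound: float = 1.0):
        self.max_steps = max_steps
        self.act_bound = act_bound
        self.state = 0.0
        self.curr_step = 0

    def limit_act(self, act: float) -> float:
        # Clip the action to the admissible range before applying it.
        return max(-self.act_bound, min(self.act_bound, act))

    def reset(self, init_state=None, domain_param=None) -> float:
        # domain_param is accepted for signature compatibility but ignored by this toy.
        self.state = 0.0 if init_state is None else float(init_state)
        self.curr_step = 0
        return self.state

    def step(self, act: float) -> tuple:
        act = self.limit_act(act)  # step() is responsible for limiting the action
        self.state += act  # trivial integrator standing in for the physics
        self.curr_step += 1
        reward = -abs(self.state)
        done = self.curr_step >= self.max_steps
        env_info = dict(t=self.curr_step)
        return self.state, reward, done, env_info


def rollout(env, policy) -> float:
    """Run one episode and return the sum of rewards."""
    obs = env.reset()
    done, ret = False, 0.0
    while not done:
        obs, rew, done, env_info = env.step(policy(obs))
        ret += rew
    return ret


# The policy always requests 2.0, which limit_act() clips to 1.0.
total = rollout(ToyEnv(max_steps=3), policy=lambda obs: 2.0)
```

The toy does not reproduce the documented behavior of calling the reset function when a terminal condition is met; it only shows the order of calls and the four-element return value.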

property task: Task: Get the task describing what the agent should do in the environment.

openai_ant

class AntSim(frame_skip: int = 5, dt: Optional[float] = None, max_steps: Optional[int] = 1000, task_args: Optional[dict] = None)[source]

Bases: MujocoSimEnv, Serializable

The Ant (v3) MuJoCo simulation environment where a four-legged creature walks as fast as possible.

Note

The OpenAI Gym variant considers this task solved at a reward over 6000 (https://github.com/openai/gym/blob/master/gym/envs/__init__.py).

Constructor

Parameters:

frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction, e.g., dict(fwd_rew_weight=1.)

property act_space: Space: Get the space of the actions.

property contact_forces

classmethod get_nominal_domain_param() → dict[source]: Get the nominal a.k.a. default domain parameters.

Note

This function is used to check which domain parameters exist.
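
One way to read the note above is that the nominal dict acts as the list of valid parameter names. The following sketch shows such a check; check_domain_param and the example parameters are hypothetical and do not claim to match how the library performs the check internally.

```python
def check_domain_param(nominal: dict, candidate: dict) -> dict:
    """Reject unknown parameter names, then overlay the candidate values on the nominal ones."""
    unknown = set(candidate) - set(nominal)
    if unknown:
        raise KeyError(f"Unknown domain parameters: {sorted(unknown)}")
    merged = dict(nominal)
    merged.update(candidate)
    return merged


# Made-up nominal values standing in for the result of get_nominal_domain_param().
nominal = dict(mass=1.0, friction=0.5)
merged = check_domain_param(nominal, dict(friction=0.8))
```

Passing a name that is absent from the nominal dict, such as dict(gravity=9.81) in this made-up example, raises a KeyError instead of silently adding a new parameter.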

name: str = 'ant'

property obs_space: Space: Get the space of the observations (agent’s perception of the environment).

observe(state: ndarray) → ndarray[source]

Compute the (noise-free) observation from the current state.

Note

This method should be overwritten if the environment has a distinct observation space.

Parameters: state – current state of the environment
Returns: observation perceived by the agent
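
As a sketch of what the note on observe() means, the snippet below contrasts a default that returns the state unchanged with a subclass whose observation omits one state entry. Both classes are hypothetical, use plain lists instead of ndarrays, and the choice of dropping the first entry is an assumption for illustration, not a statement about any environment on this page.

```python
class FullStateEnv:
    """Hypothetical base: the observation is a copy of the state."""

    def observe(self, state: list) -> list:
        return list(state)


class PartialObsEnv(FullStateEnv):
    """Hypothetical subclass with a distinct observation space."""

    def observe(self, state: list) -> list:
        # Hide the first state entry (e.g., an absolute position) from the agent.
        return list(state[1:])


state = [0.5, 1.0, 2.0]
full_obs = FullStateEnv().observe(state)
partial_obs = PartialObsEnv().observe(state)
```

When observe() is overridden like this, obs_space has to describe the reduced observation, while state_space keeps describing the full state.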

property state_space: Space: Get the space of the states (used for describing the environment).

openai_half_cheetah

class HalfCheetahSim(frame_skip: int = 5, dt: Optional[float] = None, max_steps: int = 1000, task_args: Optional[dict] = None)[source]

Bases: MujocoSimEnv, Serializable

The Half-Cheetah (v3) MuJoCo simulation environment where a planar cheetah-like robot tries to run forward.

Note

The OpenAI Gym variant considers this task solved at a reward over 4800 (https://github.com/openai/gym/blob/master/gym/envs/__init__.py).

Constructor

Parameters:

frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction, e.g., dict(fwd_rew_weight=1.)

property act_space: Space: Get the space of the actions.

classmethod get_nominal_domain_param() → dict[source]: Get the nominal a.k.a. default domain parameters.

Note

This function is used to check which domain parameters exist.

name: str = 'cth'

property obs_space: Space: Get the space of the observations (agent’s perception of the environment).

observe(state: ndarray) → ndarray[source]

Compute the (noise-free) observation from the current state.

Note

This method should be overwritten if the environment has a distinct observation space.

Parameters: state – current state of the environment
Returns: observation perceived by the agent

property state_space: Space: Get the space of the states (used for describing the environment).

openai_hopper

class HopperSim(frame_skip: int = 4, dt: Optional[float] = None, max_steps: int = 1000, task_args: Optional[dict] = None)[source]

Bases: MujocoSimEnv, Serializable

The Hopper (v3) MuJoCo simulation environment where a planar simplified one-legged robot tries to run forward.

Note

The OpenAI Gym variant considers this task solved at a reward over 3800 (https://github.com/openai/gym/blob/master/gym/envs/__init__.py).

Note

In contrast to the OpenAI Gym MuJoCo environments, Pyrado enables the randomization of the hopper’s “healthy” state range. Moreover, the state space is constrained to this healthy range. In the original environment, terminate_when_unhealthy is True by default.

Constructor

Parameters:

frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction, e.g., dict(fwd_rew_weight=1.)

property act_space: Space: Get the space of the actions.

classmethod get_nominal_domain_param() → dict[source]: Get the nominal a.k.a. default domain parameters.

Note

This function is used to check which domain parameters exist.

name: str = 'hop'

property obs_space: Space: Get the space of the observations (agent’s perception of the environment).

observe(state: ndarray) → ndarray[source]

Compute the (noise-free) observation from the current state.

Note

This method should be overwritten if the environment has a distinct observation space.

Parameters: state – current state of the environment
Returns: observation perceived by the agent

property state_space: Space: Get the space of the states (used for describing the environment).

openai_humanoid

class HumanoidSim(frame_skip: int = 5, dt: Optional[float] = None, max_steps: Optional[int] = 1000, task_args: Optional[dict] = None)[source]

Bases: MujocoSimEnv, Serializable

The Humanoid (v3) MuJoCo simulation environment where a humanoid robot tries to run forward.

Constructor

Parameters:

frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction, e.g., dict(fwd_rew_weight=1.)

property act_space: Space: Get the space of the actions.

classmethod get_nominal_domain_param() → dict[source]: Get the nominal a.k.a. default domain parameters.

Note

This function is used to check which domain parameters exist.

name: str = 'hum'

property obs_space: Space: Get the space of the observations (agent’s perception of the environment).

observe(state: ndarray) → ndarray[source]

Compute the (noise-free) observation from the current state.

Note

This method should be overwritten if the environment has a distinct observation space.

Parameters: state – current state of the environment
Returns: observation perceived by the agent

property state_space: Space: Get the space of the states (used for describing the environment).

quanser_qube

class QQubeMjSim(frame_skip: int = 4, dt: Optional[float] = None, max_steps: int = inf, task_args: Optional[dict] = None)[source]

Bases: MujocoSimEnv, Serializable

Constructor

Parameters:

frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction

property act_space: Space: Get the space of the actions.

classmethod get_nominal_domain_param() → dict[source]: Get the nominal a.k.a. default domain parameters.

Note

This function is used to check which domain parameters exist.

property obs_space: Space: Get the space of the observations (agent’s perception of the environment).

observe(state: ndarray) → ndarray[source]

Compute the (noise-free) observation from the current state.

Note

This method should be overwritten if the environment has a distinct observation space.

Parameters: state – current state of the environment
Returns: observation perceived by the agent

class QQubeStabMjSim(frame_skip: int = 4, dt: Optional[float] = None, max_steps: int = inf, task_args: Optional[dict] = None)[source]

Bases: QQubeMjSim

Constructor

Parameters:

frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction

property init_space: Space: Get the initial state space.

name: str = 'qq-mj-st'

property state_space: Space: Get the space of the states (used for describing the environment).

class QQubeSwingUpMjSim(frame_skip: int = 4, dt: Optional[float] = None, max_steps: int = inf, task_args: Optional[dict] = None)[source]

Bases: QQubeMjSim

Constructor

Parameters:

frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction

property init_space: Space: Get the initial state space.

name: str = 'qq-mj-su'

property state_space: Space: Get the space of the states (used for describing the environment).

wam_base

class WAMSim(num_dof: int, model_path: str, frame_skip: int = 4, dt: Optional[float] = None, max_steps: int = inf, task_args: Optional[dict] = None)[source]

Bases: MujocoSimEnv, ABC, Serializable

Base class for the WAM robotic arm from Barrett Technology.

Constructor

Parameters:

num_dof – number of degrees of freedom (4 or 7), depending on which Barrett WAM setup is being used
model_path – path to the MuJoCo xml model config file
frame_skip – number of simulation frames for which the same action is held, results in a multiplier of the time step size dt
dt – by default, the time step size is the one from the MuJoCo config file multiplied by the number of frame skips (legacy from the OpenAI environments). Passing an explicit dt value overrides this default, e.g., when you know that a trajectory was recorded with a specific dt.
max_steps – max number of simulation time steps
task_args – arguments for the task construction

abstract property act_space: Space: Get the space of the actions.

classmethod get_nominal_domain_param(num_dof: int = 7) → dict[source]: Get the nominal a.k.a. default domain parameters.

Note

This function is used to check which domain parameters exist.

property num_dof: int: Get the number of degrees of freedom.

abstract property obs_space: Space: Get the space of the observations (agent’s perception of the environment).

abstract property state_space: Space: Get the space of the states (used for describing the environment).

property torque_space: Space: Get the space of joint torques.
