special

domain_distribution

class DomainDistrParamPolicy(mapping: Dict[int, Tuple[str, str]], trafo_mask: Union[list, Tensor], prior: Optional[DomainRandomizer] = None, scale_params: bool = False, use_cuda: bool = False)[source]

Bases: Policy

A proxy to the Policy class in order to use the policy’s parameters as domain distribution parameters

Constructor

Parameters:
  • mapping – mapping from subsequent integers to domain distribution parameters, where the first string of the value tuple is the name of the domain parameter (e.g. mass, length) and the second string is the name of the distribution parameter (e.g. mean, std). The integers are the indices of the numpy array that comes from the algorithm; see the example after this parameter list.

  • trafo_mask – every domain parameter that is set to True in this mask will be learned via a ‘virtual’ parameter, i.e. in sqrt-space, and then finally squared to retrieve the domain parameter. This transformation is useful to avoid setting a negative variance.

  • prior – prior belief about the distribution parameters in the form of a DomainRandomizer

  • scale_params – if True, the sqrt-transformed policy parameters are scaled to the range \([-0.5, 0.5]\), which makes parameter-based exploration easier.

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
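
For illustration, a mapping and transformation mask of this kind could look as follows (the domain parameter names are placeholders, not taken from a specific environment):

# Hypothetical mapping: the integer keys index the flat parameter array coming
# from the algorithm; each value names (domain parameter, distribution parameter).
mapping = {
    0: ("mass", "mean"),
    1: ("mass", "std"),
    2: ("length", "mean"),
    3: ("length", "std"),
}
# Learn the std entries in sqrt-space so they cannot become negative.
trafo_mask = [False, True, False, True]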

forward(obs: Optional[Tensor] = None) → Tensor[source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

property mapping: Dict[int, Tuple[str, str]]

Get the mapping from subsequent integers to domain distribution parameters, where the first string of the value tuple is the name of the domain parameter and the second string is the name of the distribution parameter.

name: str = 'ddp'
transform_to_ddp_space(params: Tensor) → Tensor[source]

Get the transformed domain distribution parameters. Wherever the mask is True, the corresponding policy parameter is learned in sqrt-space. Moreover, the policy parameters can be scaled.

Parameters:

params – policy parameters (can be the sqrt of the actual domain distribution parameter value)

Returns:

policy parameters transformed according to the mask
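
A minimal sketch of this transformation, assuming plain PyTorch and omitting the optional parameter scaling (this is not the library’s implementation):

import torch

def transform_to_ddp_space_sketch(params: torch.Tensor, trafo_mask: torch.Tensor) -> torch.Tensor:
    # Square the entries that were learned in sqrt-space (mask is True there)
    # and pass the remaining entries through unchanged.
    return torch.where(trafo_mask, params.pow(2), params)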

environment_specific

class QBallBalancerPDCtrl(env_spec: EnvSpec, state_des: Tensor = tensor([0.0, 0.0]), kp: Optional[Tensor] = None, kd: Optional[Tensor] = None, use_cuda: bool = False)[source]

Bases: Policy

PD-controller for the Quanser Ball Balancer. The only, but significant, difference between this controller and the other PD-controller is the clipping of the actions.

Note

This class’s desired state specification deviates from that of the Pyrado policies which interact with a Task.

Constructor

Parameters:
  • env_spec – environment specification

  • state_des – tensor of desired x and y ball position [m]

  • kp – 2x2 tensor of constant controller feedback coefficients for error [V/m]

  • kd – 2x2 tensor of constant controller feedback coefficients for error time derivative [Vs/m]

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

forward(obs: Tensor) → Tensor[source]

Calculate the controller output.

Parameters:

obs – observation from the environment

Return act:

controller output [V]
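
The exact control law is not reproduced here; a generic PD step with action clipping, matching the description above in spirit, might look like this (the state extraction and the actual limits are assumptions):

import torch

def pd_step_with_clipping(err: torch.Tensor, err_dot: torch.Tensor,
                          kp: torch.Tensor, kd: torch.Tensor,
                          act_min: torch.Tensor, act_max: torch.Tensor) -> torch.Tensor:
    # Generic PD law on the position error and its time derivative (2x2 gain
    # matrices as in the constructor), followed by clipping to the action limits.
    act = kp @ err + kd @ err_dot
    return torch.clamp(act, min=act_min, max=act_max)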

init_param(kp: Optional[Tensor] = None, kd: Optional[Tensor] = None, verbose: bool = False, **kwargs)[source]

Initialize controller parameters.

Parameters:
  • kp – 2x2 tensor of constant controller feedback coefficients for error [V/m]

  • kd – 2x2 tensor of constant controller feedback coefficients for error time derivative [Vs/m]

  • verbose – print the controller’s gains

name: str = 'qbb-pd'
reset(**kwargs)[source]

Set the domain parameters defining the controller’s model using a dict called domain_param.

class QCartPoleGoToLimCtrl(init_state: ndarray, positive: bool = True)[source]

Bases: object

Controller for going to one of the joint limits (part of the calibration routine)

Constructor

Parameters:
  • init_state – initial state of the system

  • positive – direction switch

class QCartPoleSwingUpAndBalanceCtrl(env_spec: EnvSpec, long: bool = False, use_cuda: bool = False)[source]

Bases: Policy

Swing-up and balancing controller for the Quanser Cart-Pole

Constructor

Parameters:
  • env_spec – environment specification

  • long – flag for long or short pole

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

property K_pd
forward(obs: Tensor) → Tensor[source]

Calculate the controller output.

Parameters:

obs – observation from the environment

Return act:

controller output [V]

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

property k_e
property k_p
name: str = 'qcp-sub'
property u_max
class QQubeEnergyCtrl(env_spec: EnvSpec, ref_energy: float, energy_gain: float, th_gain: float, acc_max: float, reset_domain_param: bool = True, use_cuda: bool = False)[source]

Bases: Policy

Energy-based controller used to swing the Furuta pendulum up

Constructor

Parameters:
  • env_spec – environment specification

  • ref_energy – reference energy level [J]

  • energy_gain – P-gain on the energy [m/s/J]

  • th_gain – P-gain on angle theta

  • acc_max – maximum linear acceleration of the pendulum pivot [m/s**2]

  • reset_domain_param – if True, the domain parameters are reset if they are present as an entry in the kwargs passed to reset(). If False, they are ignored.

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

property E_gain

Get the energy gain, called \(\mu\) in the Quanser documentation.

property E_ref

Get the reference energy level.

forward(obs: Tensor) → Tensor[source]

Control step of the energy-based controller, which is used within the swing-up controller

Parameters:

obs – observations pre-processed in the forward method of QQubeSwingUpAndBalanceCtrl

Returns:

action
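
The exact Quanser control law is not spelled out in this documentation; a commonly used energy-based swing-up law of the following form illustrates how ref_energy, energy_gain, and acc_max interact (a sketch, not the class’s actual code):

import math

def energy_swing_up_sketch(alpha: float, alpha_dot: float, energy: float,
                           e_ref: float, mu: float, acc_max: float) -> float:
    # Accelerate the pendulum pivot proportionally to the energy error, with the
    # sign chosen from the pendulum's motion, and saturate at the maximum
    # acceleration; the theta-gain term of the real controller is omitted here.
    acc = mu * (e_ref - energy) * math.copysign(1.0, alpha_dot * math.cos(alpha))
    return max(-acc_max, min(acc_max, acc))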

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

reset(**kwargs)[source]

If desired, set the domain parameters defining the controller’s model using a dict called domain_param.

training: bool
class QQubeGoToLimCtrl(positive: bool = True, cnt_done: int = 250)[source]

Bases: object

Controller for going to one of the joint limits (part of the calibration routine)

Constructor

Parameters:

positive – direction switch

class QQubePDCtrl(env_spec: EnvSpec, pd_gains: Tensor = tensor([4.0, 0.0, 1.0, 0.0]), th_des: float = 0.0, al_des: float = 0.0, tols: Tensor = tensor([0.0262, 0.0087, 0.0017, 0.0017], dtype=torch.float64), use_cuda: bool = False)[source]

Bases: Policy

PD-controller for the Quanser Qube. Drives the Qube to \(x_{des} = [\theta_{des}, \alpha_{des}, 0.0, 0.0]\). The flag done is set when \(|x_{des} - x| < tol\).

Constructor

Parameters:
  • env_spec – environment specification

  • pd_gains – controller gains, the default values stabilize the pendulum at the center hanging down

  • th_des – desired rotary pole angle [rad]

  • al_des – desired pendulum pole angle [rad]

  • tols – tolerances for the desired angles \(\theta\) and \(\alpha\) [rad]

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
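
A rough sketch of the behavior described above, i.e. PD feedback towards the desired angles plus the done flag once all tolerances are met (assuming the state order \([\theta, \alpha, \dot{\theta}, \dot{\alpha}]\); this is not the class’s actual code):

import torch

def qube_pd_sketch(x: torch.Tensor, x_des: torch.Tensor,
                   pd_gains: torch.Tensor, tols: torch.Tensor):
    # PD feedback on the full state error and the element-wise done condition
    # |x_des - x| < tol; gains and tolerances are the constructor's tensors.
    err = x_des - x
    act = torch.dot(pd_gains, err).unsqueeze(0)
    done = bool(torch.all(err.abs() < tols))
    return act, done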

forward(meas: Tensor) → Tensor[source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

training: bool
class QQubeSwingUpAndBalanceCtrl(env_spec: EnvSpec, ref_energy: float = 0.025, energy_gain: float = 50.0, energy_th_gain: float = 0.4, acc_max: float = 5.0, alpha_max_pd_enable: float = 20.0, pd_gains: Tensor = tensor([-2.0, 35.0, -1.5, 3.0]), reset_domain_param: bool = True, use_cuda: bool = False)[source]

Bases: Policy

Hybrid controller (QQubeEnergyCtrl, QQubePDCtrl) switching based on the pendulum pole angle alpha

Note

Extracted Quanser’s values from q_qube2_swingup.mdl

Constructor

Parameters:
  • env_spec – environment specification

  • ref_energy – reference energy level

  • energy_gain – P-gain on the difference to the reference energy

  • energy_th_gain – P-gain on angle theta for the energy controller. This term does not exist in Quanser’s implementation. Its purpose is to keep the Qube from moving too much around the vertical axis, i.e. to prevent it from bouncing against the mechanical boundaries.

  • acc_max – maximum acceleration

  • alpha_max_pd_enable – angle threshold for enabling the PD-controller [deg]

  • pd_gains – gains for the PD-controller

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

Note

The controller’s parameters strongly depend on the frequency at which it is operating.

forward(obs: tensor)[source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'qq-sub'
pd_enabled(cos_al: Union[float, Tensor]) → bool[source]

Check if the PD-controller should be enabled based on a predefined threshold on the alpha angle.

Parameters:

cos_al – cosine of the pendulum pole angle

Returns:

True if the condition is met, else False
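
The check can be pictured as comparing the cosine of the pendulum pole angle against the cosine of the enable threshold (a sketch assuming \(\alpha = 0\) corresponds to the upright pole):

import math

def pd_enabled_sketch(cos_al: float, alpha_max_pd_enable: float = 20.0) -> bool:
    # The PD-controller takes over once the pole is within the threshold cone
    # around the upright position, i.e. cos(alpha) is large enough.
    return cos_al > math.cos(math.radians(alpha_max_pd_enable))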

reset(**kwargs)[source]

Reset the policy’s internal state. This should be called at the start of a rollout. The default implementation does nothing.

create_mg_joint_pos_policy(env: SimEnv, t_strike_end: float = 0.5) → TimePolicy[source]

Create a policy that executes the strike for mini golf by setting joint position commands. Used in the experiments of [1].

See also

[1] F. Muratore, T. Gruner, F. Wiese, B. Belousov, M. Gienger, J. Peters, “TITLE”, VENUE, YEAR

Parameters:
  • env – mini golf simulation environment

  • t_strike_end – time when to finish the movement [s]

Returns:

policy which executes the strike solely dependent on the time
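
The returned policy maps time to joint position commands; a hypothetical time-to-command function of this kind (the actual strike trajectory, joint count, and target configuration are not documented here) could look like:

import numpy as np

def strike_command_sketch(t: float, t_strike_end: float = 0.5) -> np.ndarray:
    # Hypothetical mapping from time to joint position commands: ramp linearly
    # from a start to an end configuration until t_strike_end, then hold.
    # The joint count and target values below are placeholders.
    q_start = np.zeros(7)
    q_end = np.full(7, 0.3)
    s = min(t / t_strike_end, 1.0)
    return (1.0 - s) * q_start + s * q_end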

create_pend_excitation_policy(env: PendulumSim, num_rollouts: int, f_sin: float = 1.0) → PlaybackPolicy[source]

Create a policy that returns a previously recorded action time series. Used in the experiments of [1].

See also

[1] F. Muratore, T. Gruner, F. Wiese, B. Belousov, M. Gienger, J. Peters, “TITLE”, VENUE, YEAR

Parameters:
  • env – pendulum simulation environment

  • num_rollouts – number of rollouts to store in the policy’s buffer

  • f_sin – frequency of the sine excitation [Hz]

Returns:

policy with recorded action time series
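
A minimal sketch of generating one such recorded action time series, assuming a one-dimensional action and the environment’s step size dt:

import numpy as np

def sine_excitation_sketch(f_sin: float, dt: float, num_steps: int, amp: float = 1.0) -> np.ndarray:
    # One action time series: a sine of frequency f_sin [Hz] sampled at the
    # environment's control rate; the amplitude is a placeholder.
    t = np.arange(num_steps) * dt
    return amp * np.sin(2.0 * np.pi * f_sin * t)[:, np.newaxis]  # shape (num_steps, 1)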

wam_jsp_7dof_sin(t: float, flip_sign: bool = False)[source]

A sine-based excitation function for the 7-DoF WAM, describing a desired joint angle offset and its velocity at every point in time

Parameters:
  • t – time

  • flip_sign – if True, flip the sign

Returns:

joint angle positions and velocities
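
A sketch of such a sine-based excitation; the amplitudes and frequencies below are placeholders, not the library’s constants:

import numpy as np

def wam_sin_excitation_sketch(t: float, flip_sign: bool = False):
    # Desired joint angle offsets and velocities for the 7 joints at time t.
    amp = np.full(7, 0.1)   # placeholder amplitudes [rad]
    freq = np.full(7, 0.5)  # placeholder frequencies [Hz]
    sign = -1.0 if flip_sign else 1.0
    q_des = sign * amp * np.sin(2.0 * np.pi * freq * t)
    qd_des = sign * amp * 2.0 * np.pi * freq * np.cos(2.0 * np.pi * freq * t)
    return q_des, qd_des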

Module contents