special
domain_distribution
- class DomainDistrParamPolicy(mapping: Dict[int, Tuple[str, str]], trafo_mask: Union[list, Tensor], prior: Optional[DomainRandomizer] = None, scale_params: bool = False, use_cuda: bool = False)[source]
Bases:
Policy
A proxy for the Policy class that exposes the policy’s parameters as domain distribution parameters
Constructor
- Parameters:
mapping – mapping from consecutive integers to domain distribution parameters, where the first string of the value tuple is the name of the domain parameter (e.g. mass, length) and the second string is the name of the distribution parameter (e.g. mean, std). The integers are indices into the numpy array of parameter values coming from the algorithm.
trafo_mask – every domain parameter that is set to True in this mask will be learned via a ‘virtual’ parameter, i.e. in sqrt-space, and then finally squared to retrieve the domain parameter. This transformation is useful to avoid setting a negative variance.
prior – prior belief about the distribution parameters, in the form of a DomainRandomizer
scale_params – if True, the sqrt-transformed policy parameters are scaled to the range \([-0.5, 0.5]\), which makes parameter-based exploration easier.
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
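A minimal instantiation sketch. The import path, the domain parameter name, and the index assignment are assumptions chosen for illustration, not values prescribed by the library:
    # Hypothetical mapping: indices 0 and 1 of the algorithm's parameter vector refer to
    # the mean and the std of the ball mass (names chosen only for illustration).
    from pyrado.policies.special.domain_distribution import DomainDistrParamPolicy  # assumed path

    mapping = {0: ("ball_mass", "mean"), 1: ("ball_mass", "std")}
    trafo_mask = [False, True]  # learn the std in sqrt-space so it cannot become negative
    ddp_policy = DomainDistrParamPolicy(mapping=mapping, trafo_mask=trafo_mask)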
- forward(obs: Optional[Tensor] = None) Tensor [source]
Get the action according to the policy and the observations (forward pass).
- Parameters:
obs – observation from the environment (optional, may be None)
- Returns:
outputs, e.g. an action or an action and a hidden state
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- property mapping: Dict[int, Tuple[str, str]]
Get the mapping from subsequent integers to domain distribution parameters, where the first string of the value tuple is the name of the domain parameter and the second string is the name of the distribution parameter.
- name: str = 'ddp'
- transform_to_ddp_space(params: Tensor) Tensor [source]
Get the transformed domain distribution parameters. Wherever the mask is True, the corresponding policy parameter is learned in sqrt-space. Moreover, the policy parameters can be scaled (see the sketch after this entry).
- Parameters:
params – policy parameters (can be the sqrt of the actual domain distribution parameter value)
- Returns:
policy parameters transformed according to the mask
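The sqrt-space transformation can be illustrated with a small stand-alone sketch; the helper below is illustrative and not the class’s actual implementation:
    import torch as to

    def to_ddp_space(params: to.Tensor, mask: to.Tensor) -> to.Tensor:
        # Square the masked entries to map them from sqrt-space back to the
        # domain distribution parameter space; leave the other entries untouched.
        return to.where(mask, to.pow(params, 2), params)

    params = to.tensor([0.1, -0.05])   # the second entry lives in sqrt-space
    mask = to.tensor([False, True])
    print(to_ddp_space(params, mask))  # tensor([0.1000, 0.0025]); the variance-like entry stays non-negative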
environment_specific
- class QBallBalancerPDCtrl(env_spec: EnvSpec, state_des: Tensor = tensor([0.0, 0.0]), kp: Optional[Tensor] = None, kd: Optional[Tensor] = None, use_cuda: bool = False)[source]
Bases:
Policy
PD-controller for the Quanser Ball Balancer. The only, but significant, difference between this controller and the other PD-controllers is that it clips the actions (a generic sketch follows the forward() entry below).
Note
This class’s desired state specification deviates from that of the Pyrado policies which interact with a Task.
Constructor
- Parameters:
env_spec – environment specification
state_des – tensor of desired x and y ball position [m]
kp – 2x2 tensor of constant controller feedback coefficients for error [V/m]
kd – 2x2 tensor of constant controller feedback coefficients for error time derivative [Vs/m]
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- forward(obs: Tensor) Tensor [source]
Calculate the controller output.
- Parameters:
obs – observation from the environment
- Return act:
controller output [V]
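A generic sketch of a clipped PD law; the error terms, the gain values, and the voltage limit below are assumptions, not the exact implementation:
    import torch as to

    def clipped_pd_step(state_des: to.Tensor, pos: to.Tensor, vel: to.Tensor,
                        kp: to.Tensor, kd: to.Tensor, u_max: float = 5.0) -> to.Tensor:
        # Generic PD law on the ball position error, followed by the clipping that
        # distinguishes this controller from the other PD-controllers.
        err = state_des - pos
        act = kp @ err - kd @ vel
        return to.clamp(act, -u_max, u_max)

    act = clipped_pd_step(to.zeros(2), to.tensor([0.05, -0.02]), to.zeros(2),
                          20.0 * to.eye(2), 1.0 * to.eye(2))  # illustrative gains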
- init_param(kp: Optional[Tensor] = None, kd: Optional[Tensor] = None, verbose: bool = False, **kwargs)[source]
Initialize controller parameters.
- Parameters:
kp – 2x2 tensor of constant controller feedback coefficients for error [V/m]
kd – 2x2 tensor of constant controller feedback coefficients for error time derivative [Vs/m]
verbose – if True, print the controller’s gains
- name: str = 'qbb-pd'
- class QCartPoleGoToLimCtrl(init_state: ndarray, positive: bool = True)[source]
Bases:
object
Controller for going to one of the joint limits (part of the calibration routine)
Constructor
- Parameters:
init_state – initial state of the system
positive – direction switch; if True, drive towards the positive joint limit, otherwise towards the negative one
- class QCartPoleSwingUpAndBalanceCtrl(env_spec: EnvSpec, long: bool = False, use_cuda: bool = False)[source]
Bases:
Policy
Swing-up and balancing controller for the Quanser Cart-Pole
Constructor
- Parameters:
env_spec – environment specification
long – flag for long or short pole
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- property K_pd
- forward(obs: Tensor) Tensor [source]
Calculate the controller output.
- Parameters:
obs – observation from the environment
- Return act:
controller output [V]
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- property k_e
- property k_p
- name: str = 'qcp-sub'
- property u_max
- class QQubeEnergyCtrl(env_spec: EnvSpec, ref_energy: float, energy_gain: float, th_gain: float, acc_max: float, reset_domain_param: bool = True, use_cuda: bool = False)[source]
Bases:
Policy
Energy-based controller used to swing the Furuta pendulum up
Constructor
- Parameters:
env_spec – environment specification
ref_energy – reference energy level [J]
energy_gain – P-gain on the energy [m/s/J]
th_gain – P-gain on angle theta
acc_max – maximum linear acceleration of the pendulum pivot [m/s**2]
reset_domain_param – if True, the domain parameters are reset if they are present as an entry in the kwargs passed to reset(). If False, they are ignored.
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- property E_gain
Get the energy gain, called \(\mu\) in the Quanser documentation.
- property E_ref
Get the reference energy level.
- forward(obs: Tensor) Tensor [source]
Control step of the energy-based controller, which is used within the swing-up controller (a sketch of the textbook law follows below).
- Parameters:
obs – observations pre-processed in the forward method of QQubeSwingUpAndBalanceCtrl
- Returns:
action
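For reference, the textbook energy-based swing-up law has the form sketched below. This illustrates the general principle of pumping energy into the pendulum and saturating the pivot acceleration; it is not necessarily the exact Quanser/Pyrado implementation:
    import torch as to

    def energy_swing_up(alpha: to.Tensor, alpha_dot: to.Tensor, energy: to.Tensor,
                        ref_energy: float, energy_gain: float, acc_max: float) -> to.Tensor:
        # Accelerate the pendulum pivot proportionally to the energy error, in the
        # direction that pumps energy into the pendulum, then saturate at acc_max.
        acc = energy_gain * (ref_energy - energy) * to.sign(alpha_dot * to.cos(alpha))
        return to.clamp(acc, -acc_max, acc_max)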
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- reset(**kwargs)[source]
If desired, set the domain parameters defining the controller’s model using a dict called domain_param.
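A possible call, where the domain parameter name and value are purely illustrative:
    ctrl.reset(domain_param=dict(pendulum_pole_mass=0.024))  # hypothetical parameter name and value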
- training: bool
- class QQubeGoToLimCtrl(positive: bool = True, cnt_done: int = 250)[source]
Bases:
object
Controller for going to one of the joint limits (part of the calibration routine)
Constructor
- Parameters:
positive – direction switch; if True, drive towards the positive joint limit, otherwise towards the negative one
- class QQubePDCtrl(env_spec: EnvSpec, pd_gains: Tensor = tensor([4.0, 0.0, 1.0, 0.0]), th_des: float = 0.0, al_des: float = 0.0, tols: Tensor = tensor([0.0262, 0.0087, 0.0017, 0.0017], dtype=torch.float64), use_cuda: bool = False)[source]
Bases:
Policy
PD-controller for the Quanser Qube. Drives Qube to \(x_{des} = [\theta_{des}, \alpha_{des}, 0.0, 0.0]\). The flag done is set when \(|x_{des} - x| < tol\) (see the sketch after the constructor parameters).
Constructor
- Parameters:
env_spec – environment specification
pd_gains – controller gains, the default values stabilize the pendulum at the center hanging down
th_des – desired rotary pole angle [rad]
al_des – desired pendulum pole angle [rad]
tols – tolerances for the desired angles \(\theta\) and \(\alpha\) [rad]
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
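A minimal sketch of the PD law and the done-flag logic described above; the scalar dot-product form of the control law is an assumption:
    import torch as to

    pd_gains = to.tensor([4.0, 0.0, 1.0, 0.0])            # default gains from the signature
    x_des = to.tensor([0.0, 0.0, 0.0, 0.0])               # [th_des, al_des, 0., 0.]
    tols = to.tensor([0.0262, 0.0087, 0.0017, 0.0017])    # default tolerances from the signature

    def pd_step(x: to.Tensor):
        err = x_des - x
        done = bool(to.all(to.abs(err) < tols))            # done flag as described in the docstring
        act = pd_gains.dot(err)                            # scalar motor command (assumed form)
        return act, done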
- forward(meas: Tensor) Tensor [source]
Get the action according to the policy and the observations (forward pass).
- Parameters:
meas – measurements from the environment
- Returns:
outputs, e.g. an action or an action and a hidden state
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- training: bool
- class QQubeSwingUpAndBalanceCtrl(env_spec: EnvSpec, ref_energy: float = 0.025, energy_gain: float = 50.0, energy_th_gain: float = 0.4, acc_max: float = 5.0, alpha_max_pd_enable: float = 20.0, pd_gains: Tensor = tensor([-2.0, 35.0, -1.5, 3.0]), reset_domain_param: bool = True, use_cuda: bool = False)[source]
Bases:
Policy
Hybrid controller (QQubeEnergyCtrl, QQubePDCtrl) switching based on the pendulum pole angle alpha
Note
Extracted Quanser’s values from q_qube2_swingup.mdl
Constructor
- Parameters:
env_spec – environment specification
ref_energy – reference energy level
energy_gain – P-gain on the difference to the reference energy
energy_th_gain – P-gain on angle theta for the energy controller. This term does not exist in Quanser’s implementation. Its purpose is to keep the Qube from moving too much around the vertical axis, i.e. to prevent bouncing against the mechanical boundaries.
acc_max – maximum acceleration
alpha_max_pd_enable – angle threshold for enabling the PD-controller [deg]
pd_gains – gains for the PD-controller
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
Note
The controller’s parameters strongly depend on the frequency at which it is operating.
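The switching between the two sub-controllers can be sketched as follows; the angle convention (alpha measured from the upright position) and the controller call signatures are assumptions:
    import math

    def hybrid_step(obs, alpha: float, energy_ctrl, pd_ctrl, alpha_max_pd_enable: float = 20.0):
        # Use the PD balancer close to the upright position, otherwise the
        # energy-based swing-up controller; the threshold is given in degrees.
        if abs(alpha) < math.radians(alpha_max_pd_enable):
            return pd_ctrl(obs)
        return energy_ctrl(obs)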
- forward(obs: tensor)[source]
Get the action according to the policy and the observations (forward pass).
- Parameters:
obs – observation from the environment
- Returns:
outputs, e.g. an action or an action and a hidden state
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- name: str = 'qq-sub'
- create_mg_joint_pos_policy(env: SimEnv, t_strike_end: float = 0.5) TimePolicy [source]
Create a policy that executes the strike for mini golf by setting joint position commands. Used in the experiments of [1].
See also
[1] F. Muratore, T. Gruner, F. Wiese, B. Belousov, M. Gienger, J. Peters, “TITLE”, VENUE, YEAR
- Parameters:
env – mini golf simulation environment
t_strike_end – time when to finish the movement [s]
- Returns:
policy which executes the strike solely dependent on the time
- create_pend_excitation_policy(env: PendulumSim, num_rollouts: int, f_sin: float = 1.0) PlaybackPolicy [source]
Create a policy that returns a previously recorded action time series. Used in the experiments of [1].
See also
[1] F. Muratore, T. Gruner, F. Wiese, B. Belousov, M. Gienger, J. Peters, “TITLE”, VENUE, YEAR
- Parameters:
env – pendulum simulation environment
num_rollouts – number of rollouts to store in the policy’s buffer
f_sin – frequency of the sine excitation [Hz]
- Returns:
policy with recorded action time series
- wam_jsp_7dof_sin(t: float, flip_sign: bool = False)[source]
A sine-based excitation function for the 7-DoF WAM, describing a desired joint angle offset and its velocity at every point in time
- Parameters:
t – time
flip_sign – if True, flip the sign
- Returns:
joint angle positions and velocities
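A stand-alone sketch of such a sine-based excitation for a single joint; the amplitude, the frequency, and the single-joint form are assumptions, not the library’s values:
    import numpy as np

    def sin_excitation(t: float, freq: float = 1.0, amp: float = 0.1, flip_sign: bool = False):
        # Desired joint angle offset and its analytic time derivative at time t.
        sign = -1.0 if flip_sign else 1.0
        q = sign * amp * np.sin(2 * np.pi * freq * t)
        qd = sign * amp * 2 * np.pi * freq * np.cos(2 * np.pi * freq * t)
        return q, qd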