feed_forward

dual_rbf

class DualRBFLinearPolicy(spec: EnvSpec, rbf_hparam: dict, dim_mask: int = 2, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: LinearPolicy

A linear policy with RBF features, which are also used to compute the features' derivatives. The use case in mind is a simple policy that generates the joint position and joint velocity commands for the internal PD-controller of a robot (e.g. a Barrett WAM). By re-using the RBFs, we reduce the number of parameters while at the same time obtaining the velocity information from the features, i.e. the derivatives of the normalized Gaussians.
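
Concretely, assuming normalized Gaussian RBFs with centers c_i and bandwidths sigma_i (the exact parametrization is set via rbf_hparam, see RBFFeat), the features and their derivatives can be sketched in LaTeX as:

    \varphi_i(x) = \frac{\exp\left(-(x - c_i)^2 / (2\sigma_i^2)\right)}{\sum_j \exp\left(-(x - c_j)^2 / (2\sigma_j^2)\right)},
    \qquad
    \frac{\partial \varphi_i}{\partial x} = \varphi_i(x) \left( \frac{c_i - x}{\sigma_i^2} + \sum_j \varphi_j(x) \, \frac{x - c_j}{\sigma_j^2} \right)

The same parameter vector can thus be multiplied with \varphi(x) to obtain the position command and with \partial\varphi/\partial x to obtain the corresponding velocity command.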

Constructor

Parameters:
  • spec – specification of environment

  • rbf_hparam – hyper-parameters for the RBF-features, see RBFFeat

  • dim_mask – number of RBF features to mask out at the beginning and the end of every dimension; pass 1 to remove the first and the last feature for the policy, pass 0 to use all RBF features. Masking out RBFs makes sense if you want to obtain a smooth starting behavior.

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
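
Example (a minimal construction sketch; env, the import path, and the RBFFeat hyper-parameter names num_feat_per_dim and bounds are assumptions to verify against RBFFeat):

    from pyrado.policies.feed_forward.dual_rbf import DualRBFLinearPolicy

    # Hypothetical RBF hyper-parameters; the keys must match RBFFeat's constructor.
    rbf_hparam = dict(num_feat_per_dim=7, bounds=(0.0, 1.0))

    # `env` is an assumed, pre-constructed environment exposing an EnvSpec via `env.spec`.
    policy = DualRBFLinearPolicy(spec=env.spec, rbf_hparam=rbf_hparam, dim_mask=2)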

forward(obs: Tensor) → Tensor[source]

Evaluate the features at the given observations, or use the given feature values.

Parameters:

obs – observations from the environment

Returns:

actions

name: str = 'dualrbf'

fnn

class DiscreteActQValPolicy(spec: EnvSpec, net: Module, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Policy

State-action value (Q-value) feed-forward neural network policy for discrete actions

Constructor

Parameters:
  • spec – environment specification

  • net – module that approximates the Q-values given the observations and possible (discrete) actions. Make sure to create this object with the correct input and output sizes by using DiscreteActQValPolicy.get_qfcn_input_size() and DiscreteActQValPolicy.get_qfcn_output_size().

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
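
Example (a minimal sketch; env is an assumed environment, the FNN hyper-parameters are arbitrary, and the import paths are inferred from the module headings):

    import torch
    from pyrado.policies.feed_forward.fnn import FNN, DiscreteActQValPolicy

    # Build the Q-network with the input/output sizes required by the policy (see above).
    net = FNN(
        input_size=DiscreteActQValPolicy.get_qfcn_input_size(env.spec),
        output_size=DiscreteActQValPolicy.get_qfcn_output_size(),
        hidden_sizes=[64, 64],
        hidden_nonlin=torch.relu,
    )
    policy = DiscreteActQValPolicy(spec=env.spec, net=net)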

forward(obs: Tensor) → Tensor[source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

static get_qfcn_input_size(spec: EnvSpec) → int[source]

Get the flat input size.

static get_qfcn_output_size() → int[source]

Get the flat output size.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'discrqval'

q_values_argmax(obs: Tensor) → Tensor[source]

Compute the state-action values for the given observations and the actions that maximize the estimated Q-values. Since we operate on a discrete action space, we can construct a table.

Parameters:

obs – current observations

Returns:

Q-values of the state-action combinations given by the argmax actions; the dimension equals the flat action space dimension
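
A hedged usage sketch, e.g. for a DQN-style bootstrap target (policy, rewards, next_obs, and gamma are assumed to exist, and q_values_argmax is assumed to return one value per observation in the batch):

    # Q-values of the greedy (argmax) actions for a batch of next observations
    q_next = policy.q_values_argmax(next_obs)
    td_target = rewards + gamma * q_next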

class FNN(input_size: int, output_size: int, hidden_sizes: Sequence[int], hidden_nonlin: Union[Callable, Sequence[Callable]], dropout: Optional[float] = 0.0, output_nonlin: Optional[Callable] = None, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Module

Feed-forward neural network

Constructor

Parameters:
  • input_size – number of inputs

  • output_size – number of outputs

  • hidden_sizes – sizes of hidden layers (every entry creates one hidden layer)

  • hidden_nonlin – nonlinearity for hidden layers

  • dropout – dropout probability, default = 0 deactivates dropout

  • output_nonlin – nonlinearity for output layer

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
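
Example (a minimal sketch; the sizes and nonlinearity are arbitrary choices, and the import path is inferred from the module headings):

    import torch
    from pyrado.policies.feed_forward.fnn import FNN

    # Two hidden layers with tanh activations and a linear output layer
    net = FNN(input_size=4, output_size=2, hidden_sizes=[32, 32], hidden_nonlin=torch.tanh)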

property device: str

Get the device (CPU or GPU) on which the FNN is stored.

forward(obs: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
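
For example (reusing the FNN sketch from above), prefer calling the module instance:

    obs = torch.randn(4)
    act = net(obs)            # runs the registered hooks, then forward()
    # act = net.forward(obs)  # also computes the output, but silently skips the hooks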

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the network’s parameters. By default the parameters are initialized randomly.

Parameters:

init_values – Tensor of fixed initial network parameter values

property param_values: Tensor

Get the parameters of the policy as a 1-d array. The values are copied; modifying the return value does not propagate to the actual policy parameters.

training: bool

class FNNPolicy(spec: EnvSpec, hidden_sizes: Sequence[int], hidden_nonlin: Union[Callable, Sequence[Callable]], dropout: Optional[float] = 0.0, output_nonlin: Optional[Callable] = None, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Policy

Feed-forward neural network policy

Constructor

Parameters:
  • spec – environment specification

  • hidden_sizes – sizes of hidden layer outputs. Every entry creates one hidden layer.

  • hidden_nonlin – nonlinearity for hidden layers

  • dropout – dropout probability, default = 0 deactivates dropout

  • output_nonlin – nonlinearity for output layer

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
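
Example (a minimal sketch; env is an assumed environment, the hyper-parameters are arbitrary, and the import path is inferred from the module headings):

    import torch
    from pyrado.policies.feed_forward.fnn import FNNPolicy

    policy = FNNPolicy(
        spec=env.spec,
        hidden_sizes=[64, 64],
        hidden_nonlin=torch.tanh,
        output_nonlin=torch.tanh,  # e.g. to keep the actions bounded
    )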

forward(obs: Tensor) → Tensor[source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'fnn'

linear

class LinearPolicy(spec: EnvSpec, feats: FeatureStack, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Policy

A linear policy defined by the inner product of nonlinear features of the observations with the policy parameters

Constructor

Parameters:
  • spec – specification of environment

  • feats – list of feature functions

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the module to the GPU, False (default) to use the CPU
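
Example (a minimal sketch; env is assumed, and the feature functions const_feat, identity_feat, squared_feat as well as the exact FeatureStack call convention and import path are assumptions to verify against the features module):

    from pyrado.policies.features import FeatureStack, const_feat, identity_feat, squared_feat

    # The actions are the inner product of these observation features with the policy parameters.
    feats = FeatureStack([const_feat, identity_feat, squared_feat])
    policy = LinearPolicy(spec=env.spec, feats=feats)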

eval_feats(obs: Tensor) → Tensor[source]

Evaluate the features for the given observations.

Parameters:

obs – observation from the environment

Returns:

feats_val – the features' values

property features: FeatureStack

Get the (nonlinear) feature transformations.

forward(obs: Tensor) → Tensor[source]

Evaluate the features at the given observations, or use the given feature values.

Parameters:

obs – observations from the environment

Returns:

actions

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'lin'

two_headed_fnn

class TwoHeadedFNNPolicy(spec: EnvSpec, shared_hidden_sizes: Sequence[int], shared_hidden_nonlin: Union[Callable, Sequence[Callable]], head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: TwoHeadedPolicy

Policy architecture which has a common body and two heads that have a separate last layer

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_sizes – sizes of shared hidden layer outputs. Every entry creates one shared hidden layer.

  • shared_hidden_nonlin – nonlinearity for the shared hidden layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
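
Example (a minimal sketch; env is assumed, the import path is inferred from the module headings, and using the two heads for e.g. the mean and the log-std of a Gaussian action distribution is one common choice, not something mandated by the class):

    import torch
    from pyrado.policies.feed_forward.two_headed_fnn import TwoHeadedFNNPolicy

    policy = TwoHeadedFNNPolicy(
        spec=env.spec,
        shared_hidden_sizes=[64, 64],
        shared_hidden_nonlin=torch.relu,
        head_1_output_nonlin=torch.tanh,  # e.g. a bounded head-1 output
        head_2_output_nonlin=None,        # e.g. an unconstrained head-2 output
    )
    # The forward pass returns one tensor per head:
    # out_1, out_2 = policy(obs)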

forward(obs: Tensor) → Tuple[Tensor, Tensor][source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'thfnn'

Module contents