feed_forward

dual_rbf

class DualRBFLinearPolicy(spec: EnvSpec, rbf_hparam: dict, dim_mask: int = 2, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: LinearPolicy

A linear policy with RBF features, which are also used to compute the features' derivatives. The use case in mind is a simple policy that generates the joint position and joint velocity commands for the internal PD-controller of a robot (e.g. a Barrett WAM). By re-using the RBFs, we reduce the number of parameters while at the same time obtaining the velocity information from the features, i.e. the derivatives of the normalized Gaussians.
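
Concretely, assuming normalized Gaussian RBFs with centers c_i and bandwidths sigma_i (the exact parametrization is set via rbf_hparam, see RBFFeat), the features and their derivatives can be sketched in LaTeX as:

    \varphi_i(x) = \frac{\exp\left(-(x - c_i)^2 / (2\sigma_i^2)\right)}{\sum_j \exp\left(-(x - c_j)^2 / (2\sigma_j^2)\right)},
    \qquad
    \frac{\partial \varphi_i}{\partial x} = \varphi_i(x) \left( \frac{c_i - x}{\sigma_i^2} + \sum_j \varphi_j(x) \, \frac{x - c_j}{\sigma_j^2} \right)

The same parameter vector can thus be multiplied with \varphi(x) to obtain the position command and with \partial\varphi/\partial x to obtain the corresponding velocity command.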

Constructor

Parameters:
  • spec – specification of environment

  • rbf_hparam – hyper-parameters for the RBF-features, see RBFFeat

  • dim_mask – number of RBF features to mask out at the beginning and the end of every dimension; pass 1 to remove the first and the last feature for the policy, pass 0 to use all RBF features. Masking out RBFs makes sense if you want to obtain a smooth starting behavior.

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
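
Example (a minimal construction sketch; env, the import path, and the RBFFeat hyper-parameter names num_feat_per_dim and bounds are assumptions to verify against RBFFeat):

    from pyrado.policies.feed_forward.dual_rbf import DualRBFLinearPolicy

    # Hypothetical RBF hyper-parameters; the keys must match RBFFeat's constructor.
    rbf_hparam = dict(num_feat_per_dim=7, bounds=(0.0, 1.0))

    # `env` is an assumed, pre-constructed environment exposing an EnvSpec via `env.spec`.
    policy = DualRBFLinearPolicy(spec=env.spec, rbf_hparam=rbf_hparam, dim_mask=2)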

forward(obs: Tensor) → Tensor[source]

Evaluate the features at the given observations, or use the given feature values.

Parameters:

obs – observations from the environment

Returns:

actions

name: str = 'dualrbf'

fnn

class DiscreteActQValPolicy(spec: EnvSpec, net: Module, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Policy

State-action value (Q-value) feed-forward neural network policy for discrete actions

Constructor

Parameters:
  • spec – environment specification

  • net – module that approximates the Q-values given the observations and possible (discrete) actions. Make sure to create this object with the correct input and output sizes by using DiscreteActQValPolicy.get_qfcn_input_size() and DiscreteActQValPolicy.get_qfcn_output_size().

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
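
Example (a minimal sketch; env is an assumed environment, the FNN hyper-parameters are arbitrary, and the import paths are inferred from the module headings):

    import torch
    from pyrado.policies.feed_forward.fnn import FNN, DiscreteActQValPolicy

    # Build the Q-network with the input/output sizes required by the policy (see above).
    net = FNN(
        input_size=DiscreteActQValPolicy.get_qfcn_input_size(env.spec),
        output_size=DiscreteActQValPolicy.get_qfcn_output_size(),
        hidden_sizes=[64, 64],
        hidden_nonlin=torch.relu,
    )
    policy = DiscreteActQValPolicy(spec=env.spec, net=net)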

forward(obs: Tensor) → Tensor[source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

static get_qfcn_input_size(spec: EnvSpec) → int[source]

Get the flat input size.

static get_qfcn_output_size() → int[source]

Get the flat output size.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'discrqval'

q_values_argmax(obs: Tensor) → Tensor[source]

Compute the state-action values for the given observations and the actions that maximize the estimated Q-values. Since we operate on a discrete action space, we can construct a table.

Parameters:

obs – current observations

Returns:

Q-values of the state-action combinations given by the argmax actions; the dimension equals the flat action space dimension
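
A hedged usage sketch, e.g. for a DQN-style bootstrap target (policy, rewards, next_obs, and gamma are assumed to exist, and q_values_argmax is assumed to return one value per observation in the batch):

    # Q-values of the greedy (argmax) actions for a batch of next observations
    q_next = policy.q_values_argmax(next_obs)
    td_target = rewards + gamma * q_next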

class FNN(input_size: int, output_size: int, hidden_sizes: Sequence[int], hidden_nonlin: Union[Callable, Sequence[Callable]], dropout: Optional[float] = 0.0, output_nonlin: Optional[Callable] = None, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Module

Feed-forward neural network

Constructor

Parameters:
  • input_size – number of inputs

  • output_size – number of outputs

  • hidden_sizes – sizes of hidden layers (every entry creates one hidden layer)

  • hidden_nonlin – nonlinearity for hidden layers

  • dropout – dropout probability, default = 0 deactivates dropout

  • output_nonlin – nonlinearity for output layer

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
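
Example (a minimal sketch; the sizes and nonlinearity are arbitrary choices, and the import path is inferred from the module headings):

    import torch
    from pyrado.policies.feed_forward.fnn import FNN

    # Two hidden layers with tanh activations and a linear output layer
    net = FNN(input_size=4, output_size=2, hidden_sizes=[32, 32], hidden_nonlin=torch.tanh)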

property device: str

Get the device (CPU or GPU) on which the FNN is stored.

forward(obs: Tensor) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
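
For example (reusing the FNN sketch from above), prefer calling the module instance:

    obs = torch.randn(4)
    act = net(obs)            # runs the registered hooks, then forward()
    # act = net.forward(obs)  # also computes the output, but silently skips the hooks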

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the network’s parameters. By default the parameters are initialized randomly.

Parameters:

init_values – Tensor of fixed initial network parameter values

property param_values: Tensor

Get the parameters of the policy as a 1-d array. The values are copied; modifying the return value does not propagate to the actual policy parameters.

training: bool

class FNNPolicy(spec: EnvSpec, hidden_sizes: Sequence[int], hidden_nonlin: Union[Callable, Sequence[Callable]], dropout: Optional[float] = 0.0, output_nonlin: Optional[Callable] = None, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Policy

Feed-forward neural network policy

Constructor

Parameters:
  • spec – environment specification

  • hidden_sizes – sizes of hidden layer outputs. Every entry creates one hidden layer.

  • hidden_nonlin – nonlinearity for hidden layers

  • dropout – dropout probability, default = 0 deactivates dropout

  • output_nonlin – nonlinearity for output layer

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
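
Example (a minimal sketch; env is an assumed environment, the hyper-parameters are arbitrary, and the import path is inferred from the module headings):

    import torch
    from pyrado.policies.feed_forward.fnn import FNNPolicy

    policy = FNNPolicy(
        spec=env.spec,
        hidden_sizes=[64, 64],
        hidden_nonlin=torch.tanh,
        output_nonlin=torch.tanh,  # e.g. to keep the actions bounded
    )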

forward(obs: Tensor) → Tensor[source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'fnn'

linear

class LinearPolicy(spec: EnvSpec, feats: FeatureStack, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: Policy

A linear policy defined by the inner product of nonlinear features of the observations with the policy parameters

Constructor

Parameters:
  • spec – specification of environment

  • feats – list of feature functions

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the module to the GPU, False (default) to use the CPU
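
Example (a minimal sketch; env is assumed, and the feature functions const_feat, identity_feat, squared_feat as well as the exact FeatureStack call convention and import path are assumptions to verify against the features module):

    from pyrado.policies.features import FeatureStack, const_feat, identity_feat, squared_feat

    # The actions are the inner product of these observation features with the policy parameters.
    feats = FeatureStack([const_feat, identity_feat, squared_feat])
    policy = LinearPolicy(spec=env.spec, feats=feats)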

eval_feats(obs: Tensor) → Tensor[source]

Evaluate the features for the given observations.

Parameters:

obs – observation from the environment

Returns:

feats_val – the features' values

property features: FeatureStack

Get the (nonlinear) feature transformations.

forward(obs: Tensor) → Tensor[source]

Evaluate the features at the given observations, or use the given feature values.

Parameters:

obs – observations from the environment

Returns:

actions

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'lin'

two_headed_fnn

class TwoHeadedFNNPolicy(spec: EnvSpec, shared_hidden_sizes: Sequence[int], shared_hidden_nonlin: Union[Callable, Sequence[Callable]], head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: TwoHeadedPolicy

Policy architecture which has a common body and two heads that have a separate last layer

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_sizes – sizes of shared hidden layer outputs. Every entry creates one shared hidden layer.

  • shared_hidden_nonlin – nonlinearity for the shared hidden layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
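
Example (a minimal sketch; env is assumed, the import path is inferred from the module headings, and using the two heads for e.g. the mean and the log-std of a Gaussian action distribution is one common choice, not something mandated by the class):

    import torch
    from pyrado.policies.feed_forward.two_headed_fnn import TwoHeadedFNNPolicy

    policy = TwoHeadedFNNPolicy(
        spec=env.spec,
        shared_hidden_sizes=[64, 64],
        shared_hidden_nonlin=torch.relu,
        head_1_output_nonlin=torch.tanh,  # e.g. a bounded head-1 output
        head_2_output_nonlin=None,        # e.g. an unconstrained head-2 output
    )
    # The forward pass returns one tensor per head:
    # out_1, out_2 = policy(obs)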

forward(obs: Tensor) → Tuple[Tensor, Tensor][source]

Get the action according to the policy and the observations (forward pass).

Parameters:
  • args – inputs, e.g. an observation from the environment or an observation and a hidden state

  • kwargs – inputs, e.g. an observation from the environment or an observation and a hidden state

Returns:

outputs, e.g. an action or an action and a hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'thfnn'

Module contents