feed_forward
dual_rfb
- class DualRBFLinearPolicy(spec: EnvSpec, rbf_hparam: dict, dim_mask: int = 2, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases: LinearPolicy
A linear policy with RBF features, which are also used to obtain the derivative of the features. The intended use case is a simple policy that generates the joint position and joint velocity commands for the internal PD-controller of a robot (e.g. a Barrett WAM). By reusing the RBFs, we reduce the number of parameters while at the same time obtaining the velocity information from the features, i.e. from the derivative of the normalized Gaussians.
Constructor
- Parameters:
spec – specification of environment
rbf_hparam – hyper-parameters for the RBF-features, see RBFFeat
dim_mask – number of RBF features to mask out at the beginning and the end of every dimension; pass 1 to remove the first and the last feature for the policy, or 0 to use all RBF features. Masking out RBFs is useful for obtaining a smooth starting behavior.
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
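Example: a minimal construction sketch, assuming DualRBFLinearPolicy has been imported from this module and env is an existing Pyrado environment. The rbf_hparam keys (num_feat_per_dim, bounds) are assumptions based on RBFFeat; check its documentation for the exact arguments.

```python
# Hypothetical setup: `env` is an environment whose action space holds
# joint position and joint velocity commands (e.g. for a Barrett WAM).
rbf_hparam = dict(num_feat_per_dim=9, bounds=(0.0, 1.0))  # assumed RBFFeat kwargs
policy = DualRBFLinearPolicy(
    spec=env.spec,
    rbf_hparam=rbf_hparam,
    dim_mask=2,  # mask out the first and last two RBFs of every dimension
)
```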
- forward(obs: Tensor) → Tensor [source]
Evaluate the features at the given observations, or use the given feature values.
- Parameters:
obs – observations from the environment
- Returns:
actions
- name: str = 'dualrbf'
fnn
- class DiscreteActQValPolicy(spec: EnvSpec, net: Module, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases: Policy
State-action value (Q-value) feed-forward neural network policy for discrete actions
Constructor
- Parameters:
spec – environment specification
net – module that approximates the Q-values given the observations and possible (discrete) actions. Make sure to create this object with the correct input and output sizes by using DiscreteActQValPolicy.get_qfcn_input_size() and DiscreteActQValPolicy.get_qfcn_output_size().
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
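Example: a hedged sketch of wiring the Q-network to the policy using the documented size helpers; the helpers' exact arguments are assumptions, and env is an assumed environment with a discrete action space. FNN is the network class documented below.

```python
import torch

q_net = FNN(
    input_size=DiscreteActQValPolicy.get_qfcn_input_size(env.spec),  # assumed to take the spec
    output_size=DiscreteActQValPolicy.get_qfcn_output_size(),
    hidden_sizes=[64, 64],
    hidden_nonlin=torch.tanh,
)
policy = DiscreteActQValPolicy(spec=env.spec, net=q_net)
```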
- forward(obs: Tensor) → Tensor [source]
Get the action according to the policy and the observations (forward pass).
- Parameters:
obs – observations from the environment
- Returns:
actions
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
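For example, the parameters can be re-initialized deterministically by passing fixed values whose shape matches the number of policy parameters (a sketch; num_param is an assumed attribute holding the parameter count):

```python
import torch

fixed = torch.zeros(policy.num_param)  # assumed attribute with the number of parameters
policy.init_param(init_values=fixed)
```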
- name: str = 'discrqval'
- q_values_argmax(obs: Tensor) → Tensor [source]
Compute the state-action values for the given observations, using the actions that maximize the estimated Q-values. Since the action space is discrete, the Q-values can be arranged in a table.
- Parameters:
obs – current observations
- Returns:
Q-values of the state-action combinations using the argmax actions; the dimension equals the flat action space dimension
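Usage sketch: comparing the greedy action from forward() with the corresponding Q-value. The observation and env are hypothetical.

```python
import torch

obs = torch.as_tensor(env.reset(), dtype=torch.get_default_dtype())
act = policy(obs)                    # greedy action via forward()
q_max = policy.q_values_argmax(obs)  # Q-value(s) of the argmax action(s)
```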
- class FNN(input_size: int, output_size: int, hidden_sizes: Sequence[int], hidden_nonlin: Union[Callable, Sequence[Callable]], dropout: Optional[float] = 0.0, output_nonlin: Optional[Callable] = None, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases: Module
Feed-forward neural network
Constructor
- Parameters:
input_size – number of inputs
output_size – number of outputs
hidden_sizes – sizes of hidden layers (every entry creates one hidden layer)
hidden_nonlin – nonlinearity for hidden layers
dropout – dropout probability, default = 0 deactivates dropout
output_nonlin – nonlinearity for output layer
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
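Example: a small stand-alone network, assuming FNN has been imported from this module.

```python
import torch

net = FNN(
    input_size=4,
    output_size=2,
    hidden_sizes=[32, 32],     # two hidden layers with 32 units each
    hidden_nonlin=torch.tanh,  # one callable shared by all hidden layers
    output_nonlin=None,        # linear output layer
    dropout=0.0,               # dropout deactivated
)
y = net(torch.randn(4))
```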
- property device: str
Get the device (CPU or GPU) on which the FNN is stored.
- forward(obs: Tensor) → Tensor [source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
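In practice this means evaluating the network by calling the instance rather than forward() directly (continuing the construction example above):

```python
x = torch.randn(4)
y = net(x)  # runs registered hooks, then forward()
# net.forward(x) would work, but silently skips the hooks
```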
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the network’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial network parameter values
kwargs – additional keyword arguments for the network parameter initialization
- property param_values: Tensor
Get the parameters as a 1-dimensional tensor. The values are copied, so modifying the return value does not propagate to the actual parameters.
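Since the values are copied, mutating the returned tensor leaves the network untouched (continuing the example above):

```python
pv = net.param_values  # 1-dimensional copy of all parameters
pv += 1.0              # modifies only the copy
assert not torch.equal(pv, net.param_values)
```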
- training: bool
- class FNNPolicy(spec: EnvSpec, hidden_sizes: Sequence[int], hidden_nonlin: Union[Callable, Sequence[Callable]], dropout: Optional[float] = 0.0, output_nonlin: Optional[Callable] = None, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases: Policy
Feed-forward neural network policy
Constructor
- Parameters:
spec – environment specification
hidden_sizes – sizes of hidden layer outputs. Every entry creates one hidden layer.
hidden_nonlin – nonlinearity for hidden layers
dropout – dropout probability, default = 0 deactivates dropout
output_nonlin – nonlinearity for output layer
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
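Example: a construction sketch for an assumed environment env, with FNNPolicy imported from this module.

```python
import torch

policy = FNNPolicy(
    spec=env.spec,
    hidden_sizes=[64, 64],
    hidden_nonlin=torch.relu,
    output_nonlin=torch.tanh,  # e.g. squash the actions to [-1, 1]
)
act = policy(torch.as_tensor(env.reset(), dtype=torch.get_default_dtype()))
```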
- forward(obs: Tensor) → Tensor [source]
Get the action according to the policy and the observations (forward pass).
- Parameters:
obs – observations from the environment
- Returns:
actions
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- name: str = 'fnn'
linear
- class LinearPolicy(spec: EnvSpec, feats: FeatureStack, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases: Policy
A linear policy defined by the inner product of nonlinear features of the observations with the policy parameters
Constructor
- Parameters:
spec – specification of environment
feats – list of feature functions
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the module to the GPU, False (default) to use the CPU
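Example: a linear policy on a stack of simple features. The import path and the feature helper names are assumptions; verify them against pyrado.policies.features.

```python
# Assumed imports; check pyrado.policies.features for the exact names.
from pyrado.policies.features import FeatureStack, const_feat, identity_feat, squared_feat

feats = FeatureStack([const_feat, identity_feat, squared_feat])
policy = LinearPolicy(spec=env.spec, feats=feats)  # `env` is an assumed environment
```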
- eval_feats(obs: Tensor) → Tensor [source]
Evaluate the features for the given observations.
- Parameters:
obs – observation from the environment
- Returns:
feats_val – the features’ values
- property features: FeatureStack
Get the (nonlinear) feature transformations.
- forward(obs: Tensor) → Tensor [source]
Evaluate the features at the given observations, or use the given feature values.
- Parameters:
obs – observations from the environment
- Returns:
actions
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- name: str = 'lin'
two_headed_fnn
- class TwoHeadedFNNPolicy(spec: EnvSpec, shared_hidden_sizes: Sequence[int], shared_hidden_nonlin: Union[Callable, Sequence[Callable]], head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases: TwoHeadedPolicy
Policy architecture with a shared body and two heads, each having its own last layer
Constructor
- Parameters:
spec – environment specification
shared_hidden_sizes – sizes of shared hidden layer outputs. Every entry creates one shared hidden layer.
shared_hidden_nonlin – nonlinearity for the shared hidden layers
head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim
head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim
head_1_output_nonlin – nonlinearity for output layer of the first head
head_2_output_nonlin – nonlinearity for output layer of the second head
shared_dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
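Example: a construction sketch where the second head produces a strictly positive output (e.g. a scale parameter); env is an assumed environment and TwoHeadedFNNPolicy is imported from this module.

```python
import torch

policy = TwoHeadedFNNPolicy(
    spec=env.spec,
    shared_hidden_sizes=[64, 64],
    shared_hidden_nonlin=torch.tanh,
    head_1_output_nonlin=torch.tanh,  # head 1: bounded output
    head_2_output_nonlin=torch.exp,   # head 2: strictly positive output
)
head_1_out, head_2_out = policy(torch.as_tensor(env.reset(), dtype=torch.get_default_dtype()))
```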
- forward(obs: Tensor) → Tuple[Tensor, Tensor] [source]
Get the actions according to the policy and the observations (forward pass).
- Parameters:
obs – observations from the environment
- Returns:
the outputs of the two heads
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- name: str = 'thfnn'