recurrent
adn
- class ADNPolicy(spec: EnvSpec, activation_nonlin: Union[Callable, Sequence[Callable]], potentials_dyn_fcn: Callable, obs_layer: Optional[Union[Module, Policy]] = None, tau_init: float = 10.0, tau_learnable: bool = True, kappa_init: float = 0.001, kappa_learnable: bool = True, capacity_learnable: bool = True, potential_init_learnable: bool = False, init_param_kwargs: dict = None, use_cuda: bool = False)[source]
Bases:
PotentialBasedPolicy
Activation Dynamic Network (ADN)
Note
The policy’s outputs are a nonlinear function of the potentials. Thus, you have to make sure that the output range of this nonlinearity matches the action space of the environment.
See also
[1] T. Luksch, M. Gienger, M. Mühlig, T. Yoshiike, “Adaptive Movement Sequences and Predictive Decisions based on Hierarchical Dynamical Systems”, IROS, 2012
Constructor
- Parameters:
spec – environment specification
activation_nonlin – nonlinearity for the output layer; highly suggested functions: to.sigmoid for position tasks, to.tanh for velocity tasks
potentials_dyn_fcn – function to compute the derivative of the neurons’ potentials
obs_layer – specify a custom PyTorch Module; by default (None) a linear layer with biases is used
tau_init – initial value for the shared time constant of the potentials
tau_learnable – flag to determine if the time constant is a learnable parameter or fixed
kappa_init – initial value for the cubic decay
kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed
capacity_learnable – flag to determine if capacity is a learnable parameter or fixed
potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
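A minimal construction sketch for the constructor above; the import paths pyrado.spaces.BoxSpace and pyrado.policies.recurrent.adn as well as the flat_dim attribute are assumptions, and pd_cubic is the dynamics function documented below.

    import torch as to
    from pyrado.utils.data_types import EnvSpec
    from pyrado.spaces import BoxSpace  # assumed import path
    from pyrado.policies.recurrent.adn import ADNPolicy, pd_cubic  # assumed module path

    # Hypothetical spec: 3-dim observations, 2-dim actions
    spec = EnvSpec(
        obs_space=BoxSpace(-1.0, 1.0, shape=3),
        act_space=BoxSpace(-1.0, 1.0, shape=2),
    )
    policy = ADNPolicy(spec, activation_nonlin=to.sigmoid, potentials_dyn_fcn=pd_cubic)

    # One recurrent step: (observation, hidden) -> (action, new hidden state)
    act, hidden = policy(to.rand(spec.obs_space.flat_dim), policy.init_hidden())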
- property capacity: Optional[Tensor]
Get the capacity parameter (exists for capacity-based dynamics functions), else return None.
- extra_repr() str [source]
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(obs: Tensor, hidden: Optional[Tensor] = None) Tuple[Tensor, Tensor] [source]
- Parameters:
obs – observation from the environment
hidden – the network’s hidden state. If None, use init_hidden()
- Returns:
action to be taken and new hidden state
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- name: str = 'adn'
- potentials_dot(potentials: Tensor, stimuli: Tensor) Tensor [source]
Compute the derivative of the neurons’ potentials per time step. \(\tau \dot{u} = f(u, s, h)\)
- Parameters:
potentials – current potential values
stimuli – sum of external and internal stimuli at the current point in time
- Returns:
time derivative of the potentials
- pd_capacity_21(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) Tensor [source]
Capacity-based dynamics with 2 stable fixed points (\(p=-C\), \(p=C\)) and 1 unstable fixed point (\(p=0\)) for \(s=0\)
\(\tau \dot{p} = s - (h - p) (1 - \frac{(h - p)^2}{C^2})\)
Note
Intended to be used with sigmoid activation function, e.g. for the position tasks in RcsPySim.
- Parameters:
p – potential, higher values lead to higher activations
s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)
h – resting level, a.k.a. constant offset
tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)
kwargs – additional parameters to the potential dynamics
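To make the fixed-point structure concrete, here is a self-contained sketch written directly from the equation above (pd_capacity_21_sketch is a hypothetical stand-in, not the library function):

    import torch as to

    def pd_capacity_21_sketch(p, s, h, tau, capacity):
        # tau * p_dot = s - (h - p) * (1 - (h - p)^2 / C^2), solved for p_dot
        return (s - (h - p) * (1.0 - (h - p) ** 2 / capacity**2)) / tau

    # For s = 0 and h = 0, the potentials drift away from the unstable fixed
    # point p = 0 towards the stable fixed points p = -C and p = C
    p, capacity = to.tensor([-0.1, 0.1]), to.tensor(5.0)
    for _ in range(1000):
        p = p + 0.01 * pd_capacity_21_sketch(p, to.zeros(2), to.zeros(2), to.tensor(1.0), capacity)
    print(p)  # approximately [-5., 5.]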
- pd_capacity_21_abs(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) Tensor [source]
Capacity-based dynamics with 2 stable fixed points (\(p=-C\), \(p=C\)) and 1 unstable fixed point (\(p=0\)) for \(s=0\)
\(\tau \dot{p} = s - (h - p) (1 - \frac{\left| h - p \right|}{C})\)
The “absolute version” of pd_capacity_21 has a lower magnitude and a lower order of the resulting polynomial.
Note
Intended to be used with sigmoid activation function, e.g. for the position tasks in RcsPySim.
- Parameters:
p – potential, higher values lead to higher activations
s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)
h – resting level, a.k.a. constant offset
tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)
kwargs – additional parameters to the potential dynamics
- pd_capacity_32(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) Tensor [source]
Capacity-based dynamics with 3 stable fixed points (\(p=-C\), \(p=0\), \(p=C\)) and 2 unstable fixed points (\(p=-C/2\), \(p=C/2\)) for \(s=0\)
\(\tau \dot{p} = s - (h - p) (1 - \frac{(h - p)^2}{C^2}) (1 - \frac{(2(h - p))^2}{C^2})\)
Note
Intended to be used with tanh activation function, e.g. for the velocity tasks in RcsPySim.
- Parameters:
p – potential, higher values lead to higher activations
s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)
h – resting level, a.k.a. constant offset
tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)
kwargs – additional parameters to the potential dynamics
- pd_capacity_32_abs(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) Tensor [source]
Capacity-based dynamics with 3 stable fixed points (\(p=-C\), \(p=0\), \(p=C\)) and 2 unstable fixed points (\(p=-C/2\), \(p=C/2\)) for \(s=0\)
\(\tau \dot{p} = s + (h - p) (1 - \frac{\left| h - p \right|}{C}) (1 - \frac{2 \left| h - p \right|}{C})\)
The “absolute version” of pd_capacity_32 is less skewed due to a lower order of the resulting polynomial.
Note
Intended to be used with tanh activation function, e.g. for the velocity tasks in RcsPySim.
- Parameters:
p – potential, higher values lead to higher activations
s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)
h – resting level, a.k.a. constant offset
tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)
kwargs – additional parameters to the potential dynamics
- pd_cubic(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) Tensor [source]
Basic proportional dynamics with additional cubic decay
\(\tau \dot{p} = s + h - p + \kappa (h - p)^3\)
- Parameters:
p – potential, higher values lead to higher activations
s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)
h – resting level, a.k.a. constant offset
tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)
kwargs – additional parameters to the potential dynamics
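A self-contained sketch of these dynamics together with one explicit Euler integration step, mirroring the equation above (the function name is hypothetical):

    import torch as to

    def pd_cubic_sketch(p, s, h, tau, kappa):
        # tau * p_dot = s + h - p + kappa * (h - p)^3, solved for p_dot
        return (s + h - p + kappa * (h - p) ** 3) / tau

    # One Euler step of length dt for 4 potential-based neurons
    p, s, h = to.zeros(4), to.rand(4), to.zeros(4)
    dt, tau, kappa = 0.01, to.tensor(10.0), to.tensor(1e-3)
    p = p + dt * pd_cubic_sketch(p, s, h, tau, kappa)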
- pd_linear(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) Tensor [source]
Basic proportional dynamics
\(\tau \dot{p} = s + h - p\)
- Parameters:
p – potential, higher values lead to higher activations
s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)
h – resting level, a.k.a. constant offset
tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)
kwargs – additional parameters to the potential dynamics
base
- class RecurrentPolicy(spec: EnvSpec, use_cuda: bool)[source]
Bases:
Policy, ABC
Base class for recurrent policies. The policy does not store the hidden state on its own, so it requires two arguments (observation, hidden) and returns two values (action, new_hidden). The hidden tensor is a 1-dim vector of state variables with unspecified meaning. In the batched case, it should be a 2-dim array whose first dimension is the batch size, matching that of the observations.
Constructor
- Parameters:
spec – environment specification
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- abstract evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') Tensor [source]
Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.
- Parameters:
rollout – complete rollout
hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.
- Returns:
actions with gradient data
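A hedged usage sketch, e.g. inside a policy-gradient update; policy, ro (a recorded StepSequence), and advantages are placeholders from a preceding sampling and advantage-estimation step.

    # Re-evaluate the recorded rollout to obtain differentiable actions
    acts = policy.evaluate(ro)            # shape: rollout length x action dim
    loss = -(acts * advantages).mean()    # hypothetical surrogate objective
    loss.backward()                       # gradients flow through the hidden states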
- abstract forward(obs: Tensor, hidden: Optional[Tensor] = None) Tuple[Tensor, Tensor] [source]
- Parameters:
obs – observation from the environment
hidden – the network’s hidden state. If None, use init_hidden()
- Returns:
action to be taken and new hidden state
- property hidden_size: int
Get the number of hidden state variables.
- init_hidden(batch_size: Optional[int] = None) Tensor [source]
Provide initial values for the hidden parameters. This should usually be a zero tensor.
- Parameters:
batch_size – number of states to track in parallel
- Returns:
Tensor of batch_size x hidden_size
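A minimal interaction-loop sketch under this contract; env is a placeholder environment with a gym-style reset()/step() API (an assumption), and policy is any RecurrentPolicy.

    import torch as to

    obs = env.reset()
    hidden = policy.init_hidden()  # zero tensor of size hidden_size
    done = False
    while not done:
        # the caller carries the hidden state between steps
        act, hidden = policy(to.from_numpy(obs), hidden)
        obs, rew, done, info = env.step(act.detach().numpy())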
- property is_recurrent: bool
Flag to signalise whether the policy has a recurrent architecture.
- script() ScriptModule [source]
Create a ScriptModule from this policy. The returned module will always have the signature action = tm(observation, hidden). For recurrent networks, it returns a stateful module that keeps the hidden states internally. Such modules have a reset() method to reset the hidden states.
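A hedged export sketch, assuming the stateful variant is called with the observation only and that the resulting module is loaded on the C++ side via torch::jit::load:

    scripted = policy.script()  # stateful ScriptModule wrapping the recurrent policy
    scripted.reset()            # reset the internally stored hidden state
    act = scripted(obs)         # hidden state is handled internally
    scripted.save("policy.pt")  # load in C++ with torch::jit::load("policy.pt")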
- training: bool
- class StatefulRecurrentNetwork(net: RecurrentPolicy)[source]
Bases:
Module
A scripted wrapper for a recurrent neural network that stores the hidden state.
Note
Use this for transfer to C++.
Constructor
- Parameters:
net – recurrent network to wrap
Note
Must not be a script module
- forward(inp)[source]
Defines the computation performed at every call. Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- input_size: int
- output_size: int
- default_pack_hidden(hidden: Tensor, num_recurrent_layers: int, hidden_size: int, batch_size: Optional[int] = None) Tensor [source]
Pack the hidden state returned by torch.nn.RNNBase subclasses into a 1-dim state vector. This is the reverse operation of default_unpack_hidden.
- Parameters:
hidden – unpacked hidden state, a tensor of num_recurrent_layers x batch_size x hidden_size
num_recurrent_layers – number of recurrent layers
hidden_size – size of the hidden layers (all equal)
batch_size – if not None, the result is 2-dim and the first dimension represents parts of a data batch
- Returns:
packed hidden state.
- default_unpack_hidden(hidden: Tensor, num_recurrent_layers: int, hidden_size: int, batch_size: Optional[int] = None) Tensor [source]
Unpack the flat hidden state vector into the form expected by torch.nn.RNNBase subclasses.
- Parameters:
hidden – packed hidden state
num_recurrent_layers – number of recurrent layers
hidden_size – size of the hidden layers (all equal)
batch_size – if not None, hidden is 2-dim and the first dimension represents parts of a data batch
- Returns:
unpacked hidden state, a tensor of num_recurrent_layers x batch_size x hidden_size.
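A self-contained sketch of one way such a pack/unpack round trip can be realized with plain reshapes; the library helpers may order the dimensions differently.

    import torch as to

    num_recurrent_layers, hidden_size, batch_size = 2, 8, 5

    # Unpacked form expected by torch.nn.RNNBase subclasses:
    # num_recurrent_layers x batch_size x hidden_size
    unpacked = to.randn(num_recurrent_layers, batch_size, hidden_size)

    # Pack: move the batch dimension to the front, then flatten per sample
    packed = unpacked.permute(1, 0, 2).reshape(batch_size, -1)

    # Unpack: invert the reshape and the permutation
    restored = packed.reshape(batch_size, num_recurrent_layers, hidden_size).permute(1, 0, 2)
    assert to.equal(unpacked, restored)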
neural_fields
- class NFPolicy(spec: EnvSpec, hidden_size: int, obs_layer: Optional[Union[Module, Policy]] = None, activation_nonlin: Callable = torch.sigmoid, mirrored_conv_weights: bool = True, conv_out_channels: int = 1, conv_kernel_size: int = None, conv_padding_mode: str = 'circular', tau_init: float = 10.0, tau_learnable: bool = True, kappa_init: float = 0.0, kappa_learnable: bool = True, potential_init_learnable: bool = False, init_param_kwargs: dict = None, use_cuda: bool = False)[source]
Bases:
PotentialBasedPolicy
Neural Fields (NF)
Note
The policy’s outputs are a nonlinear function of the potentials. Thus, you have to make sure that the output range of this nonlinearity matches the action space of the environment.
See also
[1] S.-I. Amari “Dynamics of Pattern Formation in Lateral-Inhibition Type Neural Fields”, Biological Cybernetics, 1977
Constructor
- Parameters:
spec – environment specification
hidden_size – number of neurons with potential
obs_layer – specify a custom PyTorch Module, by default (None) a linear layer with biases is used
activation_nonlin – nonlinearity to compute the activations from the potential levels
mirrored_conv_weights – re-use weights for the second half of the kernel to create a “symmetric” kernel
conv_out_channels – number of filters for the 1-dim convolution along the potential-based neurons
conv_kernel_size – size of the kernel for the 1-dim convolution along the potential-based neurons
tau_init – initial value for the shared time constant of the potentials
tau_learnable – flag to determine if the time constant is a learnable parameter or fixed
kappa_init – initial value for the cubic decay, pass 0 (default) to disable cubic decay
kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed
potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
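A minimal construction sketch analogous to the ADNPolicy example above, reusing spec from there; the module path is an assumption.

    from pyrado.policies.recurrent.neural_fields import NFPolicy  # assumed module path

    # 16 potential-based neurons, one mirrored convolution filter of width 9
    policy = NFPolicy(spec, hidden_size=16, conv_out_channels=1, conv_kernel_size=9)
    act, hidden = policy(to.rand(spec.obs_space.flat_dim), policy.init_hidden())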
- forward(obs: Tensor, hidden: Optional[Tensor] = None) Tuple[Tensor, Tensor] [source]
- Parameters:
obs – observation from the environment
hidden – the network’s hidden state. If None, use init_hidden()
- Returns:
action to be taken and new hidden state
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- name: str = 'nf'
- potentials_dot(potentials: Tensor, stimuli: Tensor) Tensor [source]
Compute the derivative of the neurons’ potentials per time step. \(\tau \dot{u} = s + h - u + \kappa (h - u)^3, \quad \text{with} \; s = s_{int} + s_{ext} = W o + \int w(u, v) f(u) \, dv\)
- Parameters:
potentials – current potential values
stimuli – sum of external and internal stimuli at the current point in time
- Returns:
time derivative of the potentials
potential_based
- class PotentialBasedPolicy(spec: EnvSpec, obs_layer: Union[Module, Policy], activation_nonlin: Callable, tau_init: float, tau_learnable: bool, kappa_init: float, kappa_learnable: bool, potential_init_learnable: bool, use_cuda: bool, hidden_size: Optional[int] = None)[source]
Bases:
RecurrentPolicy, ABC
Base class for policies that work with potential-based neural networks
Constructor
- Parameters:
spec – environment specification
obs_layer – specify a custom PyTorch Module, by default (None) a linear layer with biases is used
activation_nonlin – nonlinearity to compute the activations from the potential levels
tau_init – initial value for the shared time constant of the potentials
tau_learnable – flag to determine if the time constant is a learnable parameter or fixed
kappa_init – initial value for the cubic decay, pass 0 (default) to disable cubic decay
kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed
potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
hidden_size – number of neurons with potential, by default None which sets the number of hidden neurons to the flat number of actions (in order to be compatible with ADNPolicy)
- evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') Tensor [source]
Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.
- Parameters:
rollout – complete rollout
hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.
- Returns:
actions with gradient data
- extra_repr() str [source]
Set the extra representation of the module
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- abstract forward(obs: Tensor, hidden: Optional[Tensor] = None) Tuple[Tensor, Tensor] [source]
- Parameters:
obs – observation from the environment
hidden – the network’s hidden state. If None, use init_hidden()
- Returns:
action to be taken and new hidden state
- property hidden_size: int
Get the number of hidden state variables.
- init_hidden(batch_size: Optional[int] = None) Tensor [source]
Provide initial values for the hidden parameters. This should usually be a zero tensor.
- Parameters:
batch_size – number of states to track in parallel
- Returns:
Tensor of batch_size x hidden_size
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- property kappa: Tensor
Get the cubic decay parameter.
- name: str = None
- abstract potentials_dot(potentials: Tensor, stimuli: Tensor) Tensor [source]
Compute the derivative of the neurons’ potentials per time step.
- Parameters:
potentials – current potential values
stimuli – sum of external and internal stimuli at the current point in time
- Returns:
time derivative of the potentials
- property stimuli_external: Tensor
Get the neurons’ external stimuli, resulting from the current observations. This is used for recording during a rollout.
- property stimuli_internal: Tensor
Get the neurons’ internal stimuli, resulting from the previous activations of the neurons. This is used for recording during a rollout.
- property tau: Tensor
Get the time scale parameter.
rnn
- class GRUPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]
Bases:
RNNPolicyBase
Policy backed by a multi-layer GRU
Constructor
- Parameters:
spec – environment specification
hidden_size – size of the hidden layers (all equal)
num_recurrent_layers – number of equally sized hidden layers
output_nonlin – nonlinearity for output layer
dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
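A hedged construction sketch including a batched forward pass; the import paths and the space construction are assumptions.

    import torch as to
    from pyrado.utils.data_types import EnvSpec
    from pyrado.spaces import BoxSpace  # assumed import path
    from pyrado.policies.recurrent.rnn import GRUPolicy  # assumed module path

    spec = EnvSpec(obs_space=BoxSpace(-1.0, 1.0, shape=4), act_space=BoxSpace(-1.0, 1.0, shape=2))
    policy = GRUPolicy(spec, hidden_size=32, num_recurrent_layers=2, output_nonlin=to.tanh)

    # Batched step: the first dimension of obs and hidden is the batch size
    obs_batch = to.rand(7, spec.obs_space.flat_dim)
    act, hidden = policy(obs_batch, policy.init_hidden(batch_size=7))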
- name: str = 'gru'
- recurrent_network_type
alias of
GRU
- class LSTMPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]
Bases:
RNNPolicyBase
Policy backed by a multi-layer LSTM
Constructor
- Parameters:
spec – environment specification
hidden_size – size of the hidden layers (all equal)
num_recurrent_layers – number of equally sized hidden layers
output_nonlin – nonlinearity for output layer
dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- property hidden_size: int
Get the number of hidden state variables.
- name: str = 'lstm'
- recurrent_network_type
alias of
LSTM
- class RNNPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, hidden_nonlin: str = 'tanh', output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases:
RNNPolicyBase
Policy backed by a multi-layer RNN
Constructor
- Parameters:
spec – environment specification
hidden_size – size of the hidden layers (all equal)
num_recurrent_layers – number of equally sized hidden layers
hidden_nonlin – nonlinearity for the hidden rnn layers, either ‘tanh’ or ‘relu’
output_nonlin – nonlinearity for output layer
dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- name: str = 'rnn'
- recurrent_network_type
alias of
RNN
- class RNNPolicyBase(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]
Bases:
RecurrentPolicy
Base class for recurrent policies wrapping torch.nn.RNNBase subclasses
Constructor
- Parameters:
spec – environment specification
hidden_size – size of the hidden layers (all equal)
num_recurrent_layers – number of equally sized hidden layers
output_nonlin – nonlinearity for output layer
dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') Tensor [source]
Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.
- Parameters:
rollout – complete rollout
hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.
- Returns:
actions with gradient data
- forward(obs: Tensor, hidden: Optional[Tensor] = None) Tuple[Tensor, Tensor] [source]
- Parameters:
obs – observation from the environment
hidden – the network’s hidden state. If None, use init_hidden()
- Returns:
action to be taken and new hidden state
- property hidden_size: int
Get the number of hidden state variables.
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- recurrent_network_type = None
- training: bool
two_headed_rnn
- class TwoHeadedGRUPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]
Bases:
TwoHeadedRNNPolicyBase
Two-headed policy backed by a multi-layer GRU
Constructor
- Parameters:
spec – environment specification
shared_hidden_size – size of the hidden layers (all equal)
shared_num_recurrent_layers – number of recurrent layers
head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim
head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim
head_1_output_nonlin – nonlinearity for output layer of the first head
head_2_output_nonlin – nonlinearity for output layer of the second head
shared_dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
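A hedged sketch of the two-headed forward pass, e.g. producing the mean and log-std of a Gaussian action distribution; the import path is an assumption and spec is reused from the GRUPolicy sketch above.

    from pyrado.policies.recurrent.two_headed_rnn import TwoHeadedGRUPolicy  # assumed module path

    policy = TwoHeadedGRUPolicy(
        spec,
        shared_hidden_size=32,
        shared_num_recurrent_layers=1,
        head_1_output_nonlin=to.tanh,  # e.g. a bounded action mean
    )
    # Both head sizes default to the action space dimension
    out_1, out_2, hidden = policy(to.rand(spec.obs_space.flat_dim), policy.init_hidden())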
- name: str = 'thgru'
- recurrent_network_type
alias of
GRU
- class TwoHeadedLSTMPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]
Bases:
TwoHeadedRNNPolicyBase
Two-headed policy backed by a multi-layer LSTM
Constructor
- Parameters:
spec – environment specification
shared_hidden_size – size of the hidden layers (all equal)
shared_num_recurrent_layers – number of recurrent layers
head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim
head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim
head_1_output_nonlin – nonlinearity for output layer of the first head
head_2_output_nonlin – nonlinearity for output layer of the second head
shared_dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- property hidden_size: int
Get the number of hidden state variables.
- name: str = 'thlstm'
- recurrent_network_type
alias of
LSTM
- class TwoHeadedRNNPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, shared_hidden_nonlin: str = 'tanh', head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]
Bases:
TwoHeadedRNNPolicyBase
Two-headed policy backed by a multi-layer RNN
Constructor
- Parameters:
spec – environment specification
shared_hidden_size – size of the hidden layers (all equal)
shared_num_recurrent_layers – number of recurrent layers
shared_hidden_nonlin – nonlinearity for the shared hidden rnn layers, either ‘tanh’ or ‘relu’
head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim
head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim
head_1_output_nonlin – nonlinearity for output layer of the first head
head_2_output_nonlin – nonlinearity for output layer of the second head
shared_dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- name: str = 'thrnn'
- recurrent_network_type
alias of
RNN
- class TwoHeadedRNNPolicyBase(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]
Bases:
TwoHeadedPolicy, RecurrentPolicy
Base class for recurrent policies that wrap torch.nn.RNNBase subclasses and have a shared body with two heads, each with its own final layer
Constructor
- Parameters:
spec – environment specification
shared_hidden_size – size of the hidden layers (all equal)
shared_num_recurrent_layers – number of recurrent layers
head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim
head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim
head_1_output_nonlin – nonlinearity for output layer of the first head
head_2_output_nonlin – nonlinearity for output layer of the second head
shared_dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
use_cuda – True to move the policy to the GPU, False (default) to use the CPU
- evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') Tuple[Tensor, Tensor] [source]
Re-evaluate the given rollout and return a differentiable action tensor. The default implementation simply calls forward().
- Parameters:
rollout – complete rollout
hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Defaults to ‘hidden_states’. Change for value functions.
- Returns:
actions with gradient data
- forward(obs: Tensor, hidden: Optional[Tensor] = None) Tuple[Tensor, Tensor, Tensor] [source]
- Parameters:
obs – observation from the environment
hidden – the network’s hidden state. If None, use init_hidden()
- Returns:
outputs of the two heads and the new hidden state
- property hidden_size: int
Get the number of hidden state variables.
- init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
Initialize the policy’s parameters. By default the parameters are initialized randomly.
- Parameters:
init_values – tensor of fixed initial policy parameter values
kwargs – additional keyword arguments for the policy parameter initialization
- recurrent_network_type = None
- training: bool