recurrent

adn

class ADNPolicy(spec: EnvSpec, activation_nonlin: Union[Callable, Sequence[Callable]], potentials_dyn_fcn: Callable, obs_layer: Optional[Union[Module, Policy]] = None, tau_init: float = 10.0, tau_learnable: bool = True, kappa_init: float = 0.001, kappa_learnable: bool = True, capacity_learnable: bool = True, potential_init_learnable: bool = False, init_param_kwargs: dict = None, use_cuda: bool = False)[source]

Bases: PotentialBasedPolicy

Activation Dynamic Network (ADN)

Note

The policy’s outputs are a nonlinear function of the potentials. Thus, you have to make sure that the output range of this nonlinearity matches the action space of the environment.

See also

[1] T. Luksch, M. Gienger, M. Mühlig, T. Yoshiike, “Adaptive Movement Sequences and Predictive Decisions based on Hierarchical Dynamical Systems”, IROS, 2012

Constructor

Parameters:
  • spec – environment specification

  • activation_nonlin – nonlinearity for the output layer; highly suggested functions: to.sigmoid for position tasks, to.tanh for velocity tasks

  • potentials_dyn_fcn – function to compute the derivative of the neurons’ potentials

  • obs_layer – specify a custom PyTorch Module; by default (None) a linear layer with biases is used

  • tau_init – initial value for the shared time constant of the potentials

  • tau_learnable – flag to determine if the time constant is a learnable parameter or fixed

  • kappa_init – initial value for the cubic decay

  • kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed

  • capacity_learnable – flag to determine if capacity is a learnable parameter or fixed

  • potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
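A minimal construction-and-stepping sketch (not part of the library docs): it assumes env is any pyrado environment exposing an EnvSpec via env.spec, and that this class and the pd_* dynamics functions documented below live in pyrado.policies.recurrent.adn:

    import torch as to

    from pyrado.policies.recurrent.adn import ADNPolicy, pd_cubic  # assumed module path

    # `env` is an assumed pyrado environment.
    policy = ADNPolicy(
        spec=env.spec,
        activation_nonlin=to.sigmoid,  # suggested for position tasks
        potentials_dyn_fcn=pd_cubic,   # one of the pd_* functions documented below
    )

    # Recurrent policies map (observation, hidden) to (action, new hidden).
    obs = to.from_numpy(env.reset()).to(dtype=to.get_default_dtype())
    act, hidden = policy(obs, policy.init_hidden())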

property capacity: Optional[Tensor]

Get the capacity parameter if the dynamics function is capacity-based, else return None.

extra_repr() → str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'adn'
potentials_dot(potentials: Tensor, stimuli: Tensor) → Tensor[source]

Compute the derivative of the neurons’ potentials per time step: \(\tau \dot{u} = f(u, s, h)\)

Parameters:
  • potentials – current potential values

  • stimuli – sum of external and internal stimuli at the current point in time

Returns:

time derivative of the potentials

pd_capacity_21(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with two stable fixed points (\(p=-C\), \(p=C\)) and one unstable fixed point (\(p=0\)) for \(s=0\)

\(\tau \dot{p} = s - (h - p) (1 - \frac{(h - p)^2}{C^2})\)

Note

Intended to be used with a sigmoid activation function, e.g., for the position tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics
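For orientation, the displayed equation transcribed into code (an illustrative sketch, not the library implementation; it assumes the capacity \(C\) arrives via kwargs as capacity and that the function returns \(\dot{p}\)):

    def pd_capacity_21_sketch(p, s, h, tau, capacity, **kwargs):
        # tau * p_dot = s - (h - p) * (1 - (h - p)^2 / C^2)  =>  solve for p_dot
        return (s - (h - p) * (1 - (h - p) ** 2 / capacity ** 2)) / tau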

pd_capacity_21_abs(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with two stable fixed points (\(p=-C\), \(p=C\)) and one unstable fixed point (\(p=0\)) for \(s=0\)

\(\tau \dot{p} = s - (h - p) (1 - \frac{\left| h - p \right|}{C})\)

The “absolute version” of pd_capacity_21 has a lower magnitude and a lower order of the resulting polynomial.

Note

Intended to be used with a sigmoid activation function, e.g., for the position tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

pd_capacity_32(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with three stable fixed points (\(p=-C\), \(p=0\), \(p=C\)) and two unstable fixed points (\(p=-C/2\), \(p=C/2\)) for \(s=0\)

\(\tau \dot{p} = s - (h - p) (1 - \frac{(h - p)^2}{C^2}) (1 - \frac{(2(h - p))^2}{C^2})\)

Note

Intended to be used with a tanh activation function, e.g., for the velocity tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

pd_capacity_32_abs(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with three stable fixed points (\(p=-C\), \(p=0\), \(p=C\)) and two unstable fixed points (\(p=-C/2\), \(p=C/2\)) for \(s=0\)

\(\tau \dot{p} = s + (h - p) (1 - \frac{\left| h - p \right|}{C}) (1 - \frac{2 \left| h - p \right|}{C})\)

The “absolute version” of pd_capacity_32 is less skewed due to the lower order of the resulting polynomial.

Note

Intended to be used with a tanh activation function, e.g., for the velocity tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

pd_cubic(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Basic proportional dynamics with additional cubic decay

\(\tau \dot{p} = s + h - p + \kappa (h - p)^3\)

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics
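The cubic variant only adds the \(\kappa\)-term; a sketch under the same assumptions as above (kappa passed via kwargs, \(\dot{p}\) returned):

    def pd_cubic_sketch(p, s, h, tau, kappa, **kwargs):
        # tau * p_dot = s + h - p + kappa * (h - p)^3  =>  solve for p_dot
        return (s + h - p + kappa * (h - p) ** 3) / tau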

pd_linear(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Basic proportional dynamics

\(\tau \dot{p} = s - p\)

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

base

class RecurrentPolicy(spec: EnvSpec, use_cuda: bool)[source]

Bases: Policy, ABC

Base class for recurrent policies. The policy does not store the hidden state on its own, so it requires two arguments, (observation, hidden), and returns two values, (action, new_hidden). The hidden tensor is a 1-dim vector of state variables with unspecified meaning. In the batched case, it should be a 2-dim array whose first dimension is the batch size, matching that of the observations.

Constructor

Parameters:
  • spec – environment specification

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

abstract evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tensor[source]

Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.

Returns:

actions with gradient data
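A hypothetical training-loop fragment showing why this method exists (rollout, loss_fn, and the surrounding loop are assumptions, not library API):

    # Replay a recorded rollout such that the returned actions are
    # differentiable w.r.t. the policy parameters, with gradients flowing
    # through the hidden states.
    acts = policy.evaluate(rollout)   # rollout: a recorded StepSequence
    loss = loss_fn(acts)              # e.g., a policy-gradient surrogate (assumed)
    loss.backward()                   # gradients reach the recurrent parameters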

abstract forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

abstract property hidden_size: int

Get the number of hidden state variables.

init_hidden(batch_size: Optional[int] = None) → Tensor[source]

Provide initial values for the hidden parameters. This should usually be a zero tensor.

Parameters:

batch_size – number of states to track in parallel

Returns:

Tensor of batch_size x hidden_size
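Putting forward() and init_hidden() together, a gym-style interaction-loop sketch (assumes env follows the reset/step convention and policy is any RecurrentPolicy subclass):

    import torch as to

    obs = env.reset()
    hidden = policy.init_hidden()   # zero tensor of size hidden_size
    done = False
    while not done:
        obs_to = to.from_numpy(obs).to(dtype=to.get_default_dtype())
        with to.no_grad():
            act, hidden = policy(obs_to, hidden)   # forward() returns both
        obs, rew, done, info = env.step(act.numpy())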

property is_recurrent: bool

Flag indicating whether the policy has a recurrent architecture.

script() ScriptModule[source]

Create a ScriptModule from this policy. The returned module will always have the signature action = tm(observation). For recurrent networks, this method returns a stateful module that keeps the hidden states internally. Such modules have a reset() method to reset the hidden states.
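For deployment, the flow is typically as follows (a hypothetical sketch; the file name and the observation tensor obs are assumptions):

    scripted = policy.script()    # stateful ScriptModule for recurrent policies
    scripted.reset()              # clear the internally stored hidden state
    act = scripted(obs)           # hidden state is handled internally
    scripted.save("policy.pt")    # e.g., to be loaded from LibTorch in C++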

training: bool
class StatefulRecurrentNetwork(net: RecurrentPolicy)[source]

Bases: Module

A scripted wrapper for a recurrent neural network that stores the hidden state.

Note

Use this for transfer to C++.

Constructor

Parameters:

net – recurrent network to wrap

Note

Must not be a script module

forward(inp)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_size: int
output_size: int
reset()[source]

Reset the policy’s internal state.

default_pack_hidden(hidden: Tensor, num_recurrent_layers, hidden_size: int, batch_size: Optional[int] = None)[source]

Pack the hidden state returned by torch.nn.RNNBase subclasses into a 1-dim state vector. This is the reverse operation of default_unpack_hidden.

Parameters:
  • hidden – unpacked hidden state, a tensor of num_recurrent_layers x batch_size x hidden_size

  • num_recurrent_layers – number of recurrent layers

  • hidden_size – size of the hidden layers (all equal)

  • batch_size – if not None, the result should be 2-dim, and the first dimension represents parts of a data batch

Returns:

packed hidden state.

default_unpack_hidden(hidden: Tensor, num_recurrent_layers: int, hidden_size: int, batch_size: Optional[int] = None)[source]

Unpack the flat hidden state vector into the form expected by torch.nn.RNNBase subclasses.

Parameters:
  • hidden – packed hidden state

  • num_recurrent_layers – number of recurrent layers

  • hidden_size – size of the hidden layers (all equal)

  • batch_size – if not None, hidden is 2-dim, and the first dimension represents parts of a data batch

Returns:

unpacked hidden state, a tensor of num_recurrent_layers x batch_size x hidden_size.
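The round trip between the two layouts amounts to a transpose plus reshape (a shape-level sketch; the library functions may differ in details):

    import torch as to

    num_recurrent_layers, batch_size, hidden_size = 2, 4, 8
    hidden = to.randn(num_recurrent_layers, batch_size, hidden_size)  # RNNBase layout

    # Pack: batch dimension first, then flatten layers x hidden per sample.
    packed = hidden.permute(1, 0, 2).reshape(batch_size, num_recurrent_layers * hidden_size)

    # Unpack: reverse the operation.
    unpacked = packed.reshape(batch_size, num_recurrent_layers, hidden_size).permute(1, 0, 2)
    assert to.equal(unpacked, hidden)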

neural_fields

class NFPolicy(spec: EnvSpec, hidden_size: int, obs_layer: Optional[Union[Module, Policy]] = None, activation_nonlin: Callable = torch.sigmoid, mirrored_conv_weights: bool = True, conv_out_channels: int = 1, conv_kernel_size: int = None, conv_padding_mode: str = 'circular', tau_init: float = 10.0, tau_learnable: bool = True, kappa_init: float = 0.0, kappa_learnable: bool = True, potential_init_learnable: bool = False, init_param_kwargs: dict = None, use_cuda: bool = False)[source]

Bases: PotentialBasedPolicy

Neural Fields (NF)

Note

The policy’s outputs are a nonlinear function of the potentials. Thus, you have to make sure that the output range of this nonlinearity matches the action space of the environment.

See also

[1] S.-I. Amari “Dynamics of Pattern Formation in Lateral-Inhibition Type Neural Fields”, Biological Cybernetics, 1977

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – number of neurons with potential

  • obs_layer – specify a custom PyTorch Module, by default (None) a linear layer with biases is used

  • activation_nonlin – nonlinearity to compute the activations from the potential levels

  • mirrored_conv_weights – re-use weights for the second half of the kernel to create a “symmetric” kernel

  • conv_out_channels – number of filters for the 1-dim convolution along the potential-based neurons

  • conv_kernel_size – size of the kernel for the 1-dim convolution along the potential-based neurons

  • tau_init – initial value for the shared time constant of the potentials

  • tau_learnable – flag to determine if the time constant is a learnable parameter or fixed

  • kappa_init – initial value for the cubic decay, pass 0 (default) to disable cubic decay

  • kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed

  • potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
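A minimal construction sketch mirroring the ADNPolicy example above (assumed module path; env and obs as before):

    import torch as to

    from pyrado.policies.recurrent.neural_fields import NFPolicy  # assumed module path

    policy = NFPolicy(
        spec=env.spec,      # `env` is an assumed pyrado environment
        hidden_size=9,      # number of neurons with potential
        activation_nonlin=to.sigmoid,
    )
    act, hidden = policy(obs, policy.init_hidden())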

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'nf'
potentials_dot(potentials: Tensor, stimuli: Tensor) → Tensor[source]

Compute the derivative of the neurons’ potentials per time step: \(\tau \dot{u} = s + h - u + \kappa (h - u)^3, \quad \text{with} \quad s = s_{int} + s_{ext} = W o + \int w(u, v) f(u) dv\)

Parameters:
  • potentials – current potential values

  • stimuli – sum of external and internal stimuli at the current point in time

Returns:

time derivative of the potentials

potential_based

class PotentialBasedPolicy(spec: EnvSpec, obs_layer: Union[Module, Policy], activation_nonlin: Callable, tau_init: float, tau_learnable: bool, kappa_init: float, kappa_learnable: bool, potential_init_learnable: bool, use_cuda: bool, hidden_size: Optional[int] = None)[source]

Bases: RecurrentPolicy, ABC

Base class for policies that work with potential-based neural networks

Constructor

Parameters:
  • spec – environment specification

  • obs_layer – specify a custom PyTorch Module, by default (None) a linear layer with biases is used

  • activation_nonlin – nonlinearity to compute the activations from the potential levels

  • tau_init – initial value for the shared time constant of the potentials

  • tau_learnable – flag to determine if the time constant is a learnable parameter or fixed

  • kappa_init – initial value for the cubic decay, pass 0 (default) to disable cubic decay

  • kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed

  • potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

  • hidden_size – number of neurons with potential, by default None which sets the number of hidden neurons to the flat number of actions (in order to be compatible with ADNPolicy)

evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tensor[source]

Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.

Returns:

actions with gradient data

extra_repr() → str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

abstract forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

property hidden_size: int

Get the number of hidden state variables.

init_hidden(batch_size: Optional[int] = None) → Tensor[source]

Provide initial values for the hidden parameters. This should usually be a zero tensor.

Parameters:

batch_size – number of states to track in parallel

Returns:

Tensor of batch_size x hidden_size

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

property kappa: Tensor

Get the cubic decay parameter.

name: str = None
abstract potentials_dot(potentials: Tensor, stimuli: Tensor) → Tensor[source]

Compute the derivative of the neurons’ potentials per time step.

Parameters:
  • potentials – current potential values

  • stimuli – sum of external and internal stimuli at the current point in time

Returns:

time derivative of the potentials

property stimuli_external: Tensor

Get the neurons’ external stimuli, resulting from the current observations. This is used for recording during a rollout.

property stimuli_internal: Tensor

Get the neurons’ internal stimuli, resulting from the previous activations of the neurons. This is used for recording during a rollout.

property tau: Tensor

Get the time scale parameter.

rnn

class GRUPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: RNNPolicyBase

Policy backed by a multi-layer GRU

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'gru'
recurrent_network_type

alias of GRU

class LSTMPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: RNNPolicyBase

Policy backed by a multi-layer LSTM

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

property hidden_size: int

Get the number of hidden state variables.

name: str = 'lstm'
recurrent_network_type

alias of LSTM

class RNNPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, hidden_nonlin: str = 'tanh', output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: RNNPolicyBase

Policy backed by a multi-layer RNN

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • hidden_nonlin – nonlinearity for the hidden rnn layers, either ‘tanh’ or ‘relu’

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'rnn'
recurrent_network_type

alias of RNN
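Since the three wrappers differ only in the backing torch.nn module, they are interchangeable at construction time (sketch; assumed module path, env and obs as in the examples above):

    import torch as to

    from pyrado.policies.recurrent.rnn import GRUPolicy, LSTMPolicy, RNNPolicy  # assumed module path

    common = dict(spec=env.spec, hidden_size=32, num_recurrent_layers=2)
    policies = [
        RNNPolicy(**common, hidden_nonlin="tanh", output_nonlin=to.tanh),
        GRUPolicy(**common, output_nonlin=to.tanh),
        LSTMPolicy(**common, output_nonlin=to.tanh),
    ]
    for p in policies:
        act, hidden = p(obs, p.init_hidden())   # identical call signature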

class RNNPolicyBase(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: RecurrentPolicy

Base class for recurrent policies wrapping torch.nn.RNNBase subclasses

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tensor[source]

Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.

Returns:

actions with gradient data

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

property hidden_size: int

Get the number of hidden state variables.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

recurrent_network_type = None
training: bool

two_headed_rnn

class TwoHeadedGRUPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: TwoHeadedRNNPolicyBase

Two-headed policy backed by a multi-layer GRU

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'thgru'
recurrent_network_type

alias of GRU

class TwoHeadedLSTMPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: TwoHeadedRNNPolicyBase

Two-headed policy backed by a multi-layer LSTM

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

property hidden_size: int

Get the number of hidden state variables.

name: str = 'thlstm'
recurrent_network_type

alias of LSTM

class TwoHeadedRNNPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, shared_hidden_nonlin: str = 'tanh', head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: TwoHeadedRNNPolicyBase

Two-headed policy backed by a multi-layer RNN

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • shared_hidden_nonlin – nonlinearity for the shared hidden rnn layers, either ‘tanh’ or ‘relu’

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'thrnn'
recurrent_network_type

alias of RNN

class TwoHeadedRNNPolicyBase(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: TwoHeadedPolicy, RecurrentPolicy

Base class for recurrent policies that wrap torch.nn.RNNBase subclasses and have a shared body and two heads, each with its own final layer

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tuple[Tensor, Tensor][source]

Re-evaluate the given rollout and return a differentiable action tensor. The default implementation simply calls forward().

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Defaults to ‘hidden_states’. Change for value functions.

Returns:

actions with gradient data

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

outputs of the first and the second head, as well as the new hidden state
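Unlike the single-headed policies above, forward() here yields three tensors (sketch; the interpretation of the second head depends on the concrete policy):

    # e.g., head 1 predicts the action, head 2 an auxiliary quantity
    head_1_out, head_2_out, hidden = policy(obs, policy.init_hidden())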

property hidden_size: int

Get the number of hidden state variables.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

recurrent_network_type = None
training: bool

Module contents