recurrent

adn

class ADNPolicy(spec: EnvSpec, activation_nonlin: Union[Callable, Sequence[Callable]], potentials_dyn_fcn: Callable, obs_layer: Optional[Union[Module, Policy]] = None, tau_init: float = 10.0, tau_learnable: bool = True, kappa_init: float = 0.001, kappa_learnable: bool = True, capacity_learnable: bool = True, potential_init_learnable: bool = False, init_param_kwargs: dict = None, use_cuda: bool = False)[source]

Bases: PotentialBasedPolicy

Activation Dynamic Network (ADN)

Note

The policy’s outputs are a nonlinear function of the potentials. Thus, you have to make sure that the output range of this nonlinearity matches the action space of the environment.

See also

[1] T. Luksch, M. Gienger, M. Mühlig, T. Yoshiike, “Adaptive Movement Sequences and Predictive Decisions based on Hierarchical Dynamical Systems”, IROS, 2012

Constructor

Parameters:
  • spec – environment specification

  • activation_nonlin – nonlinearity for the output layer; highly suggested functions: to.sigmoid for position tasks, to.tanh for velocity tasks

  • potentials_dyn_fcn – function to compute the derivative of the neurons’ potentials

  • obs_layer – specify a custom PyTorch Module; by default (None) a linear layer with biases is used

  • tau_init – initial value for the shared time constant of the potentials

  • tau_learnable – flag to determine if the time constant is a learnable parameter or fixed

  • kappa_init – initial value for the cubic decay

  • kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed

  • capacity_learnable – flag to determine if capacity is a learnable parameter or fixed

  • potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
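A minimal construction-and-stepping sketch (not part of the library docs): it assumes env is any pyrado environment exposing an EnvSpec via env.spec, and that this class and the pd_* dynamics functions documented below live in pyrado.policies.recurrent.adn:

    import torch as to

    from pyrado.policies.recurrent.adn import ADNPolicy, pd_cubic  # assumed module path

    # `env` is an assumed pyrado environment.
    policy = ADNPolicy(
        spec=env.spec,
        activation_nonlin=to.sigmoid,  # suggested for position tasks
        potentials_dyn_fcn=pd_cubic,   # one of the pd_* functions documented below
    )

    # Recurrent policies map (observation, hidden) to (action, new hidden).
    obs = to.from_numpy(env.reset()).to(dtype=to.get_default_dtype())
    act, hidden = policy(obs, policy.init_hidden())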

property capacity: Optional[Tensor]

Get the capacity parameter if the dynamics function is capacity-based, else return None.

extra_repr() → str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'adn'
potentials_dot(potentials: Tensor, stimuli: Tensor) → Tensor[source]

Compute the derivative of the neurons’ potentials per time step: \(\tau \dot{u} = f(u, s, h)\)

Parameters:
  • potentials – current potential values

  • stimuli – sum of external and internal stimuli at the current point in time

Returns:

time derivative of the potentials

pd_capacity_21(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with two stable fixed points (\(p=-C\), \(p=C\)) and one unstable fixed point (\(p=0\)) for \(s=0\)

\(\tau \dot{p} = s - (h - p) (1 - \frac{(h - p)^2}{C^2})\)

Note

Intended to be used with a sigmoid activation function, e.g., for the position tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics
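For orientation, the displayed equation transcribed into code (an illustrative sketch, not the library implementation; it assumes the capacity \(C\) arrives via kwargs as capacity and that the function returns \(\dot{p}\)):

    def pd_capacity_21_sketch(p, s, h, tau, capacity, **kwargs):
        # tau * p_dot = s - (h - p) * (1 - (h - p)^2 / C^2)  =>  solve for p_dot
        return (s - (h - p) * (1 - (h - p) ** 2 / capacity ** 2)) / tau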

pd_capacity_21_abs(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with two stable fixed points (\(p=-C\), \(p=C\)) and one unstable fixed point (\(p=0\)) for \(s=0\)

\(\tau \dot{p} = s - (h - p) (1 - \frac{\left| h - p \right|}{C})\)

The “absolute version” of pd_capacity_21 has a lower magnitude and a lower order of the resulting polynomial.

Note

Intended to be used with a sigmoid activation function, e.g., for the position tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

pd_capacity_32(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with three stable fixed points (\(p=-C\), \(p=0\), \(p=C\)) and two unstable fixed points (\(p=-C/2\), \(p=C/2\)) for \(s=0\)

\(\tau \dot{p} = s - (h - p) (1 - \frac{(h - p)^2}{C^2}) (1 - \frac{(2(h - p))^2}{C^2})\)

Note

Intended to be used with a tanh activation function, e.g., for the velocity tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

pd_capacity_32_abs(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Capacity-based dynamics with three stable fixed points (\(p=-C\), \(p=0\), \(p=C\)) and two unstable fixed points (\(p=-C/2\), \(p=C/2\)) for \(s=0\)

\(\tau \dot{p} = s + (h - p) (1 - \frac{\left| h - p \right|}{C}) (1 - \frac{2 \left| h - p \right|}{C})\)

The “absolute version” of pd_capacity_32 is less skewed due to the lower order of the resulting polynomial.

Note

Intended to be used with a tanh activation function, e.g., for the velocity tasks in RcsPySim.

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

pd_cubic(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Basic proportional dynamics with additional cubic decay

\(\tau \dot{p} = s + h - p + \kappa (h - p)^3\)

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics
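The cubic variant only adds the \(\kappa\)-term; a sketch under the same assumptions as above (kappa passed via kwargs, \(\dot{p}\) returned):

    def pd_cubic_sketch(p, s, h, tau, kappa, **kwargs):
        # tau * p_dot = s + h - p + kappa * (h - p)^3  =>  solve for p_dot
        return (s + h - p + kappa * (h - p) ** 3) / tau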

pd_linear(p: Tensor, s: Tensor, h: Tensor, tau: Tensor, **kwargs) → Tensor[source]

Basic proportional dynamics

\(\tau \dot{p} = s - p\)

Parameters:
  • p – potential, higher values lead to higher activations

  • s – stimulus, higher values lead to larger changes of the potentials (depends on the dynamics function)

  • h – resting level, a.k.a. constant offset

  • tau – time scaling factor, higher values lead to slower changes of the potentials (linear dependency)

  • kwargs – additional parameters to the potential dynamics

base

class RecurrentPolicy(spec: EnvSpec, use_cuda: bool)[source]

Bases: Policy, ABC

Base class for recurrent policies. The policy does not store the hidden state on its own, so it requires two arguments, (observation, hidden), and returns two values, (action, new_hidden). The hidden tensor is a 1-dim vector of state variables with unspecified meaning. In the batched case, it should be a 2-dim array whose first dimension is the batch size, matching that of the observations.

Constructor

Parameters:
  • spec – environment specification

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

abstract evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tensor[source]

Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.

Returns:

actions with gradient data
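A hypothetical training-loop fragment showing why this method exists (rollout, loss_fn, and the surrounding loop are assumptions, not library API):

    # Replay a recorded rollout such that the returned actions are
    # differentiable w.r.t. the policy parameters, with gradients flowing
    # through the hidden states.
    acts = policy.evaluate(rollout)   # rollout: a recorded StepSequence
    loss = loss_fn(acts)              # e.g., a policy-gradient surrogate (assumed)
    loss.backward()                   # gradients reach the recurrent parameters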

abstract forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

abstract property hidden_size: int

Get the number of hidden state variables.

init_hidden(batch_size: Optional[int] = None) → Tensor[source]

Provide initial values for the hidden parameters. This should usually be a zero tensor.

Parameters:

batch_size – number of states to track in parallel

Returns:

Tensor of batch_size x hidden_size
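Putting forward() and init_hidden() together, a gym-style interaction-loop sketch (assumes env follows the reset/step convention and policy is any RecurrentPolicy subclass):

    import torch as to

    obs = env.reset()
    hidden = policy.init_hidden()   # zero tensor of size hidden_size
    done = False
    while not done:
        obs_to = to.from_numpy(obs).to(dtype=to.get_default_dtype())
        with to.no_grad():
            act, hidden = policy(obs_to, hidden)   # forward() returns both
        obs, rew, done, info = env.step(act.numpy())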

property is_recurrent: bool

Flag indicating whether the policy has a recurrent architecture.

script() ScriptModule[source]

Create a ScriptModule from this policy. The returned module will always have the signature action = tm(observation). For recurrent networks, this method returns a stateful module that keeps the hidden states internally. Such modules have a reset() method to reset the hidden states.
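For deployment, the flow is typically as follows (a hypothetical sketch; the file name and the observation tensor obs are assumptions):

    scripted = policy.script()    # stateful ScriptModule for recurrent policies
    scripted.reset()              # clear the internally stored hidden state
    act = scripted(obs)           # hidden state is handled internally
    scripted.save("policy.pt")    # e.g., to be loaded from LibTorch in C++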

training: bool
class StatefulRecurrentNetwork(net: RecurrentPolicy)[source]

Bases: Module

A scripted wrapper for a recurrent neural network that stores the hidden state.

Note

Use this for transfer to C++.

Constructor

Parameters:

net – recurrent network to wrap

Note

Must not be a script module

forward(inp)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

input_size: int
output_size: int
reset()[source]

Reset the policy’s internal state.

default_pack_hidden(hidden: Tensor, num_recurrent_layers, hidden_size: int, batch_size: Optional[int] = None)[source]

Pack the hidden state returned by torch.nn.RNNBase subclasses into a 1-dim state vector. This is the reverse operation of default_unpack_hidden.

Parameters:
  • hidden – unpacked hidden state, a tensor of num_recurrent_layers x batch_size x hidden_size

  • num_recurrent_layers – number of recurrent layers

  • hidden_size – size of the hidden layers (all equal)

  • batch_size – if not None, the result should be 2-dim, and the first dimension represents parts of a data batch

Returns:

packed hidden state.

default_unpack_hidden(hidden: Tensor, num_recurrent_layers: int, hidden_size: int, batch_size: Optional[int] = None)[source]

Unpack the flat hidden state vector into the form expected by torch.nn.RNNBase subclasses.

Parameters:
  • hidden – packed hidden state

  • num_recurrent_layers – number of recurrent layers

  • hidden_size – size of the hidden layers (all equal)

  • batch_size – if not None, hidden is 2-dim, and the first dimension represents parts of a data batch

Returns:

unpacked hidden state, a tensor of num_recurrent_layers x batch_size x hidden_size.
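The round trip between the two layouts amounts to a transpose plus reshape (a shape-level sketch; the library functions may differ in details):

    import torch as to

    num_recurrent_layers, batch_size, hidden_size = 2, 4, 8
    hidden = to.randn(num_recurrent_layers, batch_size, hidden_size)  # RNNBase layout

    # Pack: batch dimension first, then flatten layers x hidden per sample.
    packed = hidden.permute(1, 0, 2).reshape(batch_size, num_recurrent_layers * hidden_size)

    # Unpack: reverse the operation.
    unpacked = packed.reshape(batch_size, num_recurrent_layers, hidden_size).permute(1, 0, 2)
    assert to.equal(unpacked, hidden)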

neural_fields

class NFPolicy(spec: EnvSpec, hidden_size: int, obs_layer: Optional[Union[Module, Policy]] = None, activation_nonlin: Callable = torch.sigmoid, mirrored_conv_weights: bool = True, conv_out_channels: int = 1, conv_kernel_size: int = None, conv_padding_mode: str = 'circular', tau_init: float = 10.0, tau_learnable: bool = True, kappa_init: float = 0.0, kappa_learnable: bool = True, potential_init_learnable: bool = False, init_param_kwargs: dict = None, use_cuda: bool = False)[source]

Bases: PotentialBasedPolicy

Neural Fields (NF)

Note

The policy’s outputs are a nonlinear function of the potentials. Thus, you have to make sure that the output range of this nonlinearity matches the action space of the environment.

See also

[1] S.-I. Amari “Dynamics of Pattern Formation in Lateral-Inhibition Type Neural Fields”, Biological Cybernetics, 1977

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – number of neurons with potential

  • obs_layer – specify a custom PyTorch Module, by default (None) a linear layer with biases is used

  • activation_nonlin – nonlinearity to compute the activations from the potential levels

  • mirrored_conv_weights – re-use weights for the second half of the kernel to create a “symmetric” kernel

  • conv_out_channels – number of filters for the 1-dim convolution along the potential-based neurons

  • conv_kernel_size – size of the kernel for the 1-dim convolution along the potential-based neurons

  • tau_init – initial value for the shared time constant of the potentials

  • tau_learnable – flag to determine if the time constant is a learnable parameter or fixed

  • kappa_init – initial value for the cubic decay, pass 0 (default) to disable cubic decay

  • kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed

  • potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU
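A minimal construction sketch mirroring the ADNPolicy example above (assumed module path; env and obs as before):

    import torch as to

    from pyrado.policies.recurrent.neural_fields import NFPolicy  # assumed module path

    policy = NFPolicy(
        spec=env.spec,      # `env` is an assumed pyrado environment
        hidden_size=9,      # number of neurons with potential
        activation_nonlin=to.sigmoid,
    )
    act, hidden = policy(obs, policy.init_hidden())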

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

name: str = 'nf'
potentials_dot(potentials: Tensor, stimuli: Tensor) → Tensor[source]

Compute the derivative of the neurons’ potentials per time step: \(\tau \dot{u} = s + h - u + \kappa (h - u)^3, \quad \text{with} \quad s = s_{int} + s_{ext} = W o + \int w(u, v) f(u) dv\)

Parameters:
  • potentials – current potential values

  • stimuli – sum of external and internal stimuli at the current point in time

Returns:

time derivative of the potentials

potential_based

class PotentialBasedPolicy(spec: EnvSpec, obs_layer: Union[Module, Policy], activation_nonlin: Callable, tau_init: float, tau_learnable: bool, kappa_init: float, kappa_learnable: bool, potential_init_learnable: bool, use_cuda: bool, hidden_size: Optional[int] = None)[source]

Bases: RecurrentPolicy, ABC

Base class for policies that work with potential-based neural networks

Constructor

Parameters:
  • spec – environment specification

  • obs_layer – specify a custom PyTorch Module, by default (None) a linear layer with biases is used

  • activation_nonlin – nonlinearity to compute the activations from the potential levels

  • tau_init – initial value for the shared time constant of the potentials

  • tau_learnable – flag to determine if the time constant is a learnable parameter or fixed

  • kappa_init – initial value for the cubic decay, pass 0 (default) to disable cubic decay

  • kappa_learnable – flag to determine if cubic decay is a learnable parameter or fixed

  • potential_init_learnable – flag to determine if the initial potentials are a learnable parameter or fixed

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

  • hidden_size – number of neurons with potential, by default None which sets the number of hidden neurons to the flat number of actions (in order to be compatible with ADNPolicy)

evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tensor[source]

Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.

Returns:

actions with gradient data

extra_repr() → str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

abstract forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

property hidden_size: int

Get the number of hidden state variables.

init_hidden(batch_size: Optional[int] = None) → Tensor[source]

Provide initial values for the hidden parameters. This should usually be a zero tensor.

Parameters:

batch_size – number of states to track in parallel

Returns:

Tensor of batch_size x hidden_size

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

property kappa: Tensor

Get the cubic decay parameter.

name: str = None
abstract potentials_dot(potentials: Tensor, stimuli: Tensor) → Tensor[source]

Compute the derivative of the neurons’ potentials per time step.

Parameters:
  • potentials – current potential values

  • stimuli – sum of external and internal stimuli at the current point in time

Returns:

time derivative of the potentials

property stimuli_external: Tensor

Get the neurons’ external stimuli, resulting from the current observations. This is used for recording during a rollout.

property stimuli_internal: Tensor

Get the neurons’ internal stimuli, resulting from the previous activations of the neurons. This is used for recording during a rollout.

property tau: Tensor

Get the time scale parameter.

rnn

class GRUPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: RNNPolicyBase

Policy backed by a multi-layer GRU

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'gru'
recurrent_network_type

alias of GRU

class LSTMPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: RNNPolicyBase

Policy backed by a multi-layer LSTM

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

property hidden_size: int

Get the number of hidden state variables.

name: str = 'lstm'
recurrent_network_type

alias of LSTM

class RNNPolicy(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, hidden_nonlin: str = 'tanh', output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: RNNPolicyBase

Policy backed by a multi-layer RNN

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • hidden_nonlin – nonlinearity for the hidden rnn layers, either ‘tanh’ or ‘relu’

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'rnn'
recurrent_network_type

alias of RNN
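Since the three wrappers differ only in the backing torch.nn module, they are interchangeable at construction time (sketch; assumed module path, env and obs as in the examples above):

    import torch as to

    from pyrado.policies.recurrent.rnn import GRUPolicy, LSTMPolicy, RNNPolicy  # assumed module path

    common = dict(spec=env.spec, hidden_size=32, num_recurrent_layers=2)
    policies = [
        RNNPolicy(**common, hidden_nonlin="tanh", output_nonlin=to.tanh),
        GRUPolicy(**common, output_nonlin=to.tanh),
        LSTMPolicy(**common, output_nonlin=to.tanh),
    ]
    for p in policies:
        act, hidden = p(obs, p.init_hidden())   # identical call signature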

class RNNPolicyBase(spec: EnvSpec, hidden_size: int, num_recurrent_layers: int, output_nonlin: Optional[Callable] = None, dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: RecurrentPolicy

Base class for recurrent policies wrapping torch.nn.RNNBase subclasses

Constructor

Parameters:
  • spec – environment specification

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tensor[source]

Re-evaluate the given rollout and return a differentiable action tensor. This method makes sure that the gradient is propagated through the hidden state.

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Change this string for value functions.

Returns:

actions with gradient data

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

action to be taken and new hidden state

property hidden_size: int

Get the number of hidden state variables.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

recurrent_network_type = None
training: bool

two_headed_rnn

class TwoHeadedGRUPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: TwoHeadedRNNPolicyBase

Two-headed policy backed by a multi-layer GRU

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'thgru'
recurrent_network_type

alias of GRU

class TwoHeadedLSTMPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: TwoHeadedRNNPolicyBase

Two-headed policy backed by a multi-layer LSTM

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

property hidden_size: int

Get the number of hidden state variables.

name: str = 'thlstm'
recurrent_network_type

alias of LSTM

class TwoHeadedRNNPolicy(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, shared_hidden_nonlin: str = 'tanh', head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False)[source]

Bases: TwoHeadedRNNPolicyBase

Two-headed policy backed by a multi-layer RNN

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • shared_hidden_nonlin – nonlinearity for the shared hidden rnn layers, either ‘tanh’ or ‘relu’

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

name: str = 'thrnn'
recurrent_network_type

alias of RNN

class TwoHeadedRNNPolicyBase(spec: EnvSpec, shared_hidden_size: int, shared_num_recurrent_layers: int, head_1_size: Optional[int] = None, head_2_size: Optional[int] = None, head_1_output_nonlin: Optional[Callable] = None, head_2_output_nonlin: Optional[Callable] = None, shared_dropout: float = 0.0, init_param_kwargs: Optional[dict] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: TwoHeadedPolicy, RecurrentPolicy

Base class for recurrent policies that wrap torch.nn.RNNBase subclasses and have a shared body and two heads, each with its own final layer

Constructor

Parameters:
  • spec – environment specification

  • shared_hidden_size – size of the hidden layers (all equal)

  • shared_num_recurrent_layers – number of recurrent layers

  • head_1_size – size of the fully connected layer for head 1, if None this is set to the action space dim

  • head_2_size – size of the fully connected layer for head 2, if None this is set to the action space dim

  • head_1_output_nonlin – nonlinearity for output layer of the first head

  • head_2_output_nonlin – nonlinearity for output layer of the second head

  • shared_dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • use_cuda – True to move the policy to the GPU, False (default) to use the CPU

evaluate(rollout: StepSequence, hidden_states_name: str = 'hidden_states') → Tuple[Tensor, Tensor][source]

Re-evaluate the given rollout and return a differentiable action tensor. The default implementation simply calls forward().

Parameters:
  • rollout – complete rollout

  • hidden_states_name – name of hidden states rollout entry, used for recurrent networks. Defaults to ‘hidden_states’. Change for value functions.

Returns:

actions with gradient data

forward(obs: Tensor, hidden: Optional[Tensor] = None) → Tuple[Tensor, Tensor, Tensor][source]
Parameters:
  • obs – observation from the environment

  • hidden – the network’s hidden state. If None, use init_hidden()

Returns:

outputs of the first and the second head, as well as the new hidden state
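Unlike the single-headed policies above, forward() here yields three tensors (sketch; the interpretation of the second head depends on the concrete policy):

    # e.g., head 1 predicts the action, head 2 an auxiliary quantity
    head_1_out, head_2_out, hidden = policy(obs, policy.init_hidden())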

property hidden_size: int

Get the number of hidden state variables.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]

Initialize the policy’s parameters. By default the parameters are initialized randomly.

Parameters:
  • init_values – tensor of fixed initial policy parameter values

  • kwargs – additional keyword arguments for the policy parameter initialization

recurrent_network_type = None
training: bool

Module contents