sampling

bootstrapping

bootstrap_ci(data: ndarray, stat_fcn: Callable, num_reps: int, alpha: float, ci_sides: int, bias_correction: bool = False, studentized: bool = False, seed: Optional[int] = None)[source]

Re-sample the input data using the nonparametric bootstrap method, compute bootstrap replications using stat_fcn, and compute a confidence interval on the statistic of interest given by stat_fcn, which must accept the argument axis (like numpy functions do).

Parameters:
  • data – data to bootstrap from (for now only 1D arrays supported)

  • stat_fcn – function to compute a statistic of interest (e.g. mean, variance) on bootstrap samples

  • num_reps – number of bootstrap replications, i.e. the number of resampled data sets

  • alpha – determines the confidence level \(1 - \alpha \in [0, 1]\)

  • ci_sides – one or two-sided confidence interval

  • axis – axis to compute along in case of 2-dim data

  • bias_correction – bool to decide if the bias should be subtracted (see [2]). Note that the confidence intervals are constructed independently of the bias correction (see [5, p. 7]). The bias correction can be dangerous in practice: even though T_bc(D) is less biased than T(D), the bias-corrected estimator may have substantially larger variance. This is due to a possibly higher variability in the estimate of the bias, particularly when computed from small data sets. Estimates of the bias-correction factor other than stat_emp are possible; see [4].

  • studentized – flag to determine if the method based on the t-distribution is used (leads to a wider ci)

  • seed – value for the random number generators’ seeds, pass None to skip seeding

Returns:

mean of the bootstrap replications, and the confidence interval
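
A minimal, self-contained sketch of the percentile bootstrap that this function is built around (plain numpy, illustrating the method rather than calling bootstrap_ci itself, so no import path or return unpacking of the library is assumed):

    import numpy as np

    def percentile_bootstrap_ci(data: np.ndarray, num_reps: int, alpha: float, seed: int = 0):
        # nonparametric bootstrap: resample with replacement, num_reps times
        rng = np.random.default_rng(seed)
        idcs = rng.integers(0, len(data), size=(num_reps, len(data)))
        replications = np.mean(data[idcs], axis=1)  # statistic computed along axis 1
        # two-sided percentile confidence interval at level 1 - alpha
        ci_lo, ci_hi = np.quantile(replications, [alpha / 2, 1 - alpha / 2])
        return np.mean(replications), (ci_lo, ci_hi)

    m, (lo, hi) = percentile_bootstrap_ci(np.random.randn(100), num_reps=1000, alpha=0.05)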

cvar_sampler

class CVaRSampler(wrapped_sampler, epsilon: float, gamma: float = 1.0, *, min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]

Bases: SamplerBase, LoggerAware

Samples rollouts to optimize the CVaR of the discounted return. This is done by sampling more rollouts, and then only using the epsilon-quantile of them.

Constructor

Parameters:
  • wrapped_sampler – the inner sampler used to sample the full data set

  • epsilon – quantile of rollouts that will be kept

  • gamma – discount factor to compute the discounted return, default is 1 (no discount)

  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Reset the sampler after changes were made to the environment or the policy, optionally replacing one of them.

Most samplers will be implemented in parallel, so if there are changes to the environment or the policy, they will not automatically propagate to all processes. This method exists as a workaround; call it to force a reinitialization of environment and policy in all subprocesses.

Note that you don’t need to call this if the policy parameters change, since that is to be expected between sampling runs; the sample() method takes care of this on its own.

You can use the env and policy parameters to completely replace the stored environment or policy.

Parameters:
  • env – new environment to use, or None to keep the old one

  • policy – new policy to use, or None to keep the old one

sample() List[StepSequence][source]

Generate a list of rollouts. This method works exactly as specified in the class description.

Returns:

sampled rollouts

set_min_count(min_rollouts=None, min_steps=None)[source]

Adapt the sampling boundaries.

Parameters:
  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

select_cvar(rollouts, epsilon: float, gamma: float = 1.0)[source]

Select a subset of rollouts so that their mean discounted return is the CVaR(eps) of the full rollout set.

Parameters:
  • rollouts – list of rollouts

  • epsilon – chosen return quantile

  • gamma – discount factor to compute the discounted return, default is 1 (no discount)

Returns:

list of selected rollouts
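
A sketch of the selection rule described above, operating directly on an array of (discounted) returns rather than on rollout objects:

    import numpy as np

    def select_cvar_idcs(returns: np.ndarray, epsilon: float) -> np.ndarray:
        # keep the epsilon-quantile of worst rollouts; the mean of their returns
        # is the empirical CVaR(eps) of the full rollout set
        num_keep = max(1, int(round(epsilon * len(returns))))
        return np.argsort(returns)[:num_keep]

    idcs = select_cvar_idcs(np.random.randn(100), epsilon=0.1)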

data_format

cat_to_format(data: Union[dict, tuple, Sequence], data_format: str)[source]

Concatenate the generic data in the given data format. For dicts, the dict elements are concatenated individually. A list of dicts is treated as a dict of lists.

Parameters:
  • data – input data

  • data_format – numpy or torch

Returns:

numpy.ndarray or torch.Tensor, or dict of these

new_tuple(nt_type, values)[source]

Create a new tuple of the same type as nt_type. This handles the constructor differences between tuple and NamedTuple types.

Parameters:
  • nt_type – type of tuple

  • values – values as sequence

Returns:

new named tuple
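
A hypothetical sketch of the constructor difference this helper papers over (assuming nt_type is either tuple or a NamedTuple class):

    from collections import namedtuple

    def new_tuple_sketch(nt_type, values):
        if nt_type is tuple:
            # a plain tuple takes a single iterable
            return tuple(values)
        # NamedTuple constructors take the fields as positional arguments
        return nt_type(*values)

    Point = namedtuple("Point", ["x", "y"])
    assert new_tuple_sketch(Point, [1, 2]) == Point(x=1, y=2)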

stack_to_format(data: Union[dict, tuple, Sequence], data_format: str)[source]

Stack the generic data in the given data format. For dicts, the dict elements are stacked individually. A list of dicts is treated as a dict of lists.

Parameters:
  • data – input data

  • data_format – ‘numpy’ or ‘torch’

Returns:

numpy array or PyTorch tensor, or dict of these

to_format(data, data_format, data_type=None)[source]

Convert the tensor data to the given data format.

Parameters:
  • data – input data

  • data_format – numpy or torch

  • data_type – type to return data in. When None is passed, the data type is left unchanged.

Returns:

numpy.ndarray or torch.Tensor
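
A sketch of what such a conversion amounts to for plain arrays and tensors (not the library implementation; dict handling and the data_type cast are omitted):

    import numpy as np
    import torch

    def to_format_sketch(data, data_format: str):
        if data_format == "numpy":
            return data.cpu().numpy() if isinstance(data, torch.Tensor) else np.asarray(data)
        if data_format == "torch":
            return torch.as_tensor(data)
        raise ValueError("data_format must be 'numpy' or 'torch'")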

hyper_sphere

sample_from_hyper_sphere_surface(num_dim: int, method: str) Tensor[source]

Sampling from the surface of a multidimensional unit sphere.

See also

[1] G. Marsaglia, “Choosing a Point from the Surface of a Sphere”, Ann. Math. Statist., 1972

Parameters:
  • num_dim – number of dimensions of the sphere

  • method – approach used to acquire the samples

Returns:

sample with L2-norm equal to 1
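
A sketch of the standard Gaussian-normalization approach (uniformity on the sphere’s surface follows from the rotational invariance of the Gaussian; whether the method argument selects exactly this variant is an assumption):

    import torch

    def sample_sphere_surface_normal(num_dim: int) -> torch.Tensor:
        # a normalized standard-normal vector is uniformly distributed
        # on the surface of the unit hyper-sphere
        x = torch.randn(num_dim)
        return x / x.norm(p=2)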

parallel_evaluation

eval_domain_params(pool: ~pyrado.sampling.sampler_pool.SamplerPool, env: ~pyrado.environments.sim_base.SimEnv, policy: ~pyrado.policies.base.Policy, params: ~typing.List[~typing.Dict], init_state: ~typing.Optional[~numpy.ndarray] = None, seed: int = <object object>) List[StepSequence][source]

Evaluate a policy on a multidimensional grid of domain parameters.

Parameters:
  • pool – parallel sampler

  • env – environment to evaluate in

  • policy – policy to evaluate

  • params – multidimensional grid of domain parameters

  • init_state – initial state of the environment which will be fixed if not set to None

  • seed – seed value for the random number generators, pass None for no seeding

Returns:

list of rollouts

eval_domain_params_with_segmentwise_reset(pool: SamplerPool, env_sim: SimEnv, policy: Policy, segments_real_all: List[List[StepSequence]], domain_params_ml_all: List[List[dict]], stop_on_done: bool, use_rec: bool) List[List[StepSequence]][source]

Evaluate a policy for a given set of domain parameters, synchronizing the segments’ initial states with the given target domain segments.

Parameters:
  • pool – parallel sampler

  • env_sim – environment to evaluate in

  • policy – policy to evaluate

  • segments_real_all – all segments from the target domain rollout

  • domain_params_ml_all – all domain parameters to evaluate over

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

  • use_rec – True if pre-recorded actions have been used to generate the rollouts

Returns:

list of segments of rollouts

eval_nominal_domain(pool: SamplerPool, env: SimEnv, policy: Policy, init_states: List[ndarray]) List[StepSequence][source]

Evaluate a policy using the nominal (set in the given environment) domain parameters.

Parameters:
  • pool – parallel sampler

  • env – environment to evaluate in

  • policy – policy to evaluate

  • init_states – initial states of the environment which will be fixed if not set to None

Returns:

list of rollouts

eval_randomized_domain(pool: SamplerPool, env: SimEnv, randomizer: DomainRandomizer, policy: Policy, init_states: List[ndarray]) List[StepSequence][source]

Evaluate a policy in a randomized domain.

Parameters:
  • pool – parallel sampler

  • env – environment to evaluate in

  • randomizer – randomizer used to sample random domain instances, inherited from DomainRandomizer

  • policy – policy to evaluate

  • init_states – initial states of the environment which will be fixed if not set to None

Returns:

list of rollouts

parallel_rollout_sampler

class ParallelRolloutSampler(env, policy, num_workers: int, *, min_rollouts: ~typing.Optional[int] = None, min_steps: ~typing.Optional[int] = None, show_progress_bar: bool = True, seed: int = <object object>)[source]

Bases: SamplerBase, Serializable

Class for sampling from multiple environments in parallel

Constructor

Parameters:
  • env – environment to sample from

  • policy – policy to act in the environment (can also be an exploration strategy)

  • num_workers – number of parallel samplers

  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

  • show_progress_bar – if True, display a progress bar using tqdm

  • seed – seed value for the random number generators, pass None for no seeding; defaults to the last seed that was set with pyrado.set_seed

reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Re-initialize the sampler.

Parameters:
  • env – the environment in which the policy operates

  • policy – the policy used for sampling

sample(init_states: Optional[List[ndarray]] = None, domain_params: Optional[List[dict]] = None, eval: bool = False) List[StepSequence][source]

Do the sampling according to the previously given environment, policy, and number of steps/rollouts.

Note

This method is not thread-safe! See for example the usage of self._sample_count.

Parameters:
  • init_states – initial states for run_map(), pass None (default) to sample from the environment’s initial state space

  • domain_params – domain parameters for run_map(), pass None (default) to not explicitly set them

  • eval – pass False if the rollout is executed during training, else True. Forwarded to rollout().

Returns:

list of sampled rollouts

parameter_exploration_sampler

class ParameterExplorationSampler(env: Union[SimEnv, EnvWrapper], policy: Policy, num_init_states_per_domain: int, num_domains: int, num_workers: int, seed: Optional[int] = None)[source]

Bases: Serializable

Parallel sampler for parameter exploration

Constructor

Parameters:
  • env – environment to sample from

  • policy – policy used for sampling

  • num_init_states_per_domain – number of rollouts to cover the variance over initial states

  • num_domains – number of rollouts due to the variance over domain parameters

  • num_workers – number of parallel samplers

  • seed – seed value for the random number generators, pass None for no seeding; defaults to the last seed that was set with pyrado.set_seed

property num_rollouts_per_param: int

Get the number of rollouts per policy parameter set.

reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Re-initialize the sampler.

Parameters:
  • env – the environment in which the policy operates

  • policy – the policy used for sampling

sample(param_sets: Tensor, init_states: Optional[List[ndarray]] = None) ParameterSamplingResult[source]

Sample rollouts for a given set of parameters.

Note

This method is not thread-safe! See for example the usage of self._sample_count.

Parameters:
  • param_sets – sets of policy parameters

  • init_states – fixed initial states, pass None to randomly sample initial states

Returns:

data structure containing the policy parameter sets and the associated rollout data

class ParameterSample(params: Tensor, rollouts: List[StepSequence])[source]

Bases: tuple

Stores policy parameters and associated rollouts.

Create new instance of ParameterSample(params, rollouts)

property mean_undiscounted_return: float

Get the mean of the undiscounted returns over all rollouts.

property num_rollouts: int

Get the number of rollouts.

property params

Alias for field number 0

property rollouts

Alias for field number 1

class ParameterSamplingResult(samples: Sequence[ParameterSample])[source]

Bases: Sequence[ParameterSample]

Result of a parameter exploration sampling run. On the one hand, this is a list of ParameterSamples. On the other hand, it allows querying combined tensors of parameters and mean returns.

Constructor

Parameters:

samples – list of parameter samples

mean_returns()[source]

Get the mean return of each parameter sample as an N-dim vector, where N is the number of samples.

num_rollouts()[source]

Get the total number of rollouts for all samples.

parameters()[source]

Get all policy parameters as an NxP matrix, where N is the number of samples and P is the dimension of the policy parameters.

rollouts()[source]

Get all rollouts for all samples, i.e. a list of pop_size items, each a list of num_rollouts rollouts.

rollout

after_rollout_query(env: Env, policy: Policy, rollout: StepSequence) Tuple[bool, Optional[ndarray], Optional[dict]][source]

Ask the user what to do after a rollout has been animated.

Parameters:
  • env – environment used for the rollout

  • policy – policy used for the rollout

  • rollout – collected data from the rollout

Returns:

done flag, initial state, and domain parameters

rollout(env: Env, policy: Union[Module, Policy, Callable], eval: bool = False, max_steps: Optional[int] = None, reset_kwargs: Optional[dict] = None, render_mode: RenderMode = RenderMode(text=False, video=False, render=False), render_step: int = 1, no_reset: bool = False, no_close: bool = False, record_dts: bool = False, stop_on_done: bool = True, seed: Optional[int] = None, sub_seed: Optional[int] = None, sub_sub_seed: Optional[int] = None) StepSequence[source]

Perform a rollout (i.e. sample a trajectory) in the given environment using the given policy.

Parameters:
  • env – environment to use (SimEnv or RealEnv)

  • policy – policy to determine the next action given the current observation. This policy may be wrapped by an exploration strategy.

  • eval – pass False if the rollout is executed during training, else True. Forwarded to PyTorch Module.

  • max_steps – maximum number of time steps, if None the environment’s property is used

  • reset_kwargs – keyword arguments passed to environment’s reset function

  • render_mode – determines if the user sees an animation, console prints, or nothing

  • render_step – rendering interval, renders every step if set to 1

  • no_reset – do not reset the environment before running the rollout

  • no_close – do not close (and disconnect) the environment after running the rollout

  • record_dts – flag if the time intervals of different parts of one step should be recorded (for debugging)

  • stop_on_done – set to false to ignore the environment’s done flag (for debugging)

  • seed – seed value for the random number generators, pass None for no seeding

Returns:

paths of the observations, actions, rewards, and information about the environment as well as the policy

sampler

class SamplerBase(*, min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]

Bases: ABC

A sampler generates a list of rollouts in some unspecified way.

Since the sampling might occur in parallel, there is no way to reliably generate an exact number of samples. The sampler can, however, guarantee a minimum number of samples to be available. The sampler does not discard any samples on its own; all sampled data will be returned. There are two ways to regulate the sampling process:
  • the minimum number of rollouts
  • the minimum number of steps in all rollouts

At least one of these bounds must be specified. If both are set, the sampler will only terminate once both are fulfilled.
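
A sketch of the resulting termination test (a hypothetical helper mirroring the rule just stated):

    def sampling_done(num_rollouts: int, num_steps: int, min_rollouts=None, min_steps=None) -> bool:
        # at least one bound must be given; every bound that is given must be met
        assert min_rollouts is not None or min_steps is not None
        enough_rollouts = min_rollouts is None or num_rollouts >= min_rollouts
        enough_steps = min_steps is None or num_steps >= min_steps
        return enough_rollouts and enough_steps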

Constructor

Parameters:
  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

abstract reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Reset the sampler after changes were made to the environment or the policy, optionally replacing one of them.

Most samplers will be implemented in parallel, so if there are changes to the environment or the policy, they will not automatically propagate to all processes. This method exists as a workaround; call it to force a reinitialization of environment and policy in all subprocesses.

Note that you don’t need to call this if the policy parameters change, since that is to be expected between sampling runs; the sample() method takes care of this on its own.

You can use the env and policy parameters to completely replace the stored environment or policy.

Parameters:
  • env – new environment to use, or None to keep the old one

  • policy – new policy to use, or None to keep the old one

abstract sample() List[StepSequence][source]

Generate a list of rollouts. This method works exactly as specified in the class description.

Returns:

sampled rollouts

set_min_count(min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]

Adapt the sampling boundaries.

Parameters:
  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

sampler_pool

class GlobalNamespace[source]

Bases: object

Type of the worker’s global namespace

class SamplerPool(num_threads: int)[source]

Bases: object

A process pool capable of executing operations in parallel. This differs from the multiprocessing.Pool class in that it explicitly incorporates process-local state.

Every parallel function gets a GlobalNamespace object as first argument, which can hold arbitrary worker-local state. This allows for certain optimizations. For example, when the parallel operation requires an object that is expensive to transmit, we can create this object once in each process, store it in the namespace, and then use it in every map function call.

This class also contains additional methods to call a function exactly once in each worker, to setup worker-local state.

invoke_all(func, *args, **kwargs)[source]

Invoke func on all workers using the same argument values. The return values are collected into a list.

Parameters:

func – the first argument of func will be a worker-local namespace

invoke_all_map(func, arglist)[source]

Invoke func(arg) on all workers using one argument from the list for each ordered worker. The length of the argument list must match the number of workers. The first argument of func will be a worker-local namespace. The return values are collected into a list.

run_collect(n, func, *args, collect_progressbar: Optional[tqdm] = None, min_runs=None, **kwargs) tuple[source]

Collect at least n samples from func, where the number of samples per run can vary.

This is done by calling res, ns = func(G, *args, **kwargs) until the sum of ns exceeds n.

This is intended for situations like reinforcement learning runs. If the environment ends up in an error state, you get fewer samples per run. To ensure stable learning behaviour, you can specify the minimum number of samples to collect before returning.

Since the workers can only check the number of samples before starting a run, you will likely get more samples than the minimum. No generated samples that are part of a rollout are dropped. However, if more rollouts were sampled than required, the surplus ones are dropped to preserve seed-determinism across different numbers of workers.

Parameters:
  • n – minimum number of samples to collect

  • func – sampler function, must be pickleable

  • args – remaining positional args are passed to the function

  • collect_progressbar – tqdm progress bar to use; default None

  • min_runs – optionally specify a minimum amount of runs to be executed before returning

  • kwargs – remaining keyword args are passed to the function

Returns:

list of results, and the total number of samples
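
A single-process sketch of the collect-at-least-n pattern described above (the worker pool, the G namespace, and the surplus-dropping for seed-determinism are omitted):

    def collect_at_least(n: int, func, *args, min_runs=None, **kwargs) -> tuple:
        # call res, ns = func(...) until at least n samples (and min_runs runs) are reached
        results, num_samples, num_runs = [], 0, 0
        while num_samples < n or (min_runs is not None and num_runs < min_runs):
            res, ns = func(*args, **kwargs)
            results.append(res)
            num_samples += ns
            num_runs += 1
        return results, num_samples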

run_map(func, arglist: list, progressbar: Optional[tqdm] = None)[source]

A parallel version of [func(G, arg) for arg in arglist]. There is no deterministic assignment of workers to arglist elements. Optionally runs with progress bar.

Parameters:
  • func – mapper function, must be pickleable

  • arglist – list of function args

  • progressbar – optional progress bar from the tqdm library

Returns:

list of results

set_seed(seed)[source]

Set a deterministic seed on all workers.

Note

This is intended to only be used in legacy evaluation scripts! For new code and everything that should really be reproducible, pass the seed to the sample() method of a ParallelRolloutSampler.

Parameters:

seed – seed value for the random number generators

stop()[source]

Terminate all workers.

sbi_embeddings

class AllStepsEmbedding(spec: EnvSpec, dim_data: int, len_rollouts: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which computes the same features of the rollouts’ states and actions as done in [1]

[1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'asemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Returns the full states of the rollout as a vector, i.e. the time steps and state dimension are flattened into one dimension.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

all states as a flattened vector

class BayesSimEmbedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which computes the same features of the rollouts’ states and actions as done in [1]

[1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • downsampling_factor – use only every downsampling_factor-th time series sample; no downsampling by default

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'bsemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Computes summary statistics based on the approach in [1], see eq. (22). This method guarantees an output which has the same size for every trajectory.

[1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

summary statistics of the rollout

class DeltaStepsEmbedding(spec: EnvSpec, dim_data: int, len_rollouts: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which returns the change in the states between consecutive time steps of the rollouts

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'dsemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Returns the differences between consecutive states of the rollout, flattened into a vector.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

state differences as a flattened vector

class DynamicTimeWarpingEmbedding(spec: EnvSpec, dim_data: int, step_pattern: Optional[Union[str, StepPattern]] = None, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which uses the dtw-python package to compute the Dynamic Time Warping (DTW) distance between the states as features of the data

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • step_pattern – method passed to dtw-python for computing the distance. By default, the same default as in the dtw-python package is used (“symmetric2”). To use, for example, the Rabiner-Juang type VI-c unsmoothed recursion step pattern, pass dtw.stepPattern.rabinerJuangStepPattern(6, “c”)

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'dtwemb'
requires_target_domain_data: bool = True
summary_statistic(data: Tensor) Tensor[source]

Returns the dynamic time warping distance between the simulated rollouts’ and the real rollouts’ states.

Note

It is necessary to take the mean over all distances since the same function is used to compute the observations (for sbi) from the target domain rollouts. At this point in time there might be only one target domain rollout; thus the target domain rollouts are only compared with themselves, yielding a scalar distance value.

Parameters:

data – data tensor containing the simulated states (1st part of the 1st half of the 1st dim) and the real states (1st part of the 2nd half of the 1st dim)

Returns:

dynamic time warping distance in multi-dim state space

class Embedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: ABC, Module

Base class for all embeddings used for simulation-based inference with time series data

Note

The features of each rollout are concatenated, and since the inference procedure requires a consistent size of the inputs, it is necessary that all rollouts yield the same number of features, i.e. have equal length!

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • downsampling_factor – use only every downsampling_factor-th time series sample; no downsampling by default

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property device: str

Get the device (CPU or GPU) on which the embedding is stored.

abstract property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

forward(data: Tensor) Tensor[source]

Transforms rollouts into the observations used for likelihood-free inference. Currently a state-representation as well as state-action summary-statistics are available.

Parameters:

data – packed data of shape [batch_size, num_rollouts, len_time_series, dim_data]

Returns:

features of the data extracted from the embedding of shape [batch_size, num_rollouts * dim_feat]

forward_one_batch(data_batch: Tensor) Tensor[source]

Iterate over all rollouts and compute the features for each rollout separately, then average the features over the rollouts.

Parameters:

data_batch – data batch of shape [num_rollouts, len_time_series, dim_data]

Returns:

concatenation of the features for each rollout

name: str
static pack(data: Tensor) Tensor[source]

Reshape the data such that the shape is [batch_dim, num_rollouts, data_points_flattened].

Parameters:

data – un-packed a.k.a. un-flattened data

Returns:

packed a.k.a. flattened data

requires_target_domain_data: bool
abstract summary_statistic(data: Tensor) Tensor[source]
static unpack(data: Tensor, dim_data_orig: int) Tensor[source]

Reshape the data such that the shape is [batch_dim, num_rollouts, len_time_series, dim_data].

Parameters:
  • data – packed a.k.a. flattened data

  • dim_data_orig – dimension of the original data

Returns:

un-packed a.k.a. un-flattened data
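
A sketch of the two reshaping operations, assuming the shapes documented above:

    import torch

    def pack_sketch(data: torch.Tensor) -> torch.Tensor:
        # [batch_dim, num_rollouts, len_time_series, dim_data] -> [batch_dim, num_rollouts, data_points_flattened]
        return data.reshape(data.shape[0], data.shape[1], -1)

    def unpack_sketch(data: torch.Tensor, dim_data_orig: int) -> torch.Tensor:
        # inverse of pack_sketch, recovering the time and data dimensions
        return data.reshape(data.shape[0], data.shape[1], -1, dim_data_orig)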

class LastStepEmbedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which selects the last state of the rollouts

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • downsampling_factor – use only every downsampling_factor-th time series sample; no downsampling by default

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'lsemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Returns the last state of the rollout as a vector.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

last states as a vector

class RNNEmbedding(spec: ~pyrado.utils.data_types.EnvSpec, dim_data: int, hidden_size: int, num_recurrent_layers: int, output_size: int, recurrent_network_type: type = <class 'torch.nn.modules.rnn.RNN'>, only_last_output: bool = False, len_rollouts: ~typing.Optional[int] = None, output_nonlin: ~typing.Optional[~typing.Callable] = None, dropout: float = 0.0, init_param_kwargs: ~typing.Optional[dict] = None, downsampling_factor: int = 1, state_mask_labels: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Union[int, str]], ~typing.List[~typing.Union[int, str]]]] = None, act_mask_labels: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Union[int, str]], ~typing.List[~typing.Union[int, str]]]] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which uses a recurrent neural network, e.g. RNN, LSTM, or GRU, to compute features of the rollouts

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • recurrent_network_type – PyTorch recurrent network class, e.g. nn.RNN, nn.LSTM, or nn.GRU

  • output_size – size of the features at every time step, which are eventually reshaped into a vector

  • only_last_output – if True, only the last output of the network is used as a feature for sbi, else there will be an output every downsampling_factor time steps. Moreover, if True the constructor does not need to know how long the rollouts are.

  • len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
name: str = 'rnnemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Pass the time series data through a recurrent neural network.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

features obtained from the RNN at every time step, flattened into a vector

sbi_rollout_sampler

class RealRolloutSamplerForSBI(env: Env, policy: Policy, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True)[source]

Bases: RolloutSamplerForSBI, Serializable

Wrapper to make SimuRLacra’s real environments similar to the sbi simulator

Constructor

Parameters:
  • env – environment in which the policy operates; in sim-to-real settings this is a real-world device, i.e. RealEnv, but in a sim-to-sim experiment this can be a (randomized) SimEnv

  • policy – policy used for sampling the rollout

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

class RecRolloutSamplerForSBI(rollouts_dir: str, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, rand_init_rollout: bool = True)[source]

Bases: RealRolloutSamplerForSBI, Serializable

Wrapper to yield pre-recorded rollouts similar to the sbi simulator

Constructor

Parameters:
  • rollouts_dir – directory containing the pre-recorded rollouts

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • rand_init_rollout – if True, choose the first rollout at random, and then cycle through the list

property num_rollouts: int

Get the number of stored rollouts.

property ring_idx: int

Get the buffer’s index.

class RolloutSamplerForSBI(env: Env, policy: Policy, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True)[source]

Bases: ABC, Serializable

Wrapper that enables the sbi simulator instance to make rollouts from SimuRLacra environments as if the environment were a callable that only needs the simulator parameters as inputs

Note

The features of each rollout are concatenated, and since the inference procedure requires a consistent size of the inputs, it is necessary that all rollouts yield the same number of features, i.e. have equal length!

Constructor

Parameters:
  • env – environment in which the policy operates; in sim-to-real settings this is a real-world device, but in a sim-to-sim experiment this can be a (randomized) SimEnv. We strip all domain randomization wrappers from this env since we want to randomize it manually here.

  • policy – policy used for sampling the rollout

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

static get_dim_data(spec: EnvSpec) int[source]

Compute the dimension of the data which is extracted from the rollouts.

Parameters:

spec – environment specification

Returns:

dimension of one data sample, i.e. one time step

class SimRolloutSamplerForSBI(env: Union[SimEnv, EnvWrapper], policy: Policy, dp_mapping: Mapping[int, str], embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True, rollouts_real: Optional[List[StepSequence]] = None, use_rec_act: bool = True)[source]

Bases: RolloutSamplerForSBI, Serializable

Wrapper to make SimuRLacra’s simulation environments usable as simulators for the sbi package

Constructor

Parameters:
  • env – environment in which the policy operates; it must not be a randomized environment since we want to randomize it manually via the domain parameters coming from the sbi package

  • policy – policy used for sampling the rollout

  • dp_mapping – mapping from subsequent integers (starting at 0) to domain parameter names (e.g. mass)

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

  • rollouts_real – list of rollouts recorded from the target domain, which are used to sync the simulations’ initial states

  • use_rec_act – if True, the recorded actions from the target domain are used to generate the rollouts during simulation (feed-forward). If False, the policy is used to generate (potentially) state-dependent actions (feed-back).

check_domain_params(rollouts: Union[List[StepSequence], StepSequence], domain_param_value: ndarray, domain_param_names: Union[List[str], ValuesView])[source]

Verify if the domain parameters in the rollout are actually the ones commanded.

Parameters:
  • rollouts – simulated rollouts or rollout segments

  • domain_param_value – one set of domain parameters as commanded

  • domain_param_names – names of the domain parameters to set, i.e. values of the domain parameter mapping

sequences

sequence(x_init, iterations, iterator_function, dtype=<class 'int'>)[source]
sequence_add_init(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 * n

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_const(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0

Parameters:
  • x_init – constant values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_nlog2(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 * n * log2(n+2), with log2 being the base 2 logarithm

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_plus_one(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 + n

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_rec_double(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_{n-1} * 2

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_rec_sqrt(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_{n-1} * sqrt(n)

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_sqrt(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 * sqrt(n)

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence
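
Assuming the shared signature above, a minimal sketch of one member of this family (illustrative, not the library code):

    import numpy as np

    def sequence_plus_one_sketch(x_init, iter, dtype=int):
        # x_n = x_0 + n; returns the element at iteration `iter` and the whole sequence
        seq = np.asarray([x_init + n for n in range(iter + 1)], dtype=dtype)
        return seq[-1], seq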

step_sequence

class DictIndexProxy(obj: dict, index: int, path: Optional[str] = None)[source]

Bases: object

Views a slice through a dict of lists or tensors.

class Step(rollout, index)[source]

Bases: DictIndexProxy

A single step in a rollout.

This object is a proxy, referring to a specific index in the rollout. When querying an attribute from the step, it will try to return the corresponding slice from the rollout. Additionally, one can prefix attributes with next_ to access the value for the next step, e.g. next_observations is the observation made at the start of the next step.

Constructor

Parameters:
  • rolloutStepSequence object to which this step belongs

  • index – index of this step in the rollout
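
A usage sketch of the proxy access pattern, assuming ro is an existing StepSequence:

    step = ro[3]                       # Step proxy into the rollout
    obs_now = step.observations        # slice of 'observations' at index 3
    obs_next = step.next_observations  # same field one step ahead (index 4)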

class StepSequence(*, complete: bool = True, rollout_info=None, data_format: Optional[str] = None, done: Optional[ndarray] = None, continuous: bool = True, rollout_bounds=None, rewards: Sequence, observations: Sequence, actions: Sequence, **data)[source]

Bases: Sequence[Step]

A sequence of steps.

During the rollout, the values of different variables are recorded. This class provides efficient storage and access for these values. The constructor accepts a list of step entries for each variable. For every step, the list should contain a Tensor/ndarray of values for that step. The shape of these tensors must be the same for all step entries. The passed tensors are then stacked, so that the first dimension is the step count. Some values, like the observations, can have one more element than there are steps, to encode the state after the last step. Additionally, the step entries may be dicts to support keyed storage. A list of dicts is converted to a dict of lists, each of which will be regularly stacked. Apart from the variable-based view, the rollout can also be seen as a sequence of steps. Each Step object is a proxy; its attributes refer to the respective slice of the corresponding variable. The only required result variables are rewards, observations, and actions. All other variables are optional. Common optional ones are states and rollout_info.

Note

Storing PyTorch tensors with gradient tracing is NOT supported. The rationale behind this is eager error avoidance. The only reason you would add them is to profit from the optimized slicing, but using that with gradient tracking risks lingering incomplete graphs.

Constructor

Parameters:
  • complete – False if the rollout is incomplete, i.e. as part of a mini-batch

  • rollout_info – data staying constant through the whole episode

  • data_format – ‘torch’ to use Tensors, ‘numpy’ to use ndarrays. Will use Tensors if any data argument does, else ndarrays

  • done – boolean ndarray, specifying for each step whether it led to termination. The last step of continuous rollouts, i.e. not mini-batches, is done if complete is True.

  • continuous – true if the steps form one continuous sequence.

  • rewards – sequence of reward values, determines sequence length

  • observations – sequence of observation values, the length must be len(rewards) + 1

  • actions – sequence of action values, the length must be len(rewards)

  • data – additional data lists, their length must be len(rewards) or len(rewards) + 1
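
A minimal construction sketch following these length rules (the module path in the import is assumed):

    import numpy as np
    from pyrado.sampling.step_sequence import StepSequence  # assumed import path

    ro = StepSequence(
        rewards=np.ones(5),             # determines the sequence length: 5 steps
        observations=np.zeros((6, 3)),  # len(rewards) + 1 entries
        actions=np.zeros((5, 2)),       # len(rewards) entries
    )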

actions: Union[ndarray, Tensor]
add_data(name: str, value=None, item_shape: Optional[tuple] = None, with_after_last: bool = False)[source]

Add a new data field to the step sequence.

Parameters:
  • name – string for the name

  • value – the data

  • item_shape – shape to store the data in

  • with_after_last – True if there is one more element than the length (e.g. last observation)

classmethod concat(parts: Sequence[StepSequence], data_format: Optional[str] = None, truncate_last: bool = True)[source]

Concatenate multiple step sequences into one, truncating the last observation.

Parameters:
  • parts – batch of sequences to concatenate

  • data_format – torch to use Tensors, numpy to use ndarrays, None to choose automatically

  • truncate_last – remove the last step from each part, highly recommended to be True

Returns:

concatenated sequence of Steps

convert(data_format: str, data_type=None)[source]

Convert data to specified format.

Parameters:
  • data_format – torch to use Tensors, numpy to use ndarrays

  • data_type – optional torch/numpy dtype for data. When None is passed, the data type is left unchanged.

property data_format: str

Get the name of data format (‘torch’ or ‘numpy’).

property data_names: Sequence[str]

Get the list of data attribute names.

discounted_return(gamma: float) Union[Tensor, ndarray][source]

Compute the discounted return.

Parameters:

gamma – temporal discount factor

Returns:

exponentially weighted sum of rewards

classmethod from_pandas(df: DataFrame, env_spec: EnvSpec, continuous: bool = True, task: Optional[Task] = None) StepSequence[source]

Generate a StepSequence object from a Pandas DataFrame instance. Not all data fields are supported. The field ‘rewards’ is mandatory.

Parameters:
  • df – Pandas DataFrame holding the data in 1-dim arrays

  • env_spec – environment specification whose labels are used to slice the DataFrame

  • continuousTrue if the rollout to be reconstructed was continuous

  • task – task containing the reward function(s) that can be used to recompute the rewards from the recorded observations and actions

Returns:

new StepSequence

get_data_values(name: str, truncate_last: bool = False)[source]

Return the data tensor stored under the given name.

Parameters:
  • name – data name

  • truncate_last – True to truncate the length+1 entry if present

get_rollout(index)[source]

Get an indexed sub-rollout.

Parameters:

index – generic index of sub-rollout, negative values, slices and iterables are allowed

Returns:

selected subset.

iterate_rollouts()[source]

Iterate over all sub-rollouts of a concatenated rollout.

property length: int

Get the length of the rollout (does not include the final step).

numpy(data_type=None)[source]

Convert data to numpy ndarrays.

Parameters:

data_type – type to return data in. When None is passed, the data type is left unchanged.

observations: Union[ndarray, Tensor]
classmethod pad(rollout: StepSequence, len_to_pad_to: int, pad_value: Union[int, float] = 0)[source]

Add steps to the end of a given rollout. The entries of the steps are filled with pad_value. So far, only numpy arrays and PyTorch tensors are padded (see data_format).

Parameters:
  • rollout – rollout to be padded, modified in-place

  • len_to_pad_to – length of the resulting rollout (without the final state)

  • pad_value – scalar value to pad with

classmethod process_data(rollout: ~pyrado.sampling.step_sequence.StepSequence, fcn: ~typing.Callable, fcn_arg_name: str, fcn_arg_types: ~typing.Union[type, ~typing.Tuple[type]] = <class 'numpy.ndarray'>, include_fields: ~typing.Optional[~typing.Sequence[str]] = None, exclude_fields: ~typing.Optional[~typing.Sequence[str]] = None, **process_fcn_kwargs)[source]

Process all data fields of a rollout using an arbitrary function. Optionally, some fields can be excluded.

Parameters:
  • rolloutStepSequence holding the data

  • fcn – function (of one remaining input) used to manipulate the data fields, e.g. scipy.signal.filtfilt()

  • fcn_arg_name – string naming the remaining input of process_fcn(), e.g. x for scipy.signal.filtfilt()

  • fcn_arg_types – type or tuple thereof which are expected as input to fcn()

  • include_fields – list of field names to include for processing, pass None to include everything. If specified, only fields from this selection will be considered

  • exclude_fields – list of field names to exclude from processing, pass None to not exclude anything

  • process_fcn_kwargs – keyword arguments forwarded to process_fcn()

Returns:

new StepSequence instance with processed data

required_fields = {}
rewards: Union[ndarray, Tensor]
property rollout_bounds: ndarray
property rollout_count

Count the number of sub-rollouts inside this step sequence.

property rollout_lengths

Lengths of sub-rollouts.

sample_w_next(batch_size: int) tuple[source]

Sample a random batch of steps together with the associated next steps. Similar to split_shuffled_batches with complete_rollouts=False.

Parameters:

batch_size – number of steps to sample

Returns:

randomly sampled batch of steps

split_ordered_batches(batch_size: Optional[int] = None, num_batches: Optional[int] = None)[source]

Batch generation. Split the step collection into ordered mini-batches of size batch_size.

Parameters:
  • batch_size – number of steps per batch, i.e. variable number of batches

  • num_batches – number of batches to split the rollout in, i.e. variable batch size

Note

Left out the option to return complete rollouts like for split_shuffled_batches.

split_shuffled_batches(batch_size: int, complete_rollouts: bool = False)[source]

Batch generation. Split the step collection into random mini-batches of size batch_size.

Parameters:
  • batch_size – number of steps per batch

  • complete_rollouts – if complete_rollouts = True, the batches will not contain partial rollouts. However, the size of the returned batches cannot be strictly maintained in this case.

Note

This method is also supposed to be called for recurrent networks, which have a different evaluate() method that recognizes where the rollouts end within a batch.

torch(data_type=None)[source]

Convert data to PyTorch tensors.

Parameters:

data_type – type to return data in. When None is passed, the data type is left unchanged.

undiscounted_return() float[source]

Compute the undiscounted return.

Returns:

sum of rewards

check_act_equal(rollout_1: Union[StepSequence, List[StepSequence]], rollout_2: Union[StepSequence, List[StepSequence]], check_applied: bool = False)[source]

Check if the actions of two rollouts, or pairwise the actions of two rollouts in two lists, are approximately the same

Parameters:
  • rollout_1 – rollouts or list of rollouts

  • rollout_2 – rollouts or list of rollouts

  • check_applied – if True check the actions applied to the environment instead of the commanded ones

Returns:

True if the actions match

discounted_reverse_cumsum(data, gamma: float)[source]

Use a linear filter to compute the reverse discounted cumulative sum.

Note

scipy.signal.lfilter assumes an initialization with 0 by default.

Parameters:
  • data – input data with samples along the 0 axis (e.g. time series)

  • gamma – discount factor

Returns:

cumulative sums for every step
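
A sketch of the linear-filter trick the note refers to, using scipy.signal.lfilter on the time-reversed series:

    import numpy as np
    from scipy import signal

    def discounted_reverse_cumsum_sketch(data, gamma: float):
        # y_t = sum_{k >= t} gamma^(k - t) * x_k, computed as an IIR filter
        # on the reversed input (zero-initialized, as stated in the note)
        x = np.asarray(data, dtype=np.float64)
        return signal.lfilter([1.0], [1.0, -gamma], x[::-1], axis=0)[::-1]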

discounted_value(rollout: StepSequence, gamma: float)[source]

Compute the discounted state values for one rollout.

Parameters:
  • rollout – input data

  • gamma – temporal discount factor

Returns:

state values for every time step in the rollout

discounted_values(rollouts: Sequence[StepSequence], gamma: float, data_format: Optional[str] = 'torch')[source]

Compute the discounted state values for multiple rollouts.

Parameters:
  • rollouts – input data

  • gamma – temporal discount factor

  • data_format – data format of the given data

Returns:

state values for every time step in the rollouts (concatenated sequence across rollouts)

gae_returns(rollout: StepSequence, gamma: float = 0.99, lamb: float = 0.95)[source]

Compute returns using generalized advantage estimation.

See also

[1] J. Schulman, P. Moritz, S. Levine, M. Jordan, P. Abbeel, ‘High-Dimensional Continuous Control Using Generalized Advantage Estimation’, ICLR 2016

Parameters:
  • rollout – sequence of steps

  • gamma – temporal discount factor

  • lamb – exponential weighting factor of the generalized advantage estimator (the λ in [1])

Returns:

estimated advantage
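
A plain numpy restatement of the advantage recursion from [1] (a sketch, not the library code; rewards and values are 1-dim arrays, with values holding one extra entry for the state after the last step):

    import numpy as np

    def gae_advantages(rewards: np.ndarray, values: np.ndarray, gamma: float = 0.99, lamb: float = 0.95):
        # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t); A_t = delta_t + gamma * lamb * A_{t+1}
        deltas = rewards + gamma * values[1:] - values[:-1]
        advantages = np.zeros_like(deltas)
        running = 0.0
        for t in reversed(range(len(deltas))):
            running = deltas[t] + gamma * lamb * running
            advantages[t] = running
        return advantages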

utils

gen_ordered_batch_idcs(batch_size: int, data_size: int, sorted: bool = True)[source]

Helper function for doing SGD on mini-batches that returns the indices for the mini-batch samples

Parameters:
  • batch_size – number of samples in each mini-batch

  • data_size – total number of samples

  • sorted – if False, the order of batches is randomized (but the order within them is preserved)

Returns:

generator for lists of random indices of sub-samples

Usage:

If batch_size = 2, data_size = 5, and sorted = False, then the output might be ((2, 3), (0, 1), (4,)).
If batch_size = 2, data_size = 5, and sorted = True, then the output will be ((0, 1), (2, 3), (4,)).
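
A sketch reproducing this behavior (illustrative, not the library code):

    import numpy as np

    def gen_ordered_batch_idcs_sketch(batch_size: int, data_size: int, sorted: bool = True):
        idcs = np.arange(data_size)
        batches = [tuple(idcs[i:i + batch_size]) for i in range(0, data_size, batch_size)]
        if not sorted:
            np.random.shuffle(batches)  # randomize batch order, keep the order within each batch
        yield from batches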

gen_ordered_batches(inp: Iterable, batch_size: int)[source]

Helper function that cuts the input into equal sized chunks

Parameters:
  • inp – iterable input

  • batch_size – number of samples in each mini-batch

Returns:

iterator over the input

gen_shuffled_batch_idcs(batch_size: int, data_size: int)[source]

Helper function for doing SGD on mini-batches that returns the indices for the mini-batch samples

Parameters:
  • batch_size – number of samples in each mini-batch

  • data_size – total number of samples

Returns:

generator for lists of random indices of sub-samples

Usage:

If batch_size = 2 and data_size = 5, then the output might be ((0, 3), (2, 1), (4,))

shuffled_ordered_batches(inp: Iterable, batch_size: int)[source]

Helper function that cuts the input into equal sized chunks with the original ordering, but shuffled order among the chunks

Parameters:
  • inp – iterable input

  • batch_size – number of samples in each mini-batch

Returns:

list of randomly ordered mini-batches which within themselves have the original ordering

Module contents