sampling

bootstrapping

bootstrap_ci(data: ndarray, stat_fcn: Callable, num_reps: int, alpha: float, ci_sides: int, bias_correction: bool = False, studentized: bool = False, seed: Optional[int] = None)[source]

Re-sample the input data using the nonparametric bootstrap method, compute bootstrap replications using stat_fcn, and compute a confidence interval on the statistic of interest given by stat_fcn, which must accept the argument axis (like numpy functions do).

Parameters:
  • data – data to bootstrap from (for now only 1D arrays supported)

  • stat_fcn – function to compute a statistic of interest (e.g. mean, variance) on bootstrap samples

  • num_reps – number of bootstrap replications, i.e. the number of resampled data sets

  • alpha – determines the confidence level \(1 - \alpha \in [0, 1]\)

  • ci_sides – one or two-sided confidence interval

  • axis – axis to compute along in case of 2-dim data

  • bias_correction – bool to decide if the bias should be subtracted (see [2]). Note that the confidence intervals are constructed independently of the bias correction (see [5, p. 7]). The bias correction can be dangerous in practice: even though T_bc(D) is less biased than T(D), the bias-corrected estimator may have substantially larger variance. This is due to a possibly higher variability in the estimate of the bias, particularly when computed from small data sets. Estimates of the bias-correction factor other than stat_emp are possible; see [4].

  • studentized – flag to determine if the method based on the t-distribution is used (leads to a wider ci)

  • seed – value for the random number generators’ seeds, pass None to skip seeding

Returns:

mean of the bootstrap replications, and the confidence interval
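
A minimal, self-contained sketch of the percentile bootstrap that this function is built around (plain numpy, illustrating the method rather than calling bootstrap_ci itself, so no import path or return unpacking of the library is assumed):

    import numpy as np

    def percentile_bootstrap_ci(data: np.ndarray, num_reps: int, alpha: float, seed: int = 0):
        # nonparametric bootstrap: resample with replacement, num_reps times
        rng = np.random.default_rng(seed)
        idcs = rng.integers(0, len(data), size=(num_reps, len(data)))
        replications = np.mean(data[idcs], axis=1)  # statistic computed along axis 1
        # two-sided percentile confidence interval at level 1 - alpha
        ci_lo, ci_hi = np.quantile(replications, [alpha / 2, 1 - alpha / 2])
        return np.mean(replications), (ci_lo, ci_hi)

    m, (lo, hi) = percentile_bootstrap_ci(np.random.randn(100), num_reps=1000, alpha=0.05)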

cvar_sampler

class CVaRSampler(wrapped_sampler, epsilon: float, gamma: float = 1.0, *, min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]

Bases: SamplerBase, LoggerAware

Samples rollouts to optimize the CVaR of the discounted return. This is done by sampling more rollouts, and then only using the epsilon-quantile of them.

Constructor

Parameters:
  • wrapped_sampler – the inner sampler used to sample the full data set

  • epsilon – quantile of rollouts that will be kept

  • gamma – discount factor to compute the discounted return, default is 1 (no discount)

  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Reset the sampler after changes were made to the environment or the policy, optionally replacing one of them.

Most samplers will be implemented in parallel, so if there are changes to the environment or the policy, they will not automatically propagate to all processes. This method exists as a workaround; call it to force a reinitialization of environment and policy in all subprocesses.

Note that you don’t need to call this if the policy parameters change, since that is to be expected between sampling runs; the sample() method takes care of this on its own.

You can use the env and policy parameters to completely replace the stored environment or policy.

Parameters:
  • env – new environment to use, or None to keep the old one

  • policy – new policy to use, or None to keep the old one

sample() List[StepSequence][source]

Generate a list of rollouts. This method works exactly as specified in the class description.

Returns:

sampled rollouts

set_min_count(min_rollouts=None, min_steps=None)[source]

Adapt the sampling boundaries.

Parameters:
  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

select_cvar(rollouts, epsilon: float, gamma: float = 1.0)[source]

Select a subset of rollouts so that their mean discounted return is the CVaR(eps) of the full rollout set.

Parameters:
  • rollouts – list of rollouts

  • epsilon – chosen return quantile

  • gamma – discount factor to compute the discounted return, default is 1 (no discount)

Returns:

list of selected rollouts
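
A sketch of the selection rule described above, operating directly on an array of (discounted) returns rather than on rollout objects:

    import numpy as np

    def select_cvar_idcs(returns: np.ndarray, epsilon: float) -> np.ndarray:
        # keep the epsilon-quantile of worst rollouts; the mean of their returns
        # is the empirical CVaR(eps) of the full rollout set
        num_keep = max(1, int(round(epsilon * len(returns))))
        return np.argsort(returns)[:num_keep]

    idcs = select_cvar_idcs(np.random.randn(100), epsilon=0.1)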

data_format

cat_to_format(data: Union[dict, tuple, Sequence], data_format: str)[source]

Concatenate the generic data in the given data format. For dicts, the dict elements are concatenated individually. A list of dicts is treated as a dict of lists.

Parameters:
  • data – input data

  • data_format – numpy or torch

Returns:

numpy.ndarray or torch.Tensor, or dict of these

new_tuple(nt_type, values)[source]

Create a new tuple of the same type as nt_type. This handles the constructor differences between tuple and NamedTuple types.

Parameters:
  • nt_type – type of tuple

  • values – values as sequence

Returns:

new named tuple
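
A hypothetical sketch of the constructor difference this helper papers over (assuming nt_type is either tuple or a NamedTuple class):

    from collections import namedtuple

    def new_tuple_sketch(nt_type, values):
        if nt_type is tuple:
            # a plain tuple takes a single iterable
            return tuple(values)
        # NamedTuple constructors take the fields as positional arguments
        return nt_type(*values)

    Point = namedtuple("Point", ["x", "y"])
    assert new_tuple_sketch(Point, [1, 2]) == Point(x=1, y=2)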

stack_to_format(data: Union[dict, tuple, Sequence], data_format: str)[source]

Stack the generic data in the given data format. For dicts, the dict elements are stacked individually. A list of dicts is treated as a dict of lists.

Parameters:
  • data – input data

  • data_format – ‘numpy’ or ‘torch’

Returns:

numpy array or PyTorch tensor, or dict of these

to_format(data, data_format, data_type=None)[source]

Convert the tensor data to the given data format.

Parameters:
  • data – input data

  • data_format – numpy or torch

  • data_type – type to return data in. When None is passed, the data type is left unchanged.

Returns:

numpy.ndarray or torch.Tensor
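
A sketch of what such a conversion amounts to for plain arrays and tensors (not the library implementation; dict handling and the data_type cast are omitted):

    import numpy as np
    import torch

    def to_format_sketch(data, data_format: str):
        if data_format == "numpy":
            return data.cpu().numpy() if isinstance(data, torch.Tensor) else np.asarray(data)
        if data_format == "torch":
            return torch.as_tensor(data)
        raise ValueError("data_format must be 'numpy' or 'torch'")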

hyper_sphere

sample_from_hyper_sphere_surface(num_dim: int, method: str) Tensor[source]

Sampling from the surface of a multidimensional unit sphere.

See also

[1] G. Marsaglia, “Choosing a Point from the Surface of a Sphere”, Ann. Math. Statist., 1972

Parameters:
  • num_dim – number of dimensions of the sphere

  • method – approach used to acquire the samples

Returns:

sample with L2-norm equal to 1
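
A sketch of the standard Gaussian-normalization approach (uniformity on the sphere’s surface follows from the rotational invariance of the Gaussian; whether the method argument selects exactly this variant is an assumption):

    import torch

    def sample_sphere_surface_normal(num_dim: int) -> torch.Tensor:
        # a normalized standard-normal vector is uniformly distributed
        # on the surface of the unit hyper-sphere
        x = torch.randn(num_dim)
        return x / x.norm(p=2)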

parallel_evaluation

eval_domain_params(pool: ~pyrado.sampling.sampler_pool.SamplerPool, env: ~pyrado.environments.sim_base.SimEnv, policy: ~pyrado.policies.base.Policy, params: ~typing.List[~typing.Dict], init_state: ~typing.Optional[~numpy.ndarray] = None, seed: int = <object object>) List[StepSequence][source]

Evaluate a policy on a multidimensional grid of domain parameters.

Parameters:
  • pool – parallel sampler

  • env – environment to evaluate in

  • policy – policy to evaluate

  • params – multidimensional grid of domain parameters

  • init_state – initial state of the environment which will be fixed if not set to None

  • seed – seed value for the random number generators, pass None for no seeding

Returns:

list of rollouts

eval_domain_params_with_segmentwise_reset(pool: SamplerPool, env_sim: SimEnv, policy: Policy, segments_real_all: List[List[StepSequence]], domain_params_ml_all: List[List[dict]], stop_on_done: bool, use_rec: bool) List[List[StepSequence]][source]

Evaluate a policy for a given set of domain parameters, synchronizing the segments’ initial states with the given target domain segments.

Parameters:
  • pool – parallel sampler

  • env_sim – environment to evaluate in

  • policy – policy to evaluate

  • segments_real_all – all segments from the target domain rollout

  • domain_params_ml_all – all domain parameters to evaluate over

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

  • use_rec – True if pre-recorded actions have been used to generate the rollouts

Returns:

list of segments of rollouts

eval_nominal_domain(pool: SamplerPool, env: SimEnv, policy: Policy, init_states: List[ndarray]) List[StepSequence][source]

Evaluate a policy using the nominal (set in the given environment) domain parameters.

Parameters:
  • pool – parallel sampler

  • env – environment to evaluate in

  • policy – policy to evaluate

  • init_states – initial states of the environment which will be fixed if not set to None

Returns:

list of rollouts

eval_randomized_domain(pool: SamplerPool, env: SimEnv, randomizer: DomainRandomizer, policy: Policy, init_states: List[ndarray]) List[StepSequence][source]

Evaluate a policy in a randomized domain.

Parameters:
  • pool – parallel sampler

  • env – environment to evaluate in

  • randomizer – randomizer used to sample random domain instances, inherited from DomainRandomizer

  • policy – policy to evaluate

  • init_states – initial states of the environment which will be fixed if not set to None

Returns:

list of rollouts

parallel_rollout_sampler

class ParallelRolloutSampler(env, policy, num_workers: int, *, min_rollouts: ~typing.Optional[int] = None, min_steps: ~typing.Optional[int] = None, show_progress_bar: bool = True, seed: int = <object object>)[source]

Bases: SamplerBase, Serializable

Class for sampling from multiple environments in parallel

Constructor

Parameters:
  • env – environment to sample from

  • policy – policy to act in the environment (can also be an exploration strategy)

  • num_workers – number of parallel samplers

  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

  • show_progress_bar – if True, display a progress bar using tqdm

  • seed – seed value for the random number generators, pass None for no seeding; defaults to the last seed that was set with pyrado.set_seed

reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Re-initialize the sampler.

Parameters:
  • env – the environment in which the policy operates

  • policy – the policy used for sampling

sample(init_states: Optional[List[ndarray]] = None, domain_params: Optional[List[dict]] = None, eval: bool = False) List[StepSequence][source]

Do the sampling according to the previously given environment, policy, and number of steps/rollouts.

Note

This method is not thread-safe! See for example the usage of self._sample_count.

Parameters:
  • init_states – initial states for run_map(), pass None (default) to sample from the environment’s initial state space

  • domain_params – domain parameters for run_map(), pass None (default) to not explicitly set them

  • eval – pass False if the rollout is executed during training, else True. Forwarded to rollout().

Returns:

list of sampled rollouts

parameter_exploration_sampler

class ParameterExplorationSampler(env: Union[SimEnv, EnvWrapper], policy: Policy, num_init_states_per_domain: int, num_domains: int, num_workers: int, seed: Optional[int] = None)[source]

Bases: Serializable

Parallel sampler for parameter exploration

Constructor

Parameters:
  • env – environment to sample from

  • policy – policy used for sampling

  • num_init_states_per_domain – number of rollouts to cover the variance over initial states

  • num_domains – number of rollouts due to the variance over domain parameters

  • num_workers – number of parallel samplers

  • seed – seed value for the random number generators, pass None for no seeding; defaults to the last seed that was set with pyrado.set_seed

property num_rollouts_per_param: int

Get the number of rollouts per policy parameter set.

reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Re-initialize the sampler.

Parameters:
  • env – the environment in which the policy operates

  • policy – the policy used for sampling

sample(param_sets: Tensor, init_states: Optional[List[ndarray]] = None) ParameterSamplingResult[source]

Sample rollouts for a given set of parameters.

Note

This method is not thread-safe! See for example the usage of self._sample_count.

Parameters:
  • param_sets – sets of policy parameters

  • init_states – fixed initial states, pass None to randomly sample initial states

Returns:

data structure containing the policy parameter sets and the associated rollout data

class ParameterSample(params: Tensor, rollouts: List[StepSequence])[source]

Bases: tuple

Stores policy parameters and associated rollouts.

Create new instance of ParameterSample(params, rollouts)

property mean_undiscounted_return: float

Get the mean of the undiscounted returns over all rollouts.

property num_rollouts: int

Get the number of rollouts.

property params

Alias for field number 0

property rollouts

Alias for field number 1

class ParameterSamplingResult(samples: Sequence[ParameterSample])[source]

Bases: Sequence[ParameterSample]

Result of a parameter exploration sampling run. On the one hand, this is a list of ParameterSamples. On the other hand, it allows querying combined tensors of parameters and mean returns.

Constructor

Parameters:

samples – list of parameter samples

mean_returns()[source]

Get the mean return of each parameter sample as an N-dim vector, where N is the number of samples.

num_rollouts()[source]

Get the total number of rollouts for all samples.

parameters()[source]

Get all policy parameters as an NxP matrix, where N is the number of samples and P is the dimension of the policy parameters.

rollouts()[source]

Get all rollouts for all samples, i.e. a list of pop_size items, each a list of num_rollouts rollouts.

rollout

after_rollout_query(env: Env, policy: Policy, rollout: StepSequence) Tuple[bool, Optional[ndarray], Optional[dict]][source]

Ask the user what to do after a rollout has been animated.

Parameters:
  • env – environment used for the rollout

  • policy – policy used for the rollout

  • rollout – collected data from the rollout

Returns:

done flag, initial state, and domain parameters

rollout(env: Env, policy: Union[Module, Policy, Callable], eval: bool = False, max_steps: Optional[int] = None, reset_kwargs: Optional[dict] = None, render_mode: RenderMode = RenderMode(text=False, video=False, render=False), render_step: int = 1, no_reset: bool = False, no_close: bool = False, record_dts: bool = False, stop_on_done: bool = True, seed: Optional[int] = None, sub_seed: Optional[int] = None, sub_sub_seed: Optional[int] = None) StepSequence[source]

Perform a rollout (i.e. sample a trajectory) in the given environment using the given policy.

Parameters:
  • env – environment to use (SimEnv or RealEnv)

  • policy – policy to determine the next action given the current observation. This policy may be wrapped by an exploration strategy.

  • eval – pass False if the rollout is executed during training, else True. Forwarded to PyTorch Module.

  • max_steps – maximum number of time steps, if None the environment’s property is used

  • reset_kwargs – keyword arguments passed to environment’s reset function

  • render_mode – determines if the user sees an animation, console prints, or nothing

  • render_step – rendering interval, renders every step if set to 1

  • no_reset – do not reset the environment before running the rollout

  • no_close – do not close (and disconnect) the environment after running the rollout

  • record_dts – flag if the time intervals of different parts of one step should be recorded (for debugging)

  • stop_on_done – set to false to ignore the environment’s done flag (for debugging)

  • seed – seed value for the random number generators, pass None for no seeding

Returns:

paths of the observations, actions, rewards, and information about the environment as well as the policy

sampler

class SamplerBase(*, min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]

Bases: ABC

A sampler generates a list of rollouts in some unspecified way.

Since the sampling might occur in parallel, there is no way to reliably generate an exact number of samples. The sampler can, however, guarantee a minimum number of samples to be available. The sampler does not discard any samples on its own; all sampled data will be returned. There are two ways to regulate the sampling process:
  • the minimum number of rollouts
  • the minimum number of steps in all rollouts

At least one of these bounds must be specified. If both are set, the sampler will only terminate once both are fulfilled.
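
A sketch of the resulting termination test (a hypothetical helper mirroring the rule just stated):

    def sampling_done(num_rollouts: int, num_steps: int, min_rollouts=None, min_steps=None) -> bool:
        # at least one bound must be given; every bound that is given must be met
        assert min_rollouts is not None or min_steps is not None
        enough_rollouts = min_rollouts is None or num_rollouts >= min_rollouts
        enough_steps = min_steps is None or num_steps >= min_steps
        return enough_rollouts and enough_steps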

Constructor

Parameters:
  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

abstract reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]

Reset the sampler after changes were made to the environment or the policy, optionally replacing one of them.

Most samplers will be implemented in parallel, so if there are changes to the environment or the policy, they will not automatically propagate to all processes. This method exists as a workaround; call it to force a reinitialization of environment and policy in all subprocesses.

Note that you don’t need to call this if the policy parameters change, since that is to be expected between sampling runs; the sample() method takes care of this on its own.

You can use the env and policy parameters to completely replace the stored environment or policy.

Parameters:
  • env – new environment to use, or None to keep the old one

  • policy – new policy to use, or None to keep the old one

abstract sample() List[StepSequence][source]

Generate a list of rollouts. This method works exactly as specified in the class description.

Returns:

sampled rollouts

set_min_count(min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]

Adapt the sampling boundaries.

Parameters:
  • min_rollouts – minimum number of complete rollouts to sample

  • min_steps – minimum total number of steps to sample

sampler_pool

class GlobalNamespace[source]

Bases: object

Type of the worker’s global namespace

class SamplerPool(num_threads: int)[source]

Bases: object

A process pool capable of executing operations in parallel. This differs from the multiprocessing.Pool class in that it explicitly incorporates process-local state.

Every parallel function gets a GlobalNamespace object as first argument, which can hold arbitrary worker-local state. This allows for certain optimizations. For example, when the parallel operation requires an object that is expensive to transmit, we can create this object once in each process, store it in the namespace, and then use it in every map function call.

This class also contains additional methods to call a function exactly once in each worker, to setup worker-local state.

invoke_all(func, *args, **kwargs)[source]

Invoke func on all workers using the same argument values. The return values are collected into a list.

Parameters:

func – the first argument of func will be a worker-local namespace

invoke_all_map(func, arglist)[source]

Invoke func(arg) on all workers using one argument from the list for each ordered worker. The length of the argument list must match the number of workers. The first argument of func will be a worker-local namespace. The return values are collected into a list.

run_collect(n, func, *args, collect_progressbar: Optional[tqdm] = None, min_runs=None, **kwargs) tuple[source]

Collect at least n samples from func, where the number of samples per run can vary.

This is done by calling res, ns = func(G, *args, **kwargs) until the sum of ns exceeds n.

This is intended for situations like reinforcement learning runs. If the environment ends up in an error state, you get fewer samples per run. To ensure stable learning behaviour, you can specify the minimum number of samples to collect before returning.

Since the workers can only check the number of samples before starting a run, you will likely get more samples than the minimum. No generated samples that are part of a rollout are dropped. However, if more rollouts were sampled than required, the surplus ones are dropped to preserve seed-determinism across different numbers of workers.

Parameters:
  • n – minimum number of samples to collect

  • func – sampler function, must be pickleable

  • args – remaining positional args are passed to the function

  • collect_progressbar – tqdm progress bar to use; default None

  • min_runs – optionally specify a minimum amount of runs to be executed before returning

  • kwargs – remaining keyword args are passed to the function

Returns:

list of results, and the total number of samples
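
A single-process sketch of the collect-at-least-n pattern described above (the worker pool, the G namespace, and the surplus-dropping for seed-determinism are omitted):

    def collect_at_least(n: int, func, *args, min_runs=None, **kwargs) -> tuple:
        # call res, ns = func(...) until at least n samples (and min_runs runs) are reached
        results, num_samples, num_runs = [], 0, 0
        while num_samples < n or (min_runs is not None and num_runs < min_runs):
            res, ns = func(*args, **kwargs)
            results.append(res)
            num_samples += ns
            num_runs += 1
        return results, num_samples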

run_map(func, arglist: list, progressbar: Optional[tqdm] = None)[source]

A parallel version of [func(G, arg) for arg in arglist]. There is no deterministic assignment of workers to arglist elements. Optionally runs with progress bar.

Parameters:
  • func – mapper function, must be pickleable

  • arglist – list of function args

  • progressbar – optional progress bar from the tqdm library

Returns:

list of results

set_seed(seed)[source]

Set a deterministic seed on all workers.

Note

This is intended to only be used in legacy evaluation scripts! For new code and everything that should really be reproducible, pass the seed to the sample() method of a ParallelRolloutSampler.

Parameters:

seed – seed value for the random number generators

stop()[source]

Terminate all workers.

sbi_embeddings

class AllStepsEmbedding(spec: EnvSpec, dim_data: int, len_rollouts: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which computes the same features of the rollouts’ states and actions as done in [1]

[1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'asemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Returns the full states of the rollout as a vector, i.e. the time steps and state dimension are flattened into one dimension.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

all states as a flattened vector

class BayesSimEmbedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which computes the same features of the rollouts’ states and actions as done in [1]

[1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • downsampling_factor – use only every downsampling_factor-th time series sample; no downsampling by default

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'bsemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Computes summary statistics based on the approach in [1], see eq. (22). This method guarantees an output which has the same size for every trajectory.

[1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

summary statistics of the rollout

class DeltaStepsEmbedding(spec: EnvSpec, dim_data: int, len_rollouts: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which returns the change in the states between consecutive time steps of the rollouts

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'dsemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Returns the differences between consecutive states of the rollout, flattened into a vector.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

state differences as a flattened vector

class DynamicTimeWarpingEmbedding(spec: EnvSpec, dim_data: int, step_pattern: Optional[Union[str, StepPattern]] = None, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which uses the dtw-python package to compute the Dynamic Time Warping (DTW) distance between the states as features of the data

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • step_pattern – method passed to dtw-python for computing the distance. By default, the same default as in the dtw-python package is used (“symmetric2”). To use, for example, the Rabiner-Juang type VI-c unsmoothed recursion step pattern, pass dtw.stepPattern.rabinerJuangStepPattern(6, “c”)

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'dtwemb'
requires_target_domain_data: bool = True
summary_statistic(data: Tensor) Tensor[source]

Returns the dynamic time warping distance between the simulated rollouts’ and the real rollouts’ states.

Note

It is necessary to take the mean over all distances since the same function is used to compute the observations (for sbi) from the target domain rollouts. At this point in time there might be only one target domain rollout; thus the target domain rollouts are only compared with themselves, yielding a scalar distance value.

Parameters:

data – data tensor containing the simulated states (1st part of the 1st half of the 1st dim) and the real states (1st part of the 2nd half of the 1st dim)

Returns:

dynamic time warping distance in multi-dim state space

class Embedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: ABC, Module

Base class for all embeddings used for simulation-based inference with time series data

Note

The features of each rollout are concatenated, and since the inference procedure requires a consistent size of the inputs, it is necessary that all rollouts yield the same number of features, i.e. have equal length!

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • downsampling_factor – use only every downsampling_factor-th time series sample; no downsampling by default

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property device: str

Get the device (CPU or GPU) on which the embedding is stored.

abstract property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

forward(data: Tensor) Tensor[source]

Transforms rollouts into the observations used for likelihood-free inference. Currently a state-representation as well as state-action summary-statistics are available.

Parameters:

data – packed data of shape [batch_size, num_rollouts, len_time_series, dim_data]

Returns:

features of the data extracted from the embedding of shape [batch_size, num_rollouts * dim_feat]

forward_one_batch(data_batch: Tensor) Tensor[source]

Iterate over all rollouts and compute the features for each rollout separately, then average the features over the rollouts.

Parameters:

data_batch – data batch of shape [num_rollouts, len_time_series, dim_data]

Returns:

concatenation of the features for each rollout

name: str
static pack(data: Tensor) Tensor[source]

Reshape the data such that the shape is [batch_dim, num_rollouts, data_points_flattened].

Parameters:

data – un-packed a.k.a. un-flattened data

Returns:

packed a.k.a. flattened data

requires_target_domain_data: bool
abstract summary_statistic(data: Tensor) Tensor[source]
static unpack(data: Tensor, dim_data_orig: int) Tensor[source]

Reshape the data such that the shape is [batch_dim, num_rollouts, len_time_series, dim_data].

Parameters:
  • data – packed a.k.a. flattened data

  • dim_data_orig – dimension of the original data

Returns:

un-packed a.k.a. un-flattened data
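
A sketch of the two reshaping operations, assuming the shapes documented above:

    import torch

    def pack_sketch(data: torch.Tensor) -> torch.Tensor:
        # [batch_dim, num_rollouts, len_time_series, dim_data] -> [batch_dim, num_rollouts, data_points_flattened]
        return data.reshape(data.shape[0], data.shape[1], -1)

    def unpack_sketch(data: torch.Tensor, dim_data_orig: int) -> torch.Tensor:
        # inverse of pack_sketch, recovering the time and data dimensions
        return data.reshape(data.shape[0], data.shape[1], -1, dim_data_orig)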

class LastStepEmbedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which selects the last state of the rollouts

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • downsampling_factor – use only every downsampling_factor-th time series sample; no downsampling by default

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

name: str = 'lsemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Returns the last state of the rollout as a vector.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

last states as a vector

class RNNEmbedding(spec: ~pyrado.utils.data_types.EnvSpec, dim_data: int, hidden_size: int, num_recurrent_layers: int, output_size: int, recurrent_network_type: type = <class 'torch.nn.modules.rnn.RNN'>, only_last_output: bool = False, len_rollouts: ~typing.Optional[int] = None, output_nonlin: ~typing.Optional[~typing.Callable] = None, dropout: float = 0.0, init_param_kwargs: ~typing.Optional[dict] = None, downsampling_factor: int = 1, state_mask_labels: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Union[int, str]], ~typing.List[~typing.Union[int, str]]]] = None, act_mask_labels: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Union[int, str]], ~typing.List[~typing.Union[int, str]]]] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]

Bases: Embedding

Embedding for simulation-based inference with time series data which uses a recurrent neural network, e.g. RNN, LSTM, or GRU, to compute features of the rollouts

Constructor

Parameters:
  • spec – environment specification

  • dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.

  • hidden_size – size of the hidden layers (all equal)

  • num_recurrent_layers – number of equally sized hidden layers

  • recurrent_network_type – PyTorch recurrent network class, e.g. nn.RNN, nn.LSTM, or nn.GRU

  • output_size – size of the features at every time step, which are eventually reshaped into a vector

  • only_last_output – if True, only the last output of the network is used as a feature for sbi, else there will be an output every downsampling_factor time steps. Moreover, if True the constructor does not need to know how long the rollouts are.

  • len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)

  • output_nonlin – nonlinearity for output layer

  • dropout – dropout probability, default = 0 deactivates dropout

  • init_param_kwargs – additional keyword arguments for the policy parameter initialization

  • recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor

  • downsampling_factor – use only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()

  • state_mask_labels – list or tuple of integers or strings to select specific states from their space. By default (None), all states are passed to sbi.

  • act_mask_labels – list or tuple of integers or strings to select specific actions from their space. By default (None), all actions are passed to sbi.

  • use_cuda – True to move the embedding to the GPU, False (default) to use the CPU

property dim_output: int

Get the dimension of the embedding’s output, i.e. its feature dimension.

init_param(init_values: Optional[Tensor] = None, **kwargs)[source]
name: str = 'rnnemb'
requires_target_domain_data: bool = False
summary_statistic(data: Tensor) Tensor[source]

Pass the time series data through a recurrent neural network.

Parameters:

data – states and actions of a rollout or segment to be transformed for inference

Returns:

features obtained from the RNN at every time step, flattened into a vector

sbi_rollout_sampler

class RealRolloutSamplerForSBI(env: Env, policy: Policy, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True)[source]

Bases: RolloutSamplerForSBI, Serializable

Wrapper to make SimuRLacra’s real environments similar to the sbi simulator

Constructor

Parameters:
  • env – environment in which the policy operates; in sim-to-real settings this is a real-world device, i.e. RealEnv, but in a sim-to-sim experiment this can be a (randomized) SimEnv

  • policy – policy used for sampling the rollout

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

class RecRolloutSamplerForSBI(rollouts_dir: str, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, rand_init_rollout: bool = True)[source]

Bases: RealRolloutSamplerForSBI, Serializable

Wrapper to yield pre-recorded rollouts similar to the sbi simulator

Constructor

Parameters:
  • rollouts_dir – directory containing the pre-recorded rollouts

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • rand_init_rollout – if True, choose the first rollout at random, and then cycle through the list

property num_rollouts: int

Get the number of stored rollouts.

property ring_idx: int

Get the buffer’s index.

class RolloutSamplerForSBI(env: Env, policy: Policy, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True)[source]

Bases: ABC, Serializable

Wrapper that enables the sbi simulator instance to make rollouts from SimuRLacra environments as if the environment were a callable that only needs the simulator parameters as inputs

Note

The features of each rollout are concatenated, and since the inference procedure requires a consistent size of the inputs, it is necessary that all rollouts yield the same number of features, i.e. have equal length!

Constructor

Parameters:
  • env – environment in which the policy operates; in sim-to-real settings this is a real-world device, but in a sim-to-sim experiment this can be a (randomized) SimEnv. We strip all domain randomization wrappers from this env since we want to randomize it manually here.

  • policy – policy used for sampling the rollout

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

static get_dim_data(spec: EnvSpec) int[source]

Compute the dimension of the data which is extracted from the rollouts.

Parameters:

spec – environment specification

Returns:

dimension of one data sample, i.e. one time step

class SimRolloutSamplerForSBI(env: Union[SimEnv, EnvWrapper], policy: Policy, dp_mapping: Mapping[int, str], embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True, rollouts_real: Optional[List[StepSequence]] = None, use_rec_act: bool = True)[source]

Bases: RolloutSamplerForSBI, Serializable

Wrapper to make SimuRLacra’s simulation environments usable as simulators for the sbi package

Constructor

Parameters:
  • env – environment in which the policy operates; it must not be a randomized environment since we want to randomize it manually via the domain parameters coming from the sbi package

  • policy – policy used for sampling the rollout

  • dp_mapping – mapping from subsequent integers (starting at 0) to domain parameter names (e.g. mass)

  • embedding – embedding used for pre-processing the data before (later) passing it to the posterior

  • num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.

  • stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe, but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).

  • rollouts_real – list of rollouts recorded from the target domain, which are used to sync the simulations’ initial states

  • use_rec_act – if True, the recorded actions from the target domain are used to generate the rollouts during simulation (feed-forward). If False, the policy is used to generate (potentially) state-dependent actions (feed-back).

check_domain_params(rollouts: Union[List[StepSequence], StepSequence], domain_param_value: ndarray, domain_param_names: Union[List[str], ValuesView])[source]

Verify if the domain parameters in the rollout are actually the ones commanded.

Parameters:
  • rollouts – simulated rollouts or rollout segments

  • domain_param_value – one set of domain parameters as commanded

  • domain_param_names – names of the domain parameters to set, i.e. values of the domain parameter mapping

sequences

sequence(x_init, iterations, iterator_function, dtype=<class 'int'>)[source]
sequence_add_init(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 * n

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_const(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0

Parameters:
  • x_init – constant values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_nlog2(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 * n * log2(n+2), with log2 being the base 2 logarithm

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_plus_one(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 + n

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_rec_double(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_{n-1} * 2

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_rec_sqrt(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_{n-1} * sqrt(n)

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence

sequence_sqrt(x_init, iter, dtype=<class 'int'>)[source]

Mathematical sequence: x_n = x_0 * sqrt(n)

Parameters:
  • x_init – initial values of the sequence

  • iter – iteration until the sequence should be evaluated

  • dtype – data type to cast to (either int or float)

Returns:

element at the given iteration and array of the whole sequence
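
Assuming the shared signature above, a minimal sketch of one member of this family (illustrative, not the library code):

    import numpy as np

    def sequence_plus_one_sketch(x_init, iter, dtype=int):
        # x_n = x_0 + n; returns the element at iteration `iter` and the whole sequence
        seq = np.asarray([x_init + n for n in range(iter + 1)], dtype=dtype)
        return seq[-1], seq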

step_sequence

class DictIndexProxy(obj: dict, index: int, path: Optional[str] = None)[source]

Bases: object

Views a slice through a dict of lists or tensors.

class Step(rollout, index)[source]

Bases: DictIndexProxy

A single step in a rollout.

This object is a proxy, referring to a specific index in the rollout. When querying an attribute from the step, it will try to return the corresponding slice from the rollout. Additionally, one can prefix attributes with next_ to access the value for the next step, e.g. next_observations is the observation made at the start of the next step.

Constructor

Parameters:
  • rolloutStepSequence object to which this step belongs

  • index – index of this step in the rollout
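
A usage sketch of the proxy access pattern, assuming ro is an existing StepSequence:

    step = ro[3]                       # Step proxy into the rollout
    obs_now = step.observations        # slice of 'observations' at index 3
    obs_next = step.next_observations  # same field one step ahead (index 4)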

class StepSequence(*, complete: bool = True, rollout_info=None, data_format: Optional[str] = None, done: Optional[ndarray] = None, continuous: bool = True, rollout_bounds=None, rewards: Sequence, observations: Sequence, actions: Sequence, **data)[source]

Bases: Sequence[Step]

A sequence of steps.

During the rollout, the values of different variables are recorded. This class provides efficient storage and access for these values. The constructor accepts a list of step entries for each variable. For every step, the list should contain a Tensor/ndarray of values for that step. The shape of these tensors must be the same for all step entries. The passed tensors are then stacked, so that the first dimension is the step count. Some values, like the observations, can have one more element than there are steps, to encode the state after the last step. Additionally, the step entries may be dicts to support keyed storage. A list of dicts is converted to a dict of lists, each of which will be regularly stacked. Apart from the variable-based view, the rollout can also be seen as a sequence of steps. Each Step object is a proxy; its attributes refer to the respective slice of the corresponding variable. The only required result variables are rewards, observations, and actions. All other variables are optional. Common optional ones are states and rollout_info.

Note

Storing PyTorch tensors with gradient tracing is NOT supported. The rationale behind this is eager error avoidance. The only reason you would add them is to profit from the optimized slicing, but using that with gradient tracking risks lingering incomplete graphs.

Constructor

Parameters:
  • complete – False if the rollout is incomplete, i.e. as part of a mini-batch

  • rollout_info – data staying constant through the whole episode

  • data_format – ‘torch’ to use Tensors, ‘numpy’ to use ndarrays. Will use Tensors if any data argument does, else ndarrays

  • done – boolean ndarray, specifying for each step whether it led to termination. The last step of continuous rollouts, i.e. not mini-batches, is done if complete is True.

  • continuous – true if the steps form one continuous sequence.

  • rewards – sequence of reward values, determines sequence length

  • observations – sequence of observation values, the length must be len(rewards) + 1

  • actions – sequence of action values, the length must be len(rewards)

  • data – additional data lists, their length must be len(rewards) or len(rewards) + 1
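
A minimal construction sketch following these length rules (the module path in the import is assumed):

    import numpy as np
    from pyrado.sampling.step_sequence import StepSequence  # assumed import path

    ro = StepSequence(
        rewards=np.ones(5),             # determines the sequence length: 5 steps
        observations=np.zeros((6, 3)),  # len(rewards) + 1 entries
        actions=np.zeros((5, 2)),       # len(rewards) entries
    )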

actions: Union[ndarray, Tensor]
add_data(name: str, value=None, item_shape: Optional[tuple] = None, with_after_last: bool = False)[source]

Add a new data field to the step sequence.

Parameters:
  • name – string for the name

  • value – the data

  • item_shape – shape to store the data in

  • with_after_last – True if there is one more element than the length (e.g. last observation)

classmethod concat(parts: Sequence[StepSequence], data_format: Optional[str] = None, truncate_last: bool = True)[source]

Concatenate multiple step sequences into one, truncating the last observation.

Parameters:
  • parts – batch of sequences to concatenate

  • data_format – torch to use Tensors, numpy to use ndarrays, None to choose automatically

  • truncate_last – remove the last step from each part, highly recommended to be True

Returns:

concatenated sequence of Steps

convert(data_format: str, data_type=None)[source]

Convert data to specified format.

Parameters:
  • data_format – torch to use Tensors, numpy to use ndarrays

  • data_type – optional torch/numpy dtype for data. When None is passed, the data type is left unchanged.

property data_format: str

Get the name of data format (‘torch’ or ‘numpy’).

property data_names: Sequence[str]

Get the list of data attribute names.

discounted_return(gamma: float) Union[Tensor, ndarray][source]

Compute the discounted return.

Parameters:

gamma – temporal discount factor

Returns:

exponentially weighted sum of rewards

classmethod from_pandas(df: DataFrame, env_spec: EnvSpec, continuous: bool = True, task: Optional[Task] = None) StepSequence[source]

Generate a StepSequence object from a Pandas DataFrame instance. Not all data fields are supported. The field ‘rewards’ is mandatory.

Parameters:
  • df – Pandas DataFrame holding the data in 1-dim arrays

  • env_spec – environment specification whose labels are used to slice the DataFrame

  • continuousTrue if the rollout to be reconstructed was continuous

  • task – task containing the reward function(s) that can be used to recompute the rewards from the recorded observations and actions

Returns:

new StepSequence

get_data_values(name: str, truncate_last: bool = False)[source]

Return the data tensor stored under the given name.

Parameters:
  • name – data name

  • truncate_last – True to truncate the length+1 entry if present

get_rollout(index)[source]

Get an indexed sub-rollout.

Parameters:

index – generic index of sub-rollout, negative values, slices and iterables are allowed

Returns:

selected subset.

iterate_rollouts()[source]

Iterate over all sub-rollouts of a concatenated rollout.

property length: int

Get the length of the rollout (does not include the final step).

numpy(data_type=None)[source]

Convert data to numpy ndarrays.

Parameters:

data_type – type to return data in. When None is passed, the data type is left unchanged.

observations: Union[ndarray, Tensor]
classmethod pad(rollout: StepSequence, len_to_pad_to: int, pad_value: Union[int, float] = 0)[source]

Add steps to the end of a given rollout. The entries of the steps are filled with pad_value. So far, only numpy arrays and PyTorch tensors are padded (see data_format).

Parameters:
  • rollout – rollout to be padded, modified in-place

  • len_to_pad_to – length of the resulting rollout (without the final state)

  • pad_value – scalar value to pad with

classmethod process_data(rollout: ~pyrado.sampling.step_sequence.StepSequence, fcn: ~typing.Callable, fcn_arg_name: str, fcn_arg_types: ~typing.Union[type, ~typing.Tuple[type]] = <class 'numpy.ndarray'>, include_fields: ~typing.Optional[~typing.Sequence[str]] = None, exclude_fields: ~typing.Optional[~typing.Sequence[str]] = None, **process_fcn_kwargs)[source]

Process all data fields of a rollout using an arbitrary function. Optionally, some fields can be excluded.

Parameters:
  • rolloutStepSequence holding the data

  • fcn – function (of one remaining input) used to manipulate the data fields, e.g. scipy.signal.filtfilt()

  • fcn_arg_name – string naming the remaining input of process_fcn(), e.g. x for scipy.signal.filtfilt()

  • fcn_arg_types – type or tuple thereof which are expected as input to fcn()

  • include_fields – list of field names to include for processing, pass None to include everything. If specified, only fields from this selection will be considered

  • exclude_fields – list of field names to exclude from processing, pass None to not exclude anything

  • process_fcn_kwargs – keyword arguments forwarded to process_fcn()

Returns:

new StepSequence instance with processed data

required_fields = {}
rewards: Union[ndarray, Tensor]
property rollout_bounds: ndarray
property rollout_count

Count the number of sub-rollouts inside this step sequence.

property rollout_lengths

Lengths of sub-rollouts.

sample_w_next(batch_size: int) tuple[source]

Sample a random batch of steps together with the associated next steps. Similar to split_shuffled_batches with complete_rollouts=False.

Parameters:

batch_size – number of steps to sample

Returns:

randomly sampled batch of steps

split_ordered_batches(batch_size: Optional[int] = None, num_batches: Optional[int] = None)[source]

Batch generation. Split the step collection into ordered mini-batches of size batch_size.

Parameters:
  • batch_size – number of steps per batch, i.e. variable number of batches

  • num_batches – number of batches to split the rollout in, i.e. variable batch size

Note

Left out the option to return complete rollouts like for split_shuffled_batches.

split_shuffled_batches(batch_size: int, complete_rollouts: bool = False)[source]

Batch generation. Split the step collection into random mini-batches of size batch_size.

Parameters:
  • batch_size – number of steps per batch

  • complete_rollouts – if complete_rollouts = True, the batches will not contain partial rollouts. However, the size of the returned batches cannot be strictly maintained in this case.

Note

This method is also supposed to be called for recurrent networks, which have a different evaluate() method that recognizes where the rollouts end within a batch.

torch(data_type=None)[source]

Convert data to PyTorch tensors.

Parameters:

data_type – type to return data in. When None is passed, the data type is left unchanged.

undiscounted_return() float[source]

Compute the undiscounted return.

Returns:

sum of rewards

check_act_equal(rollout_1: Union[StepSequence, List[StepSequence]], rollout_2: Union[StepSequence, List[StepSequence]], check_applied: bool = False)[source]

Check if the actions of two rollouts, or pairwise the actions of two rollouts in two lists, are approximately the same

Parameters:
  • rollout_1 – rollouts or list of rollouts

  • rollout_2 – rollouts or list of rollouts

  • check_applied – if True check the actions applied to the environment instead of the commanded ones

Returns:

True if the actions match

discounted_reverse_cumsum(data, gamma: float)[source]

Use a linear filter to compute the reverse discounted cumulative sum.

Note

scipy.signal.lfilter assumes an initialization with 0 by default.

Parameters:
  • data – input data with samples along the 0 axis (e.g. time series)

  • gamma – discount factor

Returns:

cumulative sums for every step
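
A sketch of the linear-filter trick the note refers to, using scipy.signal.lfilter on the time-reversed series:

    import numpy as np
    from scipy import signal

    def discounted_reverse_cumsum_sketch(data, gamma: float):
        # y_t = sum_{k >= t} gamma^(k - t) * x_k, computed as an IIR filter
        # on the reversed input (zero-initialized, as stated in the note)
        x = np.asarray(data, dtype=np.float64)
        return signal.lfilter([1.0], [1.0, -gamma], x[::-1], axis=0)[::-1]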

discounted_value(rollout: StepSequence, gamma: float)[source]

Compute the discounted state values for one rollout.

Parameters:
  • rollout – input data

  • gamma – temporal discount factor

Returns:

state values for every time step in the rollout

discounted_values(rollouts: Sequence[StepSequence], gamma: float, data_format: Optional[str] = 'torch')[source]

Compute the discounted state values for multiple rollouts.

Parameters:
  • rollouts – input data

  • gamma – temporal discount factor

  • data_format – data format of the given data

Returns:

state values for every time step in the rollouts (concatenated sequence across rollouts)

gae_returns(rollout: StepSequence, gamma: float = 0.99, lamb: float = 0.95)[source]

Compute returns using generalized advantage estimation.

See also

[1] J. Schulman, P. Moritz, S. Levine, M. Jordan, P. Abbeel, ‘High-Dimensional Continuous Control Using Generalized Advantage Estimation’, ICLR 2016

Parameters:
  • rollout – sequence of steps

  • gamma – temporal discount factor

  • lamb – exponential weighting factor of the generalized advantage estimator (the λ in [1])

Returns:

estimated advantage
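
A plain numpy restatement of the advantage recursion from [1] (a sketch, not the library code; rewards and values are 1-dim arrays, with values holding one extra entry for the state after the last step):

    import numpy as np

    def gae_advantages(rewards: np.ndarray, values: np.ndarray, gamma: float = 0.99, lamb: float = 0.95):
        # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t); A_t = delta_t + gamma * lamb * A_{t+1}
        deltas = rewards + gamma * values[1:] - values[:-1]
        advantages = np.zeros_like(deltas)
        running = 0.0
        for t in reversed(range(len(deltas))):
            running = deltas[t] + gamma * lamb * running
            advantages[t] = running
        return advantages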

utils

gen_ordered_batch_idcs(batch_size: int, data_size: int, sorted: bool = True)[source]

Helper function for doing SGD on mini-batches that returns the indices for the mini-batch samples

Parameters:
  • batch_size – number of samples in each mini-batch

  • data_size – total number of samples

  • sorted – if False, the order of batches is randomized (but the order within them is preserved)

Returns:

generator for lists of random indices of sub-samples

Usage:

If batch_size = 2, data_size = 5, and sorted = False, then the output might be ((2, 3), (0, 1), (4,)).
If batch_size = 2, data_size = 5, and sorted = True, then the output will be ((0, 1), (2, 3), (4,)).
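
A sketch reproducing this behavior (illustrative, not the library code):

    import numpy as np

    def gen_ordered_batch_idcs_sketch(batch_size: int, data_size: int, sorted: bool = True):
        idcs = np.arange(data_size)
        batches = [tuple(idcs[i:i + batch_size]) for i in range(0, data_size, batch_size)]
        if not sorted:
            np.random.shuffle(batches)  # randomize batch order, keep the order within each batch
        yield from batches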

gen_ordered_batches(inp: Iterable, batch_size: int)[source]

Helper function that cuts the input into equal sized chunks

Parameters:
  • inp – iterable input

  • batch_size – number of samples in each mini-batch

Returns:

iterator over the input

gen_shuffled_batch_idcs(batch_size: int, data_size: int)[source]

Helper function for doing SGD on mini-batches that returns the indices for the mini-batch samples

Parameters:
  • batch_size – number of samples in each mini-batch

  • data_size – total number of samples

Returns:

generator for lists of random indices of sub-samples

Usage:

If batch_size = 2 and data_size = 5, then the output might be ((0, 3), (2, 1), (4,))

shuffled_ordered_batches(inp: Iterable, batch_size: int)[source]

Helper function that cuts the input into equal sized chunks with the original ordering, but shuffled order among the chunks

Parameters:
  • inp – iterable input

  • batch_size – number of samples in each mini-batch

Returns:

list of randomly ordered mini-batches which within themselves have the original ordering

Module contents