sampling
bootstrapping
- bootstrap_ci(data: ndarray, stat_fcn: Callable, num_reps: int, alpha: float, ci_sides: int, bias_correction: bool = False, studentized: bool = False, seed: Optional[int] = None)[source]
Re-sample the input data using the nonparametric bootstrap method, compute bootstrap replications using stat_fcn, and compute a confidence interval on the statistic of interest given by stat_fcn, which must accept the argument axis (like numpy functions do).
See also
[1] https://projecteuclid.org/download/pdf_1/euclid.ss/1032280214
[2] https://people.csail.mit.edu/tommi/papers/SteJaa-nips03.pdf
[3] Cameron & Trivedi, “Microeconometrics: Methods and Applications”, 2005, page 361
[4] http://users.stat.umn.edu/~helwig/notes/bootci-Notes.pdf
[5] https://www.diva-portal.org/smash/get/diva2:130905/FULLTEXT01.pdf
[6] https://www.ethz.ch/content/dam/ethz/special-interest/math/statistics/sfs/Education/Advanced%20Studies%20in%20Applied%20Statistics/course-material-1719/Nonparametric%20Methods/lecture_2up.pdf
[7] https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf
- Parameters:
data – data to bootstrap from (for now, only 1D arrays are supported)
stat_fcn – function to compute a statistic of interest (e.g. mean, variance) on bootstrap samples
num_reps – number of bootstrap replications, i.e. how many times the data is re-sampled
alpha – determines the confidence level \(1 - \alpha \in [0, 1]\)
ci_sides – 1 for a one-sided, 2 for a two-sided confidence interval
axis – axis to compute along in case of 2-dim data
bias_correction – whether to subtract the estimated bias (see [2]). The confidence intervals are constructed independently of the bias correction (see [5, p. 7]). The bias correction can be dangerous in practice: even though T_bc(D) is less biased than T(D), the bias-corrected estimator may have substantially larger variance, due to a possibly higher variability in the estimate of the bias, particularly when computed from small data sets. Estimates of the bias-correction factor other than stat_emp are possible, see [4].
studentized – if True, use the studentized method based on the t-distribution, which leads to a wider confidence interval
seed – value for the random number generators’ seeds, pass None to skip seeding
- Returns:
mean of the bootstrap replications, and the confidence interval
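For illustration, a minimal usage sketch; the import path and the exact unpacking of the returned mean and confidence bounds are assumptions based on the signature and return description above:

```python
import numpy as np

from pyrado.sampling.bootstrapping import bootstrap_ci  # import path assumed

data = np.random.default_rng(0).normal(loc=10.0, scale=2.0, size=20)  # 1D, as required

# np.mean accepts the axis argument, as stat_fcn must
m, ci_lo, ci_hi = bootstrap_ci(  # unpacking into mean and CI bounds is an assumption
    data, stat_fcn=np.mean, num_reps=1000, alpha=0.05, ci_sides=2, seed=0
)
print(f"mean = {m:.3f}, 95% CI = [{ci_lo:.3f}, {ci_hi:.3f}]")
```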
cvar_sampler
- class CVaRSampler(wrapped_sampler, epsilon: float, gamma: float = 1.0, *, min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]
Bases:
SamplerBase
,LoggerAware
Samples rollouts to optimize the CVaR of the discounted return. This is done by sampling more rollouts and then keeping only the epsilon-quantile of them.
Constructor
- Parameters:
wrapped_sampler – the inner sampler used to sample the full data set
epsilon – quantile of rollouts that will be kept
gamma – discount factor to compute the discounted return, default is 1 (no discount)
min_rollouts – minimum number of complete rollouts to sample
min_steps – minimum total number of steps to sample
- reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]
Reset the sampler after changes were made to the environment or the policy, optionally replacing one of them.
Most samplers will be implemented in parallel, so if there are changes to the environment or the policy, they will not automatically propagate to all processes. This method exists as a workaround; call it to force a reinitialization of environment and policy in all subprocesses.
Note that you don’t need to call this if only the policy parameters change, since that is expected between sampling runs; the sample() method takes care of this on its own.
You can use the env and policy parameters to completely replace the stored environment or policy.
- Parameters:
env – new environment to use, or None to keep the old one
policy – new policy to use, or None to keep the old one
- sample() List[StepSequence] [source]
Generate a list of rollouts. This method works exactly as specified in the class description.
- Returns:
sampled rollouts
- select_cvar(rollouts, epsilon: float, gamma: float = 1.0)[source]
Select a subset of rollouts so that their mean discounted return is the CVaR(eps) of the full rollout set.
- Parameters:
rollouts – list of rollouts
epsilon – chosen return quantile
gamma – discount factor to compute the discounted return, default is 1 (no discount)
- Returns:
list of selected rollouts
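A sketch of the selection logic described above (an illustration, not the library's implementation; it assumes every rollout provides a discounted_return(gamma) method, as StepSequence does):

```python
import numpy as np

def select_cvar_sketch(rollouts, epsilon: float, gamma: float = 1.0):
    # Sort ascending by discounted return, so the worst rollouts come first
    ordered = sorted(rollouts, key=lambda ro: ro.discounted_return(gamma))
    # Keep the lowest epsilon-quantile (at least one rollout); their mean return is CVaR(epsilon)
    num_keep = max(1, int(np.floor(epsilon * len(ordered))))
    return ordered[:num_keep]
```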
data_format
- cat_to_format(data: Union[dict, tuple, Sequence], data_format: str)[source]
Concatenate the generic data in the given data format. For dicts, the dict elements are concatenated individually. A list of dicts is treated as a dict of lists.
- Parameters:
data – input data
data_format – ‘numpy’ or ‘torch’
- Returns:
numpy.ndarray or torch.Tensor, or dict of these
- new_tuple(nt_type, values)[source]
Create a new tuple of the same type as nt_type. This handles the constructor differences between tuple and NamedTuple subclasses.
- Parameters:
nt_type – type of tuple
values – values as sequence
- Returns:
new named tuple
- stack_to_format(data: Union[dict, tuple, Sequence], data_format: str)[source]
Stack the generic data in the given data format. For dicts, the dict elements are stacked individually. A list of dicts is treated as a dict of lists.
- Parameters:
data – input data
data_format – ‘numpy’ or ‘torch’
- Returns:
numpy array or PyTorch tensor, or dict of these
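A small usage sketch of the dict handling described above (the import path is an assumption):

```python
import numpy as np

from pyrado.sampling.data_format import stack_to_format  # import path assumed

# A list of dicts is treated as a dict of lists: each key is stacked individually
steps = [
    {"obs": np.array([0.0, 1.0]), "act": np.array([0.5])},
    {"obs": np.array([2.0, 3.0]), "act": np.array([-0.5])},
]
stacked = stack_to_format(steps, data_format="numpy")
print(stacked["obs"].shape)  # (2, 2): the new leading dimension is the stack dimension
```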
hyper_sphere
- sample_from_hyper_sphere_surface(num_dim: int, method: str) Tensor [source]
Sample from the surface of a multidimensional unit sphere.
See also
[1] G. Marsaglia, “Choosing a Point from the Surface of a Sphere”, Ann. Math. Statist., 1972
- Parameters:
num_dim – number of dimensions of the sphere
method – approach used to acquire the samples
- Returns:
sample with L2-norm equal to 1
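The classic method of [1] draws isotropic Gaussian samples and normalizes them to unit length; a minimal sketch of that approach (an illustration, not necessarily the implementation behind the method argument):

```python
import torch

def sample_sphere_surface_gaussian(num_dim: int, num_samples: int = 1) -> torch.Tensor:
    """Uniform samples from the surface of the unit hypersphere via Gaussian normalization."""
    x = torch.randn(num_samples, num_dim)  # isotropic Gaussian samples
    return x / x.norm(dim=1, keepdim=True)  # scale every row to L2-norm 1

samples = sample_sphere_surface_gaussian(num_dim=3, num_samples=5)
assert torch.allclose(samples.norm(dim=1), torch.ones(5))
```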
parallel_evaluation
- eval_domain_params(pool: ~pyrado.sampling.sampler_pool.SamplerPool, env: ~pyrado.environments.sim_base.SimEnv, policy: ~pyrado.policies.base.Policy, params: ~typing.List[~typing.Dict], init_state: ~typing.Optional[~numpy.ndarray] = None, seed: int = <object object>) List[StepSequence] [source]
Evaluate a policy on a multidimensional grid of domain parameters.
- Parameters:
pool – parallel sampler
env – environment to evaluate in
policy – policy to evaluate
params – multidimensional grid of domain parameters
init_state – initial state of the environment, which will be fixed if not set to None
seed – seed value for the random number generators, pass None for no seeding
- Returns:
list of rollouts
- eval_domain_params_with_segmentwise_reset(pool: SamplerPool, env_sim: SimEnv, policy: Policy, segments_real_all: List[List[StepSequence]], domain_params_ml_all: List[List[dict]], stop_on_done: bool, use_rec: bool) List[List[StepSequence]] [source]
Evaluate a policy for a given set of domain parameters, synchronizing the segments’ initial states with the given target domain segments.
- Parameters:
pool – parallel sampler
env_sim – environment to evaluate in
policy – policy to evaluate
segments_real_all – all segments from the target domain rollout
domain_params_ml_all – all domain parameters to evaluate over
stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).
use_rec – True if pre-recorded actions have been used to generate the rollouts
- Returns:
list of segments of rollouts
- eval_nominal_domain(pool: SamplerPool, env: SimEnv, policy: Policy, init_states: List[ndarray]) List[StepSequence] [source]
Evaluate a policy using the nominal (set in the given environment) domain parameters.
- Parameters:
pool – parallel sampler
env – environment to evaluate in
policy – policy to evaluate
init_states – initial states of the environment which will be fixed if not set to None
- Returns:
list of rollouts
- eval_randomized_domain(pool: SamplerPool, env: SimEnv, randomizer: DomainRandomizer, policy: Policy, init_states: List[ndarray]) List[StepSequence] [source]
Evaluate a policy in a randomized domain.
- Parameters:
pool – parallel sampler
env – environment to evaluate in
randomizer – randomizer used to sample random domain instances, inherited from DomainRandomizer
policy – policy to evaluate
init_states – initial states of the environment which will be fixed if not set to None
- Returns:
list of rollouts
parallel_rollout_sampler
- class ParallelRolloutSampler(env, policy, num_workers: int, *, min_rollouts: ~typing.Optional[int] = None, min_steps: ~typing.Optional[int] = None, show_progress_bar: bool = True, seed: int = <object object>)[source]
Bases:
SamplerBase
,Serializable
Class for sampling from multiple environments in parallel
Constructor
- Parameters:
env – environment to sample from
policy – policy to act in the environment (can also be an exploration strategy)
num_workers – number of parallel samplers
min_rollouts – minimum number of complete rollouts to sample
min_steps – minimum total number of steps to sample
show_progress_bar – if True, display a progress bar using tqdm
seed – seed value for the random number generators, pass None for no seeding; defaults to the last seed that was set with pyrado.set_seed
- reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]
Re-initialize the sampler.
- Parameters:
env – the environment in which the policy operates
policy – the policy used for sampling
- sample(init_states: Optional[List[ndarray]] = None, domain_params: Optional[List[dict]] = None, eval: bool = False) List[StepSequence] [source]
Do the sampling according to the previously given environment, policy, and number of steps/rollouts.
Note
This method is not thread-safe! See for example the usage of self._sample_count.
- Parameters:
init_states – initial states for run_map(), pass None (default) to sample from the environment’s initial state space
domain_params – domain parameters for run_map(), pass None (default) to not explicitly set them
eval – pass False if the rollout is executed during training, else True. Forwarded to rollout().
- Returns:
list of sampled rollouts
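A usage sketch (the import path is an assumption; constructing the environment and policy is omitted):

```python
from pyrado.sampling.parallel_rollout_sampler import ParallelRolloutSampler  # path assumed

env = ...     # a pyrado environment to sample from (construction omitted)
policy = ...  # a pyrado policy acting in env (construction omitted)

# Guarantee at least 10000 steps in total, sampled by 4 workers with a fixed seed
sampler = ParallelRolloutSampler(env, policy, num_workers=4, min_steps=10000, seed=0)
rollouts = sampler.sample()  # list of StepSequence instances
print(sum(ro.length for ro in rollouts))  # >= 10000
```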
parameter_exploration_sampler
- class ParameterExplorationSampler(env: Union[SimEnv, EnvWrapper], policy: Policy, num_init_states_per_domain: int, num_domains: int, num_workers: int, seed: Optional[int] = None)[source]
Bases:
Serializable
Parallel sampler for parameter exploration
Constructor
- Parameters:
env – environment to sample from
policy – policy used for sampling
num_init_states_per_domain – number of rollouts to cover the variance over initial states
num_domains – number of rollouts due to the variance over domain parameters
num_workers – number of parallel samplers
seed – seed value for the random number generators, pass None for no seeding; defaults to the last seed that was set with pyrado.set_seed
- property num_rollouts_per_param: int
Get the number of rollouts per policy parameter set.
- reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]
Re-initialize the sampler.
- Parameters:
env – the environment in which the policy operates
policy – the policy used for sampling
- sample(param_sets: Tensor, init_states: Optional[List[ndarray]] = None) ParameterSamplingResult [source]
Sample rollouts for a given set of parameters.
Note
This method is not thread-safe! See for example the usage of self._sample_count.
- Parameters:
param_sets – sets of policy parameters
init_states – fixed initial states, pass None to randomly sample initial states
- Returns:
data structure containing the policy parameter sets and the associated rollout data
- class ParameterSample(params: Tensor, rollouts: List[StepSequence])[source]
Bases:
tuple
Stores policy parameters and associated rollouts.
Create new instance of ParameterSample(params, rollouts)
- property mean_undiscounted_return: float
Get the mean of the undiscounted returns over all rollouts.
- property num_rollouts: int
Get the number of rollouts.
- property params
Alias for field number 0
- property rollouts
Alias for field number 1
- class ParameterSamplingResult(samples: Sequence[ParameterSample])[source]
Bases:
Sequence
[ParameterSample
]Result of a parameter exploration sampling run. On the one hand, this is a list of ParameterSamples. On the other hand, it allows querying combined tensors of parameters and mean returns.
Constructor
- Parameters:
samples – list of parameter samples
- mean_returns()[source]
Get the mean return of every parameter sample as an N-dim vector, where N is the number of samples.
rollout
- after_rollout_query(env: Env, policy: Policy, rollout: StepSequence) Tuple[bool, Optional[ndarray], Optional[dict]] [source]
Ask the user what to do after a rollout has been animated.
- Parameters:
env – environment used for the rollout
policy – policy used for the rollout
rollout – collected data from the rollout
- Returns:
done flag, initial state, and domain parameters
- rollout(env: Env, policy: Union[Module, Policy, Callable], eval: bool = False, max_steps: Optional[int] = None, reset_kwargs: Optional[dict] = None, render_mode: RenderMode = RenderMode(text=False, video=False, render=False), render_step: int = 1, no_reset: bool = False, no_close: bool = False, record_dts: bool = False, stop_on_done: bool = True, seed: Optional[int] = None, sub_seed: Optional[int] = None, sub_sub_seed: Optional[int] = None) StepSequence [source]
Perform a rollout (i.e. sample a trajectory) in the given environment using the given policy.
- Parameters:
env – environment to use (SimEnv or RealEnv)
policy – policy to determine the next action given the current observation. This policy may be wrapped by an exploration strategy.
eval – pass False if the rollout is executed during training, else True. Forwarded to PyTorch Module.
max_steps – maximum number of time steps, if None the environment’s property is used
reset_kwargs – keyword arguments passed to environment’s reset function
render_mode – determines if the user sees an animation, console prints, or nothing
render_step – rendering interval, renders every step if set to 1
no_reset – do not reset the environment before running the rollout
no_close – do not close (and disconnect) the environment after running the rollout
record_dts – flag if the time intervals of different parts of one step should be recorded (for debugging)
stop_on_done – set to False to ignore the environment’s done flag (for debugging)
seed – seed value for the random number generators, pass None for no seeding
- Returns:
paths of the observations, actions, rewards, and information about the environment as well as the policy
sampler
- class SamplerBase(*, min_rollouts: Optional[int] = None, min_steps: Optional[int] = None)[source]
Bases:
ABC
A sampler generates a list of rollouts in some unspecified way.
Since the sampling might occur in parallel, there is no way to reliably generate an exact number of samples. The sampler can, however, guarantee a minimum number of samples to be available. The sampler does not discard any samples on its own; all sampled data will be returned. There are two ways to regulate the sampling process: 1. the minimum number of rollouts, and 2. the minimum number of steps in all rollouts.
At least one of these bounds must be specified. If both are set, the sampler will only terminate once both are fulfilled.
Constructor
- Parameters:
min_rollouts – minimum number of complete rollouts to sample
min_steps – minimum total number of steps to sample
- abstract reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)[source]
Reset the sampler after changes were made to the environment or the policy, optionally replacing one of them.
Most samplers will be implemented in parallel, so if there are changes to the environment or the policy, they will not automatically propagate to all processes. This method exists as a workaround; call it to force a reinitialization of environment and policy in all subprocesses.
Note that you don’t need to call this if only the policy parameters change, since that is expected between sampling runs; the sample() method takes care of this on its own.
You can use the env and policy parameters to completely replace the stored environment or policy.
- Parameters:
env – new environment to use, or None to keep the old one
policy – new policy to use, or None to keep the old one
- abstract sample() List[StepSequence] [source]
Generate a list of rollouts. This method works exactly as specified in the class description.
- Returns:
sampled rollouts
sampler_pool
- class SamplerPool(num_threads: int)[source]
Bases:
object
A process pool capable of executing operations in parallel. This differs from the multiprocessing.Pool class in that it explicitly incorporates process-local state.
Every parallel function gets a GlobalNamespace object as first argument, which can hold arbitrary worker-local state. This allows for certain optimizations. For example, when the parallel operation requires an object that is expensive to transmit, we can create this object once in each process, store it in the namespace, and then use it in every map function call.
This class also contains additional methods to call a function exactly once in each worker, to set up worker-local state.
- invoke_all(func, *args, **kwargs)[source]
Invoke func on all workers using the same argument values. The return values are collected into a list.
- Parameters:
func – the first argument of func will be a worker-local namespace
- invoke_all_map(func, arglist)[source]
Invoke func(arg) on all workers using one argument from the list for each ordered worker. The length of the argument list must match the number of workers. The first argument of func will be a worker-local namespace. The return values are collected into a list.
- run_collect(n, func, *args, collect_progressbar: Optional[tqdm] = None, min_runs=None, **kwargs) tuple [source]
Collect at least n samples from func, where the number of samples per run can vary.
This is done by calling res, ns = func(G, *args, **kwargs) until the sum of ns exceeds n.
This is intended for situations like reinforcement learning runs. If the environment ends up in an error state, you get fewer samples per run. To ensure stable learning behaviour, you can specify the minimum number of samples to collect before returning.
Since the workers can only check the number of samples before starting a run, you will likely get more samples than the minimum. No generated samples that are part of a rollout are dropped; however, surplus rollouts that were sampled beyond the minimum are dropped to get seed-determinism across different numbers of workers.
- Parameters:
n – minimum number of samples to collect
func – sampler function, must be pickleable
args – remaining positional args are passed to the function
collect_progressbar – tqdm progress bar to use; default None
min_runs – optionally specify a minimum amount of runs to be executed before returning
kwargs – remaining keyword args are passed to the function
- Returns:
list of results, and the total number of samples
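A sketch of a sampler function obeying run_collect's contract res, ns = func(G, *args, **kwargs); the body is a stand-in for real sampling work:

```python
from pyrado.sampling.sampler_pool import SamplerPool

def sample_chunk(G, chunk_size: int):
    # The first argument is the worker-local namespace; return (result, num_samples)
    data = list(range(chunk_size))  # stand-in for real sampling work
    return data, len(data)

pool = SamplerPool(num_threads=4)
results, num_samples = pool.run_collect(1000, sample_chunk, 50)  # at least 1000 samples
```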
- run_map(func, arglist: list, progressbar: Optional[tqdm] = None)[source]
A parallel version of [func(G, arg) for arg in arglist]. There is no deterministic assignment of workers to arglist elements. Optionally runs with progress bar.
- Parameters:
func – mapper function, must be pickleable
arglist – list of function args
progressbar – optional progress bar from the tqdm library
- Returns:
list of results
- set_seed(seed)[source]
Set a deterministic seed on all workers.
Note
This is intended to only be used in legacy evaluation scripts! For new code and everything that should really be reproducible, pass the seed to the sample() method of a ParallelRolloutSampler.
- Parameters:
seed – seed value for the random number generators
sbi_embeddings
- class AllStepsEmbedding(spec: EnvSpec, dim_data: int, len_rollouts: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]
Bases:
Embedding
Embedding for simulation-based inference with time series data which computes the same features of the rollouts’ states and actions as done in [1]
- [1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019
Constructor
- Parameters:
spec – environment specification
dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.
len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)
downsampling_factor – keep only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()
state_mask_labels – list or tuple of integers or strings to select specific states from their space; by default (None), all states are passed to sbi
act_mask_labels – list or tuple of integers or strings to select specific actions from their space; by default (None), all actions are passed to sbi
use_cuda – True to move the embedding to the GPU, False (default) to use the CPU
- property dim_output: int
Get the dimension of the embedding’s output, i.e. its feature dimension.
- name: str = 'asemb'
- requires_target_domain_data: bool = False
- summary_statistic(data: Tensor) Tensor [source]
- Return the full states of the rollout as a vector, i.e. the time steps and state dimensions are flattened into one dimension.
- Parameters:
data – states and actions of a rollout or segment to be transformed for inference
- Returns:
all states as a flattened vector
- class BayesSimEmbedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]
Bases:
Embedding
Embedding for simulation-based inference with time series data which computes the same features of the rollouts’ states and actions as done in [1]
- [1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019
Constructor
- Parameters:
spec – environment specification
dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.
downsampling_factor – keep only every downsampling_factor-th time series sample; no downsampling by default
state_mask_labels – list or tuple of integers or strings to select specific states from their space; by default (None), all states are passed to sbi
act_mask_labels – list or tuple of integers or strings to select specific actions from their space; by default (None), all actions are passed to sbi
use_cuda – True to move the embedding to the GPU, False (default) to use the CPU
- property dim_output: int
Get the dimension of the embedding’s output, i.e. its feature dimension.
- name: str = 'bsemb'
- requires_target_domain_data: bool = False
- summary_statistic(data: Tensor) Tensor [source]
Compute summary statistics based on the approach in [1], see eq. (22). This method guarantees an output of the same size for every trajectory.
- [1] F. Ramos, R.C. Possas, D. Fox, “BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators”, RSS, 2019
- Parameters:
data – states and actions of a rollout or segment to be transformed for inference
- Returns:
summary statistics of the rollout
- class DeltaStepsEmbedding(spec: EnvSpec, dim_data: int, len_rollouts: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]
Bases:
Embedding
Embedding for simulation-based inference with time series data which returns the change in the states between consecutive time steps of the rollouts
Constructor
- Parameters:
spec – environment specification
dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.
len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)
downsampling_factor – keep only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()
state_mask_labels – list or tuple of integers or strings to select specific states from their space; by default (None), all states are passed to sbi
act_mask_labels – list or tuple of integers or strings to select specific actions from their space; by default (None), all actions are passed to sbi
use_cuda – True to move the embedding to the GPU, False (default) to use the CPU
- property dim_output: int
Get the dimension of the embedding’s output, i.e. its feature dimension.
- name: str = 'dsemb'
- requires_target_domain_data: bool = False
- class DynamicTimeWarpingEmbedding(spec: EnvSpec, dim_data: int, step_pattern: Optional[Union[str, StepPattern]] = None, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]
Bases:
Embedding
Embedding for simulation-based inference with time series data which uses the dtw-python package to compute the Dynamic Time Warping (DTW) distance between the states as features of the data
Constructor
- Parameters:
spec – environment specification
dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.
step_pattern – method passed to dtw-python for computing the distance. The default is the same as in the dtw-python package (“symmetric2”). To use, for example, the Rabiner-Juang type VI-c unsmoothed recursion step pattern, pass dtw.stepPattern.rabinerJuangStepPattern(6, “c”)
downsampling_factor – keep only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()
state_mask_labels – list or tuple of integers or strings to select specific states from their space; by default (None), all states are passed to sbi
act_mask_labels – list or tuple of integers or strings to select specific actions from their space; by default (None), all actions are passed to sbi
use_cuda – True to move the embedding to the GPU, False (default) to use the CPU
- property dim_output: int
Get the dimension of the embedding’s output, i.e. its feature dimension.
- name: str = 'dtwemb'
- requires_target_domain_data: bool = True
- summary_statistic(data: Tensor) Tensor [source]
Return the dynamic time warping distance between the simulated rollouts’ and the real rollouts’ states.
Note
It is necessary to take the mean over all distances since the same function is used to compute the observations (for sbi) from the target domain rollouts. At that point there might be only one target domain rollout, thus the target domain rollouts are only compared with themselves, yielding a scalar distance value.
- Parameters:
data – data tensor containing the simulated states (1st part of the 1st half of the 1st dim) and the real states (1st part of the 2nd half of the 1st dim)
- Returns:
dynamic time warping distance in multi-dim state space
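For intuition, a minimal example of computing a DTW distance with the dtw-python package used by this class; this shows the underlying package call with its default step pattern, not this class's code:

```python
import numpy as np
from dtw import dtw  # dtw-python package

query = np.sin(np.linspace(0, 2 * np.pi, 100))     # e.g. simulated state trajectory
reference = np.sin(np.linspace(0, 2 * np.pi, 80))  # e.g. real state trajectory
alignment = dtw(query, reference, step_pattern="symmetric2")
print(alignment.distance)  # accumulated DTW distance between the two series
```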
- class Embedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]
Bases:
ABC
,Module
Base class for all embeddings used for simulation-based inference with time series data
Note
The features of each rollout are concatenated, and since the inference procedure requires a consistent size of the inputs, it is necessary that all rollouts yield the same number of features, i.e. have equal length!
Constructor
- Parameters:
spec – environment specification
dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.
downsampling_factor – keep only every downsampling_factor-th time series sample; no downsampling by default
state_mask_labels – list or tuple of integers or strings to select specific states from their space; by default (None), all states are passed to sbi
act_mask_labels – list or tuple of integers or strings to select specific actions from their space; by default (None), all actions are passed to sbi
use_cuda – True to move the embedding to the GPU, False (default) to use the CPU
- property device: str
Get the device (CPU or GPU) on which the embedding is stored.
- abstract property dim_output: int
Get the dimension of the embedding’s output, i.e. its feature dimension.
- forward(data: Tensor) Tensor [source]
Transform rollouts into the observations used for likelihood-free inference. Currently, a state representation as well as state-action summary statistics are available.
- Parameters:
data – packed data of shape [batch_size, num_rollouts, len_time_series, dim_data]
- Returns:
features of the data extracted by the embedding, of shape [batch_size, num_rollouts * dim_feat]
- forward_one_batch(data_batch: Tensor) Tensor [source]
Iterate over all rollouts and compute the features for each rollout separately, then average the features over the rollouts.
- Parameters:
data_batch – data batch of shape [num_rollouts, len_time_series, dim_data]
- Returns:
concatenation of the features for each rollout
- name: str
- static pack(data: Tensor) Tensor [source]
Reshape the data such that the shape is [batch_dim, num_rollouts, data_points_flattened].
- Parameters:
data – un-packed a.k.a. un-flattened data
- Returns:
packed a.k.a. flattened data
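In plain tensor terms, packing is a reshape; a sketch of the shape contract (not the method's source):

```python
import torch

# Un-packed data: [batch_size, num_rollouts, len_time_series, dim_data]
data = torch.randn(8, 2, 100, 4)
# Packed: [batch_dim, num_rollouts, data_points_flattened]
packed = data.reshape(8, 2, 100 * 4)
assert packed.shape == (8, 2, 400)
```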
- requires_target_domain_data: bool
- class LastStepEmbedding(spec: EnvSpec, dim_data: int, downsampling_factor: int = 1, state_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, act_mask_labels: Optional[Union[Tuple[Union[int, str]], List[Union[int, str]]]] = None, use_cuda: bool = False)[source]
Bases:
Embedding
Embedding for simulation-based inference with time series data which selects the last state of the rollouts
Constructor
- Parameters:
spec – environment specification
dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.
downsampling_factor – keep only every downsampling_factor-th time series sample; no downsampling by default
state_mask_labels – list or tuple of integers or strings to select specific states from their space; by default (None), all states are passed to sbi
act_mask_labels – list or tuple of integers or strings to select specific actions from their space; by default (None), all actions are passed to sbi
use_cuda – True to move the embedding to the GPU, False (default) to use the CPU
- property dim_output: int
Get the dimension of the embedding’s output, i.e. its feature dimension.
- name: str = 'lsemb'
- requires_target_domain_data: bool = False
- class RNNEmbedding(spec: ~pyrado.utils.data_types.EnvSpec, dim_data: int, hidden_size: int, num_recurrent_layers: int, output_size: int, recurrent_network_type: type = <class 'torch.nn.modules.rnn.RNN'>, only_last_output: bool = False, len_rollouts: ~typing.Optional[int] = None, output_nonlin: ~typing.Optional[~typing.Callable] = None, dropout: float = 0.0, init_param_kwargs: ~typing.Optional[dict] = None, downsampling_factor: int = 1, state_mask_labels: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Union[int, str]], ~typing.List[~typing.Union[int, str]]]] = None, act_mask_labels: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Union[int, str]], ~typing.List[~typing.Union[int, str]]]] = None, use_cuda: bool = False, **recurrent_net_kwargs)[source]
Bases:
Embedding
Embedding for simulation-based inference with time series data which uses a recurrent neural network, e.g. RNN, LSTM, or GRU, to compute features of the rollouts
Constructor
- Parameters:
spec – environment specification
dim_data – number of dimensions of one data sample, i.e. one time step. By default, this is the sum of the state and action spaces’ flat dimensions. This number is doubled if the embedding requires target domain data.
hidden_size – size of the hidden layers (all equal)
num_recurrent_layers – number of equally sized hidden layers
recurrent_network_type – PyTorch recurrent network class, e.g. nn.RNN, nn.LSTM, or nn.GRU
output_size – size of the features at every time step, which are eventually reshaped into a vector
only_last_output – if True, only the last output of the network is used as a feature for sbi, else there will be an output every downsampling_factor time steps. Moreover, if True the constructor does not need to know how long the rollouts are.
len_rollouts – number of time steps per rollout without considering a potential downsampling later (must be the same for all rollouts)
output_nonlin – nonlinearity for output layer
dropout – dropout probability, default = 0 deactivates dropout
init_param_kwargs – additional keyword arguments for the policy parameter initialization
recurrent_net_kwargs – any extra kwargs are passed to the recurrent net’s constructor
downsampling_factor – keep only every downsampling_factor-th time series sample; the downsampling is done in the base class before calling summary_statistic()
state_mask_labels – list or tuple of integers or strings to select specific states from their space; by default (None), all states are passed to sbi
act_mask_labels – list or tuple of integers or strings to select specific actions from their space; by default (None), all actions are passed to sbi
use_cuda – True to move the embedding to the GPU, False (default) to use the CPU
- property dim_output: int
Get the dimension of the embedding’s output, i.e. its feature dimension.
- name: str = 'rnnemb'
- requires_target_domain_data: bool = False
sbi_rollout_sampler
- class RealRolloutSamplerForSBI(env: Env, policy: Policy, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True)[source]
Bases:
RolloutSamplerForSBI
,Serializable
Wrapper to make SimuRLacra’s real environments similar to the sbi simulator
Constructor
- Parameters:
env – environment in which the policy operates; in sim-to-real settings this is a real-world device, i.e. RealEnv, but in a sim-to-sim experiment this can be a (randomized) SimEnv
policy – policy used for sampling the rollout
embedding – embedding used for pre-processing the data before (later) passing it to the posterior
num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).
- class RecRolloutSamplerForSBI(rollouts_dir: str, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, rand_init_rollout: bool = True)[source]
Bases:
RealRolloutSamplerForSBI
,Serializable
Wrapper to yield pre-recorded rollouts similar to the sbi simulator
Constructor
- Parameters:
rollouts_dir – directory where the pre-recorded rollouts are found
num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
embedding – embedding used for pre-processing the data before (later) passing it to the posterior
len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
rand_init_rollout – if True, choose the first rollout at random, and then cycle through the list
- property num_rollouts: int
Get the number of stored rollouts.
- property ring_idx: int
Get the buffer’s index.
- class RolloutSamplerForSBI(env: Env, policy: Policy, embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True)[source]
Bases:
ABC
,Serializable
Wrapper that enables the sbi simulator instance to run rollouts in SimuRLacra environments as if the environment were a callable that only needs the simulator parameters as inputs
Note
The features of each rollout are concatenated, and since the inference procedure requires a consistent size of the inputs, it is necessary that all rollouts yield the same number of features, i.e. have equal length!
Constructor
- Parameters:
env – environment in which the policy operates; in sim-to-real settings this is a real-world device, but in a sim-to-sim experiment this can be a (randomized) SimEnv. We strip all domain randomization wrappers from this env since we want to randomize it manually here.
policy – policy used for sampling the rollout
embedding – embedding used for pre-processing the data before (later) passing it to the posterior
num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).
- class SimRolloutSamplerForSBI(env: Union[SimEnv, EnvWrapper], policy: Policy, dp_mapping: Mapping[int, str], embedding: Embedding, num_segments: Optional[int] = None, len_segments: Optional[int] = None, stop_on_done: bool = True, rollouts_real: Optional[List[StepSequence]] = None, use_rec_act: bool = True)[source]
Bases:
RolloutSamplerForSBI
,Serializable
Wrapper to make SimuRLacra’s simulation environments usable as simulators for the sbi package
Constructor
- Parameters:
env – environment in which the policy operates; it must not be a randomized environment, since we want to randomize it manually via the domain parameters coming from the sbi package
policy – policy used for sampling the rollout
dp_mapping – mapping from subsequent integers (starting at 0) to domain parameter names (e.g. mass)
embedding – embedding used for pre-processing the data before (later) passing it to the posterior
num_segments – number of segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
len_segments – length of the segments into which the rollouts are split. For every segment, the initial state of the simulation is reset, and thus the features of the trajectories are computed separately for every segment. Either specify num_segments or len_segments.
stop_on_done – if True, the rollouts are stopped as soon as they hit the state or observation space boundaries. This behavior is safe but can lead to short trajectories which are eventually padded with zeros. Choose False to ignore the boundaries (dangerous on the real system).
rollouts_real – list of rollouts recorded from the target domain, which are used to sync the simulations’ initial states
use_rec_act – if True, the recorded actions from the target domain are used to generate the rollout during simulation (feed-forward). If False, the policy is used to generate (potentially) state-dependent actions (feed-back).
- check_domain_params(rollouts: Union[List[StepSequence], StepSequence], domain_param_value: ndarray, domain_param_names: Union[List[str], ValuesView])[source]
Verify if the domain parameters in the rollout are actually the ones commanded.
- Parameters:
rollouts – simulated rollouts or rollout segments
domain_param_value – one set of domain parameters as commanded
domain_param_names – names of the domain parameters to set, i.e. values of the domain parameter mapping
sequences
- sequence_add_init(x_init, iter, dtype=<class 'int'>)[source]
Mathematical sequence: x_n = x_0 * n
- Parameters:
x_init – initial values of the sequence
iter – iteration until the sequence should be evaluated
dtype – data type to cast to (either int or float)
- Returns:
element at the given iteration and array of the whole sequence
- sequence_const(x_init, iter, dtype=<class 'int'>)[source]
Mathematical sequence: x_n = x_0
- Parameters:
x_init – constant values of the sequence
iter – iteration until the sequence should be evaluated
dtype – data type to cast to (either int or float)
- Returns:
element at the given iteration and array of the whole sequence
- sequence_nlog2(x_init, iter, dtype=<class 'int'>)[source]
Mathematical sequence: x_n = x_0 * n * log2(n+2), with log2 being the base 2 logarithm
- Parameters:
x_init – initial values of the sequence
iter – iteration until the sequence should be evaluated
dtype – data type to cast to (either int or float)
- Returns:
element at the given iteration and array of the whole sequence
- sequence_plus_one(x_init, iter, dtype=<class 'int'>)[source]
Mathematical sequence: x_n = x_0 + n
- Parameters:
x_init – initial values of the sequence
iter – iteration until the sequence should be evaluated
dtype – data type to cast to (either int or float)
- Returns:
element at the given iteration and array of the whole sequence
- sequence_rec_double(x_init, iter, dtype=<class 'int'>)[source]
Mathematical sequence: x_n = x_{n-1} * 2
- Parameters:
x_init – initial values of the sequence
iter – iteration until the sequence should be evaluated
dtype – data type to cast to (either int or float)
- Returns:
element at the given iteration and array of the whole sequence
- sequence_rec_sqrt(x_init, iter, dtype=<class 'int'>)[source]
Mathematical sequence: x_n = x_{n-1} * sqrt(n)
- Parameters:
x_init – initial values of the sequence
iter – iteration until the sequence should be evaluated
dtype – data type to cast to (either int or float)
- Returns:
element at the given iteration and array of the whole sequence
- sequence_sqrt(x_init, iter, dtype=<class 'int'>)[source]
Mathematical sequence: x_n = x_0 * sqrt(n)
- Parameters:
x_init – initial values of the sequence
iter – iteration until the sequence should be evaluated
dtype – data type to cast to (either int or float)
- Returns:
element at the given iteration and array of the whole sequence
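All sequence functions share one contract: they return the element at the given iteration together with an array of the whole sequence up to that point. A sketch of that pattern for sequence_plus_one (x_n = x_0 + n); an illustration, not the library's code:

```python
import numpy as np

def sequence_plus_one_sketch(x_init, iter, dtype=int):
    # x_n = x_0 + n, evaluated for n = 0, ..., iter
    seq = np.array([np.asarray(x_init) + n for n in range(iter + 1)], dtype=dtype)
    return seq[iter], seq

elem, seq = sequence_plus_one_sketch(2, 3)
print(elem, seq)  # 5 [2 3 4 5]
```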
step_sequence
- class DictIndexProxy(obj: dict, index: int, path: Optional[str] = None)[source]
Bases:
object
Views a slice through a dict of lists or tensors.
- class Step(rollout, index)[source]
Bases:
DictIndexProxy
A single step in a rollout.
This object is a proxy, referring to a specific index in the rollout. When querying an attribute from the step, it will try to return the corresponding slice from the rollout. Additionally, one can prefix attributes with next_ to access the value for the next step, e.g. next_observations is the observation made at the start of the next step.
Constructor
- Parameters:
rollout – StepSequence object to which this step belongs
index – index of this step in the rollout
- class StepSequence(*, complete: bool = True, rollout_info=None, data_format: Optional[str] = None, done: Optional[ndarray] = None, continuous: bool = True, rollout_bounds=None, rewards: Sequence, observations: Sequence, actions: Sequence, **data)[source]
Bases:
Sequence
[Step
]A sequence of steps.
During the rollout, the values of different variables are recorded. This class provides efficient storage and access for these values. The constructor accepts a list of step entries for each variable. For every step, the list should contain a Tensor/ndarray of values for that step. The shape of these tensors must be the same for all step entries. The passed tensors are then stacked, so that the first dimension is the step count. Some values, like the observations, can have one more element than there are steps, to encode the state after the last step. Additionally, the step entries may be dicts to support keyed storage. A list of dicts is converted to a dict of lists, each of which will be regularly stacked. Apart from the variable-based view, the rollout can also be seen as a sequence of steps. Each Step object is a proxy; its attributes refer to the respective slice of the corresponding variable. The only required result variables are rewards, observations, and actions. All other variables are optional. Common optional ones are states and rollout_info.
Note
Storing PyTorch tensors with gradient tracking is NOT supported. The rationale behind this is eager error avoidance. The only reason you would add them is to profit from the optimized slicing, but using that with gradient tracking risks lingering incomplete graphs.
Constructor
- Parameters:
complete – False if the rollout is incomplete, i.e. as part of a mini-batch
rollout_info – data staying constant through the whole episode
data_format – ‘torch’ to use Tensors, ‘numpy’ to use ndarrays. Will use Tensors if any data argument does, else ndarrays
done – boolean ndarray, specifying for each step whether it led to termination. The last step of continuous rollouts, i.e. not mini-batches, is done if complete is True.
continuous – True if the steps form one continuous sequence.
rewards – sequence of reward values, determines sequence length
observations – sequence of observation values, the length must be len(rewards) + 1
actions – sequence of action values, the length must be len(rewards)
data – additional data lists, their length must be len(rewards) or len(rewards) + 1
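A minimal construction sketch illustrating the required variables and their lengths (the import path matches the one shown in process_data's signature below):

```python
import numpy as np

from pyrado.sampling.step_sequence import StepSequence

ro = StepSequence(
    rewards=[0.0, 1.0],  # determines the sequence length: 2 steps
    observations=[np.zeros(2), np.ones(2), 2 * np.ones(2)],  # len(rewards) + 1 entries
    actions=[np.zeros(1), np.ones(1)],  # len(rewards) entries
)
print(ro.length)  # 2
print(ro.data_format)  # 'numpy', since all data arguments are ndarrays
```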
- actions: Union[ndarray, Tensor]
- add_data(name: str, value=None, item_shape: Optional[tuple] = None, with_after_last: bool = False)[source]
Add a new data field to the step sequence.
- Parameters:
name – string for the name
value – the data
item_shape – shape to store the data in
with_after_last – True if there is one more element than the length (e.g. last observation)
- classmethod concat(parts: Sequence[StepSequence], data_format: Optional[str] = None, truncate_last: bool = True)[source]
Concatenate multiple step sequences into one, truncating the last observation.
- Parameters:
parts – batch of sequences to concatenate
data_format – torch to use Tensors, numpy to use ndarrays, None to choose automatically
truncate_last – remove the last step from each part, highly recommended to be True
- Returns:
concatenated sequence of Steps
- convert(data_format: str, data_type=None)[source]
Convert data to specified format.
- Parameters:
data_format – torch to use Tensors, numpy to use ndarrays
data_type – optional torch/numpy dtype for data. When None is passed, the data type is left unchanged.
- property data_format: str
Get the name of data format (‘torch’ or ‘numpy’).
- property data_names: Sequence[str]
Get the list of data attribute names.
- discounted_return(gamma: float) Union[Tensor, ndarray] [source]
Compute the discounted return.
- Parameters:
gamma – temporal discount factor
- Returns:
exponentially weighted sum of rewards
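The discounted return is the exponentially weighted sum \(\sum_t \gamma^t r_t\); a quick numpy check of the definition:

```python
import numpy as np

rewards = np.array([1.0, 1.0, 1.0])
gamma = 0.9
disc_ret = np.sum(gamma ** np.arange(len(rewards)) * rewards)
print(disc_ret)  # 1 + 0.9 + 0.81 = 2.71
```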
- classmethod from_pandas(df: DataFrame, env_spec: EnvSpec, continuous: bool = True, task: Optional[Task] = None) StepSequence [source]
Generate a StepSequence object from a Pandas DataFrame instance. Not all data fields are supported. The field ‘rewards’ is mandatory.
- Parameters:
df – Pandas DataFrame holding the data in 1-dim arrays
env_spec – environment specification whose labels are used to slice the DataFrame
continuous – True if the rollout to be reconstructed was continuous
task – task containing the reward function(s) that can be used to recompute the rewards from the recorded observations and actions
- Returns:
new StepSequence
- get_data_values(name: str, truncate_last: bool = False)[source]
Return the data tensor stored under the given name.
- Parameters:
name – data name
truncate_last – True to truncate the length+1 entry if present
- get_rollout(index)[source]
Get an indexed sub-rollout.
- Parameters:
index – generic index of sub-rollout, negative values, slices and iterables are allowed
- Returns:
selected subset.
- property length: int
Get the length of the rollout (does not include the final step).
- numpy(data_type=None)[source]
Convert data to numpy ndarrays.
- Parameters:
data_type – type to return data in. When None is passed, the data type is left unchanged.
- observations: Union[ndarray, Tensor]
- classmethod pad(rollout: StepSequence, len_to_pad_to: int, pad_value: Union[int, float] = 0)[source]
Add steps to the end of a given rollout. The entries of the steps are filled with pad_value. So far, only numpy arrays and PyTorch tensors are padded (see data_format).
- Parameters:
rollout – rollout to be padded, modified in-place
len_to_pad_to – length of the resulting rollout (without the final state)
pad_value – scalar value to pad with
- classmethod process_data(rollout: ~pyrado.sampling.step_sequence.StepSequence, fcn: ~typing.Callable, fcn_arg_name: str, fcn_arg_types: ~typing.Union[type, ~typing.Tuple[type]] = <class 'numpy.ndarray'>, include_fields: ~typing.Optional[~typing.Sequence[str]] = None, exclude_fields: ~typing.Optional[~typing.Sequence[str]] = None, **process_fcn_kwargs)[source]
Process all data fields of a rollout using an arbitrary function. Optionally, some fields can be excluded.
- Parameters:
rollout – StepSequence holding the data
fcn – function (of one remaining input) used to manipulate the data fields, e.g. scipy.signal.filtfilt()
fcn_arg_name – string naming the remaining input of fcn(), e.g. x for scipy.signal.filtfilt()
fcn_arg_types – type or tuple thereof which are expected as input to fcn()
include_fields – list of field names to include for processing, pass None to include all fields. If specified, only fields from this selection will be considered
exclude_fields – list of field names to exclude from processing, pass None to not exclude anything
process_fcn_kwargs – keyword arguments forwarded to process_fcn()
- Returns:
new StepSequence instance with processed data
- required_fields = {}
- rewards: Union[ndarray, Tensor]
- property rollout_bounds: ndarray
- property rollout_count
Count the number of sub-rollouts inside this step sequence.
- property rollout_lengths
Lengths of sub-rollouts.
- sample_w_next(batch_size: int) tuple [source]
Sample a random batch of steps together with the associated next steps. Similar to split_shuffled_batches() with complete_rollouts=False.
- Parameters:
batch_size – number of steps to sample
- Returns:
randomly sampled batch of steps
- split_ordered_batches(batch_size: Optional[int] = None, num_batches: Optional[int] = None)[source]
Batch generation. Split the step collection into ordered mini-batches of size batch_size.
- Parameters:
batch_size – number of steps per batch, i.e. variable number of batches
num_batches – number of batches to split the rollout in, i.e. variable batch size
Note
The option to return complete rollouts, as in split_shuffled_batches(), was left out here.
- split_shuffled_batches(batch_size: int, complete_rollouts: bool = False)[source]
Batch generation. Split the step collection into random mini-batches of size batch_size.
- Parameters:
batch_size – number of steps per batch
complete_rollouts – if complete_rollouts = True, the batches will not contain partial rollouts. However, the size of the returned batches cannot be strictly maintained in this case.
Note
This method is also supposed to be called for recurrent networks, which have a different evaluate() method that recognizes where the rollouts end within a batch.
- check_act_equal(rollout_1: Union[StepSequence, List[StepSequence]], rollout_2: Union[StepSequence, List[StepSequence]], check_applied: bool = False)[source]
Check if the actions of two rollouts, or of pairwise rollouts in two lists, are approximately the same
- Parameters:
rollout_1 – rollouts or list of rollouts
rollout_2 – rollouts or list of rollouts
check_applied – if True check the actions applied to the environment instead of the commanded ones
- Returns:
True if the actions match
- discounted_reverse_cumsum(data, gamma: float)[source]
Use a linear filter to compute the reverse discounted cumulative sum.
Note
scipy.signal.lfilter assumes an initialization with 0 by default.
- Parameters:
data – input data with samples along the 0 axis (e.g. time series)
gamma – discount factor
- Returns:
cumulative sums for every step
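The note above refers to the standard linear-filter identity: y[t] = data[t] + gamma * y[t+1] can be computed by filtering the reversed data with scipy.signal.lfilter. A sketch of that trick (an illustration, not the library's code):

```python
import numpy as np
from scipy import signal

def discounted_reverse_cumsum_sketch(data: np.ndarray, gamma: float) -> np.ndarray:
    # y[t] = data[t] + gamma * y[t+1], realized as an IIR filter on the reversed data
    return signal.lfilter([1.0], [1.0, -gamma], data[::-1])[::-1]

rews = np.array([1.0, 2.0, 3.0])
print(discounted_reverse_cumsum_sketch(rews, gamma=0.5))  # [2.75, 3.5, 3.0]
```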
- discounted_value(rollout: StepSequence, gamma: float)[source]
Compute the discounted state values for one rollout.
- Parameters:
rollout – input data
gamma – temporal discount factor
- Returns:
state values for every time step in the rollout
- discounted_values(rollouts: Sequence[StepSequence], gamma: float, data_format: Optional[str] = 'torch')[source]
Compute the discounted state values for multiple rollouts.
- Parameters:
rollouts – input data
gamma – temporal discount factor
data_format – data format of the resulting values, ‘torch’ (default) or ‘numpy’
- Returns:
state values for every time step in the rollouts (concatenated sequence across rollouts)
- gae_returns(rollout: StepSequence, gamma: float = 0.99, lamb: float = 0.95)[source]
Compute returns using generalized advantage estimation.
See also
[1] J. Schulman, P. Moritz, S. Levine, M. Jordan, P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation”, ICLR 2016
- Parameters:
rollout – sequence of steps
gamma – temporal discount factor
lamb – trade-off factor of the generalized advantage estimation, regulating bias vs. variance
- Returns:
estimated advantage
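A minimal numpy sketch of the estimator from [1]: TD residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) are accumulated backwards with factor gamma * lamb. An illustration, not the library's exact code:

```python
import numpy as np

def gae_sketch(rewards: np.ndarray, values: np.ndarray, gamma: float = 0.99, lamb: float = 0.95) -> np.ndarray:
    # values must hold V(s_0), ..., V(s_T), i.e. one more entry than rewards
    deltas = rewards + gamma * values[1:] - values[:-1]  # TD residuals
    adv = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lamb * running  # reverse discounted cumsum
        adv[t] = running
    return adv
```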
utils
- gen_ordered_batch_idcs(batch_size: int, data_size: int, sorted: bool = True)[source]
Helper function for doing SGD on mini-batches that returns the indices for the mini-batch samples
- Parameters:
batch_size – number of samples in each mini-batch
data_size – total number of samples
sorted – if False, the order of batches is randomized (but the order within them is preserved)
- Returns:
generator for lists of random indices of sub-samples
- Usage:
If batch_size = 2, data_size = 5, and sorted = False, the output might be ((2, 3), (0, 1), (4,)). If batch_size = 2, data_size = 5, and sorted = True, the output will be ((0, 1), (2, 3), (4,)).
- gen_ordered_batches(inp: Iterable, batch_size: int)[source]
Helper function that cuts the input into equally sized chunks
- Parameters:
inp – iterable input
batch_size – number of samples in each mini-batch
- Returns:
iterator over the input
- gen_shuffled_batch_idcs(batch_size: int, data_size: int)[source]
Helper function for doing SGD on mini-batches that returns the indices for the mini-batch samples
- Parameters:
batch_size – number of samples in each mini-batch
data_size – total number of samples
- Returns:
generator for lists of random indices of sub-samples
- Usage:
If batch_size = 2 and data_size = 5, the output might be ((0, 3), (2, 1), (4,)).
- shuffled_ordered_batches(inp: Iterable, batch_size: int)[source]
Helper function that cuts the input into equally sized chunks that keep the original internal ordering, but shuffles the order among the chunks
- Parameters:
inp – iterable input
batch_size – number of samples in each mini-batch
- Returns:
list of randomly ordered mini-batches which within themselves have the original ordering
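A sketch of the described behavior: cut in order, then shuffle the chunk order (an illustration, not the library's code):

```python
import random

def shuffled_ordered_batches_sketch(inp, batch_size: int, seed: int = 0):
    items = list(inp)
    # Cut into ordered chunks, preserving the original ordering within each chunk
    chunks = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    random.Random(seed).shuffle(chunks)  # shuffle only the order among the chunks
    return chunks

print(shuffled_ordered_batches_sketch(range(5), 2))  # e.g. [[4], [0, 1], [2, 3]]
```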