stopping_criteria
predefined_criteria
- class AlwaysStopStoppingCriterion[source]
Bases:
StoppingCriterion
Stopping criterion that is always met.
- class CustomStoppingCriterion(criterion_fn: Callable[[Any], bool], name: Optional[str] = None)[source]
Bases:
StoppingCriterion
Custom stopping criterion that takes an arbitrary callable to evaluate.
Constructor.
- Parameters:
criterion_fn – callable with signature [Algorithm] -> bool; gets evaluated when is_met is called; allows for custom functionality, e.g. if an algorithm requires special treatment; the algorithm passed to the callable is the same one that was passed to the is_met method
name – name of the stopping criterion, used for str(..) and repr(..)
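For illustration, a minimal usage sketch (the import path follows the module names in this section, and the algorithm's curr_iter attribute is an assumption, not part of this excerpt):

```python
from pyrado.algorithms.stopping_criteria.predefined_criteria import CustomStoppingCriterion

# Hypothetical criterion: stop once the (assumed) curr_iter attribute reaches 10.
stop_after_ten = CustomStoppingCriterion(
    criterion_fn=lambda algo: algo.curr_iter >= 10,
    name="StopAfterTenIterations",
)

# Inside the training loop, the algorithm instance is passed to is_met:
# if stop_after_ten.is_met(algo):
#     break
```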
- class IterCountStoppingCriterion(max_iter: int)[source]
Bases:
StoppingCriterion
Uses the iteration number as a stopping criterion, i.e. sets a maximum number of iterations.
Constructor.
- Parameters:
max_iter – maximum number of iterations
- class NeverStopStoppingCriterion[source]
Bases:
StoppingCriterion
Stopping criterion that is never met.
- class SampleCountStoppingCriterion(max_sample_count: int)[source]
Bases:
StoppingCriterion
Uses the sample count of the sampler as a stopping criterion, i.e. sets a maximum number of samples.
Constructor.
- Parameters:
max_sample_count – maximum sample count
- class ToggleableStoppingCriterion(met: bool = False)[source]
Bases:
StoppingCriterion
Stopping criterion that can be turned on/off from the outside.
Constructor.
- Parameters:
met – initial value returned by is_met
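A brief sketch of toggling the criterion from outside the algorithm; the on()/off() method names are an assumption and are not documented in this excerpt:

```python
from pyrado.algorithms.stopping_criteria.predefined_criteria import ToggleableStoppingCriterion

criterion = ToggleableStoppingCriterion(met=False)  # initially not met

# Assumed API: switch the criterion externally, e.g. from a monitoring callback.
criterion.on()   # subsequent is_met(algo) calls return True
criterion.off()  # back to not met
```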
rollout_based_criteria
- class ConvergenceStoppingCriterion(convergence_probability_threshold: float = 0.99, num_iter: Optional[int] = None, return_statistic: ReturnStatistic = ReturnStatistic.median, num_lookbacks: int = 1)[source]
Bases:
ReturnStatisticBasedStoppingCriterion
Checks for convergence of the returns for a given statistic that can be specified in the constructor. This is done by fitting a linear regression model to all the previous statistics (stored in a list) and performing a Wald test with a t-distribution of the test statistic (with the null hypothesis that the slope is zero). The resulting p-value is called the probability of convergence and is used for checking if the algorithm has converged.
This procedure can intuitively be explained as measuring “how flat the returns are” in the presence of noise. Compared to simply checking how much the return changes, it has the advantage of being independent of the noise on the returns, i.e. no specific threshold has to be hand-tuned.
This criterion has two modes: moving and cumulative. In the moving mode, only the latest M values are used for fitting the linear model, and in the first M - 1 iterations the criterion is treated as not being met. In the cumulative mode, all the previous values are used and only the first iteration is treated as not being met, as at least two points are needed to fit a linear model. While the former is primarily useful for checking convergence of a regular algorithm, the latter is primarily useful for checking convergence of the subroutine in a meta-algorithm, where convergence can already set in very early in the learning process because the environment did not change much (see for example SPDR).
It might be helpful to use this stopping criterion in conjunction with an iteration criterion (IterCountStoppingCriterion) to ensure that the algorithm does not terminate prematurely due to initialization issues. For example, PPO usually takes some iterations to make progress, which leads to a flat learning curve that does not, however, correspond to the algorithm having converged. A conceptual sketch of the underlying convergence check is given after this entry.
Constructor.
- Parameters:
convergence_probability_threshold – threshold of the p-value above which the algorithm is considered to be converged; defaults to 0.99, i.e. a 99% certainty that the returns can be explained by a flat (zero-slope) trend
num_iter – number of iterations to use for the moving mode. If None, the cumulative mode is used
return_statistic – statistic to compute; defaults to median
num_lookbacks – over how many iterations the statistic should be computed; for example, a value of two means that the rollouts of both the current and the previous iteration will be used for computing the statistic; defaults to one
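To make the Wald-test procedure above concrete, here is a conceptual sketch of the convergence probability, not the class's actual implementation; scipy.stats.linregress reports exactly the p-value of a zero-slope Wald test with a t-distributed test statistic:

```python
import numpy as np
from scipy import stats

def convergence_probability(return_statistics) -> float:
    """P-value of the zero-slope test, i.e. the probability that the return curve is flat."""
    if len(return_statistics) < 2:
        return 0.0  # at least two points are needed to fit a linear model
    iters = np.arange(len(return_statistics))
    result = stats.linregress(iters, np.asarray(return_statistics, dtype=float))
    # A high p-value means the slope is indistinguishable from zero, i.e. the
    # returns are flat up to noise and the algorithm is considered converged.
    return result.pvalue

# The criterion is met once this probability exceeds the threshold (default 0.99).
converged = convergence_probability([10.1, 10.3, 9.8, 10.0, 10.2]) >= 0.99
```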
- class MinReturnStoppingCriterion(return_threshold: float, return_statistic: ReturnStatistic = ReturnStatistic.min)[source]
Bases:
ReturnStatisticBasedStoppingCriterion
Uses any statistic (defaulting to min) of the return of the latest rollout as a stopping criterion and stops if this statistic exceeds a certain threshold.
Constructor.
- Parameters:
return_threshold – return threshold; if the return statistic reaches this threshold, the stopping criterion is met
return_statistic – statistic to compute; defaults to minimum
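A short construction sketch (the import path follows the module names in this section):

```python
from pyrado.algorithms.stopping_criteria.rollout_based_criteria import (
    MinReturnStoppingCriterion,
    ReturnStatistic,
)

# Stop once the mean return over the latest iteration's rollouts reaches 500.
criterion = MinReturnStoppingCriterion(return_threshold=500.0, return_statistic=ReturnStatistic.mean)
```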
- class ReturnStatistic(value)[source]
Bases:
Enum
All the different return statistics supported by ReturnStatisticBasedStoppingCriterion.
- max = 1
- mean = 3
- median = 2
- min = 0
- variance = 4
- class ReturnStatisticBasedStoppingCriterion(return_statistic: ReturnStatistic = ReturnStatistic.median, num_lookbacks: int = 1)[source]
Bases:
RolloutBasedStoppingCriterion
Abstract extension of the base RolloutBasedStoppingCriterion class for criteria that are based on a specific statistic of the returns of rollouts of the last iteration.
Constructor.
- Parameters:
return_statistic – statistic to compute; defaults to median
num_lookbacks – over how many iterations the statistic should be computed; for example, a value of two means that the rollouts of both the current and the previous iteration will be used for computing the statistic; defaults to one
- class RolloutBasedStoppingCriterion[source]
Bases:
StoppingCriterion
Abstract extension of the base StoppingCriterion class for criteria that are based on having access to rollouts.
Note
Requires the algorithm to expose a RolloutSavingWrapper via a property sampler.
- is_met(algo) bool [source]
Gets the sampler from the algorithm, checks that it is a RolloutSavingWrapper, and forwards the check of the stopping criterion to _is_met_with_sampler(..).
- Parameters:
algo – instance of Algorithm that has to be evaluated
- Returns:
True if the criterion is met, and False otherwise
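A hedged sketch of a custom rollout-based criterion; the exact signature of _is_met_with_sampler(..) and the rollouts attribute of RolloutSavingWrapper are assumptions based on the forwarding described above:

```python
from pyrado.algorithms.stopping_criteria.rollout_based_criteria import RolloutBasedStoppingCriterion

class EnoughRolloutsStoppingCriterion(RolloutBasedStoppingCriterion):
    """Illustrative criterion: met once a given number of rollouts has been collected."""

    def __init__(self, min_num_rollouts: int):
        super().__init__()
        self.min_num_rollouts = min_num_rollouts

    def _is_met_with_sampler(self, algo, sampler) -> bool:
        # Assumed: the RolloutSavingWrapper stores the collected rollouts per
        # iteration in a `rollouts` attribute (a list of lists).
        num_rollouts = sum(len(iteration) for iteration in sampler.rollouts)
        return num_rollouts >= self.min_num_rollouts
```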
stopping_criterion
- class StoppingCriterion[source]
Bases:
ABC
Base class for the stopping criterion. A stopping criterion takes an algorithm (and hence its current state) and decides whether the algorithm should terminate. A common stopping criterion is e.g. reaching a set number of iterations.
- abstract is_met(algo) bool [source]
Checks whether the stopping criterion is met.
Note
Has to be overridden by subclasses.
- Parameters:
algo – instance of Algorithm that has to be evaluated
- Returns:
True if the criterion is met, and False otherwise
- reset() NoReturn [source]
Resets the internal state of this stopping criterion. Has to be called when the algorithm is reset. If suppress_next_reset was invoked right before invoking this method, the stopping criterion will not be reset.
Note
Do not override this method directly! Implement _reset instead so as not to bypass the suppression mechanism.
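A minimal subclassing sketch following the note above: is_met is overridden, while the internal state is cleared in _reset rather than in reset (the counting logic is purely illustrative):

```python
from pyrado.algorithms.stopping_criteria.stopping_criterion import StoppingCriterion

class PatienceStoppingCriterion(StoppingCriterion):
    """Illustrative criterion: met after is_met has been queried a fixed number of times."""

    def __init__(self, patience: int):
        super().__init__()
        self.patience = patience
        self._num_queries = 0

    def is_met(self, algo) -> bool:
        self._num_queries += 1
        return self._num_queries >= self.patience

    def _reset(self):
        # Called through reset(); overriding reset() directly would bypass the
        # suppress_next_reset mechanism.
        self._num_queries = 0
```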