stopping_criteria

predefined_criteria

class AlwaysStopStoppingCriterion

Bases: StoppingCriterion

Stopping criterion that is always met.

is_met(algo) → bool

Checks whether the stopping criterion is met. This criterion is always met, so this method always returns True.

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise

class CustomStoppingCriterion(criterion_fn: Callable[[Any], bool], name: Optional[str] = None)

Bases: StoppingCriterion

Custom stopping criterion that takes an arbitrary callable to evaluate.

Constructor.

Parameters:
  • criterion_fn – callable with signature [Algorithm] -> bool; gets evaluated whenever is_met is called and receives the same algorithm instance that was passed to is_met; allows for custom functionality, e.g. if an algorithm requires special treatment

  • name – name of the stopping criterion, used for str(..) and repr(..)
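As a minimal sketch of the delegation described above, the hypothetical class CustomCriterionSketch forwards is_met to the callable. CustomCriterionSketch, DummyAlgo, and its curr_iter attribute are illustrative assumptions and not part of this API; with the real class, the construction would follow the documented signature CustomStoppingCriterion(criterion_fn, name).

```python
from typing import Any, Callable, Optional


class CustomCriterionSketch:
    """Hypothetical stand-in for CustomStoppingCriterion that delegates is_met to a callable."""

    def __init__(self, criterion_fn: Callable[[Any], bool], name: Optional[str] = None) -> None:
        self._criterion_fn = criterion_fn
        self._name = name

    def is_met(self, algo: Any) -> bool:
        # The callable receives exactly the algorithm instance that was passed to is_met
        return bool(self._criterion_fn(algo))

    def __str__(self) -> str:
        return self._name if self._name is not None else type(self).__name__


class DummyAlgo:
    """Hypothetical algorithm exposing an iteration counter named curr_iter."""

    def __init__(self, curr_iter: int) -> None:
        self.curr_iter = curr_iter


# Stop as soon as the (hypothetical) iteration counter reaches 10
crit = CustomCriterionSketch(lambda algo: algo.curr_iter >= 10, name="iter>=10")
```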

is_met(algo) → bool

Checks whether the stopping criterion is met by evaluating criterion_fn on the given algorithm and returning its result.

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise

class IterCountStoppingCriterion(max_iter: int)

Bases: StoppingCriterion

Uses the iteration number as a stopping criterion, i.e. sets a maximum number of iterations.

Constructor.

Parameters:

max_iter – maximum number of iterations

is_met(algo) → bool

Checks whether the stopping criterion is met, i.e. whether the algorithm has reached the maximum number of iterations max_iter.

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise

class NeverStopStoppingCriterion

Bases: StoppingCriterion

Stopping criterion that is never met.

is_met(algo) → bool

Checks whether the stopping criterion is met. This criterion is never met, so this method always returns False.

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise

class SampleCountStoppingCriterion(max_sample_count: int)

Bases: StoppingCriterion

Uses the sample count as a stopping criterion, i.e. sets a maximum number of samples.

Constructor.

Parameters:

max_sample_count – maximum sample count

is_met(algo) → bool

Checks whether the stopping criterion is met, i.e. whether the algorithm's sample count has reached max_sample_count.

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise

class ToggleableStoppingCriterion(met: bool = False)

Bases: StoppingCriterion

Stopping criterion that can be turned on/off from the outside.

Constructor.

Parameters:

met – initial return value of is_met

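is_met(algo=None) → bool

Checks whether the stopping criterion is met by returning its current state, which is initialized via met and can be changed from the outside via on(), off(), and toggle(). Passing an algorithm is optional.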

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise

off() → NoReturn
on() → NoReturn
toggle() → bool
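The on/off behavior can be sketched with a hypothetical stand-in class. It assumes that on() and off() set the state to True and False, respectively, and that toggle() flips the state and returns the new value; the documentation above does not state what toggle() returns.

```python
class ToggleableCriterionSketch:
    """Hypothetical stand-in for ToggleableStoppingCriterion."""

    def __init__(self, met: bool = False) -> None:
        self._met = met  # initial return value of is_met

    def is_met(self, algo=None) -> bool:
        # No algorithm is needed: the state is controlled from the outside
        return self._met

    def on(self) -> None:
        self._met = True

    def off(self) -> None:
        self._met = False

    def toggle(self) -> bool:
        # Assumption: flip the state and return the new value
        self._met = not self._met
        return self._met


crit = ToggleableCriterionSketch()
```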

rollout_based_criteria

class ConvergenceStoppingCriterion(convergence_probability_threshold: float = 0.99, num_iter: Optional[int] = None, return_statistic: ReturnStatistic = ReturnStatistic.median, num_lookbacks: int = 1)

Bases: ReturnStatisticBasedStoppingCriterion

Checks for convergence of the returns for a given statistic that can be specified in the constructor. This is done by fitting a linear regression model to all the previous statistics (stored in a list) and performing a Wald test with a t-distribution of the test statistic (with the null hypothesis that the slope is zero). The resulting p-value is called the probability of convergence and is used for checking if the algorithm has converged.

Intuitively, this procedure measures “how flat the returns are” in the presence of noise. Compared to simply checking how much the return changes between iterations, it has the advantage of being independent of the noise on the returns, i.e. no specific threshold has to be hand-tuned.
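The test above can be sketched with only the standard library: fit a least-squares line through the points (i, values[i]), form the Wald statistic t = slope / standard error, and compute the two-sided p-value from a t-distribution with n - 2 degrees of freedom via the regularized incomplete beta function. SciPy's scipy.stats.linregress reports the same zero-slope p-value as its pvalue attribute. All function names here are hypothetical, and unlike the criterion described above, this sketch needs at least three points because the t-test requires at least one residual degree of freedom.

```python
import math
from typing import Optional, Sequence

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even and odd coefficients of the m-th step
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction where it converges quickly, use symmetry otherwise
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p_value(t: float, dof: int) -> float:
    """P(|T| >= |t|) for Student's t-distribution with dof degrees of freedom."""
    return _betai(0.5 * dof, 0.5, dof / (dof + t * t))


def convergence_probability(values: Sequence[float]) -> Optional[float]:
    """p-value of the zero-slope hypothesis for a least-squares line through (i, values[i])."""
    n = len(values)
    if n < 3:
        return None  # the t-test needs at least one residual degree of freedom
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - intercept - slope * i) ** 2 for i, y in enumerate(values))
    dof = n - 2
    if ss_res == 0.0:
        return 1.0 if slope == 0.0 else 0.0  # perfect fit: flat line or exact trend
    std_err = math.sqrt(ss_res / dof / sxx)  # standard error of the slope estimate
    return t_two_sided_p_value(slope / std_err, dof)


def is_converged(values: Sequence[float], threshold: float = 0.99) -> bool:
    """Hypothetical check: converged if the p-value lies above the threshold."""
    p = convergence_probability(values)
    return p is not None and p > threshold
```

With the default threshold of 0.99, is_converged only reports convergence when the fitted slope is very small relative to its standard error.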

This criterion has two modes: moving and cumulative. In the moving mode, only the latest M values are used for fitting the linear model, and in the first M - 1 iterations the criterion is treated as not being met. In the cumulative mode, all previous values are used, and only the first iteration is treated as not being met since at least two points are required to fit a linear model. The moving mode is primarily useful for checking the convergence of a regular algorithm, whereas the cumulative mode is primarily useful for checking the convergence of a subroutine within a meta-algorithm (see, for example, SPDR), where convergence may set in very early in the learning process because the environment did not change much.
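The two modes differ only in which of the stored statistics enter the fit and when the criterion starts being evaluated. A minimal sketch, assuming the statistics are kept in a plain list named history and using the hypothetical helper name select_fit_window; num_iter plays the role of M.

```python
from typing import List, Optional, Sequence


def select_fit_window(history: Sequence[float], num_iter: Optional[int]) -> Optional[List[float]]:
    """Return the statistics to fit the line to, or None if the criterion counts as not met."""
    if num_iter is None:
        # Cumulative mode: all values so far; a line needs at least two points
        return list(history) if len(history) >= 2 else None
    # Moving mode: only the latest num_iter values; not met during the first num_iter - 1 iterations
    if len(history) < num_iter:
        return None
    return list(history[-num_iter:])
```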

It might be helpful to use this stopping criterion in conjunction with an iteration-count criterion (IterCountStoppingCriterion) to ensure that the algorithm does not terminate prematurely due to initialization issues. For example, PPO usually takes a few iterations before it starts making progress, which produces an initially flat learning curve that does not indicate convergence.

Constructor.

Parameters:
  • convergence_probability_threshold – threshold of the p-value above which the algorithm is considered to be converged; defaults to 0.99, i.e. a 99% certainty that the data can be explained by a line with zero slope

  • num_iter – number of latest iterations (M) used for fitting in the moving mode; if None, the cumulative mode is used

  • return_statistic – statistic to compute; defaults to median

  • num_lookbacks – over how many iterations the statistic should be computed; for example, a value of two means that the rollouts of both the current and the previous iteration will be used for computing the statistic; defaults to one

class MinReturnStoppingCriterion(return_threshold: float, return_statistic: ReturnStatistic = ReturnStatistic.min)

Bases: ReturnStatisticBasedStoppingCriterion

Uses any statistic (defaulting to min) of the returns of the rollouts from the latest iteration as a stopping criterion and stops if this statistic exceeds a certain threshold.

Constructor.

Parameters:
  • return_threshold – return threshold; if the return statistic reaches this threshold, the stopping criterion is met

  • return_statistic – statistic to compute; defaults to minimum
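A sketch of how each ReturnStatistic member could map to a function, together with the threshold check of this criterion. ReturnStatisticSketch only mirrors the member values of ReturnStatistic (min = 0, max = 1, median = 2, mean = 3, variance = 4); the mapping, the use of the population variance for variance, and the interpretation of “reaches the threshold” as >= are assumptions.

```python
import statistics
from enum import Enum
from typing import Callable, Dict, Sequence


class ReturnStatisticSketch(Enum):
    """Mirrors the members and values of ReturnStatistic."""

    min = 0
    max = 1
    median = 2
    mean = 3
    variance = 4


# Assumption: "variance" is the population variance (division by n)
STAT_FNS: Dict[ReturnStatisticSketch, Callable[[Sequence[float]], float]] = {
    ReturnStatisticSketch.min: min,
    ReturnStatisticSketch.max: max,
    ReturnStatisticSketch.median: statistics.median,
    ReturnStatisticSketch.mean: statistics.mean,
    ReturnStatisticSketch.variance: statistics.pvariance,
}


def min_return_met(
    returns: Sequence[float],
    return_threshold: float,
    stat: ReturnStatisticSketch = ReturnStatisticSketch.min,
) -> bool:
    """Hypothetical check: is the chosen statistic of the latest returns at or above the threshold?"""
    # Assumption: "reaches the threshold" is interpreted as >=
    return STAT_FNS[stat](returns) >= return_threshold
```

With the default statistic, every rollout of the latest iteration has to reach the threshold, which makes this a conservative check.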

class ReturnStatistic(value)

Bases: Enum

All the different return statistics supported by ReturnStatisticBasedStoppingCriterion.

min = 0
max = 1
median = 2
mean = 3
variance = 4

class ReturnStatisticBasedStoppingCriterion(return_statistic: ReturnStatistic = ReturnStatistic.median, num_lookbacks: int = 1)

Bases: RolloutBasedStoppingCriterion

Abstract extension of the base RolloutBasedStoppingCriterion class for criteria that are based on a specific statistic of the returns of the rollouts from the last num_lookbacks iterations.

Constructor.

Parameters:
  • return_statistic – statistic to compute; defaults to median

  • num_lookbacks – over how many iterations the statistic should be computed; for example, a value of two means that the rollouts of both the current and the previous iteration will be used for computing the statistic; defaults to one
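A sketch of the num_lookbacks behavior, assuming the returns of all rollouts from the latest num_lookbacks iterations are pooled before the statistic (here the median, the default) is computed. The helper pooled_statistic and the nested-list layout of returns_per_iter are hypothetical.

```python
import statistics
from typing import List, Sequence


def pooled_statistic(returns_per_iter: Sequence[Sequence[float]], num_lookbacks: int = 1) -> float:
    """Median over the returns of all rollouts from the latest num_lookbacks iterations."""
    if num_lookbacks < 1:
        raise ValueError("num_lookbacks must be at least 1")
    # returns_per_iter[k] holds the returns of all rollouts sampled in iteration k (oldest first)
    recent = returns_per_iter[-num_lookbacks:]
    pooled: List[float] = [ret for iter_returns in recent for ret in iter_returns]
    return statistics.median(pooled)
```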

class RolloutBasedStoppingCriterion

Bases: StoppingCriterion

Abstract extension of the base StoppingCriterion class for criteria that are based on having access to rollouts.

Note

Requires the algorithm to expose a RolloutSavingWrapper via a property named sampler.

is_met(algo) → bool

Gets the sampler from the algorithm, checks whether it is a RolloutSavingWrapper, and forwards the check to _is_met_with_sampler(..).

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise
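The forwarding pattern of is_met can be sketched as follows. RolloutSavingWrapperSketch, the raised exception type, the signature of _is_met_with_sampler, and the illustrative subclass EnoughIterationsSketch are assumptions; only the overall flow (fetch the sampler, check its type, delegate) follows the description above.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class RolloutSavingWrapperSketch:
    """Hypothetical stand-in for RolloutSavingWrapper: keeps the returns of the rollouts per iteration."""

    def __init__(self) -> None:
        self.rollouts: List[List[float]] = []


class RolloutBasedCriterionSketch(ABC):
    def is_met(self, algo: Any) -> bool:
        # Fetch the sampler via the algorithm's sampler property and verify its type
        sampler = algo.sampler
        if not isinstance(sampler, RolloutSavingWrapperSketch):
            raise TypeError("the algorithm's sampler must be a RolloutSavingWrapper")
        # Delegate the actual decision to the subclass
        return self._is_met_with_sampler(algo, sampler)

    @abstractmethod
    def _is_met_with_sampler(self, algo: Any, sampler: RolloutSavingWrapperSketch) -> bool:
        """Subclass hook; receives the already type-checked sampler."""


class EnoughIterationsSketch(RolloutBasedCriterionSketch):
    """Illustrative subclass: met once rollouts from at least min_iters iterations are stored."""

    def __init__(self, min_iters: int) -> None:
        self.min_iters = min_iters

    def _is_met_with_sampler(self, algo: Any, sampler: RolloutSavingWrapperSketch) -> bool:
        return len(sampler.rollouts) >= self.min_iters


class AlgoSketch:
    """Hypothetical algorithm exposing a sampler property."""

    def __init__(self, sampler: Any) -> None:
        self.sampler = sampler
```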

stopping_criterion

class StoppingCriterion

Bases: ABC

Base class for stopping criteria. A stopping criterion takes an algorithm (and hence its current state) and decides whether the algorithm should terminate. A common example is reaching a set number of iterations.

abstract is_met(algo) → bool

Checks whether the stopping criterion is met.

Note

Has to be overridden by sub-classes.

Parameters:

algo – instance of Algorithm that has to be evaluated

Returns:

True if the criterion is met, and False otherwise

reset() → NoReturn

Resets the internal state of this stopping criterion. Has to be called when the algorithm is reset. If suppress_next_reset was invoked right before this method, the stopping criterion will not be reset.

Note

Do not override this method directly! Implement _reset instead so that the suppression mechanism is not bypassed.

suppress_next_reset() → NoReturn

Suppresses the next reset call as described in reset.
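The reset-suppression mechanism can be sketched with a hypothetical base class. It assumes that a suppression applies only to the single next reset call (as the name suppress_next_reset suggests) and that subclasses put their own reset logic into _reset; StoppingCriterionSketch and CountingCriterionSketch are illustrative and not part of the library.

```python
from abc import ABC, abstractmethod
from typing import Any


class StoppingCriterionSketch(ABC):
    """Hypothetical base class mirroring the is_met / reset / suppress_next_reset interface."""

    def __init__(self) -> None:
        self._suppress_next_reset = False

    @abstractmethod
    def is_met(self, algo: Any) -> bool:
        """Decide whether the algorithm should terminate."""

    def reset(self) -> None:
        # Not meant to be overridden: the suppression check below must always run
        if self._suppress_next_reset:
            self._suppress_next_reset = False  # assumption: a suppression covers one reset only
            return
        self._reset()

    def suppress_next_reset(self) -> None:
        self._suppress_next_reset = True

    def _reset(self) -> None:
        """Subclass hook for clearing internal state; stateless criteria need not override it."""


class CountingCriterionSketch(StoppingCriterionSketch):
    """Illustrative criterion that is met once is_met has been called max_calls times."""

    def __init__(self, max_calls: int) -> None:
        super().__init__()
        self.max_calls = max_calls
        self.calls = 0

    def is_met(self, algo: Any) -> bool:
        self.calls += 1
        return self.calls >= self.max_calls

    def _reset(self) -> None:
        self.calls = 0


crit = CountingCriterionSketch(max_calls=2)
```

Keeping the suppression check in the non-overridable reset and the state handling in _reset means a subclass cannot accidentally skip the check by overriding the wrong method.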

Module contents