algorithms

test_algorithms

ex_dir(tmpdir)

test_actor_critic(ex_dir, env: SimEnv, policy: Policy, algo, algo_hparam, vfcn_type, use_cuda)

test_arpl(ex_dir, env)

test_arpl_observation(env)

test_arpl_wrappers(env)

test_param_expl(ex_dir, env, policy, algo_class, algo_hparam)

test_rff_regression(ex_dir, num_feat_per_dim: int, loss_fcn: Callable, algo_hparam: dict)

test_sac_fill_memory_with_trained_policy(ex_dir, env, fill_with_trained_policy)

test_snapshots_notmeta(ex_dir, env: SimEnv, policy, algo_class, algo_hparam)

test_soft_update(env, policy: Policy)

test_time_series_prediction(ex_dir, dataset_ts, env: MockEnv, policy: Policy, windowed: bool, cascaded: bool)

test_training_parameter_exploring(ex_dir, env: SimEnv, algo, algo_hparam)

test_meta

test_stopping_criteria

class ExposingReturnStatisticBasedStoppingCriterion(return_statistic: ReturnStatistic = ReturnStatistic.median, num_lookbacks: int = 1)

Bases: ReturnStatisticBasedStoppingCriterion

Constructor.

Parameters:

return_statistic – statistic to compute; defaults to median
num_lookbacks – over how many iterations the statistic should be computed; for example, a value of two means that the rollouts of both the current and the previous iteration will be used for computing the statistic; defaults to one
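The effect of num_lookbacks can be illustrated with a minimal sketch. It assumes that the returns of all rollouts inside the lookback window are pooled before the statistic is computed; the names Statistic and LookbackReturnStatistic are hypothetical stand-ins and not Pyrado's ReturnStatistic or ReturnStatisticBasedStoppingCriterion.

```python
from collections import deque
from enum import Enum
from statistics import mean, median
from typing import Deque, List


class Statistic(Enum):
    """Hypothetical stand-in for ReturnStatistic."""
    min = "min"
    max = "max"
    median = "median"
    mean = "mean"


class LookbackReturnStatistic:
    """Hypothetical helper: statistic over the rollout returns of the last num_lookbacks iterations."""

    def __init__(self, statistic: Statistic = Statistic.median, num_lookbacks: int = 1):
        if num_lookbacks < 1:
            raise ValueError("num_lookbacks must be at least 1")
        self.statistic = statistic
        # A deque with maxlen drops the oldest iteration automatically
        self._history: Deque[List[float]] = deque(maxlen=num_lookbacks)

    def update(self, iteration_returns: List[float]) -> float:
        """Store this iteration's returns and recompute the statistic over the window."""
        self._history.append(list(iteration_returns))
        # Assumption: pool the returns of every iteration in the window
        pooled = [r for returns in self._history for r in returns]
        if self.statistic is Statistic.min:
            return min(pooled)
        if self.statistic is Statistic.max:
            return max(pooled)
        if self.statistic is Statistic.mean:
            return mean(pooled)
        return median(pooled)


# With num_lookbacks=2, the current and the previous iteration are used
stat = LookbackReturnStatistic(Statistic.median, num_lookbacks=2)
stat.update([1.0, 2.0, 3.0])  # 2.0
stat.update([10.0])           # median of [1, 2, 3, 10] = 2.5
stat.update([4.0])            # first iteration dropped, median of [10, 4] = 7.0
```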

class MockSampler(step_sequences: Optional[List[StepSequence]] = None)

Bases: SamplerBase

Constructor.

Parameters:

min_rollouts – minimum number of complete rollouts to sample
min_steps – minimum total number of steps to sample

reinit(env: Optional[Env] = None, policy: Optional[Policy] = None)

Reset the sampler after changes were made to the environment or the policy, optionally replacing one of them.

Most samplers are implemented in parallel, so changes to the environment or the policy do not automatically propagate to all processes. This method exists as a workaround: call it to force a reinitialization of the environment and the policy in all subprocesses.

Note that you don’t need to call this if only the policy parameters change. Such changes are expected between sampling runs, and the sample() method handles them on its own.

You can use the env and policy parameters to completely replace the stored environment or policy.

Parameters:

env – new environment to use, or None to keep the old one
policy – new policy to use, or None to keep the old one

sample() → List[StepSequence]

Generate a list of rollouts. This method works exactly as specified in the class description.

Returns: sampled rollouts
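A mock sampler of this kind lets stopping-criterion tests run without simulating an environment. The sketch below is a hypothetical, self-contained version and not the actual MockSampler: SamplerLike stands in for SamplerBase, FixedRolloutSampler returns preset rollouts from sample(), and reinit() follows the documented rule that None keeps the stored environment or policy.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class SamplerLike(ABC):
    """Hypothetical minimal stand-in for SamplerBase (not Pyrado's class)."""

    @abstractmethod
    def reinit(self, env: Optional[Any] = None, policy: Optional[Any] = None) -> None: ...

    @abstractmethod
    def sample(self) -> List[Any]: ...


class FixedRolloutSampler(SamplerLike):
    """Hypothetical mock that returns preset rollouts instead of sampling them."""

    def __init__(self, step_sequences: Optional[List[Any]] = None):
        self.step_sequences = [] if step_sequences is None else list(step_sequences)
        self.env: Optional[Any] = None
        self.policy: Optional[Any] = None

    def reinit(self, env: Optional[Any] = None, policy: Optional[Any] = None) -> None:
        # Replace only what is given; None keeps the stored object
        if env is not None:
            self.env = env
        if policy is not None:
            self.policy = policy

    def sample(self) -> List[Any]:
        # Return a copy so callers cannot mutate the preset rollouts
        return list(self.step_sequences)


sampler = FixedRolloutSampler(["rollout_a", "rollout_b"])
sampler.sample()                # ['rollout_a', 'rollout_b']
sampler.reinit(policy="new_policy")  # env stays None, policy is replaced
```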

test_criterion_always()

test_criterion_combination_and()

test_criterion_combination_or()

test_criterion_custom(is_met_expected)

test_criterion_iter_count_equal()

test_criterion_iter_count_higher()

test_criterion_iter_count_lower()

test_criterion_never()

test_criterion_return_statistic_based_check_min(statistic, expected)

test_criterion_rollout_based_convergence_equal()

test_criterion_rollout_based_convergence_higher()

test_criterion_rollout_based_convergence_history_filling()

test_criterion_rollout_based_convergence_lower()

test_criterion_rollout_based_convergence_none()

test_criterion_rollout_based_convergence_regress_constant_one()

test_criterion_rollout_based_convergence_regress_constant_zero()

test_criterion_rollout_based_convergence_regress_not_constant()

test_criterion_rollout_based_convergence_regress_random()

test_criterion_rollout_based_convergence_subset(num_iter, expected)

test_criterion_rollout_based_min_min_return_equal()

test_criterion_rollout_based_min_min_return_higher()

test_criterion_rollout_based_min_min_return_lower()

test_criterion_rollout_based_no_sampler()

test_criterion_rollout_based_wrong_sampler()

test_criterion_sample_count_equal()

test_criterion_sample_count_higher()

test_criterion_sample_count_lower()

test_criterion_toggleable_init_default()

test_criterion_toggleable_init_met()

test_criterion_toggleable_init_not_met()

test_criterion_toggleable_set_on_off_init_met()

test_criterion_toggleable_set_on_off_init_not_met()

test_magic_function_implementation_and()

test_magic_function_implementation_or()
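The test names test_criterion_combination_and/_or and test_magic_function_implementation_and/_or suggest that stopping criteria can be combined with Python's & and | operators. The following is a hypothetical sketch of that pattern using the __and__ and __or__ magic methods; Criterion, always, never and after_ten are illustrative names and not Pyrado's classes, and real criteria presumably inspect the algorithm rather than a bare iteration count.

```python
from typing import Callable


class Criterion:
    """Hypothetical stopping criterion wrapping a predicate on the iteration count."""

    def __init__(self, predicate: Callable[[int], bool]):
        self._predicate = predicate

    def is_met(self, num_iter: int) -> bool:
        return self._predicate(num_iter)

    def __and__(self, other: "Criterion") -> "Criterion":
        # `a & b` is met only if both criteria are met
        return Criterion(lambda n: self.is_met(n) and other.is_met(n))

    def __or__(self, other: "Criterion") -> "Criterion":
        # `a | b` is met if at least one criterion is met
        return Criterion(lambda n: self.is_met(n) or other.is_met(n))


always = Criterion(lambda n: True)
never = Criterion(lambda n: False)
after_ten = Criterion(lambda n: n >= 10)

(always & never).is_met(0)     # False
(always | never).is_met(0)     # True
(after_ten & always).is_met(9)  # False
```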

test_utils

test_action_statistics(env: SimEnv, policy: Policy)

test_adr_reward_generator(env)

test_adr_reward_generator_save_load(env, tmp_path)

test_get_grad_via_torch()

test_until_thold_exceeded(thold, max_iter)
