How to create an algorithm
--------------------------

This file outlines how to write an algorithm in Pyrado.
We differentiate between step-based and episodic algorithms. In general, the first type inherits from `Algorithm` and randomizes the actions, whereas the second type randomizes the policy parameters and inherits from `ParameterExploring` (which also inherits from `Algorithm`).

Start by creating a new class which inherits from `Algorithm` or `ParameterExploring`.

.. code-block:: python

    from typing import Sequence

    import pyrado
    from pyrado.algorithms.base import Algorithm
    from pyrado.environments.base import Env
    from pyrado.logger.step import StepLogger
    from pyrado.policies.base import Policy
    from pyrado.sampling.step_sequence import StepSequence


    class MFA(Algorithm):
        """
        Michael's Fancy Algorithm (MFA)
        TODO a detailed description
        ...
        """

        name: str = 'mfa'  # TODO define an acronym that is used for saving and loading (lower case string recommended)

        def __init__(self,
                     # required args for every algorithm (workarounds are possible)
                     save_dir: str,
                     env: Env,
                     policy: Policy,
                     max_iter: int,
                     # TODO args specific to your algorithm ...
                     logger: StepLogger = None):
            # Call Algorithm's constructor. This instantiates the default step logger which works in most cases.
            # If you want a logger that only logs every N steps, check out `pyrado/algorithms/sac.py`
            super().__init__(save_dir, max_iter, policy, logger)

            # TODO store the inputs
            # TODO create an exploration strategy
            # TODO create a sampler
            # TODO set up an optimizer

Your algorithm must implement a `step()` function that performs a single iteration of the algorithm. This includes collecting the data, updating the parameters, and adding the metrics of interest to the logger. It must not update the `curr_iter` attribute, since this is done in the `train()` method of the base class.
If this algorithm is run as a subroutine of a meta-algorithm, `meta_info` contains a dict of information about the current iteration of the meta-algorithm; otherwise leave it as `None`. For examples of meta-algorithms see `algorithms/spota.py` or `algorithms/bayern.py`.

.. code-block:: python

    def step(self, snapshot_mode: str, meta_info: dict = None):
        # TODO sample data (e.g. steps or rollouts) using the sampler created in `__init__()`
        # TODO compute metrics and add them to the logger
        # TODO pass the sampled data to the algorithm's `update()` method
        # TODO save snapshot data
        raise NotImplementedError  # placeholder, remove once the steps above are implemented

Moreover, it is recommended to move the parameter update into a separate `update()` method. This is not strictly necessary, and `update()` can take different inputs. Usually it receives a batch of rollouts.

.. code-block:: python

    def update(self, rollouts: Sequence[StepSequence]):
        # TODO compute stuff from the agent's experience, i.e. the rollouts
        # TODO apply some nasty hacks to make the theory work (looking at you gradient clipping)
        # TODO update the parameters of the policy and optionally the exploration strategy
        # TODO optionally add some logging
        raise NotImplementedError  # placeholder, remove once the steps above are implemented

In most cases you also want to override the `reset()` function. The base version resets the exploration strategy, the iteration counter, and optionally sets a random seed, so be sure to call it. Usually there are more things to reset (e.g. the sampler).

.. code-block:: python

    def reset(self, seed: int = None):
        # Call the Algorithm's reset function
        super().reset(seed)

        # TODO Re-initialize sampler in case env or policy changed
        # TODO reset variables custom to your algorithm

You can override `stopping_criterion_met()` to specify additional stopping criteria. Any subclass of `Algorithm` will always stop once the `curr_iter` counter is equal to `max_iter`.

.. code-block:: python

    def stopping_criterion_met(self) -> bool:
        return False  # TODO specify a stopping criterion for your algorithm

The following function is called for saving (every step).
The base class `Algorithm` saves the policy.

.. code-block:: python

    def save_snapshot(self, meta_info: dict = None, algo_name: str = 'algo'):
        # Call Algorithm's save method
        super().save_snapshot(meta_info, algo_name)

        # TODO save what needs to be saved (specific to your algorithm)

This tutorial is not meant to be exhaustive, but to give you an intuition of what needs to be done. I suggest having a look at the existing algorithms for inspiration.
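
To see how `step()`, `update()`, `reset()`, and `stopping_criterion_met()` might interact, here is a minimal, self-contained sketch that does not depend on Pyrado. `ToyAlgorithm` and `ToyRandomSearch` are hypothetical stand-ins written for this illustration. They only mirror the method names described above, and the `train()` loop is an assumption about the general control flow, not Pyrado's actual implementation.

```python
import random
from typing import List, Optional, Sequence


class ToyAlgorithm:
    """Hypothetical stand-in for a base class (NOT Pyrado's `Algorithm`)."""

    def __init__(self, max_iter: int):
        self.max_iter = max_iter
        self.curr_iter = 0

    def step(self, snapshot_mode: str, meta_info: Optional[dict] = None):
        raise NotImplementedError

    def stopping_criterion_met(self) -> bool:
        return False

    def reset(self, seed: Optional[int] = None):
        # Reset the iteration counter and optionally set a random seed
        self.curr_iter = 0
        if seed is not None:
            random.seed(seed)

    def train(self, snapshot_mode: str = 'latest', meta_info: Optional[dict] = None):
        # Assumed control flow: the base class owns the loop and the counter
        while self.curr_iter < self.max_iter and not self.stopping_criterion_met():
            self.step(snapshot_mode, meta_info)
            self.curr_iter += 1  # advanced here, never inside step()


class ToyRandomSearch(ToyAlgorithm):
    """Episodic-style toy: perturbs the parameters and keeps improvements."""

    def __init__(self, max_iter: int, init_param: Sequence[float],
                 num_candidates: int = 10, expl_std: float = 0.5, tol: float = 1e-3):
        super().__init__(max_iter)
        self.init_param = list(init_param)
        self.num_candidates = num_candidates
        self.expl_std = expl_std
        self.tol = tol
        self.param = list(init_param)
        self.best_cost = self.cost(self.param)

    @staticmethod
    def cost(param: Sequence[float]) -> float:
        # Toy objective standing in for the (negative) return of a rollout
        return sum(p ** 2 for p in param)

    def step(self, snapshot_mode: str, meta_info: Optional[dict] = None):
        # "Sample data": draw perturbed parameter candidates
        candidates = [[p + random.gauss(0.0, self.expl_std) for p in self.param]
                      for _ in range(self.num_candidates)]
        self.update(candidates)

    def update(self, candidates: List[List[float]]):
        # Keep the best candidate only if it improves on the current parameters
        best = min(candidates, key=self.cost)
        if self.cost(best) < self.best_cost:
            self.param, self.best_cost = best, self.cost(best)

    def reset(self, seed: Optional[int] = None):
        super().reset(seed)
        # Reset the variables custom to this algorithm
        self.param = list(self.init_param)
        self.best_cost = self.cost(self.param)

    def stopping_criterion_met(self) -> bool:
        return self.best_cost < self.tol


algo = ToyRandomSearch(max_iter=200, init_param=[2.0, -1.0])
algo.reset(seed=0)
algo.train()
print(f'stopped after {algo.curr_iter} iterations with cost {algo.best_cost:.4f}')
```

Keeping the loop and the counter in the base class means that a subclass only has to describe one iteration, which is why `step()` should leave `curr_iter` untouched.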