Checkpoint feature

Implement a checkpoint feature that allows an interrupted computation to be resumed.

Possible things to consider:

  • a cluster may restart a job automatically -> a checkpoint should be detected automatically and the computation resumed from there
  • but the same job can be run multiple times, so a checkpoint has to be matched to the right run
  • checkpoints need to include intermediate results and data in addition to the models/pipeline
  • only some model/pipeline configurations allow checkpoints
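
To make the points above concrete, here is a minimal sketch of how detection and resumption could work. It is only an illustration under assumptions, not a proposal for the final design: the function names, the `run_id` argument, the pickle file layout, and the idea of a pipeline as a list of step functions are all hypothetical, and `backup_dir` is used here simply as a directory path.

```python
import os
import pickle
import tempfile
from pathlib import Path


def checkpoint_path(backup_dir, run_id):
    """One file per run, so repeated runs of the same job do not collide."""
    return Path(backup_dir) / f"checkpoint_{run_id}.pkl"


def save_checkpoint(backup_dir, run_id, state):
    """Write atomically: a crash during the write leaves the old file intact."""
    path = checkpoint_path(backup_dir, run_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_name, path)  # atomic rename within the same directory


def load_checkpoint(backup_dir, run_id):
    """Return the saved state, or None if this run has no checkpoint yet."""
    path = checkpoint_path(backup_dir, run_id)
    if not path.exists():
        return None
    with path.open("rb") as handle:
        return pickle.load(handle)


def run_steps(steps, backup_dir, run_id):
    """Run the steps in order, resuming after the last completed one.

    Each step is a hypothetical callable that receives the list of
    intermediate results so far and returns its own result.
    """
    state = load_checkpoint(backup_dir, run_id) or {"next_step": 0, "results": []}
    for index in range(state["next_step"], len(steps)):
        state["results"].append(steps[index](state["results"]))
        state["next_step"] = index + 1
        # Intermediate results are stored together with the position.
        save_checkpoint(backup_dir, run_id, state)
    return state["results"]
```

If the cluster restarts the job with the same `run_id`, `run_steps` picks up the existing file and skips the completed steps; a different `run_id` starts from scratch. Configurations that cannot be checkpointed would have to bypass this path entirely, which the sketch does not cover.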

Maybe we can reuse the backup_dir of #9 (closed).

Related to #4 and #15.

Edited by User expired