nextflow-io/nextflow

Ability to use a time delay in 'errorStrategy' to re-submit scheduler jobs

Open

#721 opened on May 30, 2018

help wanted

Description

Our HPC system is producing a large number of errors like this:

Error executing process > 'sambamba_dedup_flagstat (1)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  qsub -terse .command.run

Command exit status:
  1

Command output:
  Unable to run job: failed receiving gdi request response for mid=1 (got syncron message receive timeout error)..
  Exiting.

This happens when the scheduler (SGE) is temporarily unable to accept job submissions or respond to queries. If I try again later, for example after 5 minutes, the job may be scheduled successfully.

The errorStrategy described here shows how to increase resource allotment on error.

However, in this case I need to wait a minute or two before attempting to submit the job again.

Is it possible to accomplish this in Nextflow today, or could this feature be added? For example, an 'errorStrategy' of 'retry in 5 minutes'.
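To make the request concrete, here is a minimal sketch of the behaviour I am after, written as a dynamic `errorStrategy` closure that pauses before returning `'retry'`. It rests on assumptions I have not verified: that the closure is evaluated for scheduler submission failures like the one above (and not only for failed task runs), and that blocking inside it with `sleep` does not stall the handling of other tasks. The 5-minute delay, the `maxRetries` value, and the placeholder script are illustrative only.

```groovy
process sambamba_dedup_flagstat {
    // Assumption: this closure is called when the submission fails.
    // Wait 5 minutes (value in milliseconds), then ask Nextflow to resubmit.
    errorStrategy { sleep(5 * 60 * 1000); return 'retry' }

    // Give up after three resubmissions (illustrative value).
    maxRetries 3

    script:
    """
    echo "placeholder for the real command"
    """
}
```

A delay that grows with `task.attempt` could replace the fixed value if the scheduler tends to stay unresponsive for longer periods.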
