nextflow-io/nextflow
Ability to use a time delay in 'errorStrategy' to re-submit scheduler jobs
Open
#721 opened on May 30, 2018
help wanted
Description
Our HPC system is producing a large number of errors like this:
Error executing process > 'sambamba_dedup_flagstat (1)'
Caused by:
Failed to submit process to grid scheduler for execution
Command executed:
qsub -terse .command.run
Command exit status:
1
Command output:
Unable to run job: failed receiving gdi request response for mid=1 (got syncron message receive timeout error)..
Exiting.
This is the result of the scheduler (SGE) being unable to accept job submissions or respond to queries. If I try again later, for example after 5 minutes, the job may be scheduled successfully.
The errorStrategy described here shows how to increase resource allotment on error.
However, in this case I need to wait a minute or two before attempting to submit the job again.
Is it possible to accomplish this, or could this feature be added to Nextflow? For example, an 'errorStrategy' of 'retry in 5 minutes'.
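To make the request concrete, here is a rough sketch of the behaviour I have in mind. It assumes that 'errorStrategy' can be given a closure, and that the closure is also evaluated when the failure is a scheduler submission error like the one above; I have not confirmed either point for this case. The 5 minute delay, the 'maxRetries' value, and the placeholder script are illustrative only.

```groovy
process sambamba_dedup_flagstat {
    // Assumption: errorStrategy accepts a closure whose return value
    // is the strategy name. Not verified for qsub submission failures.
    errorStrategy {
        // Pause for 5 minutes (value in milliseconds) so the scheduler
        // has time to recover, then ask for the task to be re-submitted.
        sleep(5 * 60 * 1000L)
        return 'retry'
    }

    // Illustrative cap so a persistently unavailable scheduler
    // does not cause endless re-submission.
    maxRetries 3

    script:
    """
    echo "placeholder for the real sambamba command"
    """
}
```

If the sleep inside the closure blocks other task handling while it waits, a built-in delay option would still be preferable to this kind of workaround.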