SLURM timeouts

I'm running a fork of galaxy-central latest_2014.08.11. The instance is
configured to run jobs on a SLURM cluster. The problem is that the SLURM
controller sometimes becomes too busy, which results in errors like:

    INFO 2014-10-23 21:10:47,768 (1813/22896754) job left DRM queue with
    following message: code 1: slurm_load_jobs error: Socket timed out on
    send/recv operation,job_id: 22896754

This causes Galaxy to assume that the job has failed:

    ERROR 2014-10-23 21:10:47,881 (1813/22896754) Job output not returned
    from cluster: [Errno 2] No such file or directory:

This happens with both [...] and [...]. Is there any way to handle this
condition in Galaxy?
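A minimal Python sketch of the kind of handling I mean: treat the controller timeout as transient and retry the status query with exponential backoff before declaring the job finished. Everything here is hypothetical and only illustrative. `query_status`, `is_transient_drm_error`, `check_with_retries` and the pattern list are made-up names, not Galaxy or DRMAA APIs, and I don't know whether Galaxy's runners expose a hook where something like this could go.

```python
import re
import time
from typing import Callable

# Assumption: only the timeout message quoted above is treated as transient;
# a real setup would probably need more patterns.
TRANSIENT_PATTERNS = (
    re.compile(r"Socket timed out on send/recv operation"),
)


def is_transient_drm_error(message: str) -> bool:
    """Return True if a DRM error message looks like a controller timeout."""
    return any(p.search(message) for p in TRANSIENT_PATTERNS)


def check_with_retries(query_status: Callable[[], str],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call the hypothetical query_status(), retrying transient failures.

    query_status is assumed to raise RuntimeError carrying the DRM message.
    Transient errors are retried with exponential backoff; any other error,
    or a transient error on the final attempt, is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return query_status()
        except RuntimeError as exc:
            if not is_transient_drm_error(str(exc)) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Usage: a fake status query that times out twice, then reports RUNNING.
calls = {"n": 0}


def flaky_status() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("code 1: slurm_load_jobs error: Socket timed out "
                           "on send/recv operation,job_id: 22896754")
    return "RUNNING"


delays = []  # record backoff delays instead of actually sleeping
print(check_with_retries(flaky_status, sleep=delays.append), delays)
# prints: RUNNING [1.0, 2.0]
```

The point of the design is that a non-transient error is still reported immediately, so only the controller timeout gets the extra grace period.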

