
Galaxy+Slurm (with elastic cluster) error: "Job output not returned from cluster"


Marco Tangaro
Dear all,
I have an issue using Galaxy with elastic cluster support.
Elasticity is provided through SLURM integration: a worker node is added as soon as jobs are submitted through the Galaxy portal.

When a job is submitted, the node takes some minutes to be configured.
After ~7 minutes, Galaxy gives me the failure message "Job output not returned from cluster".
By contrast, if the node is already up, everything works.
I have only tried a very simple job: getting the UCSC human genome.
I'm using the Galaxy master branch with postgresql+nginx+uwsgi+proftpd.

Here is my job_conf.xml configuration:
https://gist.github.com/mtangaro/c0528c3d9a7b44b3bab35dbd947f2c81

I'm not a SLURM expert. I went through the mailing list archive, but I could not solve the issue.
Thanks a lot for your help.
Marco

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Galaxy+Slurm (with elastic cluster) error: "Job output not returned from cluster"

Enis Afgan
Hi Marco,
You could do something similar to what we're doing with CloudMan: add a placeholder node to Slurm. This makes Slurm accept a job but hold it until a node that is actually capable of executing it has been added. The placeholder node is just a node definition in slurm.conf with State=FUTURE. Check out the slurm.conf we're using: https://github.com/galaxyproject/cloudman/blob/master/cm/conftemplates/slurm.conf.default#L37
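
A rough sketch of what such a placeholder definition might look like in slurm.conf is below. This is not copied from the CloudMan template linked above: the node name, partition name, address, and CPU count are illustrative assumptions, and whether Slurm queues jobs against a partition containing only a FUTURE node should be checked on your Slurm version.

```
# Hypothetical placeholder node: defined for future use, so it does not
# need to exist when the Slurm daemons start.
NodeName=placeholder NodeAddr=127.0.0.1 CPUs=1 State=FUTURE

# Hypothetical partition that includes the placeholder, so a submitted job
# has a node definition to be queued against while the real worker starts.
PartitionName=main Nodes=placeholder Default=YES MaxTime=INFINITE State=UP
```

The real worker node still has to be added to the configuration once it is up; the placeholder only keeps the job pending in the meantime.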

Cheers,
Enis

On Tue, Nov 1, 2016 at 1:04 PM, Marco Tangaro <[hidden email]> wrote:
> Dear all,
> I have an issue using Galaxy with elastic cluster support.
> [...]

