Jobs deleted staying in 'dr' status


Mathieu Bahin
Hi all,

We have been developing our own Galaxy instance for a while now. Jobs are sent for execution to a cluster managed through SGE. Usually, communication between SGE and DRMAA is fine and we don't have any problems with it.

When a user deletes a job, most of the time the job disappears, but sometimes, for reasons we don't understand, the job stays in SGE with the status 'dr'. If we don't kill it manually, it stays forever. It is not always the same tool that produces this error.
Do you have any idea why this happens or how to handle it?
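
For reference, the manual cleanup described above could be scripted roughly as follows. This is only a minimal sketch, not something taken from Galaxy itself: it assumes the default tabular output of `qstat -u '*'` (job ID in the first column, state in the fifth), that the stuck jobs show the state exactly as 'dr', and that the account running it is allowed to use `qdel -f`. The function names are hypothetical.

```python
import subprocess


def find_jobs_in_state(qstat_output, state="dr"):
    """Return the IDs of jobs whose state column equals `state`.

    Assumes the default `qstat` table layout:
    job-ID, prior, name, user, state, ...
    """
    job_ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and blank lines.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # Array jobs may span several lines; keep each ID once.
        if fields[4] == state and fields[0] not in job_ids:
            job_ids.append(fields[0])
    return job_ids


def force_delete_commands(job_ids):
    """Build one `qdel -f <id>` command (as an argument list) per job."""
    return [["qdel", "-f", job_id] for job_id in job_ids]


def cleanup(dry_run=True):
    """List jobs for all users and force-delete those stuck in 'dr'."""
    result = subprocess.run(
        ["qstat", "-u", "*"], capture_output=True, text=True, check=True
    )
    for command in force_delete_commands(find_jobs_in_state(result.stdout)):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```

Calling `cleanup()` with the default `dry_run=True` only prints the `qdel` commands, so the list can be checked before anything is actually deleted.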

We have another problem, a display one.
Since the last update, we have experienced a problem with the history: after launching a job, it took a very long time to appear in the history, and we had to refresh the page.
We installed a more recent version of Firefox and the problem seems to be gone, but some of our users can't update their Firefox. Are you aware of this problem?

Cheers,
Genouest Platform, Rennes - FRANCE


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Jobs deleted staying in 'dr' status

Peter Cock
On Thu, Sep 12, 2013 at 2:01 PM, Mathieu Bahin <[hidden email]> wrote:

> Hi all,
>
> We have been developing our own Galaxy instance for a while now. Jobs are
> sent for execution to a cluster managed through SGE. Usually, communication
> between SGE and DRMAA is fine and we don't have any problems with it.
>
> When a user deletes a job, most of the time the job disappears, but
> sometimes, for reasons we don't understand, the job stays in SGE with the
> status 'dr'. If we don't kill it manually, it stays forever. It is not
> always the same tool that produces this error.
> Do you have any idea why this happens or how to handle it?

I have noticed a problem with our DRMAA/SGE setup where a
user can cancel a large job (using the job splitter in at least some
cases), but Galaxy does not seem to cancel the jobs on the cluster.
I've not tried to diagnose this yet - it could be a similar issue though.

Peter

Re: Jobs deleted staying in 'dr' status

hackdna


On 9/12/13 10:35 AM, "Peter Cock" <[hidden email]> wrote:

> On Thu, Sep 12, 2013 at 2:01 PM, Mathieu Bahin <[hidden email]> wrote:
>> Hi all,
>>
>> We have been developing our own Galaxy instance for a while now. Jobs are
>> sent for execution to a cluster managed through SGE. Usually, communication
>> between SGE and DRMAA is fine and we don't have any problems with it.
>>
>> When a user deletes a job, most of the time the job disappears, but
>> sometimes, for reasons we don't understand, the job stays in SGE with the
>> status 'dr'. If we don't kill it manually, it stays forever. It is not
>> always the same tool that produces this error.
>> Do you have any idea why this happens or how to handle it?
>
> I have noticed a problem with our DRMAA/SGE setup where a
> user can cancel a large job (using the job splitter in at least some
> cases), but Galaxy does not seem to cancel the jobs on the cluster.
> I've not tried to diagnose this yet - it could be a similar issue though.

Also, in our DRMAA/LSF setup (using a fork of the latest galaxy-dist), jobs
generated by the current workflow step continue running on the cluster
after the history is deleted.

Ilya



Re: Jobs deleted staying in 'dr' status

Nicola Soranzo
On 2013-10-11 17:21, Sytchev, Ilya wrote:

> On 9/12/13 10:35 AM, "Peter Cock" <[hidden email]> wrote:
>
>> On Thu, Sep 12, 2013 at 2:01 PM, Mathieu Bahin <[hidden email]> wrote:
>>> Hi all,
>>>
>>> We have been developing our own Galaxy instance for a while now. Jobs
>>> are sent for execution to a cluster managed through SGE. Usually,
>>> communication between SGE and DRMAA is fine and we don't have any
>>> problems with it.
>>>
>>> When a user deletes a job, most of the time the job disappears, but
>>> sometimes, for reasons we don't understand, the job stays in SGE with
>>> the status 'dr'. If we don't kill it manually, it stays forever. It is
>>> not always the same tool that produces this error.
>>> Do you have any idea why this happens or how to handle it?
>>
>> I have noticed a problem with our DRMAA/SGE setup where a
>> user can cancel a large job (using the job splitter in at least some
>> cases), but Galaxy does not seem to cancel the jobs on the cluster.
>> I've not tried to diagnose this yet - it could be a similar issue
>> though.
>
> Also, in our DRMAA/LSF setup (using a fork of the latest galaxy-dist),
> jobs generated by the current workflow step continue running on the
> cluster after the history is deleted.
>
> Ilya

Hi Ilya,
I also see this behaviour with DRMAA/GridEngine.
I think this has already been reported:

https://trello.com/c/1whC9did/245-currently-running-jobs-in-deleted-histories-should-be-killed

Please upvote it!

Best,
Nicola