unable to stop jobs

Michael Mason
Hello all, 

I am running a Galaxy instance on Slurm. I am unable to stop jobs via the admin "Manage Jobs" window. When I query Postgres directly, there are no jobs in the job table. Any thoughts?
I realize this may not be a dev question, but I am at a loss as to what to do.
Thanks
Mike

--CONFIDENTIALITY NOTICE--: The information contained in this email is intended for the exclusive use of the addressee and may contain confidential information. If you are not the intended recipient, you are hereby notified that any form of dissemination of this communication is strictly prohibited. www.benaroyaresearch.org
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: unable to stop jobs

John Chilton-4
When was the last time you updated Galaxy? We fixed a bug that would
cause this behavior several releases ago.

Otherwise - additional information would help - like are you using the
DRMAA job runner or the newer specialized Slurm job runner and are
there any details in the Galaxy log that might be helpful? I would
expect to see a stack trace of some kind if there are problems like
this. If there are no stack traces and you have an up-to-date
Galaxy - perhaps checking the SLURM logs for errors might also provide
additional insight.

-John
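
A minimal sketch of the log check suggested above, assuming the Galaxy server log is a plain-text file and that errors show up as standard Python tracebacks (Galaxy is a Python application). The function name `find_tracebacks` is hypothetical and not part of Galaxy, and the log location is whatever your own instance uses:

```python
from typing import Iterable, List

# Every standard Python traceback starts with this header line.
TRACEBACK_START = "Traceback (most recent call last):"


def find_tracebacks(lines: Iterable[str]) -> List[str]:
    """Return each Python traceback found in `lines` as one string."""
    blocks: List[str] = []
    current = None  # lines of the traceback being collected, if any
    for raw in lines:
        line = raw.rstrip("\n")
        if TRACEBACK_START in line:
            # A new traceback begins; flush any unfinished one first.
            if current:
                blocks.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
            # Frame lines are indented; the first non-indented line is
            # the exception line, which ends the traceback.
            if not line.startswith((" ", "\t")):
                blocks.append("\n".join(current))
                current = None
    if current:
        # The log ended in the middle of a traceback.
        blocks.append("\n".join(current))
    return blocks
```

To run it against a real log, pass an open file handle, for example `find_tracebacks(open(log_path))`, where `log_path` is wherever your instance writes its server log.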

On Tue, Aug 12, 2014 at 12:10 PM, Michael Mason
<[hidden email]> wrote:

> Hello all,
>
> I am running a Galaxy instance on Slurm. I am unable to stop jobs via the
> admin "Manage Jobs" window. When I query Postgres directly, there are no jobs
> in the job table. Any thoughts?
> I realize this may not be a dev question, but I am at a loss as to what to do.
> Thanks
> Mike

Re: unable to stop jobs

John Chilton-4
In reply to this post by Michael Mason
Fantastic - thanks for the update! I hope it is okay - I am cc'ing the
-dev list so that people who search for this problem in the future
can see what the problem turned out to be and how to fix it. Enjoy
your job killing.

-John
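
For anyone hitting the same symptom: the explanation quoted below is that the host ran out of memory under load. A rough sketch of a headroom check on Linux, assuming the standard `/proc/meminfo` layout. The function names, the sample values and the threshold are illustrative assumptions, not settings from this thread:

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "MemFree:   120000 kB"; keep the number only.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def has_headroom(meminfo: Dict[str, int], min_free_kb: int) -> bool:
    """True if at least `min_free_kb` kB of memory is still available."""
    # Prefer MemAvailable when the kernel reports it, else use MemFree.
    free = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return free >= min_free_kb


# Hypothetical sample values, not measurements from the thread.
SAMPLE = "MemTotal:       16384000 kB\nMemFree:          120000 kB\n"
info = parse_meminfo(SAMPLE)
print(has_headroom(info, min_free_kb=500000))
```

On a Linux host, `parse_meminfo(open("/proc/meminfo").read())` supplies the real values; the threshold is something to choose for your own server.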

On Fri, Aug 15, 2014 at 12:47 PM, Michael Mason
<[hidden email]> wrote:

> Hi John,
>
> Just an update on the hypothesis. Our server guy figured out that the NFS
> memory setting allowed Galaxy to use all of the memory. He believes this
> left no memory available for killing jobs. He changed the setting so
> that some memory is left over even when I am running Galaxy at an all-out
> pace, and I was able to kill jobs this morning.
> Thanks again.
> Mike
>
> On 8/15/14 9:32 AM, "John Chilton" <[hidden email]> wrote:
>
>>Very interesting - whenever you get a chance I would try to kill
>>simple jobs when Galaxy is not under load to verify the problem is
>>related to Galaxy's job load. If you still have problems I would then
>>try again to get that stack trace.
>>
>>Good luck,
>>
>>-John
>>
>>On Wed, Aug 13, 2014 at 6:46 PM, Michael Mason
>><[hidden email]> wrote:
>>> Hi John,
>>>
>>> We actually went ahead and recovered an archived instance from Friday.
>>> This should be the same build though. Because we restored the archived
>>> instance, I believe we lost the stack trace, but I'll check with the IT
>>> folks.
>>> Thanks for your help. Below is the tip call. BTW we are using Galaxy with
>>> FASTQ files from a Fluidigm C1 machine. This means we often run 100-200
>>> libraries on a single flow cell. The resulting 100-200 libraries tend to
>>> tax Galaxy. It generally handles the load, though this may be the root
>>> cause of our difficulty killing jobs.
>>>
>>> tip                  13753:d3b1f484c4b6bbb3daa50fa167eef97a384890b3
>>> latest_2014.06.02    13742:8a863a311a6c9f14b302799bffcf94df9186fef7
>>> release_2014.06.02   13712:7e257c7b10badb65772b1528cb61d58175a42e47
>>> latest_2014.04.14    13085:68a8b0397947c732b28207d465d3f3c4e2a7a8a0
>>> release_2014.04.14   13064:9e53251b0b7e93b9563008a2b112f2e815a04bbc
>>> release_2014.02.10   12440:5e605ed6069fe4c5ca9875e95e91b2713499e8ca
>>> release_2013.11.04   11218:26f58e05aa1068761660681583821e21e6cbf7ab
>>> release_2013.08.12   10392:1ae95b3aa98d1ccf15b243ac3ce6a895eb7efc53
>>> release_2013.06.03   9943:524f246ca85395082719ae7a6ff72260d7ad5612
>>> security_2013.04.08  9292:2cc8d10988e03257dc7b97f8bb332c7df745d1dd
>>> release_2013.04.01   9231:75f09617abaadbc8cc732bb8ee519decaeb56ea7
>>> release_2013.02.08   8794:1c717491139269651bb59687563da9410b84c65d
>>> release_2013.01.13   8530:a4113cc1cb5eaa68091c9a73375f00555b66dd11
>>>
>>> On 8/13/14 1:39 PM, "John Chilton" <[hidden email]> wrote:
>>>
>>>>When was the last time you updated Galaxy? We fixed a bug that would
>>>>cause this behavior several releases ago.
>>>>
>>>>Otherwise - additional information would help - like are you using the
>>>>DRMAA job runner or the newer specialized Slurm job runner and are
>>>>there any details in the Galaxy log that might be helpful? I would
>>>>expect to see a stack trace of some kind if there are problems like
>>>>this. If there are no stack traces and you have an up-to-date
>>>>Galaxy - perhaps checking the SLURM logs for errors might also provide
>>>>additional insight.
>>>>
>>>>-John
>>>>
>>>>On Tue, Aug 12, 2014 at 12:10 PM, Michael Mason
>>>><[hidden email]> wrote:
>>>>> Hello all,
>>>>>
>>>>> I am running a Galaxy instance on Slurm. I am unable to stop jobs via the
>>>>> admin "Manage Jobs" window. When I query Postgres directly, there are no
>>>>> jobs in the job table. Any thoughts?
>>>>> I realize this may not be a dev question, but I am at a loss as to what to do.
>>>>> Thanks
>>>>> Mike