PBS Pro and Galaxy (change status error)


Laure QUINTRIC
Hello,

We have almost succeeded in using drmaa-0.4b3 with Galaxy and PBS Pro:
a job launched from Galaxy runs on our cluster, but when the job status
changes to finished, an error is raised in the drmaa Python egg.

Here is the server log:

galaxy.jobs.runners.drmaa ERROR 2010-11-26 17:11:10,857
(21/516559.service0.ice.ifremer.fr) Unable to check job status
Traceback (most recent call last):
   File
"/home12/caparmor/bioinfo/galaxy_dist/lib/galaxy/jobs/runners/drmaa.py",
line 252, in check_watched_items
     state = self.ds.jobStatus( job_id )
   File
"/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/__init__.py",
line 522, in jobStatus
   File
"/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/helpers.py",
line 213, in c
     return f(*(args + (error_buffer, sizeof(error_buffer))))
   File
"/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/errors.py", line
90, in error_check
     raise _ERRORS[code-1]("code %s: %s" % (code, error_buffer.value))
InternalException: code 1: pbs_statjob: Job %s has finished
galaxy.jobs.runners.drmaa WARNING 2010-11-26 17:11:10,861
(21/516559.service0.ice.ifremer.fr) job will now be errored
galaxy.jobs.runners.drmaa DEBUG 2010-11-26 17:11:10,986
(21/516559.service0.ice.ifremer.fr) User killed running job, but error
encountered removing from DRM queue: code 1: pbs_deljob: Job %s has finished

Any idea?

Thanks a lot

Laure



Re: PBS Pro and Galaxy (change status error)

Nate Coraor (nate@bx.psu.edu)
Laure QUINTRIC wrote:

> [...]
>
> Any idea?

The job has successfully completed, but it's being treated as an error
by the drmaa library.
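
A quick way to see this outside of Galaxy is to poll the egg directly.
This is a minimal sketch assuming drmaa-0.4b3 on the same host; the job
id below is only a placeholder taken from your log:

    import time
    import drmaa

    JOB_ID = "516559.service0.ice.ifremer.fr"  # placeholder job id

    s = drmaa.Session()
    s.initialize()
    try:
        while True:
            try:
                state = s.jobStatus( JOB_ID )
            except drmaa.InternalException, e:
                # On PBS Pro this fires once the job has finished:
                #   code 1: pbs_statjob: Job %s has finished
                print "InternalException after completion:", e
                break
            print "job state:", state
            if state in ( drmaa.JobState.DONE, drmaa.JobState.FAILED ):
                break
            time.sleep( 5 )
    finally:
        s.exit()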

I can't really test this, so there's no way for me to know whether any
other calls to drmaa.Session().jobStatus() can possibly raise an
InternalException.  The following will allow the job completion to
succeed, but it could cause other failed jobs not to be marked as
failed.

In lib/galaxy/jobs/runners/drmaa.py, locate:

            except drmaa.InvalidJobException:

and change it to:

            except ( drmaa.InvalidJobException, drmaa.InternalException ):

If you wanted to keep an eye on the value of the error in the log, you
could do the following instead:

            except ( drmaa.InvalidJobException, drmaa.InternalException ), e:
                log.debug( "(%s/%s) job left DRM queue with following message: %s" % ( galaxy_job_id, job_id, e ) )

Please do let us know if you get it working.  There are quite a few
people hoping to get Galaxy working on PBS Pro.

--nate


Re: PBS Pro and Galaxy (change status error)

Laure QUINTRIC
Hello,

Changing the exception handling was the right thing to do!
Since the job finishes on the cluster without error, even though
drmaa.py cannot retrieve the finished status (I think because of
limitations in the drmaa library), Galaxy now considers the job
finished instead of failed as before, so I can see my results in the
Galaxy web interface.

Maybe this should be integrated into Galaxy for the next release?

Thanks

Laure

On 30/11/2010 22:07, Nate Coraor wrote:

> [...]


Re: PBS Pro and Galaxy (change status error)

Nate Coraor (nate@bx.psu.edu)
Laure QUINTRIC wrote:

> [...]
>
> Maybe this should be integrated into Galaxy for the next release?

Hi Laure,

I'll make the change, but please let me know if you find that any errors
or failures are being treated as completed jobs.

Thanks,
--nate
