Galaxy Job Error


Galaxy Job Error

evan clark
Has anyone seen an error like this before? We are unsure whether
Galaxy is causing the issue or whether it is caused by Slurm, as it seems
Galaxy is prematurely deleting a file.

galaxy.jobs.runners DEBUG 2017-06-14 19:36:45,719 (3261/143577) Unable to cleanup /cm/shared/apps/galaxy/prod/galaxy/database/jobs_directory/003/3261/galaxy_3261.ec: [Errno 2] No such file or directory: '/cm/shared/apps/galaxy/prod/galaxy/database/jobs_directory/003/3261/galaxy_3261.ec'
galaxy.jobs.output_checker DEBUG 2017-06-14 19:36:45,725 Tool produced standard error failing job - [slurmstepd: get_exit_code task 0 died by signal
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/

Re: Galaxy Job Error

Marius van den Beek
Hi Evan,

If this is something that you rarely see and your filesystem is on a network drive,
you could try setting `retry_job_output_collection` to 10 in the galaxy.ini file.
This will let Galaxy retry collecting the output files up to 10 times,
which should help if there is some delay before the filesystem is updated.
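
To illustrate the idea behind such a retry, here is a minimal Python sketch. It is not Galaxy's actual implementation; the function name `read_with_retries` and its parameters (`retries`, `delay`, `sleep`) are hypothetical and chosen only for this example. It simply tries to read a file again after a short pause if the file is not visible yet, as can happen when a network filesystem lags behind.

```python
import time


def read_with_retries(path, retries=10, delay=1.0, sleep=time.sleep):
    """Read a file, retrying if it is not visible yet.

    Hypothetical helper for illustration only; not part of Galaxy.
    `retries` is the number of extra attempts after the first one.
    """
    for attempt in range(retries + 1):
        try:
            with open(path) as handle:
                return handle.read()
        except (IOError, OSError):
            # The file may simply not have appeared on this node yet.
            if attempt == retries:
                raise  # give up after the last allowed attempt
            sleep(delay)
```

If the file never shows up, the last error is re-raised, so a genuinely missing file is still reported rather than silently ignored.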

Best,
Marius

On 14 June 2017 at 22:02, evan clark <[hidden email]> wrote:
Has anyone seen an error like this before? We are unsure whether Galaxy is causing the issue or whether it is caused by Slurm, as it seems Galaxy is prematurely deleting a file.

galaxy.jobs.runners DEBUG 2017-06-14 19:36:45,719 (3261/143577) Unable to cleanup /cm/shared/apps/galaxy/prod/galaxy/database/jobs_directory/003/3261/galaxy_3261.ec: [Errno 2] No such file or directory: '/cm/shared/apps/galaxy/prod/galaxy/database/jobs_directory/003/3261/galaxy_3261.ec'
galaxy.jobs.output_checker DEBUG 2017-06-14 19:36:45,725 Tool produced standard error failing job - [slurmstepd: get_exit_code task 0 died by signal

Re: Galaxy Job Error

evan clark
That may work, but now we are getting this error.

Exception happened during processing of request from ('127.0.0.1', 43030)
Traceback (most recent call last):
  File "/cm/shared/apps/galaxy/prod/galaxy/.venv/lib/python2.7/site-packages/paste/httpserver.py", line 1085, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/SocketServer.py", line 654, in __init__
    self.finish()
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/SocketServer.py", line 713, in finish
    self.wfile.close()
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/socket.py", line 283, in close
    self.flush()
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

June 14, 2017 4:31 PM
Hi Evan,

If this is something that you rarely see and your filesystem is on a network drive,
you could try setting `retry_job_output_collection` to 10 in the galaxy.ini file.
This will let Galaxy retry collecting the output files up to 10 times,
which should help if there is some delay before the filesystem is updated.

Best,
Marius





Re: Galaxy Job Error

Marius van den Beek
My understanding is that this is not an error in Galaxy itself; it is an interruption in the communication with a client. It should be harmless as long as it doesn't happen at a massive rate. You probably have many more of these in your logs.
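
As a rough illustration of where `[Errno 32] Broken pipe` comes from, the following Python sketch reproduces it on a POSIX system with a local socket pair: the "client" end is closed before the "server" end sends its response. The helper name `send_ignoring_disconnect` is hypothetical and is not part of Paste or Galaxy; it only shows one way a server could treat a vanished client as a non-fatal event.

```python
import errno
import socket


def send_ignoring_disconnect(sock, payload):
    """Send payload; return False if the peer has already gone away.

    Hypothetical helper for illustration only.
    """
    try:
        sock.sendall(payload)
        return True
    except socket.error as exc:
        # EPIPE is errno 32 on Linux ("Broken pipe"); ECONNRESET is a
        # closely related symptom of the same situation.
        if exc.errno in (errno.EPIPE, errno.ECONNRESET):
            return False
        raise


# Assumes a POSIX system, where socketpair() returns two connected sockets.
server_side, client_side = socket.socketpair()
client_side.close()  # the client disconnects before the response is sent
delivered = send_ignoring_disconnect(server_side, b"response body")
server_side.close()
```

Here `delivered` ends up `False`: nothing went wrong on the server, the receiving end had simply closed the connection.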

On Jun 14, 2017 11:04 PM, "evan clark" <[hidden email]> wrote:
That may work, but now we are getting this error.

Exception happened during processing of request from ('127.0.0.1', 43030)
Traceback (most recent call last):
  File "/cm/shared/apps/galaxy/prod/galaxy/.venv/lib/python2.7/site-packages/paste/httpserver.py", line 1085, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/SocketServer.py", line 654, in __init__
    self.finish()
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/SocketServer.py", line 713, in finish
    self.wfile.close()
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/socket.py", line 283, in close
    self.flush()
  File "/cm/shared/apps/galaxy/requirements/python/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe





