Error cleaning up Condor jobs

Branden Timm
Hi All,
   I've been working to configure a new Galaxy instance to run jobs
under Condor.  Things are 99% working at this point, but it seems that
after the Condor job finishes, Galaxy tries to clean up a cluster file
that isn't there, namely the .ec (exit code) file.
Relevant log info:

DEBUG 2013-05-07 15:02:49,364 (1985) Working directory for job is: /home/GLBRCORG/galaxy/database/job_working_directory/001/1985
DEBUG 2013-05-07 15:02:49,387 (1985) Dispatching to condor runner
DEBUG 2013-05-07 15:02:49,720 (1985) Persisting job destination (destination id: condor)
INFO 2013-05-07 15:02:49,761 (1985) Job dispatched
DEBUG 2013-05-07 15:02:56,368 (1985) submitting file /home/GLBRCORG/galaxy/database/condor/
DEBUG 2013-05-07 15:02:56,369 (1985) command is: python '/home/GLBRCORG/galaxy/database/files/002/dataset_2842.dat' ''; cd /home/GLBRCORG/galaxy/database/job_working_directory/001/1985 . /home/GLBRCORG/galaxy/database/job_working_directory/001/1985/galaxy.json /home/GLBRCORG/galaxy/database/job_working_directory/001/1985/metadata_in_HistoryDatasetAssociation_3161_are5Bg,/home/GLBRCORG/galaxy/database/job_working_directory/001/1985/metadata_kwds_HistoryDatasetAssociation_3161_p73Yus,/home/GLBRCORG/galaxy/database/job_working_directory/001/1985/metadata_out_HistoryDatasetAssociation_3161_tLqep6,/home/GLBRCORG/galaxy/database/job_working_directory/001/1985/metadata_results_HistoryDatasetAssociation_3161_3QSW5X,,/home/GLBRCORG/galaxy/database/job_working_directory/001/1985/metadata_override_HistoryDatasetAssociation_3161_JUFvmk
INFO 2013-05-07 15:02:58,960 (1985) queued as 15
DEBUG 2013-05-07 15:02:59,110 (1985) Persisting job destination (destination id: condor)
DEBUG 2013-05-07 15:02:59,536 (1985/15) job is now running
DEBUG 2013-05-07 15:07:16,966 (1985/15) job is now running
DEBUG 2013-05-07 15:07:17,279 (1985/15) job has completed
DEBUG 2013-05-07 15:07:17,417 (1985/15) Unable to cleanup /home/GLBRCORG/galaxy/database/condor/ [Errno 2] No such file or directory: '/home/GLBRCORG/galaxy/database/condor/'
DEBUG 2013-05-07 15:07:17,560 setting dataset state to ERROR
DEBUG 2013-05-07 15:07:17,961 job 1985 ended
galaxy.datatypes.metadata DEBUG 2013-05-07 15:07:17,961 Cleaning up external metadata files

I've run a watch on the condor job directory, and as far as I can tell
the .ec file never gets created.  From a cursory look at
lib/galaxy/jobs/runners/ and, it looks like the
cleanup is happening in the AsynchronousJobState::cleanup method, which
iterates over the cleanup_file_attributes list.  I naively tried to
override cleanup_file_attributes in CondorJobState to exclude
'exit_code_file', to no avail.
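One possible reason the override had no effect, sketched under an assumption: if the base class assigns cleanup_file_attributes as an instance attribute inside __init__, then a class-level attribute defined on the subclass is shadowed as soon as the base __init__ runs, and the subclass has to replace the list after calling the parent constructor instead. The names AsynchronousJobState, cleanup_file_attributes and exit_code_file come from the post; the class bodies and the two hypothetical subclasses ClassLevelOverride and InitOverride are illustrative stand-ins, not Galaxy's actual code, so check the real runner source before relying on this.

```python
import os


class AsynchronousJobState:
    """Hypothetical stand-in for Galaxy's job-state class (not the real code)."""

    def __init__(self, job_file=None, output_file=None,
                 error_file=None, exit_code_file=None):
        self.job_file = job_file
        self.output_file = output_file
        self.error_file = error_file
        self.exit_code_file = exit_code_file
        # Assumption: the list is assigned per instance, inside __init__.
        self.cleanup_file_attributes = ['job_file', 'output_file',
                                        'error_file', 'exit_code_file']

    def cleanup(self):
        """Remove every file named by an attribute in cleanup_file_attributes."""
        for attr in self.cleanup_file_attributes:
            path = getattr(self, attr, None)
            if path:
                os.remove(path)  # raises FileNotFoundError if the file is absent


class ClassLevelOverride(AsynchronousJobState):
    # Shadowed: the instance attribute set by the base __init__ takes precedence.
    cleanup_file_attributes = ['job_file', 'output_file', 'error_file']


class InitOverride(AsynchronousJobState):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Filter the list only after the base class has created it.
        self.cleanup_file_attributes = [
            a for a in self.cleanup_file_attributes if a != 'exit_code_file'
        ]


print(ClassLevelOverride().cleanup_file_attributes)
# -> ['job_file', 'output_file', 'error_file', 'exit_code_file']
print(InitOverride().cleanup_file_attributes)
# -> ['job_file', 'output_file', 'error_file']
```

With InitOverride, cleanup() never looks at exit_code_file, so a missing .ec file no longer raises. If the real class defines the list at class level instead, the class-level override would work and the cause lies elsewhere.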

I'm hoping somebody can spot where the hiccup is here.  Another question
on my mind: should a failure to clean up cluster files set the dataset
state to ERROR?  Inspecting the output file from my job leads me to
believe it finished just fine, and reporting a failure to the user
because Galaxy couldn't clean up a 1-byte exit code file seems a little
extreme to me.
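The log above doesn't by itself show that the cleanup failure is what sets the ERROR state; the sketch below only illustrates the behavior the question argues for, where cleanup is best-effort and kept separate from job state. cleanup_files is a hypothetical helper, not a Galaxy API: a file that is already gone is logged at DEBUG and skipped, any other OSError is logged as a warning and reported back, and nothing in it changes the dataset state. It is written for Python 3; Galaxy at the time ran on Python 2, where the equivalent check catches OSError and compares exc.errno with errno.ENOENT.

```python
import logging
import os

log = logging.getLogger(__name__)


def cleanup_files(paths):
    """Best-effort removal of job files (hypothetical helper).

    Returns the paths that could not be removed for a reason other than
    the file already being absent. Never touches job or dataset state.
    """
    failures = []
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            # Nothing to do: a missing .ec file is not an error here.
            log.debug("Nothing to clean up at %s", path)
        except OSError as exc:
            # Permissions, directories, etc.: report, but don't fail the job.
            log.warning("Unable to cleanup %s: %s", path, exc)
            failures.append(path)
    return failures
```

A caller could log or retry the returned paths while deciding the job's outcome only from the tool's exit code and outputs.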


Branden Timm
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

To search Galaxy mailing lists use the unified search at: