job_wrapper.finish - No JSON object could be decoded:


job_wrapper.finish - No JSON object could be decoded:

Peter Cock
Hi all,

Some of our split BLAST jobs using the parallelism feature have
been failing (apparently for a while, so this is not a recent problem
from changes in the BLAST wrappers) as follows:

<quote>
The Galaxy framework encountered the following error while attempting
to run the tool:

Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/runners/tasks.py", line 141, in queue_job
    job_wrapper.finish( stdout, stderr, job_exit_code )
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 962, in finish
    if ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) and self.app.config.retry_metadata_internally ):
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/metadata.py", line 638, in external_metadata_set_successfully
    rval, rstring = simplejson.load( open( metadata_files.filename_results_code ) )
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py", line 328, in load
    use_decimal=use_decimal, **kw)
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py", line 384, in loads
    return _default_decoder.decode(s)
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py", line 402, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py", line 420, in raw_decode
    raise JSONDecodeError("No JSON object could be decoded", s, idx)
JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)

Tool execution generated the following error message:

Unable to finish job
</quote>

Reading the code, this is breaking after all the child jobs have
finished, and the data has been merged, during the external
set metadata step.

This specific example was BLAST XML output, and the merge
had produced a 4.2 GB file which ended mid-record (evidently
the output merge was not really successful).
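For context, the merge that failed here (the default merge in data.py, per the log below) is essentially a concatenation of the child outputs. A simplified sketch of that failure mode, not the actual Galaxy code:

```python
# Simplified sketch of a concatenating merge like the one in data.py
# (illustrative only; the real Galaxy code differs).
import shutil

def merge(split_files, output_file_name):
    """Concatenate the child task outputs into one result file."""
    with open(output_file_name, "wb") as fdst:
        for fsrc in split_files:
            with open(fsrc, "rb") as handle:
                shutil.copyfileobj(handle, fdst)
```

If a write fails partway through (e.g. out of disk), the partially written output is left behind, which is how a 4.2 GB file ending mid-record can arise.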

However, as a partial XML file it would be invalid. Could that
be causing problems in the metadata-setting code, and explain
the reported error?
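One way to confirm a merged XML output really is truncated (illustrative only; this is not something Galaxy's metadata code does) is to stream-parse it and see whether the parser reaches the end of the document:

```python
# Sketch: detect a truncated XML file by streaming it with ElementTree.
import xml.etree.ElementTree as ET

def xml_is_complete(path):
    """Return True if the file parses to the end, False if cut short."""
    try:
        for _event, elem in ET.iterparse(path, events=("end",)):
            elem.clear()  # keep memory flat on multi-GB BLAST XML
        return True
    except ET.ParseError:  # raised at EOF mid-record, bad tags, etc.
        return False

# A file cut off mid-record fails the check:
with open("truncated.xml", "w") as handle:
    handle.write("<BlastOutput><Iteration><Hit>")  # no closing tags
print(xml_is_complete("truncated.xml"))  # False
```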

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: job_wrapper.finish - No JSON object could be decoded:

Peter Cock
On Tue, Mar 11, 2014 at 11:44 AM, Peter Cock <[hidden email]> wrote:

> [quoted text trimmed]

Clues from my log:

2014-03-11T07:11:16.635421+00:00 ppserver galaxy.jobs.splitters.multi ERROR 2014-03-11 07:11:16,563 Error merging files
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/splitters/multi.py", line 153, in do_merge
    output_type.merge(output_files, output_file_name, **extra_merge_args)
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/data.py", line 572, in merge
    shutil.copyfileobj(open(fsrc, 'rb'), fdst)
  File "/usr/lib64/python2.6/shutil.py", line 31, in copyfileobj
    fdst.write(buf)
IOError: [Errno 28] No space left on device

2014-03-11T07:11:25.594665+00:00 ppserver galaxy.jobs.runners.tasks ERROR 2014-03-11 07:11:25,558 Job wrapper finish method failed
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/runners/tasks.py", line 141, in queue_job
    job_wrapper.finish( stdout, stderr, job_exit_code )
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 962, in finish
    if ( not self.external_output_metadata.external_metadata_set_successfully( dataset, self.sa_session ) and self.app.config.retry_metadata_internally ):
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/metadata.py", line 638, in external_metadata_set_successfully
    rval, rstring = simplejson.load( open( metadata_files.filename_results_code ) )
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py", line 328, in load
    use_decimal=use_decimal, **kw)
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/__init__.py", line 384, in loads
    return _default_decoder.decode(s)
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py", line 402, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/mnt/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.6-linux-x86_64-ucs4.egg/simplejson/decoder.py", line 420, in raw_decode
    raise JSONDecodeError("No JSON object could be decoded", s, idx)
JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)

So it looks like the root cause was a full partition, but Galaxy handled
this poorly (surfacing a JSON decode error rather than a disk-full message).
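For what it's worth, that exception is exactly what json/simplejson produce when handed an empty results file - i.e. the zero-byte file an interrupted write on a full disk tends to leave behind. A quick reproduction with the stdlib json module (modern versions word the message differently from simplejson 2.x):

```python
# Reproduce the failure: decoding a zero-byte results file.
# simplejson 2.x said "No JSON object could be decoded"; the modern
# stdlib json module says "Expecting value" - both raise a subclass
# of ValueError.
import json

try:
    json.loads("")  # the contents of an empty results-code file
except ValueError as err:
    print("decode failed:", err)
```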

This proposed patch should give a more helpful error message from the
metadata script, which is a start:

$ hg diff lib/galaxy/datatypes/metadata.py
diff -r 26f58e05aa10 lib/galaxy/datatypes/metadata.py
--- a/lib/galaxy/datatypes/metadata.py    Mon Nov 04 14:58:55 2013 -0500
+++ b/lib/galaxy/datatypes/metadata.py    Tue Mar 11 14:12:09 2014 +0000
@@ -635,7 +635,14 @@
         metadata_files = self.get_output_filenames_by_dataset( dataset, sa_session )
         if not metadata_files:
             return False # this file doesn't exist
-        rval, rstring = simplejson.load( open( metadata_files.filename_results_code ) )
+        if 0 == os.path.getsize(metadata_files.filename_results_code):
+            log.debug( 'setting metadata externally failed for %s %s: %s empty' % ( dataset.__class__.__name__, dataset.id, metadata_files.filename_results_code ) )
+            return False
+        try:
+            rval, rstring = simplejson.load( open( metadata_files.filename_results_code ) )
+        except Exception as e:
+            log.debug( 'setting metadata externally failed for %s %s: %s' % ( dataset.__class__.__name__, dataset.id, e ) )
+            return False
         if not rval:
             log.debug( 'setting metadata externally failed for %s %s: %s' % ( dataset.__class__.__name__, dataset.id, rstring ) )
         return rval

Testing this is complicated, though, without setting up a dedicated
(small) partition to deliberately run out of space...
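One Linux-specific shortcut may avoid the dedicated partition: every write to /dev/full fails immediately with ENOSPC, so the same shutil.copyfileobj error path can be exercised directly (a sketch, assuming Linux):

```python
# Sketch: trigger "No space left on device" without a small partition.
# On Linux, /dev/full accepts open() but every write fails with ENOSPC,
# exercising the same copyfileobj path the merge uses.
import errno
import io
import shutil

src = io.BytesIO(b"x" * 1024)  # stand-in for a child task's output
try:
    # buffering=0 so the failing write happens inside copyfileobj,
    # not at close time
    with open("/dev/full", "wb", buffering=0) as fdst:
        shutil.copyfileobj(src, fdst)
except OSError as err:  # IOError on Python 2
    assert err.errno == errno.ENOSPC
    print("merge failed as expected:", err)
```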

Peter

Re: job_wrapper.finish - No JSON object could be decoded:

Peter Cock
On Tue, Mar 11, 2014 at 4:04 PM, Peter Cock <[hidden email]> wrote:

> [quoted text trimmed]

Hmm. The overnight test again filled the disk, but with different
symptoms - this time the tool stdout contained:

Error merging files

And the stderr:

[Errno 28] No space left on device

plus harmless warnings logged by the child tasks, e.g.:

Selenocysteine (U) at position 18 replaced by X

My patch above seems to have helped - there is no scary
JSONDecodeError exception in the log this time:

2014-03-12T06:56:50.208222+00:00 ppserver galaxy.jobs.splitters.multi ERROR 2014-03-12 06:56:50,109 Error merging files
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/jobs/splitters/multi.py", line 153, in do_merge
    output_type.merge(output_files, output_file_name, **extra_merge_args)
  File "/mnt/galaxy/galaxy-dist/lib/galaxy/datatypes/data.py", line 572, in merge
    shutil.copyfileobj(open(fsrc, 'rb'), fdst)
  File "/usr/lib64/python2.6/shutil.py", line 31, in copyfileobj
    fdst.write(buf)
IOError: [Errno 28] No space left on device
2014-03-12T06:57:02.954064+00:00 ppserver galaxy.jobs INFO 2014-03-12 06:57:02,953 stderr for job 9798 is greater than 32K, only a portion will be logged to database

That's the good news: part of Galaxy is now reporting that there
was an error merging the files, and that it was out of disk space.
The merged BLAST XML file terminates mid-tag.

The bad news is that despite this the history entry was
green (OK).
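A defensive option for that green-history problem (purely a sketch; enough_space_for_merge is a hypothetical helper, not Galaxy code) would be to compare free space on the target filesystem against the combined part sizes before merging, so do_merge could fail the job outright rather than leave a truncated-but-green dataset:

```python
# Hypothetical pre-merge guard: refuse to merge when the target
# filesystem clearly cannot hold the combined child outputs.
import os

def enough_space_for_merge(part_paths, target_dir):
    """True if target_dir's filesystem has room for the summed parts."""
    needed = sum(os.path.getsize(path) for path in part_paths)
    stats = os.statvfs(target_dir)
    free = stats.f_bavail * stats.f_frsize  # bytes available to non-root
    return free >= needed
```

Raising an exception when this returns False would surface as a job error (red) instead of a silently truncated output.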

Peter