Workflow bug when *.loc entries are missing

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Workflow bug when *.loc entries are missing

Peter Cock
Hi all,

I was testing a new workflow which I developed on the current
stable release, Galaxy 15.03, and uploaded to the Tool Shed
yesterday:

https://github.com/peterjc/galaxy_blast/tree/master/workflows/blast_top_hit_species
http://toolshed.g2.bx.psu.edu/view/peterjc/blast_top_hit_species

I then switched to my development instance running the dev
branch from github,

1. Log in as an Admin
2. Install http://toolshed.g2.bx.psu.edu/view/peterjc/blast_top_hit_species
3. Upload suggested test dataset (in workflow README file):
http://nematode.net/Data/nacobbus_aberrans_transcript_assembly/N.abberans_reference_no_contam.zip
4. Click on "All Workflows" (bottom of left pane tool list)
5. Click on "Species of top BLAST hits (imported from uploaded file)"
6. On preview screen, examine step 3, blastx
7. Notice it will run against Protein BLAST database "nr"
8. Click "Run workflow"

As noted in the README file, this workflow assumes the local
blastdb_p.loc file includes an entry "nr" for a current/recent
mirror of the NCBI "non-redundant" protein BLAST database.

Initially my development galaxy had an empty blastdb_p.loc
file, so I expected the workflow to spot this problem at the preview
step (before ever clicking "run workflow").

Instead it proceeds and attempted to run BLASTX without no database
(which of course failed):

blastx -query "/mnt/galaxy/repositories/galaxy/database/files/000/dataset_81.dat"
-db "" -query_gencode 1 -evalue 0.001 -out
"/mnt/galaxy/repositories/galaxy/database/files/000/dataset_82.dat"
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend
sstart send evalue bitscore staxids sscinames scomnames sblastnames
sskingdoms" -num_threads "${GALAXY_SLOTS:-8}" -strand both -matrix
BLOSUM62 -seg yes -max_target_seqs 1

That's the bad news: It fails, but it fails with a reasonably clear error:

Fatal error: Exit code 2 ()
BLAST Database error: Database name is required.

Good news: If I update blastdb_p.loc to add the "nr" database it works.

Very bad news: If I update blastdb_p.loc to add other databases (but
not an entry for "nr"), the workflow defaults to the first database in
blastdb_p.loc

Summary: Workflow requesting key entry "xxx" from example.loc
(a) example.loc empty, gets empty path, fails cleanly.
(b) example.loc contains xxx, gets correct path, works.
(c) example.loc non empty but lacks xxx, runs with WRONG path.

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

Peter Cock
On Tue, Mar 31, 2015 at 11:31 AM, Peter Cock <[hidden email]> wrote:
> ...
>
> Summary: Workflow requesting key entry "xxx" from example.loc
> (a) example.loc empty, gets empty path, fails cleanly.
> (b) example.loc contains xxx, gets correct path, works.
> (c) example.loc non empty but lacks xxx, runs with WRONG path.
>
> Peter

That was running:

commit 1bfb977c7293744cf3e277e4b887bf79cacceddf
Author: John Chilton <[hidden email]>
Date:   Thu Mar 19 09:19:16 2015 -0400

Confirmed with the very latest dev branch commit,

commit b448ccb93bf432fa929fcdf520c9508fb6e5c525
Merge: db86fc4 080a5d0
Author: John Chilton <[hidden email]>
Date:   Mon Mar 30 20:33:33 2015 -0500

Problem persists on this instance after switching to the
release_15.03 tag (which ought to be the same code
on github and bitbucket - thankfully the dev branch has
not yet changed the schema, so this switchover worked):

commit a6256b547115292fb5a9270ddabd542bced182f3
Merge: a812633 65a42e9
Author: Dannon Baker <[hidden email]>

Regards,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

Peter Cock
On Tue, Mar 31, 2015 at 12:04 PM, Peter Cock <[hidden email]> wrote:

> On Tue, Mar 31, 2015 at 11:31 AM, Peter Cock <[hidden email]> wrote:
>> ...
>>
>> Summary: Workflow requesting key entry "xxx" from example.loc
>> (a) example.loc empty, gets empty path, fails cleanly.
>> (b) example.loc contains xxx, gets correct path, works.
>> (c) example.loc non empty but lacks xxx, runs with WRONG path.
>>
>> Peter
>

I've started adding debug logging to traces this, focusing on
lib/galaxy/tools/parameters/dynamic_options.py

At the "workflow preview", the BLAST database is still "nr"
(and has not yet been validated). At this point class
DynamicOptions method get_fields and get_options have
been called, so the mismatch could be spotted...

After clicking "run workflow", by the time the DynamicOptions
methods get_fields_by_value and get_field_by_name_for_value
are called "nr" has been replaced with the first entry in the
loc file.

I might need a few more clues to make any headway here.
The code in lib/galaxy/workflow/run.py is not yet clear to me...

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

Daniel Blankenberg
Hi Peter,

I just wanted to chime in and say that I have been able to reproduce this issue locally (on a different tool), and I’d consider it a pretty significant bug for reproducibility, etc.


Thanks,

Dan


On Mar 31, 2015, at 12:33 PM, Peter Cock <[hidden email]> wrote:

> On Tue, Mar 31, 2015 at 12:04 PM, Peter Cock <[hidden email]> wrote:
>> On Tue, Mar 31, 2015 at 11:31 AM, Peter Cock <[hidden email]> wrote:
>>> ...
>>>
>>> Summary: Workflow requesting key entry "xxx" from example.loc
>>> (a) example.loc empty, gets empty path, fails cleanly.
>>> (b) example.loc contains xxx, gets correct path, works.
>>> (c) example.loc non empty but lacks xxx, runs with WRONG path.
>>>
>>> Peter
>>
>
> I've started adding debug logging to traces this, focusing on
> lib/galaxy/tools/parameters/dynamic_options.py
>
> At the "workflow preview", the BLAST database is still "nr"
> (and has not yet been validated). At this point class
> DynamicOptions method get_fields and get_options have
> been called, so the mismatch could be spotted...
>
> After clicking "run workflow", by the time the DynamicOptions
> methods get_fields_by_value and get_field_by_name_for_value
> are called "nr" has been replaced with the first entry in the
> loc file.
>
> I might need a few more clues to make any headway here.
> The code in lib/galaxy/workflow/run.py is not yet clear to me...
>
> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>  https://lists.galaxyproject.org/
>
> To search Galaxy mailing lists use the unified search at:
>  http://galaxyproject.org/search/mailinglists/
>

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

Peter Cock
Thanks Dan,

That's very reassuring - will you file a Trello issue, or should I?

Peter


On Tue, Mar 31, 2015 at 5:56 PM, Daniel Blankenberg <[hidden email]> wrote:

>
> Hi Peter,
>
> I just wanted to chime in and say that I have been able to reproduce
> this issue locally (on a different tool), and I’d consider it a pretty
> significant bug for reproducibility, etc.
>
> Thanks,
>
> Dan

>>> On Tue, Mar 31, 2015 at 11:31 AM, Peter Cock <[hidden email]> wrote:
>>>> ...
>>>>
>>>> Summary: Workflow requesting key entry "xxx" from example.loc
>>>> (a) example.loc empty, gets empty path, fails cleanly.
>>>> (b) example.loc contains xxx, gets correct path, works.
>>>> (c) example.loc non empty but lacks xxx, runs with WRONG path.
>>>>
>>>> Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

John Chilton-4
I think it is probably a pair of bugs you are seeing:

The wrong database being selected is probably this:
https://trello.com/c/72WuZ8mu. It would be interesting to know if
reverting this pull request
https://bitbucket.org/galaxy/galaxy-central/pull-request/433/fix-allow-editing-workflows-on-the-fly-for/diff
fixes it - that would be a good indication this is the problem.

The tool running even though no valid inputs are found - is a sort of
known-ish issue but I cannot find any Trello card on it. I know I have
some WIP on a fix here:
https://github.com/jmchilton/galaxy/commit/900b7bbbf7f0084a86da9b737967fd8eb9d9f5fe

... and a related failing test case here.
https://github.com/galaxyproject/galaxy/blob/dev/test/api/test_tools.py#L223

The fix unfortunately is messy and will likely break stuff.

-John

On Tue, Mar 31, 2015 at 1:03 PM, Peter Cock <[hidden email]> wrote:

> Thanks Dan,
>
> That's very reassuring - will you file a Trello issue, or should I?
>
> Peter
>
>
> On Tue, Mar 31, 2015 at 5:56 PM, Daniel Blankenberg <[hidden email]> wrote:
>>
>> Hi Peter,
>>
>> I just wanted to chime in and say that I have been able to reproduce
>> this issue locally (on a different tool), and I’d consider it a pretty
>> significant bug for reproducibility, etc.
>>
>> Thanks,
>>
>> Dan
>
>>>> On Tue, Mar 31, 2015 at 11:31 AM, Peter Cock <[hidden email]> wrote:
>>>>> ...
>>>>>
>>>>> Summary: Workflow requesting key entry "xxx" from example.loc
>>>>> (a) example.loc empty, gets empty path, fails cleanly.
>>>>> (b) example.loc contains xxx, gets correct path, works.
>>>>> (c) example.loc non empty but lacks xxx, runs with WRONG path.
>>>>>
>>>>> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   https://lists.galaxyproject.org/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

Peter Cock
On Tue, Mar 31, 2015 at 6:12 PM, John Chilton <[hidden email]> wrote:
> I think it is probably a pair of bugs you are seeing:
>

Having tried the revert, yes, I think so too.

> The wrong database being selected is probably this:
> https://trello.com/c/72WuZ8mu. It would be interesting to know if
> reverting this pull request
> https://bitbucket.org/galaxy/galaxy-central/pull-request/433/fix-allow-editing-workflows-on-the-fly-for/diff
> fixes it - that would be a good indication this is the problem.

Looking at your comments on BitBucket, it was this commit in particular:
https://bitbucket.org/galaxy/galaxy-central/commits/757412c63654ff16f6cd6a0b6ff776b25e0b5b03

i.e. The commit also known as:
https://github.com/galaxyproject/galaxy/commit/667c04844e35e76a698161fff6c88cb03be8396a

Unfortunately a clean revert is not possible,

$ git revert -n 667c04844e35e76a698161fff6c88cb03be8396a
warning: too many files (created: 778 deleted: 393), skipping inexact
rename detection
Automatic revert failed.  After resolving the conflicts,
mark the corrected paths with 'git add <paths>' or 'git rm <paths>'
and commit the result.

This seems to work (after resetting the symlink static/scripts/packed
which seemed to have been a recent change):

https://github.com/peterjc/galaxy/commit/d6a2ded29c2bc404a0375fe4f6111311eea8a412

i.e. With this change, my workflow trying to use "nr" no longer
defaults to the first entry in blastdb_p.loc if "nr" was missing.

Instead, as with an empty blastdb_p.loc, the workflow runs with
an empty path for the database (and blastx fails cleanly).

This is MUCH better, since the workflow now fails rather than runs
and gives misleading data.

> The tool running even though no valid inputs are found - is a sort of
> known-ish issue but I cannot find any Trello card on it. I know I have
> some WIP on a fix here:
> https://github.com/jmchilton/galaxy/commit/900b7bbbf7f0084a86da9b737967fd8eb9d9f5fe
>
> ... and a related failing test case here.
> https://github.com/galaxyproject/galaxy/blob/dev/test/api/test_tools.py#L223
>
> The fix unfortunately is messy and will likely break stuff.
>
> -John

Running tools anyway where the expected example.loc entry is
missing is bad, but most tools would fail cleanly when given an
empty path - so this is less critical than the first half of the bug.

i.e. Can we address the side effects of Saket's commit?:

https://github.com/galaxyproject/galaxy/commit/667c04844e35e76a698161fff6c88cb03be8396a
https://bitbucket.org/galaxy/galaxy-central/commits/757412c63654ff16f6cd6a0b6ff776b25e0b5b03

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

saketkc
On 1 April 2015 at 02:06, Peter Cock <[hidden email]> wrote:

> On Tue, Mar 31, 2015 at 6:12 PM, John Chilton <[hidden email]> wrote:
>> I think it is probably a pair of bugs you are seeing:
>>
>
> Having tried the revert, yes, I think so too.
>
>> The wrong database being selected is probably this:
>> https://trello.com/c/72WuZ8mu. It would be interesting to know if
>> reverting this pull request
>> https://bitbucket.org/galaxy/galaxy-central/pull-request/433/fix-allow-editing-workflows-on-the-fly-for/diff
>> fixes it - that would be a good indication this is the problem.
>
> Looking at your comments on BitBucket, it was this commit in particular:
> https://bitbucket.org/galaxy/galaxy-central/commits/757412c63654ff16f6cd6a0b6ff776b25e0b5b03
>
> i.e. The commit also known as:
> https://github.com/galaxyproject/galaxy/commit/667c04844e35e76a698161fff6c88cb03be8396a
>
> Unfortunately a clean revert is not possible,
>
> $ git revert -n 667c04844e35e76a698161fff6c88cb03be8396a
> warning: too many files (created: 778 deleted: 393), skipping inexact
> rename detection
> Automatic revert failed.  After resolving the conflicts,
> mark the corrected paths with 'git add <paths>' or 'git rm <paths>'
> and commit the result.
>
> This seems to work (after resetting the symlink static/scripts/packed
> which seemed to have been a recent change):
>
> https://github.com/peterjc/galaxy/commit/d6a2ded29c2bc404a0375fe4f6111311eea8a412
>
> i.e. With this change, my workflow trying to use "nr" no longer
> defaults to the first entry in blastdb_p.loc if "nr" was missing.
>
> Instead, as with an empty blastdb_p.loc, the workflow runs with
> an empty path for the database (and blastx fails cleanly).
>
> This is MUCH better, since the workflow now fails rather than runs
> and gives misleading data.
>
>> The tool running even though no valid inputs are found - is a sort of
>> known-ish issue but I cannot find any Trello card on it. I know I have
>> some WIP on a fix here:
>> https://github.com/jmchilton/galaxy/commit/900b7bbbf7f0084a86da9b737967fd8eb9d9f5fe
>>
>> ... and a related failing test case here.
>> https://github.com/galaxyproject/galaxy/blob/dev/test/api/test_tools.py#L223
>>
>> The fix unfortunately is messy and will likely break stuff.
>>
>> -John
>
> Running tools anyway where the expected example.loc entry is
> missing is bad, but most tools would fail cleanly when given an
> empty path - so this is less critical than the first half of the bug.
>
> i.e. Can we address the side effects of Saket's commit?:
>
> https://github.com/galaxyproject/galaxy/commit/667c04844e35e76a698161fff6c88cb03be8396a
> https://bitbucket.org/galaxy/galaxy-central/commits/757412c63654ff16f6cd6a0b6ff776b25e0b5b03
>

Hi Peter,

Sorry, this really is a major bug. I agree this was pushed without
much testing. In fact, I did not at all consider the use case you just
pointed out. I can send a PR that simply reverts the change, since I
am not sure I will have enough time to look into it deeper.

Looks like I have broken more things than what I have contributed.

Saket

> Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

Peter Cock
On Wed, Apr 1, 2015 at 11:29 AM, Saket Choudhary <[hidden email]> wrote:

> On 1 April 2015 at 02:06, Peter Cock <[hidden email]> wrote:
>> On Tue, Mar 31, 2015 at 6:12 PM, John Chilton <[hidden email]> wrote:
>>> I think it is probably a pair of bugs you are seeing:
>>>
>>> ...
>>>
>>> The fix unfortunately is messy and will likely break stuff.
>>>
>>> -John
>>
>> Running tools anyway where the expected example.loc entry is
>> missing is bad, but most tools would fail cleanly when given an
>> empty path - so this is less critical than the first half of the bug.
>>
>> i.e. Can we address the side effects of Saket's commit?:
>>
>> https://github.com/galaxyproject/galaxy/commit/667c04844e35e76a698161fff6c88cb03be8396a
>> https://bitbucket.org/galaxy/galaxy-central/commits/757412c63654ff16f6cd6a0b6ff776b25e0b5b03
>>
>
> Hi Peter,
>
> Sorry, this really is a major bug. I agree this was pushed without
> much testing. In fact, I did not at all consider the use case you just
> pointed out. I can send a PR that simply reverts the change, since I
> am not sure I will have enough time to look into it deeper.
>
> Looks like I have broken more things than what I have contributed.
>
> Saket

Thanks Saket,

I didn't mean to criticize - only to ensure you were aware of this
(in case you had any insight about how to fix it). Galaxy is amazingly
complicated, so subtle side-effects like this can be hard to avoid -
especially on the interactive side where testing is harder.

[We don't yet have a proper test framework for workflows - although
John has been working on that.]

I don't understand this area of Galaxy at all (mako templates etc),
so I think we'll need to defer to the full time developers who know
this bit best.

Regards,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

John Chilton-4
On Wed, Apr 1, 2015 at 6:50 AM, Peter Cock <[hidden email]> wrote:

> On Wed, Apr 1, 2015 at 11:29 AM, Saket Choudhary <[hidden email]> wrote:
>> On 1 April 2015 at 02:06, Peter Cock <[hidden email]> wrote:
>>> On Tue, Mar 31, 2015 at 6:12 PM, John Chilton <[hidden email]> wrote:
>>>> I think it is probably a pair of bugs you are seeing:
>>>>
>>>> ...
>>>>
>>>> The fix unfortunately is messy and will likely break stuff.
>>>>
>>>> -John
>>>
>>> Running tools anyway where the expected example.loc entry is
>>> missing is bad, but most tools would fail cleanly when given an
>>> empty path - so this is less critical than the first half of the bug.
>>>
>>> i.e. Can we address the side effects of Saket's commit?:
>>>
>>> https://github.com/galaxyproject/galaxy/commit/667c04844e35e76a698161fff6c88cb03be8396a
>>> https://bitbucket.org/galaxy/galaxy-central/commits/757412c63654ff16f6cd6a0b6ff776b25e0b5b03
>>>
>>
>> Hi Peter,
>>
>> Sorry, this really is a major bug. I agree this was pushed without
>> much testing. In fact, I did not at all consider the use case you just
>> pointed out. I can send a PR that simply reverts the change, since I
>> am not sure I will have enough time to look into it deeper.
>>
>> Looks like I have broken more things than what I have contributed.
>>
>> Saket
>
> Thanks Saket,
>
> I didn't mean to criticize - only to ensure you were aware of this
> (in case you had any insight about how to fix it). Galaxy is amazingly
> complicated, so subtle side-effects like this can be hard to avoid -
> especially on the interactive side where testing is harder.
>
> [We don't yet have a proper test framework for workflows - although
> John has been working on that.]
>
> I don't understand this area of Galaxy at all (mako templates etc),
> so I think we'll need to defer to the full time developers who know
> this bit best.

Definitely - I will definitely take a crack at fixing it - though I
cannot do it this week. Monday I will set aside some to try to tweak
it so the fix for radio buttons will still work without actually
resubmitting every workflow parameter - if I cannot fix it I will
revert the commit.

-John

P.S.

I am with Peter on this Saket - I wouldn't feel bad - this is a really
subtle bug. I do know that feeling of being worried the bugs I have
added to Galaxy greatly out number the features though. I know that
you have not actually done this - but I have added so many more bugs
than you that I am less sure about myself.


>
> Regards,
>
> Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Workflow bug when *.loc entries are missing

Peter Cock
John,

I've filed a Trello card for this (please edit as you see fit):

https://trello.com/c/lkYlW14W/2615-workflows-break-when-select-parameter-loc-entry-is-missing

Peter

On Thu, Apr 2, 2015 at 2:11 PM, John Chilton <[hidden email]> wrote:
>
> Definitely - I will definitely take a crack at fixing it - though I
> cannot do it this week. Monday I will set aside some to try to tweak
> it so the fix for radio buttons will still work without actually
> resubmitting every workflow parameter - if I cannot fix it I will
> revert the commit.
>
> -John
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/