inconsistent state set for workflow submissions [using bioblend]

inconsistent state set for workflow submissions [using bioblend]

Aarthi Mohan
Hi all, 


I have a local Galaxy instance with a basic configuration, and I automate workflow submission to it with a script written using bioblend.

Steps within the script (a rough sketch follows the list):
  • step1: data upload
  • step2: dataset_collection creation
  • step3: workflow submission
  • step4: every 5 minutes, poll galaxy for the history status; if state == "ok", workflow execution was successful
  • step5: download the datasets marked available. 
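Roughly, the script looks like the following untested sketch (the URL, API key, workflow ID, paths, and the input mapping are placeholders, and the dataset-collection step is omitted for brevity):

import time
from bioblend.galaxy import GalaxyInstance

# Placeholders -- point these at the local instance.
gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_API_KEY")
history_id = gi.histories.create_history(name="automated-run")["id"]

# step1: data upload
upload = gi.tools.upload_file("/path/to/input.fastq", history_id)
uploaded_id = upload["outputs"][0]["id"]

# step2: dataset_collection creation (omitted in this sketch)

# step3: workflow submission -- map the first workflow input to the upload
invocation = gi.workflows.invoke_workflow(
    "WORKFLOW_ID",
    inputs={"0": {"src": "hda", "id": uploaded_id}},
    history_id=history_id,
)

# step4: every 5 minutes, poll the history and treat state == "ok" as done
while gi.histories.get_status(history_id)["state"] != "ok":
    time.sleep(300)

# step5: download the datasets marked available
for item in gi.histories.show_history(history_id, contents=True):
    if item.get("state") == "ok" and not item.get("deleted"):
        gi.datasets.download_dataset(item["id"], file_path="/tmp/results",
                                     use_default_filename=True)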

In my case, when I run workflow submissions in parallel with different datasets, for a few of them the status is set to "ok" immediately after the dataset upload and control returns to the script, even though Galaxy shows the execution as "queued" in the UI. So for these submissions, although nothing failed or raised an error, the status is misleading. I didn't find anything strange in the Galaxy logs either.

My questions, specifically, are:
1. Am I correct in using the "state" from the following call to decide that the execution is done? It has worked for the successfully completed submissions.
historyClient.get_status(history_id)['state']
2. Should I use a production setup? Strangely, none of these jobs fail and I can see them finishing in the UI; it is just that their "state" is already set to "ok", and hence the script ends early.

I'd appreciate your thoughts on this.

Galaxy version 15.03. 

Thanks for your help!

Best regards,
Aarthi


Re: inconsistent state set for workflow submissions [using bioblend]

John Chilton
The history state being 'ok' isn't a great metric for determining if
the workflow is complete. The history state essentially only tells you
whether there are datasets in certain states in the history. At the start
of the workflow the invocation may be backgrounded and still getting ready
to run, so there may be no pending datasets known to Galaxy yet and the
state may read 'ok'. The workflow invocation may also be paused waiting
on some condition and not creating datasets yet.
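For illustration, here is roughly what that status payload looks like through bioblend (an untested sketch; the URL, key, and history ID are placeholders, and the exact keys can vary by Galaxy version):

from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_API_KEY")  # placeholders
status = gi.histories.get_status("HISTORY_ID")
print(status["state"])          # can already read "ok" right after submitting the workflow
print(status["state_details"])  # per-state dataset counts -- nothing queued or running yet
                                # if the invocation has not created its datasets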

My strategy for monitoring workflow invocations is to:
 - first wait for the invocation to be completely scheduled,
 - then wait for the history and its datasets to finish (see the sketch below).
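
A rough, untested sketch of that two-phase wait with bioblend (the poll interval, timeout, and the terminal states checked here are my assumptions, not anything prescribed by the API):

import time

def wait_for_workflow(gi, workflow_id, invocation_id, history_id,
                      poll_seconds=30, timeout=3600):
    """Wait for the invocation to be scheduled, then for the history to finish."""
    deadline = time.time() + timeout

    # Phase 1: wait until the invocation itself is fully scheduled.
    while time.time() < deadline:
        invocation = gi.workflows.show_invocation(workflow_id, invocation_id)
        if invocation["state"] == "scheduled":
            break
        if invocation["state"] in ("failed", "cancelled"):
            raise RuntimeError("invocation ended in state %s" % invocation["state"])
        time.sleep(poll_seconds)

    # Phase 2: only now is the history state a meaningful signal -- wait for it.
    while time.time() < deadline:
        state = gi.histories.get_status(history_id)["state"]
        if state == "ok":
            return
        if state == "error":
            raise RuntimeError("history reached the error state")
        time.sleep(poll_seconds)

    raise RuntimeError("timed out waiting for the workflow to finish")

invoke_workflow() returns the invocation dict, so its 'id' can be passed straight into a helper like this.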

Here is some sample code in bioblend for waiting on the workflow
invocation state:

https://github.com/galaxyproject/bioblend/blob/master/bioblend/_tests/TestGalaxyWorkflows.py#L57

Here is some code in the test framework that waits on workflows by
waiting on workflow invocations and then the history state:

https://github.com/galaxyproject/galaxy/blob/dev/test/base/populators.py#L275
https://github.com/galaxyproject/galaxy/blob/dev/test/api/test_workflows.py#L1203

Even this is still probably not "the right" thing to do - I guess we
should be waiting on the jobs instead of the history.  But my guess is
waiting on the invocation will fix your problems... hopefully anyway.

Good luck and thanks for using Galaxy. Let us know how it goes.

-John






___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/