Tools with multiple input files, and multiple output files

Peter Cock
Hi all,

I'm looking for examples of tools which take multiple input
files (one or more, determined at run time) and produce
multiple output files (one for each input file). Any
specific suggestions?

I have a number of sequence filtering/renaming tools
where this might be useful - in some cases taking
multiple input files and producing a single output is
fine, but in general I'd like to know how to preserve a
one-to-one mapping from input files to output files.

I realise this may overlap slightly with the work John is
doing on dataset collections, but for now I'd like to target
the current Galaxy feature set.

In some of the simpler cases, if I have N input datasets
and want N output files, I can just run the tool N times.
This means more steps in the Galaxy GUI, but it isn't
very complicated.

However, for the current problem I need access to all
the inputs at once for setting overall data-derived
parameters.

Regards,

Peter
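
To make the requirement concrete, here is a minimal Python sketch of a tool script that needs every input at once to derive one shared parameter, yet still writes exactly one output per input. The FASTA inputs, the pooled median-length threshold and the names `read_fasta` and `filter_all` are hypothetical illustrations, not part of any existing Galaxy tool.

```python
"""Sketch: derive one parameter from ALL inputs, then write one output per input.

Hypothetical example: inputs are FASTA files, and the "overall data-derived
parameter" is the median sequence length pooled across every input file.
"""
import statistics


def read_fasta(path):
    """Yield (header, sequence) tuples from a simple FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_all(input_paths, output_paths):
    """Keep records at least as long as the pooled median length.

    The threshold needs every input at once, but each input still maps
    to exactly one output (input_paths[i] -> output_paths[i]).
    """
    if len(input_paths) != len(output_paths):
        raise ValueError("need exactly one output path per input path")
    # Pass 1: pool sequence lengths from all inputs to set the shared threshold.
    lengths = [len(seq) for path in input_paths for _, seq in read_fasta(path)]
    threshold = statistics.median(lengths) if lengths else 0
    # Pass 2: filter each input into its own output file.
    for in_path, out_path in zip(input_paths, output_paths):
        with open(out_path, "w") as out:
            for header, seq in read_fasta(in_path):
                if len(seq) >= threshold:
                    out.write(f">{header}\n{seq}\n")
    return threshold
```

The script side is straightforward; the open question in this thread is how the Galaxy tool XML can declare those N output datasets when N is only known at run time.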
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Tools with multiple input files, and multiple output files

John Chilton
Hey Peter,

Have you seen this solution?

https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run

It always seems to get mentioned when this topic is brought up. It has
serious limitations in terms of workflow running, but it can probably
be made to work for individual tool executions.

Otherwise I would wait for future feature sets :) but maybe other people
have some good ideas.

-John
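
A minimal Python sketch of the tool side of that wiki approach, assuming the legacy convention it describes: extra datasets are written into the directory passed as `$__new_file_path__`, with names of the form `primary_<output id>_<designation>_visible_<extension>`, where the output id is passed on the command line (for example as `$output1.id`). The helper names and the clean-up of the designation are assumptions for illustration; the wiki page has the exact rules.

```python
"""Sketch: one extra Galaxy dataset per input via $__new_file_path__.

Assumption: the legacy naming convention primary_<id>_<designation>_visible_<ext>,
with the output id and $__new_file_path__ given on the tool's command line.
"""
import os
import re


def primary_name(output_id, designation, ext):
    """Build a file name following the assumed primary_* convention."""
    # Fields appear to be underscore-separated, so replace underscores (and
    # other awkward characters) in the designation with hyphens.
    safe = re.sub(r"[^A-Za-z0-9.-]", "-", designation)
    return f"primary_{output_id}_{safe}_visible_{ext}"


def write_per_input_outputs(output_id, new_file_path, inputs, ext="fasta"):
    """Write one new file per (designation, text) pair and return the paths."""
    paths = []
    for designation, text in inputs:
        path = os.path.join(new_file_path, primary_name(output_id, designation, ext))
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    return paths
```

Using the input's name as the designation is one way to keep the one-to-one mapping visible in the history, since each new dataset is labelled after the input it came from.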

On Wed, Feb 19, 2014 at 7:16 AM, Peter Cock <[hidden email]> wrote:

> Hi all,
>
> I'm looking for examples of tools which take multiple input
> files (one or more, determined at run time) and produce
> multiple output files (one for each input file). Any
> specific suggestions?
>
> I have a number of sequence filtering/renaming tools
> where this might be useful - in some cases taking
> multiple input files and producing a single output is
> fine, but in general I'd like to know how to preserve a
> one-to-one mapping from input files to output files.
>
> I realise this may overlap slightly with the work John is
> doing on dataset collections, but for now I'd like to target
> the current Galaxy feature set.
>
> In some of the simpler cases, if I have N input datasets
> and want N output files, I can just run the tool N times.
> This means more steps in the Galaxy GUI, but it isn't
> very complicated.
>
> However, for the current problem I need access to all
> the inputs at once for setting overall data-derived
> parameters.
>
> Regards,
>
> Peter

Re: Tools with multiple input files, and multiple output files

Peter Cock
On Wed, Feb 19, 2014 at 2:16 PM, John Chilton <[hidden email]> wrote:
> Hey Peter,
>
> Have you seen this solution?
>
> https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run
>
> It always seems to get mentioned when this topic is brought up. It has
> serious limitations in terms of workflow running, but it can probably
> be made to work for individual tool executions.

Hmm. Any real-life examples of this in use? I guess looking for
the magic string $__new_file_path__ in the <command> tag
is the only way to identify this?

> Otherwise I would wait for future feature sets :), maybe other people
> have some good ideas however.

I'll ponder this a bit more then... thanks.

Peter
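
Peter's heuristic can be sketched in Python with the standard library's `xml.etree.ElementTree`: walk a directory of tool XML files and flag those whose `<command>` text mentions `__new_file_path__`. This is only the string search described above; it would miss tools that assemble the command through macros or tokens, and the function names are hypothetical.

```python
"""Sketch: flag tool XML files whose <command> mentions __new_file_path__.

A heuristic only: it checks the text of the <command> element for the
substring, which catches both $__new_file_path__ and ${__new_file_path__}.
"""
from pathlib import Path
import xml.etree.ElementTree as ET


def uses_new_file_path(xml_path):
    """Return True if the tool's <command> text mentions __new_file_path__."""
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError:
        return False  # not well-formed XML; skip it
    # <command> is expected as a direct child of the <tool> root element.
    command = root.find("command")
    text = "".join(command.itertext()) if command is not None else ""
    return "__new_file_path__" in text


def find_tools(tool_dir):
    """Return sorted paths of *.xml files under tool_dir that use the trick."""
    return sorted(str(p) for p in Path(tool_dir).rglob("*.xml") if uses_new_file_path(p))
```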

Re: Tools with multiple input files, and multiple output files

Carlos Borroto (cjav)
On Wednesday, February 19, 2014, Peter Cock <[hidden email]> wrote:
On Wed, Feb 19, 2014 at 2:16 PM, John Chilton <[hidden email]> wrote:
> Hey Peter,
>
> Have you seen this solution?
>
> https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_of_Output_datasets_cannot_be_determined_until_tool_run
>
> It always seems to get mentioned when this topic is brought up. It has
> serious limitations in terms or workflow running, but it can probably
> be made to work for individual tool executions.

Hmm. Any real-life examples of this in use? I guess looking for
the magic string $__new_file_path__ in the <command> tag
is the only way to identify this?

Hi Peter,

I used the current functionality here:
http://toolshed.g2.bx.psu.edu/view/cjav/split_by_barcode

The code for the underlying tool is available here:
https://github.com/cjav/ngs-tools

Hope it helps,
Carlos

Re: Tools with multiple input files, and multiple output files

Michael Crusoe

FYI: it is my understanding that such tools cannot be used in a workflow at this time.

Re: Tools with multiple input files, and multiple output files

Peter Cock
> On Feb 19, 2014 10:08 AM, "Carlos Borroto" <[hidden email]> wrote:
>>
>> Hi Peter,
>>
>> I used the current functionality here:
>> http://toolshed.g2.bx.psu.edu/view/cjav/split_by_barcode
>>
>> The code for the underlying tool is available here:
>> https://github.com/cjav/ngs-tools
>>
>> Hope it helps,
>> Carlos

Thanks - a non-trivial example to study always helps :)

On Wed, Feb 19, 2014 at 3:33 PM, Michael Crusoe
<[hidden email]> wrote:
> FYI: it is my understanding that such tools cannot be used in a workflow at
> this time.

That would be a problem :(

Peter

Re: Tools with multiple input files, and multiple output files

Carlos Borroto (cjav)
On Wed, Feb 19, 2014 at 11:04 AM, Peter Cock <[hidden email]> wrote:
> On Wed, Feb 19, 2014 at 3:33 PM, Michael Crusoe
> <[hidden email]> wrote:
>> FYI: it is my understanding that such tools cannot be used in a workflow at
>> this time.
>
> That would be a problem :(

Well, I haven't checked, but I don't see why it would be a problem if
your tool is the last one in the workflow. It cannot be an
intermediate step, as there is no easy way to link the outputs of
this kind of tool to the next tool.

--Carlos

Re: Tools with multiple input files, and multiple output files

Jim Johnson
In reply to this post by Peter Cock
I suggested augmenting the tool_conf syntax as part of the dataset collection development.

To replace the need for multiple outputs determined at run time,
I suggest being able to declare data collections within the <outputs> tag, and being able to use regular expressions in the from_work_dir parameter to populate the collections.
In workflows, one would then want to be able to connect a data collection output to a data input.


Example use case: a Mothur metagenomics tool that has one output per distance label and calculator method.

An example of declaring lists of outputs that would be determined at run time from a from_work_dir regular expression:
<tool id="mothur_classify_otu" name="Classify.otu" version="1.20.0" force_history_refresh="True">
   ...
   <outputs>
     <dataset_collection type="list" label="${tool.name} on ${on_string} consensus taxonomies">
       <data format="cons.taxonomy" name="cons_taxonomy" label="${tool.name} on ${on_string}: ${file_name}" from_work_dir="^\S+?\.(unique|[0-9.]*)\.cons\.taxonomy$" />
     </dataset_collection>
     <dataset_collection type="list" label="${tool.name} on ${on_string} taxonomy summaries">
       <data format="tax.summary" name="tax_summary" label="${tool.name} on ${on_string}: ${file_name}" from_work_dir="^\S+?\.(unique|[0-9.]*)\.cons\.tax\.summary$" />
     </dataset_collection>
   </outputs>
   ...
</tool>
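
The proposed <dataset_collection> syntax is not an existing Galaxy feature, so there is nothing to run it against. A minimal Python sketch can still show the matching step it implies: apply each from_work_dir pattern to the file names in the job's working directory and use the captured distance label (unique, or a number such as 0.03) as the collection element identifier. The sample file names and the collect helper are assumptions for illustration.

```python
"""Sketch: how a from_work_dir regular expression could select collection members.

The <dataset_collection> syntax is a proposal, not an existing Galaxy feature;
this only illustrates the matching step on a list of file names.
"""
import re

# Group 1 captures the distance label, used here as the element identifier.
CONS_TAXONOMY = re.compile(r"^\S+?\.(unique|[0-9.]*)\.cons\.taxonomy$")
TAX_SUMMARY = re.compile(r"^\S+?\.(unique|[0-9.]*)\.cons\.tax\.summary$")


def collect(pattern, file_names):
    """Map each captured label to the matching file name; skip non-matches."""
    elements = {}
    for name in sorted(file_names):
        match = pattern.match(name)
        if match:
            elements[match.group(1)] = name
    return elements
```

For example, with hypothetical working-directory files final.an.0.03.cons.taxonomy and final.an.unique.cons.taxonomy, the first collection would get two elements labelled 0.03 and unique, while the .cons.tax.summary files would go only to the second collection.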


--
James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota