Determining datatype inheritance in tool XML Cheetah

Peter Cock
Hi all,

I've just uploaded a simple sequence composition tool to the
Test Tool Shed:

https://testtoolshed.g2.bx.psu.edu/view/peterjc/seq_composition
https://github.com/peterjc/pico_galaxy/commit/45669446f5a14fd90a8a0d9d7430499de2fb3493

This accepts multiple inputs in FASTA, FASTQ, or SFF format -
and allows a mixture of these:

<inputs>
    <param name="input_file" type="data" format="fasta,fastq,sff"
           multiple="true" label="Sequence file"
           help="FASTA, FASTQ, or SFF format." />
</inputs>

In order to build the command line string, I am currently using this
for loop:

<command interpreter="python">
seq_composition.py -o "$output_file"
##For loop over inputs
#for i in $input_file
--$i.ext "${i}"
#end for
</command>

This results in things like this being run:

seq_composition.py -o XXX.dat --fastqsanger XXX.dat --sff XXX.dat

This works, but means my Python script has to know about not just
the core data types that I specified in my input parameter XML
(fasta,fastq,sff) but also any subclasses (e.g. fastqsanger).
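
To make the script-side burden concrete, here is a minimal Python sketch of
collapsing Galaxy extensions onto the three core formats. The table and the
function names are assumptions for illustration only (they are not taken from
seq_composition.py), and the FASTQ variants listed are examples rather than
a complete list.

```python
# Hypothetical lookup table: Galaxy extension -> core format the script handles.
# The FASTQ variants below are examples only, not a complete list.
CORE_FORMAT = {
    "fasta": "fasta",
    "fastq": "fastq",
    "fastqsanger": "fastq",
    "fastqillumina": "fastq",
    "fastqsolexa": "fastq",
    "sff": "sff",
}


def core_format(ext):
    """Return the core format for a Galaxy extension, or raise ValueError."""
    try:
        return CORE_FORMAT[ext.lower()]
    except KeyError:
        raise ValueError("Unsupported datatype extension: %r" % ext)


def group_inputs(pairs):
    """Group (extension, filename) pairs by core format, keeping input order."""
    grouped = {}
    for ext, filename in pairs:
        grouped.setdefault(core_format(ext), []).append(filename)
    return grouped
```

The drawback is visible in the table itself: every new subclass extension
would need another entry in the script.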

It seems what I want/need is something along these lines (in
pseudo-code), mapping any datatype which is a subclass of fastq
to a single command line option:

<command interpreter="python">
seq_composition.py -o "$output_file"
##For loop over inputs
#for i in $input_file
#if isinstance($i.datatype, fastq):
--fastq "${i}"
#else
--$i.ext "${i}"
#end if
#end for
</command>

This mock example borrows from the Python isinstance function,
but of course some Galaxy datatypes are defined as subclasses
at the XML level rather than literally at the Python class level.
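
As a toy illustration of that caveat, the Python sketch below models a
registry in which one datatype is a normal class and another is created at
run time, as if it had been declared only in configuration. All class names,
the registry dict, and is_subtype are hypothetical stand-ins, not Galaxy's
real classes or API.

```python
class Sequence(object):
    """Toy base datatype (a stand-in, not Galaxy's class)."""


class Fastq(Sequence):
    """Toy FASTQ datatype written as an ordinary Python class."""


class Sff(Sequence):
    """Toy SFF datatype."""


# Toy registry: extension -> class.
REGISTRY = {"fastq": Fastq, "sff": Sff}

# A subclass that exists only at run time, mimicking a datatype declared in
# configuration. There is no module-level name a template could import for it.
REGISTRY["fastqsanger"] = type("FastqSangerFromConfig", (Fastq,), {})


def is_subtype(ext, parent_ext):
    """True if the class registered for ext derives from that of parent_ext."""
    return issubclass(REGISTRY[ext], REGISTRY[parent_ext])
```

In this model the check has to go through extensions and the registry,
because the run-time subclass cannot be named directly in an isinstance call.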

This should result in getting the following regardless of which
flavour of FASTQ the input dataset had assigned:

seq_composition.py -o XXX.dat --fastq XXX.dat --sff XXX.dat

Does anyone have any Tool XML examples probing an input file's
datatype in this way?

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Determining datatype inheritance in tool XML Cheetah

John Chilton-4
Fun question! I have opened a pull request with my answer -
https://bitbucket.org/galaxy/galaxy-central/pull-request/457/allow-cheetah-tool-templates-to-reason/diff.

There are three different hacks you can use right now... here is a
diff against tools/filters/catWrapper.xml I was using to test them
- all of them require knowing more about the internals of Galaxy than
I really think should be exposed to the tool (or tool author).

diff --git a/tools/filters/catWrapper.xml b/tools/filters/catWrapper.xml
index ec52ba8..060362b 100644
--- a/tools/filters/catWrapper.xml
+++ b/tools/filters/catWrapper.xml
@@ -7,6 +7,11 @@
         #for $q in $queries
             ${q.input2}
         #end for
+        #import galaxy.datatypes.sequence
+        ; echo "${isinstance($input1.datatype, galaxy.datatypes.sequence.Fastq )}"
+        ; echo "$input1.datatype.matches_any([galaxy.datatypes.sequence.Fastq])"
+        ; echo "$input1.datatype.matches_any([ $__app__.datatypes_registry.get_datatype_by_extension( 'fastq' )])"
+        ; echo "$input1.is_of_type( 'fastq' )" <!-- Doesn't work yet -->
     </command>
     <inputs>
         <param name="input1" type="data" label="Concatenate Dataset"/>

I think the last variant is what you want though:
$input.is_of_type( ext ). You don't need to know the full module path
to the parent type - you are referring to it using the same extension
the rest of the tool uses and it doesn't require the use of $__app__
which... well we shouldn't be exposing to tools - it is not safe and
is a hindrance to ensuring backward compatibility.
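
For readers who want to see the intended behaviour, here is a small Python
model of what an is_of_type( ext ) check could do and how the resulting
command line would be assembled. This is only a sketch under stated
assumptions: the toy classes, TOY_REGISTRY, ToyDataset and build_args are all
hypothetical, and none of it is Galaxy's implementation.

```python
# Toy datatype hierarchy (stand-ins, not Galaxy's classes).
class Data(object):
    pass


class Fasta(Data):
    pass


class Fastq(Data):
    pass


class FastqSanger(Fastq):
    pass


class Sff(Data):
    pass


# Hypothetical extension -> class table playing the role of a registry.
TOY_REGISTRY = {
    "fasta": Fasta,
    "fastq": Fastq,
    "fastqsanger": FastqSanger,
    "sff": Sff,
}


class ToyDataset(object):
    """Minimal stand-in for an input dataset as seen by a template."""

    def __init__(self, ext, path):
        self.ext = ext
        self.path = path
        self.datatype = TOY_REGISTRY[ext]()

    def is_of_type(self, ext):
        """True if this dataset's datatype is ext or a subclass of it."""
        return isinstance(self.datatype, TOY_REGISTRY[ext])


def build_args(output_path, datasets):
    """Mirror the template loop: any FASTQ flavour becomes --fastq."""
    args = ["seq_composition.py", "-o", output_path]
    for dataset in datasets:
        if dataset.is_of_type("fastq"):
            flag = "--fastq"
        else:
            flag = "--" + dataset.ext
        args.extend([flag, dataset.path])
    return args
```

The caller names only the extension, so nothing here depends on module paths
or on an application object being visible to the template.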

Hope this helps.

-John


On Tue, Aug 12, 2014 at 11:53 AM, Peter Cock <[hidden email]> wrote:

> Does anyone have any Tool XML examples probing an input file's
> datatype in this way?
>
> Peter

Re: Determining datatype inheritance in tool XML Cheetah

Peter Cock
On Tue, Aug 12, 2014 at 5:31 PM, John Chilton <[hidden email]> wrote:

> I think the last variant is what you want though:
> $input.is_of_type( ext ). You don't need to know the full module path
> to the parent type - you are referring to it using the same extension
> the rest of the tool uses and it doesn't require the use of $__app__
> which... well we shouldn't be exposing to tools - it is not safe and
> is a hindrance to ensuring backward compatibility.
>
> Hope this helps.

That looks good John :)

I had considered something like your first hack using isinstance,
but much prefer your proposed $input.is_of_type(ext) solution :)

Peter