Data Tables and *.loc files: Using named columns versus from_data_table

Data Tables and *.loc files: Using named columns versus from_data_table

Peter Cock
Hi all,

In discussion about adding an NCBI BLAST data manager
https://github.com/peterjc/galaxy_blast/issues/22 based on
Dan's example, Michael Li has suggested using the new(ish)
Data Table functionality of Galaxy for using *.loc files:
https://wiki.galaxyproject.org/Admin/Tools/Data%20Tables

Currently the BLAST+ wrappers access the blastdb.loc file
for picking a system-installed nucleotide BLAST database
like this:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_file="blastdb.loc">
            <column name="value" index="0"/>
            <column name="name" index="1"/>
            <column name="path" index="2"/>
        </options>
    </param>

See https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/ncbi_macros.xml
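
As a rough model of what the from_file form asks for (a simplified Python sketch, not Galaxy's actual parsing code), each non-comment line of blastdb.loc is split on tabs, and the index attributes pick out the value, name and path fields. The function options_from_loc and the sample .loc contents are hypothetical:

```python
# Hypothetical sketch of <options from_file="..."> with <column index="...">.
# Illustration only; this is not Galaxy's own implementation.

# Column name -> zero-based field index, as in the <column> tags above.
COLUMNS = {"value": 0, "name": 1, "path": 2}


def options_from_loc(text, columns, comment_char="#"):
    """Return one dict per data line of a .loc file, keyed by column name."""
    options = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        # Raises IndexError if a line has fewer fields than the indexes need.
        options.append({name: fields[idx] for name, idx in columns.items()})
    return options


# Hypothetical blastdb.loc contents (tab-separated: value, name, path).
SAMPLE_LOC = (
    "# value\tname\tpath\n"
    "nt\tNCBI nt\t/data/blast/nt\n"
    "\n"
    "refseq\tRefSeq genomic\t/data/blast/refseq_genomic\n"
)

if __name__ == "__main__":
    for opt in options_from_loc(SAMPLE_LOC, COLUMNS):
        print(opt["name"], "->", opt["value"], opt["path"])
```

The point of the sketch is that every tool using this select has to repeat the same name-to-index mapping, which is what the macros file above centralises within the suite.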

With the from_data_table feature this would be much shorter:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_data_table="blastdb" />
    </param>

For this to work, the column information must instead be
defined centrally in ``tool_data_table_conf.xml`` (via a
``tool_data_table_conf.xml.sample`` file), e.g.

    <table name="blastdb" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/blastdb.loc" />
    </table>
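
A similarly hedged sketch of the data table version: the column names are read once from the <table> definition (here with Python's standard xml.etree.ElementTree) and applied by position to each .loc line, which is roughly the lookup a from_data_table option relies on. The helpers read_table_definition and load_table, and the sample .loc line, are hypothetical:

```python
# Hypothetical sketch: column names live in one <table> definition
# instead of in every tool's <options> block. Not Galaxy's own code.
import xml.etree.ElementTree as ET

TABLE_XML = """
<table name="blastdb" comment_char="#">
    <columns>value, name, path</columns>
    <file path="tool-data/blastdb.loc" />
</table>
"""


def read_table_definition(xml_text):
    """Return (table name, comment char, column names, .loc path)."""
    table = ET.fromstring(xml_text.strip())
    columns = [c.strip() for c in table.find("columns").text.split(",")]
    return (
        table.get("name"),
        table.get("comment_char", "#"),
        columns,
        table.find("file").get("path"),
    )


def load_table(loc_text, columns, comment_char="#"):
    """Map each tab-separated data line onto the named columns, by position."""
    rows = []
    for line in loc_text.splitlines():
        if not line.strip() or line.startswith(comment_char):
            continue
        # zip() silently drops surplus or missing trailing fields.
        rows.append(dict(zip(columns, line.split("\t"))))
    return rows


if __name__ == "__main__":
    name, comment_char, columns, path = read_table_definition(TABLE_XML)
    print(name, columns, path)
    # Hypothetical contents of tool-data/blastdb.loc.
    loc = "# comment\nnt\tNCBI nt\t/data/blast/nt\n"
    print(load_table(loc, columns, comment_char))
```

Because zip() does not complain about a mismatched field count, a real loader would probably want to validate each line against the column list.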

For simple tools this seems quite neat, but within a single tool
suite, XML macros seem equally effective for defining the *.loc
file columns in one place (this is what we do currently).

However, what worries me is that the data table XML configuration
file adds new complexity to dependency management between
different ToolShed repositories that share a *.loc file (like the
*.loc files for BLAST databases).

For the BLAST database *.loc files, the simplest solution seems
to be not using the Data Tables feature at all (as we do now).

The next best solution seems to be to put the sample *.loc files
and associated data table definition XML files into a shared
ToolShed repository (called blast_data_tables, or
blast_databases?) which would be declared as a dependency
of anything using the BLAST database *.loc files (e.g. the
BLAST+ wrappers and any data managers).

[This would be like the existing blast_datatypes ToolShed
repository, which is a declared dependency of many tools
using BLAST.]

Is that a good plan? What benefits does it have over simply
not using the Data Table functionality?

Thanks,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Data Tables and *.loc files: Using named columns versus from_data_table

Daniel Blankenberg
Hi Peter,

Having a standalone repository that just contained the tool data table and .loc file, which could be a dependency of other repositories, would be a good way to go here. Unfortunately, this isn’t supported right now. I’ve opened a Trello card for this: https://trello.com/c/VZxV08Qt

However, even though you currently need to include the tool data table definition and .loc sample in each repository in order for the tool to be valid, it is still a best practice to use tool data tables.
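
One consequence of shipping a copy of the table definition in each repository is that the copies could drift apart. A hypothetical check (not an existing Galaxy or ToolShed feature) could compare the <columns> declared for each table name across the repositories' tool_data_table_conf.xml.sample files, assuming each file has the usual <tables> root element. The repository names below are made up:

```python
# Hypothetical drift check; not part of Galaxy or the ToolShed.
import xml.etree.ElementTree as ET


def table_columns(xml_text):
    """Map table name -> tuple of column names in one <tables> document."""
    root = ET.fromstring(xml_text.strip())
    return {
        table.get("name"): tuple(
            c.strip() for c in table.find("columns").text.split(",")
        )
        for table in root.iter("table")
    }


def find_conflicts(samples):
    """samples: repository name -> sample XML text.

    Return {table name: {column tuple: [repositories]}} for every table
    whose column list differs between repositories.
    """
    seen = {}
    for repo, xml_text in samples.items():
        for name, columns in table_columns(xml_text).items():
            seen.setdefault(name, {}).setdefault(columns, []).append(repo)
    # Only tables declared with more than one distinct column list conflict.
    return {name: by_cols for name, by_cols in seen.items() if len(by_cols) > 1}


if __name__ == "__main__":
    # Hypothetical sample files from two repositories that disagree.
    a = """<tables>
      <table name="blastdb" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/blastdb.loc" />
      </table>
    </tables>"""
    b = a.replace("value, name, path", "value, path")
    print(find_conflicts({"repo_a": a, "repo_b": b}))
```

A shared dependency repository, as proposed above, would make such a check unnecessary, since there would be only one copy of each definition.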


Thanks,

Dan

On Apr 9, 2014, at 7:04 AM, Peter Cock wrote:

> Is that a good plan? What benefits does it have over simply
> not using the Data Table functionality?



Re: Data Tables and *.loc files: Using named columns versus from_data_table

Peter Cock
On Wed, Apr 9, 2014 at 4:14 PM, Daniel Blankenberg wrote:

> Hi Peter,
>
> Having a standalone repository that just contained the tool data table
> and .loc file that could be a dependency of other repositories would
> be a good way to go here. Unfortunately, this isn’t supported right
> now. I’ve opened a trello card for this: https://trello.com/c/VZxV08Qt
>
> However, even though you currently need to include the tool data table
> definition and .loc sample in each repository in order for the tool to be
> valid, it is still a best practice to use tool data tables.

OK, thanks Dan.

Peter
