a question on data manager and data table


Rui Wang
Hi All,

I just made a new local instance, installed the fetch_all_fasta data manager, downloaded mm9 fasta. Then I noticed the following:

in tool_data_table_conf.xml, it has:

    <table name="all_fasta" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/all_fasta.loc" />
    </table>

however in shed_tool_data_table_conf.xml, it has:

    <table comment_char="#" name="all_fasta">
        <columns>value, dbkey, name, path</columns>
        <file path="/auto/rcf-proj/yc1/galaxy-suite/galaxy-dist/tool-data/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_all_fasta/cca219f2b212/all_fasta.loc" />
        <tool_shed_repository>
            <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
            <repository_name>data_manager_fetch_genome_all_fasta</repository_name>
            <repository_owner>devteam</repository_owner>
            <installed_changeset_revision>cca219f2b212</installed_changeset_revision>
        </tool_shed_repository>
    </table>

in tool-data, all_fasta.loc is empty, but in the all_fasta.loc of the shed_tool entry, it shows:

mm9     mm9     Mouse July 2007 (NCBI37/mm9) (mm9)      /auto/rcf-proj/yc1/galaxy-suite/galaxy-dist/tool-data/mm9/seq/mm9.fa

So when I run the "Extract Genomic DNA" tool, I can see that the parameter passed on the command line is

-g "/auto/rcf-proj/yc1/galaxy-suite/galaxy-dist/tool-data" 

which does not host the data. It should be at least

-g "/auto/rcf-proj/yc1/galaxy-suite/galaxy-dist/tool-data/mm9/seq"

I thought the data manager would automatically populate these loc files. Am I missing something obvious? I could manually modify tool_data_table_conf.xml to let it point to


This would work, but it is ugly. Could someone please give me a hand to fix this?

Thanks,
Rui

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: a question on data manager and data table

Daniel Blankenberg
Hi Rui,

The Extract Genomic DNA Tool has not yet been updated to work with tool Data Tables. 

Additionally, this tool requires TwoBit formatted files when selecting from built-in data — when selecting a FASTA from your history, the fasta file is converted to TwoBit internally in the tool before extracting sequence chunks.

There is a TwoBit builder Data Manager under development (https://github.com/galaxyproject/tools-devteam/tree/master/data_managers/data_manager_twobit_builder) that hasn’t made it out to the ToolShed just yet, but it won’t be helpful until the Extract Genomic DNA Tool is updated: https://trello.com/c/8unnSp7H


Thanks for using Galaxy,

Dan


Re: a question on data manager and data table

Daniel Blankenberg
Hi Rui,

(cc’d back to list)


One of the nice things about tool Data Tables is that they allow us to have multiple ‘.loc’ files as the source of the content for a single data table. We can have many ‘all_fasta.loc’ files but one ‘all_fasta’ tool data table with merged contents, and a tool working on the ‘all_fasta’ data table doesn’t know or care about the actual .loc files on disk. Any tool that uses the ‘all_fasta’ table in an input parameter can then simply reference input.fields.path to have the actual path to the FASTA file passed to the tool.
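To make the merging idea concrete, here is a minimal Python sketch. It is not Galaxy's actual loader; it only assumes that .loc files are tab-separated, that lines starting with the comment character are ignored, and that the columns are the ones listed in the `<columns>` element quoted earlier in the thread. The helper names `read_loc` and `merge_rows` are made up for illustration, and dropping exact duplicate rows is a choice of this sketch rather than documented Galaxy behaviour.

```python
import io

# Column names come from the <columns> element quoted earlier in this thread.
COLUMNS = ["value", "dbkey", "name", "path"]


def read_loc(handle, columns=COLUMNS, comment_char="#"):
    """Parse one tab-separated .loc file into a list of row dicts."""
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith(comment_char):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < len(columns):
            continue  # skip lines with too few columns
        rows.append(dict(zip(columns, fields)))
    return rows


def merge_rows(*row_lists, columns=COLUMNS):
    """Concatenate rows from several .loc files, dropping exact duplicates."""
    merged, seen = [], set()
    for rows in row_lists:
        for row in rows:
            key = tuple(row[c] for c in columns)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# In-memory stand-ins for the two all_fasta.loc files discussed in this thread:
# the empty one in tool-data/ and the one written by the data manager.
core_loc = io.StringIO("# no entries yet\n")
shed_loc = io.StringIO(
    "mm9\tmm9\tMouse July 2007 (NCBI37/mm9) (mm9)\t"
    "/auto/rcf-proj/yc1/galaxy-suite/galaxy-dist/tool-data/mm9/seq/mm9.fa\n"
)

all_fasta = merge_rows(read_loc(core_loc), read_loc(shed_loc))

# Roughly what input.fields.path resolves to once a row has been selected.
path_for = {row["value"]: row["path"] for row in all_fasta}
print(path_for["mm9"])
```

The point of the sketch is that the consumer only ever sees `all_fasta`; which file on disk contributed a given row is invisible to it.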


1.which file should the downloaded fasta data modify? tool_data_table_conf.xml,  or shed_tool_data_table_conf.xml?

Tools installed from the ToolShed that include data tables will programmatically modify the shed_ file; the other file is for manual admin changes.

2. which loc file is the primary one for tools to reference? the one in $GALAXYHOME/tool-data, or the one in $GALAXYHOME/tool-data//toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_all_fasta/cca219f2b212/? If first one, then should the data manager update that one too(right now it didn't)?

Tools shouldn’t be referencing .loc files directly any more, and instead should use the tool data table abstraction.

3. For now, that tool still uses alignseq.loc, I need to manually update that file, correct?

Any tool that makes direct use of a .loc file will need to have that specific file modified.

4. How to determine if a tool is using data table is to look at the tool xml file to see if it contains reference to a data table, right? And in tool_data_table_conf.xml, it should contain data tables definition for all the possible data, right?

a.) Yes, you’ll have to look at the tool.xml, although we could add another section to the ToolShed preview to include this information if it is useful.

b.) tool_data_table_conf.xml should contain the data tables distributed with the Galaxy distribution and any manually configured data tables. tool_data_table_conf.xml and shed_tool_data_table_conf.xml will be merged into one set of tool data tables: a tool data table does not need to be in tool_data_table_conf.xml if it is in shed_tool_data_table_conf.xml, and vice versa (but it can be in both).
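Both halves of point 4 can be checked with a short script. The following is a rough sketch using only the Python standard library. It assumes that a tool declares its data table through the `from_data_table` attribute on an `<options>` element, and that both conf files wrap their `<table>` entries in a `<tables>` root. The example tool (`example_tool`) and the function names are hypothetical, not taken from any real repository.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SHED_LOC = (
    "/auto/rcf-proj/yc1/galaxy-suite/galaxy-dist/tool-data/"
    "toolshed.g2.bx.psu.edu/repos/devteam/"
    "data_manager_fetch_genome_all_fasta/cca219f2b212/all_fasta.loc"
)

# Trimmed copies of the two conf files quoted in this thread.
CORE_CONF = """<tables>
    <table name="all_fasta" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/all_fasta.loc" />
    </table>
</tables>"""

SHED_CONF = f"""<tables>
    <table comment_char="#" name="all_fasta">
        <columns>value, dbkey, name, path</columns>
        <file path="{SHED_LOC}" />
    </table>
</tables>"""

# Hypothetical tool wrapper, for illustration only.
EXAMPLE_TOOL = """<tool id="example_tool" name="Example">
    <command>example --fasta "${ref.fields.path}"</command>
    <inputs>
        <param name="ref" type="select" label="Reference genome">
            <options from_data_table="all_fasta" />
        </param>
    </inputs>
</tool>"""


def data_tables_used(tool_xml):
    """Return the sorted names of data tables a tool XML refers to."""
    root = ET.fromstring(tool_xml)
    names = {opt.get("from_data_table") for opt in root.iter("options")}
    return sorted(name for name in names if name)


def loc_files_by_table(*conf_xml):
    """Map each table name to every .loc path declared across the conf files."""
    tables = defaultdict(list)
    for xml_text in conf_xml:
        root = ET.fromstring(xml_text)
        for table in root.iter("table"):
            for file_el in table.findall("file"):
                tables[table.get("name")].append(file_el.get("path"))
    return dict(tables)


print(data_tables_used(EXAMPLE_TOOL))
print(loc_files_by_table(CORE_CONF, SHED_CONF))
```

A tool for which `data_tables_used` returns an empty list is probably still reading a .loc file directly, in which case that specific file has to be edited by hand as described under point 3.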


Thanks for using Galaxy,

Dan


On Apr 7, 2015, at 12:47 PM, Beginner TI <[hidden email]> wrote:

Hi Dan,

Thanks for the reply!

So, what would be the expected behavior for the tool that uses data table? For example, in my case, the tool_data_table_conf.xml would be modified to reflect the update to all_fasta.loc?

I'm a little puzzled about how this works. As you can see, after I installed that data manager and downloaded the mm9 fasta data, it modified the file shed_tool_data_table_conf.xml. This should have nothing to do with the Extract Genomic DNA tool, am I right? So, my questions are:

1.which file should the downloaded fasta data modify? tool_data_table_conf.xml,  or shed_tool_data_table_conf.xml?

2. which loc file is the primary one for tools to reference? the one in $GALAXYHOME/tool-data, or the one in $GALAXYHOME/tool-data//toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_all_fasta/cca219f2b212/? If first one, then should the data manager update that one too(right now it didn't)?

3. For now, that tool still uses alignseq.loc, I need to manually update that file, correct?

4. How to determine if a tool is using data table is to look at the tool xml file to see if it contains reference to a data table, right? And in tool_data_table_conf.xml, it should contain data tables definition for all the possible data, right?

If you could please elaborate a bit, I'd really appreciate it!

Thanks,
Rui



