Tabular file metadata - columns names

Tabular file metadata - columns names

Peter-2
Hi all,

I'd like to know more about Galaxy's column metadata for tabular files.
In the workflow editor under "Edit Step Actions" you can pick "Assign
Columns", and then give column numbers for five predefined cases:
Chrom, Start, End, Strand, Name.

Do these "named columns" get shown anywhere in the Galaxy UI?
For example, in a column select parameter widget?

Is it possible to assign these columns in a tool's wrapper XML file?
From http://bitbucket.org/galaxy/galaxy-central/wiki/ToolConfigSyntax
I'm aware of the metadata_source attribute to *copy* the meta data
from the input file, but that isn't always relevant. Can I somehow
specify that my tool has tabular output where column 1 is "Name"?

Is it possible to introduce additional column types? e.g. "evalue" or
"Description".

Thanks,

Peter
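
As a rough illustration of the idea being asked about (this is not Galaxy's actual API; every name below is hypothetical), per-column roles could be modelled as a mapping from a role name to a 1-based column number, with the five predefined roles extended by tool-specific ones such as "evalue":

```python
# Hypothetical model of named columns; none of these names come from Galaxy.
INTERVAL_ROLES = ("chrom", "start", "end", "strand", "name")


def assign_columns(assignments, n_columns, allowed_roles=INTERVAL_ROLES):
    """Validate a {role: 1-based column number} mapping and return a copy."""
    checked = {}
    for role, column in assignments.items():
        if role not in allowed_roles:
            raise ValueError(f"unknown column role: {role!r}")
        if not 1 <= column <= n_columns:
            raise ValueError(f"column {column} is outside 1..{n_columns}")
        checked[role] = column
    return checked


def get_field(line, roles, role):
    """Fetch the value for a named role from one tab-separated line."""
    return line.rstrip("\n").split("\t")[roles[role] - 1]


# Extend the predefined roles with tool-specific ones (an assumption made
# for illustration, mirroring the "evalue"/"Description" question above).
blast_roles = assign_columns(
    {"name": 1, "evalue": 11},
    n_columns=12,
    allowed_roles=INTERVAL_ROLES + ("evalue", "description"),
)
```

With a mapping like this attached to a dataset, a column picker could show "evalue" rather than a bare column number.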


Re: Tabular file metadata - columns names

Peter-2
On Fri, Nov 19, 2010 at 12:20 PM, Peter <[hidden email]> wrote:
> Hi all,
>
> I'd like to know more about Galaxy's column metadata for tabular files.
> In the workflow editor under "Edit Step Actions" you can pick "Assign
> Columns", and then give column numbers for five predefined cases:
> Chrom, Start, End, Strand, Name.
>
> Do these "named columns" get shown anywhere in the Galaxy UI?
> For example, in a column select parameter widget?

I've spotted some of these in the "peep" view (the right hand side
history column) for interval data. Are they specific to interval data
only, with no general mechanism available for other tabular data?

> Is it possible to assign these columns in a tool's wrapper XML file?
> From http://bitbucket.org/galaxy/galaxy-central/wiki/ToolConfigSyntax
> I'm aware of the metadata_source attribute to *copy* the meta data
> from the input file, but that isn't always relevant. Can I somehow
> specify that my tool has tabular output where column 1 is "Name"?
>
> Is it possible to introduce additional column types? e.g. "evalue" or
> "Description".
>
> Thanks,
>
> Peter
>

Peter


Re: Tabular file metadata - columns names

Peter Cock
Bumping this old query:

On Mon, Nov 22, 2010 at 10:11 AM, Peter <[hidden email]> wrote:

> <cut>

Is there any mechanism for tools to set tabular files' column metadata?

Peter


Re: Tabular file metadata - columns names

Peter Cock
Bumping one of my old queries again, with some more use-cases at the end,

On Wed, Aug 10, 2011 at 11:28 AM, Peter Cock <[hidden email]> wrote:

> <cut>

Hi all,

I was prompted to return to this issue after going through a fairly
simple BLAST data analysis flow with a biology colleague - and
being reminded just how non-obvious some of the task steps
were [*]. Galaxy could still be much easier to use.

Most of my protein analysis tool wrappers output tabular files,
where column 1 is the query name, and the rest of the columns
will be some sort of predictive model outcome or score. I do of
course document the column meanings in the tool's help (and
include a #header line in the output where possible), but this
could be much more user friendly.
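
One way a "#header" line like this could be consumed downstream is sketched below. It assumes a tab-separated file whose first line starts with "#" and lists the column names; the function names are made up for illustration and are not part of Galaxy.

```python
import io


def read_header_names(handle):
    """Return column names from a leading '#' header line, or None if absent."""
    first = handle.readline()
    if not first.startswith("#"):
        return None
    return first[1:].rstrip("\r\n").split("\t")


def column_number(names, wanted):
    """Translate a column name into the 1-based number a column picker needs."""
    return names.index(wanted) + 1


# Small in-memory example; the column names here are invented.
example = io.StringIO("#ID\tScore\tPrediction\nprot1\t0.93\tsecreted\n")
names = read_header_names(example)
```

If the names were stored as dataset metadata when the file is created, later tools would not need to re-read the header at all.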

A specific example is BLAST+ tabular output - where I have taken
pains to document the columns in the tools' help text, but it would
be much nicer to be able to annotate the columns within Galaxy's
metadata as well. If this isn't possible in the base 'tabular' datatype,
can it be done as a custom 'blast-tabular' datatype instead?

This relates to an open issue on improving the parameter widget
for selecting a column (or columns):

Bitbucket issue 554: Show column names, headers or first entry in
column select parameters
https://trello.com/card/554-show-column-names-headers-or-first-entry-in-column-select-parameters/506338ce32ae458f6d15e4b3/436

For instance, when sorting a BLAST tabular file, it should be
trivially easy to sort by bitscore without first having to go away
and look up the meaning of each column in order to know that this
is column 12.
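
To make the point concrete, here is a minimal sketch of sorting by column name instead of by number. The twelve field names are those of the standard BLAST+ tabular output (-outfmt 6); the helper function itself is hypothetical.

```python
# Field names of the standard 12 column BLAST+ tabular output (-outfmt 6).
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def sort_by_column(lines, column_name, columns=STD_COLUMNS, reverse=True):
    """Sort tab-separated lines numerically on a column chosen by name."""
    index = columns.index(column_name)
    return sorted(
        lines,
        key=lambda line: float(line.rstrip("\n").split("\t")[index]),
        reverse=reverse,
    )


hits = [
    "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150\n",
    "q1\ts2\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t190\n",
]
best_first = sort_by_column(hits, "bitscore")  # highest bitscore first
```

The sketch assumes every line is a data line; a "#" header line would need to be skipped before sorting.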

Regards,

Peter

[*] This is one reason why I've just switched the default BLAST+
output from the standard 12 column output to the extended 24
column output in v0.0.17 of the wrappers:
http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: Tabular file metadata - columns names

cjav
On Wed, Feb 20, 2013 at 9:57 AM, Peter Cock <[hidden email]> wrote:
> [*] This is one reason why I've just switched the default BLAST+
> output from the standard 12 column output to the extended 24
> column output in v0.0.17 of the wrappers:
> http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/

Hi Peter,

Would you consider adding the option to set a custom tabular output? I
would like to be able to select exactly which fields to include. For
example, I rarely need the alignment data, and if I choose the 24
column output I'll be wasting a lot of space by including it.

I'll be happy to provide a patch.

And yes, I'll be very happy to be able to set custom column names in
the workflow for tabular outputs.

Regards,
Carlos

Re: Tabular file metadata - columns names

Peter Cock
On Thu, Feb 21, 2013 at 6:12 PM, Carlos Borroto
<[hidden email]> wrote:

> On Wed, Feb 20, 2013 at 9:57 AM, Peter Cock <[hidden email]> wrote:
>> [*] This is one reason why I've just switched the default BLAST+
>> output from the standard 12 column output to the extended 24
>> column output in v0.0.17 of the wrappers:
>> http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/
>
> Hi Peter,
>
> Would you consider adding the option to set a custom tabular output? I
> would like to be able to select exactly which fields to include. For
> example, I rarely need the alignment data, and if I choose the 24
> column output I'll be wasting a lot of space by including it.
>
> I'll be happy to provide a patch.
>
> And yes, I'll be very happy to be able to set custom column names in
> the workflow for tabular outputs.
>
> Regards,
> Carlos

Hi Carlos,

I had deliberately avoided letting users pick the columns - it is
doable, but has two major downsides. First a more complex
GUI (if we don't allow the order to change then it is still ~24
options), and then the worse problem of it being hard to know
what the output columns are in later work. If the columns are
consistent, it is much easier to write general instructions (e.g.
for filtering on percentage identity). If Galaxy let us label the
columns on the current 'tabular' format, then I'd be more
positive about this, but until that happens I would prefer not
to offer arbitrary columns in the BLAST tabular output.
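
The benefit of a fixed layout can be shown with a short sketch: because percentage identity is always column 3 of the standard 12 column output, one filter works on every such file. The function name and threshold are illustrative only.

```python
PIDENT_COLUMN = 3  # 1-based position of pident in the standard 12 column layout


def filter_min_identity(lines, min_pident, column=PIDENT_COLUMN):
    """Yield data lines whose percentage identity is at least min_pident."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip optional header or comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[column - 1]) >= min_pident:
            yield line
```

With user-chosen columns the constant above could no longer be hard-coded, which is exactly the problem described.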

In the meantime, would you prefer I revert the default to 12
column tabular output? Just how big are your BLAST files if
the extra disk space is a serious concern (compared to raw
sequencing data)?

(Separately I was asking about how to offer automatic datatype
conversion - that would allow easy conversion of BLAST XML
or even BLAST archive ASN.1 format into tabular on demand,
making them viable default output formats from a usability point
of view - but both of those are larger than the default 12 column
or even 24 column tabular formats.)
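
For a sense of what such a conversion involves, below is a minimal sketch that turns BLAST XML into a reduced six column table using only the Python standard library. It is not the converter shipped with the wrappers. It assumes the query identifier is the first word of the Iteration_query-def element and the subject identifier is Hit_id, and it omits columns such as mismatch and gapopen that need more work to derive.

```python
import xml.etree.ElementTree as ET


def blast_xml_to_rows(xml_text):
    """Yield (qseqid, sseqid, pident, length, evalue, bitscore) per HSP."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        # Assumption: the query ID is the first word of the definition line.
        qseqid = iteration.findtext("Iteration_query-def").split(None, 1)[0]
        for hit in iteration.iter("Hit"):
            sseqid = hit.findtext("Hit_id")
            for hsp in hit.iter("Hsp"):
                length = int(hsp.findtext("Hsp_align-len"))
                identical = int(hsp.findtext("Hsp_identity"))
                yield (
                    qseqid,
                    sseqid,
                    "%.2f" % (100.0 * identical / length),
                    str(length),
                    hsp.findtext("Hsp_evalue"),
                    hsp.findtext("Hsp_bit-score"),
                )


def to_tabular(xml_text):
    """Join the rows into tab-separated text, one HSP per line."""
    return "".join("\t".join(row) + "\n" for row in blast_xml_to_rows(xml_text))
```

This parses the whole document in memory, so a real converter for large BLAST runs would more likely stream the file, for example with `xml.etree.ElementTree.iterparse`.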

Regards,

Peter

Re: Tabular file metadata - columns names

cjav
On Thu, Feb 21, 2013 at 2:03 PM, Peter Cock <[hidden email]> wrote:

> On Thu, Feb 21, 2013 at 6:12 PM, Carlos Borroto
> <[hidden email]> wrote:
>> On Wed, Feb 20, 2013 at 9:57 AM, Peter Cock <[hidden email]> wrote:
>>> [*] This is one reason why I've just switched the default BLAST+
>>> output from the standard 12 column output to the extended 24
>>> column output in v0.0.17 of the wrappers:
>>> http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/
>>
>> Hi Peter,
>>
>> Would you consider adding the option to set a custom tabular output? I
>> would like to be able to select exactly which fields to include. For
>> example, I rarely need the alignment data, and if I choose the 24
>> column output I'll be wasting a lot of space by including it.
>>
>
> Hi Carlos,
>
> I had deliberately avoided letting users pick the columns - it is
> doable, but has two major downsides. First a more complex
> GUI (if we don't allow the order to change then it is still ~24
> options), and then the worse problem of it being hard to know
> what the output columns are in later work. If the columns are
> consistent, it is much easier to write general instructions (e.g.
> for filtering on percentage identity). If Galaxy let us label the
> columns on the current 'tabular' format, then I'd be more
> positive about this, but until that happens I would prefer not
> to offer arbitrary columns in the BLAST tabular output.
>

I was thinking more of a text box where an advanced user could type
in the field names that would be appended to '-outfmt "6 ...."'. We
could use the current 24 column field names as the default value, to
help guide the user to what is possible. I see your point that this
would make it harder to know later what each column is, but this is
an advanced use and could be labelled as such.
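
A minimal sketch of how such a text box might be handled is given below. The set of accepted field names is an illustrative subset of the BLAST+ format specifiers (the standard twelve plus a few others), and the function names are hypothetical. Checking the text against a known list also keeps arbitrary input out of the command line.

```python
# Standard 12 fields plus a few other BLAST+ format specifiers; this is an
# illustrative subset, not the full list that BLAST+ accepts.
STD_FIELDS = (
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore"
).split()
KNOWN_FIELDS = set(STD_FIELDS) | {"qseq", "sseq", "qlen", "slen", "nident", "gaps"}


def build_outfmt(user_text):
    """Turn free text such as 'qseqid sseqid bitscore' into an -outfmt value."""
    fields = user_text.split()
    if not fields:
        fields = list(STD_FIELDS)  # fall back to the standard columns
    unknown = [field for field in fields if field not in KNOWN_FIELDS]
    if unknown:
        raise ValueError("unrecognised field(s): " + ", ".join(unknown))
    return "6 " + " ".join(fields)


def outfmt_arguments(user_text):
    """Return the argument pair to append to a BLAST+ argument list."""
    return ["-outfmt", build_outfmt(user_text)]
```

Passing the result as list items to a process launcher, instead of pasting it into a shell string, avoids quoting problems with the embedded spaces.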

> In the meantime, would you prefer I revert the default to 12
> column tabular output? Just how big are your BLAST files if
> the extra disk space is a serious concern (compared to raw
> sequencing data)?
>

I'm fine with 24 being the default. In fact I'm currently rerunning a
big BLAST job precisely because I have now realized I need more
information than the 12 columns provide. I'm expecting 25 million hits
with HSPs of 300 bp, and I expect the output to double or triple in
size. The current output is around 2.1 GB and the raw sequencing data
is 1 GB. As you can see, it would be nice to get rid of the alignments
if I'm not using them.

> (Separately I was asking about how to offer automatic datatype
> conversion - that would allow easy conversion of BLAST XML
> or even BLAST archive ASN.1 format into tabular on demand,
> making them viable default output formats from a usability point
> of view - but both of those are larger than the default 12 column
> or even 24 column tabular formats.)
>

This would be great and I will probably use it for smaller blast jobs.

Best,
Carlos

Re: Tabular file metadata - columns names

Peter Cock
On Wed, Feb 20, 2013 at 2:57 PM, Peter Cock <[hidden email]> wrote:

> Bumping one of my old queries again, with some more use-cases at the end,
>
> On Wed, Aug 10, 2011 at 11:28 AM, Peter Cock <[hidden email]> wrote:
> http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-August/006350.html
> <cut>
>
> Hi all,
>
> I was prompted to return to this issue after going through a fairly
> simple BLAST data analysis flow with a biology colleague - and
> being reminded just how non-obvious some of the task steps
> were [*]. Galaxy could still be much easier to use.
>
> Most of my protein analysis tool wrappers output tabular files,
> where column 1 is the query name, and the rest of the columns
> will be some sort of predictive model outcome or score. I do of
> course document the column meanings in the tool's help (and
> include a #header line in the output where possible), but this
> could be much more user friendly.
>
> A specific example is BLAST+ tabular output - where I have taken
> pains to document the columns in the tools' help text, but it would
> be much nicer to be able to annotate the columns within Galaxy's
> metadata as well. If this isn't possible in the base 'tabular' datatype,
> can it be done as a custom 'blast-tabular' datatype instead?
>
> ...

Based on this discussion, perhaps I do have to define a sub-format?
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-February/013526.html

Peter