Tool needs a particular file extension

Steve Cassidy
Hi,
I’m wrapping a tool that needs its input to have a known file extension (an audio file, e.g. .wav). Since Galaxy stores all data as .dat files, the tool is falling over because it doesn’t know what .dat is.

I thought I’d be able to get around this by hard linking the .dat file to the same name with a .wav extension (dataset_1.dat.wav). This works when I try it with the tool on the command line, but within Galaxy it fails. Here’s my <command>:

        ln $signal ${signal}.wav &amp;
        /home/maus/maus OUTFORMAT=TextGrid LANGUAGE=$language
        BPF=$bpf INSKANTEXTGRID=$inskantextgrid INSORTTEXTGRID=$insorttextgrid
        MODUS=$modus MAUSSHIFT=$mausshift MINPAUSLEN=$minpauslen WEIGHT=$weight
        INSPROB=$insprob NOINITIALFINALSILENCE=$noinitialfinalsilence OUTSYMBOL=$outsymbol
        OUT=$output SIGNAL=${signal}.wav

resulting in the job command line:

ln /tmp/tmp7AZvx7/files/000/dataset_2.dat /tmp/tmp7AZvx7/files/000/dataset_2.dat.wav & /home/maus/maus OUTFORMAT=TextGrid LANGUAGE=aus BPF=/tmp/tmp7AZvx7/files/000/dataset_1.dat INSKANTEXTGRID=false INSORTTEXTGRID=false MODUS=standard MAUSSHIFT=10 MINPAUSLEN=5 WEIGHT=7.0 INSPROB=0.0 NOINITIALFINALSILENCE=no OUTSYMBOL=sampa OUT=/tmp/tmp7AZvx7/files/000/dataset_3.dat SIGNAL=/tmp/tmp7AZvx7/files/000/dataset_2.dat.wav

I’m getting an error message from the tool:

sox FAIL formats: can't open input file `/tmp/tmp7AZvx7/files/000/dataset_2.dat.wav': WAVE: RIFF header not found

This suggests that the hard link didn’t get made. I tried copying the file instead but got the same result.

I could go in and patch the tool script to be more forgiving but it would be good to find a solution that didn’t require that if possible. 

Any pointers appreciated. 

Steve

Department of Computing, Macquarie University
http://web.science.mq.edu.au/~cassidy


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Tool needs a particular file extension

Léo Biscassi

On Fri, Oct 21, 2016 at 2:48 PM Steve Cassidy <[hidden email]> wrote:
--
Best regards, 
Léo Biscassi


Re: Tool needs a particular file extension

Peter Cock
In reply to this post by Steve Cassidy
Using a soft link for this is a common pattern. Follow the ln with && rather than
a single &: a single & runs ln in the background, so the tool can start before
the link exists. Ideally wrap the command in XML CDATA to avoid escaping
everything (&amp; etc.), and quote the filenames in case they contain spaces, e.g.

https://github.com/galaxyproject/tools-iuc/blob/master/tools/trinity/run_de_analysis.xml#L16
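
A minimal sketch of this pattern as a standalone shell script rather than an actual Galaxy <command> block. The check_ext function and the file names are hypothetical stand-ins for the real tool and for the dataset path Galaxy would substitute for $signal; the space in the dataset name is there only to show why the quoting matters.

```shell
#!/bin/sh
set -eu

# Stand-in for a Galaxy dataset path; a real job gets this from $signal.
workdir=$(mktemp -d)
signal="$workdir/dataset 2.dat"      # space in the name on purpose
printf 'RIFFxxxxWAVE' > "$signal"

# Hypothetical stand-in for the wrapped tool: it only accepts *.wav names.
check_ext() {
    case "$1" in
        *.wav) echo "accepted: ${1##*/}" ;;
        *)     echo "rejected: ${1##*/}"; return 1 ;;
    esac
}

cd "$workdir"
# '&&' starts the tool only after ln has finished successfully;
# a single '&' would put ln in the background and start the tool at once.
result=$(ln -s "$signal" input.wav && check_ext input.wav)
echo "$result"
```

In a Galaxy wrapper the same two commands would go inside the <command> element, where a CDATA section lets && be written without escaping.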


For reference, in tools-iuc there are over 400 soft link examples:

$ grep "ln -s" tools/*/*.xml | wc -l
     446

Peter


Re: Tool needs a particular file extension -- now Datatype help

Steve Cassidy
Thanks all,
  it seems that my real problem is that the audio file (.wav) is not being identified as a valid datatype and ends up as a zero-length text file. So I need to start exploring the world of datatypes.

Following the docs (https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes), I can modify datatypes_conf.xml in my Galaxy sources and add a new datatype for wav files:

    <datatype extension="wav" type="galaxy.datatypes.binary:Binary" display_in_upload="true" mimetype="audio/wav" subclass="True"/>

but I get the message "The uploaded binary file contains inappropriate content" and a zero-length file, just as I did before adding this, although the datatype is now set to ‘wav’.

I didn’t add a sniffer for this and set the datatype explicitly on upload. 

Also, this doesn’t seem like a modular way to add datatypes. How do I include datatypes in my tool definition? I can see from some other tools that I can include a datatypes_conf.xml in my tool folder, but when I try that and test with planemo the new type isn’t found.

Pointers welcome.

Thanks,

Steve

Department of Computing, Macquarie University
http://web.science.mq.edu.au/~cassidy


Re: Tool needs a particular file extension -- now Datatype help

Peter Cock
Hi Steve,

You are on the right track, but something in the WAV file has
triggered one of Galaxy's security protections to try to block
uploading of potentially dangerous files. There may be some
settings here you can relax - I've not had to deal with this
myself.

Peter


Re: Tool needs a particular file extension -- now Datatype help

Gildas Le Corguillé
Hi Steve,

Galaxy tries to sniff the data to guess the appropriate datatype.
For binaries, if no datatype (sniffer) from https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/binary.py matches, you get the message "The uploaded binary file contains inappropriate content".

Typically, for a binary format, you can read the first n bytes, which are usually text, and check that they equal, I hope, "wave". There are a bunch of examples in that file.
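
As an illustration of that kind of header check: a WAV file is a RIFF container, so its first four bytes are the ASCII text RIFF and bytes 8 to 11 are WAVE (which matches the "RIFF header not found" complaint from sox earlier in the thread). The sketch below does the test in plain shell; it is not Galaxy's sniffer API, which is implemented in Python in binary.py, and sniff_wav and the sample files are made up for the example.

```shell
#!/bin/sh
set -eu

# Print "wav" if the file starts with a RIFF/WAVE header, else "unknown".
sniff_wav() {
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null)          # bytes 0-3
    form=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)    # bytes 8-11
    if [ "$magic" = "RIFF" ] && [ "$form" = "WAVE" ]; then
        echo wav
    else
        echo unknown
    fi
}

tmp=$(mktemp -d)
# Fake header only: "RIFF", four placeholder size bytes, "WAVE".
printf 'RIFFxxxxWAVE' > "$tmp/good.dat"
printf 'not audio at all' > "$tmp/bad.dat"

good=$(sniff_wav "$tmp/good.dat")
bad=$(sniff_wav "$tmp/bad.dat")
echo "$good $bad"
```

A real sniffer would live in a datatype class alongside the existing examples in binary.py; the shell version only shows which bytes to compare.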

And finally, open a Pull Request on https://github.com/galaxyproject/galaxy :)

Good luck

Gildas

-----------------------------------------------------------------
Gildas Le Corguillé - Bioinformatician/Bioanalyste

Plateform ABiMS (Analyses and Bioinformatics for Marine Science)
http://abims.sb-roscoff.fr

Member of the Workflow4Metabolomics project
http://workflow4metabolomics.org

Station Biologique de Roscoff - UPMC/CNRS - FR2424
Place Georges Teissier 29680 Roscoff FRANCE
tel: +33 2 98 29 23 81
------------------------------------------------------------------


