Dataset's extra files

Charles Girardot
Hi all,

We have a local tool whose role is to transfer (i.e. copy) a dataset file to a directory on our NFS. This is extremely convenient because it can be included in workflows, which saves users from clicking the download button (it also offers configurable renaming and compression). Our users rely on it heavily.

The problem is with datasets that have associated extra files, such as FASTQC output: these extra files are simply ignored.
We'd like to improve our 'NFS_transfer' tool so that it handles them the same way the download button does.

Proposed solution:
* Check whether a directory named 'dataset_<id>_files' exists in the dataset store
* If so, copy it recursively ('cp -r') into a temporary directory, and copy the dataset itself into the same directory (renaming it on the fly)
* Compress the temporary directory (zip or tar.gz)
* Copy the archive to its final NFS location
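
The four steps above could look roughly like this in Python 3. This is a minimal sketch, not tested against Galaxy: it assumes the tool wrapper passes the dataset path, the path of its 'dataset_<id>_files' directory, the new name and the NFS destination as plain arguments. The function name transfer_with_extra_files and the archive layout are hypothetical choices.

```python
import os
import shutil
import tempfile


def transfer_with_extra_files(dataset_path, extra_files_dir, new_name, dest_dir):
    """Bundle a dataset and its extra-files directory into <new_name>.tar.gz.

    Hypothetical helper: returns the path of the archive copied to dest_dir.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Staging directory named after the (renamed) dataset.
        stage = os.path.join(tmp, new_name)
        os.mkdir(stage)
        # Copy the primary dataset file, renaming it on the fly.
        shutil.copy2(dataset_path, os.path.join(stage, new_name))
        # Only copy the extra files if the directory actually exists.
        if os.path.isdir(extra_files_dir):
            shutil.copytree(extra_files_dir, os.path.join(stage, new_name + "_files"))
        # Compress the staging directory into tmp/<new_name>.tar.gz.
        archive = shutil.make_archive(os.path.join(tmp, new_name), "gztar",
                                      root_dir=tmp, base_dir=new_name)
        # Copy the archive to its final (NFS) location before tmp is deleted.
        os.makedirs(dest_dir, exist_ok=True)
        return shutil.copy2(archive, dest_dir)
```

Staging in a temporary directory keeps the dataset store untouched; the archive unpacks into a single folder containing the renamed file and, if present, its '_files' directory.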

My question is: is this the right way to do it? Not being a Python specialist, I find it a little tricky to work out the right approach (I can't locate the code behind the download button in Galaxy). In particular, can I get the list of extra files from the '$galaxyFile' object declared in the tool by:

<param type="data" name="galaxyFile" label="File to transfer"/>

i.e. in the same way we get the dataset name or file extension ($galaxyFile.dataset.name and $galaxyFile.ext)?

Any advice on how best to implement this in a portable way would be much appreciated.

Thanks for your time,

Charles

=====================================
Charles Girardot
Head of Genome Biology Computational Support (GBCS)
European Molecular Biology Laboratory
Tel: +49 6221 387-8585
Fax: +49-(0)6221-387-8166
Email: [hidden email]
Room V205
Meyerhofstraße 1,
69117 Heidelberg, Germany
=====================================

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Dataset's extra files

John Chilton
The directory can be obtained using $galaxyFile.extra_files_path (be
sure to check that it exists before zipping it up).
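
On the script side, listing the extra files then only needs the standard library. A minimal Python 3 sketch, assuming the tool's command line passes the value of extra_files_path to the script as a plain path argument; list_extra_files is a hypothetical helper name:

```python
import os


def list_extra_files(extra_files_path):
    """Return the files under a dataset's extra-files directory.

    Paths are relative to extra_files_path and sorted. An empty list is
    returned when the directory does not exist, since many datasets have none.
    """
    if not os.path.isdir(extra_files_path):
        return []
    found = []
    # Walk recursively: some datatypes keep files in subdirectories.
    for root, _dirs, files in os.walk(extra_files_path):
        for name in files:
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, extra_files_path))
    return sorted(found)
```

Returning an empty list for a missing directory lets the same code path handle plain and composite datasets alike.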

I would discourage reusing Galaxy components directly from inside
a tool or tool wrapper, but if you want to look at that code for
reference, it lives in the datatypes module:

https://bitbucket.org/galaxy/galaxy-central/src/3fb927653301a0c06a0bf94f2b6bd71b3595ec0d/lib/galaxy/datatypes/data.py?at=default#cl-228
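
If you would rather not stage files in a temporary directory, the zip can also be written directly to its destination. The sketch below uses only Python 3's standard zipfile module; it is not a copy of the Galaxy code linked above, and zip_dataset and its archive layout are hypothetical choices.

```python
import os
import zipfile


def zip_dataset(dataset_path, extra_files_dir, new_name, archive_path):
    """Write a dataset and any extra files into a zip at archive_path.

    Entries go under a top-level folder called new_name, so the archive
    unpacks into a single directory; nothing is copied to disk first.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The primary file is renamed simply by its name inside the archive.
        zf.write(dataset_path, arcname=f"{new_name}/{new_name}")
        if os.path.isdir(extra_files_dir):
            for root, _dirs, files in os.walk(extra_files_dir):
                for name in sorted(files):
                    full = os.path.join(root, name)
                    rel = os.path.relpath(full, extra_files_dir)
                    zf.write(full, arcname=f"{new_name}/{new_name}_files/{rel}")
    return archive_path
```

Writing straight to the NFS path avoids doubling the disk usage for large datasets, at the cost of leaving a partial archive behind if the job is killed midway.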

Hope this helps.

-John

On Tue, Mar 11, 2014 at 5:35 AM, Charles Girardot
<[hidden email]> wrote:
> [...]