Galaxy and object stores

Raknes Inge Alexander

I have a few questions about object stores in Galaxy:

1: Can all Galaxy datasets be stored in an object store?
2: If so, does Galaxy still need to maintain a local copy of the data?
3: Is LWR or Pulsar able to get the data directly from the object store, or does it still have to go through Galaxy?

We are planning to let users of our Galaxy installation handle large input/output files (~30G) and we expect that the VM containing our Galaxy installation will become a bottleneck if all data needs to travel through that node.

- Inge Alexander Raknes

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Galaxy and object stores

Enis Afgan-2
Hi Inge, 
There is an implementation that uses the AWS S3 object store as the data store for a given Galaxy instance. It is located at https://bitbucket.org/galaxy/galaxy-central/src/3a51eaf209f2502bf32dbb421ecabb7fe46243ea/lib/galaxy/objectstore/s3.py?at=default and offers several config options in universe_wsgi.ini.

The data stored in S3 is locally cached while it's being operated on but always synced with the back end object store. 
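
To make the caching behaviour described above concrete, here is a minimal sketch of a write-through cache in front of an object store. It is not Galaxy's S3 implementation: the class name, the method names (local_path, store_file, evict) and the plain dict standing in for the remote bucket are all assumptions made for illustration.

```python
import os
import shutil


class CachedObjectStore:
    """Hypothetical write-through cache in front of a remote object store.

    `backend` is a plain dict (key -> bytes) standing in for a remote bucket.
    Keys are assumed to be flat file names without path separators.
    """

    def __init__(self, backend, cache_dir):
        self.backend = backend
        self.cache_dir = cache_dir

    def _cache_path(self, key):
        return os.path.join(self.cache_dir, key)

    def local_path(self, key):
        """Return a local path for `key`, pulling from the backend on a cache miss."""
        path = self._cache_path(key)
        if not os.path.exists(path):
            with open(path, "wb") as fh:
                fh.write(self.backend[key])
        return path

    def store_file(self, key, source_path):
        """Copy a finished file into the cache, then push it to the backend."""
        path = self._cache_path(key)
        if os.path.abspath(source_path) != os.path.abspath(path):
            shutil.copyfile(source_path, path)
        with open(path, "rb") as fh:
            self.backend[key] = fh.read()

    def evict(self, key):
        """Drop the local copy; the backend copy remains the source of truth."""
        path = self._cache_path(key)
        if os.path.exists(path):
            os.remove(path)
```

In this sketch the local copy exists only while something needs a file path, and every write is pushed back to the backend, so the cache can be cleared at any time without losing data.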

Pulsar seems to have some support for S3 but, as the docs say in the implementation, it's explicitly beta: https://github.com/galaxyproject/pulsar/blob/b32b7caafc6582a3a28e694e2dbb75e7a8f2bffc/galaxy/objectstore/pulsar.py

As a side note, there are some planned enhancements to how the object store implementation is handled, and there will hopefully be quite a bit of activity on this topic in the near future (e.g., https://trello.com/c/YynQKq8m).

Hope this at least clarifies the state of object store support,
Enis


On Mon, Aug 25, 2014 at 10:24 AM, Raknes Inge Alexander <[hidden email]> wrote:

I have a few questions about object stores in Galaxy:

1: Can all Galaxy data sets be stored in an object store?
2: If so,  does Galaxy still need to maintain a local copy of the data?
3: Is LWR or Pulsar able to get the data directly from the object store, or does it still have to go through Galaxy?

We are planning to let users of our Galaxy installation handle large input/output files (~30G) and we expect that the VM containing our Galaxy installation will become a bottleneck if all data needs to travel through that node.

- Inge Alexander Raknes

Re: Galaxy and object stores

John Chilton-4
Thanks Enis. Just to elaborate on Pulsar: I suspect it would work
right now with Galaxy configured to use the S3 object store, but it
would do so by having Galaxy cache the data locally, and then Pulsar
would negotiate the transfer with Galaxy (there are many different ways
this could occur, depending on how things are mounted). Ideally it
wouldn't happen this way, though. I would love it if Galaxy could
determine that the job is going to be run remotely, skip the local
cache, and configure the remote Pulsar to cache the file directly
from the object store abstraction. In addition to eliminating the
extra cache and transfer, this could allow Pulsar and Galaxy to have
different views of the underlying data sources (e.g. here the data is
mounted as X and there the data is mounted as Y, or here the data is
directly available and there it is fetched via iRODS, etc.).

There are some ... initial grasps... at this sort of thing in Pulsar
and Galaxy but it is not fully (or even substantially) implemented
currently.
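
As a rough illustration of the "different views" idea described above, the sketch below picks a staging strategy per site. It does not reflect existing Galaxy or Pulsar code: SiteView, plan_input, the action names and the example paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteView:
    """How one site (Galaxy itself or a remote Pulsar) reaches the data. Hypothetical."""
    name: str
    mount_prefix: Optional[str] = None  # set when the store is mounted as a filesystem
    fetch_scheme: Optional[str] = None  # e.g. "s3" or "irods" when fetched over the network


def plan_input(object_key: str, site: SiteView) -> dict:
    """Decide how `site` should obtain `object_key`, preferring paths that avoid Galaxy."""
    if site.mount_prefix is not None:
        # The data is already visible on this site's filesystem.
        location = f"{site.mount_prefix.rstrip('/')}/{object_key}"
        return {"action": "use_path", "location": location}
    if site.fetch_scheme is not None:
        # The site can pull straight from the object store.
        return {"action": "fetch_direct", "location": f"{site.fetch_scheme}://{object_key}"}
    # Fallback: Galaxy caches the file locally and transfers it to the site.
    return {"action": "transfer_via_galaxy", "location": object_key}


# Example views: the same dataset is mounted as X on one site and fetched via iRODS on another.
galaxy_site = SiteView("galaxy", mount_prefix="/mnt/X")
pulsar_site = SiteView("pulsar-remote", fetch_scheme="irods")
legacy_site = SiteView("pulsar-legacy")
```

Only the last case routes the data through the Galaxy node, which is the situation the original question was trying to avoid for ~30G files.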

-John

On Tue, Aug 26, 2014 at 11:18 AM, Enis Afgan <[hidden email]> wrote:

> Hi Inge,
> There is an implementation for using the AWS S3 object store as the data
> store for a given Galaxy instance. The implementation is located here
> https://bitbucket.org/galaxy/galaxy-central/src/3a51eaf209f2502bf32dbb421ecabb7fe46243ea/lib/galaxy/objectstore/s3.py?at=default
> and it offers several config options in universe_wsgi.ini.
>
> The data stored in S3 is locally cached while it's being operated on but
> always synced with the back end object store.
>
> Pulsar seems to have some support for S3 but, as the docs say in the
> implementation, it's explicitly beta:
> https://github.com/galaxyproject/pulsar/blob/b32b7caafc6582a3a28e694e2dbb75e7a8f2bffc/galaxy/objectstore/pulsar.py
>
> As a side note, there are some planned enhancements to how the object store
> implementation is handled and there will hopefully be quite a bit of
> activity on this topic in the near future (eg,
> https://trello.com/c/YynQKq8m).
>
> Hope this at least clarifies the state of object store support,
> Enis