Using Mesos to Enable distributed computing under Galaxy?

classic Classic list List threaded Threaded
9 messages Options
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Using Mesos to Enable distributed computing under Galaxy?

Kyle Ellrott
I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trival thing to implement and I think it would be a better use of our time to use an already existing framework. And it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks. There are already implementations of frameworks for common parallel programming systems such as:
And you can find example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then passed to tool wrappers and allows them to contact the local mesos infrastructure (assuming the system has been configured) or pass a null if the system isn't available. 
2) Write a tool runner that works as a mesos framework to executes single cpu jobs on the distributed system.
3) For instances where mesos is not available at a system wide level (say they only have access to an SGE based cluster), but the user wants to run distributed jobs, write a wrapper that can create a mesos cluster using the existing queueing system. For example, right now I run a Mesos system under the SGE queue system.

I'm curious to see what other people think.

Kyle

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

John Chilton
I have created a Trello card for this (https://trello.com/c/m2x1CXmi)
and I have attached a more flushed out IRC conversation for additional
public comment/posterity. I think this is exciting stuff - though I
need to get my head around mesos and how it would interact with Galaxy
more fully. I think some important (perhaps obvious) concerns are:

- Integration at framework level must be optional - this shouldn't be
a required dependency.
- Existing runners and talk parallelism stuff must continue to
function with or without mesos.

Other thoughts?

-John

16:55 < jmchilton> kellrott: Liked the mesos e-mail. Trying to come up
with something intelligent to say before responding :). Do you any
have specific use cases/tools in
                   mind?
16:56 < kellrott> Starting to move our tools to spark
16:56 < kellrott> That's the big draw for me
16:57 < kellrott> And honestly, it seems better to keep hacking away
with things like

https://bitbucket.org/galaxy/galaxy-central/pull-request/175/parameter-based-bam-file-parallelization
16:57 < mrscribe> Title: galaxy / galaxy-central / Pull request #175:
Parameter Based Bam file parallelization Bitbucket (at bitbucket.org)
17:04 < jmchilton> kellrott: Did you mean "it seems better to keep
hacking away with things like" or did you mean it seems better not
to...
17:04 < kellrott> missed a 'than'
17:05 < kellrott> but yes, pull #175 is a bit of a hack and wouldn't
be needed if there was a really parallelization system
17:06 < kellrott> One of the most relevant usages is any tool that
uses a random sampling for statistics, ie consensus clustering
17:06 < jmchilton> Ahh, thought so. Yeah, I think the current task
splitting framework stuff is suppose to be pretty transparent to the
tool wrapper/application. For
                   applications that depend on something like spark,
you idea sounds pretty good to me.
17:07 < kellrott> no real way to support parallel computation of a
background model in current system
17:08 < kellrott> When talking to different people, the lack of robust
parallelization tends to be the biggest complaint
17:10 < kellrott> I think Mesos is a pretty robust solution, I just
wanted to see if anybody had technical complaints
17:11 < jmchilton> hmm.... interesting. I must admit I had never
really thought of that as being a problem. I will have to chew on what
you said though.
17:11 < jmchilton> All of that said, I have no technical complaints
and I am eager to help in any way I reasonably can with the mesos
stuff. It seems pretty exciting.
17:11 < kellrott> Sorry, if my explanations are a bit choppy
17:12 < kellrott> I think the biggest complaint would be 'yet another
dependency'
17:13 < jmchilton> We could do it in a way that it is optional though right?
17:13 < kellrott> Exactly
17:13 < jmchilton> ... though maybe specific tools would require it.
17:14 < jmchilton> I will create a Trello card, if anyone has
objections they can note them there.
17:14 < kellrott> Its not hard to support single CPU calculations
(Spark has a built in single CPU mode)
17:15 < kellrott> But tools that require it would probably not be very
fast running on a single machine ;-)
17:15 < kellrott> I'll probably start working on the implementation, I
just wanted to make sure people are open to merging the work into
central
17:17 < jmchilton> Reserve the right to reject specific
implementations :), but overall I (at least personally) like the idea.
17:18 < pvh_sa> kellrott, agreed 100%
17:18 < kellrott> Obviously I'm open to peoples critiques
17:19 < pvh_sa> the Galaxy execution engine is going to need
replacing, its just a question of with what, and how can it be done in
the least disruptive way
17:19 < kellrott> This has become the thing that it the 'rate limiter'
for adoption by tool writers around here
17:20 < kellrott> are calculations demand robust parallel programming platforms
17:22 < jmchilton> huh... I see the need to support things other than
MPI, but would you say there are problems with our current MPI support
(essentially delegated external
                   resource manager).
17:23 < kellrott> well, MPI is supported under mesos ;-)
17:23 < dannon> THe main problem here is that we wanted something that
supports parallelization for tools that don't natively do it.
17:24 < dannon> new runners for different execution models, totally fine.
17:25 < pvh_sa> dannon, that's something that needs to cooperation of
the execution engine and the type system, right? that's kind of how it
works at the moment - tools
                state that certain inputs are splittable (i.e. are
actually collections)
17:26 < dannon> Right.  I could imagine the parallelization
description langugae being fleshed out more to say "if mesos, do this,
else do naieve file chunking, etc."
17:26 < pvh_sa> of course that needs extending so that collections of
parameters can be provided as inputs, so that tools can explore a
parameter space
17:27 < dannon> But at the core, it needs to be abstract
17:27 < kellrott> I think strait file chunking will only cover a
fraction of parallalization needs
17:27 < dannon> kellrott: Absolutely, file chunking is designed for
embarassingly parallel problems without shared memory/etc
17:27 < dannon> It just won't work for some things.
17:28 < pvh_sa> kellrott, is mesos also scheduling work parcels? or
does it not do scheduling?
17:29 < jmchilton> My understanding is its a framework that you can
plugin scheduling into...
17:30 < kellrott> you write a small 'executor' that the system starts
offering resources to
17:31 < kellrott> then you can accept them, and you scripts are run on
the remote machines
17:31 < pvh_sa> kellrott, ok... so the scheduling is external
17:31 < kellrott> like a queue system, but a two-way dialog rather
then a single sided converation
17:32 < pvh_sa> your "executor" must handle what to do with resource
offers, right? so it matches the resource offers to work that needs
doing
17:32 < kellrott> I believe so
17:33 < pvh_sa> not sure what other people's sense is...
17:34 < kellrott> Sorry, flipped the lingo, you write a scheduler that
then has the system launch executors based on resource offers
17:34 < kellrott>
https://github.com/apache/mesos/blob/master/docs/App-Framework-Development-Guide.md
17:34 < mrscribe> Title: mesos/docs/App-Framework-Development-Guide.md
at master · apache/mesos · GitHub (at github.com)
17:34 < pvh_sa> ok dokey. so conceivably you could use celery with a
mesos backend or something
17:35 < kellrott> Exactly, there is already a Torque framework
17:35 < pvh_sa> one of my bugbears with galaxy's execution engine at
the moment is that it schedules tasks, not workflows... workflows
simply become a set of tasks at
                execution time
17:37 < jmchilton> pvh_sa: What should Galaxy be doing instead?
17:38 < jmchilton> i.e. how does one "schedule a workflow" if not by
individual tasks?
17:38 < pvh_sa> jmchilton, workflows should exist as objects in the
execution environment so that you can e.g. pause and restart them
17:38 < pvh_sa> once you've got that kind of model, you can also
re-execute failed tasks... at the moment if one task fails, i have to
restart the workflow from scratch....
17:39 < dannon> pvh_sa: That's the plan
17:39 < jmchilton> You can restart workflows now, dannon has added that.
17:39 < kellrott> I kind of like the 'separate tasks' workflow model,
because I'm looking into managing 'workflows' using external clients
through the API
17:39 < dannon> Actually, right now, you can restart workflows.
17:39 < dannon> yep.
17:39 < kellrott> that way I can read a workflow file, and then use
the query engine to find previously run jobs that could fullfill the
input needs of that workflow
17:40 < jmchilton> Pausing workflows seems like a reasonable request
though... maybe more advanced UI that lets you visualize and interact
with the workflow as a unit.
17:40 < dannon> And I think it's actually great that that sort of
thing *does* run externally Kyle
17:40 < pvh_sa> jmchilton, yep, something UI related would be good for that.
17:40 < dannon> Very near up on my list is picking up some previous
work with galaxy/messaging to integrate celery tasks instead of having
all these status loops
17:41 < pvh_sa> dannon, yup for celery
17:41 < kellrott> it lets me restart workflows, even if is a
completley different run of the workflow, and I 'forgot' about the
original
17:41 < dannon> Yep, using celery, for sure.
17:41 < pvh_sa> i'm also thinking that we want to reason across
provenance graphs
17:42 < pvh_sa> so e.g. a certain transform need not be repeated if it
already exists in the set of data products... e.g. we've already
created that BLAST db, just re-use it
17:42 < kellrott> thats exactly what I want (and parallel tools ;-)
17:42 < dannon> Right, and I think that's the sort of thing Kyle's
query work will really enable.
17:43 < pvh_sa> kellrott, have you read the paper on using... datalog
i think its called... for provenance recording?
17:43 < kellrott> I'm pretty sure we can do all of this via the api
17:43 < dannon> That's the idea, at least.
17:43 < kellrott> I don't think I've seen that one
17:43 < pvh_sa> yeah the API is king... a powerful API also lets us
cleanly split UI from backend
17:44 < dannon> Exactly
17:44 < kellrott> I'm not even thinking UI. I'm thinking about script
controlled workflows
17:44 < pvh_sa> "Datalog as a Lingua Franca for provenance querying
and reasoning"
17:45 < pvh_sa> seems lots of talking the same language going on. yay.
i'm busy hiring developers to work on exactly this kind of stuff
17:45 < pvh_sa> but yeah my 3-part agenda is basically 1) type system
2) execution engine 3) provenance representation
17:46 < kellrott> We are building toward running workflows on TCGA
data, so I need a system where I can call a mutation calling workflow
~4000 times
17:47 < jmchilton> pvn_sa: Enhancing Galaxy or replacing it? :)
17:47 < pvh_sa> jmchilton, ripping out bits, keeping the project together though
17:47 < kellrott> Hopefully enhancing Galaxy
17:47 < pvh_sa> jmchilton, you've got 130 000 lines of code
17:47 < pvh_sa> and a vibrant community
17:48 < pvh_sa> any way forward needs to build on that. there's lots
to do, but galaxy is like a central beacon we can all gravitate
towards
17:49 < pvh_sa> anyway, the current galaxy is already v2.0 compared to
the original galaxy described in the literature

On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:

> I think one of the aspects where Galaxy is a bit soft is the ability to do
> distributed tasks. The current system of split/replicate/merge tasks based
> on file type is a bit limited and hard for tool developers to expand upon.
> Distributed computing is a non-trival thing to implement and I think it
> would be a better use of our time to use an already existing framework. And
> it would also mean one less API for tool writers to have to develop for.
> I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ).
> You can see an overview of the Mesos architecture at
> https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
> The important thing about Mesos is that it provides an API for C/C++,
> Java/Scala and Python to write distributed frameworks. There are already
> implementations of frameworks for common parallel programming systems such
> as:
>  - Hadoop (https://github.com/mesos/hadoop)
>  - MPI
> (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
>  - Spark (http://spark-project.org)
> And you can find example Python framework at
> https://github.com/apache/mesos/tree/master/src/examples/python
>
> Integration with Galaxy would have three parts:
> 1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
> passed to tool wrappers and allows them to contact the local mesos
> infrastructure (assuming the system has been configured) or pass a null if
> the system isn't available.
> 2) Write a tool runner that works as a mesos framework to executes single
> cpu jobs on the distributed system.
> 3) For instances where mesos is not available at a system wide level (say
> they only have access to an SGE based cluster), but the user wants to run
> distributed jobs, write a wrapper that can create a mesos cluster using the
> existing queueing system. For example, right now I run a Mesos system under
> the SGE queue system.
>
> I'm curious to see what other people think.
>
> Kyle
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

Ravi Madduri
In reply to this post by Kyle Ellrott
Kyle
This is something I am very interested in. The three parts below make sense to me. I would be very happy to discuss further and provide any help to move this forward.

Regards
On Oct 26, 2013, at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:

I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trival thing to implement and I think it would be a better use of our time to use an already existing framework. And it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks. There are already implementations of frameworks for common parallel programming systems such as:
And you can find example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then passed to tool wrappers and allows them to contact the local mesos infrastructure (assuming the system has been configured) or pass a null if the system isn't available. 
2) Write a tool runner that works as a mesos framework to executes single cpu jobs on the distributed system.
3) For instances where mesos is not available at a system wide level (say they only have access to an SGE based cluster), but the user wants to run distributed jobs, write a wrapper that can create a mesos cluster using the existing queueing system. For example, right now I run a Mesos system under the SGE queue system.

I'm curious to see what other people think.

Kyle
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/

--
Ravi K Madduri
MCS, Argonne National Laboratory
Computation Institute, University of Chicago


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

Kyle Ellrott
I don't think implementation will be very difficult. The bigger question is this a technology people are open to?
The nearest competitor is YARN (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html). Mesos seems a bit more geared toward general purpose usage (with several existing frameworks), while YARN seems more specific to Hadoop. But I'd be glad to hear some other thoughts.

Kyle


On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri <[hidden email]> wrote:
Kyle
This is something I am very interested in. The three parts below make sense to me. I would be very happy to discuss further and provide any help to move this forward.

Regards
On Oct 26, 2013, at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:

I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trival thing to implement and I think it would be a better use of our time to use an already existing framework. And it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks. There are already implementations of frameworks for common parallel programming systems such as:
And you can find example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then passed to tool wrappers and allows them to contact the local mesos infrastructure (assuming the system has been configured) or pass a null if the system isn't available. 
2) Write a tool runner that works as a mesos framework to executes single cpu jobs on the distributed system.
3) For instances where mesos is not available at a system wide level (say they only have access to an SGE based cluster), but the user wants to run distributed jobs, write a wrapper that can create a mesos cluster using the existing queueing system. For example, right now I run a Mesos system under the SGE queue system.

I'm curious to see what other people think.

Kyle
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/

--
Ravi K Madduri
MCS, Argonne National Laboratory
Computation Institute, University of Chicago



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

Ketan Maheshwari
Hi Kyle,

We have a similar ongoing development wherein we are working on integrating our Swift framework ( swift-lang.org ) with Galaxy. The goal is to enable Galaxy based applications to run on a variety of distributed resources via various integration schemes as suitable to application and underlying execution environment. 

Here is an abstract of a paper (co-authored with Ravi, who responded on this thread) we will be presenting in a workshop at the upcoming SC 13 conference:

"The Galaxy platform is a web-based science portal for scientific computing supporting Life Sciences users community. While user-friendly and intuitive for doing small to medium scale computations, it currently has a limited support for large-scale, parallel and distributed computing. The Swift parallel scripting framework is capable of composing ordinary applications into parallel scripts that can be run on multi-scale distributed and performance computing platforms. In complex distributed environments, often the user end of application lifecycle slows down because of the technical complexities brought in by the scale, access methods and resource management nuances. Galaxy offers a simple way of designing, composing, executing, reusing, and reproducing application runs. An integration between Swift and Galaxy systems can accelerate science as well as bring the respective user communities together in an interactive, user-friendly, parallel and distributed data analysis environment enabled on a broad range of computational infrastructures."

Kindly let us know if you need a hands on for the various tools we have already developed.


Best,
Ketan



On Mon, Oct 28, 2013 at 3:07 PM, Kyle Ellrott <[hidden email]> wrote:
I don't think implementation will be very difficult. The bigger question is this a technology people are open to?
The nearest competitor is YARN (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html). Mesos seems a bit more geared toward general purpose usage (with several existing frameworks), while YARN seems more specific to Hadoop. But I'd be glad to hear some other thoughts.

Kyle


On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri <[hidden email]> wrote:
Kyle
This is something I am very interested in. The three parts below make sense to me. I would be very happy to discuss further and provide any help to move this forward.

Regards
On Oct 26, 2013, at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:

I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trival thing to implement and I think it would be a better use of our time to use an already existing framework. And it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks. There are already implementations of frameworks for common parallel programming systems such as:
And you can find example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then passed to tool wrappers and allows them to contact the local mesos infrastructure (assuming the system has been configured) or pass a null if the system isn't available. 
2) Write a tool runner that works as a mesos framework to executes single cpu jobs on the distributed system.
3) For instances where mesos is not available at a system wide level (say they only have access to an SGE based cluster), but the user wants to run distributed jobs, write a wrapper that can create a mesos cluster using the existing queueing system. For example, right now I run a Mesos system under the SGE queue system.

I'm curious to see what other people think.

Kyle
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/

--
Ravi K Madduri
MCS, Argonne National Laboratory
Computation Institute, University of Chicago



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/



--
Ketan


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

Kyle Ellrott
You probably are a good person to get an opinion from. My plan isn't to write new frameworks, but rather use existing libraries that can communicate with Mesos to setup their parallel environments.
But for Swift, you would probably want to write a new framework. Just looking at Swift, I imagine one of the harder parts is just getting the system setup on a cluster (ie distributing out files to remote nodes, making sure that you have a way to start processes on those nodes and have them know where to find the master), it seems like Swift could benefit from having a Mesos based framework. Do you think it would enable you to have a 'zero-config' startup of a distributed Swift application?

Kyle
 


On Mon, Oct 28, 2013 at 1:51 PM, Ketan Maheshwari <[hidden email]> wrote:
Hi Kyle,

We have a similar ongoing development wherein we are working on integrating our Swift framework ( swift-lang.org ) with Galaxy. The goal is to enable Galaxy based applications to run on a variety of distributed resources via various integration schemes as suitable to application and underlying execution environment. 

Here is an abstract of a paper (co-authored with Ravi, who responded on this thread) we will be presenting in a workshop at the upcoming SC 13 conference:

"The Galaxy platform is a web-based science portal for scientific computing supporting Life Sciences users community. While user-friendly and intuitive for doing small to medium scale computations, it currently has a limited support for large-scale, parallel and distributed computing. The Swift parallel scripting framework is capable of composing ordinary applications into parallel scripts that can be run on multi-scale distributed and performance computing platforms. In complex distributed environments, often the user end of application lifecycle slows down because of the technical complexities brought in by the scale, access methods and resource management nuances. Galaxy offers a simple way of designing, composing, executing, reusing, and reproducing application runs. An integration between Swift and Galaxy systems can accelerate science as well as bring the respective user communities together in an interactive, user-friendly, parallel and distributed data analysis environment enabled on a broad range of computational infrastructures."

Kindly let us know if you need a hands on for the various tools we have already developed.


Best,
Ketan



On Mon, Oct 28, 2013 at 3:07 PM, Kyle Ellrott <[hidden email]> wrote:
I don't think implementation will be very difficult. The bigger question is this a technology people are open to?
The nearest competitor is YARN (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html). Mesos seems a bit more geared toward general purpose usage (with several existing frameworks), while YARN seems more specific to Hadoop. But I'd be glad to hear some other thoughts.

Kyle


On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri <[hidden email]> wrote:
Kyle
This is something I am very interested in. The three parts below make sense to me. I would be very happy to discuss further and provide any help to move this forward.

Regards
On Oct 26, 2013, at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:

I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trival thing to implement and I think it would be a better use of our time to use an already existing framework. And it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks. There are already implementations of frameworks for common parallel programming systems such as:
And you can find example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then passed to tool wrappers and allows them to contact the local mesos infrastructure (assuming the system has been configured) or pass a null if the system isn't available. 
2) Write a tool runner that works as a mesos framework to executes single cpu jobs on the distributed system.
3) For instances where mesos is not available at a system wide level (say they only have access to an SGE based cluster), but the user wants to run distributed jobs, write a wrapper that can create a mesos cluster using the existing queueing system. For example, right now I run a Mesos system under the SGE queue system.

I'm curious to see what other people think.

Kyle
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/

--
Ravi K Madduri
MCS, Argonne National Laboratory
Computation Institute, University of Chicago



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/



--
Ketan



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

Ketan Maheshwari
Hi Kyle,

Swift indeed is a complete framework for distributed computing. Distributing files out to cluster nodes, starting processes, bringing back result files to submit host is done out of the box (stagein-exec-stageout cycle).

We can discuss offline if you are interested in giving it a shot.

Best,
Ketan


On Mon, Oct 28, 2013 at 4:14 PM, Kyle Ellrott <[hidden email]> wrote:
You probably are a good person to get an opinion from. My plan isn't to write new frameworks, but rather use existing libraries that can communicate with Mesos to setup their parallel environments.
But for Swift, you would probably want to write a new framework. Just looking at Swift, I imagine one of the harder parts is just getting the system setup on a cluster (ie distributing out files to remote nodes, making sure that you have a way to start processes on those nodes and have them know where to find the master), it seems like Swift could benefit from having a Mesos based framework. Do you think it would enable you to have a 'zero-config' startup of a distributed Swift application?

Kyle
 


On Mon, Oct 28, 2013 at 1:51 PM, Ketan Maheshwari <[hidden email]> wrote:
Hi Kyle,

We have a similar ongoing development wherein we are working on integrating our Swift framework ( swift-lang.org ) with Galaxy. The goal is to enable Galaxy based applications to run on a variety of distributed resources via various integration schemes as suitable to application and underlying execution environment. 

Here is an abstract of a paper (co-authored with Ravi, who responded on this thread) we will be presenting in a workshop at the upcoming SC 13 conference:

"The Galaxy platform is a web-based science portal for scientific computing supporting Life Sciences users community. While user-friendly and intuitive for doing small to medium scale computations, it currently has a limited support for large-scale, parallel and distributed computing. The Swift parallel scripting framework is capable of composing ordinary applications into parallel scripts that can be run on multi-scale distributed and performance computing platforms. In complex distributed environments, often the user end of application lifecycle slows down because of the technical complexities brought in by the scale, access methods and resource management nuances. Galaxy offers a simple way of designing, composing, executing, reusing, and reproducing application runs. An integration between Swift and Galaxy systems can accelerate science as well as bring the respective user communities together in an interactive, user-friendly, parallel and distributed data analysis environment enabled on a broad range of computational infrastructures."

Kindly let us know if you need a hands on for the various tools we have already developed.


Best,
Ketan



On Mon, Oct 28, 2013 at 3:07 PM, Kyle Ellrott <[hidden email]> wrote:
I don't think implementation will be very difficult. The bigger question is this a technology people are open to?
The nearest competitor is YARN (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html). Mesos seems a bit more geared toward general purpose usage (with several existing frameworks), while YARN seems more specific to Hadoop. But I'd be glad to hear some other thoughts.

Kyle


On Mon, Oct 28, 2013 at 12:55 PM, Ravi K Madduri <[hidden email]> wrote:
Kyle
This is something I am very interested in. The three parts below make sense to me. I would be very happy to discuss further and provide any help to move this forward.

Regards
On Oct 26, 2013, at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:

I think one of the aspects where Galaxy is a bit soft is the ability to do distributed tasks. The current system of split/replicate/merge tasks based on file type is a bit limited and hard for tool developers to expand upon. Distributed computing is a non-trival thing to implement and I think it would be a better use of our time to use an already existing framework. And it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ). You can see an overview of the Mesos architecture at https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++, Java/Scala and Python to write distributed frameworks. There are already implementations of frameworks for common parallel programming systems such as:
And you can find example Python framework at https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then passed to tool wrappers and allows them to contact the local mesos infrastructure (assuming the system has been configured) or pass a null if the system isn't available. 
2) Write a tool runner that works as a mesos framework to executes single cpu jobs on the distributed system.
3) For instances where mesos is not available at a system wide level (say they only have access to an SGE based cluster), but the user wants to run distributed jobs, write a wrapper that can create a mesos cluster using the existing queueing system. For example, right now I run a Mesos system under the SGE queue system.

I'm curious to see what other people think.

Kyle
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/

--
Ravi K Madduri
MCS, Argonne National Laboratory
Computation Institute, University of Chicago



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/



--
Ketan





--
Ketan


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

John Chilton-4
In reply to this post by Kyle Ellrott
Hey Kyle, all,

  If anyone wants to play with running Galaxy jobs within an Apache
Mesos environment I have added a prototype of this feature to the LWR.

https://bitbucket.org/jmchilton/lwr/commits/555438d2fe266899338474b25c540fef42bcece7
https://bitbucket.org/jmchilton/lwr/commits/9748b3035dbe3802d4136a6a1028df8395a9aeb3

This work distributes jobs across a Mesos cluster and injects a
MESOS_URL environment variable into the job runtime environment in
case the jobs themselves want to take advantage of Mesos.

The advantage of the LWR versus a traditional Galaxy runner is that
the job can be staged to remote resources without shared disk. Prior
to this I was imaging the LWR to be useful in cases where Galaxy and
remote cluster don't share common disk but where there is in fact a
shared scratch directory or something across the remote cluster as
well a resource manager. The LWR Mesos framework however has the
actual compute servers themselves stage the job up and down - so you
could imagine distributing Galaxy across large clusters without any
shared disk whatsoever - that could be very cool and help scale say
cloud applications.

Downsides of an LWR-based approach versus a Galaxy approach is that it
is less mature and there is more stuff to configure - need to
configure a Galaxy job_conf plugin and destination, need to configure
the LWR itself, need to configure a message queue (for this variant of
LWR operation anyway - it should be possible to drive this via the LWR
in web server mode but I haven't added it yet). I would be more than
happy to continue to see progress toward Mesos support in Galaxy
proper.

It is strictly a prototype so far - a sort of playground if anyone
wants to play with these ideas and build something cool. It really is
a "framework" right - not so much a job scheduler so I am not sure it
is very immediately useful - but I imagine one could build cool stuff
on top of it.

Next, I think I would like to add Apache Aurora
(http://aurora.incubator.apache.org/) support - because it seems like
a much more traditional resource manager but built on top of Mesos so
it would be more practical for traditional Galaxy-style jobs. Doesn't
buy you anything in terms of parallelization but it would "fit better"
with Galaxy.

-John


On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:

> I think one of the aspects where Galaxy is a bit soft is the ability to do
> distributed tasks. The current system of split/replicate/merge tasks based
> on file type is a bit limited and hard for tool developers to expand upon.
> Distributed computing is a non-trival thing to implement and I think it
> would be a better use of our time to use an already existing framework. And
> it would also mean one less API for tool writers to have to develop for.
> I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ).
> You can see an overview of the Mesos architecture at
> https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
> The important thing about Mesos is that it provides an API for C/C++,
> Java/Scala and Python to write distributed frameworks. There are already
> implementations of frameworks for common parallel programming systems such
> as:
>  - Hadoop (https://github.com/mesos/hadoop)
>  - MPI
> (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
>  - Spark (http://spark-project.org)
> And you can find example Python framework at
> https://github.com/apache/mesos/tree/master/src/examples/python
>
> Integration with Galaxy would have three parts:
> 1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
> passed to tool wrappers and allows them to contact the local mesos
> infrastructure (assuming the system has been configured) or pass a null if
> the system isn't available.
> 2) Write a tool runner that works as a mesos framework to executes single
> cpu jobs on the distributed system.
> 3) For instances where mesos is not available at a system wide level (say
> they only have access to an SGE based cluster), but the user wants to run
> distributed jobs, write a wrapper that can create a mesos cluster using the
> existing queueing system. For example, right now I run a Mesos system under
> the SGE queue system.
>
> I'm curious to see what other people think.
>
> Kyle
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
| Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Using Mesos to Enable distributed computing under Galaxy?

Kyle Ellrott
Glad to see someone else is playing around with Mesos. 
I have a mesos branch that is getting a little long in the tooth. I'd like to get a straight job runner (non-LWR, with a shared file system) running under mesos for Galaxy before I submit that work for a pull request.

The hackathon is only 12 days away! Hopefully we'll be able to make some progress on these sorts of projects.

Kyle



On Sun, Jun 15, 2014 at 4:06 PM, John Chilton <[hidden email]> wrote:
Hey Kyle, all,

  If anyone wants to play with running Galaxy jobs within an Apache
Mesos environment I have added a prototype of this feature to the LWR.

https://bitbucket.org/jmchilton/lwr/commits/555438d2fe266899338474b25c540fef42bcece7
https://bitbucket.org/jmchilton/lwr/commits/9748b3035dbe3802d4136a6a1028df8395a9aeb3

This work distributes jobs across a Mesos cluster and injects a
MESOS_URL environment variable into the job runtime environment in
case the jobs themselves want to take advantage of Mesos.

The advantage of the LWR versus a traditional Galaxy runner is that
the job can be staged to remote resources without shared disk. Prior
to this I was imaging the LWR to be useful in cases where Galaxy and
remote cluster don't share common disk but where there is in fact a
shared scratch directory or something across the remote cluster as
well a resource manager. The LWR Mesos framework however has the
actual compute servers themselves stage the job up and down - so you
could imagine distributing Galaxy across large clusters without any
shared disk whatsoever - that could be very cool and help scale say
cloud applications.

Downsides of an LWR-based approach versus a Galaxy approach is that it
is less mature and there is more stuff to configure - need to
configure a Galaxy job_conf plugin and destination, need to configure
the LWR itself, need to configure a message queue (for this variant of
LWR operation anyway - it should be possible to drive this via the LWR
in web server mode but I haven't added it yet). I would be more than
happy to continue to see progress toward Mesos support in Galaxy
proper.

It is strictly a prototype so far - a sort of playground if anyone
wants to play with these ideas and build something cool. It really is
a "framework" right - not so much a job scheduler so I am not sure it
is very immediately useful - but I imagine one could build cool stuff
on top of it.

Next, I think I would like to add Apache Aurora
(http://aurora.incubator.apache.org/) support - because it seems like
a much more traditional resource manager but built on top of Mesos so
it would be more practical for traditional Galaxy-style jobs. Doesn't
buy you anything in terms of parallelization but it would "fit better"
with Galaxy.

-John


On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott <[hidden email]> wrote:
> I think one of the aspects where Galaxy is a bit soft is the ability to do
> distributed tasks. The current system of split/replicate/merge tasks based
> on file type is a bit limited and hard for tool developers to expand upon.
> Distributed computing is a non-trival thing to implement and I think it
> would be a better use of our time to use an already existing framework. And
> it would also mean one less API for tool writers to have to develop for.
> I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ).
> You can see an overview of the Mesos architecture at
> https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
> The important thing about Mesos is that it provides an API for C/C++,
> Java/Scala and Python to write distributed frameworks. There are already
> implementations of frameworks for common parallel programming systems such
> as:
>  - Hadoop (https://github.com/mesos/hadoop)
>  - MPI
> (https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-mesos.md)
>  - Spark (http://spark-project.org)
> And you can find example Python framework at
> https://github.com/apache/mesos/tree/master/src/examples/python
>
> Integration with Galaxy would have three parts:
> 1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
> passed to tool wrappers and allows them to contact the local mesos
> infrastructure (assuming the system has been configured) or pass a null if
> the system isn't available.
> 2) Write a tool runner that works as a mesos framework to executes single
> cpu jobs on the distributed system.
> 3) For instances where mesos is not available at a system wide level (say
> they only have access to an SGE based cluster), but the user wants to run
> distributed jobs, write a wrapper that can create a mesos cluster using the
> existing queueing system. For example, right now I run a Mesos system under
> the SGE queue system.
>
> I'm curious to see what other people think.
>
> Kyle
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>   http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>   http://galaxyproject.org/search/mailinglists/


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Loading...