Galaxy installation inside a secure environment


Galaxy installation inside a secure environment

azab

Hi Galaxy people,


At the University of Oslo, we have an infrastructure for storage and computation on sensitive data, called TSD (shown in the attached image). Internally, the infrastructure is logically divided into projects. Each project has a set of VMs shared among its users, which run on a file system (HNAS). The shared project VMs have access to a computational SLURM cluster, which is installed on another file system (Colossus). For each project, directories on Colossus are mounted on HNAS so that the project's users can push files to Colossus and run jobs on the cluster. Users of a particular project access the project's main VMs through user VMs (one per user). Each project's VMs (both shared and user VMs) are in a separate subnet.


All of this is inside the TSD. To access the TSD from outside, there is a complex authentication mechanism through which a user can reach his/her own VM, and transferring data to or from the TSD is another complex story. The important thing is that there is NO Internet access in or out.


There are two issues here:


1 - What we need is to install one Galaxy VM inside each project area, so that it is accessible by all project users. However, we cannot use Mercurial to access your distribution server. We could install a Bitbucket server inside the TSD and host the code base there, so that it can be accessed by all project VMs, but I'm not sure what the procedure is.

2 - We are very concerned about regularly updating the Galaxy instances in projects to the latest release. In many cases this causes problems, e.g. tool versioning conflicts. So our idea is to install each of our tools, together with all of its dependencies, in a separate Docker container, and run those as images on each Galaxy project VM. Is this possible, and has it been tested? Would this permanently solve the upgrading problem, or would you suggest another alternative?


Thank you,

Yours sincerely,
Abdulrahman Azab

Head engineer, ELIXIR.NO / The Genomic HyperBrowser team
Department of Informatics, University of Oslo, Boks 1072 Blindern, NO-0316 OSLO, Norway
Email: [hidden email], Cell-phone: +47 46797339
----
Senior Lecturer in Computer Engineering
Faculty of Engineering, University of Mansoura, 35516-Mansoura, Egypt

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Attachment: TSD.png (211K)

Re: Galaxy installation inside a secure environment

John Chilton-4
On Thu, Jan 15, 2015 at 10:05 AM, Abdulrahman Azab <[hidden email]> wrote:

> 1 - What we need is to install one Galaxy VM inside each project area, so
> that it is accessible by all project users. However, we cannot use
> Mercurial to access your distribution server. We could install a Bitbucket
> server inside the TSD and host the code base there, so that it can be
> accessed by all project VMs, but I'm not sure what the procedure is.

You can clone Mercurial (or, in the future, git) repositories to some
file system that is accessible both inside and outside the firewall
(I believe there has to be some file system like this, even if it is
just a USB stick, in order to install new software :) ). You can then
treat the repository in the shared location as the source of Galaxy
and clone/update against it.

At a very high level, the initial clone process might look like this:

(desktop) % ssh login_node
(login_node) % cd /shared_directory/
(login_node) % hg clone https://bitbucket.org/galaxy/galaxy-dist galaxy-dist
(login_node) % exit
(desktop) % ssh secure_node
(secure_node) % cd /project_directory ; hg clone /shared_directory/galaxy-dist galaxy-dist
(secure_node) % cd galaxy-dist ; hg update latest_2015.01.13
(secure_node) % exit
(desktop) %

then doing updates might look like this:

(desktop) % ssh login_node
(login_node) % cd /shared_directory/galaxy-dist
(login_node) % hg pull https://bitbucket.org/galaxy/galaxy-dist
(login_node) % exit
(desktop) % ssh secure_node
(secure_node) % cd /project_directory/galaxy-dist
(secure_node) % hg pull /shared_directory/galaxy-dist
(secure_node) % hg update latest_future_tag_name
(secure_node) % exit
(desktop) %
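Since git is mentioned above as Galaxy's future version control system, the same two-hop pattern (mirror into a shared area from the login node, then clone and update from that mirror on the secure node) can be sketched in git terms. This is a minimal offline sketch, assuming git is installed: a throwaway local repository stands in for the upstream Galaxy repository, and every path and tag name (shared_directory, project_directory, demo_release_1, demo_release_2) is hypothetical.

```shell
#!/bin/sh
# Offline sketch of the "shared mirror" pattern using git.
# Assumptions: git is installed; all directories are hypothetical temp
# paths; a throwaway local repository stands in for the upstream Galaxy
# repository; demo_release_1/2 are made-up tag names.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
UPSTREAM="$WORK/upstream"          # stand-in for the public repository
SHARED="$WORK/shared_directory"    # visible inside and outside the firewall
PROJECT="$WORK/project_directory"  # only visible inside the secure zone
mkdir -p "$SHARED" "$PROJECT"

# Helper: add an empty commit to the stand-in upstream and tag it.
release() {
    git -C "$UPSTREAM" -c user.name=demo -c user.email=demo@example.org \
        commit -q --allow-empty -m "$1"
    git -C "$UPSTREAM" tag "$1"
}

git init -q "$UPSTREAM"
release demo_release_1

# Login node (has network access): mirror upstream into the shared area.
git clone -q --mirror "$UPSTREAM" "$SHARED/galaxy.git"

# Secure node (no network): clone from the shared mirror, pin a release.
git clone -q "$SHARED/galaxy.git" "$PROJECT/galaxy"
git -C "$PROJECT/galaxy" checkout -q demo_release_1

# Later, upstream publishes a new release.
release demo_release_2

# Login node: refresh the mirror (a mirror fetch updates all refs).
git -C "$SHARED/galaxy.git" fetch -q --prune

# Secure node: fetch new history and tags from the mirror, then switch.
git -C "$PROJECT/galaxy" fetch -q --tags
git -C "$PROJECT/galaxy" checkout -q demo_release_2

# Show which release the project clone is now on.
git -C "$PROJECT/galaxy" describe --tags --exact-match
```

The last command prints demo_release_2. A bare --mirror clone is used for the shared copy because nobody works in it directly; it only relays upstream refs into the secure zone, much like the shared Mercurial clone above.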


>
> 2 - We are very concerned about regularly updating the Galaxy instances in
> projects to the latest release. In many cases this causes problems, e.g.
> tool versioning conflicts. So our idea is to install each of our tools,
> together with all of its dependencies, in a separate Docker container, and
> run those as images on each Galaxy project VM. Is this possible, and has it
> been tested? Would this permanently solve the upgrading problem, or would
> you suggest another alternative?

Tool dependencies can be installed in Docker containers; several
people have tested this. There can be some problems getting
everything set up (there are a lot of moving pieces), but I think it
works fine once it is set up.

I really like Aaron Petkau's tutorial here:
https://github.com/apetkau/galaxy-hackathon-2014
https://github.com/apetkau/galaxy-hackathon-2014/smalt

Another blog post with more information was put together by the Galaxy
User Group Grand Ouest here:
https://www.e-biogenouest.org/groups/guggo/wiki/FirstGenOuest

And Galaxy's wiki documentation can be found here:
https://wiki.galaxyproject.org/Admin/Tools/Docker

Kyle Ellrott has been getting Dockerized tools to run at scale, and
we have worked with him to scale things up across a large cluster.
I will try to update the wiki with some of that information.

Setting up a local Tool Shed might be another way to address this
reproducibility/maintenance problem. I generally discourage the use
of local tool sheds, but having no Internet access might be a very
good reason to set one up.

Hope this helps,
-John
