R bioconductor dependencies when creating toolshed installation

R bioconductor dependencies when creating toolshed installation

Stef van Lieshout
Hi all,

I'm having some difficulty setting up the installation procedure for a
Galaxy tool that executes an R script and has certain dependencies
(mainly Bioconductor packages). R itself can handle dependencies:
packages can be installed with install.packages() (which has a
"dependencies" argument) or with biocLite() for Bioconductor packages.

Now I want to make my tool available through a Tool Shed. I see several
options:

1) Set up tool_dependencies.xml with "R CMD INSTALL" for every package.
BUT: all dependencies need to be downloaded before installing, and can
older versions still be downloaded? Maybe they would need to be uploaded
to the Tool Shed as well.

2) Set up tool_dependencies.xml to call an installation script with
Rscript (in which I could use install.packages). Dependencies are then
taken care of, BUT: how do I select specific (older) versions? If I
don't, installing at different times can give different versions.

3) Create a repository for each package and list all of them as
requirements of my Galaxy tool. BUT: that is a lot of work for a lot of
dependencies.

All have pros and cons. How do people deal with this?

Stef
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: R bioconductor dependencies when creating toolshed installation

Kandalaft, Iyad
I would typically recommend Option 3, as it is the best practice and the "Gold Standard" to aim for, although limited human resources often make it impractical.  It lets you reuse the dependencies later for other tool wrappers, AND you don't have to re-install the dependencies every time you modify your tool wrapper repository.  From a brief look at Bioconductor, it seems that they keep old versions of packages (e.g. http://www.bioconductor.org/packages/2.13/data/experiment/bin/windows/contrib/3.0/AmpAffyExample_1.2.13.zip), so using the URLs directly might be advantageous if biocLite() doesn't allow you to specify which version to install.  You don't necessarily need an external R script for the installation, because many of these commands can be run from within tool_dependencies.xml.

Regards,


Iyad Kandalaft
Microbial Biodiversity Bioinformatics
Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
960 Carling Ave.| 960 Ave. Carling
Ottawa, ON| Ottawa (ON) K1A 0C6
E-mail Address / Adresse courriel  [hidden email]
Telephone | Téléphone 613-759-1228
Facsimile | Télécopieur 613-759-1701
Teletypewriter | Téléimprimeur 613-773-2600
Government of Canada | Gouvernement du Canada




Re: R bioconductor dependencies when creating toolshed installation

Stef van Lieshout
Say option 3 is the way to go: would you wrap every new version of an R
package in a new Galaxy package (with names like "matrixStats_0_10_0"),
or create one package ("matrixStats") and update it whenever a new
version is worth an update? The first way would produce an enormous
number of packages ;)

Also, if you do need an external R script as you say, how would I
construct my tool_dependencies.xml to execute R code?

And last, if that approach doesn't work out for me, how can I copy a file
from the repository to the installation dir (to execute it with Rscript)?

Many thanks,
Stef



Re: R bioconductor dependencies when creating toolshed installation

Björn Grüning-3
In reply to this post by Stef van Lieshout
Hi Stef,

for R packages we have a special installation routine that will
(hopefully) make your life easier.

> I'm running into some difficulties on how to set up the installation
> procedure for a galaxy tool which executes an R script and has certain
> dependencies (mainly bioconductor packages). R can deal with
> dependencies, packages can be installed with install.packages (has a
> "dependencies" argument) or biocLite() for bioconductor packages.
>
> Yet, now I want my tool to be available at toolsheds. To do this I see
> several options:

Great!

> 1) setting up tool_dependencies.xml with "R CMD INSTALL" for all
> packages. BUT: need to download all dependencies before install, and can
> older versions still be downloaded? Maybe need to upload them to
> toolshed too..

It all depends on how reproducible you want your tool to be.
If you want 100% reproducibility, you need to mirror the source packages
somewhere, because Bioconductor is not guaranteed to keep older versions
available.

I'm using a special github repository for that purpose:
https://github.com/bgruening/download_store

R CMD INSTALL is not needed, see below.

> 2) setting up tool_dependencies.xml to call an installation script with
> Rscript (where I could use install.packages), BUT: Dependencies are
> taken care of. But how do I select specific (older) versions, because if
> I don't, installing at different times can give different versions.

Selecting older versions is not possible that way, as far as I know.

> 3) creating a repository for each package and have all of them as
> requirement in my galaxy tool. BUT: a lot of work for a lot of
> dependencies

Imho, we should have one R repository in the Tool Shed that includes a
handful of standard packages, like package_r_3_0_1. You should depend on
that repository and additionally define a second dependency. Let's say
your tool is called deseq2: then create one additional
tool_dependencies.xml, in a repository called package_deseq2_1_2_10. In
that definition you install every dependency you need in addition to R.

Here is one example:
https://github.com/bgruening/galaxytools/blob/master/deseq2/tool_dependencies.xml
https://github.com/bgruening/galaxytools/blob/master/orphan_tool_dependencies/package_deseq2_1_2_10/tool_dependencies.xml

The really nice part is the "setup_r_environment" action provided by the
Tool Shed. It installs source packages for you automatically. All you
need to do is name the package or, as shown in the example, specify
the location of the source package.

The only downside is that the order of these packages is important. If
you are interested we have a script that will give you the correct
dependency tree of a given package.
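
As a rough illustration of the points above, a tool_dependencies.xml for a repository like package_deseq2_1_2_10 might look like the sketch below. The owner "iuc", the example.org URLs and the dependencyA/dependencyB tarball names are placeholders made up for illustration; the linked example files remain the authoritative reference. The packages are listed so that each one comes after the packages it depends on, which is one reading of the ordering requirement.

```xml
<?xml version="1.0"?>
<tool_dependency>
    <package name="deseq2" version="1.2.10">
        <install version="1.0">
            <actions>
                <action type="setup_r_environment">
                    <!-- R itself comes from a separate repository;
                         the owner shown here is an assumption -->
                    <repository name="package_r_3_0_1" owner="iuc">
                        <package name="R" version="3.0.1" />
                    </repository>
                    <!-- Source tarballs, dependencies first. All URLs and
                         the dependencyA/dependencyB names are hypothetical -->
                    <package>https://example.org/mirror/dependencyA_1.0.0.tar.gz</package>
                    <package>https://example.org/mirror/dependencyB_1.0.0.tar.gz</package>
                    <package>https://example.org/mirror/DESeq2_1.2.10.tar.gz</package>
                </action>
            </actions>
        </install>
        <readme>Installs DESeq2 and its R dependencies from mirrored source tarballs.</readme>
    </package>
</tool_dependency>
```

Pointing every URL at a mirror you control is what pins the versions: the install no longer depends on what the upstream repositories serve at install time.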

Hope that helps,
Bjoern



Re: R bioconductor dependencies when creating toolshed installation

Björn Grüning-3
In reply to this post by Stef van Lieshout
Hi Stef,

fortunately it is way easier than that. Please have a look at the
"setup_r_environment" installation routine :)

Cheers,
Bjoern


Re: R bioconductor dependencies when creating toolshed installation

Stef van Lieshout
In reply to this post by Björn Grüning-3
Hi Bjoern,

That looks much better indeed ;) The only problem I still have is that
I need R 3.1.0 for a Bioconductor 2.14 package (I have sent a separate
message to the mailing list about that). Looking at the XML for other R
versions, it's not something I will easily do myself.

What will happen if I do not specify the R dependency ("package_r_3_0_3"
in your example code) but do specify the download/install of packages?
I guess these get installed in the "default" R instance?

Related to that, how can I call a specific instance of R in the tool.xml
without specifying the full path to the tool? E.g., in the tool.xml I now
do:

<command>
  /path/to/lib/R/R-3.1.0/bin/Rscript
  /path/to/galaxy-dist/tools/testdir/tool.R $config
</command>

Where normally you can do:

<command interpreter="Rscript">
  tool.R $config
</command>

Thanks again!
Stef
 


Re: R bioconductor dependencies when creating toolshed installation

Björn Grüning-3
Hi Stef,

On 17.06.2014 15:40, Stef van Lieshout wrote:
> Hi Bjoern,
>
> That looks much better indeed ;) The only problem I still have then is
> that I need R 3.1.0 for a bioconductor 2.14 package (have sent a new
> mailing list msg for that). Looking at the xml of other versions it's
> not something I will easily do myself.

If you can wait a little bit, we (the IUC, or more concretely Dave Bouvier)
will take care of that and create such a repository.

> What will happen if I do not specify the R dependency ("package_r_3_0_3"
> in your example code) but do specify the download/install of packages,
> guess these get installed in the "default" R instance?

Phew, to be honest, I do not know. I have never tested it without a real
instance. I guess it will pick the default version.

> Related to that, how can I call a specific instance of R in the tool.xml
> without specifying the full path to the tool. Eg, in the tool.xml I now
> do:
>
> <command>
>    /path/to/lib/R/R-3.1.0/bin/Rscript
>    /path/to/galaxy-dist/tools/testdir/tool.R $config
> </command>
>
> Where normally you can do:
>
> <command interpreter="Rscript">
>    tool.R $config
> </command>

You should always use the latter form, without a path to R. Setting
the correct path, or falling back to the default, should be handled by
Galaxy. The correct R version is selected with the <requirement> tag. You
can specify 3.1 as soon as we have it :)
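
A minimal sketch of what that could look like in a tool.xml, assuming an R 3.0.3 package (the version mentioned earlier in the thread) and a hypothetical dependency package called my_r_tool_deps. The tool id, names, versions and parameters are all made up for illustration, and the requirement names and versions have to match whatever the corresponding tool_dependencies.xml declares.

```xml
<tool id="my_r_tool" name="My R tool" version="0.1.0">
    <requirements>
        <!-- Assumed to match package name="R" version="3.0.3" in tool_dependencies.xml -->
        <requirement type="package" version="3.0.3">R</requirement>
        <!-- Hypothetical package that installs the extra R libraries -->
        <requirement type="package" version="0.1.0">my_r_tool_deps</requirement>
    </requirements>
    <!-- No absolute path: Galaxy is expected to put the required R on the PATH -->
    <command interpreter="Rscript">
        tool.R $config
    </command>
    <inputs>
        <param name="input" type="data" format="tabular" label="Input table" />
    </inputs>
    <outputs>
        <data name="output" format="tabular" />
    </outputs>
    <configfiles>
        <configfile name="config">input=$input
output=$output</configfile>
    </configfiles>
    <help>Runs tool.R with the R version resolved by Galaxy.</help>
</tool>
```

With this layout, moving to R 3.1 later should only mean changing the version in the requirement (and in the dependency definition), not editing any paths in the command.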

You can thank Dave for the new R packages; he spent a lot of time
creating a big R binary that can run on almost all architectures.

Cheers,
Bjoern


Re: R bioconductor dependencies when creating toolshed installation

Stef van Lieshout
Bjoern and Dave,

That sounds great. Of course my next question will be "how much is a
little bit?" ;) It's just that I have to move on for now and at least
make things work, so I might try using the default R instance for now,
but as soon as an R 3.1.0 package is out I will definitely pick it up!

Stef

----- Original message -----
From: Björn Grüning <[hidden email]>
To: Stef van Lieshout <[hidden email]>,
[hidden email], Dave Bouvier <[hidden email]>
Subject: Re: [galaxy-dev] R bioconductor dependencies when creating
toolshed installation
Date: Tue, 17 Jun 2014 17:37:24 +0200

Hi Stef,

Am 17.06.2014 15:40, schrieb Stef van Lieshout:
> Hi Bjoern,
>
> That looks much better indeed ;) The only problem I still have then is
> that I need R 3.1.0 for a bioconductor 2.14 package (have send a new
> mailing list msg for that). Looking at the xml of other versions it's
> not something I will easily do myself.

If you can wait a little bit we (the IUC, or more concrete Dave Bouvier)
will take care of that and create such a repository.

> What will happen if I do not specify the R dependency ("package_r_3_0_3"
> in your example code) but do specify the download/install of packages,
> guess these get installed in the "default" R instance?

Puh, to be honest, I do not know. I never tested it without a real
instance. I guess it will pick the default version.

> Related to that, how can I call a specific instance of R in de tool.xml
> without specifying the full path to the tool. Eg, in the tool.xml I now
> do:
>
> <command>
>    /path/to/lib/R/R-3.1.0/bin/Rscript
>    /path/to/galaxy-dist/tools/testdir/tool.R $config
> </command>
>
> Where normally you can do:
>
> <command interpreter="Rscript">
>    tool.R $config
> </command>

You should always use the latter version, without a path to R. Setting
the correct path or assuming the default should be handled by Galaxy.
The correct R version will be created with the <requirement> tag. You
can specify 3.1 as soon as we have it :)

You can thank Dave for the new R packages, he spend much time in
creating a big R binary that can run on almost all architectures.

Cheers,
Bjoern

> Thanks again!
> Stef
>
>
> ----- Original message -----
> From: Björn Grüning <[hidden email]>
> To: Stef van Lieshout <[hidden email]>,
> [hidden email]
> Subject: Re: [galaxy-dev] R bioconductor dependencies when creating
> toolshed installation
> Date: Tue, 17 Jun 2014 15:17:36 +0200
>
> Hi Stef,
>
> for R packages we have a special installation routine that will
> (hopefully) make your life easier.
>
>> I'm running into some difficulties with how to set up the installation
>> procedure for a galaxy tool which executes an R script and has certain
>> dependencies (mainly bioconductor packages). R can deal with
>> dependencies, packages can be installed with install.packages (has a
>> "dependencies" argument) or biocLite() for bioconductor packages.
>>
>> Yet, now I want my tool to be available at toolsheds. To do this I see
>> several options:
>
> Great!
>
>> 1) setting up tool_dependencies.xml with "R CMD INSTALL" for all
>> packages. BUT: need to download all dependencies before install, and can
>> older versions still be downloaded? Maybe need to upload them to
>> toolshed too..
>
> It is all a matter of how reproducible you want your tool to be.
> If you want 100% reproducibility, you need to mirror the source packages
> somehow, because bioc will not store older versions. At least that is
> not guaranteed.
>
> I'm using a special github repository for that purpose:
> https://github.com/bgruening/download_store
>
> R CMD INSTALL is not needed, see below.
>
>> 2) setting up tool_dependencies.xml to call an installation script with
>> Rscript (where I could use install.packages). Dependencies are
>> taken care of, BUT: how do I select specific (older) versions? If
>> I don't, installing at different times can give different versions.
>
> Installing older versions is not possible, as far as I know.
>
>> 3) creating a repository for each package and have all of them as
>> requirement in my galaxy tool. BUT: a lot of work for a lot of
>> dependencies
>
> Imho, we should have one R repository with a handful of standard
> packages included in the toolshed, like packages_r_3_0_1. You should
> depend on that repository and additionally define a second dependency.
> Let's say your tool is called deseq2; then create one additional
> tool_dependencies.xml for a repository called package_deseq2_1_2_10.
> In that definition you install every dependency you need in addition
> to R.
>
> Here is one example:
> https://github.com/bgruening/galaxytools/blob/master/deseq2/tool_dependencies.xml
> https://github.com/bgruening/galaxytools/blob/master/orphan_tool_dependencies/package_deseq2_1_2_10/tool_dependencies.xml
>
> The really nice part is the "setup_r_environment" function from the
> toolshed. It will install source packages for you automatically. All you
> need to do is to name the package or, as shown in the example, specify
> the location of the source package.
>
> The only downside is that the order of these packages is important. If
> you are interested we have a script that will give you the correct
> dependency tree of a given package.
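
The second tool_dependencies.xml described above can be sketched roughly
as follows. This is a sketch under assumptions, not a tested definition:
the repository owner "iuc" is assumed, the tarball URLs are placeholders,
and the two listed packages are illustrative rather than the real
dependency set of DESeq2.

```xml
<?xml version="1.0"?>
<tool_dependency>
    <!-- Depend on the shared R repository instead of building R here. -->
    <package name="R" version="3.0.3">
        <repository name="package_r_3_0_3" owner="iuc" prior_installation_required="True" />
    </package>
    <package name="deseq2" version="1.2.10">
        <install version="1.0">
            <actions>
                <action type="setup_r_environment">
                    <!-- The R instance the packages are installed into. -->
                    <repository name="package_r_3_0_3" owner="iuc">
                        <package name="R" version="3.0.3" />
                    </repository>
                    <!-- Placeholder URLs for mirrored source packages.
                         Order matters: list dependencies before the
                         packages that need them. -->
                    <package>https://example.org/mirror/dependency_a_1.0.0.tar.gz</package>
                    <package>https://example.org/mirror/DESeq2_1.2.10.tar.gz</package>
                </action>
            </actions>
        </install>
        <readme>Installs DESeq2 and its R dependencies on top of R 3.0.3.</readme>
    </package>
</tool_dependency>
```

Pointing each <package> entry at a mirrored tarball rather than a bare
package name is what pins the versions.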
>
> Hope that helps,
> Bjoern
>
>
>> All have pros and cons, how do people deal with this?
>>
>> Stef
>
