Cloudman indices installation/configuration

Iry Witham
Hi Team,

I have a new instance of Galaxy CloudMan running and have added tools from the Tool Shed to it.  When I attempt to run tools like sam-to-bam or any GATK tool, I am prompted for a reference genome.  However, no indices/references are available for these tools.  I have added the following line to sam_fa_indices.loc, but that did nothing:

index   hg19    /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/gh19.fa

I have also added the following three lines to the gatk2_picard_index.loc:

hg19    hg19    Human (hg19)    /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/hg19.fa
GRCh37  GRCh37  Human (GRCh37)  /mnt/galaxyIndices/genomes/Hsapiens/GRCh37/seq/GRCh37.fa
mm10    mm10    Mouse (mm10)    /mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa

I know I have missed something, but can't seem to figure it out.  Could someone point me in the right direction?

Regards,
__________________________________
Iry T. Witham
Scientific Applications Administrator
Computational Sciences Group
The Jackson Laboratory
600 Main Street
Bar Harbor, ME  04609
Phone: 207-288-6744
email: iry.witham@...




The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Cloudman indices installation/configuration

Daniel Blankenberg
Hi Iry,

First thing to check is that your fields are tab delimited — they appear to be spaces instead of tabs in this email, but copy and pasting into email can munge things sometimes (also “gh19.fa” is probably a typo, but that wouldn’t prevent the selection option from showing up).
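
One way to act on the tab-delimiter advice above is to check a .loc file programmatically. The following is only a minimal Python sketch and not part of Galaxy: the function name check_loc_lines is hypothetical, and the expected column counts (4 for the gatk2_picard_index.loc entries, 3 for the sam_fa_indices.loc entry) are read off the lines quoted in this thread.

```python
def check_loc_lines(lines, expected_columns):
    """Return (line_number, problem) tuples for entries that look malformed."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and '#' comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != expected_columns:
            hint = "no tab characters found" if "\t" not in line else "wrong field count"
            problems.append(
                (number, f"{len(fields)} field(s), expected {expected_columns}: {hint}")
            )
    return problems


if __name__ == "__main__":
    # First entry uses real tabs, second uses runs of spaces.
    sample = [
        "hg19\thg19\tHuman (hg19)\t/mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/hg19.fa\n",
        "mm10    mm10    Mouse (mm10)    /mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa\n",
    ]
    for number, problem in check_loc_lines(sample, expected_columns=4):
        print(f"line {number}: {problem}")
```

To check a real file, pass an open file handle instead of the sample list, for example check_loc_lines(open(path), expected_columns=4).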


Thanks for using Galaxy,

Dan


On Oct 2, 2014, at 1:49 PM, Iry Witham <[hidden email]> wrote:

Re: Cloudman indices installation/configuration

Iry Witham
Hi Dan,

Thanks for pointing out the tab issue.  However, I have made the modification and restarted my instance, but still get no reference genome listed.  It is funny that there are a limited number of tools that have this issue.  They are:

SAM Tools:
  Generate pileup
  SAM-to-BAM
  Mpileup
SNP/WGA: Data; Filters:
  SnpEff
  SnpEff Download (cannot select a genome version to download)
NGS: Picard:
  SAM/BAM Alignment Summary
  SAM/BAM GC Bias
NGS: GATK2 Tools:
  (all of the tools)

These are tools that I have installed via the toolshed.  Is there a different location for the .loc files that need modifying?

Thanks,
Iry

From: Daniel Blankenberg <[hidden email]>
Date: Thursday, October 2, 2014 1:57 PM
To: Iry Witham <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [galaxy-dev] Cloudman indices installation/configuration

Re: Cloudman indices installation/configuration

Iry Witham
In reply to this post by Daniel Blankenberg
It looks like I need to generate the dict file for the mm10 reference as well as add the reference to srma_index.loc.  My question is where do these need to exist?  Do they belong in the repo directory structure or in the primary tool-data directory?  The hg19.fa, hg19.fa.fai, and hg19.dict files exist, as do the same files for mm9 and GRCh37.  However, the .dict does not exist for mm10.  Even though that is the case, the references do not appear in the gatk2 tools.
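
To see at a glance which references are missing a companion file, something like the following Python sketch could help. It assumes the naming used in this thread (hg19.fa, hg19.fa.fai, hg19.dict), that is, the .fai appended to the FASTA name and the .dict replacing the .fa extension; the function names are hypothetical and the paths are the ones quoted earlier.

```python
from pathlib import Path


def companion_files(fasta_path):
    """Return the FASTA path plus the index and dictionary paths expected beside it."""
    fasta = Path(fasta_path)
    return {
        "fasta": fasta,
        "fai": fasta.with_name(fasta.name + ".fai"),  # hg19.fa -> hg19.fa.fai
        "dict": fasta.with_suffix(".dict"),           # hg19.fa -> hg19.dict
    }


def missing_companions(fasta_path):
    """List the labels ('fasta', 'fai', 'dict') whose files do not exist."""
    return [
        label
        for label, path in companion_files(fasta_path).items()
        if not path.is_file()
    ]


if __name__ == "__main__":
    references = [
        "/mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/hg19.fa",
        "/mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa",
    ]
    for reference in references:
        missing = missing_companions(reference)
        print(reference, "-> missing:", ", ".join(missing) if missing else "nothing")
```

This only reports what is absent; it does not create the missing .dict or .fai files.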

Any ideas?

Thanks,
Iry

Re: Cloudman indices installation/configuration

Enis Afgan-2
Hi Iry, 
Try adding the following to your /mnt/galaxy/galaxy-app/tool_data_table_conf.xml, populating the referenced files (tool-data/gatk2_picard_index.loc and tool-data/gatk2_annotations.txt) as desired and restarting Galaxy:

    <!-- Location of Picard dict files valid for GATK -->
    <table name="gatk2_picard_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/gatk2_picard_index.loc" />
    </table>
    <!-- Available of GATK references -->
    <table name="gatk2_annotations" comment_char="#">
        <columns>value, name, gatk_value, tools_valid_for</columns>
        <file path="tool-data/gatk2_annotations.txt" />
    </table>
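
To confirm that the tables above are actually registered after editing, the configuration can be parsed and listed. This is a rough Python sketch using only the standard library; the function name registered_tables is hypothetical, and the sample text assumes the <table> entries sit inside a single <tables> root element.

```python
import xml.etree.ElementTree as ET


def registered_tables(xml_text):
    """Map each data table name to (list of columns, .loc file path)."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.iter("table"):
        columns_text = table.findtext("columns", default="")
        columns = [c.strip() for c in columns_text.split(",") if c.strip()]
        file_node = table.find("file")
        path = file_node.get("path") if file_node is not None else None
        tables[table.get("name")] = (columns, path)
    return tables


if __name__ == "__main__":
    sample = """
    <tables>
        <table name="gatk2_picard_indexes" comment_char="#">
            <columns>value, dbkey, name, path</columns>
            <file path="tool-data/gatk2_picard_index.loc" />
        </table>
    </tables>
    """
    for name, (columns, path) in registered_tables(sample).items():
        print(f"{name}: {len(columns)} columns, file={path}")
```

The number of columns reported for a table is also the number of tab-separated fields each line of the corresponding .loc file should have.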

Hope this gets you going. Let us know if it doesn't,
Enis

On Fri, Oct 3, 2014 at 1:36 PM, Iry Witham <[hidden email]> wrote:

Re: Cloudman indices installation/configuration

Iry Witham
Hi Enis,

Thanks for that information.  Now I am getting an error with the Unified_Genotyper failing to locate the GenomeAnalysisTK.jar.  I discovered that gatk2 needs to be downloaded and installed.  I have done that, but can't seem to figure out where the env.sh file referenced below exists.  Can you point me to the correct location of that file?  Or do I need to create the file, and if so, where?

Thanks,
Iry

Galaxy wrapper for GATK2

This wrapper is copyright 2013 by Björn Grüning, Jim Johnson & the Galaxy Team.

The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

http://www.broadinstitute.org/gatk http://www.broadinstitute.org/gatk/about/citing-gatk

GATK is Free for academics, and fee for commercial use. Please study the GATK licensing website: http://www.broadinstitute.org/gatk/about/#licensing

Installation

The recommended installation is by means of the toolshed.

Galaxy should be able to install samtools dependencies automatically for you. GATK2, and its new licence model, does not allow us to distribute the GATK binaries. As a consequence you need to install GATK2 by your own, please see the GATK website for more information:

http://www.broadinstitute.org/gatk/download

Once you have installed GATK2, you need to edit the env.sh files that are installed together with the wrappers. You must edit the GATK2_PATH environment variable in the file:

<tool_dependency_dir>/environment_settings/GATK2_PATH/iuc/gatk2/<hash_string>/env.sh

to point to the folder where you have installed GATK2.

Optionally, you may also want to edit the GATK2_SITE_OPTIONS environment variable in the file:

<tool_dependency_dir>/environment_settings/GATK2_SITE_OPTIONS/iuc/gatk2/<hash_string>/env.sh

to deactivate the 'call home feature' of GATK with something like:

GATK2_SITE_OPTIONS='-et NO_ET -K /data/gatk2_key_file'

GATK2_SITE_OPTIONS can be also used to insert other specific options into every GATK2 wrapper at runtime, without changing the actual wrapper.

Read more about the "Phone Home" problem at: http://www.broadinstitute.org/gatk/guide/article?id=1250

Optionally, you may also want to add some commands to be executed before GATK (e.g. to load modules) to the file:

<tool_dependency_dir>/gatk2/default/env.sh

Finally, you should fill in additional information about your genomes and annotations in the gatk2_picard_index.loc and gatk2_annotations.txt. You can find them in the tool-data/ Galaxy directory.
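
As a way of hunting for the env.sh files the readme describes, the Python sketch below searches the documented layout and reports whether GATK2_PATH points at a folder containing GenomeAnalysisTK.jar. The function names are hypothetical, the example tool_dependency_dir is a placeholder to replace with your own setting, and the parser assumes a plain NAME=value shell assignment, optionally followed by "; export NAME".

```python
import re
from pathlib import Path

# Matches e.g.  GATK2_PATH=/opt/gatk2; export GATK2_PATH   (quotes optional)
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?GATK2_PATH=(['\"]?)(.*?)\1\s*(?:;.*)?$")


def read_gatk2_path(env_sh_text):
    """Return the value assigned to GATK2_PATH, or None if no assignment is found."""
    for line in env_sh_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            return match.group(2)
    return None


def find_env_files(tool_dependency_dir):
    """Locate env.sh files under the layout given in the readme."""
    base = Path(tool_dependency_dir) / "environment_settings" / "GATK2_PATH"
    return sorted(base.glob("iuc/gatk2/*/env.sh"))


def report(tool_dependency_dir, jar_name="GenomeAnalysisTK.jar"):
    """Return (env.sh path, GATK2_PATH value, jar found?) for each env.sh file."""
    results = []
    for env_file in find_env_files(tool_dependency_dir):
        value = read_gatk2_path(env_file.read_text())
        jar_found = bool(value) and (Path(value) / jar_name).is_file()
        results.append((env_file, value, jar_found))
    return results


if __name__ == "__main__":
    # Placeholder directory: substitute the tool_dependency_dir of your instance.
    for env_file, value, jar_found in report("/path/to/tool_dependency_dir"):
        print(f"{env_file}: GATK2_PATH={value!r}, jar found: {jar_found}")
```

An empty result list means no env.sh exists at the documented location, which is the situation described in this message.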


From: Enis Afgan <[hidden email]>
Date: Saturday, October 4, 2014 6:10 AM
To: Iry Witham <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [galaxy-dev] Cloudman indices installation/configuration

Re: Cloudman indices installation/configuration

Enis Afgan-2
Hi Iry, 
Yeah, I see what you mean about that env.sh file not being in the GATK2 repo even though the readme states so. I'm not sure what exactly is supposed to be in that file for GATK2 in particular, so perhaps one of the wrapper authors can jump in. For the majority of tools, you'd just need something like this in there:

PATH=/mnt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/gatk2/8bcc13094767/gatk2/bin:$PATH

and have placed the binaries for the tool in that directory.

If nobody else jumps in, I'll poke around more in the coming days.

On Tue, Oct 7, 2014 at 11:58 AM, Iry Witham <[hidden email]> wrote:

Re: Cloudman indices installation/configuration

Björn Grüning-3
Jumping in ;)

Hi Iry,

please have a look at our small readme file under:
https://github.com/galaxyproject/tools-iuc/tree/master/tools/gatk2

The problem is that we are not allowed to ship the jar file. So we came
up with the idea of providing an empty env.sh file that you need to edit.
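
As a rough sketch of what editing that file amounts to: the readme asks for the GATK2_PATH variable to point at the folder where GATK2 is installed. The helper below (render_env_line is a made-up name, and the "NAME=value; export NAME" layout is an assumption about what a shell-sourced env.sh can look like, not a statement of what Galaxy itself writes) just renders such a line with safe shell quoting. The /opt/gatk2 path is a placeholder.

```python
import shlex


def render_env_line(name, value):
    """Render one shell line that sets and exports an environment variable.

    shlex.quote adds single quotes only when the value contains characters
    the shell would otherwise interpret (spaces, for example).
    """
    return f"{name}={shlex.quote(value)}; export {name}\n"


# Placeholder install location; replace with wherever GenomeAnalysisTK.jar lives.
gatk2_path_line = render_env_line("GATK2_PATH", "/opt/gatk2")

# The site options example from the readme, which needs quoting because of spaces.
site_options_line = render_env_line(
    "GATK2_SITE_OPTIONS", "-et NO_ET -K /data/gatk2_key_file"
)
```

The resulting strings would then be written into the respective env.sh files named in the readme.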

Sorry for the inconvenience,
Bjoern

On 08.10.2014 at 00:09, Enis Afgan wrote:

> Hi Iry,
> Yeah, I see what you mean about that *env.sh* file not being in the GATK2
> repo even though the readme says it should be. I'm not sure what exactly is
> supposed to be in that file for GATK2 in particular, so perhaps one of the
> wrapper authors can jump in. For the majority of tools, you'd just need
> something like this in there:
> *PATH=/mnt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/gatk2/8bcc13094767/gatk2/bin:$PATH*
> and to have placed the binaries for the tool in that directory.
>
> If nobody else jumps in, I'll poke around more in the coming days.
>
> On Tue, Oct 7, 2014 at 11:58 AM, Iry Witham <[hidden email]> wrote:
>
>>   Hi Enis,
>>
>>   Thanks for that information.  Now I am getting an error with the
>> Unified_Genotyper failing to locate the GenomeAnalysisTK.jar.  I discovered
>> that gatk2 needs to be downloaded and installed.  I have done that, but
>> can't seem to figure out where the env.sh file referenced below exists.  Can
>> you point me to the correct location of that file?  Or do I need to create
>> the file, and if so, where?
>>
>>   Thanks,
>> Iry
>>
>>      Galaxy wrapper for GATK2
>>
>> This wrapper is copyright 2013 by Björn Grüning, Jim Johnson & the Galaxy
>> Team.
>>
>> The Genome Analysis Toolkit or GATK is a software package developed at the
>> Broad Institute to analyse next-generation resequencing data. The toolkit
>> offers a wide variety of tools, with a primary focus on variant discovery
>> and genotyping as well as strong emphasis on data quality assurance. Its
>> robust architecture, powerful processing engine and high-performance
>> computing features make it capable of taking on projects of any size.
>>
>> http://www.broadinstitute.org/gatk
>> http://www.broadinstitute.org/gatk/about/citing-gatk
>>
>> GATK is free for academics, and fee-based for commercial use. Please study
>> the GATK licensing website:
>> http://www.broadinstitute.org/gatk/about/#licensing
>>    Installation
>>
>> The recommended installation is by means of the toolshed
>> <http://toolshed.g2.bx.psu.edu/view/iuc/gatk2>.
>>
>> Galaxy should be able to install samtools dependencies automatically for
>> you. GATK2, with its new licence model, does not allow us to distribute the
>> GATK binaries. As a consequence you need to install GATK2 on your own;
>> please see the GATK website for more information:
>>
>> http://www.broadinstitute.org/gatk/download
>>
>> Once you have installed GATK2, you need to edit the env.sh files that are
>> installed together with the wrappers. You must edit the GATK2_PATH
>> environment variable in the file:
>>
>>
>> <tool_dependency_dir>/environment_settings/GATK2_PATH/iuc/gatk2/<hash_string>/env.sh
>>
>> to point to the folder where you have installed GATK2.
>>
>> Optionally, you may also want to edit the GATK2_SITE_OPTIONS environment
>> variable in the file:
>>
>>
>> <tool_dependency_dir>/environment_settings/GATK2_SITE_OPTIONS/iuc/gatk2/<hash_string>/env.sh
>>
>> to deactivate the 'call home feature' of GATK with something like:
>>
>> GATK2_SITE_OPTIONS='-et NO_ET -K /data/gatk2_key_file'
>>
>> GATK2_SITE_OPTIONS can be also used to insert other specific options into
>> every GATK2 wrapper at runtime, without changing the actual wrapper.
>>
>> Read more about the "Phone Home" problem at:
>> http://www.broadinstitute.org/gatk/guide/article?id=1250
>>
>> Optionally, you may also want to add some commands to be executed before
>> GATK (e.g. to load modules) to the file:
>>
>> <tool_dependency_dir>/gatk2/default/env.sh
>>
>> Finally, you should fill in additional information about your genomes and
>> annotations in the gatk2_picard_index.loc and gatk2_annotations.txt. You
>> can find them in the tool-data/ Galaxy directory.
>>
>>    From: Enis Afgan <[hidden email]>
>> Date: Saturday, October 4, 2014 6:10 AM
>>
>> To: Iry Witham <[hidden email]>
>> Cc: "[hidden email]" <[hidden email]>
>> Subject: Re: [galaxy-dev] Cloudman indices installation/configuration
>>
>>    Hi Iry,
>> Try adding the following to your
>> */mnt/galaxy/galaxy-app/tool_data_table_conf.xml*, populating the
>> referenced files (tool-data/gatk2_picard_index.loc and
>> tool-data/gatk2_annotations.txt) as desired and restarting Galaxy:
>>
>>       <!-- Location of Picard dict files valid for GATK -->
>>      <table name="gatk2_picard_indexes" comment_char="#">
>>          <columns>value, dbkey, name, path</columns>
>>          <file path="tool-data/gatk2_picard_index.loc" />
>>      </table>
>>      <!-- Available of GATK references -->
>>      <table name="gatk2_annotations" comment_char="#">
>>          <columns>value, name, gatk_value, tools_valid_for</columns>
>>          <file path="tool-data/gatk2_annotations.txt" />
>>      </table>
>>
>>   Hope this gets you going. Let us know if it doesn't,
>> Enis
>>
>> On Fri, Oct 3, 2014 at 1:36 PM, Iry Witham <[hidden email]> wrote:
>>
>>>   It looks like I need to generate the dict file for the mm10 reference
>>> as well as add the reference to the srma_index.loc.  My question is where
>>> do these need to exist?  Do they belong in the repo directory structure or
>>> in the primary tool-data directory?  The hg19.fa, hg19.fa.fai, and hg19.dict
>>> files exist, as do the same files for mm9 and GRCh37.  However, the .dict
>>> does not exist for mm10.  Even so, none of the references appear in the
>>> gatk2 tools.
>>>
>>>   Any ideas?
>>>
>>>   Thanks,
>>> Iry
>>>
>>>    From: Daniel Blankenberg <[hidden email]>
>>> Date: Thursday, October 2, 2014 1:57 PM
>>> To: Iry Witham <[hidden email]>
>>> Cc: "[hidden email]" <[hidden email]>
>>> Subject: Re: [galaxy-dev] Cloudman indices installation/configuration
>>>
>>>     Hi Iry,
>>>
>>>   The first thing to check is that your fields are tab-delimited. They
>>> appear to be separated by spaces instead of tabs in this email, but copying
>>> and pasting into email can munge things sometimes. (Also, “gh19.fa” is
>>> probably a typo, but that wouldn’t prevent the selection option from
>>> showing up.)
>>>
>>>
>>>   Thanks for using Galaxy,
>>>
>>>   Dan
>>>
>>>
>>>    On Oct 2, 2014, at 1:49 PM, Iry Witham <[hidden email]> wrote:
>>>
>>>   Hi Team,
>>>
>>>   I have a new instance of Galaxy CloudMan running and have added tools
>>> from the toolshed to it.  When I attempt to run tools like sam-to-bam or
>>> any gatk tool I am prompted for a reference genome.  However, no
>>> indices/references are available for these tools.  I have added the
>>> following line to the sam_fa_indices.loc, but that did nothing:
>>>
>>>   index   hg19    /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/gh19.fa
>>>
>>>   I have also added the following three lines to
>>> the gatk2_picard_index.loc:
>>>
>>>   hg19    hg19    Human (hg19)    /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/hg19.fa
>>>   GRCh37  GRCh37  Human (GRCh37)  /mnt/galaxyIndices/genomes/Hsapiens/GRCh37/seq/GRCh37.fa
>>>   mm10    mm10    Mouse (mm10)    /mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa
>>>
>>>   I know I have missed something, but can't seem to figure it out.  Could
>>> someone point me in the right direction?
>>>
>>>   Regards,
>>>   __________________________________
>>> Iry T. Witham
>>> Scientific Applications Administrator
>>> Computational Sciences Group
>>> The Jackson Laboratory
>>> 600 Main Street
>>> Bar Harbor, ME  04609
>>> Phone: 207-288-6744
>>> email: [hidden email]