Salmon references and data manager


Previti

Dear Björn,

I just installed Salmon on our Galaxy instance and I have a couple of basic questions.

Currently the reference transcriptomes are put in the same data table as the genomes. Would it be of interest to separate these and give the transcriptomes their own table? I could probably try to do this...

There is a data manager available that unfortunately has a bug. We fixed that and it now populates the reference genome data table.

I would probably modify this as well to use the new table. Could this be useful? I'm not sure how to proceed... would I give you the modified Salmon wrapper for inclusion in the package?

Best regards,

Christopher


--
Dr. Christopher Previti
Genomics and Proteomics Core Facility
High Throughput Sequencing (W190)
Bioinformatician

German Cancer Research Center (DKFZ)
Foundation under Public Law
Im Neuenheimer Feld 580
69120 Heidelberg
Germany
Room: B2.102 (INF580/TP3)
Phone: +49 6221 42-4661

christopher.previti@...
www.dkfz.de

Management Board: Prof. Dr. Michael Baumann, Prof. Dr. Josef Puchta
VAT-ID No.: DE143293537

Confidentiality notice: This message is intended exclusively for the persons to whom it is addressed.
It may contain confidential information and/or information intended only for the recipient(s). If you are not
the intended recipient, please contact the sender and delete the message.
Any unauthorized use of the information in this message is prohibited.



    

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/

Re: Salmon references and data manager

Björn Grüning-3
Hi Christopher!

> Dear Björn,
>
> I just installed Salmon on our Galaxy instance and I have a couple of
> basic questions.

Sure, thanks for getting in touch!

> Currently the reference transcriptomes are put in the same data table as
> the genomes, would it be of interest to separate this and give the
>
> transcriptomes their own table? I could probably try to do this...

I don't quite understand that.
Salmon is using this one here, isn't it?

https://github.com/bgruening/galaxytools/blob/master/tools/salmon/salmon.xml#L233

> There is a data manager available that unfortunately has a bug. We fixed
> that and it now populates the reference genome data table.

Do you mean this one?

https://github.com/ieguinoa/data_manager_salmon_index_builder

> I would probably modify this as well use the new table. Could this be
> useful? I'm not sure how to proceed...would I give you the modified
> Salmon wrapper for inclusion in the package?

If you can, please feel free to create PRs to the repositories, so we
can all review them. Then, when we merge, the tool gets automatically
updated on the Tool Shed :)

Thanks!
Bjoern


Re: Salmon references and data manager

Ignacio EGUINOA
Hi Christopher and Björn,

I have some comments about this because I also came up with these questions some time ago...


From: "Björn Grüning" <[hidden email]>
To: "Previti" <[hidden email]>, "galaxy-dev" <[hidden email]>
Sent: Friday, September 7, 2018 9:56:41 AM
Subject: Re: [galaxy-dev] Salmon references and data manager
> Hi Christopher!
>
> > Dear Björn,
> >
> > I just installed Salmon on our Galaxy instance and I have a couple of
> > basic questions.
>
> Sure, thanks for getting in touch!
>
> > Currently the reference transcriptomes are put in the same data table as
> > the genomes, would it be of interest to separate this and give the
> >
> > transcriptomes their own table? I could probably try to do this...
>
> That I don't understand?
> Salmon is using this one here, isn't it?
>
> https://github.com/bgruening/galaxytools/blob/master/tools/salmon/salmon.xml#L233

What he means, I think, is the table the index is built from. Data managers that take a transcriptome as input get it from the all_fasta table; I think that is what he means by the genomes table.
As I said, at some point I also thought it might be useful to have a separate table (e.g. all_transcriptomes) so that the genome and transcriptome entries of the same build don't get mixed. I think it would be good to have a way of listing only the transcriptomes from the all_gff, but that would require some kind of naming standard to filter on. We had this in our instance at some point but it didn't help at all, so I just modified the data manager to use all_fasta, and that is what I published.
So, @Christopher ... having a separate table is not the solution, although it would be easier for the GUI. For now, just giving the entries a descriptive name to indicate that they correspond to a transcriptome is enough and works OK for us. In any case this is not for users, and at least for us it's all handled through the API, so, again, it's just a matter of taking care of the entry names and you are fine using the all_fasta table.
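
To make the naming idea concrete, here is a minimal sketch of filtering all_fasta-style entries by their display name. It assumes the usual tab-separated column order (value, dbkey, name, path) and a purely hypothetical convention of putting the word "transcriptome" in the display name; the entries and paths are made up for illustration, and this is not code from the data manager.

```python
# Column order assumed for an all_fasta-style .loc file (tab-separated).
COLUMNS = ("value", "dbkey", "name", "path")


def read_loc(text):
    """Parse .loc content into dicts, skipping comments, blanks and malformed rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def transcriptome_entries(rows, marker="transcriptome"):
    """Keep entries whose display name contains the (hypothetical) marker word."""
    return [row for row in rows if marker.lower() in row["name"].lower()]


# Made-up example content: one genome and one transcriptome for the same build.
EXAMPLE_LOC = (
    "# value\tdbkey\tname\tpath\n"
    "hg38\thg38\tHuman (hg38) genome\t/data/hg38/hg38.fa\n"
    "hg38_tx\thg38\tHuman (hg38) transcriptome\t/data/hg38/hg38_transcripts.fa\n"
)

entries = transcriptome_entries(read_loc(EXAMPLE_LOC))
```

With a convention like this, both entries can share the same dbkey and table, and a script talking to the API can still pick out the transcriptome by name.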



> > There is a data manager available that unfortunately has a bug. We fixed
> > that and it now populates the reference genome data table.
>
> Do you mean this one?
>
> https://github.com/ieguinoa/data_manager_salmon_index_builder
>
> > I would probably modify this as well use the new table. Could this be
> > useful? I'm not sure how to proceed...would I give you the modified
> > Salmon wrapper for inclusion in the package?
>
> If you can, please feel free to create PRs to the repositories, so we
> can all reviewed it. And then, when we merge, it gets automatically
> updated to the Tool Shed :)

As Björn said, if that's the one you are talking about, please create a PR or an issue, or contact me.

Cheers,
Ignacio

Re: Salmon references and data manager

Previti

Yeah, I got confused about the data tables. Sorry about this. I too would keep the transcriptome indices separate from the reference genomes; it just makes sense.

@Ignacio, I found that you need to insert the following (in red)

    if not os.path.exists(target_directory):
        os.mkdir(target_directory)
    args = ['salmon', 'index']

in order for anything to happen.
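
As a rough, self-contained sketch of the same idea (not the data manager's actual code): make sure the target directory exists before the `salmon index` command is assembled and run. The function name and the `run` switch are hypothetical, `os.makedirs(..., exist_ok=True)` is swapped in for the `os.path.exists`/`os.mkdir` pair, and `-t`/`-i` are Salmon's transcript and index options.

```python
import os
import subprocess


def build_salmon_index(transcripts_fasta, target_directory, run=True):
    """Ensure the target directory exists, then build (and optionally run) the command.

    Hypothetical helper for illustration only.
    """
    # Create the output directory up front; exist_ok avoids an error if it is already there.
    os.makedirs(target_directory, exist_ok=True)

    # Assemble the indexing command: -t is the transcript FASTA, -i the index directory.
    args = ["salmon", "index", "-t", transcripts_fasta, "-i", target_directory]

    if run:
        # Raises CalledProcessError if salmon exits with a non-zero status.
        subprocess.check_call(args)
    return args
```

Calling it with `run=False` only creates the directory and returns the argument list, which is handy for checking the command without Salmon installed.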

I think that's it...but I'll test some more.

Best regards,

Christopher


On 09/07/2018 10:46 AM, Ignacio EGUINOA wrote:
> Hi Christopher and Björn,
>
> I have some comments about this because I also came up with these questions some time ago...

