Providing BLAST db in a data library

Providing BLAST db in a data library

Ulf
Dear all

I have several smallish BLAST databases that I would like to provide in
a data library. I create them in a history with the makeblastdb tool and
then try to add them to the library. I see that for each blast db there
is an empty file created (like /path/dataset_12345.dat) and a folder
with the same name (/path/dataset_12345_files/) that contains the actual
db files (blastdb.n*).

In my library the blastdb shows up empty and I cannot import it back to
another history. It does not seem to be aware of the _files folder,
despite it being the right data type (blastdbn).

Any ideas what I am doing wrong?

Thanks a lot for your help
Ulf

**************************************************************************
The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of Public Health England, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses by Symantec.Cloud, but please re-sweep any attachments before opening or saving. http://www.gov.uk/PHE
**************************************************************************

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: Providing BLAST db in a data library

Peter Cock
On Wed, Jul 23, 2014 at 10:47 AM, Ulf Schaefer <[hidden email]> wrote:

> Dear all
>
> I have several smallish BLAST databases that I would like to provide in
> a data library. I create them in a history with the makeblastdb tool and
> then try to add them to the library. I see that for each blast db there
> is an empty file created (like /path/dataset_12345.dat) and a folder
> with the same name (/path/dataset_12345_files/) that contains the actual
> db files (blastdb.n*).
>
> In my library the blastdb shows up empty and I cannot import it back to
> another history. It does not seem to be aware of the _files folder,
> despite it being the right data type (blastdbn).
>
> Any ideas what I am doing wrong?
>
> Thanks a lot for your help
> Ulf

Hi Ulf,

I've never tried that. It could be a bug in Galaxy importing
composite datatypes into a library, or something in the BLAST
database definition which needs fixing. Does importing an
HTML report (with child files like images) into a library work
for you? (This is another composite datatype so a useful
comparison).

Rather than using Data Libraries, we just list all the locally
installed shared BLAST databases via the BLAST *.loc
files instead.

Note using the *.loc files makes the databases available to
all the Galaxy users, while with a Data Library you can
control access to specific groups/roles.

Regards,

Peter
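
For reference, a *.loc entry of the kind Peter describes can be generated with a few lines of Python. This is a minimal sketch, assuming the three tab-separated columns (unique ID, caption, database base path) described in the sample blastdb.loc file that ships with the BLAST+ wrappers. The ID, caption, and path shown are made up, and the column layout should be checked against the sample file of the installed wrapper version.

```python
def loc_line(db_id, caption, base_path):
    """Return one tab-separated .loc entry: id, caption, base path."""
    fields = (db_id, caption, base_path)
    for field in fields:
        # Tabs or newlines inside a field would break the column layout.
        if "\t" in field or "\n" in field:
            raise ValueError("field may not contain tabs or newlines: %r" % field)
    return "\t".join(fields)


def format_loc(entries):
    """Return the text of a .loc file with one database per line."""
    return "".join(loc_line(*entry) + "\n" for entry in entries)


if __name__ == "__main__":
    # Hypothetical database, for illustration only.
    entries = [
        ("my_genomes_2014", "In-house genomes (July 2014)", "/data/blast/my_genomes"),
    ]
    print(format_loc(entries), end="")
```

The base path is the database name without the .nhr/.nin/.nsq extensions, the same value that would be passed to the -db option of the BLAST+ command line tools.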

Re: Providing BLAST db in a data library

Ulf
Dear Peter

Thanks for your reply.

I can import an html report (e.g. FastQC output) successfully into a new
history from a data library. But the .dat file for the html is not empty
like the one for the blastdb. Makes me think that I could do this with a
blast db as well, if only it would not check for size 0 at the time of
importing it.

Thanks
Ulf


Re: Providing BLAST db in a data library

Peter Cock
Interesting hypothesis - you may well be right.

Galaxy guys - who is the expert to talk to on this and/or where
in the code should we be looking?

Thanks,

Peter


Re: Providing BLAST db in a data library

Nate Coraor (nate@bx.psu.edu)
On Jul 23, 2014, at 6:42 AM, Peter Cock <[hidden email]> wrote:

> Interesting hypothesis - you may well be right.
>
> Galaxy guys - who is the expert to talk to on this and/or where
> in the code should we be looking?
>
> Thanks,
>
> Peter

I think there's a bit of a mixup here - Peter, I believe you were asking if other composite types with an html primary dataset could be imported from the history to library, but Ulf, your test was the other direction (library->history). I'd be interested in knowing the outcome of the history->library test as well.

I am woefully ignorant about the blastdbn datatype. Is the primary file supposed to be html type but empty?

--nate


Re: Providing BLAST db in a data library

Peter Cock
On Thu, Jul 24, 2014 at 2:50 PM, Nate Coraor <[hidden email]> wrote:

> I think there's a bit of a mixup here - Peter, I believe you were asking
> if other composite types with an html primary dataset could be imported
> from the history to library, but Ulf, your test was the other direction
> (library->history). I'd be interested in knowing the outcome of the
> history->library test as well.

Good catch - yes, that was what I was asking about. Ulf?

> I am woefully ignorant about the blastdbn datatype. Is the primary
> file supposed to be html type but empty?

The BLAST databases are 'basic' composite datatypes, of which
the most commonly used example is HTML (and some bits of
the base class code seem to assume HTML). This means
testing if something works with HTML is a good first step.

https://github.com/peterjc/galaxy_blast/tree/master/datatypes/blast_datatypes

Peter

Re: Providing BLAST db in a data library

Ulf
Dear Nate, dear Peter

Sorry for the delay in replying.

I can import both HTML and blastdb from a history to a data library. If
I try to get the data out of the library into another history, I am
successful for the html but not for the blastdb. The problem seems to be
that the primary data file (the /path/dataset_12345.dat) is empty for
the blastdb, while the html primary file has something in it.

When I try to import the blastdb (from library to history) there is a
message along the lines of "can't import empty file". I hypothesise
(admittedly without having looked at a line of code) that there is a
test for file size 0 somewhere that is either altogether unnecessary or,
more likely, does not take into account that for composite datatypes it
might be completely legitimate for the primary file to be empty.

Or is my primary blastdb file not supposed to be empty in the first
place? I can blast against it just fine.

Thanks a lot for your help
Ulf
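
The check Ulf hypothesises might look roughly like the following. This is an illustrative sketch only, not Galaxy's actual code: the function names are hypothetical, and the only thing taken from the thread is the dataset_N.dat / dataset_N_files/ layout. It shows an emptiness test that treats a composite dataset as having data when its extra-files directory is populated, even if the primary file has size 0.

```python
import os
import tempfile


def primary_is_empty(primary_path):
    """True if the primary file is missing or has size 0."""
    return not os.path.isfile(primary_path) or os.path.getsize(primary_path) == 0


def extra_files_present(extra_dir):
    """True if the extra-files directory exists and holds at least one file."""
    if not os.path.isdir(extra_dir):
        return False
    with os.scandir(extra_dir) as entries:
        return any(entry.is_file() for entry in entries)


def dataset_has_data(primary_path, extra_dir):
    """Composite-aware check (hypothetical): either location may hold the data."""
    return (not primary_is_empty(primary_path)) or extra_files_present(extra_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        primary = os.path.join(tmp, "dataset_1.dat")
        extra = os.path.join(tmp, "dataset_1_files")
        open(primary, "w").close()  # empty primary file, as for the blastdb
        os.mkdir(extra)
        with open(os.path.join(extra, "blastdb.nsq"), "w") as handle:
            handle.write("placeholder")
        # A naive size test rejects this dataset; the composite-aware one accepts it.
        print(primary_is_empty(primary), dataset_has_data(primary, extra))
```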


Re: Providing BLAST db in a data library

Peter Cock
On Mon, Jul 28, 2014 at 8:28 AM, Ulf Schaefer <[hidden email]> wrote:
> Dear Nate, dear Peter
>
> Sorry for the delay in replying.
>
> I can import both HTML and blastdb from a history to a data library. If
> I try to get the data out of the library into another history, I am
> successful for the html but not for the blastdb. The problem seems to be
> that the primary data file (the /path/dataset_12345.dat) is empty for
> the blastdb, while the html primary file has something in it.

OK. Can you tell where Galaxy thinks the library files are on disk,
and check to see if the folder of BLAST database files is actually
there?

> When I try to import the blastdb (from library to history) there is a
> message along the lines of "can't import empty file". I hypothesise
> (admittedly without having looked at a line of code) that there is a
> test for file size 0 somewhere that is either altogether unnecessary or,
> more likely, does not take into account that for composite datatypes it
> might be completely legitimate for the primary file to be empty.

This guess makes sense - but I've not yet tried to trace through
the code either.

> Or is my primary blastdb file not supposed to be empty in the first
> place? I can blast against it just fine.

The BLAST databases do not define/populate a primary file, so
Galaxy seems to create a dummy empty file on its own. I have
wondered about altering the BLAST database datatype definition
to have a human readable text file as the "primary file" (i.e. the
information currently saved as a text log file when creating a
database).

> Thanks a lot for your help
> Ulf

You too - you've found an "interesting" bug...

Peter

Re: Providing BLAST db in a data library

Ulf
Dear Nate, dear Peter

Again, sorry for the delay in replying.

Yes I can. It looks like this

[galaxy@srv ~]$ cat /galaxy/database/files/081/dataset_81002.dat
[galaxy@srv ~]$ ls /galaxy/database/files/081/dataset_81002_files/
blastdb.nhd  blastdb.nhi  blastdb.nhr  blastdb.nin  blastdb.nog
blastdb.nsd  blastdb.nsi  blastdb.nsq

I think the simplest solution would be to put something in the primary
file. Just a short string that gets the file size above 0.

I personally have followed your initial suggestion and made the dbs
available globally via the .loc file.

Thanks again
Ulf
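
The manual cat/ls check above can be repeated across a whole files directory. A minimal sketch, assuming the dataset_N.dat plus dataset_N_files/ naming shown in Ulf's listing; the function name is hypothetical, and the script only reads the directory tree without modifying anything.

```python
import os
import sys
import tempfile


def find_empty_composites(files_root):
    """Return primary files of size 0 whose sibling _files directory is non-empty."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(files_root):
        for name in filenames:
            if not (name.startswith("dataset_") and name.endswith(".dat")):
                continue
            primary = os.path.join(dirpath, name)
            # dataset_81002.dat -> dataset_81002_files
            extra_dir = primary[: -len(".dat")] + "_files"
            if (
                os.path.getsize(primary) == 0
                and os.path.isdir(extra_dir)
                and os.listdir(extra_dir)
            ):
                hits.append(primary)
    return sorted(hits)


if __name__ == "__main__":
    # Pass the files directory as the first argument, e.g. /galaxy/database/files
    root = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    for path in find_empty_composites(root):
        print(path)
```

Each path printed is a dataset that a plain size-0 test would reject even though its database files are present.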



Re: Providing BLAST db in a data library

Peter Cock
On Wed, Jul 30, 2014 at 11:52 AM, Ulf Schaefer <[hidden email]> wrote:

> Dear Nate, dear Peter
>
> Again, sorry for the delay in replying.
>
> Yes I can. It looks like this
>
> [galaxy@srv ~]$ cat /galaxy/database/files/081/dataset_81002.dat
> [galaxy@srv ~]$ ls /galaxy/database/files/081/dataset_81002_files/
> blastdb.nhd  blastdb.nhi  blastdb.nhr  blastdb.nin  blastdb.nog
> blastdb.nsd  blastdb.nsi  blastdb.nsq

Good. Thanks for confirming that.

> I think the simplest solution would be to put something in the primary
> file. Just a short string that gets the file size above 0.

That won't help with all the existing datasets out there - I think we
rather need to fix something in the Galaxy code for composite files...

> I personally have followed your initial suggestion and made the dbs
> available globally via the .loc file.
>
> Thanks again
> Ulf

Great.

Peter

Re: Providing BLAST db in a data library

John Chilton-4
Thanks for tracking down the problem - it sounds like it is a Galaxy
bug then so I have created a Trello card
(https://trello.com/c/bNEKfOWR).

-John


Re: Providing BLAST db in a data library

Peter Cock
In reply to this post by Peter Cock
On Mon, Jul 28, 2014 at 9:43 AM, Peter Cock <[hidden email]> wrote:

> The BLAST databases do not define/populate a primary file, so
> Galaxy seems to create a dummy empty file on its own. I have
> wondered about altering the BLAST database datatype definition
> to have a human readable text file as the "primary file" (i.e. the
> information currently saved as a text log file when creating a
> database).

Correction: I actually implemented this late last year (included in
BLAST+ wrapper version v0.0.22 onwards, and the Galaxy
BLAST datatypes version v0.0.18 onwards):

https://github.com/peterjc/galaxy_blast/commit/9b3f65cddcc60de26de63272c362c6ca53f6559d
https://github.com/peterjc/galaxy_blast/commit/2ebfb790d5a1bbe310c3d7ccc2b953c2c37bccf2

The makeblastdb wrapper will send the stdout (log information)
to the dummy index file, see the end of the <command> tag in:
https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/ncbi_makeblastdb.xml

The display_data method for a BLAST database will show any
makeblastdb log information held in the dummy index file, see
https://github.com/peterjc/galaxy_blast/blob/master/datatypes/blast_datatypes/blast.py

i.e. Only older BLAST databases in histories should have empty
dummy index files, which will mitigate the library problem:
https://trello.com/c/bNEKfOWR

Peter
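
The idea of sending the tool's stdout to the primary file can be illustrated outside Galaxy. The real wrapper does this in the <command> tag of its tool XML; the Python below is only a sketch of the same redirection, with a hypothetical helper name, hypothetical file names, and a stand-in command in place of makeblastdb.

```python
import os
import subprocess
import sys
import tempfile


def run_with_log(cmd, log_path):
    """Run cmd, writing its stdout to log_path, and return the exit code."""
    with open(log_path, "w") as log:
        completed = subprocess.run(cmd, stdout=log, check=False)
    return completed.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the dataset's primary (dummy index) file.
        primary = os.path.join(tmp, "dataset_1.dat")
        # Stand-in command. A real call would be along the lines of
        # ["makeblastdb", "-in", "seqs.fasta", "-dbtype", "nucl", "-out", "blastdb"]
        cmd = [sys.executable, "-c", "print('log output from the tool')"]
        code = run_with_log(cmd, primary)
        # The primary file is no longer empty once the log has been written.
        print(code, os.path.getsize(primary) > 0)
```

Because the log text lands in the primary file, that file has a non-zero size for any database built this way, which is why only older databases are affected.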