Unable to remove old datasets


Sanka, Ravi
Greetings,

Despite being an admin, I am unable to remove old datasets from our Galaxy instance. I am following the procedure detailed in:

https://wiki.galaxyproject.org/Admin/Config/Performance/Purge%20Histories%20and%20Datasets

  1. delete_userless_histories.sh
  2. purge_histories.sh
  3. purge_libraries.sh
  4. purge_folders.sh
  5. delete_datasets.sh   -->  included to remove datasets before their outer container has been deleted
  6. purge_datasets.sh
None of the scripts have been changed. They all call cleanup_datasets.py with -d set to 10 and -r enabled.
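
For reference, each of these is a thin shell wrapper around cleanup_datasets.py; a minimal sketch of what purge_datasets.sh is doing (the -3 task flag and log path follow the stock scripts quoted later in this thread, with -d 10 as described above; the cd line is an assumption about the stock layout):

#!/bin/sh
# Sketch of the stock purge_datasets.sh wrapper: purge datasets (-3)
# deleted more than 10 days ago (-d 10) and remove their files from disk (-r).
cd `dirname $0`/../..
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini \
    -d 10 -3 -r $@ >> ./scripts/cleanup_datasets/purge_datasets.log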

But it does not appear to have any effect. All datasets (both those older than 10 days and those more recent) in <galaxy root>/database/files are still present, despite the -r setting in each script.

Is there a parameter or similar that needs to be set in the universe config to allow this process to work?

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
----------------------------------------------


Re: Unable to remove old datasets

Peter Cock
Have the owners of the old datasets marked them as permanently deleted?

Peter

On Thu, Mar 13, 2014 at 5:35 PM, Sanka, Ravi <[hidden email]> wrote:

> [...]

Re: [CONTENT] Re: Unable to remove old datasets

Sanka, Ravi
I do not think so. Several individual datasets have been deleted (by clicking
the upper-right X on the history item box), but no history has been
permanently deleted.

Is there any indication in the database of whether the target dataset or
datasets were marked for permanent deletion? In the dataset table, I see
fields "deleted", "purged", and "purgable", but nothing that says permanently
deleted.
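
(For reference, those flags can be inspected directly; a minimal sketch, assuming a PostgreSQL backend with a database named "galaxy" - both the psql client and the database name are assumptions, adjust for your setup:)

# Show the deletion flags for one dataset (replace <dataset_id>):
psql galaxy -c "SELECT id, state, deleted, purged, purgable FROM dataset WHERE id = <dataset_id>;"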

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
----------------------------------------------




On 3/13/14 1:45 PM, "Peter Cock" <[hidden email]> wrote:

> [...]

Re: [CONTENT] Re: Unable to remove old datasets

Peter Cock
On Thu, Mar 13, 2014 at 6:40 PM, Sanka, Ravi <[hidden email]> wrote:
> [...]

I would welcome clarification from the Galaxy Team, here and
on the wiki page, which might benefit from a flow diagram:

https://wiki.galaxyproject.org/Admin/Config/Performance/Purge%20Histories%20and%20Datasets

My assumption is that using "permanently delete" in the user interface
marks an entry as "purgable", and then it will be moved to "purged"
(and the associated file on disk deleted) by the cleanup scripts -
but I'm a bit hazy on this, and on why it takes a while for a user's
usage figures to change.

Peter

Re: [CONTENT] Re: Unable to remove old datasets

Peter Cock
On Fri, Mar 14, 2014 at 11:24 AM, Peter Cock <[hidden email]> wrote:

> [...]

Hmm. Right now I'm unable (via the web interface) to permanently
delete a history - it stays stuck as "deleted", and thus (presumably)
won't get purged by the cleanup scripts.

I've tried:

1. Load problem history
2. Rename the history "DIE DIE" to avoid confusion
3. Top right menu, "Delete permanently"
4. Prompted "Really delete the current history permanently? This
cannot be undone", OK
5. Told "History deleted, a new history is active"
6. Top right menu, "Saved Histories"
7. Click "Advanced Search", status "all"
8. Observe "DIE DIE" history is only "deleted" (while other older
histories are "deleted permanently") (BAD)
9. Run the cleanup scripts:

$ sh scripts/cleanup_datasets/delete_userless_histories.sh
$ sh scripts/cleanup_datasets/purge_histories.sh
$ sh scripts/cleanup_datasets/purge_libraries.sh
$ sh scripts/cleanup_datasets/purge_folders.sh
$ sh scripts/cleanup_datasets/purge_datasets.sh

10. Reload the saved history list, no change.
11. Using the drop down menu, select "Delete Permanently"
12. Prompted "History contents will be removed from disk, this cannot
be undone.  Continue", OK
13. No change to history status (BAD)
14. Tick the check-box, and use the "Delete Permanently" button at the
bottom of the page
15. Prompted "History contents will be removed from disk, this cannot
be undone.  Continue", OK
16. No change to history status (BAD)
17. Run the cleanup scripts, no change.

Note that in my universe_wsgi.ini I have not (yet) set:
allow_user_dataset_purge = True

If this setting is important, then the interface seems confused -
and if quotas are enforced, very frustrating :(

Peter

Re: [CONTENT] Re: Unable to remove old datasets

Carl Eberhard
Thanks, Ravi & Peter

I've added a card to get the allow_user_dataset_purge options into the client and to better show the viable options to the user: https://trello.com/c/RCPZ9zMF


On Fri, Mar 14, 2014 at 11:10 AM, Peter Cock <[hidden email]> wrote:
> [...]

Re: [CONTENT] Re: Unable to remove old datasets

Peter Cock
On Tue, Mar 18, 2014 at 2:14 PM, Carl Eberhard <[hidden email]> wrote:
> [...]

Thanks Carl - so this was a user interface bug, showing the user
non-functional permanent delete (purge) options. That's clearer now.

In this situation can the user just 'delete', and wait N days for
the cleanup scripts to actually purge the files and free the space?
(It seems N=10 in scripts/cleanup_datasets/purge_*.sh at least;
elsewhere, like the underlying Python script, the default looks like N=60.)

Regards,

Peter

Re: [CONTENT] Re: Unable to remove old datasets

Carl Eberhard
I believe it's a (BAD) silent failure mode in the server code.

If I understand correctly, the purge request isn't raising an error when it gets to the 'allow_user_dataset_purge' check and is instead silently marking (or re-marking) the datasets as deleted.

I would rather it fail with a 403 error if purge is explicitly requested.

That said, it would of course be better to remove the purge operation based on the configuration than to show an error after we've found you can't do the operation. The same holds true for the 'permanently remove this dataset' link in deleted datasets.

I'll see if I can find out the answer to your question on the cleanup scripts.


On Tue, Mar 18, 2014 at 10:49 AM, Peter Cock <[hidden email]> wrote:
On Tue, Mar 18, 2014 at 2:14 PM, Carl Eberhard <[hidden email]> wrote:
> Thanks, Ravi & Peter
>
> I've added a card to get the allow_user_dataset_purge options into the
> client and to better show the viable options to the user:
> https://trello.com/c/RCPZ9zMF

Thanks Carl - so this was a user interface bug, showing the user
non-functional permanent delete (purge) options. That's clearer now.

In this situation can the user just 'delete', and wait N days for
the cleanup scripts to actually purge the files and free the space?
(It seems N=10 in scripts/cleanup/purge_*.sh at least, elsewhere
like the underlying Python script the default looks like N=60).

Regards,

Peter


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: [CONTENT] Re: Unable to remove old datasets

Carl Eberhard
The cleanup scripts enforce a sort of "lifetime" for the datasets.

The first time they're run, they may mark a dataset as deleted and also reset the update time, so you'll have to wait N days for the next stage of the lifetime.

The next time they're run, or if a dataset has already been marked as deleted, the actual file removal happens and purged is set to true (if it wasn't already).

You can manually pass in '-d 0' to force removal of datasets recently marked as deleted.

The purge scripts do not check 'allow_user_dataset_purge', of course.
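
Concretely, forcing an immediate purge of datasets already marked as deleted would look something like this (a sketch based on the stock wrappers quoted later in this thread; run from the Galaxy root):

# Purge deleted datasets (-3) with no waiting period (-d 0), removing files from disk (-r):
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r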


On Tue, Mar 18, 2014 at 11:50 AM, Carl Eberhard <[hidden email]> wrote:
> [...]

Re: [CONTENT] Re: Re: Unable to remove old datasets

Sanka, Ravi
I have now been able to successfully remove datasets from disk. After deleting the dataset or history from the front-end interface (as the user), I then run the cleanup scripts as admin:

python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -1 $@ >> ./scripts/cleanup_datasets/delete_userless_histories.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -2 -r $@ >> ./scripts/cleanup_datasets/purge_histories.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r $@ >> ./scripts/cleanup_datasets/purge_datasets.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -5 -r $@ >> ./scripts/cleanup_datasets/purge_folders.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -4 -r $@ >> ./scripts/cleanup_datasets/purge_libraries.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -6 -r $@ >> ./scripts/cleanup_datasets/delete_datasets.log

However, my final goal is to have a process that can remove old datasets from disk regardless of whether or not the users have deleted them at the front-end (and then automate said process via a cron job). This is essential in a situation where users are likely to leave datasets unattended, accumulating disk space.
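
(A hypothetical crontab entry for such a nightly run - the install path is an assumption, adjust to your setup:)

# Run the purge wrapper nightly at 2:00; /path/to/galaxy is a placeholder.
0 2 * * * cd /path/to/galaxy && sh ./scripts/cleanup_datasets/purge_datasets.sh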

I found the following Galaxy thread:


And am trying to use the script it mentions:

python ./scripts/cleanup_datasets/admin_cleanup_datasets.py universe_wsgi.ini -d 30 --smtp <smtp server> --fromaddr [hidden email]

I chose -d 30 to remove all datasets older than 30 days, which currently only targets one dataset. The resulting stdout indicates success:

"""""""""""""""""""""""""""""""""""""
# 2014-03-25 16:27:47 - Handling stuff older than 30 days
Marked HistoryDatasetAssociation id 301 as deleted

Subject: Galaxy Server Cleanup - 1 datasets DELETED
----------
Galaxy Server Cleanup
---------------------
The following datasets you own on Galaxy are older than 30 days and have been DELETED:

    "Small.fastq" in history "Unnamed history"

You may be able to undelete them by logging into Galaxy, navigating to the appropriate history, selecting "Include Deleted Datasets" from the history options menu, and clicking on the link to undelete each dataset that you want to keep.  You can then download the datasets.  Thank you for your understanding and cooperation in this necessary cleanup in order to keep the Galaxy resource available.  Please don't hesitate to contact us if you have any questions.

 -- Galaxy Administrators

Marked 1 dataset instances as deleted
"""""""""""""""""""""""""""""""""""""

But when I check the database, the status of dataset 301 is unchanged (state=ok, deleted=false, purged=false, purgable=true).

I then run the same cleanup_datasets.py routine from above (but with -d 30), but dataset 301 is still present. I tried a second time, this time using -d 0, but still no deletion (which is not surprising, since the dataset's deleted status is still false).

If I run admin_cleanup_datasets.py again with the same parameters, the stdout says no datasets matched the criteria, so it seems to remember its previous execution, but it is NOT actually updating the database.

What am I doing wrong?

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
----------------------------------------------

From: Carl Eberhard <[hidden email]>
Date: Tuesday, March 18, 2014 2:09 PM
To: Peter Cock <[hidden email]>
Cc: Ravi Sanka <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: [CONTENT] Re: [galaxy-dev] Re: Unable to remove old datasets

> [...]

Re: [CONTENT] Re: Re: Unable to remove old datasets

Nate Coraor (nate@bx.psu.edu)
Hi Ravi,

If you take a look at the dataset's entry in the history_dataset_association table, is that marked deleted? admin_cleanup_datasets.py only marks history_dataset_association rows deleted, not datasets.

Running the cleanup_datasets.py flow with -d 0 should have then caused the dataset to be deleted and purged, but this may not be the case if there is more than one instance of the dataset you are trying to purge (either another copy in a history somewhere, or in a library).

--nate


On Tue, Mar 25, 2014 at 5:12 PM, Sanka, Ravi <[hidden email]> wrote:
> [...]

Re: [CONTENT] Re: Re: Re: Unable to remove old datasets

Sanka, Ravi
Hi Nate,

I checked the dataset's entry in history_dataset_association, and the value in field "deleted" is true.

But if this does not enable the cleanup scripts to remove the dataset from disk, then how can I accomplish that? As an admin, my intention is to completely remove datasets that are past a certain age from Galaxy, including all instances of the dataset that may exist, regardless of whether or not the various users who own said instances have deleted them from their histories.

Can this be done with admin_cleanup_datasets.py? If so, how?

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
----------------------------------------------

From: Nate Coraor <[hidden email]>
Date: Friday, March 28, 2014 9:59 AM
To: Ravi Sanka <[hidden email]>
Cc: Carl Eberhard <[hidden email]>, Peter Cock <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: [CONTENT] Re: [galaxy-dev] Re: Re: Unable to remove old datasets

> [...]

Re: [CONTENT] Re: Re: Re: Unable to remove old datasets

Nate Coraor (nate@bx.psu.edu)
Hi Ravi,

Can you check whether any other history_dataset_association or library_dataset_dataset_association rows exist which reference the dataset_id that you are attempting to remove?

When you run admin_cleanup_datasets.py, it'll set history_dataset_association.deleted = true. After that is done, you need to run cleanup_datasets.py with the `-6 -d 0` options to mark dataset.deleted = true, followed by `-3 -d 0 -r` to remove the dataset file from disk and set dataset.purged = true. Note that the latter two operations will not do anything until *all* associated history_dataset_association and library_dataset_dataset_association rows are set to deleted = true.
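
(Putting that together as commands - a sketch following the description above, run from the Galaxy root; the -d 30 cutoff is just an example, and the --smtp/--fromaddr mail options shown earlier in the thread can be added to the first step:)

# 1. Mark HistoryDatasetAssociations older than 30 days as deleted:
python ./scripts/cleanup_datasets/admin_cleanup_datasets.py universe_wsgi.ini -d 30
# 2. Mark datasets whose associations are all deleted as deleted (-6):
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -6
# 3. Purge the deleted datasets (-3) and remove their files from disk (-r):
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r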

--nate


On Fri, Mar 28, 2014 at 1:52 PM, Sanka, Ravi <[hidden email]> wrote:
> [...]

Re: [CONTENT] Re: Re: Re: Re: Unable to remove old datasets

Sanka, Ravi
Hi Nate,

I checked and there are 3 rows of dataset 301 in the history_dataset_association table (none in library_dataset_dataset_association):

dataset_id   create_time      update_time      deleted
301          2/14/14 18:49    3/25/14 20:27    TRUE
301          3/6/14 15:48     3/25/14 18:41    TRUE
301          3/6/14 20:11     3/6/14 20:11     FALSE
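
(A query along these lines produces that listing - again assuming a PostgreSQL database named "galaxy", adjust for your setup:)

# List every history association for dataset 301 and its deleted flag:
psql galaxy -c "SELECT dataset_id, create_time, update_time, deleted FROM history_dataset_association WHERE dataset_id = 301;"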

The one with the most recent create_time has its deleted status set to false. The other two, older ones are set to true.

I would have guessed that the most recent create_time instance is still false due to being created within 30 days, but the second most recent is only 5 hours older and is set to true. Perhaps that instance was deleted by its user. That would cause its deleted status to become true, correct?

I assume that if I were to wait until all 3 instances' create_times are past 30 days, my process will work, as admin_cleanup_datasets.py will mark all 3 instances as deleted.

Perchance, is there any setting on admin_cleanup_datasets.py that would cause it to judge datasets by their physical file's timestamp instead?

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
----------------------------------------------

From: Nate Coraor <[hidden email]>
Date: Friday, March 28, 2014 1:56 PM
To: Ravi Sanka <[hidden email]>
Cc: Carl Eberhard <[hidden email]>, Peter Cock <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: [CONTENT] Re: Re: [galaxy-dev] Re: Re: Unable to remove old datasets

Hi Ravi,

Can you check whether any other history_dataset_association or library_dataset_dataset_association rows exist which reference the dataset_id that you are attempting to remove?

When you run admin_cleanup_datasets.py, it'll set history_dataset_association.deleted = true. After that is done, you need to run cleanup_datasets.py with the `-6 -d 0` option to mark dataset.deleted = true, followed by `-3 -d 0 -r ` to remove the dataset file from disk and set dataset.purged = true. Note that the latter two operations will not do anything until *all* associated history_dataset_association and library_dataset_dataset_association rows are set to deleted = true.

--nate


On Fri, Mar 28, 2014 at 1:52 PM, Sanka, Ravi <[hidden email]> wrote:
Hi Nate,

I checked the dataset's entry in history_dataset_association, and the value in field "deleted" is true.

But if this does not enable the cleanup scripts to remove the dataset from disk, then how can I accomplish that? As an admin, my intention is to completely remove datasets that are past a certain age from Galaxy, including all instances of the dataset that may exist, regardless of whether or not the various users who own said instances have deleted them from their histories.

Can this be done with admin_cleanup_datasets.py? If so, how?

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
<a href="tel:301-795-7743" value="&#43;13017957743" target="_blank">301-795-7743
----------------------------------------------

From: Nate Coraor <[hidden email]>
Date: Friday, March 28, 2014 9:59 AM
To: Ravi Sanka <[hidden email]>
Cc: Carl Eberhard <[hidden email]>, Peter Cock <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: [CONTENT] Re: [galaxy-dev] Re: Re: Unable to remove old datasets

Hi Ravi,

If you take a look at the dataset's entry in the history_dataset_association table, is that marked deleted? admin_cleanup_datasets.py only marks history_dataset_association rows deleted, not datasets.

Running the cleanup_datasets.py flow with -d 0 should have then caused the dataset to be deleted and purged, but this may not be the case if there is more than one instance of the dataset you are trying to purge (either another copy in a history somewhere, or in a library).

--nate


On Tue, Mar 25, 2014 at 5:12 PM, Sanka, Ravi <[hidden email]> wrote:
I have now been able to successfully remove datasets from disk. After deleting the dataset or history from the front-end interface (as the user), I then run the cleanup scripts as admin:

python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -1 $@ >> ./scripts/cleanup_datasets/delete_userless_histories.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -2 -r $@ >> ./scripts/cleanup_datasets/purge_histories.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r $@ >> ./scripts/cleanup_datasets/purge_datasets.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -5 -r $@ >> ./scripts/cleanup_datasets/purge_folders.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -4 -r $@ >> ./scripts/cleanup_datasets/purge_libraries.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -6 -r $@ >> ./scripts/cleanup_datasets/delete_datasets.log

However, my final goal is to have a process that can remove old datasets from disk regardless of whether or not the users have deleted them at the front-end (and then automate said process via cronjob). This will be essentially in a situation where users are likely to leave datasets unattended and accumulating disk space.

I found the following Galaxy thread:


And am trying to use the script it mentions:

python ./scripts/cleanup_datasets/admin_cleanup_datasets.py universe_wsgi.ini -d 30 --smtp <smtp server> --fromaddr [hidden email]

I chose –d 30 to remove all datasets older than 30 days, which currently only targets one dataset. The resulting stdout indicates success:

"""""""""""""""""""""""""""""""""""""
# 2014-03-25 16:27:47 - Handling stuff older than 30 days
Marked HistoryDatasetAssociation id 301 as deleted

Subject: Galaxy Server Cleanup - 1 datasets DELETED
----------
Galaxy Server Cleanup
---------------------
The following datasets you own on Galaxy are older than 30 days and have been DELETED:

    "Small.fastq" in history "Unnamed history"

You may be able to undelete them by logging into Galaxy, navigating to the appropriate history, selecting "Include Deleted Datasets" from the history options menu, and clicking on the link to undelete each dataset that you want to keep.  You can then download the datasets.  Thank you for your understanding and cooporation in this necessary cleanup in order to keep the Galaxy resource available.  Please don't hesitate to contact us if you have any questions.

 -- Galaxy Administrators

Marked 1 dataset instances as deleted
"""""""""""""""""""""""""""""""""""""

But when I check the database, the status of dataset 301 is unchanged (ok-false-false-true).

I then run the same cleanup_datasets.py routine from above (but with -d 30), but dataset 301 is still present. I tried a second time, this time using -d 0, but still no deletion (which is not surprising since the dataset's deleted status is still false).

If I run admin_cleanup_datasets.py again with the same parameters, the stdout says no datasets matched the criteria, so it seems to remember its previous execution, but it's NOT actually updating the database.

What am I doing wrong?

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
<a href="tel:301-795-7743" value="&#43;13017957743" target="_blank">301-795-7743
----------------------------------------------

From: Carl Eberhard <[hidden email]>
Date: Tuesday, March 18, 2014 2:09 PM
To: Peter Cock <[hidden email]>
Cc: Ravi Sanka <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: [CONTENT] Re: [galaxy-dev] Re: Unable to remove old datasets

The cleanup scripts enforce a sort of "lifetime" for the datasets.

The first time they're run, they may mark a dataset as deleted and also reset its update time, so you'll have to wait N days for the next stage of the lifetime.

The next time they're run, or if a dataset has already been marked as deleted, the actual file removal happens and purged is set to true (if it wasn't already).

You can manually pass in '-d 0' to force removal of datasets recently marked as deleted.

The purge scripts do not check 'allow_user_dataset_purge', of course.
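
Spelled out with the flags used elsewhere in this thread (a sketch; -6 is the delete-datasets stage and -3 the purge-datasets stage):

python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -6 -r   # first pass: mark dataset.deleted = true (resets the update time)
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r   # second pass: remove the file from disk, set purged = true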


On Tue, Mar 18, 2014 at 11:50 AM, Carl Eberhard <[hidden email]> wrote:
I believe it's a (BAD) silent failure mode in the server code.

If I understand correctly, the purge request isn't coughing an error when it gets to the 'allow_user_dataset_purge' check and instead is silently marking (or re-marking) the datasets as deleted.

I would rather it fail with a 403 error if purge is explicitly requested.

That said, it would of course be better to remove the purge operation based on the configuration than to show an error after we've found you can't do the operation. The same holds true for the 'permanently remove this dataset' link in deleted datasets.

I'll see if I can find out the answer to your question on the cleanup scripts.


On Tue, Mar 18, 2014 at 10:49 AM, Peter Cock <[hidden email]> wrote:
On Tue, Mar 18, 2014 at 2:14 PM, Carl Eberhard <[hidden email]> wrote:
> Thanks, Ravi & Peter
>
> I've added a card to get the allow_user_dataset_purge options into the
> client and to better show the viable options to the user:
> https://trello.com/c/RCPZ9zMF

Thanks Carl - so this was a user interface bug, showing the user
non-functional permanent delete (purge) options. That's clearer now.

In this situation can the user just 'delete', and wait N days for
the cleanup scripts to actually purge the files and free the space?
(It seems N=10 in scripts/cleanup/purge_*.sh at least; elsewhere,
such as in the underlying Python script, the default looks like N=60.)

Regards,

Peter



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: [CONTENT] Re: Re: Re: Re: Unable to remove old datasets

Nate Coraor (nate@bx.psu.edu)
Hi Ravi,

I believe admin_cleanup_datasets.py only works on database times. The rest of your assumptions are likely correct, although without looking at more details of the database I can't confirm.

--nate


On Fri, Mar 28, 2014 at 5:12 PM, Sanka, Ravi <[hidden email]> wrote:
Hi Nate,

I checked and there are 3 rows of dataset 301 in the history_dataset_association table (none in library_dataset_dataset_association):

dataset_id  create_time    update_time    deleted
301         2/14/14 18:49  3/25/14 20:27  TRUE
301         3/6/14 15:48   3/25/14 18:41  TRUE
301         3/6/14 20:11   3/6/14 20:11   FALSE

The one with the most recent create_time has its deleted status set to false. The other two, older instances are set to true.

I would have guessed that the most recent create_time instance is still false due to being created within 30 days, but the second most recent is only 5 hours older and is set to true. Perhaps that instance was deleted by its user. That would cause its deleted status to become true, correct?

I assume that if I were to wait until all 3 instances' create_times are past 30 days, my process will work, as admin_cleanup_datasets.py will then mark all 3 instances as deleted.

Perchance, is there any setting on admin_cleanup_datasets.py that would cause it to judge datasets by their physical file's timestamp instead?
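
For comparison, the physical file age can at least be inspected read-only. A sketch, assuming the default layout under <galaxy root>/database/files and a dataset_<id>.dat naming convention:

find database/files -name 'dataset_301.dat' -mtime +30 -ls

This lists the file only if it is more than 30 days old; it does not delete anything.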

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
<a href="tel:301-795-7743" value="+13017957743" target="_blank">301-795-7743
----------------------------------------------

From: Nate Coraor <[hidden email]>
Date: Friday, March 28, 2014 1:56 PM

To: Ravi Sanka <[hidden email]>
Cc: Carl Eberhard <[hidden email]>, Peter Cock <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: [CONTENT] Re: Re: [galaxy-dev] Re: Re: Unable to remove old datasets

Hi Ravi,

Can you check whether any other history_dataset_association or library_dataset_dataset_association rows exist which reference the dataset_id that you are attempting to remove?

When you run admin_cleanup_datasets.py, it'll set history_dataset_association.deleted = true. After that is done, you need to run cleanup_datasets.py with the `-6 -d 0` option to mark dataset.deleted = true, followed by `-3 -d 0 -r` to remove the dataset file from disk and set dataset.purged = true. Note that the latter two operations will not do anything until *all* associated history_dataset_association and library_dataset_dataset_association rows are set to deleted = true.

--nate
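
Put together as a runnable sequence, that is (a sketch; paths and flags as used earlier in this thread):

python ./scripts/cleanup_datasets/admin_cleanup_datasets.py universe_wsgi.ini -d 30   # mark history_dataset_association.deleted = true
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -6     # mark dataset.deleted = true once all associations are deleted
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r  # remove the file from disk, set dataset.purged = true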


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: [CONTENT] Re: Re: Re: Re: Re: Unable to remove old datasets

Sanka, Ravi
Hi Nate,

I have waited for all 3 instances of dataset 301 in history_dataset_association to be older than 30 days. Then I re-ran the admin script. As expected, all 3 instances now have TRUE as their deleted status. I then tried the following commands as you suggested:

python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 30 -6 -r $@
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 30 -3 -r $@

I set -d to 30 and not 0; I didn't want to risk deleting other datasets.

However, there was no effect. The two executions' STDOUT stated 0 datasets affected, and dataset 301 is still on disk.

Will the process only work if -d is set to 0? I didn't think it would make a difference since all instances of dataset 301 are older than 30 days.
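
One way to see what -d 30 actually matches is to apply the cutoff by hand. A sketch, assuming a PostgreSQL backend and that the cutoff is applied to update_time (which Carl's earlier note about the update-time reset suggests):

psql galaxy -c "SELECT id, update_time, deleted FROM history_dataset_association WHERE dataset_id = 301 AND update_time < now() - interval '30 days'"

If marking the rows deleted reset their update times, this returns nothing until 30 days after that reset, which would explain the "0 datasets affected" output.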

----------------------------------------------
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
----------------------------------------------

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/