Issue with set_user_disk_usage.py and Postgres 8.x

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Issue with set_user_disk_usage.py and Postgres 8.x

Lance Parsons
The recent updates to set_user_disk_usage.py for Postgres users have an issue with Postgres 8.x.  The SQL in the pgcalc method (line 51) leads to the following error:

sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "d.total_size" must appear in the GROUP BY clause or be used in an aggregate function
LINE 4: FROM (  SELECT d.total_siz...
            ^

The problem is that version of Postgres before 9.x were a bit more restrictive in the use of GROUP BY.  This can be fixed using DISTINCT ON instead.  See this StackOverflow post for more info: http://stackoverflow.com/questions/1769361/postgresql-group-by-different-from-mysql


I've included a patch below.  Let me know if a pull request would be preferred.


--- a/scripts/set_user_disk_usage.py
+++ b/scripts/set_user_disk_usage.py
@@ -52,7 +52,7 @@
     sql = """
            UPDATE galaxy_user
               SET disk_usage = (SELECT COALESCE(SUM(total_size), 0)
-                                  FROM (  SELECT d.total_size
+                                  FROM (  SELECT DISTINCT ON (d.id) d.total_size, d.id
                                             FROM history_dataset_association hda
                                                  JOIN history h ON h.id = hda.history_id
                                                  JOIN dataset d ON hda.dataset_id = d.id
@@ -62,7 +62,7 @@
                                                  AND d.purged = false
                                                  AND d.id NOT IN (SELECT dataset_id
                                                                     FROM library_dataset_dataset_association)
-                                        GROUP BY d.id) sizes)
+                                        ) sizes)
             WHERE id = :id
         RETURNING disk_usage;
     """
--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
Reply | Threaded
Open this post in threaded view
|

Re: Issue with set_user_disk_usage.py and Postgres 8.x

Björn Grüning
Hi Lance,

I can confirm both the regression and the fix. A pull request would be
great I think. I can also do it if you want.

Thanks,
Björn

> The recent updates to set_user_disk_usage.py for Postgres users have
> an issue with Postgres 8.x.  The SQL in the pgcalc method (line 51)
> leads to the following error:
>
> sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "d.total_size" must appear in the GROUP BY clause or be used in an aggregate function
> LINE 4: FROM (  SELECT d.total_siz...
>             ^
>
> The problem is that version of Postgres before 9.x were a bit more
> restrictive in the use of GROUP BY.  This can be fixed using DISTINCT
> ON instead.  See this StackOverflow post for more info:
> http://stackoverflow.com/questions/1769361/postgresql-group-by-different-from-mysql
>
> I've included a patch below.  Let me know if a pull request would be
> preferred.
>
>
> --- a/scripts/set_user_disk_usage.py
> +++ b/scripts/set_user_disk_usage.py
> @@ -52,7 +52,7 @@
>      sql = """
>             UPDATE galaxy_user
>                SET disk_usage = (SELECT COALESCE(SUM(total_size), 0)
> -                                  FROM (  SELECT d.total_size
> +                                  FROM (  SELECT DISTINCT ON (d.id)
> d.total_size, d.id
>                                              FROM
> history_dataset_association hda
>                                                   JOIN history h ON
> h.id = hda.history_id
>                                                   JOIN dataset d ON
> hda.dataset_id = d.id
> @@ -62,7 +62,7 @@
>                                                   AND d.purged = false
>                                                   AND d.id NOT IN
> (SELECT dataset_id
>
> FROM library_dataset_dataset_association)
> -                                        GROUP BY d.id) sizes)
> +                                        ) sizes)
>              WHERE id = :id
>          RETURNING disk_usage;
>      """
> --
> Lance Parsons - Scientific Programmer
> 134 Carl C. Icahn Laboratory
> Lewis-Sigler Institute for Integrative Genomics
> Princeton University
>
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>   http://lists.bx.psu.edu/



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Issue with set_user_disk_usage.py and Postgres 8.x

Lance Parsons
Thanks for confirmation Björn.

Pull request was submitted: https://bitbucket.org/galaxy/galaxy-central/pull-request/97/fix-for-postgres-versions-9/diff

Trello card also created: https://trello.com/c/Z0EynAV2


Björn Grüning wrote:
Hi Lance,

I can confirm both the regression and the fix. A pull request would be
great I think. I can also do it if you want.

Thanks,
Björn

The recent updates to set_user_disk_usage.py for Postgres users have
an issue with Postgres 8.x.  The SQL in the pgcalc method (line 51)
leads to the following error:

sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "d.total_size" must appear in the GROUP BY clause or be used in an aggregate function
LINE 4: FROM (  SELECT d.total_siz...
            ^

The problem is that version of Postgres before 9.x were a bit more
restrictive in the use of GROUP BY.  This can be fixed using DISTINCT
ON instead.  See this StackOverflow post for more info:
http://stackoverflow.com/questions/1769361/postgresql-group-by-different-from-mysql

I've included a patch below.  Let me know if a pull request would be
preferred.


--- a/scripts/set_user_disk_usage.py
+++ b/scripts/set_user_disk_usage.py
@@ -52,7 +52,7 @@
     sql = """
            UPDATE galaxy_user
               SET disk_usage = (SELECT COALESCE(SUM(total_size), 0)
-                                  FROM (  SELECT d.total_size
+                                  FROM (  SELECT DISTINCT ON (d.id)
d.total_size, d.id
                                             FROM
history_dataset_association hda
                                                  JOIN history h ON
h.id = hda.history_id
                                                  JOIN dataset d ON
hda.dataset_id = d.id
@@ -62,7 +62,7 @@
                                                  AND d.purged = false
                                                  AND d.id NOT IN
(SELECT dataset_id

FROM library_dataset_dataset_association)
-                                        GROUP BY d.id) sizes)
+                                        ) sizes)
             WHERE id = :id
         RETURNING disk_usage;
     """
-- 
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/




--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Issue with set_user_disk_usage.py and Postgres 8.x

Dannon Baker-2
Thanks for the bug fix, (and for the reminder that it was still sitting in the inbox, sorry about that).  I've verified the fix as well and pulled it into -central.

-Dannon


On Thu, Jun 6, 2013 at 10:13 AM, Lance Parsons <[hidden email]> wrote:
Thanks for confirmation Björn.

Pull request was submitted: https://bitbucket.org/galaxy/galaxy-central/pull-request/97/fix-for-postgres-versions-9/diff

Trello card also created: https://trello.com/c/Z0EynAV2



Björn Grüning wrote:
Hi Lance,

I can confirm both the regression and the fix. A pull request would be
great I think. I can also do it if you want.

Thanks,
Björn

The recent updates to set_user_disk_usage.py for Postgres users have
an issue with Postgres 8.x.  The SQL in the pgcalc method (line 51)
leads to the following error:

sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "d.total_size" must appear in the GROUP BY clause or be used in an aggregate function
LINE 4: FROM (  SELECT d.total_siz...
            ^

The problem is that version of Postgres before 9.x were a bit more
restrictive in the use of GROUP BY.  This can be fixed using DISTINCT
ON instead.  See this StackOverflow post for more info:
http://stackoverflow.com/questions/1769361/postgresql-group-by-different-from-mysql

I've included a patch below.  Let me know if a pull request would be
preferred.


--- a/scripts/set_user_disk_usage.py
+++ b/scripts/set_user_disk_usage.py
@@ -52,7 +52,7 @@
     sql = """
            UPDATE galaxy_user
               SET disk_usage = (SELECT COALESCE(SUM(total_size), 0)
-                                  FROM (  SELECT d.total_size
+                                  FROM (  SELECT DISTINCT ON (d.id)
d.total_size, d.id
                                             FROM
history_dataset_association hda
                                                  JOIN history h ON
h.id = hda.history_id
                                                  JOIN dataset d ON
hda.dataset_id = d.id
@@ -62,7 +62,7 @@
                                                  AND d.purged = false
                                                  AND d.id NOT IN
(SELECT dataset_id

FROM library_dataset_dataset_association)
-                                        GROUP BY d.id) sizes)
+                                        ) sizes)
             WHERE id = :id
         RETURNING disk_usage;
     """
-- 
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Issue with set_user_disk_usage.py and Postgres 8.x

Lance Parsons
No problem, glad it was helpful.

Dannon Baker wrote:
Thanks for the bug fix, (and for the reminder that it was still sitting in the inbox, sorry about that).  I've verified the fix as well and pulled it into -central.

-Dannon


On Thu, Jun 6, 2013 at 10:13 AM, Lance Parsons <[hidden email]> wrote:
Thanks for confirmation Björn.

Pull request was submitted: https://bitbucket.org/galaxy/galaxy-central/pull-request/97/fix-for-postgres-versions-9/diff

Trello card also created: https://trello.com/c/Z0EynAV2



Björn Grüning wrote:
Hi Lance,

I can confirm both the regression and the fix. A pull request would be
great I think. I can also do it if you want.

Thanks,
Björn

The recent updates to set_user_disk_usage.py for Postgres users have
an issue with Postgres 8.x.  The SQL in the pgcalc method (line 51)
leads to the following error:

sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "d.total_size" must appear in the GROUP BY clause or be used in an aggregate function
LINE 4: FROM (  SELECT d.total_siz...
            ^

The problem is that version of Postgres before 9.x were a bit more
restrictive in the use of GROUP BY.  This can be fixed using DISTINCT
ON instead.  See this StackOverflow post for more info:
http://stackoverflow.com/questions/1769361/postgresql-group-by-different-from-mysql

I've included a patch below.  Let me know if a pull request would be
preferred.


--- a/scripts/set_user_disk_usage.py
+++ b/scripts/set_user_disk_usage.py
@@ -52,7 +52,7 @@
     sql = """
            UPDATE galaxy_user
               SET disk_usage = (SELECT COALESCE(SUM(total_size), 0)
-                                  FROM (  SELECT d.total_size
+                                  FROM (  SELECT DISTINCT ON (d.id)
d.total_size, d.id
                                             FROM
history_dataset_association hda
                                                  JOIN history h ON
h.id = hda.history_id
                                                  JOIN dataset d ON
hda.dataset_id = d.id
@@ -62,7 +62,7 @@
                                                  AND d.purged = false
                                                  AND d.id NOT IN
(SELECT dataset_id

FROM library_dataset_dataset_association)
-                                        GROUP BY d.id) sizes)
+                                        ) sizes)
             WHERE id = :id
         RETURNING disk_usage;
     """
-- 
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/