Three Useful PostgreSQL Features

Last week I needed to extract some specific numbers from our systems. It was a good chance to learn a thing or two about PostgreSQL’s capabilities, especially in regards to scripting and updating existing extractions.

FILTER aggregate functions

My main challenge was to split an existing COUNT into two different ones according to a specific criterion. The code, which counted the number of orders per store, did something equivalent to this:

SELECT orders.store_id,
  COUNT(*) AS orders_count,
  -- More rows
FROM orders
WHERE
  -- Multiple conditions on the orders
GROUP BY orders.store_id

To divide it, I could have done something like this:

SELECT orders.store_id
  COUNT(*) AS orders_count
  COUNT(CASE WHEN state = 'confirmed' THEN 1 ELSE 0) AS confirmed_count,
  COUNT(CASE WHEN state = 'cancelled' THEN 1 ELSE 0) AS cancelled_count
FROM orders
WHERE
  -- Multiple conditions on the orders
GROUP BY orders.store_id

There is no need to write ELSE 0 inside each CASE since COUNT will only consider non NULL entries. But thanks to FILTER, a new feature added in the 9.4 release to specify conditions on aggregate functions, I could do this:

SELECT orders.store_id,
  COUNT(*) AS orders_count,
  COUNT(*) FILTER (WHERE state = 'confirmed') AS confirmed_count,
  COUNT(*) FILTER (WHERE state = 'cancelled') AS cancelled_count
FROM orders
WHERE
  -- Multiple conditions on the orders
GROUP BY orders.store_id

I think the latter is slightly more expressive about the intent and easier to understand.

You can read more about FILTER in PostgreSQL’s documentation.

Command line variables with psql

Inside the conditions, there was a date that needed to change. Editing the script each time was not an option. Thankfully, you can pass variables to a script run with psql:

SELECT orders.store_id,
  COUNT(*) AS orders_count,
  COUNT(*) FILTER (WHERE state = 'confirmed') AS confirmed_count,
  COUNT(*) FILTER (WHERE state = 'cancelled') AS cancelled_count
FROM orders
WHERE
  created_at < :date AND created_at >= :date - INTERVAL '7 days'
GROUP BY orders.store_id

This way :date can be defined in the command line, i.e.:

$ psql -f example.sql -v date="to_date('2016-04-06', 'YYYY-MM-DD')"

You can read more about psql command line variables here in the documentation.

COPY query results

The last step is that I needed to print the results to STDOUT in CSV format instead of the default representation. For this we have the COPY statement:

COPY (
  SELECT orders.store_id,
    COUNT(*) AS orders_count,
    COUNT(*) FILTER (WHERE state = 'confirmed') AS confirmed_count,
    COUNT(*) FILTER (WHERE state = 'cancelled') AS cancelled_count,
  FROM orders
  WHERE
    created_at < :date AND create_at >= :date - INTERVAL '7 days'
  GROUP BY orders.store_id
) TO STDOUT (DELIMITER ';');

With these changes, I could run this in the command line:

$ psql -f example.sql -v date="to_date('2016-04-06', 'YYYY-MM-DD')"

And get the formatted output:

1;123;13
2;224;11
3;156;8
...

This way I could easily invoke the script from another program and save the output to handle it like CSV.

You can read more about COPY in the documentation. It offers more options about formatting and even allows writing directly to a file.

Share this:

Leave a Reply

Your email address will not be published. Required fields are marked *