What to do when NULL hides your data?

Relational databases have been around us for quite a while. One may think during all those years of SQL (being the de-facto standard query language for RDBMS), its numerous proprietary and open source implementations have matured. A recent issue I’ve been dealing with tells me they are still evolving. Nevertheless, this incompleteness only kicks in when you want to create complex SQL statements, but then there always are alternatives to reach a particular goal.

I’ve recently came across an issue with the ordering of the NULL elements. For some reason, PostgreSQL thinks NULL is the highest value. Well, it can be interpreted that way too, but with the following business case it wasn’t so.

The structure (simplified for easier digestion) being queried contains two tables: one for clients, and another for purchases made by clients. Obviously, every purchase is linked to a client, yet not every client has made a purchase.

I needed to create a report that displays every customer, with the total amount of purchases made by them, showing the client with the highest total amount first.

I’ve started off with the following query:
SELECT client.id, SUM(purchase.amount) AS total FROM client LEFT JOIN purchase ON purchase.client_id = client.id GROUP BY client.id ORDER BY total DESC;

Running the query revealed the problem: the aggregate function SUM returned NULL for the clients with no purchase, and NULL is treated as the highest value by Postgres.

This was just about time for looking this issue up. Well, according to the relevant wikipedia entry:

“The SQL standard does not explicitly define a default sort order for Nulls. Instead, on conforming systems, Nulls can be sorted before or after all data values by using the NULLS FIRST or NULLS LAST clauses of the ORDER BY list, respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this functionality may specify different treatments for Null sorting in the DBMS.”

Yes, that NULLS LAST sounded like a promising solution, so I’ve checked which DBMS implement it. A quick search revealed this forum entry that says, by default Postgres, DB2, and Oracle considers NULLs as higher than non-NULLs, while MSSQL and MySQL considers them being lower. DB2 V7 and Oracle 9i supports NULLS FIRST/LAST, while MSSQL, MySQL and PostgreSQL doesn’t.

However, that entry was two and a half years old, while PostgreSQL is known to being actively developed, so I’ve checked its status concerning NULLS LAST again. Its latest documentation does in fact tells NULLS LAST is supported, so I’ve checked when it became integrated. There it was, a patch from last December does in fact take care of the NULL ordering problem. However, it is only included in version 8.3, and an upgrade wasn’t an option in my case.

That meant another route to be taken. After a little more reading, I’ve found my alternative candidate, the COALESCE function, which is implemented in older versions of PostgreSQL too. COALESCE returns the first non-NULL value from its list of parameters, which is exactly what I needed here. My query therefore was altered to this:

SELECT client.id, COALESCE(SUM(purchase.amount),0) AS total FROM client LEFT JOIN purchase ON purchase.client_id = client.id GROUP BY client.id ORDER BY total DESC;

And suddenly, the problem ceased to exist, and every one was happy by looking at the numbers of the biggest customers at last. After all, from the business perspective, they represent the highest value, not NULL.

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

0 Response to “What to do when NULL hides your data?”


  • No Comments

Leave a Reply