When I worked for MySQL I saw frequent complaints that the UNIQUE constraint didn’t stop users from inserting NULLs, multiple times. For example:
CREATE TABLE t (s1 INT, UNIQUE (s1));
INSERT INTO t VALUES (NULL);
INSERT INTO t VALUES (NULL); /* This does not cause a “unique constraint violation” error */
There are now eleven generously-commented bugs.mysql.com reports:
#5685, #6829, #7479, #8173, #9844, #17825, #19377, #25544, #27019, #27376, #66512. The essential points (along with observations about how we were stupid or deaf or lazy) are:
the ANSI/ISO SQL standard said we should throw an error,
all the other major DBMSs would throw an error,
and throwing an error would be more sensible and convenient.
The first point is false; the second point depends what “major” means; the third point is a valid opinion.
I will now attempt to beat this subject to death with enough thoroughness that it will become dead.
Exactly what the ANSI/ISO standard says, and how it applies
Our example will be a table named t created thus:
CREATE TABLE t (s1 INT, UNIQUE (s1));
INSERT INTO t VALUES (NULL),(NULL);
So t looks like this:
Our question will be: is the constraint “UNIQUE (s1)” violated according to the SQL standard?
I’ll use a recent draft of SQL:2011 because it’s current and it’s available for all to see, at:
jtc1sc32.org/doc/N1951-2000/32N1964T-text_for_ballot-FCD_9075-2.pdf.
The wording is about the same in all earlier versions of the standard since SQL-89, which didn’t allow NULLs at all for UNIQUE constraints.
Section 11.7 says:
[Syntax Rules]
4) … let SC be the <search condition>:
UNIQUE ( SELECT UCL FROM TNN )
where UCL means “Unique Column List” and TNN means “Table Name”.
So for our example, search condition SC is “UNIQUE (SELECT s1 FROM t)”.
[General Rules]
3) The unique constraint is not satisfied if and only if
EXISTS ( SELECT * FROM TNN WHERE NOT ( SC ) )
is True.
… Plugging in our values, that means
3) The unique constraint is not satisfied if and only if
EXISTS ( SELECT * FROM t WHERE NOT ( UNIQUE (SELECT s1 FROM t) ) )
is True.
So now we have to know how “UNIQUE(…)” is supposed to work.
Section 8.11 says:
If there are no two rows in T such that the value of each column in one row is non-null and is not distinct from the value of the corresponding column in the other row, then the result of the <unique predicate> is True; otherwise, the result of the <unique predicate> is False.
… where T is the table.
Now, apply the predicate “UNIQUE (s1)” to table t.
For either Row#1 or Row#2: Is the value of column s1 non-null?
Answer: NO. It is NULL.
Therefore the requirement “that the value of each column in one row is non-null” is met for zero rows in t.
But the <unique predicate> is only True if the requirement is met for two rows in t. Therefore the result of the <unique predicate> is False.
In other words,
UNIQUE (s1) is False.
Therefore
NOT (UNIQUE (s1)) is True.
Therefore
SELECT * FROM TNN WHERE NOT (UNIQUE (t)) returns zero rows.
Therefore
EXISTS ( SELECT * FROM TNN WHERE NOT ( UNIQUE (t) ) ) is false.
But Rule 3) says that the unique constraint is not satisfied if and only if that EXISTS clause is True.
Therefore the constraint is satisfied.
And when a constraint is satisfied, it is not violated.
Thus, according to the rules for <unique constraint definition>, our example table violates no constraint.
It’s tedious to work through this because there are multiple negatives, but it is not ambiguous.
Why, then, do some people have trouble? Partly because they look at a different statement in the introductory sections. Section 4.17 Integrity constraints says:
A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.
And I have to admit that it’s not at all obvious what that’s supposed to mean. But, since you have now seen what is in the later and more detailed sections, you should read it as “no two rows have unique-column values which are the same (an undefined word which probably means equal) and which are also non-null”.
What other DBMSs do
Trudy Pelzer and I wrote this on page 260 of our book, SQL Performance Tuning:
DBMS
Maximum number of NULLs when there is a UNIQUE constraint
IBM (DB2)
One
Informix
One
Ingres
Zero
InterBase
Zero
Microsoft (SQL Server)
One
MySQL
Many [although the BDB storage engine was an exception]
Oracle
Many
Sybase
One
DB2 has an optional clause which can force standard behaviour.
Oracle is only allowing multiple NULLs because it doesn’t put NULLs in indexes; if we were talking about a multiple-column constraint, Oracle would be non-standard.
The book didn’t cover PostgreSQL and Access and Firebird, which all allow “Many” NULLs. A newer entrant, NuoDB, allows only one NULL (I told them that’s non-standard but they didn’t believe me).
Most of the complainers were apparently thinking of Microsoft SQL Server. So it’s interesting that on Microsoft’s discussion boards the complaints are (spoiler alert!) that SQL Server doesn’t follow the standard and should be more like other DBMSs. See, for example, NULL in unique index (where SQL Server gets it wrong) and Change UNIQUE constraint to allow multiple NULL values.
According to SQL Server Magazine, in 2008,
ANSI SQL supports two kinds of UNIQUE constraints — one that enforces uniqueness of NULLs just like with known values, and another that enforces uniqueness of known values but allows multiple NULLs. Microsoft SQL Server implemented only the former.
As we’ve seen, that’s false, and so is another claim, that “Informix and Microsoft SQL Server follow the other interpretation of the standard.” But perhaps Microsoft’s stories help explain people’s beliefs.
What is more sensible
An unfortunate defence of the allow-many-NULLs behaviour is that “Null values are not considered equal”. That’s what’s happening in effect in this particular case, but it’s not a general or useful rule.
(a) if we were doing a NULL/NULL comparison, or a NULL/not-NULL comparison, the result would be UNKNOWN. For other constraints an UNKNOWN result is considered a violation.
(b) “equality” is the wrong concept to consider here, it is more important to consider whether the values are “not distinct” — and in fact two NULL values are not distinct.
(c) NULLs sort together for purposes of GROUP BY, ORDER BY, or duplicate elimination.
Therefore, the UNIQUE rules do not follow automatically from how NULLs behave elsewhere in SQL. That is why the standard had to add wording to specify that the uniqueness predicate applies only for “non-null” values.
So, regarding what looks more sensible, it seems that the complainers have a point.
What is more convenient?
It is better, now, to carry on as before. The current behaviour is the de facto and the de jure standard. It is neither a bug nor undesirable.
But would it be even better if MySQL or MariaDB allowed an option? The DB2 syntax to allow UNIQUE with many NULLs is:
CREATE TABLE t (s1 INT UNIQUE WHERE NOT NULL)
so presumably the way to specify UNIQUE with one NULL would be:
CREATE TABLE t (s1 INT UNIQUE WHERE NULL OR NOT NULL)
but it would be easier to understand if we said
CREATE TABLE t (s1 INT UNIQUE AND MAXIMUM NUMBER OF NULLS = 1).
Anyway, it would be possible. But a nice first step would be to acknowledge that it is a feature request rather than a bug.
Copyright (c) 2013 by Ocelot Computer Services Inc. All rights reserved.
PlanetMySQL Voting:
Vote UP /
Vote DOWN