2013-08-20

It’s not often that a SQL Server bug really surprises me or makes me wonder how it was never spotted before, but this is one of those.

To get right into it, say you have the following two tables:

Load table T1 with 4,999 rows. All of the rows have a SomeID value of 1234, and the T1ID primary key is sequentially numbered from 1 to 4,999:

Table T2 gets 999 rows, generated by adding T1ID values from T1 that divide exactly by 5:

More visually, T1 looks like this (T1ID goes up to 4,999):



And T2 looks like this (T2ID goes up to 999):



The test query simply counts the rows that match between the two tables when joined on T1ID:

The execution plan features a merge join:



The correct result (999) is returned and everyone is happy:

Enter the Index

Now someone comes along and adds a new index to table T1:

This is a perfectly reasonable index, apparently essential for some crucial query or other. Let’s run our COUNT_BIG(*) query again:

The execution plan looks similar:

But the result is wrong! There are still 999 matches in the underlying data.

An Even Simpler Query

With the new index still in place, we run this query:

This query should obviously return all the T1IDs from 1 to 4,999 in ascending order. Instead, we get:

The list starts at 4000 not 1! Also, out-of-order rows are found further down:

The results are not ordered by T1ID despite the ORDER BY T1TD ASC clause. Quite astonishing.

Cause

Both problems are caused by a bug in the query optimizer, which is present in all versions of SQL Server from 2008 to 2014 CTP 1 inclusive. The bug produces a query plan that does not provide the ordering guarantees the optimizer thinks it does, leading to incorrect results.

The sneaky aspect to it is that the index which causes the problem could be added at any time, without the original query-writer’s knowledge. Equally, data changes could mean that a query plan that used to use a hash or nested loops join suddenly recompiles to choose a merge join. Since a merge join requires sorted input, the opportunity for suddenly incorrect (incomplete!) results is obvious (and an example was shown above).

There is no trace flag that I am aware of that fixes this issue.

I have opened a Connect item for this bug, and written more about the detailed explanation in a guest post on SQLperformance.com

Paul White

Twitter : @SQL_Kiwi

© 2013 All Rights Reserved

Show more