Sqlperformance.com

Data Modifications under Read Committed Snapshot Isolation

2014-05-14

The previous post in this series showed how a T-SQL statement running under read committed snapshot isolation (RCSI) normally sees a snapshot view of the committed state of the database as it was when the statement started execution. That is a good description of how things work for statements that read data, but there are important differences for statements running under RCSI that modify existing rows.

I emphasise the modification of existing rows above, because the following considerations apply only to UPDATE and DELETE operations (and the corresponding actions of a MERGE statement). To be clear, INSERT statements are specifically excluded from the behaviour I am about to describe because inserts do not modify existing data.

Update locks and row versions

The first difference is that update and delete statements do not read row versions under RCSI when searching for the source rows to modify. Update and delete statements under RCSI instead acquire update locks when searching for qualifying rows. Using update locks ensures that the search operation finds rows to modify using the most recent committed data.

Without update locks, the search would be based on a possibly out-of-date version of the data set (committed data as it was when the data modification statement started). This might remind you of the trigger example we saw last time, where a READCOMMITTEDLOCK hint was used to revert from RCSI to the locking implementation of read committed isolation. That hint was required in that example to avoid basing an important action on out-of-date information. The same kind of reasoning is being used here. One difference is that the READCOMMITTEDLOCK hint acquires shared locks instead of update locks. In addition, SQL Server automatically acquires update locks to protect data modifications under RCSI without requiring us to add an explicit hint.

Taking update locks also ensures that the update or delete statement will block if it encounters an incompatible lock, for example an exclusive lock protecting an in-flight data modification performed by another concurrent transaction.

An additional complication is that the modified behaviour only applies to the table that is the target of the update or delete operation. Other tables in the same delete or update statement, including additional references to the target table, continue to use row versions.

Some examples are probably required to make these confusing behaviours a bit clearer…

Test Setup

The following script ensures we are all set up to use RCSI, creates a simple table, and adds two example rows to it:

The next step needs to run in a separate session. It starts a transaction and deletes both rows from the test table (seems odd, but this will all make sense shortly):

Note that the transaction is deliberately left open. This maintains exclusive locks on both rows being deleted (along with the usual intent-exclusive locks on the containing page and the table itself) as the query below can be used to show:

The Select Test

Switching back to the original session, the first thing I want to show is that regular select statements using RCSI still see the two rows being deleted. The select query below uses row versions to return the latest committed data as at the time the statement begins:

In case that seems surprising, remember that showing the rows as deleted would mean displaying an uncommitted view of the data, which is not allowed at read committed isolation.

The Delete Test

Despite the success of the select test, an attempt to delete these same rows from the current session will be blocked. You might imagine this blocking occurs when the operation tries to acquire exclusive locks, but that is not the case.

The delete does not use row versioning to locate the rows to delete; it tries to acquire update locks instead. Update locks are incompatible with the exclusive row locks held by the session with the open transaction, so the query blocks:

The estimated query plan for this statement shows that the rows to be deleted are identified by a regular seeking operation before a separate operator performs the actual deletion:

We can see the locks held at this stage by running the same locking query as before (from another session) remembering to change the SPID reference to that used by the blocked query. The results look like this:

Our delete query is blocked at the Clustered Index Seek operator, which is waiting to acquire an update lock to read data. This shows that locating the rows to delete under RCSI acquires update locks rather than reading potentially-stale versioned data. It also shows that the blocking is not due to the delete part of the operation waiting to acquire an exclusive lock.

The Update Test

Cancel the blocked query and try the following update instead:

The estimated execution plan is similar to the one seen in the delete test:

The Compute Scalar is there to determine the result of adding 1000 to the current value of the Data column in each row, which is read by the Clustered Index Seek. This statement will also block when executed, due to the update lock requested by the read operation. The screenshot below shows the locks held when the query blocks:

As before, the query is blocked at the seek, waiting for the incompatible exclusive lock to be released so an update lock can be acquired.

The Insert Test

The next test features a statement that inserts a new row into our test table, using the Data column value from the existing row with ID 1 in the table. Recall that this row is still exclusively locked by session with the open transaction:

The execution plan is again similar to the previous tests:

This time, the query is not blocked. This shows that update locks were not acquired when reading data for the insert. This query instead used row-versioning to acquire the Data column value for the newly-inserted row. Update locks were not acquired because this statement did not locate any rows to modify, it merely read data to use in the insert.

We can see this new row in the table using the select test query from before:

Note that we are able to update and delete the new row (which will require update locks) because there is no conflicting exclusive lock. The session with the open transaction only has exclusive locks on rows 1 and 2:

This test confirms that insert statements do not acquire update locks when reading, because unlike updates and deletes they do not modify an existing row. The reading portion of an insert statement uses the normal RCSI row versioning behaviour.

Multiple reference test

I mentioned before that only the single table reference used to locate rows to modify acquires update locks; other tables in the same update or delete statement still read row versions. As a special case of that general principle, a data modification statement with multiple references to the same table only applies update locks on the one instance used to locate rows to modify. This final test illustrates this more complex behaviour, step by step.

The first thing we will need is a new third row for our test table, this time with a zero in the Data column:

As expected, this insert proceeds without blocking, resulting in a table that looks like this:

Remember, the second session still holds exclusive locks on rows 1 and 2 at this point. We are free to acquire locks on row 3 if we need to. The following query is the one we will use to show the behaviour with multiple references to the target table:

This is a more complex query, but its operation is relatively simple. There are two references to the test table, one I have aliased as ReadRef, and the other as WriteRef. The idea is to read from row 1 (using a row version) via ReadRef, and to update the third row (which will need an update lock) using WriteRef.

The query specifies row 1 explicitly in the where clause for the reading table reference. It joins to the writing reference to the same table by adding 2 to that RowID (so identifying row 3). The update statement also uses an output clause to return a result set showing the values read from the source table and the resulting changes made to row 3.

The estimated query plan for this statement is as follows:

The properties of the seek labelled (1) show that this seek is on the ReadRef alias, reading data from the row with RowID 1:

This seek operation does not locate a row that will be updated, so update locks are not taken; the read is performed using versioned data. The read is not blocked by the exclusive locks held by the other session.

The compute scalar labelled (2) defines an expression labelled 1004 that calculates the updated Data column value. Expression 1009 calculates the row ID to be updated (1 + 2 = row ID 3):

The second seek is a reference to the same table (3). This seek locates the row that will be updated (row 3) using expression 1009:

Because this seek locates a row to be changed, an update lock is taken instead of using row versions. There is no conflicting exclusive lock on row ID 3, so the lock request is granted immediately.

The final highlighted operator (4) is the update operation itself. The update lock on row 3 is upgraded to an exclusive lock at this point, just before the modification is actually performed. This operator also returns the data specified in the output clause of the update statement:

The result of the update statement (generated by the output clause) is shown below:

The final state of the table is as shown below:

We can confirm the locks taken during execution using a Profiler trace:

This shows that only a single update row key lock is acquired. When this row reaches the update operator, the lock is converted to an exclusive lock. At the end of the statement, the lock is released.

You may be able to see from the trace output that the lock hash value for the update-locked row is (98ec012aa510) in my test database. The following query shows that this lock hash is indeed associated with the in the clustered index with RowID 3:

Note that the update locks taken in these examples are shorter-lived than the update locks taken if we specify an UPDLOCK hint. These internal update locks are released at the end of the statement, whereas UPDLOCK locks are held to the end of the transaction.

This concludes the demonstration of cases where RCSI acquires update locks to read current committed data instead of using row versioning.

Shared and Key-Range Locks under RCSI

There are a number of other scenarios where the database engine may still acquire locks under RCSI. These situations all relate to the need to preserve correctness that would be threatened by relying on potentially out-of-date versioned data.

Shared Locks taken for Foreign Key Validation

For two tables in a straightforward foreign key relationship, the database engine needs to take steps to ensure constraints are not violated by relying on potentially-stale versioned reads. The current implementation does this by switching to locking read committed when accessing data as part of an automatic foreign key check.

Taking shared locks ensures the integrity check reads the very latest committed data (not an old version), or blocks due to a concurrent in-flight modification. The switch to locking read committed only applies to the particular access method used to check foreign key data; other data access in the same statement continues to use row versions.

This behaviour only applies to statements that change data, where the change directly affects a foreign key relationship. For modifications to the referenced (parent) table, this means updates that affect the referenced value (unless it is set to NULL) and all deletions. For the referencing (child) table, this means all inserts and updates (again, unless the key reference is NULL). The same considerations apply to the component effects of a MERGE.

An example execution plan showing a foreign key lookup that takes shared locks is shown below:

Serializable for cascading foreign keys

Where the foreign key relationship has a cascading action, correctness requires a local escalation to serializable isolation semantics. This means you will see key-range locks taken for a cascading referential action. As was the case for the update locks seen previously, these key-range locks are scoped to the statement, not the transaction. An example execution plan showing where the internal serializable locks are taken under RCSI is shown below:

Other scenarios

There are many other specific cases where the engine automatically extends the lifetime of locks, or locally escalates the isolation level to ensure correctness. These include the serializable semantics used when maintaining a related indexed view, or when maintaining an index that has the IGNORE_DUP_KEY option set.

The takeaway message is that RCSI reduces the amount of locking, but cannot always eliminate it entirely.

Next Time

The next post in this series looks at the snapshot isolation level.