Martincarstenbach.wordpress.com

Understanding enhancements to block cleanouts in Exadata part 1

Subscribe to Martincarstenbach.wordpress.com

2015-02-23

Travel time is writing time and I have the perfect setting for a techie post. Actually I got quite excited about the subject causing the article to get a bit longer than initially anticipated. In this part you can read about block cleanouts when using buffered I/O. The next part will show how this works using direct path reads and Smart Scans.

The article ultimately aims at describing the enhancements Exadata brings to the table for direct path reads and delayed block cleanouts. Delayed block cleanouts are described in Jonathan Lewis’s “Oracle Core”, and in one of his blog posts, so here’s just a summary.

The delayed block cleanout

In a nutshell Oracle’s database writer (the process persisting your transactions to disk) is free to write blocks to disk in batches when it has to be done. A commit on its own won’t trigger a write of dirty (or modified) block to disk. If it were to trigger the write, the commit-time would be proportional to the number of blocks affected by the last transaction. The commit command however completes very quickly regardless of how much data has been affected. Unlike a rollback … It is entirely possible that a block modified by a transaction is written to disk before the transaction has been completed. A little later, once the commit has been acknowledged by Oracle there is no process that would read the blocks back into the buffer cache and clear them out-this happens later. It would also be quite inefficient to do so.

Defining “clean”

Now when database writer writes to disk it is possible that the block just written has an active transaction recorded in its header. You can see this by dumping a block with an active transaction – the ITL in the header will reference the XID, UBA and the number of rows affected plus other information. The individual rows that are part of the transaction have their lock byte set as you can see in the row directory. The number in the lb field refers back to an ITL you see further up in the block dump (don’t worry I’ll show you an example shortly).

What happened in Oracle before direct path reads became a lot more common is this:

A user starts a transaction, for example by updating a large portion of the table

In the meantime the database writer flushes some blocks to disk, including some of the ones affected by the transaction.

The user commits a transaction

The next user queries the block (let’s assume it is not in the buffer cache)

The second session’s foreground process reads the block

Realises that it has an active transaction recorded in it

Checks if that transaction is still active

And clears the block out if not

The block with a valid CR image is now in the buffer cache

Huh?

Sounds too abstract-I agree? Let’s have an example.

An easy way to simulate the flushing of blocks to disk is a slightly brute-force approach of flushing the buffer cache. Only recommended in the lab, really. Let’s have a look at what the block looks like:

The block dump for block 3940811 on datafile 5 (users tablespace) is now in that trace file:

The significant pieces of information here are in the ITL (interested transaction list) and the lock byte set, pointing to the second ITL entry. The absence of flags for ITL #2 indicates the transaction is ongoing (you can see this in v$transaction-this is a lab environment, in real databases you’d see more than 1 active transaction):

This is indeed the transaction that is referenced in v$transaction and the ITL 0x02 above (9589 is decimal for 2575 in hexadecimal notation, 9 and 5 don’t need converting to decimal numbers). If you are curious now about some more internals then you might find this post by Arup Nanda interesting.

Now what happens if another session queries the table? Since the query in session 1 has not yet committed the changes in it must not be seen by any other session (that would be a dirty read otherwise. We do not use dirty reads in Oracle RDBMS). So to give the user a consistent view of the data the block must be rolled back. Before the test I made sure there wasn’t a block in the buffer cache for the object in question:

Let’s test:

I am using Adrian Billington’s mystats script here which I can only recommend to you-it’s very decent (and that’s an understatement). In the example above it took two snapshots and calculated the change of the session counters in v$mystat during the query execution. Have a look at the cleanout% statistics here! These are the block cleanouts. As this generates redo you can see that recorded here too. This is a tricky interview question: can a select generate redo? Yes sure can! There is also a fair amount of physical I/O going on. After all the buffer cache was empty.

The second execution of the query completes a lot faster due to the absence of physical I/O. Nevertheless, the session has to clean out all these blocks and perform consistent read processing again (remember the transaction is still uncommitted as seen in active txn%):

I should have mentioned that these 2 examples were executed on an X2-2 quarter rack. To prevent Exadata from applying some clever optimisations I specifically disabled direct path reads and turned off Exadata features. Here is the view from the buffer cache again:

For each subsequent execution of the select statement consistent read processing has to be conducted. The number of CR blocks in the buffer cache will increase. For the third execution I can see the following:

Things improve when the transaction commits.

The impact of a commit

What about a commit now? Committing in session 1 and running the query again in session 2 shows a very good response time. Please remember that I forced buffered reads. The next part deals with this case when using direct path reads instead (those are not buffered). The segment size would have qualified for direct path reads otherwise.

Access to blocks in the buffer cache is quick! There was no need to clean out any blocks anymore. The fastest way to do something is indeed not having to do it at all. Mind you, the block on disk still has the active transaction recorded!

The transaction recorded in the ITL slot 2 is gone though:

Did you notice the Flag in the second ITL entry? That wasn’t there before. It indicates a fast commit according to Core Oracle by Jonathan Lewis.

Summary

Block cleanouts are simple with buffered I/O. Dirty blocks can be rolled back and kept in the buffer cache and can be accessed without having to undergo block cleanout.