2014-05-15

Updates in compressed tables, with both basic table
compression and advanced row compression, are not well understood, and this
article will attempt to explain how they work. To do this, some background
will be given on how Oracle performs row-based compression
and why the structure of the data is so important to the effectiveness of the
compression. More importantly, space usage will be detailed for various insert
and update scenarios, along with the performance impacts that occur when updating
compressed tables.

Basic table compression was first introduced in Oracle 9i
Release 2, and it only supported direct path inserts to achieve compression. Later
releases also support ALTER TABLE … MOVE operations and online table redefinition.
It produces better compression than advanced row compression, uses less CPU, and generates less undo and redo, but subsequent single row inserts
and updates are not compressed, and updates can lead to significant row
migration. This is a particular problem because the default PCTFREE setting
is 0, which leaves no free space in the block for updates, and an update must store the changed column values uncompressed.
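
As a minimal sketch (the table and column names here are illustrative, not taken from the original tests), a table using basic table compression might be created and loaded with a direct path insert like this:

CREATE TABLE sales_hist
( sale_id   NUMBER,
  sale_desc VARCHAR2(100)
) COMPRESS BASIC;          -- 12c also accepts ROW STORE COMPRESS BASIC

-- Direct path insert: rows are compressed as the blocks are built
INSERT /*+ APPEND */ INTO sales_hist
SELECT level, rpad('X', 100, 'X')
FROM   dual
CONNECT BY level <= 10000;
COMMIT;

-- An existing table can also be compressed with a move
ALTER TABLE sales_hist MOVE COMPRESS BASIC;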

Advanced row compression, or OLTP table compression as it
was known in 11g, allows single row inserts and updates to be compressed, and
is therefore more suitable for an OLTP environment. Advanced row compression
requires the Advanced Compression Option and works by only attempting compression when the block is full. This makes the
compression batch-like and means that inserts and updates are written uncompressed
into the block. If the block is successfully compressed then it becomes eligible for
more inserts or updates. If the block is full then the PCTFREE setting governs
how much free space is available for updates, but at this point the updates
will no longer be compressed because the compression mechanism will not be
triggered. Due to the batch nature of the compression algorithm, advanced row
compression will generally not compress as well as basic table
compression because there will typically be uncompressed rows left over in the
block.
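
A minimal sketch of the corresponding DDL (names are again illustrative); the 12c syntax is ROW STORE COMPRESS ADVANCED, while 11g used COMPRESS FOR OLTP:

-- 12c syntax
CREATE TABLE orders_hot
( order_id   NUMBER,
  order_desc VARCHAR2(100)
) ROW STORE COMPRESS ADVANCED;

-- 11g equivalent:
-- CREATE TABLE orders_hot ( order_id NUMBER, order_desc VARCHAR2(100) ) COMPRESS FOR OLTP;

-- Conventional single row inserts are eligible for compression;
-- compression is only attempted once the block fills up
INSERT INTO orders_hot VALUES (1, rpad('X', 100, 'X'));
COMMIT;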

Both basic table compression and advanced row compression
use the same compression algorithm. Compression is achieved by replacing
duplicate values with a pointer to a symbol table entry stored at the beginning
of the block. The duplicate values can occur in one or more columns and there
can be multiple entries in the symbol table. All of the information needed to
decompress a row is stored in the database block itself so Oracle doesn’t have
to access any other information, and since the data is part of the block,
decompression really just consists of substituting the symbol table entry or
entries so it is very fast. It is not at all like uncompressing a zip file.

The format of the compression in a block looks like the following (note that this was created from a block dump):

In this example, we see that we have a symbol table entry at tab 0, row 0. All symbol table entries are associated with table 0, and row data is part of table 1 in row compression blocks. The actual row data starts at tab 1, row 0 in the example above. We see from the “tl” entry that the symbol table entry has a total length of 105 bytes, and the compressed row has a total length of only 4 bytes. If we look a little more closely we see that the “bindmp” (binary dump) field for the compressed row has exactly 4 bytes, which matches the total length of the row. In this example the first byte is the flag byte, the second is the lock byte, the third is the column count, and the fourth is the reference to the entry in the symbol table.
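
For reference, a block dump like the one described above can be generated from SQL*Plus; the table name and the file and block numbers below are placeholders, with the real values coming from the DBMS_ROWID query:

-- Find the file and block for a row of interest
SELECT dbms_rowid.rowid_relative_fno(rowid) AS rel_file_no,
       dbms_rowid.rowid_block_number(rowid) AS block_no
FROM   my_compressed_table
WHERE  ROWNUM = 1;

-- Dump the block to the session trace file (relative and absolute file
-- numbers usually match in small databases; check V$DATAFILE if unsure)
ALTER SYSTEM DUMP DATAFILE 4 BLOCK 231;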

Update Tests

Note: All tests have been performed on an Oracle 12c (12.1.0.1) database, and all table definitions and SQL used to perform the tests have been included in the file located here.

A series of tests has been performed to show how well Oracle performs compression and how it handles various update scenarios. The first set of tests involves three tables: one with no compression (table T1), one with basic table compression (table T2), and a third with advanced row compression (table T3). First we'll load the tables and verify the space usage, and then we will update all of the rows in each table and show the resulting space usage. To keep things simple the tables have only one column and the values used are highly compressible: just 5000 rows with 100 A's and 5000 rows with 100 B's.
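
The exact definitions are in the author's linked file; a minimal reconstruction based on the description might look like this (a single VARCHAR2(100) column is an assumption):

CREATE TABLE t1 ( c1 VARCHAR2(100) );
CREATE TABLE t2 ( c1 VARCHAR2(100) ) COMPRESS BASIC;
CREATE TABLE t3 ( c1 VARCHAR2(100) ) ROW STORE COMPRESS ADVANCED;

-- Load 5000 rows of 100 A's and 5000 rows of 100 B's into each table.
-- The basic compression table must be loaded with direct path inserts.
INSERT /*+ APPEND */ INTO t2 SELECT rpad('A', 100, 'A') FROM dual CONNECT BY level <= 5000;
COMMIT;
INSERT /*+ APPEND */ INTO t2 SELECT rpad('B', 100, 'B') FROM dual CONNECT BY level <= 5000;
COMMIT;
-- (T1 and T3 are loaded with the same data.)

-- Check the space usage
SELECT segment_name, blocks, bytes
FROM   user_segments
WHERE  segment_name IN ('T1', 'T2', 'T3');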

The result after creating and loading the tables:

We see that the best compression is obtained with basic table compression (T2). Advanced row compression is not quite as good, but with a PCTFREE of 10 versus 0 for basic table compression (the defaults) it is still pretty good (T3). Now let's update the tables; a sketch of the update is shown below, and after changing the 5000 rows of 100 A's to 100 B's we get the following:
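
The update statement for each table was along these lines (reconstructed from the description):

UPDATE t2
SET    c1 = rpad('B', 100, 'B')
WHERE  c1 = rpad('A', 100, 'A');
COMMIT;
-- (The same update is run against T1 and T3.)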

We see that the space usage for the table without compression has not changed, but the two tables with compression have changed quite a bit. So what’s happened?

Updates in Basic Table Compression Tables

We already know that updates are not going to be compressed in tables using basic table compression, but why is our test table now 4 times larger and the used space over 5 times larger? The answer lies in how Oracle processes updates on compressed tables. Oracle will not re-compress the rows, or even just the columns that were updated, and basic table compression tables are created with a default PCTFREE value of 0, so we can suspect that Oracle has chained the rows into new blocks. Where else can it put the updated column values? The following proves our theory:
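
A sketch of the kind of check used here, assuming the CHAINED_ROWS table has been created with the utlchain.sql script shipped with the database:

-- $ORACLE_HOME/rdbms/admin/utlchain.sql creates the CHAINED_ROWS table
ANALYZE TABLE t2 LIST CHAINED ROWS INTO chained_rows;

SELECT COUNT(*)
FROM   chained_rows
WHERE  table_name = 'T2';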

We see that Oracle has indeed chained our 5000 updated rows. If we dump one of the blocks we see that Oracle has actually migrated the rows to new blocks, and since the migrated rows won’t be compressed we can begin to see why so much space was used.

The “nrid” field points us to the migrated row. If we look at that row:

We see that the total length is 110 bytes, so there is no compression. We also see that the “hrid” field points back to our original row, so we know these two pieces go together.
The last question is: what happens if we increase PCTFREE to a higher value? Can we avoid row migrations, and how much space is required? The answer is that it takes a PCTFREE of 90 (I tested in increments of 10) to avoid row migrations.
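
A sketch of the table used for this test, reconstructed from the description (the actual DDL is in the author's linked file):

CREATE TABLE t2a ( c1 VARCHAR2(100) )
PCTFREE 90
COMPRESS BASIC;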

Table T2A is the test table, and the problem is that it ends up requiring just as much space as the non-compressed table T1. Granted this is an extreme example, but the takeaway is that tables compressed with basic table compression should be rarely, if ever, updated.

Updates in Advanced Row Compression Tables

We have seen that updates are not compressed with basic table compression tables, so what about advanced row compression tables? After all, the whole point of advanced row compression is to support OLTP workloads with single row inserts and updates. If we recall our simple test above, our advanced row compression table grew from 24 blocks to 32 blocks after our update. Our update just changed the values in 5000 of the rows to the values of the other 5000 rows. In a perfect scenario the space shouldn't have grown at all. So what's going on?

I realized that the original test had inserted 5000 rows with 100 A's and then 5000 rows with 100 B's. I was concerned that this might have skewed the update results, since most blocks would have had either all A's or all B's with only one symbol table entry. So I performed a second test that looped 5000 times, inserting a row with 100 A's and then a row with 100 B's. The second test effectively interleaves the rows and therefore ensures that most blocks will contain both rows with A's and rows with B's. The total space usage was the same for both tables, but there was a slight difference in the number of rows per block, and that appears to be due to the space required in each block for two symbol table entries versus only one (at least that's the only difference that I could find).
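
A sketch of the interleaved load, assuming a second advanced row compression table (T3B is a hypothetical name; the author's actual script may differ):

CREATE TABLE t3b ( c1 VARCHAR2(100) ) ROW STORE COMPRESS ADVANCED;

BEGIN
  FOR i IN 1 .. 5000 LOOP
    INSERT INTO t3b VALUES (rpad('A', 100, 'A'));
    INSERT INTO t3b VALUES (rpad('B', 100, 'B'));
  END LOOP;
  COMMIT;
END;
/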

After the updates the space usage for both tests was again the same: growth from 24 blocks to 32 blocks. However, in both cases the end result is the same as what occurred in the basic table compression test: the updates themselves are not compressed. There is a difference with advanced row compression in that it uses a default PCTFREE value of 10 rather than 0. This means that some rows can be updated in place, and for the rest of the rows we see the same row migrations, but now those migrated rows are compressed! This explains why the space usage is quite a bit less than with basic table compression.

Here’s an example block dump from the second table with interleaved rows:

The block dump excerpt above shows us that we have two symbol table entries for our 100-character A's and B's, at tab 0, row 0 and tab 0, row 1. We also see that our first data row, at tab 1, row 0, is uncompressed at 106 bytes, but the very next row is compressed at only 4 bytes. This alternating behavior continues through the rest of the block dump, but when we get to row 16 we see our first row migration. We can assume that this occurs as we exhaust the free space that was reserved by the PCTFREE setting of 10, and based on our data interleaving this seems to make sense. The row migration for row 18 was followed to its destination block below. Note that the block header DBA matches the "nrid" DBA (next rowid), and the row number f1 in hex is row 241 in decimal (the "nrid" field is of the format <data block address>.<row number> in hex).

This entry is also shown below:

It’s very interesting to note that the migrated row is compressed. The “tl” field, or total length, is only 10 bytes for a 100 byte column, and we see that there is a symbol table entry at tab 0, row 0.

So this raises the question: are updates ever compressed in the same block? The answer lies in how advanced row compression works. Compression is only attempted when the free space in the block is exhausted. If a compression can take place then the rows in the block are compressed. Before the first compression an update is really the same as an insert, because there are no compressed rows to update. However, after compression has taken place, if a row that has been compressed is updated, then the resulting row can be re-compressed only if another compression is triggered. This is a key point: another compression has to be triggered by filling the block again. If, however, all free space has been exhausted, or subsequent inserts don't trigger another compression, then the rows that have been updated will stay uncompressed, and subsequent updates will cause row migrations once all space in the block is exhausted (i.e. the reserved space from the PCTFREE setting). I said inserts in the previous sentence because in my testing I only observed compression being triggered by inserts.

Let's show an example of an update getting compressed. The idea here is that as a block is loaded it will reach a point where all free space is exhausted. Assuming that inserts are what is filling the block, a compression will be triggered. Once the block is compressed, more rows can be added to it. Depending on the data and how compressible it is, many compressions can be triggered. This test will trigger an initial compression, update compressed rows, and then trigger additional compressions with more inserts. We will then dump the block to prove that our updated rows were indeed compressed.

Based on dumping table T1 we find that we get about 68 uncompressed rows per block. A new table is created with 50 rows of 100 A's, 3 rows of 100 B's, and 20 more rows of 100 A's (a sketch of the load is shown below). This gives us a total of 73 rows, which will only fit in one block if compression occurs. We can verify that compression has been invoked by dumping the block and checking the rows.
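
A minimal sketch of the load, using a hypothetical table name T4 (the actual names and DDL are in the author's linked file):

CREATE TABLE t4 ( c1 VARCHAR2(100) ) ROW STORE COMPRESS ADVANCED;

BEGIN
  FOR i IN 1 .. 50 LOOP
    INSERT INTO t4 VALUES (rpad('A', 100, 'A'));
    COMMIT;
  END LOOP;
  FOR i IN 1 .. 3 LOOP
    INSERT INTO t4 VALUES (rpad('B', 100, 'B'));
    COMMIT;
  END LOOP;
  FOR i IN 1 .. 20 LOOP
    INSERT INTO t4 VALUES (rpad('A', 100, 'A'));
    COMMIT;
  END LOOP;
END;
/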

The block dump shows us that we have two symbol table entries, and all rows are compressed except rows 67 through 72 (not shown). We also see that our three rows with B's are at rows 50, 51 and 52, and they are compressed with total lengths of 4 bytes. Next we'll update the three rows of B's to A's and verify that the updates have taken place.

We see that rows 50, 51 and 52 have been updated and are no longer compressed, because their total lengths are now 106 bytes. They also have not been migrated, so they reside completely within the block. We do see that the surrounding rows are still compressed and have not been changed. Lastly we insert an additional 600 rows of 100 A's, with a commit after each (a sketch is shown below), and see if we can trigger additional compressions.
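
A sketch of the additional inserts, again using the hypothetical table name T4:

BEGIN
  FOR i IN 1 .. 600 LOOP
    INSERT INTO t4 VALUES (rpad('A', 100, 'A'));
    COMMIT;
  END LOOP;
END;
/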

Now all rows are compressed up to row 557 with rows 558 through 566 remaining uncompressed. The inserts have caused a new block to be added with an additional 106 rows (not shown). We also see that there are no row migrations (from analyzing for chained rows – not shown).

Another way of verifying whether compressions have been triggered is to examine the "HSC" statistics for the session performing the DML; a query along these lines is shown below (see Master Note for OLTP Compression (Doc ID 1223705.1) for a list of the specific statistics associated with row compression).
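
A sketch of how the session-level statistics can be queried, using the standard V$ views:

SELECT n.name, s.value
FROM   v$mystat   s
JOIN   v$statname n ON n.statistic# = s.statistic#
WHERE  n.name LIKE 'HSC%'
AND    s.value > 0
ORDER  BY n.name;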

In this case the statistics were captured below:

We see that the statistic “HSC OLTP positive compression” has a value of 20, which means that 20 successful compressions were performed during the insertion of the additional 600 rows. These weren't all performed on the original block, because an additional block was added to hold the extra rows, leaving a total of 567 rows in the first block and 106 rows in the second. An inspection of the second block shows that 67 of its rows were compressed and the remaining 39 uncompressed (block dump not shown).

The fact that compression is only triggered by inserts when a space threshold is reached also helps to explain why advanced row compression rarely compresses as well as basic table compression. The blocks of advanced row compression tables usually contain leftover uncompressed rows that could have been compressed, and this reduces the space savings compared to basic table compression.

Summary

So we have seen that updates can be compressed in tables using advanced row compression. This seems to be most beneficial for tables that have a mixture of DML activity rather than for tables that are loaded first and then updated. This is a key takeaway about the behavior of advanced row compression, because we have seen that it takes an insert to trigger a block compression. Although updates are never compressed with basic table compression, if a table is going to be loaded and not updated then basic table compression can achieve the best compression possible. If a table will experience updates then advanced row compression should be used. If there are issues with single row insert performance on tables with advanced row compression then the table should be loaded first without compression, and once most insert activity is complete it can be compressed with advanced row compression.
