Data Compression + DW

Compression Overview

I’m sitting in the SQL CAT Compression session, and it’s fascinating. It’s really nice to get some insight into what types of compression to use in the real world.

To quickly recap, there are two types of compression:

1. Row Compression

This type is a storage option. Basically, SQL will only use the number of bytes required at any point in time to store the data that is currently in the row. e.g. an INT data type field with a value of 1 would normally uses 4 bytes all the time. When row compression is turned on for this field, only 1 byte is actually used. In fact, if the value was 0, no bytes would be used. Nice option as reads do not require additional CPU cycles.

2. Page Compression

This is the more interesting version of compression. Zip-type technology is used as patterns are used in the page, and these patterns are stored in a central dictionary with pointers in the data itself to these shared patterns. The trade off here is that any reads / writes will require additional CPU cycles to uncompress the rows required. Although, the CPU overhead is between 5%-10%, which isn’t terrible.

Compression Lessons

So, seeing as I approach the world through Data Warehouse (DW) glasses, let’s summarise some of the best practises in relation to compression in a DW:

  1. Compress all tables in your Data Warehouse. Using PAGE compression.
    1. Thinking here is that the majority of the workload in DW is read operations. (i.e. no ongoing CRUD operations that would cause massive CPU overhead)
  2. To estimate the amount of space that will be saved, use the [sp_estimate_data_compression_savings] stored procedure.
  3. DW tables should be compressed in offline mode.
    1. Thinking here is that the DW itself has quiet times where users are not querying your tables. Especially in scenarios where Analysis Services is implemented, where users actually never hit your tables directly anyway.
  4. SORT_IN_TEMPDB = ON
    1. Recommended, and if SQLCAT say so, who am I to argue 🙂
  5. Order of table compression, well start with the smallest table first.
    1. Thinking here, is that you need additional space when compressing a table, roughly table size plus compressed table size. So, as you compress tables, you are progressively saving space in your DB, and by the time you compress the really big tables, you should freed up enough free space as you’ve gone along.
  6. When reclaiming all the space you’ve saved, which is normally through a DBCC SHRINKFILE, make sure you run a REORGANIZE on your indices.
    1. As the shrink algorithm is not aware of indices and introduces fragmentation.
    2. There is talk of making the shrink algorithm cleverer in time, let’s see how that goes.
  7. When bulk loading, try bulk load into an uncompressed heap. Then use a CREATE CLUSTERED INDEX statement to create the Page compression.
    1. Significantly quicker than inserting into a table that is already marked for compression
  8. Transparent Data Encryption (TDE) and Compression play well together.
    1. i.e. TDE has basically zero impact on compression performance.
Advertisements

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: