Posts Tagged sql server

DW != BI?

For the past few years, ITWeb in South Africa have run a number of annual Business Intelligence (BI) conferences. In addition to this, they have run separate conferences for Data Warehousing (DW).

Now, personally I have never really understood this approach of separating the concepts. Basically, this implies that possibly BI != DW.

Based on my experience in delivering BI projects, I have always found that a good BI project absolutely requires a good DW component. Despite Analysis Services’ ability to conform data through the virtual world of the UDM (and I include PowerPivot and other 3rd party BI front-end tools with similar claims in this discussion), pragmatism and the usual dodgy data quality issues have always dictated that best-practice ETL and DW disciplines are needed before we expose data to end users.

In fact, I have always found the relationship to be so tight that my internal framing of the world has dictated that BI == DW.

As with any good internal assumption or belief, it is currently being challenged. I’m involved in assisting an international company in developing a group-wide reporting data warehouse. Now, the programme is so large that our current scope of work is “simply” to consolidate the data from the inevitable wide variety of sources into a central data model. This stage of the project has dictated an approach very different to what I am used to; i.e. I am no longer directly involved in data delivery, so my usual toolkit of Excel, Strategy Companion, Performance Point and a bit of good old Business Strategy conversations is simply not in the picture.

So, maybe there is space in the world for the idea that DW != BI. (i.e. maybe ITWeb has a point after all)

So, maybe if we flip my formulas into logical implication statements, we could say:

BI -> DW

And although my current project does not necessarily dictate that DW -> BI, I would argue that for the long-term success and acceptance of the data we’re consolidating, BI projects will need to be initiated to allow end users to dive into the data asset we’re creating.

Looking forward to being involved in that 🙂




Data Compression + DW

Compression Overview

I’m sitting in the SQL CAT Compression session, and it’s fascinating. It’s really nice to get some insight into what types of compression to use in the real world.

To quickly recap, there are two types of compression:

1. Row Compression

This type is a storage option. Basically, SQL will only use the number of bytes required at any point in time to store the data that is currently in the row. e.g. an INT data type field with a value of 1 would normally use 4 bytes all the time. When row compression is turned on for this field, only 1 byte is actually used. In fact, if the value were 0, no bytes would be used. Nice option, as reads do not require additional CPU cycles.
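As a rough sketch, and assuming SQL Server 2008 or later where this option exists, enabling row compression is just a rebuild option on the table; the dbo.FactSales name below is made up purely for illustration:

    -- Hypothetical fact table; enable ROW compression via a rebuild
    ALTER TABLE dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = ROW);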

2. Page Compression

This is the more interesting version of compression. Zip-style dictionary compression is applied within each page: repeating patterns are stored once in a dictionary within the page, and the data itself holds pointers to those shared patterns. The trade-off here is that any reads / writes will require additional CPU cycles to decompress the rows required, although the CPU overhead is between 5%-10%, which isn’t terrible.
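Turning page compression on looks much the same; a minimal sketch, again using made-up table and index names:

    -- Enable PAGE compression on the whole table...
    ALTER TABLE dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- ...or on a specific index only
    ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE);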

Compression Lessons

So, seeing as I approach the world through Data Warehouse (DW) glasses, let’s summarise some of the best practices in relation to compression in a DW (a rough T-SQL sketch of the whole workflow follows the list):

  1. Compress all tables in your Data Warehouse, using PAGE compression.
    1. Thinking here is that the majority of the workload in DW is read operations. (i.e. no ongoing CRUD operations that would cause massive CPU overhead)
  2. To estimate the amount of space that will be saved, use the [sp_estimate_data_compression_savings] stored procedure.
  3. DW tables should be compressed in offline mode.
    1. Thinking here is that the DW itself has quiet times where users are not querying your tables, especially in scenarios where Analysis Services is implemented and users never actually hit your tables directly anyway.
  4. SORT_IN_TEMPDB = ON
    1. Recommended, and if SQLCAT say so, who am I to argue 🙂
  5. As for the order of table compression: start with the smallest table first.
    1. Thinking here is that you need additional space when compressing a table, roughly the table size plus the compressed table size. So, as you compress tables, you are progressively saving space in your DB, and by the time you compress the really big tables, you should have freed up enough space along the way.
  6. When reclaiming all the space you’ve saved, which is normally through a DBCC SHRINKFILE, make sure you run a REORGANIZE on your indices.
    1. As the shrink algorithm is not aware of indices and introduces fragmentation.
    2. There is talk of making the shrink algorithm cleverer in time, let’s see how that goes.
  7. When bulk loading, try bulk loading into an uncompressed heap first, then use a CREATE CLUSTERED INDEX statement (specifying PAGE compression) to compress the data.
    1. Significantly quicker than inserting into a table that is already marked for compression.
  8. Transparent Data Encryption (TDE) and Compression play well together.
    1. i.e. TDE has basically zero impact on compression performance.
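To pull the points above together, here is a rough T-SQL sketch of the workflow; the database file, table, index and file path names are all hypothetical, so treat it as an outline rather than a script to run as-is:

    -- 2. Estimate the savings before committing to anything
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'FactSales',
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';

    -- 1, 3, 4. Compress offline during a quiet window, sorting in tempdb
    ALTER INDEX ALL ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON, ONLINE = OFF);

    -- 6. Reclaim the space, then defragment what the shrink scrambled
    DBCC SHRINKFILE (N'MyDW_Data', 102400);   -- target size in MB
    ALTER INDEX ALL ON dbo.FactSales REORGANIZE;

    -- 7. Bulk load into an uncompressed heap first...
    BULK INSERT dbo.FactSales_Staging
    FROM 'D:\loads\factsales.dat'
    WITH (TABLOCK);

    -- ...then compress in one pass while building the clustered index
    CREATE CLUSTERED INDEX CIX_FactSales_Staging
    ON dbo.FactSales_Staging (DateKey, ProductKey)
    WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON);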
