Archive for category #sqlpass 2009
PowerPivot, the topic that has been getting all the big attention here at PASS, has introduced ‘yet another’ expression language to define calculations. So, introducing DAX: Data Analysis eXpressions.
The idea behind this language is to provide OLAP-style query functionality to the Excel end user, so the basic syntax looks and feels exactly like standard Excel functions. A set of Excel functions has been implemented (80 in total), with additional functions from the VertiPaq engine to provide the OLAP functionality.
Ed Note: The additional functions in DAX actually come directly from the VertiPaq engine. Meaning that it is not a wrapper for MDX, it is actually the language used to define and execute the calculation within the engine.
DAX can be used in two places:
- Calculated Columns
- Similar to creating measures in fact tables, or in the DSV in SSAS. Basically used for line-by-line calculations. These values are calculated and stored at design time, and not recalculated at execution time.
- Measures
- Similar to Calculations in SSAS. Calculated at run time, in the context of the current pivot table or query. The best place to define ratios or custom aggregations that don’t make sense in a line-by-line method.
IMHO, the single most important aspect of this language is the underlying relationships defined within the PowerPivot model, as most calculations require some form of navigation between tables, whether it be date calculations or aggregations over groups.
The basic groups of functions that are supported include:
- Excel Functions
- Table Functions
- Time Intelligence
The Time Intelligence ones are by far the most fun for me. It is possible to define MDX-type calculations including Year-on-Year growth, Period-to-Date and Parallel Period navigation. A bit of a different model though, as the “time dimension” is assumed based on date fields. The only limitation is that it assumes calendar time.
e.g. Year-on-Year growth looks something like:
=SUM(Fact[Sales]) - CALCULATE(SUM(Fact[Sales]), DATEADD(Time[Date], -1, YEAR))
Overall impression is very positive. Full descriptions are available at http://blogs.msdn.com/powerpivot/archive/2009/10/01/introduction-to-data-analysis-expressions-dax-in-gemini.aspx
So, PerformancePoint is slowly but surely growing up into a fully fledged dashboarding environment.
The key new feature that I have seen today is the inclusion of the SharePoint Connection Framework within PerformancePoint 2010. This basically means that the filters and links between dashboard items are implemented through the SharePoint glue, as opposed to the current custom PerformancePoint 2007 glue.
This opens up a number of new possibilities for filtering dashboards. Including:
- The ability to automatically tie the current user’s login to the dashboard elements. Meaning that finally dashboards can be “seamlessly” targeted to the current user with no fancy filter logic.
- It also means that it is possible to link to practically any SharePoint Web Part, as this is a fully supported SharePoint interface.
Just sat in a fascinating session with Dave Wickert and Denny Lee, chatting about the deployment considerations for PowerPivot Server.
Very entertaining, I got my first session swag, and had a lot of laughs.
So, they ran through a large number of points, and I’m quite sure that they will produce full whitepapers with all the points… here are my personal top 6 lessons
- The PowerPivot plugin within the Excel client is an instance of the VertiPaq OLAP engine.
- This engine defies some of the basic laws of traditional SSAS; primarily, aggregations simply don’t exist in PowerPivot.
- To minimise the Excel file sizes, check this cool trick…
- While this runs in memory, it also has a disk-based version of the data, which can be found under %UserProfile%\AppData\Local\Temp\IMBIXXX
- Look out for the DICTIONARY files in this folder; these files store the “dimensional attributes”
- If you have excessively large DICTIONARY files on attributes that are not used, simply drop those attributes
- Read the manual when installing! There are a number of server components involved in PowerPivot, so there are many dependencies, including:
- The PowerPivot service can be located in all sorts of places within the SharePoint Server farm
- and you don’t necessarily need Kerberos!! SharePoint 2010 introduces Claims-Based Authentication, which keeps the user token within the SharePoint farm.
- It is recommended to use the New Farm option when beginning; all configuration is handled for you.
- Most important aspect of capacity planning is RAM.
General (and possibly most important…)
- PowerPivot does not replace SSAS
- Although PowerPivot happens to use SSAS as its internal engine, this was done for its calculation abilities, and not as a replacement roadmap for ‘traditional’ SSAS. Yes, all you BI professionals out there, we still have our jobs!
I’m sitting in the SQL CAT Compression session, and it’s fascinating. It’s really nice to get some insight into what types of compression to use in the real world.
To quickly recap, there are two types of compression:
1. Row Compression
This type is a storage option. Basically, SQL will only use the number of bytes required at any point in time to store the data that is currently in the row. e.g. an INT field would normally use 4 bytes all the time; when row compression is turned on and the field holds a value of 1, only 1 byte is actually used. In fact, if the value was 0, no bytes would be used. A nice option, as reads do not require additional CPU cycles.
2. Page Compression
This is the more interesting version of compression. Zip-type technology is used: repeating patterns in the page are stored in a central dictionary, with pointers in the data itself to these shared patterns. The trade-off here is that any reads / writes will require additional CPU cycles to uncompress the rows required. That said, the CPU overhead is between 5% and 10%, which isn’t terrible.
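Both types are enabled with a rebuild of the table (or of individual indexes and partitions). A minimal sketch, assuming a hypothetical dbo.FactSales table:

```sql
-- Row compression: store each value in the minimum bytes it needs
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = ROW);

-- Page compression: row compression plus per-page prefix and dictionary compression
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);
```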
So, seeing as I approach the world through Data Warehouse (DW) glasses, let’s summarise some of the best practices in relation to compression in a DW:
- Compress all tables in your Data Warehouse, using PAGE compression.
- The thinking here is that the majority of the workload in a DW is read operations (i.e. no ongoing CRUD operations that would cause massive CPU overhead).
- To estimate the amount of space that will be saved, use the sp_estimate_data_compression_savings stored procedure.
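Something like this, again assuming a hypothetical dbo.FactSales table:

```sql
-- Estimate how much space PAGE compression would save on a fact table
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'FactSales',
    @index_id = NULL,          -- NULL = all indexes on the table
    @partition_number = NULL,  -- NULL = all partitions
    @data_compression = 'PAGE';
```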
- DW tables should be compressed in offline mode.
- Thinking here is that the DW itself has quiet times where users are not querying your tables. Especially in scenarios where Analysis Services is implemented, where users actually never hit your tables directly anyway.
- SORT_IN_TEMPDB = ON
- Recommended, and if SQLCAT say so, who am I to argue 🙂
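Putting the last two recommendations together, an offline rebuild with the sort done in tempdb might look like this (table and index names are hypothetical):

```sql
-- Offline rebuild of all indexes with page compression, sorting in tempdb
ALTER INDEX ALL ON dbo.FactSales
REBUILD WITH (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON, ONLINE = OFF);
```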
- As for the order of table compression: start with the smallest table first.
- The thinking here is that you need additional space when compressing a table, roughly the table size plus the compressed table size. So, as you compress tables, you are progressively saving space in your DB, and by the time you compress the really big tables, you should have freed up enough space along the way.
- When reclaiming all the space you’ve saved, which is normally done through a DBCC SHRINKFILE, make sure you run a REORGANIZE on your indices.
- This is because the shrink algorithm is not aware of indices and introduces fragmentation.
- There is talk of making the shrink algorithm cleverer in time, let’s see how that goes.
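The shrink-then-reorganize sequence sketched out, with a hypothetical data file and table name:

```sql
-- Reclaim the space freed up by compression (target size in MB)
DBCC SHRINKFILE (N'MyDW_Data', 10000);

-- Fix the fragmentation the shrink just introduced
ALTER INDEX ALL ON dbo.FactSales REORGANIZE;
```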
- When bulk loading, try bulk loading into an uncompressed heap, then use a CREATE CLUSTERED INDEX statement to apply the PAGE compression.
- This is significantly quicker than inserting into a table that is already marked for compression.
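A sketch of that load pattern, with hypothetical table, file and column names:

```sql
-- 1. Bulk load into an uncompressed heap (fast, minimally logged with TABLOCK)
BULK INSERT dbo.FactSales_Staging
FROM 'C:\loads\factsales.dat'
WITH (TABLOCK);

-- 2. Build the clustered index and compress the data in one pass
CREATE CLUSTERED INDEX CIX_FactSales
ON dbo.FactSales_Staging (DateKey)
WITH (DATA_COMPRESSION = PAGE);
```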
- Transparent Data Encryption (TDE) and Compression play well together.
- i.e. TDE has basically zero impact on compression performance.
So, I’ve just experienced my first morning at SQL PASS. First impressions have been very good. The vibe is very similar to Tech Ed South Africa, about 2,000 people registered, and we’re staying in a hotel just next to the Convention Centre. I’m so glad we’re indoors, it was a chilly walk this morning 🙂
So, key lessons so far:
- SQL Server is definitely not a simple departmental DB any more. They demonstrated a rack server live on stage with 192 processors. Ridiculous processing power.
- Visual Studio 2010 is very cool. Looks good, everything is now a slick Midnight Blue.
- Maybe that’ll result in better looking applications as an end result.
- Also, it supports Data Tier Applications. This is a project type which supports deployment of databases using a file type called a DACPAC. Looks like a great way to deploy changes from dev to live. Have we really seen the end of the humble SQL script?
- SharePoint 2010 supports publishing of data directly from any document library through Atom feeds. This is fantastic for PowerPivot, as it is now possible to create full reporting solutions without a database?
- Ed Note: Not sure if it’s a good idea… but is an architectural option, and it’s always good to keep your choices open.
- SQL Azure is close. A new release is coming in mid-November. It’s getting closer and closer to on-premise SQL, in that you can now issue CREATE DATABASE statements directly from SSMS, instead of provisioning DBs through the browser.
- It’s also possible to synchronise on-premise and hosted DBs directly through SSMS. An interesting concept. A bit of a parallel to merge replication; just when you think you know it all, MS introduces another method.