One of my more common consults is for “my workbook is slow”. Occasionally, I will have to dig into the actual measures, but more typically, I just check for a few common issues that can have a large negative impact on performance.
Before we dig into those, let’s just take one moment to state the obvious. Power Pivot is an in-memory database. Almost by definition… if you want things to run faster… use less memory! When I am looking at the customers model, the overriding question I am asking is “are they using more memory than they need to?”.
I will often start the review by running the vba script on Kasper’s site. It will quickly tell you which columns are using the most memory, which is a great way to begin your diagnosis.
Ok, with that out of the way, let’s look at some common trouble spots.
Too many columns on Fact tables
When I say “Fact Table’’” here… I think I actually mean “any table with lots of rows”. Typically that is a fact table, but if you have large large lookup tables, it is probably worth looking at them too. There are many ways to remove columns:
- My personal favorite is to convince people they don’t need a column, then just delete it! If it is a field of strings storing notes/memos, removing it is a double win because those are eating lots of memory.
- Normalize something like 4 columns of Geography down to just 1 GeographyID that you pull into a separate lookup table
- Realize that some columns can be easily computed. You probably don’t need UnitPrice, Quantity and TotalPrice, since the 3rd can be computed, given the other two.
Obviously I pay closer attention the columns using the most memory first, but the thing to realize is that removing a column will usually have a larger impact than you expect.
Please note: on small lookup tables, you do not care. At all. Add your 100th calculated column on your Calendar table… I don’t care!
Too much granularity on columns
The most common culprit here is storing a date and time in one column, especially when you probably don’t even care about the time. There are obviously many more unique values of date and time, compared to either one separately. Best is to just remove the Time column, but you can also simply separate them into two columns.
Any column that is a unique identifier is going to take a lot of memory (by definition there is no duplication, it won’t compress well)… if you only care about your values in aggregate, don’t even bring that evil column into your model.
Do you have a column with 14 bazillion floating point digits? Maybe just 1 or 2 bazillion will suffice? Or, while it will have some side effects, you could consider breaking a column with lots of digits into multiple columns, and recombining with math in your measures. You probably want to try to find a split that gives roughly the same number of unique values in each column, so don’t constrain your self to just “dollars and cents” or even first 5 digits, last 5 digits. This one… you have to test!
The goal here is better compression by having less distinct values.
Untrimmed Calendar Table
I think this happens because the problem is easily “masked” or hidden. You have a calendar table from 2000 to 2040, but you only have sales in 2012-2014. And sure enough, when you drop Year on rows and Sales in the values… it looks nice! A pivot table with just 2012, 2013 and 2014.
What you don’t realize is that Power Pivot also tried to compute values for 2000, 2001, … it is just that blanks were returned, and by default Excel filters out any row that is all blank. But once you start adding more columns PriorYear, YearToDate , PriorYearToDate and %Growth, then throw in a few slicers by Geography and Category… and all the sudden things are slow. Well, you are doing 10x more work than you needed.
Trim your Calendar table to 2012-2014, and dance the happy performance dance. Also, note there isn’t that much magic to the Calendar table… if you can trim other lookup tables… do so! You tend to “iterate” on the values in our lookup tables, and iterating is slow. Less is more!
This one is a doozy.
You know how when you click on California in your state slicer, the cities for Dallas and Seattle magically “turn gray”, leaving just San Francisco and Sacramento? It does that by basically “trying” all the values in the slicers. If clicking the slicer entry would return all blanks in the pivot table… it disables the item in the slicer. But that is A LOT of work, especially if the slicer has many items. It doesn’t take many slicers for your performance to take a serious nose dive.
You can either have less slicers or you can keep the slicers but turn off the cross-slicer filtering. See this great blog entry over on PowerPivotPro for more info.
Lot’s O’ Cells (aka Detail Reports)
The thing to remember here is that every cell in a pivot table is calculated completely independently. Even grand total cells. Okay, so you have 10 years of rows on your pivot table and all 12 months on columns. You are doing 120 or so completely independent calculations. Now you want to add 20 Categories (now 2400 cells) and 80 Sub-Category (now 192,000 cells).
Well, ya… it is going to get slow. Especially if you also have some slicers on the report with cross slicer filtering turned on!
The best solution is just to avoid these types of detail reports, but if you are using 2013, there may be another way to skin this cat. You can see in this blog entry I posted over at PowerPivotPro, that you can do a single DAX query that returns a whole table of UI goodness. So, if you can craft your query just right… you can do 1 query instead of 192,000.
I couldn’t find a better reference to the technique, so maybe I will work on that next time.
I actually had a few other sections here like “avoid large string columns” (duh) and “careful large lookup tables” (duh)… but the first is super obvious (and should come out at the “Too many columns on Fact tables” step) and, for the latter, the Calendar is actually a really good example of that.
If you take a good look at all this stuff in your model, and you still have performance issues, well… that is when things get serious Looking at individual measures in an attempt to optimize for performance is not for the feint of heart. “The Italians” have some great videos on optimizing your DAX queries, but that is some hard work. Thankfully, Power Pivot is so crazy fast that it is rarely required, especially if you keep your model reasonable.
Avoid the pitfalls and likely you will be just fine!