One of my more common consults is for “my workbook is slow”. Occasionally, I will have to dig into the actual measures, but more typically, I just check for a few common issues that can have a large negative impact on performance.
Before we dig into those, let’s just take one moment to state the obvious. Power Pivot is an in-memory database. Almost by definition… if you want things to run faster… use less memory! When I am looking at a customer’s model, the overriding question I am asking is “are they using more memory than they need to?”.
I will often start the review by running the VBA script on Kasper’s site. It will quickly tell you which columns are using the most memory, which is a great way to begin your diagnosis.
Ok, with that out of the way, let’s look at some common trouble spots.
Too many columns on Fact tables
When I say “Fact Table” here… I think I actually mean “any table with lots of rows”. Typically that is a fact table, but if you have large lookup tables, it is probably worth looking at them too. There are many ways to remove columns:
- My personal favorite is to convince people they don’t need a column, then just delete it! If it is a field of strings storing notes/memos, removing it is a double win because those are eating lots of memory.
- Normalize something like 4 columns of Geography down to just 1 GeographyID that you pull into a separate lookup table
- Realize that some columns can be easily computed. You probably don’t need UnitPrice, Quantity and TotalPrice, since the third can be computed from the other two (see the sketch below).
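Just to make that last bullet concrete, here is a minimal sketch in DAX, assuming a Sales table with Quantity and UnitPrice columns (names invented for illustration). Drop the stored TotalPrice column and let a measure do the multiplication at query time:

-- Sketch only: 'Sales', [Quantity] and [UnitPrice] are assumed names.
-- SUMX walks the Sales rows and multiplies the two columns, so no stored TotalPrice column is needed.
Total Sales := SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )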
Obviously I pay closer attention to the columns using the most memory first, but the thing to realize is that removing a column will usually have a larger impact than you expect.
Please note: on small lookup tables, you do not care. At all. Add your 100th calculated column on your Calendar table… I don’t care!
Too much granularity on columns
The most common culprit here is storing a date and time together in one column, especially when you probably don’t even care about the time. There are obviously many more unique values of date-and-time combined than of either one separately. The best fix is to just remove the time portion, but you can also simply separate it into two columns.
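To make the arithmetic concrete: a datetime is stored as a serial number where the integer part is the day and the fraction is the time of day, so the split is just INT() plus a subtraction. A rough sketch, using a hypothetical Sales[OrderDateTime] column (and note the real memory win comes from doing this split in the source query, so the combined column never gets imported at all):

-- Hypothetical column names; ideally do this split upstream so Sales[OrderDateTime] is never loaded.
OrderDate = INT ( Sales[OrderDateTime] )                          -- integer part of the serial: the date
OrderTime = Sales[OrderDateTime] - INT ( Sales[OrderDateTime] )   -- fractional part: the time of day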
Any column that is a unique identifier is going to take a lot of memory (by definition there is no duplication, it won’t compress well)… if you only care about your values in aggregate, don’t even bring that evil column into your model.
Do you have a column with 14 bazillion floating point digits? Maybe just 1 or 2 bazillion will suffice? Or, while it will have some side effects, you could consider breaking a column with lots of digits into multiple columns, and recombining them with math in your measures. You probably want a split that gives roughly the same number of unique values in each column, so don’t constrain yourself to just “dollars and cents” or even “first 5 digits, last 5 digits”. This one… you have to test!
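As a rough sketch of that split-and-recombine idea (all names here are assumptions): say the source splits a high-precision Sales[Amount] into a whole-dollar column and an integer cents column. Each half has far fewer distinct values than the original, so both compress better, and a measure stitches them back together:

-- Sketch only: assumes Sales[AmountDollars] (whole dollars) and Sales[AmountCents] (0-99, stored as an integer).
-- Recombine inside the iterator so the total matches the original Amount column.
Total Amount := SUMX ( Sales, Sales[AmountDollars] + Sales[AmountCents] / 100 )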
The goal here is better compression by having fewer distinct values.
Untrimmed Calendar Table
I think this happens because the problem is easily “masked” or hidden. You have a calendar table from 2000 to 2040, but you only have sales in 2012-2014. And sure enough, when you drop Year on rows and Sales in the values… it looks nice! A pivot table with just 2012, 2013 and 2014.
What you don’t realize is that Power Pivot also tried to compute values for 2000, 2001, … it is just that blanks were returned, and by default Excel filters out any row that is all blank. But once you start adding more columns like PriorYear, YearToDate, PriorYearToDate and %Growth, then throw in a few slicers by Geography and Category… all of a sudden things are slow. Well, you are doing 10x more work than you need to.
Trim your Calendar table to 2012-2014, and dance the happy performance dance. Also, note there isn’t that much magic to the Calendar table… if you can trim other lookup tables… do so! You tend to “iterate” over the values in your lookup tables, and iterating is slow. Less is more!
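If your engine supports DAX calculated tables (Power BI Desktop does; classic Excel Power Pivot does not, so there you would simply restrict the date range in whatever query loads the Calendar table), a self-trimming calendar is a one-liner. This is a sketch that assumes a Sales[OrderDate] column:

-- Generates one row per day between the first and last sale, and nothing more.
Calendar = CALENDAR ( MIN ( Sales[OrderDate] ), MAX ( Sales[OrderDate] ) )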
Cross-Slicer Filtering
This one is a doozy.
You know how when you click on California in your State slicer, the cities Dallas and Seattle magically “turn gray” in your City slicer, leaving just San Francisco and Sacramento? It does that by basically “trying” all the values in the slicers. If clicking a slicer entry would return all blanks in the pivot table… it disables that item in the slicer. That is A LOT of work, especially if the slicer has many items. It doesn’t take many slicers for your performance to take a serious nose dive.
You can either have fewer slicers, or you can keep the slicers but turn off the cross-slicer filtering. See this great blog entry over on PowerPivotPro for more info.
Lot’s O’ Cells (aka Detail Reports)
The thing to remember here is that every cell in a pivot table is calculated completely independently. Even grand total cells. Okay, so you have 10 years of rows on your pivot table and all 12 months on columns. That is 120 or so completely independent calculations. Now add 20 Categories (now 2,400 cells) and 80 Sub-Categories (now 192,000 cells).
Well, ya… it is going to get slow. Especially if you also have some slicers on the report with cross-slicer filtering turned on!
The best solution is just to avoid these types of detail reports, but if you are using 2013, there may be another way to skin this cat. You can see in this blog entry I posted over at PowerPivotPro that you can do a single DAX query that returns a whole table of UI goodness. So, if you can craft your query just right… you can do 1 query instead of 192,000.
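Here is a minimal sketch of that kind of query (table, column and measure names are all assumptions on my part). A single EVALUATE returns the whole detail grid as one table, instead of the pivot engine firing one independent calculation per cell:

-- Sketch only: 'Sales', 'Calendar', 'Product' and [Total Sales] are assumed to exist in your model.
EVALUATE
SUMMARIZE (
    Sales,
    'Calendar'[Year],
    'Calendar'[Month],
    Product[Category],
    Product[SubCategory],
    "Total Sales", [Total Sales]
)
ORDER BY 'Calendar'[Year], 'Calendar'[Month]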
I couldn’t find a better reference to the technique, so maybe I will work on that next time.
Wrap-Up
I actually had a few other sections here like “avoid large string columns” (duh) and “be careful with large lookup tables” (duh)… but the first is super obvious (and should come out at the “Too many columns on Fact tables” step) and, for the latter, the Calendar table is actually a really good example.
If you take a good look at all this stuff in your model, and you still have performance issues, well… that is when things get serious. Looking at individual measures in an attempt to optimize for performance is not for the faint of heart. “The Italians” have some great videos on optimizing your DAX queries, but that is some hard work. Thankfully, Power Pivot is so crazy fast that it is rarely required, especially if you keep your model reasonable.
Avoid the pitfalls and likely you will be just fine!
Konstantinos Ioannou says
Thanks for summarizing what to look at first, as we often check measures first. Most of them are known, but the untrimmed date table is a fantastic tip. I never realized it because of the empty pivot values… I will try Kasper’s script too.
Ken Puls says
“convince people they don’t need a column, then just delete it!”
You waste time on convincing them first? That’s what backups are for! 🙂
With regards to the calendar table, I recently burned myself in the opposite direction. I always trim my calendar tables, basically modifying the SQL that pulls them in to restrict the date range. Did that with a model I built a while back, then recently expanded it to add some other intel we needed. Couldn’t figure out why the measures were messed up till I realized that my calendar table was over-trimmed given the revised scope. Doh!
Matt Allington says
I have found that if you have too many pivot tables in a workbook (Excel 2013) then that is also a big drain on performance. It is remarkable – I have some dead slow pivot tables in a workbook. Then I save a copy of the workbook and delete all other sheets apart from the one I am using – BAM! super fast!
Hadi says
Thanks Scott for the post.
To your point on “Lot’s O’ Cells (aka Detail Reports)”, I am experiencing slow updates because of too many calculations when adding/removing a field at a higher level. For example, I have 5 fields in the row area (say 10 Departments, 100 Subgroups, 1,000 Cost Centers, 20 GL Accounts and 200 unique comments on variance). If I remove the top field (the Department field in this example), all other fields are fully expanded and it takes a lot of time to update.
If I start removing/adding fields from the bottom, with the higher-level fields fully collapsed, the update is quick.
However, I cannot make sure that fields are collapsed when I add or remove a field at a higher level. Do you have any suggestion for me? Is there any setting which ensures lower-level fields are fully collapsed by default? Thank you in advance.