Edit: Before or after reading my version, go read what Marco and Alberto have to say on the subject. I find it confusing… so two views are probably better than one: http://www.sqlbi.com/articles/autoexist-and-normalization/
Hierarchical. Just saying that makes me feel smart; though, I’m not super sure I spelled it right.
Here, I am not talking about the feature that let’s you group a set of columns such that you can drop them all into a pivot table as one set. So far as I can tell, that is just some UI candy.
I am talking about any situation where your data actually is in a hierarchy. eg: Location (Country –> State –> City) or Products (Category –> SubCategory), etc.
Frequently, I start one of the blog entries… realize I don’t know what the hell I am talking about, then go figure it out, and pretend I always knew it. This time is different, in that I am skipping everything but the first step…
So, I searched around for city populations in the States and found this one: http://www.infoplease.com/ipa/A0763098.html and simply dropped that URL into Power Query to pull in the data… which is still fun and neato, if you ask me. It’s the populations for the 50 largest cities in the U.S.
I wrote a simple measure because I couldn’t help myself: Total Pop := SUM(Population[Estimate])
Now, here is my question. We know that Excel will magically hide rows that result in all BLANK() values… so, did it try to evaluate Portland, Texas? (And while some may think I just mean “Austin”… no silly, I mean… is it trying to evaluate every city with every state!?)
Under Pivot Table Options, you can go flip the checkbox that hides empty rows. Or, you can do something goofy, like I did… and create this AWESOME MEASURE: [One] := 1 Which ya know… returns a 1. Always. So, if there was an evaluation AT ALL, it will show up. And as you can see at right… we are only seeing evaluations that “make sense”. Hurray! Efficient!
Hierarchies and Lookup Tables
Okay, minor change. We are going to create a States lookup table. I’m not even going to bother doing anything interesting with it… just match up state name to state name and call it a day. And we are going to see if it has any impact on our lives when we use that lookup column in our pivot table.
First, let’s just recognize that the results will be fine. And the default behavior of the pivot table will show what you expect. But when we include our cute [One] measure to force all evaluated rows to show up… indeed, we get different results than our previous pivot table.
We can see that it tried to evaluate Dallas, California… maybe the worst idea for a city… EVAR?
Does any of this matter?
Well, as per always… follow Tiny Lizard Rule #1 “Don’t Solve Problems You Don’t Have”. Some day I hope to have a Rule #2. Anyway, if you don’t have issues… don’t stress it… I really don’t think this scenario would be super common. But… I have stumbled into it a few times, so… it happens.
Why does it matter?
Here I paraphrase Master Collie: “Computers Math Like a Mother!” – which is to say, by the time the processor is just adding numbers up, it does that real real (real) fast. But evaluating filter context to find which numbers to add? That… that can take awhile. And here, we only had 50 cities across 30 cities, so 30*50=1500 evaluations of filter context (instead of our original 50!). But you can easily imagine this number become REAL large, REAL fast… Country*Region*City… by the time you have evaluated the great city of New Orleans from the Emilia-Romagna region of Canada, … you might be having some perf issues.
Conclusions and Such
I think just… be vaguely aware of this issue. It feels solvable with what my buddy Tyler calls 4D… “Don’t Do Dat Den”. I can imagine it might call for some careful data shaping… but in my experience hierarchies like this tend to come from the same table anyway.
You will (of course) get the same impact by putting other columns on the pivot table… Geography*YearMonth*SalesPerson is evaluating EVERY geography, yearmonth and salesperson combination… even if Joe only operated in Rome in 2011… because that is just not “known” until the evaluation happens.
That said, I do not feel I have completely wrapped my head around this issue. As always if somebody wants to crisp this up, hit me in the comments. I am sure somebody can relate it back to some SQL join madness to make it clear.
Certainly it doesn’t “feel” quite right to me… as I always think of filters “flowing” into the fact table from a lookup as being identical to applying the filter directly to the fact table… but clearly that is not quite true.