Edit: Before or after reading my version, go read what Marco and Alberto have to say on the subject. I find it confusing… so two views are probably better than one: http://www.sqlbi.com/articles/autoexist-and-normalization/
Hierarchical. Just saying that makes me feel smart; though, I’m not super sure I spelled it right.
Here, I am not talking about the feature that let’s you group a set of columns such that you can drop them all into a pivot table as one set. So far as I can tell, that is just some UI candy.
I am talking about any situation where your data actually is in a hierarchy. eg: Location (Country –> State –> City) or Products (Category –> SubCategory), etc.
Frequently, I start one of the blog entries… realize I don’t know what the hell I am talking about, then go figure it out, and pretend I always knew it. This time is different, in that I am skipping everything but the first step…
Simple Sample
So, I searched around for city populations in the States and found this one: http://www.infoplease.com/ipa/A0763098.html and simply dropped that URL into Power Query to pull in the data… which is still fun and neato, if you ask me. It’s the populations for the 50 largest cities in the U.S.
I wrote a simple measure because I couldn’t help myself: Total Pop := SUM(Population[Estimate])
Drag State and City onto rows, and drop Total Pop onto values and we get the simple pivot table, which looks exactly like we expected. And might I just say that NY, NY is REALLY populated
Now, here is my question. We know that Excel will magically hide rows that result in all BLANK() values… so, did it try to evaluate Portland, Texas? (And while some may think I just mean “Austin”… no silly, I mean… is it trying to evaluate every city with every state!?)
Under Pivot Table Options, you can go flip the checkbox that hides empty rows. Or, you can do something goofy, like I did… and create this AWESOME MEASURE: [One] := 1 Which ya know… returns a 1. Always. So, if there was an evaluation AT ALL, it will show up. And as you can see at right… we are only seeing evaluations that “make sense”. Hurray! Efficient!
Hierarchies and Lookup Tables
Okay, minor change. We are going to create a States lookup table. I’m not even going to bother doing anything interesting with it… just match up state name to state name and call it a day. And we are going to see if it has any impact on our lives when we use that lookup column in our pivot table.
Any guesses?
First, let’s just recognize that the results will be fine. And the default behavior of the pivot table will show what you expect. But when we include our cute [One] measure to force all evaluated rows to show up… indeed, we get different results than our previous pivot table.
We can see that it tried to evaluate Dallas, California… maybe the worst idea for a city… EVAR?
Does any of this matter?
Well, as per always… follow Tiny Lizard Rule #1 “Don’t Solve Problems You Don’t Have”. Some day I hope to have a Rule #2. Anyway, if you don’t have issues… don’t stress it… I really don’t think this scenario would be super common. But… I have stumbled into it a few times, so… it happens.
Why does it matter?
Here I paraphrase Master Collie: “Computers Math Like a Mother!” – which is to say, by the time the processor is just adding numbers up, it does that real real (real) fast. But evaluating filter context to find which numbers to add? That… that can take awhile. And here, we only had 50 cities across 30 cities, so 30*50=1500 evaluations of filter context (instead of our original 50!). But you can easily imagine this number become REAL large, REAL fast… Country*Region*City… by the time you have evaluated the great city of New Orleans from the Emilia-Romagna region of Canada, … you might be having some perf issues.
Conclusions and Such
I think just… be vaguely aware of this issue. It feels solvable with what my buddy Tyler calls 4D… “Don’t Do Dat Den”. I can imagine it might call for some careful data shaping… but in my experience hierarchies like this tend to come from the same table anyway.
You will (of course) get the same impact by putting other columns on the pivot table… Geography*YearMonth*SalesPerson is evaluating EVERY geography, yearmonth and salesperson combination… even if Joe only operated in Rome in 2011… because that is just not “known” until the evaluation happens.
That said, I do not feel I have completely wrapped my head around this issue. As always if somebody wants to crisp this up, hit me in the comments. I am sure somebody can relate it back to some SQL join madness to make it clear.
Certainly it doesn’t “feel” quite right to me… as I always think of filters “flowing” into the fact table from a lookup as being identical to applying the filter directly to the fact table… but clearly that is not quite true.
Now I could be wrong here, but it looks to me like you have a Cartesian Product of the State Lookup table and the original data table that has been caused (deliberately) by the measure [one].
Now given the filter engine operates before the formula engine, it would seem to me that in the absence of the rogue measure [one], that the filter engine will successfully exclude the need to iterate the invalid state/city combos because they are filtered out in the absence of a measure that forces them to be considered. So in your case these extra invalid combos are being executed because you are telling it to do it, but I am pretty confident that it is not doing this in the background when you don’t force it.
That’s my view anyway. :-). Happy to be enlightened further.
That’s not what is happening. Though… I don’t claim to know exactly what is 🙂
Notice in the upper right pivot table, there is the “1” measure, but there is no Dallas, Calif value… but in the lower left pivot table (once I moved the state to the lookup), there is. But have the “1” measure, but the lookup version is clearly acting different.
I see the same thing 2 ways:
1) set the pivot table to show rows if all values are blank
2) before I put ANY values in the pivot table… excel shows all row values. There again, it only shows the bogus Dallas, Calif… when dragging down from the lookup.
So, I am confident this is somehow… different. My belief that it can have a large impact on performance is a good 88%. I’m sure Marco will come straighten us out eventually 🙂
This article may explain what you have observed
http://www.sqlbi.com/articles/autoexist-and-normalization/
Good gravy! Those crazy Italians are always ON their game. I glanced at the top of the article, saw the word “MDX”, and totally dismissed it! Ooops! Thanks for sending me back.
CREATE TABLE #STATES_CITIES(STATE NVARCHAR(20), CITY NVARCHAR(20), POPULATION INT)
INSERT INTO #STATES_CITIES
VALUES (‘FLORIDA’,’MIAMI’,10000),
(‘FLORIDA’,’ORLANDO’,15000),
(‘CALIFORNIA’,’LOS ANGELES’,20000),
(‘ILLINOIS’,’CHICAGO’,30000),
(‘WASHINGTON’,’SEATTLE’,NULL);
CREATE TABLE #STATES(STATE NVARCHAR(20))
INSERT INTO #STATES
VALUES (‘ILLINOIS’),
(‘WASHINGTON’),
(‘CALIFORNIA’),
(‘NEW YORK’),
(‘FLORIDA’);
CREATE TABLE #NUMBER(ONE INT)
INSERT INTO #NUMBER
VALUES (1);
SELECT * FROM #STATES_CITIES
SELECT * FROM #STATES
SELECT * FROM #NUMBER
SELECT * FROM #STATES_CITIES SC INNER JOIN #STATES S
ON SC.STATE = S.STATE
WHERE SC.POPULATION IS NOT NULL
SELECT * FROM #STATES_CITIES SC INNER JOIN #STATES S
ON SC.STATE = S.STATE, #NUMBER N
WHERE SC.POPULATION IS NOT NULL OR N.ONE IS NOT NULL
DROP TABLE #STATES_CITIES
DROP TABLE #STATES
DROP TABLE #NUMBER
Here are the tsql versions of the two pivot tables:
SELECT * FROM #STATES_CITIES SC, #NUMBER N
WHERE SC.POPULATION IS NOT NULL OR N.ONE IS NOT NULL
SELECT
S.STATE
,STC.CITY
,CASE WHEN S.STATE = STC.STATE THEN STC.POPULATION END POPULATION
,N.ONE
FROM #STATES S,
(SELECT SC.STATE, SC.CITY, SC.POPULATION FROM #STATES_CITIES SC INNER JOIN #STATES S ON SC.STATE = S.STATE) STC,
#NUMBER N
ORDER BY S.STATE