Tiny Lizard

Power Pivot Consulting and Training

  • About
  • Blog
  • Power Pivot
  • Training
  • Contact

Power Pivot Hierarchical Data

November 17, 2014 By Scott Senkeresty 6 Comments

Edit:  Before or after reading my version, go read what Marco and Alberto have to say on the subject.  I find it confusing… so two views are probably better than one: http://www.sqlbi.com/articles/autoexist-and-normalization/

Hierarchical.  Just saying that makes me feel smart; though, I’m not super sure I spelled it right.

Here, I am not talking about the feature that let’s you group a set of columns such that you can drop them all into a pivot table as one set.  So far as I can tell, that is just some UI candy.

I am talking about any situation where your data actually is in a hierarchy.  eg: Location (Country –> State –> City) or Products (Category –> SubCategory), etc.

Frequently, I start one of the blog entries… realize I don’t know what the hell I am talking about, then go figure it out, and pretend I always knew it.  This time is different, in that I am skipping everything but the first step…

Simple Sample

So, I searched around for city populations in the States and found this one: http://www.infoplease.com/ipa/A0763098.html and simply dropped that URL into Power imageQuery to pull in the data… which is still fun and neato, if you ask me.   It’s the populations for the 50 largest cities in the U.S.

I wrote a simple measure because I couldn’t help myself:    Total Pop := SUM(Population[Estimate])

Drag State and City onto rows, and drop Total Pop onto values and we get the simple pivot table, which looks exactly like we expected.image  And might I just say that NY, NY is REALLY populated Smile

Now, here is my question.  We know that Excel will magically hide rows that result in all BLANK() values… so, did it try to evaluate Portland, Texas? (And while some may think I just mean “Austin”… no silly, I mean… is it trying to evaluate every city with every state!?)

Under Pivot Table Options, you can go flip the checkbox that hides empty rows.  Or, you can do something goofy, like I did… and create this AWESOME MEASURE:   [One] := 1   Which ya know… returns a 1.  Always.  So, if there was an evaluation AT ALL, it will show up.  And as you can see at right… we are only seeing evaluations that “make sense”.   Hurray!  Efficient!

Hierarchies and Lookup Tables

imageOkay, minor change.  We are going to create a States lookup table.  I’m not even going to bother doing anything interesting with it… just match up state name to state name and call it a day.  And we are going to see if it has any impact on our lives when we use that lookup column in our pivot table.

Any guesses?

First, let’s just recognize that the results will be fine.  And the default behavior of the pivot table will show what you expect.  But when we include our cute [One] measure to force all evaluated rows to show up… indeed, we get different results than our previous pivot table.

We can see that it tried to evaluate Dallas, California… maybe the worst idea for a city… EVAR?

image

Does any of this matter?

Well, as per always… follow Tiny Lizard Rule #1 “Don’t Solve Problems You Don’t Have”.  Some day I hope to have a Rule #2.  Anyway, if you don’t have issues… don’t stress it… I really don’t think this scenario would be super common.  But… I have stumbled into it a few times, so… it happens.

Why does it matter?

Here I paraphrase Master Collie: “Computers Math Like a Mother!” – which is to say, by the time the processor is just adding numbers up, it does that real real (real) fast.  But evaluating filter context to find which numbers to add?  That… that can take awhile.  And here, we only had 50 cities across 30 cities, so 30*50=1500 evaluations of filter context (instead of our original 50!).  But you can easily imagine this number become REAL large, REAL fast… Country*Region*City… by the time you have evaluated the great city of New Orleans from the Emilia-Romagna region of Canada, … you might be having some perf issues.

Conclusions and Such

I think just… be vaguely aware of this issue.  It feels solvable with what my buddy Tyler calls 4D… “Don’t Do Dat Den”.  I can imagine it might call for some careful data shaping… but in my experience hierarchies like this tend to come from the same table anyway.

You will (of course) get the same impact by putting other columns on the pivot table… Geography*YearMonth*SalesPerson is evaluating EVERY geography, yearmonth and salesperson combination… even if Joe only operated in Rome in 2011… because that is just not “known” until the evaluation happens.

That said, I do not feel I have completely wrapped my head around this issue.  As always if somebody wants to crisp this up, hit me in the comments.  I am sure somebody can relate it back to some SQL join madness to make it clear.

Certainly it doesn’t “feel” quite right to me… as I always think of filters “flowing” into the fact table from a lookup as being identical to applying the filter directly to the fact table… but clearly that is not quite true.

WordPress Migration
Converting Row Context into Filter Context

Filed Under: Power Pivot

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 278 other subscribers

Comments

  1. Matt Allington says

    November 18, 2014 at 12:58 am

    Now I could be wrong here, but it looks to me like you have a Cartesian Product of the State Lookup table and the original data table that has been caused (deliberately) by the measure [one].

    Now given the filter engine operates before the formula engine, it would seem to me that in the absence of the rogue measure [one], that the filter engine will successfully exclude the need to iterate the invalid state/city combos because they are filtered out in the absence of a measure that forces them to be considered. So in your case these extra invalid combos are being executed because you are telling it to do it, but I am pretty confident that it is not doing this in the background when you don’t force it.

    That’s my view anyway. :-). Happy to be enlightened further.

    Reply
  2. scottsen says

    November 18, 2014 at 9:53 am

    That’s not what is happening. Though… I don’t claim to know exactly what is 🙂

    Notice in the upper right pivot table, there is the “1” measure, but there is no Dallas, Calif value… but in the lower left pivot table (once I moved the state to the lookup), there is. But have the “1” measure, but the lookup version is clearly acting different.

    I see the same thing 2 ways:
    1) set the pivot table to show rows if all values are blank
    2) before I put ANY values in the pivot table… excel shows all row values. There again, it only shows the bogus Dallas, Calif… when dragging down from the lookup.

    So, I am confident this is somehow… different. My belief that it can have a large impact on performance is a good 88%. I’m sure Marco will come straighten us out eventually 🙂

    Reply
  3. Bobby says

    November 19, 2014 at 2:58 pm

    This article may explain what you have observed
    http://www.sqlbi.com/articles/autoexist-and-normalization/

    Reply
    • scottsen says

      November 19, 2014 at 3:34 pm

      Good gravy! Those crazy Italians are always ON their game. I glanced at the top of the article, saw the word “MDX”, and totally dismissed it! Ooops! Thanks for sending me back.

      Reply
  4. David Canales says

    April 14, 2015 at 4:26 am

    CREATE TABLE #STATES_CITIES(STATE NVARCHAR(20), CITY NVARCHAR(20), POPULATION INT)
    INSERT INTO #STATES_CITIES
    VALUES (‘FLORIDA’,’MIAMI’,10000),
    (‘FLORIDA’,’ORLANDO’,15000),
    (‘CALIFORNIA’,’LOS ANGELES’,20000),
    (‘ILLINOIS’,’CHICAGO’,30000),
    (‘WASHINGTON’,’SEATTLE’,NULL);

    CREATE TABLE #STATES(STATE NVARCHAR(20))
    INSERT INTO #STATES
    VALUES (‘ILLINOIS’),
    (‘WASHINGTON’),
    (‘CALIFORNIA’),
    (‘NEW YORK’),
    (‘FLORIDA’);

    CREATE TABLE #NUMBER(ONE INT)
    INSERT INTO #NUMBER
    VALUES (1);

    SELECT * FROM #STATES_CITIES
    SELECT * FROM #STATES
    SELECT * FROM #NUMBER

    SELECT * FROM #STATES_CITIES SC INNER JOIN #STATES S
    ON SC.STATE = S.STATE
    WHERE SC.POPULATION IS NOT NULL

    SELECT * FROM #STATES_CITIES SC INNER JOIN #STATES S
    ON SC.STATE = S.STATE, #NUMBER N
    WHERE SC.POPULATION IS NOT NULL OR N.ONE IS NOT NULL

    DROP TABLE #STATES_CITIES
    DROP TABLE #STATES
    DROP TABLE #NUMBER

    Reply
  5. David Canales says

    April 14, 2015 at 6:53 am

    Here are the tsql versions of the two pivot tables:

    SELECT * FROM #STATES_CITIES SC, #NUMBER N
    WHERE SC.POPULATION IS NOT NULL OR N.ONE IS NOT NULL

    SELECT
    S.STATE
    ,STC.CITY
    ,CASE WHEN S.STATE = STC.STATE THEN STC.POPULATION END POPULATION
    ,N.ONE
    FROM #STATES S,
    (SELECT SC.STATE, SC.CITY, SC.POPULATION FROM #STATES_CITIES SC INNER JOIN #STATES S ON SC.STATE = S.STATE) STC,
    #NUMBER N
    ORDER BY S.STATE

    Reply

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Blog Posts

  • The streak is alive! August 10, 2017
  • DAX KEEPFILTERS Plus Bonus Complaining! July 20, 2017
  • VAR: The best thing to happen to DAX since CALCULATE() July 11, 2017
  • Review: Analyzing Data with Power BI and Power Pivot for Excel June 20, 2017
  • Power BI Date Table June 2, 2017

Follow on Twitter

My Tweets

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 278 other subscribers

Copyright © 2023 Tiny Lizard