When I give a Power Pivot training, this is one of the harder concepts to explain. It’s not something easy for a complete Power Pivot beginner to understand… but there is a general pattern you can use anyway, without completely understanding it. Good thing, because it a SUPER useful function. By the end of this post, you will better understand Row Context, the EARLIER() function… and have the absolute worst stock analysis ever.
Before we really dig into this bad boy, the first thing we have to understand is that EARLIER() is all about Row Context. (And it’s not All About That Bass, which sadly, just jumped into my head). We talk about Row Context in this previous post, but that post is generally a bit advanced and scary
By far the most common place to have a row context… is when writing a calculated column (as opposed, to a calculated field, aka a measure, the differences are discussed in this post).
Let’s find some data… in my favorite way possible. Using Power Query to search the interwebs! Ummm… ok, uh… let’s go with “Microsoft”. Okay, I got something here: good old msft stock data.
At left we see a snip of my table once I pull it into Power Pivot. We can’t decide which column of pricing data is more important to us… so, we decide we will just average together the high and low, and pretend that is useful. So we add a new calculated column.
Avg = (MSFT[High]+MSFT[Low])/2
The important thing to realize is that the values for this calculated column are a) only updated when you refresh your data (no amount of slicing and filtering will impact that value) and b) is evaluated one row at a time. Which, makes intuitive sense… you would probably create that formula without really even thinking about that. But it is very different from writing a measure / calculated field… when you are evaluating a whole bunch of rows. This evaluation of one row at a time means we have “a row context”. Meaning, simply, that when the evaluation is taking place the engine knows exactly which row it is evaluating. I am taking the [High] of the row in the current context, and adding it with [Low] of the row… in the current context.
Let’s write another table that is a bit more useful… a way to divide the volume into buckets.
=if ([Volume] < 45000000, “Low”, IF ([Volume] > 70000000, “High”, “Medium”))
This is a common and incredibly useful technique, so remember it We walked our rows and, for each row, checked the [Volume] column, and if it was more then 70 million rows, we consider that a High volume day. What is great about this is we can now go drop the Low/Med/High values onto rows of a pivot table or chart and evaluate if those buckets tend to behave differently.
Okay, with that background on row context and calculated columns behind us… let’s jump into the harder part…
We are going to kind of … guess our way to the need for this function. Let’s say that we decide we think these volume buckets are the key to our investing genius and a fat fat retirement fund. We decide that we want to know the maximum [Avg] price… per bucket. Ignore that you would probably do that in a Measure First, realize that if you throw down a MAX() into a calculated column… it will evaluate over the entire table. The MAX() function doesn’t care if you have a row context… it is expecting a filter context… and doesn’t find one and so, parties on the whole table.
The other weirdness is that if you call CALCULATE() in a calculated column it will create a filter context, but it will be set to just the current row. So, =CALCULATE(MAX(MSFT[Avg])) will…. just return the [Avg] of the current row. (And remember, without the calculate, it will take the max of the entire table).
But, we want neither of those behaviors. We want the max of everything “in our bucket”. Right, I am walking row by row (eg: we have a row context) and saying “what is the max value… constrained to just the buck I am in?”
=CALCULATE(MAX(MSFT[Avg]), FILTER(ALL(MSFT), MSFT[Volume Bucket] = ???))
what goes in place of the ??? part, though?
I mean, based on the title of the article you can probably guess, but let’s look a bit deeper than that
Realize that FILTER() also creates a row context. Just like evaluating a calculated column creates a row context. So does SUMX, and some other fun functions. It makes sense, too. FILTER() is saying “iterate over things rows…and include them or not, depending on this condition…” Well, to iterate over rows and make the filtering decision, you need a row context.
So, you are walking 1 row at a time from the calculated column madness… then for each of those rows you are walking the table again 1 row at a time to evaluate the FILTER() clause. The blue column is representing the evaluation of calculated column rows. Having already evaluated Rows 1-3, we are currently evaluating Row 4. So, we start evaluating Row 4, and see that the calculation calls FILTER(ALL(MSFT)…) which means, we have to walk every row. Again. That is what the weird yellowy-orange color represents. For Blue 4, I am evaluating Yellow1, Yellow 2, etc… filtering rows in or out before I finally apply my MAX() for Blue 4.
For programming nerds, this is totally like nested for loops…. for i = 1 to 10, for j = 1 to 10.
Okay, so, we have two row contexts. What does that have to do with the ??? stuff?
=CALCULATE(MAX(MSFT[Avg]), FILTER(ALL(MSFT), MSFT[Volume Bucket] = EARLIER(MSFT[Volume Bucket])))
The EARLIER function is used to say “It is lovely that I have a yellowy-orange row context, but I want to use the blue one! The previous row context. The earlier one!”
Pretty much every time you are writing a calculated column and want to say “I want do some maths on values of this table… that are kinda like me…” you will have a calculated column formula that looks very much like the one above. It is a general pattern worth remembering.
That is not to say you can’t make it more complex. For example, say you want to calculate the maximum value of the stock in the time window from 6 days ago until today. You plan to use this in a calculation of “is todays value a new high in the last 60 days?” (which I have to admit, almost sounds useful!).
MSFT[Date] <= EARLIER(MSFT[Date]) && MSFT[Date] > EARLIER(MSFT[Date]) - 60
This still follows the same pattern, but we use EARLIER() twice so that we can say “is the date I am looking at, while evaluating rows from FILTER… today or before and greater than 60 days ago? If so, include that rows and take the average of all those rows. Kinda Cool!
Wrap It Up!
As crazy as this sounds… the EARLIER function is sometimes used… not in a calculated column. Just… not very often. All that is required is this “nested row context” … say a SUMX (which evaluates some expression against every row in a table, then adds the together) is passed the result of a FILTER() instead of just a table. Inside that FILTER() you could use EARLIER(). 1 row context from SUMX, and a nested row context from the FILTER() operator. It’s just… I have only actually needed to do this a small handful of times. Maybe I am just a simple fella
Okay, so there you have our 3rd post on power pivot fundamentals. As always, feel free to hit me up with questions in the comments, or even recommendation for future posts!
- The streak is alive! – August 10, 2017
- DAX KEEPFILTERS Plus Bonus Complaining! – July 20, 2017
- VAR: The best thing to happen to DAX since CALCULATE() – July 11, 2017
- Review: Analyzing Data with Power BI and Power Pivot for Excel – June 20, 2017
- Power BI Date Table – June 2, 2017
Scott Senkeresty here… Founder and Owner of Tiny Lizard. Yes, I have done some stuff: A Master’s Degree in Computer Science from California Polytechnic State University in beautiful San Luis Obispo, California. Over twelve years developing software for Microsoft, across Office, Windows and Anti-Malware teams… Read More >>