I believe this is the first time I have written most of a blog post, then nuked it to start over. It was going “okay” I guess… attempting to show some patterns around (the fancy phrase) “semi-additive measures”. I was feeling pretty smart, but… I was feeling more and more lost as I wrote and meandered. I realized that I should probably focus more on the LASTDATE() function, which brings our boring story to present. This is also the first post with no images. So, at least we got that going for us.
What does LASTDATE Do?
Dude. Come on. It returns the last date from a column of date values — what do you think it does? 🙂 In this way the LASTDATE() function is incredibly similar to the MAX() function. So much so that one kinda might wonder why it exists, which is why I’ve pulled out this quote from MSDN docs:
Technically, the return value is a table that contains a single column and single value. Therefore, this function can be used as an argument to any function that requires a table in its arguments. Also, the returned value can be used whenever a date value is required.
So, that last part – “can be used whenever a date value is required” – that part is exactly like MAX(). MAX() returns a scalar value… and so can LASTDATE(). But unlike MAX()… LASTDATE() is actually returning a table, and this can be handy.
Here’s where that term “semi-additive measure” comes in. It sounds fancy, but it’s really just a way to work around scenarios in which it doesn’t make sense to take the numbers in aggregate. Here’s a typical example of a semi-additive measure:
Say you have inventory data. If I have 5 balls and 7 dolls at my toy store, I can obviously add those together and say I have 12 in aggregate. But if instead I had 10 toys in inventory yesterday and today I have 11… I can’t add those together to say I have 21. I need a semi-additive measure to show total inventory amount.
So, we can’t do:
Total Inventory := SUM(Inventory[Count]) As that would measure across time, which is bad (21 toys instead of 11). Okay, so… we only want to grab the inventory for the last date. How about:
Total Inventory := CALCULATE(SUM(Inventory[Count]), Calendar[Date] = MAX(Calendar[Date]))
Well, that doesn’t work. Some hard to read error that means “dude, that expression you passed to CALCULATE is too complex” pops up. Right, I knew that:
Total Inventory := CALCULATE(SUM(Inventory[Count]), FILTER(Calendar, Calendar[Date] = MAX(Calendar[Date])))
Well, that might kinda work, except… what if I didn’t have an inventory measurement on the exact date returned by the MAX(Calendar[Date])!? I think we better key off the Inventory table instead:
Total Inventory := CALCULATE(SUM(Inventory[Count]), FILTER(Inventory, Inventory[Date] = MAX(Inventory[Date])))
Okay, I was way way way (etc) too lazy to actually test this measure out, but I feel pretty good about it.
Let’s take one moment to understand that the FILTER() function returns a table. Total Inventory is the table of just the rows in the Inventory table that matched the expression in the 2nd parameter of the FILTER function (Inventory[Date] = MAX(Inventory[Date]). I’m not great at figuring out if that understanding is “useful” to most readers; I think it’s almost nicer just to read it like it says: “Oh, I am going to filter the Inventory table based on some condition”. But as a practical matter, and to call things like they are, FILTER() does indeed return a table.
Now, scroll back up to that quote from MSDN and read that first sentence. “Something something return value is a table something something”. So, LASTDATE() returns a table just like FILTER() returns a table… um, can we then simplify the above measure, maybe?
Total Inventory := CALCULATE(SUM(Inventory[Count]), LASTDATE(Inventory[Date]))
Personally, I like it. I think it “reads well” (calculates the sum of the inventory count, on the last date in the current context). It’s slightly weird at the start, since you are used to only passing simple expressions to CALCULATE like Calendar[Year] = 2006 or using ALL/FILTER. But this is certainly easier to read, less typing, and maybe it’s faster (like all performance related questions – you are never gonna know until you try it with YOUR data).
Events in Progress
Now let’s think about LASTDATE() vs. MAX() from a performance perspective. Say you are a real estate investor. You want a fancy chart that shows a trend of the number of properties you have leased out, across time. But what does it mean when you say “How many properties were leased in April?”. One reasonable way to look at this might be “we will look at the leases, based on the last day of the month”. Using the LASTDATE() function, the DAX may look something like this:
Properties[LeaseStartDate] <= VALUES(Calendar[Date]) ,
Properties[LeaseEndDate] >= VALUES(Calendar[Date])
If we were to write the same thing using MAX, it would look something like this:
Leased Properties := CALCULATE(
Properties[LeaseStartDate] <= MAX(Calendar[Date]) &&
Properties[LeaseEndDate] >= MAX(Calendar[Date])
We call FILTER() ‘cuz we want to filter the properties… and because we are using MAX() we cross that magic threshold into “too complicated an expression, must use FILTER”. And FILTER doesn’t take multiple arguments the way CALCULATE does, so we use the && to combine our conditions.
Now, my claim is that the first version is faster. I mean… at least it looks faster, which sounds pretty important to me 😉 That first version might look slightly intimidating because it has *2* calls to calculate, nested inside each other. The outer CALCULATE constrains us down to just one date (the last one in the current context). Once we have that… we no longer need to use MAX in the inner CALCULATE call, because we know there is only one value. That means we can use the VALUES function to return the scalar value… all while not crossing the magic threshold into “too complicated an expression, must use FILTER”.
I don’t know this is true, but I’m sticking w/ this theory anyway – that if you can use the simple version of CALCULATE… it will go “darn fast.” By using this nested calculate and LASTDATE madness, we put ourselves in a situation where we could use the simple and fast version.
Some light testing between these two measures shows the LASTDATE version was 5x faster at summarizing results across all months in my calendar. 750ms instead of 3.5s.
There are… complications. Complicated complications. These complications are typically mostly ignorable, IMO… as long as you stick w/ my theory of “if the results don’t look like what you expect, try adding a CALCULATE or VALUES” function. But, if you want to dig in… go check w/ the Italians.