Thanks to SimonNU at the Mr Excel Power BI forums for this question:
Would a measure process slower if there are many embedded measures inside of it? A basic example for illustrative purposes: which would be a quicker measure for [Bikes]?

1. Generic Amount:=SUM([Col a])
   Bikes:=CALCULATE([Generic Amount],[Col b] = "Bikes")

2. Bikes:=CALCULATE(SUM([Col a]),[Col b] = "Bikes")
I admit I have had the same question, but so far I had properly suppressed it. But… fine, since you asked, let’s investigate!
Our Test Measures
First, the “inline” version:

CombinedTotal:=CALCULATE(SUM(Table1[Blue]) + SUM(Table1[Red]), Table1[Green] < 50)

And then the “nested” version, built from two base measures:

BlueTotal:=SUM(Table1[Blue])
RedTotal:=SUM(Table1[Red])
CombinedTotal:=CALCULATE([BlueTotal] + [RedTotal], Table1[Green] < 50)
I have a goofy macro that spits out random numbers into a table of Red, Green and Blue values, so that is what we are using. Close enough to Simon’s question for me. “Gimme the sum of all the blue values and red values… when green is under 50.”
Theoretically, we expect not just the same answer, but the same performance. Or at least… I do. But how are we going to validate this? Well… we are going to get more crazy than usual. Way more.
Enabling Power Pivot Tracing
If you hit the Power Pivot ribbon on your Excel window, you will see a Settings button… and if you click that sucker you will be able to enable tracing. If you are a SQL-type-o-person, this is just like a trace file created by your normal old SQL Profiler. In fact… while I haven’t exactly researched it… I don’t know of any way to open and view the .trc file other than to use the SQL Profiler tool.
You kind of have to be a sick individual to want to stare at these files, but you can find some really useful tidbits. I’m feeling a bit too lazy tonight to go look it up, but The Italians have great videos on performance where they use the traces to show when something is being evaluated in the storage engine (stupid crazy fast) versus the formula engine (mostly kinda fast). It’s also great for showing you the actual number of milliseconds it takes to execute a query if you are tweaking for every drop of performance.
In our case, we are hoping to find out how our two versions of the measure get translated into actual queries down in the bowels of the engine… and if we see they are the same, we will feel more or less good about nesting measures.
What Do We Find?
Well, I don’t know that we found exactly what I expected, which is cool! Here is what the trace showed for our measure:
SELECT
[Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22].[Green]
FROM [Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22]
WHERE
(COALESCE(PFCAST( [Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22].[Green] AS INT )) < COALESCE(50));
SELECT
SUM([Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22].[Blue])
FROM [Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22]
WHERE
[Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22].[Green] IN (29, 37, 0, 34, 35, 44);
So… I dunno. What to say… ummm, first, I thought it was pretty cool just to “see” the DAX get translated into SQL queries that most of us are more familiar with. It’s just neat to me. I also didn’t really expect to see this sorta… multi-phased approach where the engine grabs all the values for the filter (Green < 50) first… then sends those actual values into the next query. Then again, we know the storage engine is a lean mean data retrieving machine… probably doesn’t do JOINs natively.
Obviously, there is another SUM over the red column that looks just like the Blue, and those results get added together somewhere for the final answer.
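Presumably it is just the Blue query with the [Red] column swapped in… something like this (my reconstruction, not copied from the actual trace):

SELECT
SUM([Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22].[Red])
FROM [Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22]
WHERE
[Table1-d0b70f3c-b04e-4f50-8e49-017d65875f22].[Green] IN (29, 37, 0, 34, 35, 44);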
I am not going to paste the trace results from the other set of measures… because that would look REAL redundant. They are completely, 100% identical.
Is That Always True?
I don’t know. I do have a few thoughts on the matter, but as so often comes up around here… first, it’s time to repeat Tiny Lizard Rule #1 (and yes, #2 is still super elusive): Don’t Solve Problems You Don’t Have.
This type of nesting of measures is incredibly powerful. It is easy to get lost in the details, get excited about tweaking every bit of performance, and completely lose the forest for the trees. In this case, the forest is that you should always try to re-use measures like this.
Hey, less typing! But more importantly, I want to define what [Total Sales] means in exactly one place. I will then re-use that in my [Total Sales – YTD], [Total Sales – PY] and probably re-use one of those in a [Total Sales – PY – YTD]. Then, when I realize I need to change the way I handle “Buy One, Get One” information in my [Total Sales], all the other measures magically get the same benefit. Who doesn’t love magic?
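Just as a sketch… assuming a hypothetical Sales[Amount] column and a Calendar date table (I’m making these names up, we don’t have a real model here)… that family of measures might look like:

[Total Sales] := SUM(Sales[Amount])
[Total Sales - YTD] := CALCULATE([Total Sales], DATESYTD('Calendar'[Date]))
[Total Sales - PY] := CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Calendar'[Date]))
[Total Sales - PY - YTD] := CALCULATE([Total Sales - YTD], SAMEPERIODLASTYEAR('Calendar'[Date]))

Change the definition of [Total Sales], and the other three just pick it up.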
And we have certainly seen that in this case there was no negative impact on performance… the query plan was identical. That said, I have no idea how query plans are formulated in xVelocity. In SQL land there are probably whole books about how cardinality estimates and statistics are used to estimate the best order of operations and the type of hash algorithm to use for the joins, etc. Here… I am sure it is much simpler, but I don’t know what is published about the algorithm (certainly not a lot). Could a nested measure cause a different plan to be picked? Maybe. But again, unless you see a problem… just don’t worry about it.
Now, that said… sometimes… ignoring performance…
There can be problems
Not with performance. I have actually never seen or noticed that. I mean, like I said, maybe. But. There are at least two things that make me vaguely nervous about nesting measures.
- When you have a row context, a measure reference performs an implicit CALCULATE()… which can completely change the answer.
- I can imagine a measure that references another measure… and both modify the filter context… and the interaction could be confusing.
For #1, maybe just go re-read this guy: http://tinylizard.com/converting-row-context-into-filter-context/
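The super short version, using a hypothetical little Sales table: inside an iterator like SUMX(), a measure reference gets wrapped in an implicit CALCULATE(), which converts the row context into filter context… and that can totally change the answer:

[Total Price] := SUM(Sales[Price])

-- No implicit CALCULATE() here: SUM() ignores the row context, so
-- every row of the iterator contributes the entire grand total
[Raw Expression] := SUMX(Sales, SUM(Sales[Price]))

-- Measure reference: the implicit CALCULATE() turns each row into
-- filter context, so each row contributes just its own price
[Measure Reference] := SUMX(Sales, [Total Price])

Same-looking formulas, wildly different numbers.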
For #2… suppose I have something like:
[All Sales] := CALCULATE(SUM(Sales[Price]), ALL(Sales))
[WA Sales] := CALCULATE([All Sales], Sales[State] = "WA")
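Remember that a measure reference is (roughly) just the measure’s formula wrapped in a CALCULATE()… so that second measure expands to something like:

[WA Sales] := CALCULATE(
    CALCULATE(SUM(Sales[Price]), ALL(Sales)),
    Sales[State] = "WA"
)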
So, my [WA Sales] is trying to make the filter for Sales[State] equal to WA at the “same time” that the [All Sales] measure is trying to just clear the filter completely on all columns of the Sales table. Who wins? I don’t know. Nobody does. Well, fine. Somebody does… probably like 3 people. Maybe 4. It’s probably not me or you.
What I do is just… not worry about it.
Write my measure, be amazed it worked as expected, and move on. That is what happens 97 times out of 99. Yes, every once in a while something freaky will happen, and I will grumble and re-write the measure to deal with one of the above problems, but mostly… I just don’t worry about it. It’s amazing how well that serves me.
Mark says
I have seen (and have the traces to show it) nesting measures cause MASSIVELY deleterious performance issues.
A factor of 10 to 20, in fact. It only happens with a particular kind of point-in-time snapshot where the query plan is pretty horrendous to start off with, but it happens.