Two methods to calculate Person-Hours - Power Pivot

The point of this post is not really calculating Person-Hours. I mean, we will show some measures on that, and if you got here by searching for that, and I helped you… ROCK ON! I’m happy to help.

But… I kind of thing there is value in just… talking about any old measure. Getting some reps in. Getting’ your learn on! Though, in this case, it will have a performance slant to the discussion, which is becoming a theme on this blog.

“People need to be reminded more often than they need to be instructed.”
– Samuel Johnson

A Simple Person-Hours Model:

Okay, models don’t get a lot simpler than this and still have the luxury of being a called a model. We have two tables, and they aren’t even related.

Why? Well, I suppose we could try to link either StartDate or EndDate on Calendar[Date], but typically I don’t really want to filter out Volunteers that started or ended on a particular day.

Rather, we are interested in saying “for any given period of time, how many people were working?” 3 people working 1 day is 3 Person-Days. 1 person working 3 days… still 3 person days. It’s kind of like a Man-Hour (or Man-Month)… but less… well… sexist. Smile

First Measure: via Calendar

I think this measure feels a bit more intuitive/natural and is probably what I would naively write first.

=SUMX (
    Calendar,
CALCULATE (
        COUNTROWS ( Volunteers ),
FILTER (
            Volunteers,
Calendar[Date] >= Volunteers[StartDate] &&
Calendar[Date] <= Volunteers[EndDate]
     )
)
)

Hopefully that doesn’t look too intimidating. This just says, “walk every day in the calendar, and for each day, count how many rows are active in the Volunteers table, assuming we filter to just those Volunteers that are active on that day”.

I have to admit, when this problem first came to me, the COUNTROWS was a DISTINCTCOUNT on the Name (errrr, well, probably Id) column. For the life of me, I can’t remember why – I suspect it was bad data (one person was in two places at once?), and it easier to use a DISTINCTCOUNT than to clean up the data. Anyway, THAT version was brutal on performance!

Anyway, if we are just straight mathin’ here … we are talking about { Rows in Calendar in Current Context } * { Rows in Volunteer } total iterations. (Regardless of COUNTROWS vs DISTINCTCOUNT). If you think about the grand total cell (where you are going to look at the most rows), the outer loop, you are getting no interesting optimization/filtering there, you gotta walk every day in the table… I really hope you trimmed your calendar table.

The inner loop is the FILTER( ) over the Volunteers table. While I could imagine some optimization there (quickly filtering out volunteers based on dates), I am skeptical. I mean, you better hope you don’t have a ton of unique start/end dates, but it doesn’t help that I am filtering by two columns. Generally a no-no for perf.

For the DISTINCTCOUNT version, each row in the Calendar table has to build out some memory store those distinct values… and keep track of that while walking each row in the Volunteers table. Generally, a DISTINCTCOUNT is not something to take lightly.

Second Measure: via Volunteers

Before we dig into another way to look at this measure, let me let you in on a Tiny Lizard secret. Don’t solve problems you don’t have. One of the nice things about Power Pivot is that it is very forgiving. It is so stinkin’ fast that you can happily write measures without really thinking about performance. Most problems boil down to the these performance gotchas. However, sometimes you have to put on your big boy pants… and this was one of those times.

VolunteerDays2 :=
SUMX (
FILTER (
Volunteers,
Volunteers[StartDate] <= MAX ( Calendar[Date] )
&& Volunteers[EndDate] >= MIN ( Calendar[Date] )
),
INT ( 1
+ IF (
MAX ( Calendar[Date] ) < Volunteers[EndDate],
MAX ( Calendar[Date] ),
Volunteers[EndDate]
)
– IF (
MIN ( Calendar[Date] ) > Volunteers[StartDate],
MIN ( Calendar[Date] ),
Volunteers[StartDate]
)
)
)

Oh, Simmer Down. It’s not that bad. What you would LIKE to write is a simple…

=SUMX(Volunteers, 1 + Volunteers[EndDate] – Volunteers[StartDate])

and if you only care about the final grand total GO NUTS! Smile “For each volunteer, how many days did they work?”.

But likely you are going to want to drop some Calendar rows into your pivot and see this value change over time… so you need to do a bit more work (it’s not like we have any relationships to help us).

The first param to the SUMX uses a FILTER() that is more or less an optimization. “Only look at volunteers that were active at some point in this date window”. It’s not totally “just an optimization” because if you left it out… you will end up SUM’ing values for Volunteers totally outside this Calendar window, and that’s bad. Trust me, I know first hand.

Then the ugly IF() statements are to protect against a calendar window ending in the middle of a volunteer being active. (and similar for starting). So, if the cell in our pivot table is only looking at April, and this Volunteer worked from Apr 15 to May 15… we don’t want to count any days in May for this cell. That is all those IF()’s are for.

While this measure obviously looks more complicated, it is considerably faster. Where before we had roughly { Calendar Rows In Context } * { Volunteer Rows } iterations… here we have something smaller… { Volunteer Rows That are Active In Context }. There is some extra processing to figure out which are “Active”, but you don’t get that nasty multiplicative effect… since we are never walking all the rows in the Calendar.

Wrap Up

Could this be faster? I would be SHOCKED if it couldn’t be optimized further. The Ninja’s would be pulling out CALCULATETABLE instead of FILTER, and throwing around SUMMARIZE and ADDCOLUMNS and other fancy tricks. But that wasn’t really my point.

Other than just raw value of reading somebody else’s measures as a stepping stone to writing your own, your main take away is that in the rare cases that you do have a performance issues, and you have already walked “the gotchas” list… your next step is to think about the problem in a completely different way.

In our example, instead of thinking “for each day … how many people are working?” … we just flipped that around to “for each volunteer, how many days did they work?”… and we were able to get the performance into the “good enough” range without much effort.

Which is the way Tiny Lizard likes it.

About Scott Senkeresty

Scott Senkeresty here… Founder and Owner of Tiny Lizard. Yes, I have done some stuff: A Master’s Degree in Computer Science from California Polytechnic State University in beautiful San Luis Obispo, California. Over twelve years developing software for Microsoft, across Office, Windows and Anti-Malware teams… Read More >>

The streak is alive! – August 10, 2017
DAX KEEPFILTERS Plus Bonus Complaining! – July 20, 2017
VAR: The best thing to happen to DAX since CALCULATE() – July 11, 2017
Review: Analyzing Data with Power BI and Power Pivot for Excel – June 20, 2017
Power BI Date Table – June 2, 2017

View All Posts

Comments

Chris Webb says

October 6, 2014 at 7:02 am

This looks very much like the ‘events-in-progress’ problem I’ve blogged about several times:
http://cwebbbi.wordpress.com/2010/01/07/solving-the-%E2%80%98events-in-progress%E2%80%99-problem-in-dax/
http://cwebbbi.wordpress.com/2011/11/10/solving-the-events-in-progress-problem-in-dax-v2-0/
http://cwebbbi.wordpress.com/2013/06/13/a-new-events-in-progress-dax-pattern/

The second post shows my favourite way of solving the problem, but the third post (and Alberto’s work on optimising it even further) is where things start to get fun 🙂

scottsen says

October 6, 2014 at 9:04 am

Thanks Chris! I remember seeing that madness, but couldn’t remember what terms to search for 🙂