Rumors of my death were greatly exaggerated. Let’s get back to work.
Okay, we want a map of “The States” where each state is colored by some metric. In our case, it is the Gross Margin we made off sales for that state. The style of map is called a Choropleth (apparently – I had no idea). This is pretty easy in Power BI!
- Insert a “Filled Map” visual
- Click on your “state” column.
- Click on your GM$ measure.
- There is no step 4. Unless you want to tweak the colors a bit.
I got something that looks like the map on the right, which is pretty good for 3 or 4 clicks! We can see that California and Colorado are making us the most margin, and that other states are typically gray. Or white. Or Alberta, Canada. Wait, what?
I gotta be honest – for me this is a GOOD example of the built in maps. It sorta goes down from there, in my experience. Trying to convince Bing to correctly map your geography onto a map is pretty frustrating and I don’t find the UI or docs terribly helpful.
Get Your R On!
Power BI recently added the ability to have R create visuals for you. Minor caveat is that it isn’t working on powerbi.com yet, but… ya, I have seen it working there already – so it’s coming very soon. Now, I just read R For Dummies over the weekend, so I am like… the opposite of an expert. But… I was able to muddle my way through.
There are probably 23,589,258 ways to do this in R… at least. I went with the standard map function out of the maps library. There are probably easier ways, and definitely fancier ways, but I didn’t want to get lost in the bajillion R packages.
Using literally nothing but:
```r
map("state")
```

you get the map at right. Which looks pretty nice, but seems to be missing a little something. Here is the full script I ended up using:
```r
library(maps)

# rename columns for fun, and find the full state name from the abbreviated state name
names(dataset) <- c('stateAbbr', 'value')
dataset$state <- tolower(state.name[match(dataset$stateAbbr, state.abb)])

# we could use max(dataset$value) but california is super high. Hard code to $300m.
cap <- 300000000

# set the color of the state to be the % of the state, relative to cap.
dataset$percent <- dataset$value / cap
dataset$percent <- apply(as.data.frame(dataset$percent), 1, FUN = min, 1)
dataset$color <- rgb(0, dataset$percent, 0)

# figure out all the crazy "regions" (which vaguely map to states) in the US
m <- map('state', plot = FALSE)
full.region.list <- unlist(lapply(m$names, function(e) strsplit(e, ":")[[1]][1]))
map.full.region.list <- data.frame(state = full.region.list)

# join the region list used by the map, with our data.
all <- merge(dataset, map.full.region.list, all.y = TRUE, by = "state")

# rows that had no data should be black
all$color[is.na(all$color)] <- rgb(0, 0, 0)

# uncomment to help debug
# write.csv(all, file = "c:\\r.txt")

# finally, draw the map
map('state', fill = TRUE, col = all$color)
```
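The trickiest line in that script is the region-name cleanup, so here is a minimal standalone sketch of just that step. The sample region names below are made up, but they match the “state:sub-region” shape the maps package uses:

```r
# map('state', plot = FALSE)$names uses sub-regions like
# "washington:san juan islands" rather than one entry per state.
# Hypothetical sample of what that vector looks like:
region.names <- c("washington:main", "washington:san juan islands",
                  "michigan:north", "michigan:south", "california")

# Strip everything after the colon to recover the plain state name:
states <- unlist(lapply(region.names, function(e) strsplit(e, ":")[[1]][1]))
states  # "washington" "washington" "michigan" "michigan" "california"
```

Note the duplicates: that is exactly why you end up joining your data against the map’s region list instead of passing one color per state.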
And then you get this green monster, which could probably also use some work (say, labels for the states), but if nothing else, it does look better than the Filled Map from built in Power BI. And that is with only having a few days experience with R!
I made an attempt to provide some comments in my script, but a few minor points here:
- There is a col parameter to the map function, and intuitively you can understand that you need to pass a vector of colors. You are probably thinking there are 48 or 50 or maybe 52 states (DC and Puerto Rico, maybe?). But the actual number it wants is something like 65, and that took me a full day to figure out. Assigning the result of the map() function to a variable… is weird to me… and the key to success: it holds a field with the names of all the regions in the map. The docs were no help here, at least not for me.
- The “extra” regions are because they like to have extra regions like “Washington : Stupid Islands” or something. I needed extra code to strip off the extra goo.
- Debugging R inside Power BI is not fun. The idea to write data out to file, then open that sucker in Notepad (or in my case vim )… was a big big help.
- There is almost certainly a smarter way to cap my percent calc at 100%, but… um, I don’t know what it is, and the apply() technique worked well enough given my limited knowledge 🙂
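For what it’s worth, base R does have a vectorized element-wise minimum, pmin(), which caps the whole column in one call. A minimal sketch with made-up values against the same $300m cap:

```r
cap <- 300000000
value <- c(450000000, 150000000, 30000000)  # hypothetical state GM$ values

# pmin() takes the element-wise minimum of its arguments, so this caps
# each percent at 100% without the apply() detour:
percent <- pmin(value / cap, 1)
percent  # 1.0 0.5 0.1
```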
I’m pretty late to the R party. And I found it pretty confusing to understand what data types I was using, or what type the functions expected… vector, list, data.frame, ugh. Even accepting that scalars are secretly a vector was confusing at first. But really… even after just a week, it’s not that bad. At least you understand these are things to worry about… which is half the battle. And, I have certainly seen some attractive maps (I like the one at right)… and I (sadly) loved not dealing with Bing.
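If it helps anyone else climbing the same curve, here is a tiny sketch of those types; nothing here is specific to Power BI:

```r
# Even a "scalar" in R is secretly a length-one vector:
x <- 5
is.vector(x)  # TRUE
length(x)     # 1

# Lists hold anything; data.frames are lists of equal-length columns:
class(list(1, "two"))       # "list"
class(data.frame(a = 1:2))  # "data.frame"
```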
Especially with R support rolling into the PowerBI.com service soon, I plan to keep this in my repertoire… and it’s probably worth your time to be familiar as well.
I haven’t really played with using R to munge my data inputs, but I’m confident that will be hugely powerful. I have seen some nice demos that build ML models in R with surprisingly little code, and certainly my predictive skillz could use some work. Maybe that will be next…