Ryan Sleeper explains aggregation and disaggregation which are critical analytics concepts for proficiency in Tableau. In addition, Ryan demonstrates three separate ways to make a box plot and provides a tour of Tableau’s Explain Data feature.
Hi. This is Ryan with Playfair Data TV. And in this video, I’ll show you several different ways to make a box-and-whisker plot in Tableau. I’ll also show you how to interpret a box plot, we’ll talk about disaggregating measures, and I’ll show you the Explain Data feature in Tableau.
Box plots are a great option for showing the distribution across any measure as you’re doing your visual analytics. So you can use any measure you’d like, but for this first example over in Tableau Desktop, I’ll use the Sample-Superstore dataset, and maybe we’ll start with the Discount measure.
So I’ll just drag discount to Rows to get that onto the view. And I’m going to change the Mark type to Circle so that we can eventually see all the different orders with our discounts appear. Notice by default when I added Discount to the Row shelf that it comes with the aggregation of sum.
Aggregation is a very critical concept within Tableau, and in fact, analytics in general. And we do address that overrun on the fundamentals track. But just a quick review of how I think about this, I think of aggregation as consolidating several rows into a single value. So any time you see that aggregation of sum, that means that we’ve taken multiple rows and we’ve added up all of their values into a single value.
But with a box-and-whisker plot, which we’re about to create, we’re looking for the distribution of all of the different values across the dataset. So I’m actually going to do what’s called disaggregating that measure. So we should see that Sum disappear.
And I happen to know that there are 9,994 records in the Sample-Superstore dataset, so we should see that many circles. And that’s the start of evaluating the distribution across all roughly 10,000 of those circles. You can do this on any visualization within Tableau.
If you navigate to Analysis, by default, notice the second option it says Aggregate Measures. And it is checked. If I de-select that, the aggregation of sum goes away and we see a lot more circles appear on the view. I can tell by this summary in the bottom left corner that there are actually all 9,994 records represented, but because we chose discount as our measure, we have a lot of common values.
So for 80%, for example, if I were to select that row in the dataset, I can see that 300 out of my 9,994 were discounted at 80%. So this is helpful, but there’s a lot of overlap. So I’m going to show you several ways to convert this into a box plot so that we can better analyze the distribution.
One thing I can do is simply navigate to Show Me. And on the second to last row, that last option there will convert what’s on the screen now into a box-and-whisker plot. Because I chose Discount again, I’m not seeing too much of a distribution. I’m not getting the full sense of what that box plot is explaining. So I’m going to break down my discount values by a dimension. Maybe I’ll throw Sub-Category onto Columns.
And now we get a better sense. We can see some of the true way that box-and-whisker plots were meant to be displayed. You can see on the view that some of the dimension members– Machines, for example– has a box with a line in the middle to quickly show you how to interpret what we’re looking at here.
That line in the middle of a box is the median distribution across Machines. So in this case, that would be 20%. The rest of the box, so that light or gray shading and the dark or gray shading represents plus or minus 1 quartile from that median. When you add those together, that gets you to 50%.
The proper term for that middle 50 is Interquartile Range. That, I think, is why some people get intimidated with this chart. Interquartile range is an overly fancy term. But it just means middle 50. 50% of the value should fit within that box. The whiskers then by default represent 1 and 1/2 times that middle 50. And if you’re outside of 1 and 1/2 times the interquartile range, you are considered an outlier.
So that’s how to interpret a box plot and the first way to make a box plot. A second way to make a box plot, if I were to just undo twice but then add Sub-Category back to columns, is I can just navigate to the Analytics pane. And the fourth option down is box plot.
If I drag that onto the view and let go, we have the same box-and-whisker plot in order to analyze our distribution. A better metric for looking at the distribution might be Profit. And to also show you that you can look at a box-and-whisker plot even with aggregated measures, let me drag Profit onto the view and go back to Analysis and choose Aggregate Measures. And by default, we get Sum.
But I did that on purpose to show you that– notice, there’s only one box-and-whisker plot. It’s actually very subtle. If I bump the size down, we could see that a little bit better. But that box-and-whisker plot is being crammed into a single value. Obviously, that’s not showing us much value.
This is the difference with an aggregate measure versus a disaggregated measure. Right now, it’s aggregating things at the visualization level of detail, which currently is Sub-Category. And because we have nothing else, we’re just seeing a single mark, and therefore, a single distribution or single box plot per Sub-Category.
To add some context to this and analyze an actual distribution, I will drag Continuous MONTH of Order Date to the Detail property of the Marks card. I can do that by right-clicking on the Order Date field and dragging it to Detail and letting go. And if I choose MONTH Continuous, I will see 48 different circles because there are 48 months in the Sample-Superstore dataset per Sub-Category.
And now I’m back to being able to analyze things like what is the median distribution across the entire Sample-Superstore dataset by Category and Month of Order Date. I can also very quickly identify outliers. A third way to make a box-and-whisker plot is to simply use reference lines.
So I’ve shown you how to make a box plot via Show Me via the Analytics pane, but box plots are actually technically just reference lines. If I right-click, in fact, it already says Remove Reference Line. That’s because that would be the box plot that I added via the Analytics pane. I’ll go ahead and choose that just to start over here.
But the third way to make a box plot is to right-click on the axes that you want to add, the visualization too, and choose Add Reference Line. And this is such a commonly used reference line that there is a tab just to make a box plot.
And I actually prefer making box plots this way because you’ll see there are some additional options such as the style of the box plot. You can even choose different colors for the shading, the opacity, the line weights and colors of the borders and the whiskers.
So that reference line box or dialogue opens up a lot more flexibility in how you format these. There’s that default that I mentioned earlier where the whiskers go out to 1 and 1/2 times the IQR. That’s short for interquartile range. There’s also an option to extend those to the maximum extent of the data, which will extend those whiskers to the highest value and the lowest value within the dataset.
And then because distribution is the key aspect of this chart, you even have an option to hide everything else. If I choose that, notice that really what we care about is just that box plot visualization in the middle. Tableau is also left the outliers on the view. So we can see the distribution at a glance, but it also alerts us if there’s anything unusual happening within our data.
But I’ll go ahead and uncheck that just to get back to all the defaults. And that is a third way to make a box plot. The last thing I promised I’d show you on this video is a quick tour of the Explain Data feature. This is a relatively new feature. It’s been around actually maybe a couple of years now. But this allows us to do lots of different analyses on the fly, and then Tableau will boil up just the key outlier in difference for us.
Tableau claims to run this data through thousands of possibilities and then report on something that is unusual in the data that you can then dig in deeper. So let’s try to apply that to this box-and-whisker plot. Let’s take a look at maybe the Machine’s profit, because it has our minimum value across the entire dataset.
Looks like in November 2020, we came in at minus over $5,000. Well, if I want to get more context on why that might have happened, I can click directly on that data point. And it’s this small light bulb icon that is the Explain Data feature. And if I choose that, we’ll see the Explain Data pane appear on the right side.
It tells us that that value was lower than expected. And if I click this right arrow, we’ll see a dropdown with a couple of things that might be interesting about it. One of my favorite features of Explain Data is this extreme value.
That would be really hard for us to look through all 9,994 records in this Sample-Supestore dataset to figure out what exactly the outlier values were. Tableau has automatically done that for us and we can simply use this dropdown to get detail on exactly what that value was.
So it says this mark, which remember, is just November 2020 is made up of six records. So we had six sales that were machines in that month, and one of those is very low, minus 6,600. The others, there are some unprofitable values, but they’re hovering around zero, and then we even have a decent profitable order there. But it’s this extreme value that’s dragging us all the way down.
And then Tableau is giving us some context on what happened within that order. Looks like the discount was 70%. So if I’m a good analyst, I’m probably starting to get an idea at this point on why that value was so extreme. Why was it so unprofitable? We had discounted that product 70%. Maybe we were just trying to offload this particular machine. Maybe we had a customer that has been a great customer that we’ve been trying to reward.
So maybe it’s not as bad as it seems. And one of the reasons I like this extreme value feature is I can visualize the difference. So I can have the option to see what my data would look like with and without that one extreme value. And to do that, you can just look at the visualization within this pane, or if you click this Go to Sheet button, it will actually create a brand new worksheet that by default contains the original visualization without that extreme value.
And when it does this, visualize the difference, it actually creates a set and puts it on the Filter shelf. The reason that’s interesting and valuable is you can right-click on that, that set filter and click Show Filter. And you as well as your users have the option now to show everything without that extreme value. You can look at just the extreme value, or you can look at everything together. And you’re back to your original visualization.
So that was Box Plots, Aggregate Measures , and Disaggregating Measures, and the Explain Data feature in Tableau. I’m Ryan with Playfair Data TV. Thanks for watching.