Ryan Sleeper
Ryan demonstrates how to visualize the 80-20 rule by creating traditional pareto charts in Tableau. By the end of this video, you’ll be able to make a highly interactive pareto chart by incorporating reference lines that show the intersection of the 80-20 rule, calculated fields that allow you to automatically segment the top quintile in the view, and animations to drive interactivity.
Hi, this is Ryan with Playfair Data TV, and in this video, I’m going to show you how to make Pareto charts in Tableau. And I’ll also be providing some additional tactics on how to visualize the 80-20 rule.
By the end of this video, you’ll be able to make something similar to what you see on the screen now. We have a percent of running sum of profit on the y-axis and percent of running sum of order IDs on the x-axis. I’ve also added reference lines that show the intersection of the 80-20 rule. And I’ll be sharing a calculated field that allows you to automatically segment the top quintile in the view, so that you can use it in other applications.
To get started over here in Tableau Desktop, I’m going to show you how to make what I would refer to as a traditional Pareto chart. We’re going to take this much further. But just to get started and kind of show you what it’s trying to do, I’m going to double click on Sales, which will add SUM(Sales) to the Rows shelf, and double-click on Sub-Category, which will add the Sub-Category dimension to the Columns shelf.
A little bit of context on Pareto charts, which are named after the Pareto principle, which were named after the Italian economist Vilfredo Pareto, who– I guess his parents were angry at him or something. But he did provide a very useful principle that we use all the time in the consulting side of our practice at Playfair Data. The Pareto principle is also known as the 80-20 rule. It states that 80% of outcomes are due to 20% of effects.
So, for this example, to isolate those 20% that are really moving the needle in the business, we’re going to start by sorting this chart in descending order. So I will just click Descending to get that into descending order. What this currently shows us is SUM(Sales) by the 17 subcategories in the Sample – Superstore dataset.
To identify at what point we’ve reached 80% of our sales, we are going to need a second measure on the view that computes the percent of running total of sales. The easiest way to do that is to start by duplicating the pill that’s currently on the Rows shelf. This is my very favorite shortcut in Tableau. I’m just going to hold the Control key while I drag Sales right next to itself.
That provides the exact same chart on two rows, but now those fields are independent. So I can do things to them independently of each other. The first row I can leave as is, but the second row, I can add a table calculation to. I will do that by clicking into the pill, hovering over Quick Table Calculation, and choosing Running Total. You can do this at any point while you’re authoring this type of chart.
But at this point, I’m going to go ahead and combine these into a dual axis chart. There’s a few ways to do this. We do have another video to show you how to do this, if you’re not familiar. This time, I will take the route of hovering over the second y-axis. When I do that, a green triangle will appear. I can left click on that and drag it to the opposite axis. That dotted line is going to show me where it’s going to draw that axis.
Normally, we recommend, at Playfair Data TV, synchronizing those axes. But by the end of this tutorial, the left side is going to be SUM(Sales) and the right side is going to be a percent of total, which only goes from 0 to 100. So they will be on very different scales, so we do not want to synchronize them in this case.
Tableau tried to help us out here by changing the Mark type, because it recognized automatically that we had quite a bit of overlap. I actually want to turn the first axis back to a Mark type of Bar. And for the second axis, I’ll probably go ahead and make that Line, so that those dots are connected. All right, so at this point, on the left axis, SUM(Sales) by subcategory. On the right, we have a table calculation that’s computing the running SUM(Sales).
The Running Sum and Moving Average table calculations are unique in Tableau, in that you can add a second table calculation. So you can take a table calculation of a table calculation, if you will, or compute a second outcome on the original outcome of the first table calculation. That’s a mouthful. But why it comes into play here is, instead of displaying only the running SUM(Sales), we can convert that into a running percent of the running SUM(Sales).
To do that, you click on the pill that has a Delta symbol on it. Click Edit Table Calculation. And because we’ve got Running Total selected here as the calculation type, we have access to this option to add a secondary calculation. If I check that box, I can add a second table calculation. This time I will do Percent of Total. And that is the entire table calculation, so I will close a dialog.
And I’ll use the tool tips to show you how to read the Pareto principle out of this chart type. As I hover over, I’m seeing the percent of sales that were contributed to that point within the 17 subcategories. So right now, I’m hovering over Copiers, and by the time I get to Copiers, we have earned 78% of our sales.
So for this particular use case, we’re hitting the 80% mark somewhere about halfway between our 17 sub-categories, which means this particular analysis turned out to be more like the 80-50 rule, where 80% of our sales were coming from 50% of our sub-categories. That’s not quite what we were looking for, and it was because our data was not very deep. We only had 17 sub-categories to evaluate.
So we’re going to do a much more robust analysis, and then I’m going to share some extra tricks and tactics for visualizing the 80-20 rule. This time we’ll use Order ID as our dimension, which in the Sample – Superstore dataset, comes with over 5,000 data points to evaluate. So we should, theoretically, be closer to the 80-20 rule by the time we’re done with this next analysis.
So on a new sheet, I’m going to, this time, mix up the measures a little bit. And I’ll start with profit, by just double clicking on Profit. And I will break up the profit measure by CNTD(Order ID). The easiest way to do that is to right click on my Order ID dimension while I drag it to the Columns shelf. And when I let go, I can choose the Aggregation of Count Distinct.
At this point, we see a scatter plot with SUM(Profit) on the y-axis, and Count Distinct of our order IDs on the x-axis. To evaluate all 5,000-plus orders within the Sample – Superstore dataset, I need to also add the Order ID dimension to the Detail Marks card. And when we do that, we should see the x-axis change to only 1, because each order is being counted one time so its value is 1. So we only see one value on the x-axis so far.
All right, there’s the foundation of this second Pareto chart. As with the first Pareto chart I showed you, we’re going to add both a primary and secondary table calculation to each measure that’s on the view. To do so, I’m going to click into the Profit pill, hover over Quick Table Calculation, and choose Running Total. I’m going to do the same thing on the CNTD(Order ID) field. And at this point, both measures have a primary table calculation taking place.
We don’t see anything change yet, and that’s because there’s one extra step that we need to do if we’re trying to evaluate all 5,000-plus orders in the Sample – Superstore dataset. By default, table calculations are computing from Table, across. But in this case, we want that evaluation to happen at the Order ID level.
So for both of these measures, I’m going to click into the pill, hover over Compute Using, and choose Order ID. We still don’t see much change, when I did that for the SUM(Profit) pill. But once I do that for the CNTD(Order ID) field, we should start to see something coming into place. That’s how the Pareto looks so far, with the first table calculation taking place.
But remember, we also want to add a secondary table calculation to convert that raw Running Sum number into a Percent of Total. So for each of these fields, I will click into them again. Click Edit Table Calculation. Choose Add Secondary, and choose Percent of Total. The shape should look the same, but now the y-axis is in terms of percentages, instead of a sum or running sum.
We need to do the same thing for the x-axis. So click into the pill, Edit Table Calculation, Add Secondary Calculation, and choose Percent of Total. Again, same shape, but we have percentages on the x-axis, instead of dollar amounts.
Not quite that nice, smooth curve we were looking for, however. That is because we need to sort– Just like we did in the first view, where we sorted the subcategories in descending order, we need to sort every one of these customers in descending order. To do so, you click into the pill that’s on the Detail, Property of the Marks card, and choose Sort.
I can change this dropdown to be based on a field. We’ll put these in descending order. Choose the correct field, which, in this case, is profit. That’s what we’re evaluating. And we should get that nice, smooth curve. And we have 5,009 circles on the view, currently.
This next tip is kind of a take-it-or-leave-it one, but– at least for this analysis– I’m not very interested in the negative performers. So take a closer look at the y-axis in the current view. It goes from 0 to just under 150, and then it goes all the way back down to 100. That’s because we have quite a few Order IDs that actually had negative profits.
So we had a running sum get all the way up to 150, but then we come back down to Earth, where we end up at 100% of profit. For me, if I’m trying to focus on just the 20% that are causing 80% of the positive profit gains, I’m going to go ahead and add a Order ID Filter to get rid of all those negative performers. To do so, just drag Order ID to the filter shelf.
I like to go to the Condition tab, and I can just change the condition here. I can choose Profit. It automatically takes an aggregation of Sum, so that part’s good. And I’ll say, has to be greater than 0.
After I click the Apply button, we should see the negative portion of that curve go away. And we end up at the positive 100, which is exactly what I would like. So I click OK. So there is an example of a Pareto chart with a much deeper analysis, that should get us closer to visualizing the 80-20 rule.
Let me give you a couple of tactics on really making that precise, to yourself and your end user. We’ll start by adding a couple of reference lines that just show the intersection of, hopefully, what turns out to be roughly the 80-20 rule, with this new analysis that we’ve done.
To do so I’ll navigate to the Analytics pane. These are reference lines. But they’ll always be constant, because the y-axis will always be at 80% and the x-axis will always be 20%.
So I will make the first selection, and drag that onto the view. I’ll do the y-axis first, which is SUM(Profit), and I’ll type in 0.8. And we should see the line appear there at the 80% mark. I’ll do the same thing for the x-axis, Count the Order ID, and I will type in 0.2.
And as you can see right away, we did get very close to the 80-20 principle here. If I hover over near that mark, I can see that, roughly– It’s kind of hard to move your mouse, which is why I’ll provide the next tip for you. But it looks like 20.15% of our orders delivered 77.36% of our profit.
Let’s make that even more precise, though. And I will share a calculated field, where you can actually segment all of these orders into whether they were in that 20% that drove roughly 80% of the value. What’s great about this technique– And I’ll call this Top Quantile, which means the top 20%.
What’s great about this technique is, once we’re done with this, you can leverage this segmentation to isolate that top quantile and use it for whatever reason you would like. You can add it to a color property of the Marks card, in other views, to see who makes the cut, of that top 20%.
You can even make views that filter to just that top 20%. So if you wanted to export it, maybe get in touch with that top 20%, do focus groups, maybe send them a discount offer or refer a friend offer, it just really helps to easily isolate the top 20%. Which again, the whole reason we’re doing this is those are the 20% that are really moving the needle and providing the most value.
I’m going to cheat a little bit on this formula. But you can, too, because I will share this in the Related Content below the video. But I just pasted the formula in there.
But I will walk you through what this is doing. It’s a very simple Boolean formula, but it leverages the WINDOW_PERCENTILE function to evaluate all 5,009 orders and determine which of them are in the top quantile or 80% and above. It’s saying, look at each Order ID, at least in our case.
Again, I should remind you that, you can do this analysis for whatever dimension you would like. We happen to be using Order ID. But for whatever your visualization level of detail is, Tableau is going to run this logic to create a Boolean segmentation.
You’re either in the top 20%, or you’re not. It’s going to look at each of the 5,009 orders, look at its profit value, and it’s going to see if it’s above or equal to the SUM(Profit) for the 80th percentile and above, is what the second half of that formula does.
And one last reminder before I close this box. We are evaluating the profit measure, but if you are using something different, you would simply replace the two orange Profit fields in this view with whatever measure you are evaluating. But that’s it for this analysis, so I’ll click OK.
We have a new field down here called Top Quantile. I’m going to drag that to the Color property of the Marks card. And we don’t see anything change yet.
This is because, once again, this is WINDOW_PERCENTILE as a table calculation. I know there’s a table calculation taking place on this field because I see Delta symbol on top of its pill. And by default, that window calculation is happening from Table, across, or left to right in this case.
Well, we need to change that so that it evaluates whether each Order ID is in the top 20%. So to change how it’s computing the table calculation, I will click on the pill, hover over Compute Using, and choose Order ID. And then we should see two colors come into place.
Now we know exactly where the cutoff is for our 20%, because it’s going to be the last orange dot. So we can get even more precise, and I could see that 77.8% of our profit came from 19.9% of our order IDs. So that’s about as close as we can get there. Really nice effect and calculation.
There’s a couple other things you can do with this. For example, if I wanted to create a set out of that top 20%, I could filter this whole view to just the top 20%. If I click Control-A, it highlights all those 20%.
And now, if I hover over any of those marks, I can create a set out of them. And that way these 794 marks are isolated, and this is my top performing segment. So I just call this my Top Orders.
And now that I’ve got that set, we’ve got another video here at Playfair Data TV that shows you several ways to leverage those sets. I could use them to compare people that are in the 20% versus out. I could use it to export different orders. I could use it to highlight different orders, within the context of everybody else. There’s all kinds of things you can do with this, now that we’ve got that set available to us.
But I’m going to remove that filter to finalize this view and show you one more kind of cool thing that we can do. I’m going to add a filter for Sub-Category, by right-clicking on that dimension and clicking Show Filter. And to replicate the animation that you saw at the beginning of this tutorial, I’m going to change that filter, so that you can either choose All or a single subcategory at any given time.
To do so, on the filter, I’m going to click this down arrow and choose Single Value List. And now we see radio buttons appear. And like I said, you can either choose All– which gets us back to the beginning– or any individual subcategory.
And you can see, by default, Tableau has already added animations. But they’re a little fast, so I’m going to give you a couple of tips on animation, as well, in this video. If you go to Format, one of the options is Animations.
And they’re turned on by default, but they’re a little too quick for my liking. So I’m going to slow that down. Maybe I’ll just choose the slow option and close that box. And now, as I make changes, you could see a little bit slicker animation.
But there’s one more thing that I don’t love about this. This Pareto chart always goes from 0% to 100%, so I shouldn’t see these different numbers on the y-axis. I don’t know if that’s a Tableau bug. It’s kind of showing me how it’s building up to that 100%, on the fly.
I’m not a big fan of that, but we can get around that by just fixing both axes. That’s going to make our animation much smoother. So to do that, I’m going to right-click on each axis, choose Edit Axis, and fix the axis– from, maybe, 0 to 110– and see how that looks.
And I don’t like that extra label. So let me do 0.09, and then we shouldn’t see that extra label show up. Perfect. And I’m going to do the same thing on the x-axis, fix it from 0 to 1.09.
Now I see why it was liking the negative. That first circle is getting a little cut off, so maybe I’ll try minus 0.05. That looks pretty good. And I’ll close that.
And then– very last thing, from a formatting perspective– I’m going to change the Mark type, from that open circle shape to a closed circle shape. Because this analysis is rich enough where– when there are enough dimension members on the view– it’ll appear to almost be a solid line, which I think looks a little bit nicer.
And now as I click through different subcategories, you don’t see those axes change. This chart is very engaging. You have a nice smooth animation. Not to mention, you’re getting all the value that the Pareto chart was meant for, which is to help us isolate the top 20% in our business that are causing 80% of the effects.
This has been Ryan with Playfair Data TV. Thanks for watching.