Ryan Sleeper
Learn how to improve charts like box-and-whisker plots that have lots of overlapping data points. You’ll see the formula for separating marks and a trick that allows your users to control the intensity of the ‘jitter’.
Hi, this is Ryan with Playfair Data TV. And in this video, I’m going to show you why and how to make jitter plots in Tableau.
To get started, let me set up an example over in Tableau Desktop using the Sample Superstore dataset that explains why you might need to use this technique. If you’re not familiar with jitter plots, a jitter provides some separation between overlapping marks so that you can see more of the data on the view.
So first, why we might need this. If I were to make a quick chart that looks at Sales by Sub-Category, maybe I would sort that. And just to get a lot of data points in here, I’m going to put continuous month of Order Date onto the Detail Marks Card. And what we have at this point is a stacked bar chart. This is one of my least favorite chart types, because unless you are looking at the segment on the very bottom, it’s very hard to see the trend of individual dimension members; every other segment is pushed up by the values below it.
So in this case, we’ve got 48 data points in each column. By the time you get to that stack on the very top, you’re seeing that value. But it has 47 values below it that have pushed it all the way up on the y-axis.
So in almost every case, the first thing I would do with a stacked bar chart is change the mark type to Circle. Now there’s a much truer representation of where each data point is on that y-axis.
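To put numbers on why the stacked bar misleads, here is a minimal Python sketch. The four sales values are made up for illustration and are not from Sample Superstore; the only point is that a stacked segment’s top sits at the running total, while a circle sits at the value itself.

```python
from itertools import accumulate

# Made-up sales values for four months in one column (illustrative only).
sales = [120, 80, 200, 50]

# Stacked bar: each segment's top edge is the running total of everything below it.
stack_tops = list(accumulate(sales))

# Circle mark type: each mark is drawn at its own value on the y-axis.
circle_positions = list(sales)

print(stack_tops)        # [120, 200, 400, 450]
print(circle_positions)  # [120, 80, 200, 50]
```

The last value is only 50, yet in the stack its segment ends at 450 because 400 of sales sit below it.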
To make this even better and look at the distribution, I might jump over here to the Analytics pane and convert this dot plot into a box plot, which will just show me the median, the interquartile range, and the outliers. I’ll make these a little bit smaller. Actually better yet, let me just filter this to my top five dimension members so we can see this a little bit better.
So we’ve made this chart even better now. We’re looking at the distribution. All these lines have some statistical context. But one thing that we lose here is a clear view of all the underlying marks. Yes, the most important aspect of this chart is that distribution that’s being communicated via the box plot.
But there are a lot of data points in here. And in Tableau, I have the ability to hover over those data points to gain more information about them. And that’s part of what’s being lost here. I don’t have a good sense of the density of these marks on the view. This could be 20 marks. It could be, as we know, 48 marks. I just don’t know, because I can’t see those underlying data points.
That’s where jittering comes in. Creating a jitter involves a table calculation. If you’re new to table calculations, I encourage you to check out the video An Introduction to Tableau Table Calculations here at Playfair Data TV.
I use this table calculation so often that instead of adding it in the flow of my analysis, which I could do (by the way, there’s a video that shows you how to do Tableau in the flow as well), I almost always make a calculated field. So I’m going to create a calculated field. And I always call it Jitter.
And the entire formula is the function INDEX() followed by a percent sign, which is Tableau’s modulo operator: INDEX()%. And then whatever number you type after this percent sign will be the intensity of the jitter. So this is controlling how much space things are going to have from left to right. It’s completely arbitrary.
Just to get started, I’m going to type 10. Eventually, this will look at each of my 48 months and assign each one a number from 0 to 9, the remainder of its index divided by 10. So that’s 10 possible values on the x-axis. It’s going to provide some horizontal separation. I’m going to click OK.
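As a rough sketch of what INDEX()%10 produces, here is the same arithmetic in Python. The `jitter` function is a hypothetical stand-in, not anything from Tableau; the one assumption carried over is that INDEX() starts counting at 1 rather than 0.

```python
def jitter(index: int, intensity: int = 10) -> int:
    """Hypothetical stand-in for the Tableau formula INDEX() % intensity."""
    return index % intensity

# INDEX() is 1-based, so 48 monthly marks get the indexes 1 through 48.
indexes = range(1, 49)
offsets = [jitter(i) for i in indexes]

print(offsets[:12])          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
print(sorted(set(offsets)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The offsets cycle through the same 10 values, which is why the number carries no meaning beyond spreading the marks out.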
Because we are separating these marks horizontally, I’m going to place that jitter measure that we just created onto the Columns Shelf. As you can see, it has a delta symbol on its pill. That’s Tableau telling you there is a table calculation taking place.
We saw something change. However, it doesn’t look quite right. What’s happening is the marks are getting assigned a number between 0 and 9, but at the Sub-Category level. So all my Phones are assigned the number 1. All of my Chairs are assigned the number 2, and so on.
To get this to work as intended, we need to change what’s called the addressing of this table calculation, so that the number is assigned by month of Order Date instead of by Sub-Category. To change the addressing, simply click on the pill that has the table calculation taking place. Hover over Compute Using, and change it here.
As you can see, right now it’s using the default, which is called Table (across). That’s why it’s moving left to right across sub-categories. If I change it to Order Date, we will see much better separation now. Let me click on the Size Marks Card to make these a little bigger so you can see them. But now, as I said earlier, it’s looking at each of the 48 months per sub-category and giving it a number between 0 and 9. So we’ve got nice separation of those data points within each column.
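The difference between the two Compute Using settings can be sketched in Python. This is a simplified model of the behavior described in this video, not Tableau’s actual engine, and the dictionary names are made up for illustration.

```python
INTENSITY = 10
sub_categories = ["Phones", "Chairs"]  # two of the columns mentioned in the video
months = range(1, 49)                  # 48 monthly marks per column

# Table (across), as described: the index follows the column's position,
# so every month inside one Sub-Category shares the same offset.
across = {
    (sub, month): position % INTENSITY
    for position, sub in enumerate(sub_categories, start=1)
    for month in months
}

# Compute Using Order Date: the index restarts in each Sub-Category
# and advances month by month.
by_order_date = {
    (sub, month): month % INTENSITY
    for sub in sub_categories
    for month in months
}

def distinct_offsets(offsets, sub):
    """Sorted unique offsets assigned to one Sub-Category."""
    return sorted({value for (s, _), value in offsets.items() if s == sub})

print(distinct_offsets(across, "Phones"))         # [1]
print(distinct_offsets(across, "Chairs"))         # [2]
print(distinct_offsets(by_order_date, "Phones"))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With the default addressing every Phones mark lands on the same x position, while addressing by Order Date spreads the 48 months across all 10 positions within each column.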
This number, again, is completely arbitrary. It means nothing. In fact, every time I make a jitter plot, I would hide this header, because I don’t want my end user to confuse the meaning of that value and think that it means anything and interpret this as a scatter plot. So I’m going to hide that header by right-clicking and deselecting Show Header.
The point is, here’s why you might want to use this. We can now see much more of the underlying data points. And because we’re using Tableau, we can hover over those data points to communicate additional information.
Let me show you one more little trick. We’re going to make this jitter plot even better by allowing our end user to control the intensity of that jitter. We’re going to parameterize that intensity value. Right now it’s hard-coded at 10.
But I could create a parameter. We’ll call this Jitter Intensity. And we’ll say it’s a data type of Integer. And we’ll do a range. We’ll say it has to be at least 5. But we’ll let them go all the way to 50. Let’s just take a look at how this looks. And the step size or multiples will also be 5. Click OK.
And now I’m just going to go plug in that parameter in my Jitter calculated field. So instead of this hard-coded 10, I’m going to type percent Jitter Intensity, which makes the formula INDEX()%[Jitter Intensity]. Anytime you see the Apply button in Tableau, you can preview the change before you accept it. Because we had a hard-coded number of 10 but we just set up a parameter that has the current value of 5, we should see less separation right away when I click Apply. So I click Apply. Sure enough, now each of the columns has horizontal separation across five values instead of 10.
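Here is a small Python sketch of the parameterized version, assuming the parameter settings from this video (integer, minimum 5, maximum 50, step size 5) and 48 marks per column. The names `ALLOWED_INTENSITIES`, `jitter`, and `distinct_positions` are hypothetical helpers for illustration only.

```python
# Values the Jitter Intensity parameter allows: 5, 10, 15, ..., 50.
ALLOWED_INTENSITIES = range(5, 51, 5)

def jitter(index: int, intensity: int) -> int:
    """Hypothetical stand-in for INDEX() % [Jitter Intensity]."""
    if intensity not in ALLOWED_INTENSITIES:
        raise ValueError(f"intensity must be one of {list(ALLOWED_INTENSITIES)}")
    return index % intensity

def distinct_positions(intensity: int, marks: int = 48) -> int:
    """Count the unique x positions used by `marks` marks (INDEX() is 1-based)."""
    return len({jitter(i, intensity) for i in range(1, marks + 1)})

for intensity in (5, 10, 50):
    print(intensity, distinct_positions(intensity))
# 5 5
# 10 10
# 50 48
```

At an intensity of 50 there are only 48 marks to place, so each one gets its own position; pushing the intensity beyond the number of marks adds no further separation.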
Now to unlock the user control of this, there’s one last step. I just need to right-click on the parameter and choose Show Parameter Control. And now the end user themselves can decide how much separation they want to provide. So 5 doesn’t look bad. But in certain cases, you might want to bump that up to 10. That’s what we started with. And if they’ve got a lot of marks on the view for certain filtering situations, they might want to keep bumping this up. And it allows them to go all the way to 50, which is what we coded in that underlying parameter.
This has been Ryan with Playfair Data TV – thanks for watching!