Ryan Sleeper
How to make a scatter plot in Tableau, formatting tips for making your scatter plots more engaging, and a practical tip and calculated field for automatically creating segments from your scatter plots.
Hi, this is Ryan with Playfair Data TV. And in this video, I’m going to show you how to make a scatter plot, a couple of ways to make those scatter plots more engaging, and a practical application for how to get the most out of this chart type. Scatter plots are one of my very favorite chart types. These are actually my third favorite chart type, closely behind bar charts and line graphs.
The reason I like scatter plots so much is that they provide several benefits. One is, it’s one of the only chart types where you can view a lot of different marks in a very small concise space. They’re also very good at helping illustrate correlations. And related to the last tip I’m going to give you in this video, they create kind of a natural segmentation. A four-quadrant segmentation that you can then repurpose for other things.
Scatter plots are made with zero or more dimensions and two to four measures. Those first two measures form the y-axis and the x-axis. It’s typically best practice to put your primary metric, often referred to as the dependent metric, on the y-axis or in Tableau terms, onto the rows shelf. And your explanatory metric, so your secondary metric that you’re trying to compare and see if there’s a correlation to it, goes on the x-axis or the columns shelf.
Let’s go ahead and make a scatter plot. And we’re going to do this with profit ratio as our dependent metric and sales as our explanatory metric. So I’m going to jump over here to Tableau Desktop. And to start I’m just going to double-click on my primary metric, because the default behavior is for Tableau to put that on the rows shelf. So that forms my y-axis.
And I’ll double-click on my explanatory metric second, because the default behavior of Tableau is for it to put the second measure that you double-click onto the columns shelf. So it’s actually that easy to create a scatter plot. Tableau does a lot of the work for us to kind of create this best practice scatter plot.
But we’re going to take this a step further, several steps further, in fact. Notice by default that my mark type is an open shape, it’s an open circle. I typically like to change that mark type, which you can do using this dropdown on the marks shelf, from shape, which is the default or automatic, to circle. I’ll just make it a closed in circle. I’ll also make the size of those marks a little bit bigger by clicking on the size marks card and dragging this over to the right a little bit.
Right now we haven’t specified anything more granular than the entire file. If we want to make our analysis more granular, we need to change the level of detail. One of the ways we can do that is by putting a dimension on the detail marks card. Let’s say that we wanted to look at profit ratio by sales at the subcategory level. So I’ll put subcategory on the detail marks card.
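To make the level-of-detail idea concrete outside of Tableau, here is a minimal Python sketch. It assumes profit ratio is defined as total profit divided by total sales, which the video does not spell out, and the rows and the `profit_ratio_by` helper are hypothetical. With no dimension, the whole file collapses into one mark; grouping by subcategory gives one mark per subcategory.

```python
from collections import defaultdict

# Hypothetical order rows: (category, subcategory, sales, profit).
ROWS = [
    ("Furniture", "Chairs", 1000.0, 100.0),
    ("Furniture", "Chairs", 3000.0, 300.0),
    ("Furniture", "Tables", 2000.0, -200.0),
    ("Technology", "Phones", 4000.0, 800.0),
]


def profit_ratio_by(rows, key_index=None):
    """Return {key: (total_sales, profit_ratio)}.

    key_index=None aggregates every row into a single mark, which mirrors
    a view with no dimension on the detail marks card.
    Assumption: profit ratio = sum(profit) / sum(sales).
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        key = "All" if key_index is None else row[key_index]
        totals[key][0] += row[2]  # accumulate sales
        totals[key][1] += row[3]  # accumulate profit
    return {k: (s, p / s) for k, (s, p) in totals.items()}


whole_file = profit_ratio_by(ROWS)         # one mark for the entire file
by_subcategory = profit_ratio_by(ROWS, 1)  # one mark per subcategory
```

Each entry in `by_subcategory` corresponds to one circle on the scatter plot: sales gives the x position and profit ratio gives the y position.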
And then let’s also say we want to encode those circles by what category they’re in. We’ll color those circles by what category they’re in by putting the category dimension onto the color marks card. And that’s essentially a scatter plot. This is already fairly useful. We’re able to see a correlation between profit ratio and sales. And we’ve also broken this down by category and subcategory to see if we can get any clues to help us analyze this data.
But we’re now going to take this a step further. And the first tip I’m going to show you is how to make this more engaging using a formatting trick that I like to use. It’s related to the borders. There is an effect that lives on the color marks card called borders. But it’s all or nothing in terms of if I choose a different color, such as black, notice they all get a black border.
And also notice, there is no flexibility in the formatting. What you see is what you get there with that black border. It’s pretty thin. It’s actually really hard to see, even if I were to make the size of these marks even bigger. That’s not always enough for me. Sometimes I like to customize the size and color of those borders, which you can do with this first trick that I’m going to share.
It involves creating a dual axis combination chart with this scatter plot. To do so, we’ll start by using my very favorite shortcut in all of Tableau, which is to hold down the Control key and click on a pill that’s already on the view. That creates an exact duplicate of that pill. So I’m going to do that with profit ratio by holding Control, clicking the profit ratio pill, and dragging it next to itself.
At this point, we’ve got the same chart on two different rows. But what’s important here is we’ve got two measures on the rows shelf, and therefore, they each get their own set of marks cards which we can encode independently of each other. So we can leave the first row as is. But on the second row, I could change the mark type back to shape. And you can kind of already see this coming together.
But these open rings will now be our borders. So we still want them at the same level of detail, category and subcategory. But so far, all we’ve done is change the mark type from circle to open shape. I’m now going to combine these into a dual axis combination chart. And this is technically a combination chart, because on one side is a closed circle.
On the right side, we have a different mark type. That’s why it’s called a combination chart. We have a combination of mark types. And you can create a combination chart or a dual axis chart by clicking into the second pill on the rows shelf and clicking dual axis. We’ll also want to make sure that these are synchronized. It looks pretty close already, but just to ensure these are in sync, I’m going to right-click on either axis and click Synchronize axis. It just ensures those are properly lined up.
You don’t really see anything change yet, but remember, these are now independent. So back to one of my favorite things to do with the opacity is to go to about 80% or 90% on the color marks card. So I can do that on the interior of the circles and leave the outside of the circles as is. So I’ll click color, drag this opacity slider over to the left. Maybe I’ll go to 80%. And this is looking a little bit better now.
We notice the color is now in line with the mark. And I actually could also hide the border that was originally there so it looks even better now. And that helps the halo look a little bit darker there. It’s just a formatting trick that I like to use, and it gives you more flexibility in how to customize those borders.
The second tip I’ve got for you is also related to formatting, and you’ll find it pretty much any time I’m giving design tips related to charts: we want to maximize the data-ink ratio. And if you’re not familiar with that, in case you haven’t come across this in another video yet, it’s a term coined by Edward Tufte in his 1983 book, The Visual Display of Quantitative Information.
And he states that for all the ink on a view, so anything that you can see, we want as much of that as possible to be dedicated to data. You can also have redundant data-ink. One example we can clearly see here are the two axes. We obviously don’t need both of those. Those are literally repetitive data.
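For reference, Tufte's definition can be written as a simple ratio. The video only paraphrases it, so treat the notation here as a restatement:

```latex
\[
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
\]
```

Hiding a duplicated axis shrinks the denominator without removing any information, which pushes the ratio toward 1.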
So let’s just hide the right axis by right-clicking and deselecting show header. And this is a little bit subjective. But in my opinion, you can also have redundant data-ink if you’ve got too many tick marks. So this is a tip that I often incorporate into my own work where I’ll reduce the number of these tick marks.
So notice, we’ve got on the y-axis, a tick mark every five percentage points. So minus 10, minus five, zero, five, 10. And it’s not too bad in this case that we’re sharing now, but this can look pretty ugly. It can just be a big row of numbers, and you don’t necessarily need all that detail. In my opinion, excess tick marks can also be an example of repetitive data-ink.
Fortunately, we can alter those in Tableau by right-clicking on an axis, clicking edit axis. And there’s a tab called tick marks. I often navigate here and I’ll change it. We’ll just stick with 10% instead of 5%. But you can already see in the background, that y-axis got cleaned up quite a bit. I’ll do the same thing on the x-axis. Right-click, click edit axis, go to tick marks and maybe instead of every 50,000, we’ll fix those at every 100,000.
I do want to point out one pitfall when you fix tick marks. These are truly fixed, just like they sound like. So if we were to add a filter, for example, and change the scale, it might throw off the number of ticks. If, for example, we filtered this down and we didn’t have any data points that were greater than 100,000, we wouldn’t even see a tick mark on the view. So you do have to be a little bit careful with that. But it is a way to help you out with the repetitive data-ink.
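The pitfall can be sketched in a few lines of Python. This is not how Tableau is implemented; it only assumes that fixed ticks sit at a tick origin plus whole multiples of the interval, and that only ticks inside the visible axis range are drawn. The `fixed_ticks` helper and the axis ranges are hypothetical.

```python
import math


def fixed_ticks(axis_min, axis_max, interval, origin=0.0):
    """Return the fixed ticks (origin + k * interval) inside [axis_min, axis_max]."""
    first_k = math.ceil((axis_min - origin) / interval)
    last_k = math.floor((axis_max - origin) / interval)
    return [origin + k * interval for k in range(first_k, last_k + 1)]


# Hypothetical unfiltered x-axis with a tick every 100,000.
full = fixed_ticks(0, 350_000, 100_000)

# Hypothetical filtered axis that never reaches 100,000: no tick fits.
filtered = fixed_ticks(20_000, 80_000, 100_000)

# If the filtered axis still includes zero, only the tick at zero survives.
filtered_with_zero = fixed_ticks(0, 80_000, 100_000)
```

Automatic ticks would rescale to the new range, which is why fixed intervals deserve a second look whenever a filter can shrink the axis.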
I also tend to get rid of most lines. This is another way you can maximize the data-ink ratio. You can format almost every line in Tableau by right-clicking anywhere in the view and clicking format. And there’s a tab here for borders, as well as for lines. For the borders, I usually get rid of the row dividers and column dividers. Again, a lot of this is subjective. It’s a case by case basis, depending on how you plan to distribute the view. Perhaps you’ve got some brand guidelines that are going to dictate some of this.
But the point is, I’m trying to make it as clean as possible. I only want to show the data. So I’m trying to reduce that excess ink on the view. Zero lines are also another one I tend to get rid of. Again, a case by case basis. Sometimes you might very well want to point out those zero lines and bring even more attention to them. But sometimes I’ll just hide those. That lives on the lines tab. There’s a dropdown called zero lines. You can change those to none. You can see that made it a little bit cleaner.
And then just stylistically, sometimes I’ll make the grid lines actually a little bit thicker. So I can change the grid line and choose a little bit heavier weight. But I think we’ve taken that scatter plot pretty far. I think that’s a pretty good-looking scatter plot. Just using those last two design tips, we put a little bit of professional polish on this, and it really made it stand out.
My third tip for you is more of a practical application of this. One of the benefits that I mentioned to you in the intro to this video was that scatter plots kind of create a natural four-quadrant segmentation. This would be easier to see if I added a reference line to both the y-axis and the x-axis.
My favorite way to add a reference line is to simply right-click on an axis, then click Add Reference Line. The default reference line is just going to be the average of that axis. We’ll stick with that for now. Of course, a lot of people tend to use median here, but we’ll stick with average, which is the default. I’ll click OK. And I’ll do the same thing on the x-axis.
And here’s what I meant by a four-quadrant segmentation. These reference lines that are crossing here have created these four boxes. And each of these subcategories can be grouped into a specific type of behavior. So for example, in the top left corner, these are our lowest selling customers, but highest profit generating. So of those smaller sales, they’re generating a high ratio of profit.
In the top right quadrant, these are my best selling customers, so we made the most money on them. Plus, they have an above average profit ratio. So these are kind of our superstars. They’re buying a lot and we’re making a lot of money on them. In the bottom right quadrant, these people are spending a lot of money on average, but they’re below average in how much of that money is profit. So we might want to treat them in a slightly different way.
And then these here in the bottom left corner, they’re not buying a lot, and we’re also not making a lot on them. So they’re kind of the people that we want to focus on the least. Or if you run out of things to do, maybe you want to enhance that group and make them perform even better.
But the point is, each of these four segments, you might want to focus on them and treat them differently. If we were breaking down the scatter plot by customer, for example, we’d have this nice four-quadrant segmentation. Maybe we could give our best customers some extra discounts or give them a holiday gift or something like that.
If they’re up here in the top left where they’re not buying a lot, but they’re buying profit generating products, maybe you want to reach out to them and upsell them. Get them to buy more stuff. We’re already making good money on what they’re buying, let’s get them to buy more. That’s good, but we’re going to take this even further and make this happen automatically for us. We’re going to create this four-quadrant segmentation using a calculated field.
And I’m going to cheat a little bit here. The point of these videos isn’t to have you watch me type. But I will provide this in the related content section below this video. If you click on that, I’ll provide a link to this formula. But essentially, it’s breaking out these four different quadrants.
It’s saying if the profit ratio is greater than the profit ratio for the window average, this is a table calculation that’s going to compute the average profit ratio for this view. So if that happens and the sum of sales is less than the window average sum of sales, those are high profit and low sales subcategories.
Second line, these are the ones with above average profit ratio and above average sales. So they’re labeled accordingly. The next one, this is below average profit ratio, but above average sales. So these are, they’re spending a lot, but we’re not making much money on them profit ratio-wise. And then that leaves the last category, so I just put this catch-all in. So everybody else is low profit ratio and low sales.
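The logic just described can be sketched in Python. This is not the Tableau formula from the related content link; it is a rough equivalent under stated assumptions: the marks, the `segment` helper, and the label strings are hypothetical, and the "window average" is taken as the plain average across every mark in the view.

```python
from statistics import mean

# Hypothetical marks: subcategory -> (sales, profit_ratio).
MARKS = {
    "Phones": (330_000.0, 0.20),
    "Tables": (207_000.0, -0.09),
    "Labels": (12_000.0, 0.44),
    "Fasteners": (3_000.0, 0.05),
}


def segment(marks):
    """Assign each mark to one of four quadrants around the view averages."""
    avg_sales = mean(sales for sales, _ in marks.values())
    avg_ratio = mean(ratio for _, ratio in marks.values())
    result = {}
    for name, (sales, ratio) in marks.items():
        if ratio > avg_ratio and sales < avg_sales:
            result[name] = "High Profit Ratio & Low Sales"
        elif ratio > avg_ratio and sales > avg_sales:
            result[name] = "High Profit Ratio & High Sales"
        elif ratio < avg_ratio and sales > avg_sales:
            result[name] = "Low Profit Ratio & High Sales"
        else:
            # Catch-all, as in the video: everybody else.
            result[name] = "Low Profit Ratio & Low Sales"
    return result


segments = segment(MARKS)
```

Because the comparisons are strict, a mark sitting exactly on an average falls through to the catch-all branch, which matters when a mark ends up being compared only with itself.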
So I’ve created the segmentation. Tableau is now going to do the rest of the lifting for me, once I add this to the view. I’m going to click OK. Instead of coloring the circles by their category, I’ll replace that category color with segmentation. It’s going to replace what’s on the color marks card. It’s on the view. Notice the color change, but they’re all blue, and it’s not quite classifying this correctly. It’s saying that all the circles are low profit ratio and low sales.
Obviously, that’s not the case. We can see there’s four different quadrants on the view. The reason that’s happening has to do with the default addressing of a table calculation. I can see that the segmentation dimension has a table calculation on it because of that delta symbol. And remember, any time you use a table calculation by default, that table calculation is computed from left to right across the view.
That’s not what we want in this case. So you can change the addressing by either right-clicking on a pill that has a table calc or clicking this down arrow that appears when you hover over that pill. Then hover over compute using. It’s the same thing as changing the addressing. Instead of going by the default table across, we’ll go by subcategory.
It will now do that calculation at the subcategory level, and because I left these reference lines in the view, we can see that it did, in fact, work. We’re seeing exactly what we want to see. Now that we’ve got these isolated, we might want to treat these people differently. And in fact, you can isolate these by dragging a box around them and clicking keep only.
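One way to picture why the addressing matters is the Python sketch below. It rests on an assumption about the default behavior that the video does not state: with subcategory only on the detail marks card, each mark ends up in its own partition, so its "window average" is just its own value. The marks, helpers, and labels are hypothetical.

```python
from statistics import mean

# Hypothetical marks: subcategory -> (sales, profit_ratio).
MARKS = {
    "Phones": (330_000.0, 0.20),
    "Tables": (207_000.0, -0.09),
    "Labels": (12_000.0, 0.44),
    "Fasteners": (3_000.0, 0.05),
}


def classify(sales, ratio, avg_sales, avg_ratio):
    """Four-quadrant label for one mark, with a catch-all last branch."""
    if ratio > avg_ratio and sales < avg_sales:
        return "high ratio, low sales"
    if ratio > avg_ratio and sales > avg_sales:
        return "high ratio, high sales"
    if ratio < avg_ratio and sales > avg_sales:
        return "low ratio, high sales"
    return "low ratio, low sales"


def segment(marks, partition_of):
    """Compute averages within each partition, then classify its members."""
    groups = {}
    for name in marks:
        groups.setdefault(partition_of(name), []).append(name)
    result = {}
    for members in groups.values():
        avg_sales = mean(marks[m][0] for m in members)
        avg_ratio = mean(marks[m][1] for m in members)
        for m in members:
            result[m] = classify(*marks[m], avg_sales, avg_ratio)
    return result


# Assumed default: every mark is its own partition, so nothing beats its own average.
each_mark_alone = segment(MARKS, lambda name: name)

# Computing across subcategories: one partition that holds every mark.
across_subcategories = segment(MARKS, lambda name: "view")
```

In the first case every strict comparison is false and every mark lands in the catch-all, which matches the all-blue result seen before changing compute using.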
Maybe you want to make a set out of these guys. Oh also, let me point this out. Once I filter to those two, notice it redid the calculation. It’s because our window average has changed. So be careful with that, because table calculations are going to be computed just on the data that’s in the view.
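Here is a small Python sketch of that recalculation, using hypothetical sales figures and a hypothetical `above_average` helper. It assumes the window average is the plain mean of whatever marks remain in the view.

```python
from statistics import mean

# Hypothetical sales per subcategory.
SALES = {
    "Phones": 330_000.0,
    "Chairs": 328_000.0,
    "Labels": 12_000.0,
    "Fasteners": 3_000.0,
}


def above_average(values):
    """Flag each mark as above or below the average of the marks passed in."""
    avg = mean(values.values())
    return {name: value > avg for name, value in values.items()}


before = above_average(SALES)

# "Keep only" two marks: the average is now computed over just these two.
kept = {name: SALES[name] for name in ("Phones", "Chairs")}
after = above_average(kept)
```

Chairs is above average among all four marks but below average once only the top two remain, so its segment would flip after the filter.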
But the reason I was filtering this down is, I know that these are my highest selling, most profitable subcategories. So maybe I want to make a set out of these guys. I could filter it down. Now that it’s here, I can hover over and create a set out of them. So notice those are the subcategories in my set.
And we’ll just call these our best subcategories. Click OK. We’ve now got a new area on the Data pane that includes my segment of my best subcategories. I can export those names if I wanted to, treat those customers differently. I might want to see what’s different about those customers in the context of everybody else.
Now that I’ve got that in a set, I could use it as a highlighter, a dimension, a filter. All kinds of applications that we could get out of this. But it was all made possible by leveraging this practical application of a calculated field to improve our use of a scatter plot.
This has been Ryan with Playfair Data TV – thanks for watching!