3 Ways to Visualize Outliers in Tableau 3 Ways to Visualize Outliers in Tableau In this video, Ethan will show you three ways to visualize outliers in your dataset in Tableau: using Standard Deviations, Median with Quartiles, and Z-Score tests. You will also learn when and how to apply these techniques.

3 Ways to Visualize Outliers in Tableau

3 Ways to Visualize Outliers in Tableau

In this video, Ethan will show you three ways to visualize outliers in your dataset in Tableau: using Standard Deviations, Median with Quartiles, and Z-Score tests. You will also learn when and how to apply these techniques.

Hi, this is Ethan with Playfair+, and in today’s video I’m gonna be showing you three ways that you can visualize outliers within your dataset. I’ll be using Tableau to walk us through these different techniques, but just know that these techniques can be applied in any other data visualization tool.

The important thing here is to know how to apply them and when. Let’s get into Tableau. At the end of today’s video, you should be able to recreate this exact workbook. This is a supplemental workbook if you wanted to download it and follow along or try to re-engineer it from there, go to Tableau Public, Playfair Data’s, Tableau Public, and you can find it there within our portfolio.

You can see here I’ve implemented three different types of outlier detection. So we have Standard Deviations, Median with Quartiles, and I’ve implemented Z-score tests here. So let’s start with our standard deviation and let’s just first look at this so we can recreate it here.

But you can see I’ve incorporated some banding that’s plus or minus one standard deviation, another set of distribution banding, that is plus or minus two standard deviations. And then I’ve outlined that on top of a dual-axis combination chart, which is a line chart with circle marks on top of the kind of intersection of each points on the line.

So that way I can add in some conditional formatting so you can visually see the data points that you might need to focus in on for your analyses. That’s what we’re gonna try to rebuild here. So let’s open up a sheet and let’s get that going. So I’ve connected to the Sample Superstore dataset for this tutorial.

And first I’m gonna add Order Date and we’re gonna build our dual-axis combination chart. So I’ll add that to the Column shelf. I’ll choose month of Order Date, continuous, and then I’ll add Profit onto the row shelf That gives us a line chart. Now I need to create kind of that dual axes combination chart with circles.

So I’m gonna add Profit again to the Rows shelf here, and then I’m going to right-click on sum of Profit in the Rows shelf and choose Dual Axis. Couple other things I need to do, I’m gonna right-click on the axiis here on the right and select Synchronize Axis. And then I’m going to change this Mark type to Circle, and you can see it’s gonna layer them there.

Now this is the foundation for all three of these charts we’re gonna build. So before I get started, I’m just gonna duplicate these or duplicate this chart. So that’ll give us our foundation. I’ll do that by just right-clicking down here on the sheet and hitting duplicate. All right, let’s hop back to the first one and let me rename this sheet.

I’m just gonna call it Standard Deviation. Okay. Now we have our foundation set up, so I’m going to go ahead and start building out the outlier detection. So I’m gonna toggle to the Analytics Pane and we’re gonna add a distribution band onto our view. So I’ll drag distribution band to table. And our little editor window will pop up.

This is where I can change this computation. You know, right now it’s looking at percentiles. I want it, or excuse me, percentages. I want it to look at, standard deviations. So I’ll select that dropdown on value and choose standard deviation from the menu. And on the right side, you can see this factors section.

This is where we can actually define, how many standard deviations from the average. So for this first banding, I actually want that to be plus or minus two standard deviations. You can see that kind of happening in the background there, and it kind of goes out. The one thing I do want to do is add some lines so we can see kind of some boldness here, where we’re kind of drawing a line as far as what’s an outlier and what’s not.

And that’s our first band. So I’ll select OK. On that editor, and I’m gonna do the same thing. I’m gonna add a distribution band to table in our editor. I’m gonna drop down the value dropdown from computation. I’m gonna choose standard deviation. And for this I do want it, plus or minus one standard deviation.

So I’ll leave the factors as is. And a couple other things just to format this a bit, I’m gonna change this fill to a little bit lighter color. So you can see now we can have, or we have two separate bandings. We can kind of see the difference. And then I’m also gonna add in a line here, again, just drawing a line, making it a little bit more bold.

That’s essentially it. So now we can very quickly see, just with the visual itself, some outliers here that fall above two standard deviations. And since we’re looking at profit, these would be kind of good anomalies to have. And then we can see one that kind of fell below that two standard deviation mark.

Everything else you can kind of consider expected values, plus or minus one standard deviation from the average. You may use these in your analysis. So I am gonna highlight those in our conditional formatting, but that is the foundation of the standard deviation technique here. Now let’s add in that conditional formatting.

So I’ve already created the logic here for us. And let me zoom into that so you can see it. And again, if you are following along, you can download the supplemental workbook. The link to that is actually just below this video, and you can copy and paste this directly out of the workbook. But if you’re following along and you want to type it out, this is essentially the logic.

It’s the sum of Profit. If it’s less than the window average minus the standard deviation of Profit, then we’re calling that a bad anomaly. The reverse of that, so I’m looking at greater than those values, I’m gonna call that good anomaly. Everything else I’m capturing in this else statement. So anything in between that, I’m just saying it’s expected.

So I’ll click OK. And I’m just gonna add that into my view here on the Circle Marks. So I’ll drop that right into the Color property of the Circle Marks. And you can see that come to life here, where we have kind of these bad anomalies here at the bottom and good anomalies across the top. And that’s our first technique.

The next technique I’m gonna apply is this Median with quartiles. Sounds difficult, but if you’ve ever used a box plot to detect outliers, this is essentially the same technique. We are just going to apply different formatting to make this easier and a little bit more legible and clearer.

Let me show you the difference. So if I drop box plot onto our view here, Tableau does this really ugly box plot here. You can see there’s one outlier for this technique. Here is that interquartile range, if you will. But this, it doesn’t draw the lines all the way across. There’s just a few things here I really don’t like.

I don’t like the presentation of it, I wouldn’t put it in front of a client. I just don’t feel like it delivers on what it’s supposed to. So let me clear that. And we’re going to essentially rebuild that box plot, utilizing different techniques directly from the Analytics Pane. So the first one for that interquartile range, we’re just gonna add Median with Quartiles to our view.

You can see that draws a band or distribution band. I’m going to edit that and just like before, I’m gonna add this lighter fill and I’ll add a line to create some boldness here. So you can see I have the median and then these quartiles, the upper and lower quartile ranges, just like we would in a box plot.

That’s our interquartile range. Now I’m gonna draw some distribution bands. And for this, that box and whisker plot, it actually does utilize a technique with standard deviations. And the upper and lower whisker is actually 2.698 standard deviations from the average here. So I’ll add that to our view and I’ll just change this color a little bit, make it a little darker and I’m gonna add some lines here.

And I can already see, there’s one thing that I did out of order here, and you might run across this yourself, so it’s good for me to show you. I’m glad I ran into this. I added this distribution band first. So just like anything within Tableau, it’s layered behind the second distribution band I added.

That’s why that formatting went away, that color that we had in the middle. To account for this, I can simply right-click on one of the lines on the first distribution band. Just choose the order here. I’m gonna move it to the front and you can see it fixes that, and that’s essentially this technique.

So if I apply that box plot view, you can see that box and whisker plot appears. There is one tricky thing with box and whisker plots. The actual logic behind them. It chooses the maximum point within your view, and that’s where it stops the upper and lower whiskers. With the way we’ve organized it, it actually goes all the way to the maximum range of that upper and lower whisker.

So you can see, I actually think that’s an additional point. Because we can see this particular outlier here is right on the border. But it is falling outside that range. So it’s something we need to drill into. But overall you can see how much better that looks than the box and whisker plot as far as presenting it out.

And let me rename this sheet and call that Median with Quartiles.

Now, the last technique I’ll show you is utilizing Z-score tests. And a Z-score test is essentially, it’s scoring and assigning each mark in the view a score based on how far away from the average that particular mark is. And we can quantify that and draw out what our outliers are. So for this, I have created our Z-score test logic here.

Let me preview that with you. And again, you can copy and paste this out of the supplemental workbook. Or if you’re following along, you can simply just type in this logic here. But it’s looking at the sum of Profit minus the window average divided by the standard deviation, and that’s gonna give us, excuse me, the window standard deviation. That is important, window standard deviation.

And what that’s gonna do is it’s going to assign a value. So if I drop this onto Tooltip, the Tooltip property on the Circle Marks, you can see now if I hover, it’s assigned a Z-score to every single mark within our view. Now Z-scores, they kind of oscillate around an average line. So let me add an average line.

I’ll just toggle the Analytics Pane again and drop this into our view. Here’s our average. And if you notice, if I go to any point below the average, the Z-score test, that, Z-score test value is negative. And you can see the farther away from the average, the more negative it becomes. So this point is very close.

So it’s -0.075 is its Z-score. This one is pretty far away. It’s -1.6. Same thing above. So anything above that average line has a positive Z-score value assigned to it. And just like before, if it’s close to the Z or the average line, it’s gonna be closer to zero. And as we move further away, that Z-score test is gonna get longer, excuse me, larger and larger.

If we had a value that was exactly the average, the Z-score would actually be zero. Because it’s the exact same value. So that’s how that works. Now the last thing I’m gonna do is add a distribution band here. Just like before. I’ll drop that to table. I’m gonna change the value to standard deviation and I’m gonna do plus or minus two standard deviations again.

I’ll just enhance this view by bolding out those lines, and I’m gonna fill it in with kind of a lighter color, the entire view here, and I’ll click OK. So now again, you can very clearly see we have these three outliers here and this one here. Now I’m gonna add in some conditional formatting to this particular model, and I have created the logic for us yet again, let me zoom in here.

It’s very similar to the standard deviation custom formatting except we’re dividing by the window standard deviation. And we’re comparing that to our Z-score. So if it’s a Z-score that’s greater than two, we’re gonna say a good anomaly. A Z-score, that’s less than two, we’re gonna say a bad anomaly.

Click OK here, and I’m just gonna drop that onto our Color property. Again, you can kind of see that come to life here.

Now that is the three techniques on three different ways that you can visualize outliers within Tableau. Again, I’m gonna toggle to our supplemental workbook here, and you can see those exact same techniques applied across here,

and you should be well equipped now to create that same logic in any other tool. So I just walked you through those three techniques that you could use to visualize outliers. Again, you can apply those techniques in any other tools now that you kind of understand the logic. You’ll just need to swap out some functions and certain things like that.

The syntax is gonna be a little different, but the ultimate technique is the same. This has been Ethan with Playfair+. Thank you for watching.