Ethan Lang
In part two of our series on the Analytics pane in Tableau, Ethan walks us through the model and custom sections. Learn how to enhance your data visualizations with analytics by simply dragging and dropping them onto the view!
Hey, everyone. This is Ethan Lang with Playfair+. And in today’s video, I’ll be covering part 2 of my series on the Analytics Pane. So in today’s video, I’ll be covering the Modeling section as well as the Custom section of the Analytics Pane. So let’s jump in, and we’ll start with modeling.
And in this section, I will hop back over and jump over to average line with 95% confidence. We hop into this sheet. And you can see here, I’ve already prebuilt that average line, and it draws a 95% confidence interval around the average line.
Now, just to demonstrate this, again, very similar to the average and median with quartiles from the Summarize section. If I drag the average with 95% confidence interval, we’re left with the same different aggregations. So I can aggregate it at the table level, pane, or cell.
Once I have the average line here, it’s going to draw that average line. Again, very similar to the Summarize section, we can see that average line drawn here. But it’s going to wrap it in a 95% confidence interval that shows me the upper and lower bounds. What this means is as the data comes in to the view, or as we start getting more data, that average line is going to fall in between that upper and lower bound 95% of the time.
If we had an outlier, for instance, that came into the data that bumped it out of that 95% confidence interval, which will happen occasionally– that’s why it’s a 95% confidence interval– it will fall outside of that bound. But that gives us a really good gauge on where this average line is going to fall moving forward as we start working or getting more data. Again, a great way to benchmark.
Now, if I right-click on this and select Edit, I can see I’m presented with very similar options, but I also have some new ones here. So you can see, by default, this one selects line and confidence interval, which we covered previously when I was talking through the average line. However, I can also, now that that’s selected, I can change the confidence interval. So by default, it goes to 95% confidence interval, but I can change that to make it more confident by bumping it up 99%, 99.5%, 99.9%, or less confident– 90%, 80%, 50%.
And really, this is up to you and what you’re analyzing. If you want to be more confident within this data and the analysis, obviously bump it up a little bit. It’s just going to make the confidence interval a little bit wider. If you can be less confident, you can drop it down.
And that’s really just up to you. This 95% is primarily just an arbitrary number that statisticians have put out there. However, you can change it. It’s not set in stone. So they give you those options there.
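To make the effect of the confidence level concrete, here is a minimal Python sketch of an average with a confidence interval around it. It assumes a normal approximation (a z critical value) and made-up sales numbers; the transcript does not say exactly how Tableau computes its interval, so treat the function name and the method as illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, average, upper) using a normal approximation (an assumption)."""
    n = len(values)
    avg = mean(values)
    std_err = stdev(values) / sqrt(n)  # sample standard deviation / sqrt(n)
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return avg - z * std_err, avg, avg + z * std_err

sales = [120, 135, 150, 110, 160, 145, 130, 155, 125, 140]  # made-up values
for level in (0.80, 0.95, 0.99):
    low, avg, high = mean_confidence_interval(sales, level)
    print(f"{level:.0%}: {low:.1f} to {high:.1f} (average {avg:.1f})")
```

Running it for 80%, 95%, and 99% shows the band widening as the confidence level goes up, which is the trade-off to keep in mind when changing that dropdown.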
Same as before, our Formatting section here, we have our line. We can do a fill above or below that line and all of those same options that we covered previously in the average line section. Now I’ll drag in median with 95 confidence, and I’ll drop that onto the table distribution here. And we’ll see it adds it in here, and it’s very similar to what we saw prior.
This is going to draw that median line. And now we have the median with 95% confidence interval.
The cool thing, again, adding both of these simultaneously, it gives us these unique bands where we can start making more assumptions. So we can see that the average and median, they intersect. Their confidence intervals intersect within this area here. So that might lead to understanding more about your data distribution, understanding more about outliers potentially that might fall within your data, and maybe driving certain analyses that you can make from that band.
Now, the next modeling that we’re going to cover is the trend line. So I’m going to jump back over to the primary view, and I’m going to update this view to Trend Line. And I’ll click into the sheet.
We can see here, I’ve added a trend or a trend line to this view. And this trend line is more than just showing a trend of the data. It is actually a model, so it’s running a regression.
And if we hover over, we can see some of the statistical summary that is built from this regression. So we can see what the actual equation is there. We can see some of the statistical summary values like R squared and the P value to see if it’s statistically significant or not.
And if we right-click into this trend line, we can describe the trend model. And what this is going to do, for people that are more statistically savvy, is show us all of those summary statistics. So now we can start getting into what are the degrees of freedom, what are the coefficients of this model, the standard errors, T values, P values. We can see all of that directly from this Describe model. I’ll close that here.
And I’m actually going to clear this trend line from the view. And I can do that by right-clicking on it and then just unchecking this Show Trend Line. And the reason I want to do that is because there are different models that you can incorporate. So if I grab Trend Line from the Analytics Pane here and drag it into the view, at the upper top left, we’ll see these options appear. I can add in a linear model, logarithmic, exponential, polynomial, and power.
And what I love about this, Tableau makes this really easy, even for people that are not statistically savvy. We can see here that they’ve given us a picture of different shapes of data and maybe what trend lines would perform better depending on the shape of your data. So we can see if our trend line follows our data starting at the bottom left and moving up to the upper right.
Maybe we should use a linear regression to understand that data. If our data goes ups and downs, has some seasonality to it, let’s try polynomial. If it goes starting at the bottom and starts shooting up, try exponential. Again, these are just suggestions. And what I would always recommend is just maybe plop on a few and see what works best for your data.
For those of you that are more statistically savvy, obviously we want to check those summary statistics, see which ones are the most statistically significant and fit our data better. But we also don’t want to start making inappropriate assumptions either. So I’m going to drop this onto polynomial, which was where it was prior. And again, we can view all those summary statistics from here.
Now, just to show you guys the difference, I’ll also drop in a linear line. And we can see it updates our trend line to now this straight linear line or trend. Again, if we hover over, we can start viewing those summary statistics, just like before.
I can describe that model. And it’s going to show me all of the summary statistics underneath that make up that model. Close that out. And that is our trend line.
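For anyone who wants to see what "plop on a few and see what works best" looks like outside of Tableau, here is a small Python sketch that fits a linear and an exponential trend to the same made-up data and compares R squared. It uses ordinary least squares, with the exponential model fitted on the natural log of the values, and R squared computed on the original scale so the two are comparable. Those are choices for this sketch, not a statement of how Tableau computes its summary statistics, and the function names are hypothetical.

```python
from math import exp, log

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(ys, predictions):
    """Share of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

months = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [10, 13, 18, 24, 33, 45, 60, 82]  # made-up values that curve upward

# Linear model: fit sales directly against the month index.
a, b = fit_line(months, sales)
linear_pred = [a + b * x for x in months]

# Exponential model: fit ln(sales) against month, then transform back.
log_a, log_b = fit_line(months, [log(y) for y in sales])
exp_pred = [exp(log_a + log_b * x) for x in months]

print(f"linear R^2:      {r_squared(sales, linear_pred):.3f}")
print(f"exponential R^2: {r_squared(sales, exp_pred):.3f}")
```

On data that starts low and shoots up like this, the exponential fit should score higher, which matches the guidance in the little pictures Tableau shows when you drag the trend line in.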
Now, I’m going to clear trend line from the view. And we’ll move on to our Forecasting model. So if I grab forecast from the Model section in the Analytics Pane and drag it over, we can see we only have one model– or excuse me– one option, which is to forecast our data forward. So if I drop forecast onto that option, we can see I now have something that was dropped into the color property on my Marks card, and it’s this forecasting model. It’s going to differentiate what is actuals– so our actual values versus our estimates, and this is what it’s estimating out in these periods moving into the future.
It’s also moved our– it’s created data essentially within our view. So before, my data ran through October ’21. And now you can see it’s extended that through December ’22.
Now, just like with all of the other models, if I right-click into here and I go to Forecast, I can go to Forecast Options. And this is where I can start changing the length. So if I wanted to maybe not forecast the next 13 months, but I can change it to, let’s say, the last six months– or excuse me– the next six months. I can select exactly 6. And then as you saw just a second ago from this dropdown, I can choose months, quarters, years, and so on.
I can also choose the source data. Here, I can change what it’s being aggregated by. So I can see it’s automatically by month. But I can change that to a different aggregation.
I can make it a little bit more precise by ignoring the last X months, and I can change those options from here. I can fill in missing values. And then my forecast model here, I can set it to Automatic or choose some of these different options. For now I’ll leave it the same.
And this actually gives us a little bit of insight on what’s going on underneath the hood of Tableau. Tableau uses a forecasting model called exponential smoothing, and you can see it calls that out right here. And then that’s what they use. If you wanted to do some more advanced statistical modeling, obviously we have external connections to R and Python that you can also bring in.
And then lastly, it allows us to change that confidence interval that’s wrapped around the estimates. It also gives us this really nice brief of what’s being done here. So it says, currently using source data from January 2018 to November ’21 to create the forecast through May ’22, looking for potential seasonal patterns every 12 months.
So this gives you, in plain English, what the model is doing, the source data that it’s using, what’s being used to create that forecast. So if you were trying to explain this, this gives you a great place to start. So you can start communicating to your stakeholders how you were able to build out this forecast. For now I’ll select OK.
And then the last thing I want to show you guys on the forecast is if I hover back over that, we just covered forecast options. But again, I can describe the forecast. And this is going to show me those summary statistics, very similar to what we saw with the trend line.
So here I can see the seasonality effects, the highs, the lows. I can switch the tab. This is the Summary tab. I can switch it to Models, and I can start viewing some of those quality metrics, like AIC or the mean squared error. Hop back over.
So you can really start viewing all of your summary statistics from right here. And this is going to give you an insight on how that model is performing, so you can start making decisions on whether this is going to be accurate or not and going from there. For now I’m going to close this Describe forecast. So I’ll just click Close. And that is our Forecasting model.
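Since the forecast options call out exponential smoothing, here is a minimal Python sketch of the simplest form of that idea. It is simple exponential smoothing only, with no trend or seasonal component, a hand-picked smoothing weight (`alpha`), and made-up monthly values; the models Tableau actually selects are richer than this, so read it as an illustration of the concept rather than a reproduction.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    level = series[0]  # start from the first actual value
    levels = [level]
    for value in series[1:]:
        # New level = weighted blend of the latest actual and the old level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

def forecast(series, alpha, periods):
    """Flat forecast: repeat the last smoothed level for each future period."""
    last_level = simple_exponential_smoothing(series, alpha)[-1]
    return [last_level] * periods

monthly_sales = [200, 220, 210, 240, 235, 260]  # made-up actuals
estimates = forecast(monthly_sales, alpha=0.5, periods=6)
print(estimates)
```

A higher `alpha` leans more on the most recent actuals, while a lower one smooths more heavily over history. Adding trend and seasonality terms on top of this is what lets a model pick up patterns like the 12-month seasonality mentioned in the description.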
Now, the last one, I am going to hop back over to our view because I’ll need to change the sheet. But I’m going to select clustering, which is our last option within our analytics model section. And you can see here, I’ve already built in the cluster– and very similar to the way the forecasting model worked when I dragged the cluster onto this view. And I’ll just clear it for now.
So if I have a scatterplot or any kind of view that a cluster can be used on, this option here is going to go from a gray color to this darker color. If I drag cluster onto the view, we only have that one option, and that’s to add the clustering model into the view. And when I do so, we’ll be brought up with this menu. It’s going to ask me the variables I want to include within the clustering model. And then it’s going to ask me how many clusters I want.
Automatically it’s going to use its best judgment. But for now I’m going to select 4. And you can see here, that’s going to give me four distinct clusters. And that clustering model– again, very similar to the forecast– it’s added into the color property of your Marks card. So you can see its clusters here, and that’s actually the model being presented in the view visually.
Just like all of the other models in Tableau, if I right-click on clusters here, I can describe the cluster or edit it. If I describe the cluster, again, we’re presented with those summary statistics. So we can start gauging how well this cluster performs.
We can also view these values here– again, just gauging how well this cluster is performing and looking at those summary statistics. I’ll hop back over to Summary, and this gives us a summary diagnostic as well– so how many clusters, how many points are in the data, between group of sum of squares. All of those values, we can see directly from this Summary tab.
Now, if I close this, again, very visually we can see these clusters appear. And what I love about this is even for folks, again, that are non-statistically savvy, you can start making some assumptions about your data. For this particular analysis, you can see that building out these four clusters, these actually probably make sense just to the naked eye visually that we have low profit, low sales here, and then it works its way up to a medium range, a higher range, and then these up here would be considered almost outliers, if you will.
What’s great about clustering is it gives us a way to look at customers individually or products and cluster them together. So if we were trying to come up with maybe some sort of market-facing group or customer segmentation, how we’re going to market to these customers. Maybe it’s identifying which customers we need to pay attention to or which products we need to pay attention to improve them, or maybe we’re trying to make a decision on which products we want to continue working with, which products we want to maybe discontinue. So all of those are very useful analyses from bringing in clusters– I mean, just visually looking at the data and getting a grasp for what you have.
So that covers clusters– actually, one more thing. The model that’s being used underneath the hood here is a model called k-means. That’s what Tableau has built in using the clustering. If you wanted to use a different clustering model, again, you can default back to your external connections, bringing in R or Python code within Tableau and visualizing the data, what it returns.
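To give a feel for what k-means is doing under the hood, here is a short pure-Python sketch of the basic algorithm (often called Lloyd's algorithm) on made-up sales and profit pairs. The starting centers, the lack of any scaling, and the function name are all assumptions for this sketch; it is not a copy of Tableau's implementation.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Basic k-means on a list of (x, y) tuples; returns (labels, centers)."""
    # Deterministic start (an assumption): spread centers across the sorted points.
    ordered = sorted(points)
    step = len(ordered) // k
    centers = [ordered[i * step] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

# Made-up (sales, profit) pairs: low, medium, higher, and a far-out group.
points = [(1, 1), (2, 1), (1, 2), (10, 10), (11, 10), (10, 11),
          (20, 20), (21, 20), (20, 21), (40, 45), (42, 44), (41, 46)]
labels, centers = k_means(points, k=4)
print(labels)
print(centers)
```

Each point ends up with a cluster label, which is the same role the Clusters pill plays when it lands on the color property of the Marks card.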
All right, I’m going to hop back over to our workbook, and I’m going to select our Custom options. And within this tab– let me jump in here– the custom options or the section here is very similar to what we’d find in Summarize. But notice when I bring in the reference line. So a reference line prior was very similar to the constant line. If you look at the little icon, it even is marked the same.
However, notice when I bring in our reference line here, I’m presented with more options. So I can add that reference line to the table, the pane, the cell. I can add it to individual axes in each of those aggregations. So we can see that the Reference line, it gives us a little bit more options and flexibility than the Constant line does.
So for this demonstration, I’m just going to drop this onto table, and it’s going to add a reference line to both my x and y-axes here. Now, it also pops up with this menu here. Again, we’re given the scope. We can change it to per pane, per cell, the entire table, whatever we’re trying to aggregate against to calculate out some of these averages or find the minimums.
It allows us to choose the value. So I can select month of order date or create new parameter in this particular situation. From this dropdown, I can select Constant, Minimum, Maximum, et cetera. And again, I can change the label, the tooltip, even format the lines– all of that directly from this here. So that’s our Reference Line options.
Now, we also have here in our custom some new options that we haven’t seen yet, one being the Reference Band, another being a Distribution Band. I currently have a distribution band on the view. For now I’m going to clear that out. So I’m just going to remove this and cover it from scratch.
So I’ll start here with distribution band. If I drag this on to the view, again, we’re presented with these options on the scope or at what level of detail do we want to aggregate this at. If I drop it on the table, it’s going to draw that across the table.
And you can see here, we now have new options we haven’t seen yet. One is this computation, and it allows us to select either a value– so we can do that with percentages. This is the 60% and 80% of the average. So if we think about where that average line was, this is 80% of the average and 60% of the average.
I can also change this to percentiles– 95%, 90%, and so on. I can change it to quartiles, so I can see the upper and lower with the median. But one of my favorites is the standard deviation. We can change it to the standard deviation, and now we can start looking for outliers within our data.
For now I’m just going to use Sample as our default. And I’m just going to leave the factors the same. This is looking for plus or minus 1 standard deviation.
So I’ll close this menu. And now we can see it’s added in an upper standard deviation and lower bounds within this distribution band. Let me remove that for now. Again, using the standard deviation’s one of my favorite things to do with distribution band because it allows us to start analyzing if we have any outliers, and we can really start outlining them and highlighting them good or bad within our data using conditional formatting.
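Here is a small Python sketch of the two distribution band computations covered above: percentages of the average, and plus or minus one sample standard deviation, with the values outside the band pulled out as potential outliers. The numbers and function names are made up for illustration.

```python
from statistics import mean, stdev

def std_dev_band(values, factor=1.0):
    """Return (lower, upper) at the average minus/plus factor sample standard deviations."""
    avg = mean(values)
    spread = stdev(values)  # sample standard deviation, like the Sample option
    return avg - factor * spread, avg + factor * spread

def percent_of_average_band(values, low_pct=60, high_pct=80):
    """Return (lower, upper) as percentages of the average."""
    avg = mean(values)
    return avg * low_pct / 100, avg * high_pct / 100

monthly_sales = [48, 52, 50, 47, 53, 51, 49, 90]  # made-up; 90 is unusually high
lower, upper = std_dev_band(monthly_sales)
outside = [v for v in monthly_sales if v < lower or v > upper]
print(f"band: {lower:.1f} to {upper:.1f}, outside the band: {outside}")
print(percent_of_average_band(monthly_sales))
```

Anything that lands in `outside` is a candidate to highlight with conditional formatting, good or bad.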
Now, again, I’m going to go back to our menu here and drag in a reference band. And you can see we’re actually presented with slightly different options here. Our reference band, I can draw not only across the pane, table, or cell, I can drop it onto individual axes as well. So if I wanted to drop it on my x-axis or y-axis, I now have the ability to do that individually.
And for this demonstration, I’m just going to drop it on my y-axis here, Sum of Sales. And we can see here, it defaults and it gives us this menu. It defaults to the min and maximum. So it’s drawing a reference band between the minimum value on the view and the maximum value on the view.
We’re also presented in this menu different options that we haven’t seen before. One is Band From and then Band To. And from here, I can start making different selections. So I have Sum of Sales as my value. I can choose the minimum to band from. But for now let’s choose Constant, so we can see what some of these others look like.
So if I choose Constant Value, again, just like our constant line, it defaults to the minimum value on the view. But let’s change that to, say, 50,000, like we did in our first example. We can see immediately that it implements that, and it draws that band from 50,000 on up here.
Now, Band to, that’s the top of the band. And again, I can do very similar things here. Instead of using the maximum, let’s say I wanted to use the median.
Now we can see it bands from 50,000 to the median. I can change that to average. We’re going to see it’s very close to the average. I can change that to the sum, which is a sum total of sales.
But for now, let’s just leave it at the maximum. So again, this gives us the ability in this Custom section, really, to start making very custom benchmarking or adding custom reference lines to our view or bands. It gives us a lot more options than what we were presented with in the Summarize section.
The Summarize section is definitely easier to implement. It’s less clunky, if you will. So if you’re not familiar with some of these settings, adding a constant line is as simple as dragging it there, as opposed to a reference line, where you have a few more options, a little bit more play there. So if you want to start getting used to the Analytics Pane, definitely start at the Summarize section.
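The Band From and Band To choices boil down to resolving each edge of the band to a number, either a constant or an aggregation of the values on the view. Here is a minimal Python sketch of that logic with made-up sales figures; the option names mirror the menu, but the function names and the dictionary are hypothetical.

```python
from statistics import mean, median

# Aggregations a band edge could use; the keys mirror the menu options.
AGGREGATIONS = {
    "minimum": min,
    "maximum": max,
    "average": mean,
    "median": median,
    "sum": sum,
}

def resolve_edge(values, edge):
    """A number is treated as a constant; a string names an aggregation."""
    if isinstance(edge, (int, float)):
        return edge
    return AGGREGATIONS[edge](values)

def reference_band(values, band_from, band_to):
    """Return the (from, to) values the band would be drawn between."""
    return resolve_edge(values, band_from), resolve_edge(values, band_to)

sales = [20_000, 35_000, 50_000, 65_000, 80_000, 120_000]  # made-up values
print(reference_band(sales, "minimum", "maximum"))  # the default described above
print(reference_band(sales, 50_000, "median"))
print(reference_band(sales, 50_000, "maximum"))
```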
The Custom section– again, you’re not going to hurt anything on the view. If you get lost, you can always just right-click and remove and just remove that from the view and start over. So don’t be scared to jump in there and play around with it. But really, that’s what this Custom section is all about. It’s just going to give you a little bit more options than what you were presented with up here, and you can customize to your analysis or to the story that you’re trying to tell using these reference bands, distribution bands, and lines.
Last, we’ll see we’re presented with a box plot option. Again, I won’t cover this in detail. It’s very similar to this. It’s going to simply add in a box plot, and it gives you a little bit more customization to your box plot.
So that is the Analytics Pane. Again, if you have any questions or want to follow through, this workbook is available on Playfair Data’s Tableau Public. So feel free to download it and play around with it. This has been Ethan Lang with Playfair+, and that concludes part 2, which was our final part of our Analytics Pane series. Thank you for watching, and catch you next time.