Ryan Sleeper
There can be value in viewing both actual data and normalized data. In this video, Ryan shows you how to combine parameters, calculated fields, and a secondary data source to toggle normalization by population on and off.
Hi, this is Ryan with Playfair Data TV. In this video, I’m going to show you not only how to normalize data in Tableau but how to add a toggle that allows your users to toggle that normalized data on and off. First, let me show you why this is needed over here in Tableau Desktop.
Let’s say we’ve got this US map looking at SUM of sales by state. Well, if we’re using any metric where we’re just adding it up, there are certain states that are going to generally have a better performance just because they have a larger population; states like California, New York, Texas, Florida.
These high-populous states simply have more people to buy stuff. It doesn’t mean necessarily that they are performing better. So what I’m going to do is normalize that data to divide the SUM of sales by the population to see how much we’re actually selling per person.
To create that normalized data, I’ve got a second dataset with two columns in it. The first column is the name of the state so that I can do a join to the Sample – Superstore dataset. The second column is its current population as of the last census.
I need to join the data source. And I’m currently only connecting to one thing. So I can simply click this Data Source tab to take me back to make some changes to that Orders table.
The first thing I need to do is add a second connection. I’m going to click Add. This is a Microsoft Excel file. So I will click Microsoft Excel. And the name of the file is “Population by State.” And I will click Open.
Tableau automatically created an inner join for me. Typically when I’m joining data sources in Tableau, I like to do a left join. The scenario is, typically, I’m working with the primary data source and realize that I need just a couple of extra columns to append to that first data source.
So instead of an inner join, which will keep only the matching records, I like to click on this Venn diagram icon and change that to Left. That will keep every single record from my original data source. And it will just append the population by state when there is a match. When there’s a name of a state on the right table, it joins to the name of the state on the left table.
There’s only two columns. That’s the only one that matches. So that’s it for my join. I’m now ready to go back to my Normalization Toggle sheet. And I’ve now created a joined data source that contains a new metric called “Population.”
Let’s do some quick quality checking, though, on a different sheet. I’m just starting a new sheet. And double-click on Population. And the answer that we get is 171 billion.
There’s a little over 300 million people in the US, so that number is quite inflated. This is a giant pitfall whenever you are joining data sources in Tableau. It can create duplicate records.
Because we only had one row per state in the Population data source but multiple rows per state in the Sample – Superstore dataset and Tableau can only do the join on that one dimension, it’s essentially adding them up over and over again. If I drag State onto the view, you will see the issue illustrated a little bit better.
Maybe I’ll just quickly turn this into a table and sort this in descending order. So this is telling us that there are 74 billion people in California. And that is a little bit inflated.
If I double-click on the Number of Records, what it’s showing me here is it’s taking that population in the state of California, and it’s essentially multiplying it 2,001 times because California appears that many times in the Sample – Superstore dataset. I show you on another video how to deduplicate joined rows in Tableau. But I’ll quickly show you a shortcut here to do it.
First, if I know that this measure is going to be used in the calculated field, I can right-click directly on it, hover over Create, and click Calculated Field. That saves me a few keystrokes because it puts the Population right there into the calculated field dialog for me.
And I’ll call this “Deduplicated Population.” And the formula is a level of detail expression, so it starts with an open curly bracket. I definitely encourage you to watch the videos on level of detail expressions here at Playfair Data TV if this is not familiar to you.
But this will be a FIXED level of detail expression at the State level. And we’re going to take the minimum. This just deduplicated those rows.
If we have 2,001 rows for California, it’s going to take the smallest number. So it essentially just turns those 2,001 records into one record. And we should end up with the correct answer.
These formulas, then, always conclude with a closed curly bracket. And I will click OK. And double-click on “Deduplicated Population.” And now we’re getting something that’s much more realistic, of 37 million. Let me remove State and see if we wind up with that number, just over 300 million. And we do.
So now I feel a lot more comfortable with this deduplicated population. So first of all, we’ve just deduplicated the rows. The next thing I’m going to show you is how to create a calculated field that normalizes that Sales amount.
So back to my Normalization Toggle sheet with the map on it. And this time, I’m going to create a formula. And I’ll call this “Sales Per Person.” And the formula would be SUM of my Deduplicated Population, just to get that in the box. That’s actually got to be my denominator.
So SUM of Sales was my metric. That’s my numerator, divided by SUM of Deduplicated Population. So that is Sales Per Person. We’ve just normalized that Sales amount.
If you want to see how this is doing, I could replace SUM of Sales with Sales Per Person. I’ll click OK and replace what’s on the Color Marks Card. And now we see a very different answer. California is not leading the way anymore.
I can also see from here that new normalized Sales Per Person is getting automatic formatting. Anytime you create a calculated field, I suggest you right-click on it and hover over Default Properties and set its number formatting. That way, as you start to use it, it will show up with whatever currency you set in this box.
For this one, it should probably be a currency still. I’ll leave it at two decimal places and click OK and see how we’re looking. So pretty small range there because it’s only two decimal places.
If I look at the color legend, this is going from 0 to $0.03 per person. So you might want to bump that up to three or four decimal places– but not necessarily the point of this video. So I’m going to keep moving for now.
We just normalized the data. The second thing I’m going to show you how to do is how to create a toggle that allows the user to decide if they want to see that SUM of Sales amount or the normalized value, Sales Per Person.
It starts by creating a parameter. So the way that I like to create a parameter is to right-click in any blank space on the Data pane and click Create Parameter. And I will call this “Normalization Toggle.”
And this is only going to have two outcomes. It will allow them to turn the normalization on or off. So I will change the data type to be Boolean because that’s binary.
Another thing I like about this is I can set the names of the aliases. So I could say “On” for true and “Off” for false. I’m just controlling the aliases of those binary outcomes.
That’s the entire parameter, so I’ll click OK. That parameter is going to do almost nothing on its own. I need to integrate it in a calculated field to give Tableau instructions for what to do with each of those two outcomes.
So the next thing I’ll do is create a calculated field. And I’ll call this my “Normalization Toggle Values.” You can call this whatever you want, whatever helps you find it.
This formula can be written in a couple of ways. But I’m going to use CASE WHEN logic because we’re referring to the same parameter for both of these outcomes. So whenever that is the case, I find CASE WHEN logic to be fairly elegant.
So I’ll just write “CASE [Normalization Toggle]”. And then on each line, I’ll write one WHEN statement, one for each of those two scenarios. So when that toggle is turned to True, then I want Tableau to show the Sales Per Person.
Whenever I’m typing out CASE WHEN logic, I always like to type the first line of logic. And then these always conclude with the word “END.” So I go ahead and type “END” just to make sure I’m on the right track.
That way I only have to troubleshoot one line of code if there’s a mistake, rather than typing out, say, 10 lines of code, getting to the end, finding there is a mistake. Well, then I have to troubleshoot 10 lines of code. So now that I know I’m on the right track, I’ll go type in the other scenario, which was when it is false, then I want Tableau to show the traditional, or the original, SUM of Sales value.
You’re seeing what appears to be a mix of nonaggregates and aggregates. But remember that Sales Per Person field was a calculated field where we already did the aggregation within the underlying formula. It’s SUM of Sales divided by SUM of Deduplicated Population. That’s why we’re not seeing an error message.
That’s the whole formula. I’m going to click OK. And I’m going to replace Sales Per Person with my newly created Normalization Toggle Values. I’ll drop that on Color.
We shouldn’t see anything change because we essentially just replaced Sales Per Person with Sales Per Person. But now if I show the toggle– here’s my parameter control, showing the Normalization Toggle– when it’s on, it shows Sales Per Person. When it’s off, it reverts back to the original SUM of Sales.
And now if I go to Presentation mode, let’s say I publish this up to Tableau Server, my user can decide for themselves. Do they want to see the original value? Or do they want to see the normalized value?
This has been Ryan with Playfair Data TV – thanks for watching!