This video shows you how to create new fields from existing fields with the first of three categories of Tableau calculated fields. You’ll learn how to create aggregate and row-level calculated fields and some common pitfalls to avoid while writing them.
Hi, this is Ryan with Playfair Data TV, and in this video, I’m going to be providing an introduction to calculated fields in Tableau. More specifically, the focus of this video is going to be the first of three different types of calculated fields in Tableau, which are called basic calculated fields. Basic calculated fields include both aggregate calculated fields and row-level calculated fields.
The other two types of calculated fields can be found on other videos. Those are table calculations and level of detail calculations. But to begin, let’s start with the most common type of calculated field that you will most likely be implementing when you’re using Tableau Desktop, and that is an aggregate, basic calculated field.
I’m going to throw out just one real life scenario. Let’s say we have this two-row line graph showing sales by month on the top, profit by month on the bottom. If this were a real world scenario, that second row would not be providing much value to me, because I would expect it to follow a very similar trend to the first row. If I am selling more per month, I most likely made a little bit more profit that month. If I have a huge drop in sales, there is most likely a corresponding drop in profit.
What I would much rather know and what I think would provide more value is if I knew of each sales dollar that I brought in, how much of that was profit. So in other words, what was the profit ratio. The formula for profit ratio is SUM of profit divided by SUM of sales. If I look here on the left side, I currently don’t have any measures called Profit Ratio. But that’s where calculated fields come in. We are going to compute that measure, that new calculated field, out of existing measures so that Tableau does the math for us and we’re able to compute a profit ratio.
There are several ways to start a calculated field in Tableau. My preferred way is to right-click in any blank space on the Data pane, and it’s the first option down, Create Calculated Field. You can also click in the down arrow in the top-right corner of the Dimensions area, the first choice is Create Calculated Field. You can also click Analysis in the top navigation, and the fourth option from the bottom is Create Calculated Field. Regardless of which one of those you click on, it will open an interface where you can type in the syntax for a calculation.
The first thing you need to do is give that calculated field a name. So I will call this Profit Ratio. And the formula is the aggregation S-U-M, open parentheses, the first measure. So at this point, we’re creating the numerator for this calculated field. Tableau is so flexible that there’s even multiple ways just to get the name of this measure into the box. My preferred method is to begin to type the name of the measure, and Tableau autofills for us, showing us what measures or fields begin with those letters. There’s Profit, I click on it, and it’s in the box. If it turns orange, that means it recognized it as a field in my data set.
The second way to add this measure is if you know exactly how it is spelled, you can just type the entire thing. And as long as it turns orange, you know that it recognized it as a field in your data set. You also can just drag the measure from the Measures area of the Data pane right into this box. And that orange arrow is showing us where it’s going to drop it within the syntax. I’m then going to close the parentheses, and there is our numerator.
What comes next is our operator. In this case, we are dividing the numerator by the denominator, so our operator is simply a divided by sign. Now we need to type out the denominator, which was SUM of Sales. So just like the numerator, aggregation, open parentheses, my preferred method is to begin to type the name of the field. Tableau autofills for me and I can select that field and then close parentheses.
That is the correct way to write a profit ratio calculated field, but a couple of notes on this. Spaces or hard breaks within a calculated field do not matter. So what we’re looking at on the screen now is the same thing as this, which is the same thing as this. That’s going to be a personal preference. For this particular formula, which is so short, I would most likely go with this option, where I’ve got spaces between the operators but it’s all on one line. You will find that you can write quite complex calculated fields in Tableau, so sometimes it does help to break up the logic or methodology on different lines within the code, which will make it easier for you to troubleshoot later on if something goes wrong. But that is the correct formula.
I’m going to show you how this works in practice, and then we’ll take a step back and I’ll show you several pitfalls when you’re writing aggregate calculated fields. But for now, we’re good, so I’ll click OK. And I’m going to add this as a third row. So I’m just going to drag that newly created pill. So note, this was not there before. But now that we’ve created a calculated field, you see a new measure with an equal sign preceding it. And I just drag that to the Rows Shelf, which created a third row.
By default, as you can see on the new y-axis, calculated fields get what’s called Automatic number formatting, which is essentially no number formatting. What I recommend you do is go ahead and set the default formatting. That way every time you use this calculated field, it starts out in the formatting that you use most often. For example, a profit ratio, it should most likely be expressed in terms of percentages. So I’ll go ahead and set the default formatting of that measure to be percentages by right-clicking on it, hovering over Default Properties and choosing Number Format. And I’ll choose Percentage. I’ll go ahead and leave it at two for now. Of course, you can choose whatever you like. After I click OK, you’ll see the y-axis change.
Here’s my biggest tip for you, perhaps across all of Playfair Data TV, which is whenever you’re making a calculated field, quality check the result. Make sure you trust that calculation before you add it and start analyzing it and/or adding it to reports. You’re going to see why this is so important in just a minute. But just remember to always quality check your calculated fields. And the way that I do that is I simply open a calculator on my machine and I do the math across a couple of these, just to spot check them.
So I’ll hover over January 2016, and I see that the numerator should be 2,450. I will then divide that by the denominator, which needs to be 14,237, and click Enter. Percentages would be this number multiplied by 100. So the number we are looking for in our newly created calculated field is 17.21 percent. And if I hover over January, sure enough, that is the result that we get.
Now that I trust that number, and this is what’s so nice about calculated fields in Tableau, I can use that measure on another view. Even if these individual elements of Profit and Sales aren’t anywhere on the view. So for example, I’ll start a new worksheet, double-click on Profit Ratio and break it down by the Category dimension.
And if you’ve ever looked at the Sample Superstore data set, you know that in terms of sales– actually, I’ll just throw that on the view as a second column here. In terms of sales, Office Supplies and Furniture are neck and neck. In fact, Furniture is a little bit higher than Office Supplies. But now with our newly created Profit Ratio calculated field, there is a new story emerging that shows us Furniture is actually way behind in terms of profit ratio. That’s a new insight that we did not have visibility into until we had this ability to create calculated fields.
Another thing that is really nice about calculated fields in Tableau is they’re reusable, and you only have to make them one time. So I’m thinking of Excel. We’re all familiar with Excel. So to put it in those terms, this would be very similar to if we had an Excel spreadsheet and in cell A1, we had the Profit value. In cell B1, we had the Sales value. Add in column C1, we typed equals A1 divided by B1 to compute the profit ratio.
The reason that this is so much nicer and scalable in Tableau as compared to Excel, is in Excel, we would have to then drag that formula into every single cell that we want it applied to. In Tableau, that’s not the case. We create it one time, and whether or not those individual elements are on the view, it works. We don’t have to worry about putting it in all these different correct places to get the result that we’re looking for.
But speaking of getting the result we’re looking for, let me cover a few potential error messages and pitfalls you might come across when writing these types of calculations. Once you’ve created a calculated field, and you see an equal sign preceding that measure– or dimension, you can also make calculated dimensions– you can edit those calculated fields by right-clicking on them and clicking the Edit button. So this is what we started with.
A couple of errors that you might come across. If you’re missing a parentheses somewhere. Instead of Tableau telling you that the calculation is valid, we have this red error message. It says our calculation contains errors. You can usually get a pretty good hint about what’s wrong by clicking on that red error message and reading what this is telling us. It says it expected a closing parentheses or comma while parsing argument list for sum. I look across this one row I’ve got, and sure enough, I forgot this parentheses. Once I put that back, we’re back to being valid.
You also need at least one operator in here. So if I deleted that divided by sign, we get another error message. This one’s different. It says syntax error. Maybe you are missing an operator. You’re right, I missed the divided by symbol between my numerator and denominator. But perhaps the most confusing error message and the one you’re most likely to come across the most is you cannot have a mix of aggregates and non aggregates.
So what I mean by that is if we aggregated the numerator as SUM, but we left off the aggregation for the denominator, we’ll get an error message because we’re mixing aggregates with non aggregates. Let’s read how Tableau describes that. It says we cannot mix aggregate and non aggregate arguments with this function. Once we add that back we will be valid once again.
So that is the correct way but by far the biggest pitfall is related to this concept of aggregation. I’m going to make another calculated version and I’ll call this Profit Ratio Wrong Way. And if I were to leave out aggregations altogether and just type Profit divided by Sales, this is a very important thing to note. Tableau is telling us this is valid. But just because Tableau tells us a calculation is valid, it does not mean that it is computing the correct answer. So again, back to my biggest tip that I’ve got for you. Just make sure you quality check any calculated field that you make, and you will catch any type of major error before sharing it at the office.
So I’m going to click OK and add this. Well, first to make it apples to apples, I’ll also change the default properties and make this a percentage. But just practicing what I preach, if this were real life, I would first quality check this formula by adding it as a fourth row. And we’ve already done the quality check to know that the answer is supposed to be 17.21 percent. But if I hover over the wrong version it came up with 1,015.8 percent. That is an extraordinarily high and unrealistic profit ratio. Profit ratio is a subset of sales. Per sales dollar, some of that money is going to be profit. We can’t have made 10 times as much profit on that sales value.
What’s happening is very subtle, but this has to do with the aggregation. We could have saw that something was wrong by looking at the pills on the Rows Shelf. When we did this the correct way, we see there is a new aggregation that says AGG. That’s Tableau telling us that that calculated field is being aggregated within the calculation. When we did it the wrong way, we’re back to the default aggregation, which is SUM. The reason Tableau told us that the calculation was valid is because it’s able to do the math correctly on each row. But what’s happening is, it’s doing Profit divided by Sales on 9,994 rows in the Sample Superstore data set. Then it gets to the bottom and it adds them all together. That’s why this says SUM, and that’s why it’s so highly inflated.
When it’s doing the aggregation within the calculated field– so I’m going to look at this the correct way one more time. It does this aggregation at the viz level of detail. So on this chart in the background, the viz level of detail is currently Month. That’s the most granular level of the analysis. It’s summing up the profit values for each month, creates our numerator. It then does the same thing per month for the sales values, and then it divides it by each other. So we’ve got January profit divided by January sales. And we get the correct answer. Moves on to the next month.
In the first case, it’s doing it per row and then just adding it all together and we get this very highly inflated number. So that’s a major pitfall that you can avoid simply by quality checking your calculated fields. Once you trust the calculated field, you can roll it out across your different reports.
The other type of basic calculated field is called a row-level calculation, because instead of aggregating or consolidating multiple rows into a single row, the second type of calculation does the number or text crunching per row in the data set. We’ll have a result for every single underlying row. A good example of this is with string calculations. We’re not adding up text, but you can do custom manipulations and segmentations on strings, and the result will be on each row in the data set.
There are lots of examples of this. So I’m just throwing out one random, real life example. We’ll say we want to create a new field that parses out the last name of each one of our customers. This one starts with the function SPLIT, open parentheses. Anytime you see something blue in a Tableau calculated field, you can click on it, and it will look it up in this little data dictionary on the right, give you a definition of what it’s doing, and how to use it in the syntax.
You can also browse all these different functions if you want to learn all the different options you have available to you. These are also categorical. So instead of having this list of several dozen options, I could narrow this down to just String types of functions, and I could browse through all of those. And again, this is just one random example. Because as you can see, there are many, many functions in here to learn.
If you don’t see this little data dictionary, you most likely see a calculated field dialog box that looks like this. To open the data dictionary, simply click this button to open that up. So we’re on SPLIT. What this does is I’m looking at the syntax here, is it looks at the string that happens before the first comma. So in this case, that will be our customer names. Then we type a comma. What comes next is the delimiter.
So if we’re trying to parse out the last name, theoretically, I need a space as a delimiter. So I’ll type tick mark, one space, and then comma. And then what comes next is the piece of that split that you want to keep. I’ll take the second piece of the Customer Name after the space. So I’ll type a two. So just to review this one more time before we go ahead and click the OK button and accept it. We’re using the Customer Name field, which was in our underlying data. We’re going to split that Customer Name by looking for this space delimiter, and we’re going to take the second piece of the Customer Name, or the first piece after that space delimiter.
I’ll click OK. We see a new calculated field. This time it’s up there as a dimension because it is qualitative and the results are static. This is a row-level calculated field, so we will get a result for each row in the data set. Just like I like to quality check my calculated fields that are built with measures, I also like the quality check my calculated fields that are built with dimensions or strings. But the way to do that is slightly different.
What I like to do is put the original dimension on the view first, followed by our newly created version, second. And just glancing down this list, that Last Name should parse out everything after the first space. So in real life, theoretically, if the person had three names, that might throw us off a little bit and pull in the middle name, occasionally. Yes, there’s ways to get around that, too. But I just wanted to show you one quick example of a row-level calculated field.
If I were to view the underlying data, we should see a new column called Last Name. And again, this is called row-level because per Customer Name, we’re seeing the last name get parsed out on every single row. That’s been basic calculated fields in Tableau.
This is Ryan with Playfair Data TV – thanks for watching!