Nick Cassara
This video will cover using basic techniques in Tableau Prep Builder to filter, aggregate and union a dataset. The goal of this video is to help new users get familiar with the offerings of Tableau Prep.
Hi, this is Nick with Playfair+, and today we’re gonna be going over kind of a quick start guide to Tableau Prep. This really is targeted as just providing a brief overview if you’re new or kind of just starting out. Awesome, and so once a Prep Builder is loaded, you’re gonna be greeted by this screen here. Now let’s just take a second to kind of review what’s going on here before I get in there and start building workflows.
At the top here, you’ll see this “Open a Flow”. So, if you have a Tableau Prep flow file saved locally on your computer, you can click this, it’ll pull up any ones that you might have saved. You can click that and hit “Open”, and Prep Builder will kind of load that automatically for you. Right next to that, we have “Connect to Data”. That’s gonna pull up kind of this pre-populated menu of different data types, whether it’s a database or a server or a file. They already have kind of pre-configured options for you to kind of connect your appropriate data. Underneath that, we have “Recent Flows”. Now, if you’ve used Prep Builder before, like me, these are kind of different files that I’ve worked on. So, there has kind of like a quick access menu here for me to quickly jump back into them. If you’ve never, this is your first time opening, you might see nothing here, which is expected. And below that, we’ve got “Sample Flows” provided by Tableau. And really just to help you get a feel for kind of how to perform manipulations and stuff like that, things you can build off of, if you will. So, without further ado, let’s go ahead and get connected to our data source, and then we can kind of kick this off. For this video, I’m gonna be using the Sample Superstore dataset provided by Tableau, but you can use any dataset you want. I’m really just trying to demonstrate concepts more than a specific workflow. Really just kind of want you to get a feel for the different things you can do. And you can feel free to take whatever I do in here and tweak it to your own use cases as necessary. So go ahead and hit “Connect to Data”. We’re gonna get that menu we talked about. And then there, the file I’m using is in a Microsoft Excel format.
And once I’ve gotten to the file here, I’m gonna go ahead and just click on it and hit “Open”. And from here, this can kind of populate this blank canvas here. So, kind of looking at the left here from top to bottom, we can see here we have this arrow, which is something I kind of want to cover before we get too far in. But if you see this and you see this “Connections” and this ‘+’, this pulls up the exact same menu as the “Connect to Data” button would. So, you can kind of go about this multiple different ways. Also, if you’ve already loaded data, you can hit this and load in another dataset as well, maybe two kinds of related or similar, but not the exact same tables that you might want to perform calculations on and ultimately join or union together. That’s kind of how you would go about doing that if that’s your end goal. And then from there, you can see under here it’ll list the different data connections that you have set up.
So right now, I just have the Sample Superstore Excel file loaded in here. And then under there, you can see the tables in that data. So right now, for this video, I’m gonna go ahead and just use the “Orders” table. But we also have the “People”, or the “Returns” can kind of work with it in any Excel file or any file you import with multiple tables in it, you will see the same kind of similar options here. So, we’re gonna go ahead and use “Orders” and just drag and drop that onto the canvas, easy enough.
And then I want to focus on this bottom right-hand section here. So, you can see here we’ve got working from left to right, the type column and little icons to kind of represent the different data types that our columns are. So, the hashtag being for integers or numbers, the ABC to represent a string, so kind of any text field we have. Little calendar icon, which I know it can be kind of hard to tell there, but that little calendar icon to represent date values. And then there the field name is just the column name and changes right now, which is blank, represents kind of any manipulation or transformation we might perform on the data itself. Let’s say we filtered something out. We didn’t want row ID two, that change would show here. And then lastly, the preview shows you exactly that, a preview of the values inside that different column there. So, give you kind of an idea of what the format of the data in there is gonna look like when you’re writing your filters and your calculated fields.
On the left, just real quick is just a path to where the file is saved. If you needed to reference that, that’s there as well. And if you wanted to change this input file, believe you could just hit browse and pick something else new that you wanted to. And connect that way. Now the first thing I wanna do is we’re gonna focus on an aggregation. There is quite a bit of data in here. And so, for this example, I’m really just gonna try to focus on a really kind of small subset of this to just demonstrate different techniques. So, what we’re gonna do is we’re going to take our regions, the sales regions included in this dataset, we’re gonna take the different categories of products that they have, and then we’re gonna take our sum of sales. And really what we’re gonna try to do is find the most profitable category in each of the different regions. We’re gonna split them out. We’re gonna kind of perform some formulas, some calculated fields. We’re gonna put everything back together and then we’re gonna create an output. That’s just kind of an overview of where we’re gonna go from now to the end of this video.
To do that, we’re gonna first start by creating our aggregation. And when you move your mouse, you can see here this ‘+’ and you click here, you kind of reveal this menu here. We’re gonna wanna go ahead and hit aggregate. That’s gonna create this new icon here on the table. And what we’re gonna wanna do is we’re gonna scroll through. So, on the left, we have all of our fields, just kind of anyone we wanna pull from. Looking at the left-hand side of the right side of the bottom pane here is grouped fields. On the right-hand, right-most side there is our aggregated fields. So, to kind of give you a demo of how this works, what we’re gonna do is we’re gonna take our region, we’re gonna hold and drag it onto the grouped fields.
That’s gonna first split out the four distinct regions that we have in this data set. Then we’re gonna wanna take our categories and we wanna take the unique categories in those four regions. And so, you can see here we have furniture, office supplies and technology in each of these four regions here. And then lastly, all I wanna do is I wanna take my sum of sales and I wanna drop that onto our aggregated fields, since that’s the field that we really need to aggregate up. And so, you can see here that’s gonna give us a sum by default and so it’ll give us a sum of all the total sales per category, per region. Sweet, so now that we’ve got that foundation set up, what I wanna do now is I kinda wanna break out each of the regions into four kinds of separate streams. And then from there, we can kind of work on each region individually, give you, talk a little bit about organization as well, and kind of show you how we might approach a workflow and a business problem using Tableau Prep. So, what we’re gonna do here, again, we’re gonna go to the ‘+’, but this time we’re going to add a clean step. Now, I know we could just go here and hit filter values. There is kind of always multiple ways to accomplish the same thing, but for organization’s sake and really just to kind of give you a visual demonstration, I’m gonna go ahead and kind of break steps out as necessary.
So here we’re gonna go ahead and add our clean step. We’re gonna go ahead and add filter values. And when you click on this, anyone who’s worked in Tableau Desktop before should kind of recognize this screen. Same one I think you get when you are creating a calculated field. So, calculation on the left, you can type in your formula manually and then on the right, kind of the trove of built-in functions that they have. You can kind of pull from it. I’m pretty sure if you click on one, it’ll kind of set up the syntax for you to fill in on your own. So, for this, we’re just doing a really basic filter. We’re gonna take our region and we’re just going to filter out the central region.
Remember for string data types, you want to surround our value in these quotes here. If it was a numeric data type, we wouldn’t really need the quotes. Actually, it would be mad at us if we used them. So, I’m gonna go ahead here and just filter out the central region. And you can see here at the bottom, that took effect and now we only have that region, only the categories in that region. And only the sales applicable to the categories in that region. On the vein of organization that I brought up earlier, when you’re building out a workflow, especially when you’re tackling more complex problems, and you might have more streams and you’re really isolating little bits of data to kind of perform different things and bring everything back together.
It can kind of resemble what I call spaghetti. It’s just gonna look like a mess, a bunch of lines everywhere, things going on. So, in the interest of not kind of getting lost in the systems that we build and the pipelines and the processes, I like to kind of organize my steps as I go through them, really kind of break down what’s going on. So then that way if I run into a problem, which we are gonna simulate in this video, it’s really easy to kind of trace back and kind of QA check yourself. Here, if you double click on the name, it’s gonna pull up this kind of rename menu option. You can just backspace that out. I’m just gonna name this the region that we’ve broken out here, which is central.
Now from here, what I wanna do is I wanna repeat this with the other three regions and I really just want to split them out, so I have four separate streams coming off of my aggregate step here. So, if I move my mouse and I come back here, you can see here, I get a ‘+’ sign kind of at different parts of the arrow.
And clicking at these different parts is gonna kind of do different things. So, I wanna kind of quickly go over that just to show you, so in case you’re working in here, what happens when you do that. So if I go here and I click the ‘+’, that’s right, kind of bordering this central pill, and I insert a clean step.
It’s gonna go ahead and break, like add a clean step in between the filter that I have going on and the aggregation. It’s essentially gonna add a link to the chain kind of in the middle and insert itself there, which is not what we wanna do in this case. We kind of wanna break out. So, if that is what you’re looking for, that is how you do that. But just in case that’s not what you wanna do, that’s what clicking there will do. So, we’re gonna go ahead and delete that. Deleting it will kind of shove the other links of the chain backwards, which is kind of nice.
Now, if I go ahead and click in the middle here, it’s gonna do the exact same thing. It’s gonna insert in the middle, it’s gonna kind of interrupt our chain and add an extra step in this already kind of built out flow, which is not what we wanna do either. So, you’re gonna wanna clear that out. If I want to create a separate stream that branches off, what I need to do is click the ‘+’ that’s closest to the step beforehand. And then if I add my clean step, as you can see here, that’s gonna add a completely different color-coded stream that breaks off separately that I can now kind of do things without kind of impacting this data going on over here. What we’re gonna do is the same kind of concept. We’re gonna pick our region that we want. I’m just gonna go down this list that’s going on over here. This time we’re gonna filter out East and we’re gonna save that.
And then I’m just going to rename this step East in case we happen to have any problems, I know exactly where I’m going to fix that. So, let’s go ahead here, add another clean step, we’re gonna filter and go to our region. And this time we want South. We’re gonna save that. Just double check to see South is there.
And let’s rename this to South. And then lastly, let’s add our final one in and we want West. So, let’s do region and then West and then save that out. Perfect, perfect. Okay, awesome. So now we have got our four regions kind of broken out into these four different streams coming off of our aggregation step. And so now we can kind of work on each of the regions. We’ve got all of our steps appropriately named so we know at this filter exactly what’s going on. From here, what I’d like to do is right now if you look at our data, we have our region, we have our category and we have our sales. Now, I know we only have three records here. So, if we wanted to figure out the most profitable, it probably would be a pretty quick kind of glance over the numbers. But let’s just pretend we have a bunch of data, and this is not easily interpretable and we’re kind of trying to figure it out. We wanna organize our data. That’s what I’d like to focus on next is adding a Calculated Field or might be referred to as a formula in other tools, but Tableau goes by Calculated Field.
And we want to organize our sales and our categories so that our most profitable category is on top or at least has a marker. And then that way we can quickly discern what our most profitable category is in each individual region. So, let’s go ahead and try and tackle that now. So, what we’re gonna do is we’re gonna hit create Calculated Field. Just like a filter, it’s gonna kind of have very similar layout. I’m gonna go ahead here and rename this and I’m gonna call this profita-, if I can spell, profitability sort.
And I’m going to write my formula here. Now, before I do that, I just wanna cover the field name is gonna create a new column in our dataset, just exactly like it would in Tableau Desktop. It’s gonna create a new column. And so, this field name is gonna act as that. Now, when we bring our data back together, we’re gonna need to make sure that the name of all the formulas that we add to the four different streams matches up. Otherwise, when we union everything, we’re gonna get kind of mismatched stuff. We’re gonna get nulls, the dataset’s not gonna look how we’re intending it to look. So, I just wanna call that out now that that’s what that field name serves as. And so just to be extra careful if your goal is to union some datasets back together and stuff, just make sure your naming is and you’re spelling, and all that sort of stuff is consistent. So next what we’re gonna do is we’re going to use the ‘Order By’ function. This is not the only way to do what I wanna accomplish here and I get that. But for all intents and purposes, that’s what I’m gonna use here.
There’s a bunch of different ways and so if you wanna use something else to accomplish just to kind of get some practice, get familiar with what’s available in there, I heavily encourage that. You know, there’s a bunch of things under the hood here and experience is the best teacher, so play around as much as you want with that. But for this, I want to order by and I wanna order my sales column. I wanna put it in descending order, I wanna add a rank. You see here our calculation is valid. I’m actually gonna copy this because we’re gonna recycle this amongst the three other streams we’ve got going on. But I’m gonna go ahead and save.
And so, if I pull this up, see here that it has now sorted in the western region. It has sorted furniture to the top, gives us our sales numbers so we can confirm. And the one ranking to show that this is the highest amount. So, it’s in descending order from highest to lowest. Just a quick glance can confirm that that did what we wanted it to do. So, we’re gonna go ahead here and what I’m gonna do is I’m gonna just apply this again. Again, just making sure to add my column, spell them correctly, and we’re gonna add my formula and save. Make sure that adds. And let’s go ahead here and we’re gonna do two more times. Let’s go ahead and add this. Sweet, let’s do this one.
Now, to show you what happens if you make a mistake and let’s say accidentally spell something wrong, I’m gonna call this one profitability sorter, different than the other ones. I’m gonna add the same formula and everything. And I’m gonna go ahead and save that. Now the logic still seems to be okay. And what we’re gonna do now is we’re gonna try to isolate our top performer or top selling category in each of the different regions. And then we’re gonna try to kind of slam everything back together like a sandwich. What I’m gonna do for this is since we’ve already kind of done a couple things in one, I’m gonna add a new clean step to help with visualizing. I’m gonna go to filter values and this time I’m gonna take the new profitability sort column we’ve added and I’m just gonna pick number one. Now because profitability sort is a numeric data type, I’m not gonna surround this value in quotes like this.
Like this, as you can see, Tableau Prep does not like that. It’s not correct. It’s not correct syntax for what we’re trying to filter. So, we’re gonna go ahead and just leave the integer alone as is and we’re gonna save. And now you can see our number one performer is isolated and if I wanted to just have a gut check, little confidence as I’m going through this here, I’d go ahead and just click on the last step.
Double check at the bottom here, make sure furniture, the sales number, and it is ranked number one. And then when I click on this, I’ve got furniture, the same sales number, and it is ranked number one. So, working as intended, we’ve isolated out the top performer for the western region. And in the theme of staying organized, I’m gonna call this top performer. And we can just add a little, well, we don’t need to, we already know we’re working in the west stream. So, we’re just gonna call this the top performer so we know that that’s what this step was accomplishing was just isolating our top value. So, let’s go ahead here and let’s do the same thing. We just wanna pick number one, I’m gonna save that. And we’ve got that there. Let’s call this our top performer and same thing here. Let’s go here and pick number one, save and rename.
Again, repetition is really key with this. You do something a bunch of times, you practice a bunch of steps and really drill it in, you cannot forget it. Kind of like a catchy song.
And so, let’s go here, profitability sort is one, save this, awesome. Okay, and let me not forget here. Okay, so now we’ve got our top performers all isolated out. What we wanna do now is we basically kind of wanna put everything back together. So, let’s go ahead here off the top one and we’re gonna hit the ‘+’ again. That’s where most, if not all of the main options you’re gonna be looking for are gonna be nested. We’re gonna go in here and hit union and that is gonna pull this icon that kinda looks like a stack of bricks together.
Now this brings me to another thing I wanna talk about, kind of on the same vein as where to click the ‘+’ when you’re adding a new step versus creating a new stream. If I want to add my other streams into this union, I need to do it kind of in a specific way. And you’ll see here, I’m gonna click and hold on this step.
And it’s gonna give me a kind of three option menu here. Working from right to left, we’re gonna have the join option. And so, if we had shared keys and I wanted to horizontally connect my data sets, I could do that here. Join with this union is what it would do. If I go down here to union, it’s actually going to create another union using this union as one of the parts that make up a new union. So, it’s gonna create a union between this top performer orange that I have highlighted and this union here. Now if I just wanna add my stream into this union, all I’m gonna do is just drop it over add and that’s gonna connect them together. So, I’m just gonna go ahead and repeat that again. We’re gonna highlight this, the options come up and drop it over add. And one last time, drop it over add, perfect. And so now we kind of can see our stream expands out here and collapsed back together here. So, what we’re gonna do is we’re gonna click on this and now I wanna take a look and see how our dataset is looking. So, any nulls, anything we should check on. Basically, just gonna use this as a mini-QA checkpoint. And just looking, we can already see we kind of have one pain point that we might wanna address. And so, profitability sort is mostly complete, but it does have a null here. And when we look here, we now actually have an extra profitability sorter column that’s filled in, but the rest of it is null. And so, this kind of touches on why we wanna stay organized in our work; because it can really help us cut down time and quickly isolate or fix things like this, which are natural things.
These sorts of things happen as you’re building. There is no perfect way to do this, and everybody makes mistakes, kind of just with any new skill or anything you work on. And so looking at this, the nice part about staying organized is we can quickly look and say, okay, we have a profitability sorter, it only has one value. And the only region it has a value in is our south region. So that tells me that our issue lies in the kind of south region stream we’ve created here. And so, if I go back here and take a look on the left here where I have the changes, I have my filter where I specified south and then underneath that, I can see here that I actually created profitability sorter instead of profitability sort. So, I can go ahead here, click on this, click this pencil and just correct the naming and hit save. And that is going to dynamically actually update this filter. That’s a Tableau Prep perk, but that’s not the standard. Like not every tool is going to dynamically correct all the stages upstream.
Sometimes you might actually have to go in and do that yourself. You might have to take a look, make sure that everything is updated, or you might encounter a bunch of errors. So that’s kind of why I was stressing so much about how you name stuff to kind of keep things consistent. Because you can kind of create a ripple effect that just grows. And if we hadn’t had everything organized, I would have had to go through kind of everything here and just double check and see where my kink in the chain was. Just another argument for staying organized while you do this. But if we go ahead here and click on union, I can see here now everything should be lined up. We’ve got our five columns here, no nulls, everything is filled out, it looks nice and pretty. And we actually are ready to go ahead and export this out for some sort of use case. So, let’s talk about that next. What we’re going to do is we’re going to hit this ‘+’ and we’re going to now drop down to our output step and pull this back up.
We’re going to kind of be greeted by a few different options here. First is this drop down. So, we’ve got File, Published data source, Database, and CRM Analytics. Going down the list, file is creating a local file. You can store that wherever you want. Create a CSV, a hyper file, an Excel file. You have a couple options there. We’ll go into that more as we go to create this file. But making our way down the list just to give you a brief rundown, published data source will be on Tableau Server. If you wanted to create a data source that goes right up there, that you could then build a dashboard off of or a Tableau product of sorts. You can push this output from this flow directly up there, which is really great. If you are trying to kind of, if maybe you have a dashboard that already exists and you just needed to dynamically, you want it to quickly update something, you do that and push it right up there and that change is already on server for, for you, which is great. Got database as an option. So, if you’re writing to database or CRM analytics, which is a new offering from Salesforce, I’m not going to go into in depth what that is, but you can find more information there. I also wrote a blog called The Quick Start Guide to Tableau Prep, similar to this video, just more of a written breakdown of what we’re doing here. And in there I’ve linked out to where a more in-depth breakdown of what CRM Analytics is, so if that sounds interesting to you, go ahead and take a look there. But moving on, it’s kind of down here. This browse button will open up your file explorer and it will allow you to pick a new location. So, we are going to do that, but I just kind of want to go down some of the options here and then we’ll circle back over there. But that will allow you to edit this location of where your output file is going to write to. So, if we go to our output type, as I was talking about before, you can create a hyper, an Excel workbook and a CSV.
So, for this video, we’re just going to create an Excel, keep it simple. And we’re now prompted to create a sheet. So, when you open up the workbook at the bottom, there’s usually a little sheet tab that tells you where you are. So, we’re going to call this one top performers, short and sweet on theme. And since that here, we’re just going to hit this create new worksheet. And then at the bottom, this worksheet is also, remember on the left here at the beginning of this video, we talked about tables. So that’s what table and worksheet you can think of as kind of synonymous terms in this instance, we have different options. In this case, we’re just going to go create this sheet. I don’t have this file existing yet. And we’re going to go ahead and just make a new one. So now we’re going to circle back up to browse that I brought up beforehand. And what I’m trying to do here is I just want to go to my desktop. That’s the path. And then you can see here it’s picked output, as the name by default.
I’m going to call this final sales report. We’re going to go ahead and hit accept. You can see here that updates the location, in Excel. And it’s going to create this final sales report file on my desktop with the worksheet top performers, which we will take a look at in a second here. So now all that’s left for me is I can either hit run flow or I can hit this plus, or this play button here. So, I hit the play button here. It ran very quickly. It’s done. And now just to make sure it did what I was hoping it would do, we’re going to minimize this. You can see here on my desktop; I have this final sales report Excel. If I go ahead and click on that, it’s going to pull this up here. And then I just expand this and get a better view. You can see here we have the columns that we were working with in Tableau Prep, our sort, our different regions, our categories, and then sales numbers here. And yeah, you can see at the bottom here, the sheet is called top performers. So exactly what we were expecting. And so now let’s just hop back into Tableau Prep Builder. We can kind of cover a recap and call it a day. So, in this video, we started by importing a file. In this case, we used the Sample Superstore from Tableau. From there, we aggregated our data. So, we broke out our regions, our categories, and then we took the sum of our sales in each of those regions and categories. From there, we then filtered out the different regions into separate streams so that we could kind of manipulate them individually. And then we performed Calculated Fields or formulas on those to help us with our sorting and deciphering of what was actually our top seller in each of those areas. From there, we then took all of our data, slapped it back together like a sandwich with a union, made sure that our data looked good. We did a quick quality check and actually ended up fixing a small bug. And then we created an Excel output that we saved to our desktop and double-checked. This has been Nick with Playfair+. Again, thanks for tuning in and I’ll catch you on the next one.