5 Things I Do When Working with Data for the First Time Help get your analysis started the right way What is the first thing you should do with a new dataset in Tableau? Well, here are five useful tips that will help you get started analyzing an unfamiliar data source.

5 Things I Do When Working with Data for the First Time

Help get your analysis started the right way

What is the first thing you should do with a new dataset in Tableau? Well, here are five useful tips that will help you get started analyzing an unfamiliar data source.

Hi, this is Ryan with Playfair Data TV. And in this video, I’m going to show you a few things that I like to do with the data source, when I’m working with if for the very first time. I’m going to show you how to view the underlying data, describe individual fields, reclassify dimensions and measures, set the default formatting on a measure, as well as view the number of records in the data source.

The first thing that I like to do is just generally get acquainted with the data source. You can do this in a couple of different ways. If you’re connecting to the data source for the first time, you would have gotten a preview of the first 1,000 rows, when you’re connecting to it. You can always get back to that, by the way, by clicking the data source tab in the bottom left corner of the authoring interface.

This is the screen that I’m talking about. As I mentioned, you can see the first 1,000 rows. There’s a different way to get to something similar though, so I’m going to jump back to sheet number one and show you a couple of other ways to connect or to view the underlying data.

First, is you can right click on the data source under the word Data on the data pane. And the third option is to view the data. When I click that, it opens a preview of the first 10,000 rows of data. This entire data source only has 9,994 records. So this actually is the entire data source that we’re looking at now.

One other way to get to that screen is to click Data in the top navigation, hover over the data source and click View Data there. But this is a way to just generally get acquainted with the data source, see what fields are in there. What dimension members are there within each column? What are the format of certain IDs, dates? There’s quite a few things you can see in here, but just to get kind of a high level of acquaintance with the data source.

If you ever need to learn more about an individual field, there is a separate method I like for that called describe. To describe a field, you simply right click on it. I’ll right click on the category dimension. And the last option is to describe that field. This gives you some information. Such as, which column is the does that field come from? What type of data is it?

But my favorite way to use the described function is to click this Load button in the bottom left corner of this little interface. When you click Load, it gives you a preview of the first 20 dimension members within that dimension. The category dimension only has three dimension members. So this is actually all three of them. But this is a good way to see what the dimension members are without having to use that field in a view.

The next thing I like to do is take a look and see if Tableau has classified the dimensions and measures as I want them to. So remember those definitions, a measure is anything that’s quantitative. And a dimension is anything that’s qualitative. Those are the default ways Tableau is going to classify these fields. But that’s not always the case.

The most common example of where that’s not necessarily true is with IDs. If you have some type of customer ID, for example. And its customer number one, customer number two, customer number three, Tableau’s going to see those numbers and automatically classify as a measure.
Another rule of thumb I’ve got. I don’t have this documented. But another rule of thumb I have when deciding if something should be a dimension or measure. Is I ask myself, would it ever make sense to add these numbers up? Would you ever add up customer number one, plus customer number two, plus customer number three to get to six? No. That number means nothing.

Instead, you would take a measure such as sales. And then break it down by customer ID to figure out how many sales I had per customer. So in that case, customer ID should actually be a dimension and not a measure.

If you’re ever in a similar situation and need to flip something from a measure to dimension. You can right click on it. So I’ll right click Discount. Don’t follow along, I’m just using this as one quick illustration. But this actually is a measure. So I’m not going to switch this permanently. But if I needed to switch this, you can right click. And an option about halfway down is to convert it to dimension. Once I click that, discount is now appear in the dimension’s area instead of the measure’s area.

I’m going to click on Do and show you one other way to do it. You could also simply left click on that measure and drag it into the dimension’s area. And same thing, discounts now appear as a dimension set of a measure. I’m going to undo that just to get back to where we started. And show you the next thing I like to do when I’m connecting to a data source for the first time.

I also like to set the default formatting of numbers. This one you might not come across until you start to use a measure within a chart. But just for illustration, I’ll show you what sales looks like. By default, if I just double click Sales, it creates a bar chart. But notice on the y-axis that the numbers– they’re in units, but there is no further formatting.

Sales should be probably in a currency, there should be dollar sign, maybe commas and/or decimal places. By default, it has no formatting. If I want to set the default formatting, you can right click on the measure, hover over default properties, and click number format.

There’s lots of formatting options, this one is pretty easy. Because it should be a currency. I’ll just choose currency standard based in the US. So it’s going to be English by default. Click OK, you’ll see the axis changes, there’s now dollar sign, as well as a comma every three units, as well as two decimal places. That’s the standard currency formatting.

What’s really nice about setting the default formatting, is now any time I use that sales measure in a future view, it will remember that formatting. I only have to do that one time. And every time I use the sales measure, it will start with that formatting.

I’m going to undo just a couple of times to show you the last thing I want to show you when I’m working with a data source for the first time. And it has to do with viewing the number of records. There’s a special field in here called number of records. The reason that it’s special is that it’s been generated by Tableau. This isn’t in my data source. Every data source that you connect to will come with at least three fields. These were generated by Tableau. They’re not in your data source.

Those three fields are measure names, measure values, and number of records. They’re identified with italic lettering. There’s two additional generated fields that can appear, if you’re working with any dimension that Tableau recognizes as geographic. And those are latitude and longitude. But those will only show up if you’ve got some type of mapping data. But the other three will always show up measure names, measure values, and number of records.

We’re going to focus on number of records for a moment. If you want to, follow along to get a feel for the drag and drop functionality of Tableau. I’m going to left click on number of records and drag it to the text marks card. When I drop it there, it simply sums up. And by the way, what number of records does, is it actually adds the number one to each row in your source, literally adds the number one.

For that reason, when I added it to the text marks card with an aggregation of sum. It just adds up all those numbers ones. And that’s what gives me the total number of rows or records in my dataset. This data source says, we already saw when we were previewing the underlying data has 9,994 rows.

The reason I like to do this check when I’m getting started with the new data source is for two reasons. One, is it acts as a quick QA Check or Quality Assurance Check. I’d like to check this number. Because if let’s say, I did a query in SQL or outside of Tableau. And I thought that the data source had 10 million records. Or let’s say, a colleague gives me a data source and says, hey, can you analyze this and tell me what you’re finding? I say, sure, how many records are in it? Roughly 10 million.

Well, if I did this quick QA Check and I see the number 10,000. I know something’s wrong. Either I have the wrong data source, maybe I’ve accidentally applied a filter to the data source, which is very common. It’s kind of hard once you’re in the authoring interface to know whether or not a filter has been applied. So this is just a quick QA Check.

The second reason that I like to check this number is for performance or efficiency related reasons. 10,000 records is nothing for Tableau. I’m not even going to think about how I write calculations, what I’m filtering. It’s going to be very easy for Tableau to handle. But if I did this quick check and I see that this number is 100 million, I might want to pay attention to things. Like how many filters do I have have? How am I writing calculated fields?

There are better and worse ways or more efficient ways to write Tableau calculated fields. Maybe instead of dragging the category dimension onto the view to see what the dimension members were. Maybe I would use that describe functionality that I just showed you. Because if I drag category in the view, it had to wait for Tableau to process all 100 million records just to give me the three dimension members.

Just a few things I like to do. We’re now ready to start our analyses in Tableau.

This has been Ryan with Playfair Data TV – thanks for watching!