Describing Your Data with Tableau Prep
Taking a new dataset and turning it into an actionable visual analytics tool can be a challenge. In fact, figuring out what data you actually have is sometimes the most time-intensive part of the visual analytics process. What if there were a straightforward, systematic approach you could use every time to describe your data faster? Well, that’s exactly what I’m going to share in this tutorial!
For this walkthrough, I will be using Tableau Prep, but you can transfer this strategy and most of these tactics to other data preparation tools including Excel, Alteryx, or SQL.

Where data is coming from – the source
The first thing I typically do when describing data is try to understand where the data is coming from. In this case, I am not talking about the database or file type I am connecting to. I am talking about the true source of the data further “upstream”. For instance, if I am connecting to a sales table that is housed in Snowflake, I will spend some time figuring out the lineage of that data. Is it captured from a transaction system, on a website, or maybe a purchase order?
This is important to know because it tells you a lot about the data itself: what the medium is, which other key fields would be important for context, the level of detail you can expect to find, and the cadence at which the data is extracted.
Describing the size, data types, and keys
Next, I will connect to the data in a tool like Tableau Prep Builder, Alteryx, or even Tableau Desktop directly. If I can, I like to connect the data to a tool meant for data cleansing. As you can see from the figure below, I have connected to the Sample Superstore dataset in Tableau Prep Builder and dragged the Orders table onto the canvas.

By clicking on the Orders table in the view, the Data Preview Pane will appear on the bottom half of the screen. This is a great place to quickly see how many fields are in the table, what their data types are, and begin to identify potential keys that you can use to join additional data on.
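If you are working outside of Tableau Prep, a similar first look at fields, data types, and possible keys can be sketched in Python. This is a minimal sketch under stated assumptions, not what Tableau Prep does internally: the sample rows, the `infer_type` and `describe_fields` helpers, and the `YYYY-MM-DD` date format are all hypothetical and chosen only for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; not copied from the real Sample Superstore file.
SAMPLE_CSV = """Row ID,Order ID,Order Date,Product ID,Sales
1,CA-2020-115238,2020-01-05,FUR-0001,261.96
2,CA-2020-115238,2020-01-05,OFF-0002,14.62
3,US-2020-100002,2020-02-11,OFF-0002,22.37
"""


def infer_type(values):
    """Return a rough type label for a list of strings."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    # Assumption: dates are written as YYYY-MM-DD.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def describe_fields(csv_text):
    """Summarize each field: rough type, distinct count, and uniqueness."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for field in reader.fieldnames:
        values = [row[field] for row in rows]
        distinct = len(set(values))
        summary[field] = {
            "type": infer_type(values),
            "distinct": distinct,
            # A field with no repeated values is only a *candidate* key.
            "unique": distinct == len(values),
        }
    return summary


field_summary = describe_fields(SAMPLE_CSV)
for name, info in field_summary.items():
    print(name, info)
```

Treat the `unique` flag as a hint only. A measure such as Sales can be unique in a small sample without being a sensible key, so you still need to judge which fields are meant to identify a row.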

Next, I’ll add a Clean step to the canvas by hovering over the Orders table icon, clicking the plus sign that appears, and choosing ‘Clean Step’.

With the Clean step added, click on that step and you will see a pane appear on the bottom half of the screen. At the top right of this pane you will see a count of the fields (columns) and a count of the rows in the dataset. This is helpful to glance at because it will help you determine if you need to extract a smaller subset of the data to work from rather than trying to work on the full dataset. The size of the dataset itself is one of the many factors that can negatively impact performance, so it’s best to gauge what you’re working with ahead of time.

As the entire Sample – Superstore dataset has 10,194 records, I do not have concerns about the size of this data slowing down my analyses. If this quick look revealed I was working with ten million or more rows, I might stop to consider whether I could aggregate the data differently or whether I needed every record.
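The same size check can be sketched in Python for a CSV file. This is only an illustration: the sample data and the `count_rows_and_fields` helper are hypothetical, and the ten million row threshold is simply the rule of thumb mentioned above, not a hard limit of any tool.

```python
import csv
import io

# Hypothetical sample rows standing in for a much larger file.
SAMPLE_CSV = """Order ID,Product ID,Sales
CA-2020-115238,FUR-0001,261.96
CA-2020-115238,OFF-0002,14.62
US-2020-100002,OFF-0002,22.37
"""

# Rule of thumb from the text above; adjust it to your own environment.
ROW_THRESHOLD = 10_000_000


def count_rows_and_fields(handle):
    """Stream through a CSV once and return (row_count, field_count)."""
    reader = csv.reader(handle)
    header = next(reader)
    # Count data rows one at a time so the whole file never sits in memory.
    row_count = sum(1 for row in reader if row)
    return row_count, len(header)


row_count, field_count = count_rows_and_fields(io.StringIO(SAMPLE_CSV))
needs_subset = row_count >= ROW_THRESHOLD

print(f"{field_count} fields, {row_count} rows")
print("Consider sampling or aggregating first" if needs_subset else "Size looks manageable")
```

For a real file you would pass `open("your_file.csv", newline="")` in place of the `io.StringIO` wrapper; the file name here is a placeholder.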
What does each row represent (level of detail)
Using the Clean step and reviewing the pane displayed when clicking on that step, you can also get a sense of the level of detail of your dataset.

We’ve seen there are 10,194 rows in this data, but I’m seeing the same Order ID multiple times in the Order ID column. Does this mean we have duplicates? Not necessarily. Your data may be at a different level of detail than Order ID, which can cause the same ID to show up for multiple records. Scrolling to the right, you can see there is another ID field called Product ID. This suggests that each row represents one product within an order, so an order containing multiple products spans multiple rows.
You can check that theory by selecting an Order ID that shows up multiple times in the data. You can see I have selected order CA-2020-115238 from the dataset, which has five rows. Looking at the crosstab at the bottom of the Preview pane, you can see each row has the same Order ID but five different Product IDs.
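If you want to test the same theory outside of Tableau Prep, one approach is to count rows per Order ID and then per Order ID and Product ID pair. The sketch below assumes a small, hypothetical sample (the Product IDs and Sales values are made up, and the order has three rows rather than five), and `rows_per_key` is an illustrative helper name.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; Product IDs and Sales values are invented.
SAMPLE_CSV = """Order ID,Product ID,Sales
CA-2020-115238,FUR-0001,10.00
CA-2020-115238,OFF-0002,20.00
CA-2020-115238,TEC-0003,30.00
US-2020-100002,OFF-0002,15.00
"""


def rows_per_key(rows, key_fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[field] for field in key_fields) for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

by_order = rows_per_key(rows, ["Order ID"])
by_order_product = rows_per_key(rows, ["Order ID", "Product ID"])

# Order IDs that appear on more than one row.
repeated_orders = {key: n for key, n in by_order.items() if n > 1}
# Order ID + Product ID pairs that appear more than once (possible true duplicates).
duplicate_pairs = {key: n for key, n in by_order_product.items() if n > 1}

print("Repeated Order IDs:", repeated_orders)
print("Repeated Order ID + Product ID pairs:", duplicate_pairs)
```

If Order IDs repeat but no Order ID and Product ID pair does, that supports the idea that each row is one product within an order. If some pairs do repeat in your own data, those rows are worth a closer look before you settle on the level of detail.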

This is our level of detail for this particular table. For more on why your data’s level of detail matters, my colleague, Associate Director of Data Engineering, Ariana Cukier, has done a great write-up.
Why It’s Important to Understand the Granularity of Data
Now that you have defined the source, size, data types, keys, and level of detail, you are better equipped to begin your data cleansing process. This process of describing data may seem simple on the surface, but it can sometimes be half the battle if you are working with data that is not labeled or documented well. Hopefully, using this process, you can quickly describe your data!
Until next time,
Ethan Lang