Taking a new dataset and turning it into an actionable visual analytics tool can be a challenge. In fact, figuring out what data you have is sometimes the most time-intensive part of the visual analytics process. What if there were a straightforward, systematic approach you could apply every time to describe your data faster? That’s exactly what I’m going to share in this tutorial!

For this walkthrough, I will be using Tableau Prep, but you can transfer this strategy and most of these tactics to other data preparation tools such as Excel, Alteryx, or SQL.

Describing your data in Tableau Prep

 

Where data is coming from – the source

The first thing I typically do when describing data is try to understand where the data is coming from. In this case, I am not talking about the database or file type I am connecting to. I am talking about the true source of the data further “upstream”. For instance, if I am connecting to a sales table that is housed in Snowflake, I will spend some time figuring out the lineage of that data. Is it captured from a transaction system, a website, or maybe a purchase order?

This is important to know because it tells you a lot about the data itself: what the medium is, which other key fields would be important for context, the level of detail you can expect to find, and the cadence at which the data is extracted.

Describing the size, data types, and keys

Next, I will connect to the data in a tool like Tableau Prep Builder, Alteryx, or even Tableau Desktop directly. If I can, I like to connect the data to a tool meant for data cleansing. As you can see from the figure below, I have connected to the Sample Superstore dataset in Tableau Prep Builder and dragged the Orders table onto the canvas. 

Describing Data - Adding the Orders Table to Tableau Prep Canvas

When you click on the Orders table in the view, the Data Preview Pane appears on the bottom half of the screen. This is a great place to quickly see how many fields are in the table and what their data types are, and to begin identifying potential keys that you can use to join additional data.

Describing your fields, data types, and keys in Tableau Prep
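
If you are doing this step outside Tableau Prep, a rough sketch of the same checks in Python might look like the following. The sample rows, the `infer_type` and `describe_fields` helpers, and the assumed date format are all hypothetical and only for illustration; they are not part of Tableau Prep or the actual Superstore data.

```python
from datetime import datetime

# Hypothetical sample rows, loosely shaped like a few Orders columns.
rows = [
    {"Row ID": "1", "Order ID": "CA-100", "Order Date": "2020-11-08", "Sales": "261.96"},
    {"Row ID": "2", "Order ID": "CA-100", "Order Date": "2020-11-08", "Sales": "731.94"},
    {"Row ID": "3", "Order ID": "CA-200", "Order Date": "2020-06-12", "Sales": "14.62"},
]

def infer_type(values):
    """Return the first type, from strictest to loosest, that every value parses as."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    # Assumes ISO-style dates; adjust the format to match your data.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"

def describe_fields(rows):
    """Map each field to its inferred type, distinct count, and uniqueness."""
    summary = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        distinct = len(set(values))
        summary[field] = {
            "type": infer_type(values),
            "distinct": distinct,
            "unique": distinct == len(values),
        }
    return summary

for field, info in describe_fields(rows).items():
    print(field, info)
```

A field whose values are all unique is only a candidate for a key. A measure such as Sales can be unique by coincidence, so treat the flag as a prompt to look closer rather than as a conclusion.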

Next, I’ll add a Clean step to the canvas by hovering over the Orders table icon, clicking the plus sign that appears, and choosing ‘Clean Step’.

Describing Data - Adding a clean step to the canvas in Tableau Prep

With the Clean step added, click on that step and a pane will appear on the bottom half of the screen. At the top right of this pane you will see a count of the fields (columns) and a count of the rows in the dataset. This is helpful to glance at because it helps you determine whether you need to extract a smaller subset of the data rather than working on the full dataset. The size of the dataset is one of the many factors that can hurt performance, so it’s best to gauge what you’re working with ahead of time.

Clean step pane in Tableau Prep

As the entire Sample – Superstore dataset has 10,194 records, I do not have concerns about the size of this data slowing down my analyses. If this quick look revealed I was working with ten million or more rows, I might stop to consider whether I could aggregate the data differently or whether I needed every record.
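
The same size check can be sketched in Python for a CSV extract. This is a minimal illustration, assuming the data is available as a CSV with a header row; `table_shape`, `needs_subset`, and the `LARGE_ROW_COUNT` threshold are hypothetical names, and the threshold simply reuses the ten-million-row figure mentioned above rather than any documented limit.

```python
import csv
import io

# Hypothetical threshold; tune it to your own tools and hardware.
LARGE_ROW_COUNT = 10_000_000

def table_shape(csv_file):
    """Return (row_count, field_count) for a CSV file object with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    row_count = sum(1 for _ in reader)
    return row_count, len(header)

def needs_subset(row_count, threshold=LARGE_ROW_COUNT):
    """Flag datasets large enough that sampling or aggregating is worth considering."""
    return row_count >= threshold

# Tiny in-memory stand-in for a real file opened with open(path, newline="").
sample = io.StringIO(
    "Order ID,Product ID,Sales\n"
    "CA-100,P-1,261.96\n"
    "CA-100,P-2,731.94\n"
)
n_rows, n_fields = table_shape(sample)
print(n_rows, n_fields, needs_subset(n_rows))
```

Streaming the rows through a counter avoids loading the whole file into memory, which matters most in exactly the cases where the answer turns out to be "too big".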

 

What does each row represent (level of detail)

Using the Clean step and reviewing the pane displayed when clicking on that step, you can also get a sense of the level of detail of your dataset. 

Clean step pane in Tableau Prep for the second time

We’ve seen there are 10,194 rows in this data, but I’m seeing the same Order ID multiple times in the Order ID column. Does this mean we have duplicates? Not necessarily. Your data may be at a different level of detail than Order ID, which can cause the same ID to show up for multiple records. Scrolling to the right, you can see there is another ID field called Product ID. This suggests that each row represents one product within an order, so an order containing multiple products spans multiple rows.

You can check that theory by selecting an Order ID that shows up multiple times in the data. You can see I have selected order CA-2020-115238 from the dataset, which has five rows. Looking at the crosstab at the bottom of the Preview pane, you can see each row has the same Order ID but five different Product IDs. 

Selecting a specific Order ID in Tableau Prep
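
Outside Tableau Prep, one way to test a level-of-detail theory like this is to count how often each combination of candidate key fields appears. The sketch below assumes the rows are already loaded as dictionaries; the rows and the `duplicated_keys` helper are hypothetical, and the IDs are illustrative rather than taken from Sample – Superstore.

```python
from collections import Counter

# Hypothetical rows; the IDs are made up for illustration.
rows = [
    {"Order ID": "CA-100", "Product ID": "P-1"},
    {"Order ID": "CA-100", "Product ID": "P-2"},
    {"Order ID": "CA-100", "Product ID": "P-3"},
    {"Order ID": "CA-200", "Product ID": "P-1"},
]

def duplicated_keys(rows, fields):
    """Return the key values (as tuples) that appear on more than one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Order ID alone repeats, so it is not the level of detail.
print(duplicated_keys(rows, ["Order ID"]))                # {('CA-100',): 3}

# Order ID plus Product ID has no repeats, so the pair identifies each row.
print(duplicated_keys(rows, ["Order ID", "Product ID"]))  # {}
```

An empty result for a set of fields means that combination is unique in the rows you checked. On a real dataset, any keys that do come back are worth inspecting before you decide whether they are true duplicates.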

This is our level of detail for this particular table: one row per product within an order. For more on why your data’s level of detail matters, see the great write-up by my colleague Ariana Cukier, Associate Director of Data Engineering.

Why It’s Important to Understand the Granularity of Data

Now that you have defined the source, size, data types, keys, and level of detail, you are better equipped to begin your data cleansing process. Describing data may seem simple on the surface, but it can be half the battle when you are working with data that is not labeled or documented well. Hopefully, this process helps you describe your data quickly!

Until next time, 
Ethan Lang
