Ryan Sleeper
You’ve downloaded the best Tableau product for you, but what now? This video explains the different data connections in Tableau and walks you through connecting to a new data source for the first time.
Hi, this is Ryan with Playfair Data TV. And in this video, I’m going to show you how to connect to data for the first time in Tableau.
As you can see on the screen, I have opened Tableau Desktop Professional. As of June 2018, I’m using Tableau Desktop 2018.1.
So this is the welcoming screen you would see. I’m on a PC. And what you see here down the left-hand side are the types of data that you can connect to.
The first list of options are called To a File. These are files like Microsoft Excel, Text Files– those are files that end with the extension .txt or .csv, JSON, there’s one Microsoft Access database, PDFs, and a couple of different types of Statistical Files.
Once you get to To a Server here, these are all the additional connections that are unlocked once you’ve got Tableau Desktop Professional, which is the most robust version of the Desktop product. The first data source you see here says Tableau Server. These are data sources that are saved to Tableau Server or Tableau Online, and you connect to them and bring the data back in and start using it from within Tableau Desktop.
There’s quite a few options here, and if you click the More button, you’ll see quite a few more. I can’t claim to have used even half of these. But some common ones that you might come across in your corporate setting are Google Analytics– if you analyze web analytics data– Amazon Redshift, HP Vertica, Microsoft SQL Server, MySQL databases are all fairly common. If you don’t see the type of data that you need to connect to, there’s also something called an ODBC connection. That stands for Other Databases.
And then lastly there’s also something called a Web Data Connector. These are custom data connections that people are building using APIs, that allow you to go out to the web and bring back data. One common example, or a good example that I’ve seen, is a Fitbit data connection. Somebody has coded this for you, so you can hit Web Data Connector, type in your Fitbit credentials, go out to the web, bring back your Fitbit data, and start analyzing it.
Point being, Tableau can connect to just about any type of data that you need to connect to, and if you don’t see it here somebody is probably working on a way for you to connect to it.
To get started with most of the training and tutorials at Playfair Data TV, we’re going to connect to a saved data source that comes with Tableau. It’s called Sample – Superstore. So if you’ve downloaded Tableau Desktop, you’ll see these saved data sources in the bottom left corner.
But we’re going to connect to a slightly more raw version of that data source. That data source is actually an Excel file that gets installed on your machine when you download Tableau. So I’m going to click Microsoft Excel to get started. That’s going to open this interface.
Again, I’m on a PC. But if you’re on a PC, Tableau will put a new folder in your Documents folder called My Tableau Repository. Within your Tableau Repository, there will be a folder called Data Sources. You’ll see the different versions of Tableau that you have installed on your machine. I’m based in the US, so I am in the English US folder. And then these are the Excel files, the saved data sources.
I’m going to click on Sample – Superstore to get started. And click Open. And that’s going to take me to the first interface that I have to share with you. This is where you can prepare a data source before you start using it in Tableau.
Tableau interprets Excel files like it would interpret a database. The reason that’s important is you’ll see something very similar if you’re connecting to a more enterprise-level data source. But right now we’re just connecting to Excel. But it looks just like it would look if we were connecting to a more enterprise-level database.
So the name of our database / Excel workbook is Sample – Superstore. And within that workbook, there are three tabs. Tableau interprets Excel workbook tabs the same way that it interprets database tables. So you can think of this Excel file as having three tables– Orders, People, and Returns.
To get started, I’m just going to left-click on the Orders table and drag it into the box that says Drag Sheets Here. I’m going to let go of the left mouse key. And from here, I’ve got a couple of decisions to make before I jump in and start analyzing this data in Tableau. The first big one is whether I want to connect to the data as a live connection or as an extract. And both of these have pros and cons to them.
We’ll start with live, which is the default. This is exactly what it sounds like. It’s a live connection to the underlying database. So if I drag and drop something onto the interface and try to analyze it, Tableau is going to query that underlying database.
The good thing about that is it’s the latest and greatest data. Every time I drag something on the view, it’s a live connection. So if there’s rows being updated while we’re analyzing our data, we’ll always see the latest data. It’s also a little bit more secure than an extract. Because it’s a live connection, it’s not actually moving data anywhere. We’re visualizing it, but we’re not saving the data in a different place.
The drawback to a live connection is performance-related. Right now, this Sample – Superstore data set has only 9,994 records. That’s not a lot for Tableau to handle. The data source is also saved on my machine, so there’s no competition. There’s not multiple people trying to hit the same data source. If, however, this was a more enterprise-level set up, and I was connecting to a SQL Server database for example, and say that database had 100 million records in it and we had 30 employees all trying to hit that data source at the same time, that could very potentially slow down our querying and our analyses in Tableau.
An extract will help alleviate that. That’s the other choice here on the right. What an extract does is it creates a snapshot of the data. And this one’s not quite as secure, because you are creating a new version, or basically a copy of the data. And you can save that file locally or save it somewhere in a folder on your network somewhere.
It’s not quite as secure because you’re copying the data. That data could potentially be sent outside of the organization. That being said, it would be very similar to saving an Excel workbook, which is very common. So it might not be that much of a concern for you. But just want to point out, live is slightly more secure than an extract.
Extract is going to give you a big performance boost because the data source is saved in a Tableau-optimized format. And since it’s saved locally, you could potentially be the only one that’s hitting that database, so there’s not people competing for that resource.
One other small drawback to an extract is because it’s creating a snapshot of the data, it’s not always up to date. So if you need to have the latest data– let’s say that your data refreshes once a day– you’d have to take one extra step to connect to that extract and refresh it so it brings back the latest rows of data.
We’ll leave it with a live connection for now. Again, it’s a small data set. It’s saved on my machine. So we’ll just stick with the default.
The next thing you can do at this level is add a filter. This is the highest-level filter in Tableau. This will filter the data source before we even start analyzing it. This is a very intuitive process. I’m not actually going to do this right now, but just to show you how intuitive it is, you simply click this Add button under the word Filters, and click Add again. This will allow you to choose any field in the data source, and you simply just tell Tableau which portions of that field you want to bring back.
One really common example is with time series data. Let’s say we’re doing a year-over-year analysis. If we connect to this table and there’s 10 years of data in it, I can just add a filter on date, which is called Order Date in the Sample – Superstore data set, and just bring back the last two years of data. I don’t need the other eight years of data. That other eight years of data is actually going to slow down the performance. So this would be a good opportunity and a good reason to add a filter at this level.
One other example is if I was the manager of a specific region, and I just cared about the performance of my region that I managed, I could add a filter at this level. Click Region and decide to bring back only the region that I manage for my analysis.
But I’m going to click Cancel. Click Cancel again. And I’m just going to show you one more thing in this tutorial, which is a small data modification you can do right here in Tableau. And it has to do with the data type.
At the top of each column of the data, you see these little icons, and that’s the data type. So number means it’s either an integer or a float– those are numbers with decimals. ABC means it’s a string or text, if you think about it that way. Calendar icon means it’s a date. And the globe icon means it’s geographic. If Tableau misclassifies any of this, or if you change your mind and decide to change the data type, you can easily do that by just clicking on one of those icons and choosing a different data type here.
But again, I’m going to stick with all the defaults for now. So I’m going to click off there.
And once we have prepared the data source, and are ready to start using it in Tableau, you simply click Sheet 1. And it will take you into what’s called the authoring interface, which we’ll talk about in a future video tutorial.
But for now, this has been Ryan with Playfair Data TV – thanks for watching!