# Quick Start Guide to TabPy

TabPy is a Tableau feature that allows you to utilize Python scripts as calculated fields for your visualizations. It is not to replace the use of Python for complicated data engineering and machine learning tasks but can help add additional functionality while building dashboards in Tableau. TabPy allows you to utilize Python to create a measure more efficiently than having to create sometimes dozens of complex LOD and table calculations to get to the same value. Python may seem daunting to those unfamiliar, but it is a relatively straightforward language and especially with advancements in open-source AI, there are plenty of easy ways to learn it.

In my experience, TabPy can be especially useful when doing exploratory data analysis. It adds an extra dimension to what sort of insights you are able to gain from your data. You can integrate some advanced learning techniques to guide your decisions on where to continue with your analysis. In this tutorial, I will show you how to set up TabPy on your machine and a couple of ways to integrate it to your analysis.

**TabPy setup**

If you want to follow along with my completed workbook, then that can be found in this Google Drive folder.

First let’s get started setting up TabPy. At this point, you should have Python installed to your computer, either directly or through *Anaconda Navigator**.* Open up your computer command prompt, which you can do either directly on your machine base or by creating an environment for it. I prefer to use the command prompt through *Anaconda Navigator,* or you can find it by searching your Windows machine for *Command Prompt*. The window should look something like this:

## Create a free account, or login.

Access free visual analytics written tutorials, newsletters, and special announcements.

Already have an account? Sign In

Explore unlimited access to all Playfair offerings.

Type “pip install tabpy” and hit enter and allow it to run.

Once that is finished running, close and reopen your command prompt and then simply type in “tabpy” to run TabPy on your computer. Keep the command prompt window open the entire time you are running TabPy in Tableau. You may be prompted if you would like to proceed without authentication. I do not currently have password authentication set up on my computer, but you can choose to do that if you wish. If you do wish to proceed without authentication, hit “y” and enter.

Once that is finished running, you will see on the last line it says” Web service listening on port 9004”. You will need that number to run TabPy in Tableau.

Open up Tableau and connect to the Sample-Superstore dataset. Click the *Help *dropdown, then *Settings and Performance, *and last* Manage Analytics Extensions.*

You should then see the below image on your screen. Select TabPy:

Type in “localhost” in the *Hostname* and “9004”, the number referenced in the command prompt, into Port. If for some reason yours is different, use that.

Hit *Save* and you should be notified of a successful connection:

Once you have a successful connection, you can begin utilizing Python in Tableau!

**Introduction to TabPy scripts**

TabPy can be utilized with four different calculated field functions that specify the output type of the script.

- SCRIPT_BOOL: Returns Boolean value
- SCRIPT_INT: Returns integer value
- SCRIPT_REAL: Returns numeric value
- SCRIPT_STR: Returns string value

Based on what you want your output to be, you will choose the appropriate script function. For example, I want to use TabPy to calculate the sum of sales multiplied by five. In this case, I could choose *SCRIPT_INT *or *SCRIPT_REAL* depending on if I want decimals or not. I am going to choose *SCRIPT_INT *for this example. In the Python script, where you want to input a field from the data, you will denote it as *_arg#* with # being the number of data input. Then, in the same order you input them after the script. For example:

SCRIPT_INT(“return [x*5 for x in _arg1]”, sum([Sales]))

In the script, *_arg1 *represents *Sales* from the Sample-Superstore dataset. If you wanted to have an equation with *Sales *and *Profit,* then you would add in an *_arg2* and *sum(Profit)* after *sum(Sales)*. To check our work, I also just created a standard Tableau calculation to cross reference with:

Would I implement TabPy for something like this? Probably not. But here are three use cases for utilizing TabPy in your workbooks.

The Beginner’s Guide to Basic Tableau Calculated Fields

**Random number**

One of my biggest gaps working in Tableau is not having the Excel equivalent of *RANDBETWEEN()*. However, you can use TabPy to do it with a Python script. The calculation is very simple. Say you want to randomly assign customers to different groups. You can use this script to easily do so:

SCRIPT_REAL(“

from numpy import random as rd

return [rd.randint(1,10) for i in _arg1]

“,WINDOW_MAX(INDEX()))

This calculation is saying that for the number of items in the view, produce a random integer in between 1 and 10. If you drag customer name to the view and this function as well, you can see your customers randomly assigned to groups 1 through 10:

**Correlation coefficient**

A simple, telling measure that can be a bit complicated to create in Tableau. Correlation coefficient gives insight to whether there is a relationship between two variables and is often a measure used early on in exploratory data analysis. One variation is the Pearson’s Product Moment correlation coefficient which can be denoted as:

With *r* being the correlation coefficient and *x* and *y *being the two measures in which you want to analyze their correlation. You can see the correlation coefficient in the summary statistics when you implement a trend line, but you aren’t able to utilize that value in your visuals beyond that. Using a standard Tableau calculated field to do it would require a “hodgepodge” of nested table calculations that can be difficult to put together. There is also the *Corr *function in Tableau but that still can require LODs in its calculation. Using TabPy makes it as simple as below:

SCRIPT_REAL(“

from numpy import numpy as np

correlation = np.corrcoef(_arg1, _arg2)[0,1]

return correlation

“,SUM([Profit]), sum([Sales]))

Even if you are unfamiliar with Python, it is easy to find example scripts all over the internet. It is important to note to those unfamiliar with Python that the syntax does matter and gaining a basic understanding would be very beneficial.

Now you have this as a calculated field you can integrate it into your analysis. Build out your view as you would like. I am going to drag *Region *to the Columns shelf, *Category* to the Rows shelf, and *Order ID* to the Detail property of the Marks card. Change the mark type to Square and size them as large as they can go. Then add the TabPy calculation to the Color and Label properties. I created an *index *table calculation on Order ID and filtered it to be at most one so that only one Order ID square was on the view for each *Region *and *Category *combination. Now you have a correlation matrix to use for your EDA.

**K-Means clustering**

Another more complex example is K-Means clustering. K-Means clustering is an example of an unsupervised learning method that looks to group together similar data points. If you are looking to fine-tune your model, It would be best to take it out of TabPy and do it directly in Python. But, for some data exploration you can utilize. TabPy to run a simple model and then use the results in your visualization. Here is an example of a script to run K-Means clustering:

The logic is as follows:

STR(SCRIPT_REAL(“

import numpy as np

import numpy.ma as ma

from sklearn.preprocessing import StandardScaler

from sklearn.cluster import KMeans, MiniBatchKMeans, AffinityPropagation

print(‘Start’)

# Scaling Features

sc= StandardScaler()

avg_price = sc.fit_transform(np.array(_arg1).reshape(-1,1))

med_review = sc.fit_transform(np.array(_arg2).reshape(-1,1))

med_beds = sc.fit_transform(np.array(_arg3).reshape(-1,1))

n_cl = _arg4[0]

# Combine Scaled feature

X_comb = np.column_stack([avg_price, med_review, med_beds])

# Handling null value with masked array

X = np.where(np.isnan(X_comb), ma.array(X_comb, mask=np.isnan(X_comb)).mean(axis=0), X_comb)

# Modeling

kmeans = KMeans(n_clusters = n_cl, random_state=99)

result = kmeans.fit_predict(X).tolist()

return result“,

sum([Sales]),

sum([Profit]),

count([Order ID]),

[Clusters]))

How to Do Customer Segmentation with Dynamic Clustering in Tableau

This calculation is aiming to cluster states based on similarities in Sales, Profit and Number of Orders. In the script, the *Clusters* parameter was added to allow the user to change the number of clusters without having to access the calculated field. Create a filled map on the view, add State to Detail and K-Means to Color to see how the states are clustered.

Now that you have learned how to set up TabPy on your computer and a few ways to utilize it, I encourage you to explore how you can use it to level up your analysis.

Stay after it,

Maddie

## Related Content

### How to Do Customer Segmentation with Dynamic Clustering in Tableau

Tableau has a few different built-in analytics features that allow you to both summarize and model your data in various…

Ethan Lang

Detecting Outliers in Non-Normalized Data In this tutorial, Ethan will explain how to implement the Median Absolute Deviation equation in…

### The Benefits of Data Visualization for Data Scientists

At Playfair Data, we have covered dozens of ways to visualize data, and shared our tips, tricks and best practices,…