Exploring Random Number Generation in Alteryx: Techniques and Applications
Random number generation (RNG), as the name suggests, produces random numbers and provides the foundation for numerous applications and algorithms. In data engineering, RNG is useful for tasks such as data simulation, sampling, and designing and implementing randomized algorithms. Alteryx offers several random number generation techniques, tools, and functions that users can apply.
This tutorial explores the different use cases of random number generation, techniques that can be used to apply them, and practical examples to demonstrate their capabilities. From sampling data sets for machine learning to generating mock data for testing workflows, understanding and utilizing RNG in Alteryx can enhance and improve your workflows and add a new capability to your toolkit.
Importance and usages of ‘RNG’ in data engineering
RNG has many capabilities that can support a variety of tasks. Below are some key usages of RNG and their importance within data engineering:
- Data Simulation: RNG is crucial for generating random data, which can be used to test workflows and create mock datasets. Because the generated values differ from run to run, RNG is an effective way to simulate the organic variation and conditions you may encounter in real-world data, which makes it helpful when testing the performance of your workflows.
- Sampling: RNG allows the user to select random samples from a dataset, which helps keep the sample representative of the whole dataset and reduces the risk of biased results. This can be especially important for cross-validation in machine learning, where a dataset needs to be split into training and testing sets, or for examining the results of your workflows and data manipulations.
- Randomized Algorithms: Many processes built in Alteryx rely on RNG to function appropriately. For example, data splitting, randomized data sorting, and feature generation all benefit from using random number generation in their design. By utilizing RNG in these processes, the user can introduce controlled variability and avoid the bias that a fixed ordering or a hand-picked split would introduce.
By utilizing RNG in these ways, data professionals can improve the reliability, efficiency and accuracy of their workflows in Alteryx.
Alteryx techniques for ‘RNG’
There are two random number generation functions in Alteryx: Rand() and Randint().
Rand() Function
The Rand() function generates a random decimal number between 0 and 1.
Steps to Use Rand():
- Open Alteryx Designer and drag a Formula tool onto your workflow canvas.
- Double-click the Formula tool to open its configuration window.
- In the Formula tool, select or create a new field where you want to insert the random values.
- Enter Rand() into the formula expression box.
- Click OK to apply the formula and close the window.
Applications for the Rand() function include randomly splitting data into percentiles, assigning values, random sampling, and splitting data into training and testing sets for machine learning.
Randint() Function
The Randint() function takes a single parameter inside the brackets (e.g., 10) and returns a random whole number between 0 and that parameter, so the parameter acts as the ceiling for the number generated.
Steps to Use Randint():
- Open Alteryx Designer and drag a Formula tool onto your workflow canvas.
- Double-click the Formula tool to open its configuration window.
- In the Formula tool, select or create a new field where you want to insert the random values.
- Enter Randint(n) into the formula expression box (replace n with your desired maximum value).
- Click OK to apply the formula and close the window.
Applications for the Randint() function could include splitting data into percentiles, assigning values, generating data and random sampling.
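For readers who want to check these ranges outside Alteryx, the following Python sketch mimics the two functions. The helper names rand and rand_int are hypothetical stand-ins and not Alteryx APIs, and the sketch assumes that Randint(n) can return both 0 and n.

```python
import random


def rand(rng: random.Random) -> float:
    """Stand-in for Rand(): a decimal number in the range [0, 1)."""
    return rng.random()


def rand_int(rng: random.Random, n: int) -> int:
    """Stand-in for Randint(n): a whole number from 0 to n, inclusive (assumed)."""
    return rng.randint(0, n)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
floats = [rand(rng) for _ in range(10_000)]
ints = [rand_int(rng, 10) for _ in range(10_000)]

# Smallest and largest values seen for each stand-in
print(min(floats), max(floats))
print(min(ints), max(ints))
```

Printing the smallest and largest values is a quick way to confirm the ceiling behaves as expected before building a formula on top of it.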
Applications and examples of RNG techniques in Alteryx
In this section, I will demonstrate several different ways you can use RNG in mock data creation, data splitting, and data sampling.
Generating Mock Data
In the following two examples, I will show how you can use the random number generator to generate mock data.
Below, I have used the Randint() function to generate random dates over a two-year window starting January 1, 2023. With the parameter set to 730, the function returns a whole number between 0 and 730, which the formula adds as days to the start date. Because 2024 is a leap year, January 1, 2023 plus 730 days is December 31, 2024, so the formula generates random dates between January 1, 2023, and December 31, 2024.
DateTimeAdd('2023-01-01', Randint(730), 'days')
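The date arithmetic can be checked with a short Python sketch of the same idea. The function name random_date and the START constant are hypothetical, and the sketch assumes the random offset runs from 0 to 730 days inclusive.

```python
import random
from datetime import date, timedelta

START = date(2023, 1, 1)


def random_date(rng: random.Random, max_offset_days: int = 730) -> date:
    """Add a random whole number of days (0..max_offset_days) to START."""
    return START + timedelta(days=rng.randint(0, max_offset_days))


rng = random.Random(7)  # fixed seed so the sketch is repeatable
dates = [random_date(rng) for _ in range(1_000)]

# The earliest and latest dates the formula can produce
earliest_possible = START
latest_possible = START + timedelta(days=730)
print(earliest_possible, latest_possible)  # 2023-01-01 2024-12-31
```

Changing the start date or the offset in one place makes it easy to confirm the window before copying the values into the Alteryx formula.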
In this next Multi-Row Formula, I have used Randint() again, but this time to generate random characters for customer IDs.
This formula works by generating two random numbers between 65 and 90, which correspond to the ASCII values of the uppercase letters A-Z. Randint(25) returns a value from 0 to 25, and adding 65 shifts it into that range. I then added a hyphen and a sequential number, using the Order ID as the sequential number in this case.
A sample output from this could be: SK-69542
CharFromInt(Randint(25) + 65) + CharFromInt(Randint(25) + 65) + "-" + ToString([Order ID])
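As a rough illustration of the character arithmetic, here is the same construction in Python. The function name customer_id and the sample order IDs are hypothetical; chr() plays the role of CharFromInt().

```python
import random


def customer_id(rng: random.Random, order_id: int) -> str:
    """Two random uppercase letters, a hyphen, then the order ID."""
    # 0..25 shifted by 65 lands on ASCII codes 65..90, the letters A-Z
    letters = "".join(chr(rng.randint(0, 25) + 65) for _ in range(2))
    return f"{letters}-{order_id}"


rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample_ids = [customer_id(rng, order_id) for order_id in (69542, 69543, 69544)]
print(sample_ids)
```

Note that two random letters give only 676 prefixes, so uniqueness here comes from the order ID, not from the letters.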
Data Splitting and Sampling
In the following examples, I will show how you can use RNG to split your data, take random samples, or assign values based on percentages.
In this next example, I have shown how you can use RNG to randomly assign values to your data based on percentages. Lifetime, Monthly, and Annual were three product names for this simulated dataset. First, create a field that holds a single Rand() value for each row (called [Random] here), then compare that field against the thresholds. The formula assigns the product to "Lifetime" if [Random] is less than 0.5, "Monthly" if it is less than 0.8, and "Annual" otherwise. Storing the number first matters: if Rand() is written directly into each branch, each call draws a new number, and the split drifts to roughly 50% Lifetime, 40% Monthly, and 10% Annual.
With thresholds at 0.5 and 0.8, roughly 50% of the data will be Lifetime, 30% will be Monthly, and the remaining 20% will be Annual. A key benefit of using a random number generator for dividing into percentiles is that the percentages will not be exact, giving a more realistic and organic distribution across the dataset.
IF [Random] < 0.5 THEN "Lifetime"
ELSEIF [Random] < 0.8 THEN "Monthly"
ELSE "Annual"
ENDIF
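To see why it matters whether every branch compares against the same random number, the Python simulation below contrasts two versions of the conditional. It is only a sketch: the function names are hypothetical, and it assumes that each call to a random function returns a fresh value.

```python
import random
from collections import Counter


def assign_single_draw(rng: random.Random) -> str:
    """Draw once per row, then compare that one value to both thresholds."""
    r = rng.random()
    if r < 0.5:
        return "Lifetime"
    elif r < 0.8:
        return "Monthly"
    return "Annual"


def assign_redraw(rng: random.Random) -> str:
    """Draw a new number in every branch of the conditional."""
    if rng.random() < 0.5:
        return "Lifetime"
    elif rng.random() < 0.8:
        return "Monthly"
    return "Annual"


def shares(assign, n: int = 100_000, seed: int = 0) -> dict:
    """Share of rows assigned to each product over n simulated rows."""
    rng = random.Random(seed)
    counts = Counter(assign(rng) for _ in range(n))
    return {name: counts[name] / n for name in ("Lifetime", "Monthly", "Annual")}


single = shares(assign_single_draw)
redraw = shares(assign_redraw)
print(single)  # close to 0.5 / 0.3 / 0.2
print(redraw)  # close to 0.5 / 0.4 / 0.1
```

In the redraw version, a row only reaches the second test half of the time, and it then passes a fresh 80% check, which is where the 40% and 10% shares come from.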
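In this next example, I have applied similar logic, but using the Randint() function instead of Rand().
This conditional statement assigns discount values by percentages. First, create a field that holds one random whole number per row (called [Random Number] here) using Randint(99) + 1, which returns a value from 1 to 100. Comparing every branch against that stored field, rather than calling Randint() again in each branch, keeps the buckets even. In the formula's output, approximately 20% of rows will have a 10% discount, 20% a 20% discount, and so forth.
IF [Random Number] <= 20 THEN 10
ELSEIF [Random Number] <= 40 THEN 20
ELSEIF [Random Number] <= 60 THEN 25
ELSEIF [Random Number] <= 80 THEN 33
ELSE 50
ENDIF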
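The bucket sizes can be counted exactly rather than simulated. The Python sketch below enumerates every possible whole number and tallies how many fall into each discount tier, once for a 1 to 100 range and once for a 0 to 100 range. The function names are hypothetical, and the 0 to 100 case assumes Randint(100) can return 0 as well as 100.

```python
from collections import Counter


def discount(value: int) -> int:
    """Map a random whole number to a discount tier using the thresholds above."""
    if value <= 20:
        return 10
    elif value <= 40:
        return 20
    elif value <= 60:
        return 25
    elif value <= 80:
        return 33
    return 50


def bucket_sizes(low: int, high: int) -> Counter:
    """Count how many values in low..high (inclusive) land in each tier."""
    return Counter(discount(v) for v in range(low, high + 1))


one_to_hundred = bucket_sizes(1, 100)
zero_to_hundred = bucket_sizes(0, 100)
print(dict(one_to_hundred))   # {10: 20, 20: 20, 25: 20, 33: 20, 50: 20}
print(dict(zero_to_hundred))  # {10: 21, 20: 20, 25: 20, 33: 20, 50: 20}
```

The difference is small (21 of 101 values versus 20 of 100), so either range gives approximately 20% per tier, but the 1 to 100 range makes the tiers exactly equal.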
In the example below, I have shown how you can randomly sample your data using RNG.
First, create a field that contains only a random number from Rand(), and then configure the Filter tool to keep the percentage you want.
In this example, I have configured the filter to keep every row whose random number is less than 0.20, which extracts approximately 20% of the data.
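The same two-step pattern (tag each row with a random number, then filter on it) can be sketched in Python as follows. The function name sample_rows and the stand-in dataset of 10,000 integers are hypothetical.

```python
import random


def sample_rows(rows, fraction: float, seed=None):
    """Tag each row with a random number, then keep rows below the cutoff."""
    rng = random.Random(seed)
    tagged = [(rng.random(), row) for row in rows]  # step 1: the random field
    return [row for r, row in tagged if r < fraction]  # step 2: the filter


rows = list(range(10_000))  # stand-in for a dataset of 10,000 records
sample = sample_rows(rows, 0.20, seed=3)
print(len(sample) / len(rows))  # close to 0.20, but rarely exactly 0.20
```

Because each row is kept or dropped independently, the sample size varies a little from run to run. If you need an exact row count, sorting by the random field and taking the first N rows is one alternative.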
Conclusion
To sum everything up, random number generation is an essential, multi-faceted tool that can enhance workflows and vastly improve data analysis. Alteryx provides powerful functions to facilitate the useful applications of RNG. By understanding and effectively using these tools, you can create reliable and robust workflows capable of handling many different scenarios within your dataset. From generating mock data to sampling your records, the knowledge and skill set to incorporate RNG into your workflows will help ensure versatility and accuracy. Embracing these techniques could not only streamline your processes but also prepare your workflows for real-world applications.
Thanks for reading!
Sophia Kohn