Random number generation (RNG), as the name suggests, produces random numbers, and it provides the foundation for numerous applications and algorithms. In data engineering, RNG is useful for tasks such as data simulation, sampling, and designing and implementing randomized algorithms. Alteryx offers random number generation functions that users can apply to each of these tasks.

This tutorial explores the use cases of random number generation, the techniques used to apply them, and practical examples that demonstrate their capabilities. From sampling datasets for machine learning to generating mock data for testing workflows, understanding RNG in Alteryx can improve your workflows and add a new capability to your toolkit.

Exploring Random Number Generation in Alteryx: Techniques and Applications

 

Importance and usages of ‘RNG’ in data engineering

RNG has many capabilities that can support a variety of tasks. Below are some key usages of RNG and their importance within data engineering:

  1. Data Simulation: RNG is crucial for generating random data, which can be used to test workflows and create mock datasets. Because the generated values differ from run to run, RNG can simulate the organic variation and conditions you may encounter in real-world data, which makes it helpful when testing the performance of your workflows.
  2. Sampling: RNG allows the user to select random samples from a dataset, which helps keep the sample representative of the whole dataset and supports unbiased, reliable results. This can be especially important for cross-validation in machine learning, where a dataset needs to be split into training and testing sets, or for examining the results of your workflows and data manipulations.
  3. Randomized Algorithms: Many processes in Alteryx can rely on RNG to function appropriately. For example, data splitting, randomized data sorting, and feature generation all benefit from random number generation in their design. Using RNG in these processes introduces variability, which can help avoid bias that comes from the original order of the records.

By utilizing RNG in these ways, data professionals can improve the reliability, efficiency and accuracy of their workflows in Alteryx.

 

Alteryx techniques for ‘RNG’

Alteryx has two random number generation functions: Rand() and Randint().

 

Rand() Function

The Rand() function returns a random decimal number between 0 and 1 (greater than or equal to 0 and less than 1).

Steps to Use Rand():

  1. Open Alteryx Designer and drag a Formula tool onto your workflow canvas.
  2. Double-click the Formula tool to open its configuration window.
  3. In the Formula tool, select or create a new field where you want to insert the random values.
  4. Enter Rand() into the formula expression box.
  5. Click OK to apply the formula and close the window.
Rand Function in Alteryx

Applications for the Rand() function include randomly splitting data into percentiles, assigning values, random sampling, and splitting data into training and testing sets for machine learning.
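To make the behavior concrete outside Alteryx, here is a minimal Python sketch of a Rand()-style draw. It is a stand-in, not Alteryx code: it assumes Rand() behaves like a uniform draw from 0 up to (but not including) 1, and the `rand` helper and the fixed seed are illustrative choices.

```python
import random

def rand(rng: random.Random) -> float:
    """Python stand-in for a Rand()-style draw: a float in [0, 1)."""
    return rng.random()

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(42)
draws = [rand(rng) for _ in range(1000)]

# Every draw stays inside the half-open interval [0, 1).
print(all(0.0 <= d < 1.0 for d in draws))
```

Because every value falls in the same 0-to-1 range, comparing the draw against a cutoff such as 0.2 or 0.8 is what turns it into a percentage-based split.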

 

Randint() Function

The Randint() function takes a single parameter in its parentheses (e.g., 10) and uses it as the ceiling for the number generated: it returns a random whole number from 0 up to and including that value.

Steps to Use Randint():

  1. Open Alteryx Designer and drag a Formula tool onto your workflow canvas.
  2. Double-click the Formula tool to open its configuration window.
  3. In the Formula tool, select or create a new field where you want to insert the random values.
  4. Enter Randint(n) into the formula expression box (replace n with your desired maximum value).
  5. Click OK to apply the formula and close the window.
RandInt Function in Alteryx

Applications for the Randint() function could include splitting data into percentiles, assigning values, generating data and random sampling.
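The inclusive range matters for the examples later in this tutorial, so here is a small Python sketch of a Randint()-style draw. It is a stand-in rather than Alteryx code, and it assumes Randint(n) can return both 0 and n; the `rand_int` helper is a hypothetical name.

```python
import random
from collections import Counter

def rand_int(n: int, rng: random.Random) -> int:
    """Stand-in for a Randint(n)-style draw: a whole number from 0 to n inclusive."""
    return rng.randint(0, n)

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(7)
counts = Counter(rand_int(10, rng) for _ in range(10_000))

# With this many draws, every value from 0 through 10 shows up.
print(sorted(counts))
```

Under that assumption, Randint(10) has eleven possible results rather than ten, which is worth remembering when you turn the result into percentages.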

 

Applications and examples of RNG techniques in Alteryx

In this section, I will demonstrate several ways you can use RNG for mock data creation, data splitting, and data sampling.

 

Generating Mock Data

In the following two examples, I will show how you can use the random number generator to generate mock data.

Below, I have used the Randint() function to generate random dates within a two-year window starting January 1, 2023. With the parameter set to 730, Randint() returns a whole number from 0 to 730, and DateTimeAdd() adds that many days to the start date. Because 2024 is a leap year, an offset of 730 days lands on December 31, 2024, so the formula generates random dates between January 01, 2023, and December 31, 2024.

DateTimeAdd('2023-01-01', Randint(730), 'days')

DateTimeRNG Function
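The date arithmetic can be checked with a short Python sketch. This is an illustration of the same logic, not Alteryx code; it assumes the random offset runs from 0 to 730 inclusive, and the names `START`, `MAX_OFFSET_DAYS`, and `random_date` are hypothetical.

```python
import random
from datetime import date, timedelta

START = date(2023, 1, 1)
MAX_OFFSET_DAYS = 730  # same parameter as in the formula above

def random_date(rng: random.Random) -> date:
    """Add a random whole number of days (0..730) to the start date."""
    offset = rng.randint(0, MAX_OFFSET_DAYS)
    return START + timedelta(days=offset)

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(2023)
dates = [random_date(rng) for _ in range(5)]

# The earliest and latest dates the formula can produce.
latest_possible = START + timedelta(days=MAX_OFFSET_DAYS)
print(START, latest_possible)  # 2023-01-01 2024-12-31
```

To cover a different window, change the start date and set the parameter to the number of days between the first and last dates you want.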

In this next formula, I have used Randint() again, but this time to generate random characters for customer IDs.

The formula generates two random numbers between 65 and 90, which correspond to the ASCII values of the uppercase letters A-Z, and converts each one to a character. It then adds a hyphen and a sequential number; in this case, the Order ID serves as the sequential number.

A sample output from this could be: SK-69542

CharFromInt(Randint(25) + 65) + CharFromInt(Randint(25) + 65) + "-" + ToString([Order ID])
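The same construction can be sketched in Python to show why an offset of 65 produces uppercase letters. This is an analogue rather than Alteryx code; `random_letter` and `customer_id` are hypothetical helper names, and `chr` plays the role of CharFromInt.

```python
import random

def random_letter(rng: random.Random) -> str:
    # 0..25 plus 65 lands on ASCII codes 65..90, i.e. 'A' through 'Z'.
    return chr(rng.randint(0, 25) + 65)

def customer_id(order_id: int, rng: random.Random) -> str:
    """Two random uppercase letters, a hyphen, then the order number."""
    return random_letter(rng) + random_letter(rng) + "-" + str(order_id)

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(1)
cid = customer_id(69542, rng)
print(cid)  # two uppercase letters followed by -69542
```

Note that random letters alone do not guarantee uniqueness; in this pattern the sequential number is what keeps each ID distinct.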

 

Data Splitting and Sampling

In the following examples, I will show how you can use RNG to split your data, take random samples, or assign values based on percentages. 

This next example shows how you can use RNG to randomly assign values to your data based on percentages. The formula assigns the product to “Lifetime” if the random number is less than 0.5, “Monthly” if it is less than 0.8, and “Annual” otherwise. Lifetime, Monthly, and Annual were three product names for this simulated dataset.

Each call to Rand() produces a new number, so the draw should be stored once and compared against both thresholds. Here it is stored in its own field first, created with the expression Rand() and named [Random] for illustration. If Rand() were called again in the ELSEIF branch, the second comparison would use a different number and the shares would drift away from the intended split.

With thresholds of 0.5 and 0.8, roughly 50% of the data will be Lifetime, 30% will be Monthly, and the remaining 20% will be Annual. A key benefit of using a random number generator for this kind of split is that the percentages will not be exact, giving a more realistic and organic distribution across the dataset.

IF [Random] < 0.5 THEN "Lifetime"
ELSEIF [Random] < 0.8 THEN "Monthly"
ELSE "Annual"
ENDIF

Product Alteryx Technique
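To check what shares a cascade of thresholds like this produces, here is a Python simulation. It is a sketch, not Alteryx code, and it assumes that every call to a random function yields a fresh, independent value. It compares reusing one draw across all branches with drawing again in each branch; the function names are hypothetical.

```python
import random
from collections import Counter

def assign_single_draw(rng: random.Random) -> str:
    r = rng.random()  # one draw, reused by every branch
    if r < 0.5:
        return "Lifetime"
    elif r < 0.8:
        return "Monthly"
    return "Annual"

def assign_redraw(rng: random.Random) -> str:
    # A fresh draw in each branch, shown for comparison.
    if rng.random() < 0.5:
        return "Lifetime"
    elif rng.random() < 0.8:
        return "Monthly"
    return "Annual"

def shares(assign, n: int = 100_000, seed: int = 0) -> dict:
    """Fraction of n simulated records that land in each product."""
    rng = random.Random(seed)
    counts = Counter(assign(rng) for _ in range(n))
    return {name: counts[name] / n for name in ("Lifetime", "Monthly", "Annual")}

single = shares(assign_single_draw)  # close to 0.50 / 0.30 / 0.20
redraw = shares(assign_redraw)       # close to 0.50 / 0.40 / 0.10
print(single)
print(redraw)
```

With a fresh draw per branch, the second test only runs for the half of the records that failed the first one, so Monthly ends up near 0.5 × 0.8 = 40% and Annual near 0.5 × 0.2 = 10%.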

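In this next example, I have applied similar logic, but using the Randint() function instead of Rand().

This conditional statement assigns discount values by percentages. As with Rand(), each call to Randint() produces a new number, so the draw is stored once in its own field and that field is compared against every threshold. The field is named [Random Int] here for illustration and is created with the expression Randint(99) + 1, which gives a whole number from 1 to 100. In the formula’s output, approximately 20% of records will have a 10% discount, 20% a 20% discount, and so forth.

IF [Random Int] <= 20 THEN 10
ELSEIF [Random Int] <= 40 THEN 20
ELSEIF [Random Int] <= 60 THEN 25
ELSEIF [Random Int] <= 80 THEN 33
ELSE 50
ENDIF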

Percent Alteryx Technique
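Since the draw here is a whole number, the share of each discount can be counted exactly rather than simulated. The Python sketch below is illustrative, not Alteryx code. It assumes a single stored draw is compared against all thresholds, and it contrasts a draw from 0 to 100 with a draw from 1 to 100; `discount` and `exact_shares` are hypothetical names.

```python
from collections import Counter
from fractions import Fraction

def discount(r: int) -> int:
    """Map one stored integer draw to a discount tier."""
    if r <= 20:
        return 10
    elif r <= 40:
        return 20
    elif r <= 60:
        return 25
    elif r <= 80:
        return 33
    return 50

def exact_shares(values) -> dict:
    """Exact share of each discount when every value is equally likely."""
    values = list(values)
    counts = Counter(discount(r) for r in values)
    return {d: Fraction(c, len(values)) for d, c in sorted(counts.items())}

zero_based = exact_shares(range(0, 101))  # draw can be 0..100 (101 values)
one_based = exact_shares(range(1, 101))   # draw can be 1..100 (100 values)
print(zero_based)  # the 10% tier gets 21/101, the others 20/101
print(one_based)   # every tier gets exactly 1/5
```

The difference is small, but starting the range at 1 is what makes the five tiers equal in expectation.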

The example below shows how you can randomly sample your data using RNG.

First, create a field that contains only a random number from Rand(). Then add a Filter tool and configure it to keep the percentage you want.

In this example, I have configured the filter to pull all of the random numbers less than 0.20, which effectively extracts approximately 20% of the data.

RandSample Function
Filter in Alteryx
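The two steps can be sketched in Python to show how the filter divides the records. This is an analogue of the workflow, not Alteryx code; the `Random` field name, the helper functions, and the 10,000 mock rows are assumptions made for illustration.

```python
import random

def tag_with_random(rows, rng: random.Random):
    """Step 1: add a field holding one uniform draw per record."""
    return [{**row, "Random": rng.random()} for row in rows]

def filter_sample(rows, threshold: float = 0.20):
    """Step 2: split records the way a filter's True and False outputs would."""
    true_side = [r for r in rows if r["Random"] < threshold]
    false_side = [r for r in rows if r["Random"] >= threshold]
    return true_side, false_side

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(20)
rows = [{"Order ID": i} for i in range(1, 10_001)]
sample, rest = filter_sample(tag_with_random(rows, rng))

# The sample is close to, but not exactly, 20% of the records.
print(len(sample) / len(rows))
```

The records that fail the filter are not lost. They form the complementary 80%, which is how the same pattern produces a training and testing split.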

 

Conclusion

To sum everything up, random number generation is a versatile tool that can strengthen your workflows and your data analysis. Alteryx provides the Rand() and Randint() functions to put RNG to practical use. By understanding and effectively using these functions, you can create reliable and robust workflows capable of handling many different scenarios within your dataset. From generating mock data to sampling your records, knowing how to incorporate RNG into your workflows adds versatility. Embracing these techniques can not only streamline your processes but also prepare your workflows for real-world applications.

Thanks for reading!
Sophia Kohn
