Data Prep Intermediate Exam
What to know:
Only Playfair+ Core and Premium members are eligible
You will need the Proxy dataset
This test is designed to take 60 – 90 minutes
You must answer 20 of 25 questions correctly to pass
Following the exam, we’ll review your answers and respond within 3 – 5 business days.
1. Which of these storage types is best for long-term storage that doesn’t need to be accessed or updated regularly?
A. Big Data
B. Cold Storage
C. Cube
D. Data Lake
2. Which of these storage types is best for data including pictures, audio files, and unknown data types?
A. Data Lake
B. Data Warehouse
C. SQL Database
D. Data mart
3. Which of these storage types is best for quickly accessing big data?
A. JSON
B. Parquet
C. Relational
D. Snowflake
4. What is a benefit of exporting a dataset as a .csv file instead of an Excel file?
A. CSV files are more reliable when working with long text fields that may contain commas.
B. CSV files can have multiple tabs.
C. CSV files have higher row limits.
D. CSV files support more formatting styles.
5. Why are indexes important?
A. They help identify duplicate rows
B. They help trace errors
C. They are used to join tables in a relational database
D. All of the above
6. What is the difference between a table and a view?
A. A view does not store data but a table does.
B. Any user can create a view, but only users with WRITE permissions can create a table.
C. Views are temporary.
D. Views don’t require any processing power.
7. Which change would improve query times and reduce cloud costs?
A. Adding more cores or processing power.
B. Moving records over five years old out of frequently used tables, and saving them in an archive or cold storage instead.
C. Removing all index fields.
D. Swapping all data types to strings.
8. Which step in the data pipeline process is most likely to be the cause of misspelled values?
A. Data Entry
B. Data Extract
C. Data Transformation
D. Automated Data Loading
9. Which step in the data pipeline process is most likely to be the cause of duplicate rows?
A. Data Transformation
B. Automated Data Loading
C. Neither A nor B
D. Both A and B
10. Why are backups important?
A. Backups give data engineers a separate environment to test changes before implementing them.
B. Backups make queries run more efficiently.
C. Backups protect the database from accidental or malicious deletion.
D. Backups reduce the amount of storage required.
11. Why does a source of truth matter?
A. A source of truth matters for quality assurance.
B. A source of truth doesn’t matter as long as you record the changes you’ve made.
C. A source of truth is used in the final product.
D. A source of truth shows changes made to the data.
12. What counts as a source of truth?
A. A secondary/helper data source.
B. An original, primary data source.
C. A primary data source that has been edited.
D. A helper data source with documentation.
13. Which QA step would you use to check that a join is working as expected?
A. Count the number of rows.
B. Check for duplicate rows.
C. Look for outliers in numeric rows.
D. All of the above
14. Which of these QA steps would be best to take if you see a total that is higher than expected?
A. Check joins
B. Check formulas
C. Check filters
D. All of the above
15. What is the difference between a data lake and a data warehouse?
A. Data lakes store unstructured data but data warehouses require a schema.
B. Data warehouses are better at storing images and videos than data lakes.
C. Data changes are easier to make in data warehouses than in data lakes.
D. Data warehouses are known for their flexibility and scalability; data lakes are not.
16. Which of these methods would correctly compute the date 10 years in the future?
A. Add 3650 days to the date.
B. Calculate the day of year value, and add that value to 1/1/2035.
C. Cast the date into an integer (20250314) and then add 10000 to the number.
D. Separate out the day, month, and year values. Add 10 to the year value and then use the new year value and original month and day values to create a new date.
17. Suppose you are working with a dataset that tracks inventory counts and automates a monthly order for new materials at the beginning of each month. What refresh configuration makes the most sense?
A. Real-time
B. Hourly refreshes
C. Monthly refreshes at 8 PM on the last day of the month.
D. Weekly refreshes at 2 AM on Mondays.
18. Which of the following is this an example of? CustomerID[“AX-1320375”]
A. SQL
B. JSON
C. REGEX
D. R
19. Which CASE statement is equivalent to this IF statement? IF [Roman] = 'I' THEN 1 ELSEIF [Roman] = 'II' THEN 2 ELSEIF [Roman] = 'III' THEN 3 END
A. CASE [Roman] WHEN 'I' THEN 1 WHEN 'II' THEN 2 WHEN 'III' THEN 3 END
B. CASE [Roman] WHEN 1 THEN 'I' WHEN 2 THEN 'II' WHEN 3 THEN 'III' END
C. CASE [Roman] IF 1 THEN 'I' IF 2 THEN 'II' IF 3 THEN 'III' END
D. None of the above
20. Simplify this logical statement: IF A=TRUE THEN IF B=TRUE THEN TRUE END ELSEIF A=FALSE THEN TRUE END
A. B
B. TRUE
C. IF B THEN TRUE END
D. IF A AND B THEN TRUE END
E. IF A AND B OR NOT A THEN TRUE END
21. If you query a dataset with 49 columns using SELECT * FROM and then add a WHERE clause, what effect will that have on the number of columns in the result?
A. Increase the number of columns.
B. Decrease the number of columns.
C. Have no impact.
D. You cannot use a WHERE clause with SELECT *.
22. If you have a column of User IDs with 4 unique entries, what would be the result of running the following aggregation on said column? CountD([UserID])
A. 2
B. 16
C. 8
D. 4
The final three questions in this exam use Playfair Data's Proxy dataset. Download the latest version before answering them.
23. Sales and Usage have a ___ relationship.
A. One-to-one
B. One-to-many
C. Many-to-many
D. None of the above
24. Which column in the Sales table has the most unique values?
A. Campaign ID
B. Order ID
C. Customer ID
D. None of the above
25. How many Customer IDs have multiple records in the Sales table?
A. 0
B. 2
C. 7
D. 14
Let us know who is taking the test:
First Name (Required)
Last Name (Required)
Playfair+ Email (Required)
Phone
Company Name