Data Prep Intermediate Exam

What to know:

  • Only Playfair+ Core and Premium members are eligible
  • You will need the Proxy dataset
  • This test is designed to take 60 – 90 minutes
  • You must answer 20 of 25 questions correctly to pass
  • Following the exam, we’ll review your answers and respond within 3 – 5 business days


1. Which of these storage types is best for long-term storage that doesn’t need to be accessed or updated regularly?
2. Which of these storage types is best for data including pictures, audio files, and unknown data types?
3. Which of these storage types is best for quickly accessing big data?
4. What is a benefit of exporting a dataset as a .csv file instead of an Excel file?
5. Why are indexes important?
6. What is the difference between a table and a view?
7. Which change would improve query times and reduce cloud costs?
8. What step in the data pipeline process is most likely to be the cause of misspelled values?
9. What step in the data pipeline process is most likely to be the cause of duplicate rows?
10. Why are backups important?
11. Why does a source of truth matter?
12. What counts as a source of truth?
13. Which QA step would you use to check that a join is working as expected?
14. Which of these QA steps would be best to take if you see a total that is higher than expected?
15. What is the difference between a data lake and a data warehouse?
16. Which of these methods would correctly compute the date 10 years in the future?
17. Suppose you are working with a dataset that tracks inventory counts and automates a monthly order for new materials at the beginning of each month. What refresh configuration makes the most sense?
18. Which of the following is this an example of? CustomerID["AX-1320375"]
19. Which CASE statement is equivalent to this IF statement? IF [Roman] = 'I' THEN 1 ELSEIF [Roman] = 'II' THEN 2 ELSEIF [Roman] = 'III' THEN 3 END
20. Simplify this logical statement: IF A=TRUE THEN IF B=TRUE THEN TRUE END ELSEIF A=FALSE THEN TRUE END
21. If you query a dataset with 49 columns using SELECT * FROM and then add a WHERE clause, how does that change the number of columns in the result?
22. If you have a column of User IDs with 4 unique entries, what would be the result of running the following aggregation on that column? CountD([UserID])
23. Sales and Usage have a _ relationship
24. Which column in the Sales table has the most unique values?
25. How many Customer IDs have multiple records in the Sales table?
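
For anyone preparing, the ideas behind questions 21 and 22 (row filtering versus column selection, and distinct counts) can be tried out on a small scratch table. The following is only a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, its three columns, and the sample rows are hypothetical and are not taken from the Proxy dataset. It also assumes that SQL's COUNT(DISTINCT ...) is a reasonable stand-in for the CountD aggregation named in question 22.

```python
import sqlite3

# Hypothetical sample data: 6 rows, 4 unique user IDs (not the Proxy dataset).
rows = [
    ("u1", "East", 10.0),
    ("u2", "West", 20.0),
    ("u3", "East", 15.0),
    ("u4", "West", 5.0),
    ("u1", "East", 7.5),
    ("u2", "West", 12.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# SELECT * with and without a WHERE clause: compare column names and row counts.
all_cur = conn.execute("SELECT * FROM sales")
all_cols = [d[0] for d in all_cur.description]
all_rows = all_cur.fetchall()

filtered_cur = conn.execute("SELECT * FROM sales WHERE region = 'East'")
filtered_cols = [d[0] for d in filtered_cur.description]
filtered_rows = filtered_cur.fetchall()

# Distinct count, assumed here to be the SQL analogue of CountD([UserID]).
distinct_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM sales"
).fetchone()[0]

print("columns without WHERE:", all_cols)
print("columns with WHERE:   ", filtered_cols)
print("rows without WHERE:", len(all_rows), "| rows with WHERE:", len(filtered_rows))
print("distinct user IDs:", distinct_users)

conn.close()
```

Comparing the printed column lists and row counts side by side is a quick way to check your own reasoning before answering.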

Let us know who is taking the test: