
Conceptual & Theoretical Questions
- What are the different storage levels supported in Spark? Briefly describe each one.
- When would you choose MEMORY_AND_DISK over MEMORY_ONLY?
- What is caching in Spark?
- Why do we use caching and persisting in Spark?
- When should we avoid using caching?
- How can you uncache data in Spark?
- What is the difference between cache and persist?
- Your Delta table has grown significantly, and query performance is degrading. What steps would you take to optimize it?
Practical Coding Questions
- Retrieve Customers and Their Orders Compared Against 50% of Their Highest Order
- Find Total Orders by Each Customer with List of Ordered Items
- Find Countries with More Females than Males
- Find Customers with Big Transactions After 10 Days
- Behind the Leader: Spot the #2 Salary
- Identifying Top 3 Performers in Each Department
- Rank Customers by Lifetime Spending
- Rank the Employee Salaries and Extract Top Earners
- Identify Frequent Customers
- How to Filter and Retrieve the Second Transaction in PySpark?
- How can you find the origin and destination locations for each customer using PySpark?
- How to find the most frequently used word in a text file?
- How to Generate a Unique Match Schedule Between Departments?
- How can you unpack a nested list (keeping duplicates) using recursion?
- How can you unpack a nested list (keeping duplicates) without using recursion?
- How can you flatten a list without duplicates?
- How can you flatten a list and count the occurrences of each item?
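
The last four list questions are closely related, so one possible way to approach them together is sketched below in plain Python. This is only an illustration, not a definitive answer: the sample data `nested` and the helper names `flatten_recursive` and `flatten_iterative` are assumptions made up for this example, and it assumes the nesting uses only Python `list` objects.

```python
from collections import Counter

# Hypothetical sample data, invented for this illustration.
nested = [1, [2, 3, [2, 4]], [1, [5, [3]]], 4]


def flatten_recursive(items):
    """Flatten nested lists depth-first using recursion, keeping duplicates."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recurse into the sublist and append its flattened contents.
            flat.extend(flatten_recursive(item))
        else:
            flat.append(item)
    return flat


def flatten_iterative(items):
    """Flatten nested lists with an explicit stack of iterators (no recursion)."""
    flat = []
    stack = [iter(items)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                # Descend into the sublist; the parent iterator resumes later.
                stack.append(iter(item))
                break
            flat.append(item)
        else:
            # The current iterator is exhausted, so return to its parent.
            stack.pop()
    return flat


flat = flatten_recursive(nested)

# Remove duplicates while preserving first-seen order.
unique = list(dict.fromkeys(flat))

# Count how often each item occurs in the flattened list.
counts = Counter(flat)

print(flat)
print(unique)
print(counts)
```

The recursive version is usually the shortest to write in an interview, while the stack-based version avoids hitting Python's recursion limit on very deeply nested input. `dict.fromkeys` is used for de-duplication because, unlike `set`, it keeps the order in which items first appear.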