Python PySpark Puzzle 8 - Rank the Employee Salaries and Extract Top Earners

SAS
0

Given an employee salary dataset, write a query to perform the following operations:

  • Rank employees within each department based on their salary.
  • Calculate the average salary for each department.
  • Extract the top two highest-paid employees from each department.
  • Ensure that ties in salary rankings are handled appropriately.

Input Data:

Expected Output:

Input DataFrame Script

employee_data = [
    (1001, 2024, 2, 101, 50000),
    (1002, 2024, 4, 101, 60000),
    (1003, 2024, 1, 101, 45000),
    (1004, 2024, 5, 101, 65000),
    (1005, 2024, 3, 101, 55000),
    (2001, 2024, 5, 102, 90000),
    (2002, 2024, 2, 102, 75000),
    (2003, 2024, 3, 102, 80000),
    (2004, 2024, 4, 102, 85000),
    (2005, 2024, 1, 102, 70000),
    (3001, 2024, 2, 103, 55000),
    (3002, 2024, 4, 103, 65000),
    (3003, 2024, 5, 103, 70000),
    (3004, 2024, 3, 103, 60000),
    (3005, 2024, 1, 103, 50000),
    (4001, 2024, 4, 104, 80000),
    (4002, 2024, 5, 104, 85000),
    (4003, 2024, 1, 104, 60000),
    (4004, 2024, 3, 104, 75000),
    (4005, 2024, 2, 104, 70000),
    (5001, 2024, 5, 105, 65000),
    (5002, 2024, 2, 105, 50000),
    (5003, 2024, 3, 105, 55000),
    (5004, 2024, 4, 105, 60000),
    (5005, 2024, 1, 105, 45000)
]

employee_DF = spark.createDataFrame(employee_data,"EMPID int,FiscalYear int,Rating int,DepartmentID int, Salary int")

display(employee_DF)

department_data = [ ("HR", 101),
                    ("Engineering", 102),
                    ("Sales", 103),
                    ("Finance", 104),
                    ("Marketing", 105)
                    ]
department_DF = spark.createDataFrame(department_data,"Department_Name string,DepartmentID int")
display(department_DF)

Try solving the question yourself! If you need help, click below to reveal the solution.





Post a Comment

0Comments

Post a Comment (0)