How can you find the origin and destination locations for each customer using PySpark?

SAS

Given a dataset of customer transactions or movements over time, how can you determine the origin (first location) and destination (last location) for each customer using PySpark?


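Input Data:

Customer_ID | TicketNumber | Origin    | Destination
1           | T-12345      | Hyderabad | Kolkatta
1           | T-12345      | Kolkatta  | Patna
1           | T-12345      | Patna     | Delhi
2           | T-56789      | Chennai   | NCR
2           | T-56789      | NCR       | Agra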

Expected Output:

Input DataFrame Script

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Reuse the notebook's active session if there is one, otherwise start a new one
spark = SparkSession.builder.getOrCreate()

# One row per leg of a customer's journey
schema = StructType([
    StructField("Customer_ID", IntegerType()),
    StructField("TicketNumber", StringType()),
    StructField("Origin", StringType()),
    StructField("Destination", StringType())
])

data = [(1, "T-12345", "Hyderabad", "Kolkatta"),
        (1, "T-12345", "Kolkatta", "Patna"),
        (1, "T-12345", "Patna", "Delhi"),
        (2, "T-56789", "Chennai", "NCR"),
        (2, "T-56789", "NCR", "Agra")]

fabricofdata_DF = spark.createDataFrame(data, schema)
fabricofdata_DF.show()

Try solving the question yourself before looking at a solution.




