Python Pyspark Puzzle 4

SAS
0

Track customers who made a transaction greater than 50000 exactly on the 10th day after account creation. Use PySpark filtering, date difference functions, and DataFrame operations to solve this real-world query puzzle.

Input Data

Expected Output

Input DataFrame Script


from pyspark.sql.types import StructType,StructField,StringType,DateType,IntegerType
from pyspark.sql.functions import col,cast

data = [
    ("C001", "2024-01-01", "2024-01-11", 60000),  
    ("C001", "2024-01-01", "2024-01-02", 45000),  
    ("C002", "2024-02-01", "2024-02-01", 52000),
    ("C003", "2024-03-05", "2024-03-15", 51000),
    ("C004", "2024-01-10", "2024-09-18", 48000)
]

schema = StructType([
    StructField("CustomerID", StringType(), True),
    StructField("AccountCreationDate", StringType(), True),
    StructField("TransactionDate", StringType(), True),
    StructField("TransactionAmount", IntegerType(), True)
])

fabricofdata_DF = spark.createDataFrame(data, schema)
fabricofdata_DF = fabricofdata_DF.withColumn("AccountCreationDate", col("AccountCreationDate").cast(DateType())) \
                                 .withColumn("TransactionDate", col("TransactionDate").cast(DateType()))

display(fabricofdata_DF)
      

Try solving the question yourself! If you need help, click below to reveal the solution.




Post a Comment

0Comments

Post a Comment (0)

#buttons=(Ok, Go it!) #days=(20)

Our website uses cookies to enhance your experience. Check Now
Ok, Go it!