I have a large text file, and I want to analyze it using PySpark to determine which word appears most frequently. Could you help me with a PySpark script?
Input Data: the paragraph below (save it to a text file).
Expected Output: the word with the highest frequency.
Apache Spark is a fast and general engine for large-scale data processing. Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general computation graphs. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
# PySpark: read the text file into a DataFrame (one row per line, in a column named "value")
from pyspark.sql import SparkSession
from pyspark.sql.functions import split, explode, col, count, desc

spark = SparkSession.builder.getOrCreate()

file_location = "/FileStore/tables/Sample_Data-1.txt"
fabricofdata_DF = spark.read.text(file_location)
fabricofdata_DF.show(truncate=False)
Try solving the question yourself first! If you need help, a solution sketch follows.
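Here is a minimal sketch of one possible solution, building on the starter code and imports above. It assumes the fabricofdata_DF DataFrame from the reading step; the words_DF and word_counts_DF names are illustrative, and splitting on whitespace is just one reasonable tokenization choice.

# Split each line on whitespace, explode into one word per row,
# and drop any empty strings left over from the split.
words_DF = fabricofdata_DF.select(
    explode(split(col("value"), r"\s+")).alias("word")
).filter(col("word") != "")

# Count occurrences of each word and sort in descending order;
# the first row is then the most frequent word.
word_counts_DF = (
    words_DF.groupBy("word")
    .agg(count("*").alias("count"))
    .orderBy(desc("count"))
)

word_counts_DF.show(5, truncate=False)

Note that this treats punctuation as part of a word (for example, "disk." differs from "disk"); a regexp_replace pass over the "value" column before splitting could normalize that if needed.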