How to find the most frequently used word in a text file using PySpark


I have a large text file, and I want to analyze it using PySpark to determine which word appears most frequently. Could you help me with a PySpark script?


Expected Output: the word that appears most often in the file, together with its count.

Input Data (save it to a text file):
Apache Spark is a fast and general engine for large-scale data processing. Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general computation graphs. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

# PySpark: read the text file
from pyspark.sql.functions import split, explode, col, count, desc

file_location = "/FileStore/tables/Sample_Data-1.txt"

fabricofdata_DF = spark.read.text(file_location)
fabricofdata_DF.show(truncate=False)
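Here, spark is the SparkSession that notebook environments such as Databricks provide automatically; spark.read.text returns a DataFrame with a single string column named value, one row per line of the file. The extra functions imported above come into play in the word-count step.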

Try solving the question yourself first! If you get stuck, one possible solution is sketched below.
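A minimal sketch of one way to finish the job, building on fabricofdata_DF from the read step above. The column aliases word and word_count are illustrative names chosen here, not fixed by any API, and the pattern \s+ simply splits each line on runs of whitespace.

# Split each line on whitespace, explode the resulting array into
# one row per word, and drop any empty tokens.
words_DF = (
    fabricofdata_DF
    .select(explode(split(col("value"), r"\s+")).alias("word"))
    .filter(col("word") != "")
)

# Count occurrences of each word and sort in descending order.
word_counts_DF = (
    words_DF
    .groupBy("word")
    .agg(count("*").alias("word_count"))
    .orderBy(desc("word_count"))
)

word_counts_DF.show(5, truncate=False)   # top five words by frequency
print(word_counts_DF.first())            # Row(word=..., word_count=...)

Note that this counts raw tokens exactly as they appear, so Spark and spark, or disk and disk., are tallied separately. For a stricter count, you could lowercase each token with lower and strip punctuation with regexp_replace (both from pyspark.sql.functions) before grouping.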




