What is caching in Spark?


Caching in Spark stores the intermediate result of a DataFrame or RDD so that later actions can reuse it instead of recomputing the whole lineage. In PySpark, calling .cache() on a DataFrame persists it at the default MEMORY_AND_DISK storage level, so partitions that do not fit in memory spill to disk; calling .cache() on an RDD uses MEMORY_ONLY, and in PySpark RDD data is kept in serialized (pickled) form.
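A minimal PySpark sketch of this behavior (the app name and example data are placeholders):

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("caching-demo").getOrCreate()

df = spark.range(1_000_000)  # placeholder DataFrame with a single "id" column

# cache() only *marks* the DataFrame for caching; Spark is lazy,
# so nothing is actually stored until an action runs.
df.cache()

df.count()                        # first action computes and materializes the cache
df.filter("id % 2 = 0").count()   # later actions read the cached data, not the lineage

# persist() accepts an explicit storage level when the default is not wanted.
rdd = spark.sparkContext.parallelize(range(100)).persist(StorageLevel.MEMORY_ONLY)

# Release the cached blocks once they are no longer needed.
df.unpersist()

Note that the first action after .cache() pays the full cost of computing and storing the data; caching only pays off when the same result is reused across more than one action.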


