
Caching in Spark stores the intermediate results of a DataFrame or RDD so they do not have to be recomputed each time the data is reused by later actions. In PySpark, calling .cache() on a DataFrame persists it with the default MEMORY_AND_DISK storage level (a deserialized format), while calling .cache() on an RDD defaults to MEMORY_ONLY; PySpark RDD data is kept in memory in serialized (pickled) form.
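Here is a minimal sketch of how this looks in practice. The DataFrame is built with spark.range purely for illustration; any DataFrame or RDD behaves the same way. Note that caching is lazy, so nothing is stored until an action runs.

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# Illustrative DataFrame; in practice this would come from a real source.
df = spark.range(1_000_000).withColumnRenamed("id", "value")

# cache() only marks the DataFrame for caching; it is materialized
# (using the default MEMORY_AND_DISK level) by the first action.
df.cache()
df.count()

# Later actions reuse the cached data instead of recomputing it.
df.filter("value % 2 = 0").count()

# persist() accepts an explicit storage level when the default is not wanted.
rdd = spark.sparkContext.parallelize(range(1000))
rdd.persist(StorageLevel.MEMORY_ONLY)

# Release cached data once it is no longer needed.
df.unpersist()
rdd.unpersist()
```

Calling .unpersist() frees the cached blocks explicitly; otherwise Spark evicts them under memory pressure according to the chosen storage level.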