How to find the size of a Spark RDD/DataFrame?
If you simply want to count the number of rows in the RDD:

```scala
val distFile = sc.textFile(file)
println(distFile.count)
```

If you are interested in the size in bytes, you can use `SizeEstimator`:

```scala
import org.apache.spark.util.SizeEstimator
println(SizeEstimator.estimate(distFile))
```

See https://spark.apache.org/docs/latest/api/java/org/apache/spark/util/SizeEstimator.html

Note that `SizeEstimator` measures the in-memory footprint of the JVM object you pass it. For an RDD reference on the driver this reflects the object graph, not necessarily the size of the distributed data; to gauge the actual data size, cache the RDD and check the Storage tab in the Spark UI.
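For a DataFrame, Spark's Catalyst optimizer keeps its own size estimate, which you can read from the query plan's statistics. A minimal sketch, assuming an existing `SparkSession` named `spark` and a file path `file` (the exact `stats` accessor shown here is the Spark 2.4+/3.x API; older 2.x versions used `stats(conf)` instead):

```scala
// Sketch: estimate a DataFrame's size using Catalyst plan statistics.
// Assumes `spark` (SparkSession) and `file` (a path String) are in scope.
val df = spark.read.textFile(file)

// sizeInBytes is the optimizer's estimate of the data size, used for
// decisions such as broadcast joins; it is an estimate, not an exact count.
val bytes = df.queryExecution.optimizedPlan.stats.sizeInBytes
println(s"Estimated size: $bytes bytes")
```

This is often more useful than `SizeEstimator` for DataFrames, since it reflects the planner's view of the data rather than the driver-side object graph.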