What does “Stage Skipped” mean in Apache Spark web UI?
Typically it means that the data has been fetched from cache and there was no need to re-execute the given stage. This is consistent with your DAG, which shows that the next stage requires a shuffle (reduceByKey). Whenever a shuffle is involved, Spark automatically caches the generated data. As the Spark documentation notes: "Shuffle also generates a large number of intermediate files on disk. …"
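To make the idea concrete, here is a toy model in plain Python. It is not Spark code and not how Spark is implemented internally: `ToyScheduler`, `map_stage`, and `reduce_by_key` are hypothetical names made up for illustration. The sketch only assumes what the answer describes: the output of the stage before a shuffle is kept around, so a later job that needs the same output does not have to run that stage again.

```python
# Toy model only: ToyScheduler and its methods are hypothetical names,
# not Spark APIs and not a description of Spark's real scheduler.
from collections import defaultdict
from functools import reduce


class ToyScheduler:
    def __init__(self):
        self.shuffle_files = {}  # stage id -> grouped output kept after the stage ran
        self.events = []         # (stage id, "executed" or "skipped"), in order

    def map_stage(self, stage_id, records):
        """Group records by key (the 'shuffle write') unless the output already exists."""
        if stage_id in self.shuffle_files:
            # Output is already available, so the stage is not re-executed.
            self.events.append((stage_id, "skipped"))
            return self.shuffle_files[stage_id]
        grouped = defaultdict(list)
        for key, value in records:
            grouped[key].append(value)
        self.shuffle_files[stage_id] = dict(grouped)
        self.events.append((stage_id, "executed"))
        return self.shuffle_files[stage_id]

    def reduce_by_key(self, stage_id, records, func):
        """One 'job': a map stage (possibly skipped) followed by a reduce step."""
        grouped = self.map_stage(stage_id, records)
        return {key: reduce(func, values) for key, values in grouped.items()}


scheduler = ToyScheduler()
records = [("a", 1), ("b", 2), ("a", 3)]

# First job: the map stage has to run and its output is kept.
first = scheduler.reduce_by_key("stage-0", records, lambda x, y: x + y)
# Second job over the same stage: the kept output is reused.
second = scheduler.reduce_by_key("stage-0", records, lambda x, y: x + y)

print(scheduler.events)
```

In real Spark, the comparable situation is running two actions on the same RDD produced by reduceByKey: the second job can reuse the existing shuffle output, and the web UI lists the map-side stage as skipped.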