Spark save(write) parquet only one file

Use coalesce before write operation

dataFrame.coalesce(1).write.format("parquet").mode("append").save("temp.parquet")


EDIT-1

Upon a closer look, the docs do warn about coalesce

However, if you’re doing a drastic coalesce, e.g. to numPartitions =
1, this may result in your computation taking place on fewer nodes
than you like (e.g. one node in the case of numPartitions = 1)

Therefore as suggested by @Amar, it’s better to use repartition

Leave a Comment