Writing a CSV with column names, and reading back a CSV file generated from a Spark SQL DataFrame in PySpark

Try

df.coalesce(1).write.format('com.databricks.spark.csv').option('header', 'true').save('path+my.csv')

Note that this may not be an issue on your current setup, but with extremely large datasets you can run into memory problems, since `coalesce(1)` forces all the data through a single task. The write will also take longer in a cluster scenario, as everything has to be pushed back to a single location.

