How to save a DataFrame as compressed (gzipped) CSV?

This code works for Spark 2.1, where .codec is not available.

df.write
  .format("com.databricks.spark.csv")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .save(my_directory)

For Spark 2.2, you can use the df.write.csv(...,codec="gzip") option described here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=codec

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)