What are the differences between saveAsTable and insertInto in different SaveMode(s)?

DISCLAIMER I’ve been exploring insertInto for some time and although I’m far from an expert in this area I’m sharing the findings for greater good.

Does insertInto always expect the table to exist?

Yes (per the table name and the database).

Moreover not all tables can be inserted into, i.e. a (permanent) table, a temporary view or a temporary global view are fine, but not:

  1. a bucketed table

  2. an RDD-based table

Do SaveModes have any impact on insertInto?

(That’s recently been my question, too!)

Yes, but only SaveMode.Overwrite. After you think about insertInto the other 3 save modes don’t make much sense (as it simply inserts a dataset).

what’s the differences between saveAsTable with SaveMode.Append and insertInto given that table already exists?

That’s a very good question! I’d say none, but let’s see by just one example (hoping that proves something).

scala> spark.version
res13: String = 2.4.0-SNAPSHOT

sql("create table my_table (id long)")
scala> spark.range(3).write.mode("append").saveAsTable("my_table")
org.apache.spark.sql.AnalysisException: The format of the existing table default.my_table is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:117)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:76)
...
scala> spark.range(3).write.insertInto("my_table")
scala> spark.table("my_table").show
+---+
| id|
+---+
|  2|
|  0|
|  1|
+---+

does insertInto with SaveMode.Overwrite make any sense?

I think so given it pays so much attention to SaveMode.Overwrite. It simply re-creates the target table.

spark.range(3).write.mode("overwrite").insertInto("my_table")
scala> spark.table("my_table").show
+---+
| id|
+---+
|  1|
|  0|
|  2|
+---+

Seq(100, 200, 300).toDF.write.mode("overwrite").insertInto("my_table")
scala> spark.table("my_table").show
+---+
| id|
+---+
|200|
|100|
|300|
+---+

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)