Explode the Array of Struct in Hive

You need to explode only once, in conjunction with LATERAL VIEW. Exploding gives you a new column (called prod_and_ts in my example) of struct type. You can then resolve the product_id and timestamps members of this struct column to retrieve the desired result:

    SELECT user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps …
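To make the row-level effect of a single explode concrete, here is a minimal plain-Python sketch of what LATERAL VIEW explode does to an array of structs. It is not Hive code: the `explode_rows` helper, the `products` column name, and the sample values are hypothetical and only illustrate the idea that each array element becomes its own row, after which the struct members are read by name.

```python
from typing import Any, Dict, Iterable, Iterator, List


def explode_rows(
    rows: Iterable[Dict[str, Any]], array_col: str, alias: str
) -> Iterator[Dict[str, Any]]:
    """Yield one output row per element of row[array_col].

    Mirrors LATERAL VIEW explode (without OUTER): rows whose array is
    empty or missing produce no output rows.
    """
    for row in rows:
        for element in row.get(array_col) or []:
            out = dict(row)        # source columns stay available
            out[alias] = element   # the new struct-typed column
            yield out


# Hypothetical sample data: one array of structs per user.
users: List[Dict[str, Any]] = [
    {
        "user_id": 1,
        "products": [
            {"product_id": 10, "timestamps": "t1"},
            {"product_id": 11, "timestamps": "t2"},
        ],
    },
    {"user_id": 2, "products": []},  # dropped, as with a non-OUTER lateral view
]

# Resolve the struct members of the exploded column, as in the SELECT list.
result = [
    {
        "user_id": r["user_id"],
        "product_id": r["prod_and_ts"]["product_id"],
        "timestamps": r["prod_and_ts"]["timestamps"],
    }
    for r in explode_rows(users, "products", "prod_and_ts")
]
```

Each element of the array ends up as one row in `result`, which is why a second explode is not needed to reach the struct members.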

How to export data from Spark SQL to CSV

You can use the statement below to write the contents of a DataFrame in CSV format:

    df.write.csv("/data/home/csv")

If you need to write the whole DataFrame into a single CSV file, use:

    df.coalesce(1).write.csv("/data/home/sample.csv")

Note that Spark treats the given path as a directory in both cases; with coalesce(1), that directory contains a single part file. For Spark 1.x, you can use spark-csv to write the results into CSV files. The Scala snippet below would help:

    import org.apache.spark.sql.hive.HiveContext
    // sc – …
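Since Spark writes a directory of part files rather than one named file, a small post-processing step is sometimes used to end up with a single CSV. The sketch below is one possible approach in plain Python, not part of the Spark API: `merge_csv_parts` is a hypothetical helper, and it assumes the output directory is on the local filesystem, that the parts were written without a header row, and that every part ends with a newline.

```python
import glob
import os
import shutil
import tempfile


def merge_csv_parts(spark_output_dir: str, dest_file: str) -> int:
    """Concatenate Spark's part-* files into one file; return the part count.

    Assumes header-less parts on a local filesystem (hypothetical helper).
    Marker files such as _SUCCESS do not match the pattern and are skipped.
    """
    part_paths = sorted(glob.glob(os.path.join(spark_output_dir, "part-*")))
    with open(dest_file, "wb") as dest:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, dest)
    return len(part_paths)


# Small demonstration with a fake output directory (sample data is made up).
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "csv")
    os.makedirs(out_dir)
    for name, body in [("part-00000", "1,a\n"), ("part-00001", "2,b\n"), ("_SUCCESS", "")]:
        with open(os.path.join(out_dir, name), "w") as fh:
            fh.write(body)

    merged_path = os.path.join(tmp, "sample.csv")
    merged_count = merge_csv_parts(out_dir, merged_path)
    with open(merged_path) as fh:
        merged_text = fh.read()
```

If the parts were written with a header option enabled, each part would carry its own header line, so this plain concatenation would need to skip the first line of every part after the first.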

PySpark: withColumn() with two conditions and three outcomes

There are a few efficient ways to implement this. Let’s start with the required imports:

    from pyspark.sql.functions import col, expr, when

You can use the Hive IF function inside expr:

    new_column_1 = expr(
        """IF(fruit1 IS NULL OR fruit2 IS NULL, 3, IF(fruit1 = fruit2, 1, 0))"""
    )

or when + otherwise:

    new_column_2 = when(
        col("fruit1").isNull() | col("fruit2").isNull(), …
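For checking the expected values row by row, the rule encoded in the IF expression above can be written as a plain-Python reference function. This is only a sketch of the logic, not PySpark code; `compare_fruits` and the sample rows are hypothetical names and data introduced for illustration.

```python
from typing import List, Optional, Tuple


def compare_fruits(fruit1: Optional[str], fruit2: Optional[str]) -> int:
    """Two conditions, three outcomes, in the same order as the IF expression."""
    if fruit1 is None or fruit2 is None:
        return 3  # either side is NULL
    if fruit1 == fruit2:
        return 1  # both present and equal
    return 0      # both present and different


# Hypothetical sample rows covering every branch.
rows: List[Tuple[Optional[str], Optional[str]]] = [
    ("apple", "apple"),
    ("apple", "pear"),
    (None, "pear"),
    ("apple", None),
]

labels = [compare_fruits(f1, f2) for f1, f2 in rows]
print(labels)  # prints [1, 0, 3, 3]
```

The NULL check comes first on purpose: in SQL, comparing a NULL with `=` yields NULL rather than false, so testing for NULL before testing equality keeps the three outcomes unambiguous.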

Hive insert query like SQL

Some of the answers here are out of date as of Hive 0.14:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL

It is now possible to insert using syntax such as:

    CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2));
    INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

How to Update/Drop a Hive Partition?

You can update a Hive partition with, for example:

    ALTER TABLE logs PARTITION(year = 2012, month = 12, day = 18) SET LOCATION 'hdfs://user/darcy/logs/2012/12/18';

This command neither moves nor deletes the old data; it simply points the partition at the new location. To drop a partition, you can do …