How to update partition metadata in Hive when partition data is manually deleted from HDFS

EDIT: Starting with Hive 3.0.0, MSCK can discover new partitions, remove missing partitions, or both, using the following syntax:

    MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS]

This was implemented in HIVE-17824.

As correctly stated by HakkiBuyukcengiz, MSCK REPAIR doesn't remove partitions if the corresponding folder on HDFS was manually deleted; it only …
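The three options in the syntax above can be sketched as a small Python helper that assembles the statement text. This is only an illustration: `build_msck_statement` and the table name `sales` are hypothetical, the helper just builds a string (it does not connect to Hive), and the ADD/DROP/SYNC options assume Hive 3.0.0 or later.

```python
# Hypothetical helper: builds the MSCK statement text only; it does not run it.
VALID_MODES = ("ADD", "DROP", "SYNC")


def build_msck_statement(table_name, mode="ADD"):
    """Return an MSCK REPAIR statement for the given partition mode.

    ADD  registers partitions found on HDFS but missing from the metastore,
    DROP removes metastore partitions whose HDFS folder no longer exists,
    SYNC does both.
    """
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"MSCK REPAIR TABLE {table_name} {mode} PARTITIONS"


# "sales" is a placeholder table name.
print(build_msck_statement("sales", "SYNC"))
```

For the case in the title (folders deleted by hand), the DROP or SYNC form is the relevant one; the resulting string can be passed to whichever Hive client is in use.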

Query HIVE table in pyspark

We cannot pass the Hive table name directly to the Hive context sql method, since it doesn't understand the Hive table name. One way to read a Hive table in the pyspark shell is:

    from pyspark.sql import HiveContext
    # sc is the SparkContext that the pyspark shell provides
    hive_context = HiveContext(sc)
    bank = hive_context.table("default.bank")
    bank.show()

To run SQL on the Hive table: first, we need to register …

Hive Alter table change Column Name

Change Column Name/Type/Position/Comment:

    ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Example:

    CREATE TABLE test_change (a int, b int, c int);
    -- change column a's name to a1
    ALTER TABLE test_change CHANGE a a1 INT;
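To show how the optional clauses in that syntax fit together, here is a minimal Python sketch that assembles the statement text. The function `build_change_column` and its parameters are hypothetical names chosen for illustration; it only produces a string and assumes the comment contains no single quotes.

```python
# Hypothetical helper: assembles an ALTER TABLE ... CHANGE statement as text.
def build_change_column(table, old_name, new_name, col_type,
                        comment=None, first=False, after=None):
    if first and after is not None:
        raise ValueError("use either first=True or after=<column>, not both")
    parts = [f"ALTER TABLE {table} CHANGE COLUMN {old_name} {new_name} {col_type}"]
    if comment is not None:
        if "'" in comment:
            # Quoting rules are out of scope for this sketch.
            raise ValueError("comment must not contain single quotes")
        parts.append(f"COMMENT '{comment}'")
    if first:
        parts.append("FIRST")
    elif after is not None:
        parts.append(f"AFTER {after}")
    return " ".join(parts) + ";"


# Rename a to a1, as in the example above.
print(build_change_column("test_change", "a", "a1", "INT"))
# Rename b to b1 and move it after column c.
print(build_change_column("test_change", "b", "b1", "INT", after="c"))
```

Note that the column type is always required, even when only the name or position changes.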

How to skip CSV header in Hive External Table?

As of Hive v0.13.0, you can use the skip.header.line.count table property:

    create external table testtable (name string, message string)
    row format delimited fields terminated by '\t' lines terminated by '\n'
    location '/testtable'
    TBLPROPERTIES ("skip.header.line.count"="1");

Use ALTER TABLE for an existing table:

    ALTER TABLE tablename SET TBLPROPERTIES ("skip.header.line.count"="1");

Please note that while it works, it comes with …
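To make the intended effect of the property concrete, here is a small Python sketch that reads a tab-delimited file and drops the first N lines, which is what a reader of the table should see. It is an analogy only, not how Hive implements the property; `read_rows` and the sample data are made up for illustration.

```python
import csv
import io


def read_rows(text, skip_header_line_count=0):
    """Parse tab-delimited text and drop the leading header lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return rows[skip_header_line_count:]


# Made-up file contents matching the (name, message) columns above.
sample = "name\tmessage\nalice\thello\nbob\thi\n"

print(read_rows(sample))                            # header row included
print(read_rows(sample, skip_header_line_count=1))  # header row skipped
```

With the count set to 1, the `name`/`message` header line no longer appears as a data row.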