hive – Page 3 – Tarik Billa

Querying on multiple Hive stores using Apache Spark

September 24, 2023 by Tarik

I think this is possible using Spark SQL's ability to connect to remote databases and read data from them over JDBC. After extensive R&D, I was able to connect to two different Hive environments over JDBC and load the Hive tables into Spark as DataFrames for further processing. Environment details …
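The excerpt above is cut off before it shows the author's setup, so what follows is only a minimal sketch of the general idea: reading Hive tables through Spark's generic JDBC data source. The host names, port, database, and table names are hypothetical, and `hive_jdbc_options` and `load_hive_table` are illustrative helpers, not part of any library. It assumes the Hive JDBC driver jar is on Spark's classpath.

```python
def hive_jdbc_options(host, port, database, table, user=None):
    """Build the option map for Spark's generic JDBC data source."""
    options = {
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "dbtable": table,
    }
    if user is not None:
        options["user"] = user
    return options


def load_hive_table(spark, options):
    """Read one remote Hive table into a DataFrame through JDBC.

    `spark` is an existing SparkSession created by the caller.
    """
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical hosts standing in for two separate Hive environments.
cluster_a = hive_jdbc_options("hive-a.example.com", 10000, "default", "bank")
cluster_b = hive_jdbc_options("hive-b.example.com", 10000, "default", "customers")
print(cluster_a["url"])
```

With a live session, `load_hive_table(spark, cluster_a)` and `load_hive_table(spark, cluster_b)` would each return a DataFrame, and the two could then be joined like any other pair of DataFrames.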

Query HIVE table in pyspark

September 19, 2023 by Tarik

We cannot pass the Hive table name directly to the Hive context's sql method, since it does not understand the Hive table name. One way to read a Hive table in the pyspark shell is:

    from pyspark.sql import HiveContext
    hive_context = HiveContext(sc)
    bank = hive_context.table("default.bank")
    bank.show()

To run SQL on the Hive table: first, we need to register …

Skip first line of csv while loading in hive table

September 18, 2023 by Tarik

To do this, use Hive's table property TBLPROPERTIES ("skip.header.line.count"="1"). For example:

    CREATE TABLE temp (name STRING, id INT)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
    TBLPROPERTIES ("skip.header.line.count"="1");
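To illustrate the effect of the property on a tab-delimited file, here is a small Python sketch that mimics it by dropping the first N lines before parsing. It is an illustration of the behaviour only, not how Hive implements it; the function name and sample data are made up.

```python
import csv


def read_rows_skipping_header(text, skip_count=1, delimiter="\t"):
    """Parse delimited text, ignoring the first `skip_count` lines."""
    lines = text.splitlines()[skip_count:]
    return list(csv.reader(lines, delimiter=delimiter))


# Made-up sample matching the (name, id) columns of the example table.
sample = "name\tid\nalice\t1\nbob\t2\n"
rows = read_rows_skipping_header(sample)
print(rows)
```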

java.lang.RuntimeException:Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

September 18, 2023 by Tarik

This looks like a problem with your metastore. If you are using the default embedded Derby metastore, a lock file is left behind after an abnormal exit. Removing that lock file should resolve the issue:

    rm metastore_db/*.lck
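The same cleanup can be scripted. The sketch below is a hypothetical helper (`remove_derby_locks` is not part of Hive) that deletes the `*.lck` files in a metastore directory and reports what it removed. It assumes no Hive process is currently using that metastore.

```python
from pathlib import Path


def remove_derby_locks(metastore_dir="metastore_db"):
    """Delete leftover Derby lock files and return their names."""
    directory = Path(metastore_dir)
    if not directory.is_dir():
        return []
    removed = []
    for lock_file in sorted(directory.glob("*.lck")):
        lock_file.unlink()
        removed.append(lock_file.name)
    return removed
```

Calling `remove_derby_locks()` from the directory where Hive was started targets the default `metastore_db` folder; pass another path if the metastore lives elsewhere.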

Can I change a table from internal to external in hive?

September 16, 2023 by Tarik

ALTER TABLE <table> SET TBLPROPERTIES('EXTERNAL'='TRUE')

Note: EXTERNAL and TRUE must be in capitals or it will not work.

select rows in sql with latest date for each ID repeated multiple times [duplicate]

September 5, 2023 by Tarik

This question has been asked before. Adapting the accepted answer there to your problem gives:

    SELECT tt.*
    FROM myTable tt
    INNER JOIN (
        SELECT ID, MAX(Date) AS MaxDateTime
        FROM myTable
        GROUP BY ID
    ) groupedtt
      ON tt.ID = groupedtt.ID
     AND tt.Date = groupedtt.MaxDateTime
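To see the query in action without a Hive cluster, here is a small sketch that runs the same join against an in-memory SQLite database. The sample rows and the extra `Value` column are made up for illustration, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER, Date TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        (1, "2023-01-01", "a"),
        (1, "2023-03-01", "b"),
        (2, "2023-02-15", "c"),
        (2, "2023-02-01", "d"),
    ],
)

# Join each row to the per-ID maximum date and keep only the matches.
query = """
SELECT tt.*
FROM myTable tt
INNER JOIN (
    SELECT ID, MAX(Date) AS MaxDateTime
    FROM myTable
    GROUP BY ID
) groupedtt
  ON tt.ID = groupedtt.ID
 AND tt.Date = groupedtt.MaxDateTime
ORDER BY tt.ID
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)  # [(1, '2023-03-01', 'b'), (2, '2023-02-15', 'c')]
```

If two rows for the same ID share the maximum date, the join returns both of them.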

Is there a way to make a multi line comment in hive scripts

September 1, 2023 by Tarik

As far as I know, multi-line comments are not supported in Hive scripts as of now. It seems this JIRA introduced only single-line comments, starting with --, in Hive 0.8.
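Since only single-line comments are available, one workaround is to prefix every line of a longer note with `--`. The helper below is a hypothetical convenience for generating such a block when scripts are assembled from Python; it is not a Hive feature.

```python
def comment_block(text):
    """Turn multi-line text into consecutive single-line Hive comments."""
    return "\n".join(("-- " + line).rstrip() for line in text.splitlines())


note = comment_block("Daily load job\n\nOwner: data team")
print(note)
```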

Loading Data from a .txt file to Table Stored as ORC in Hive

August 30, 2023 by Tarik

LOAD DATA just copies the files into Hive's data files. Hive does not transform the data while loading it into tables. So, in this case, the input file /home/user/test_details.txt needs to be in ORC format if you are loading it into an ORC table. A possible workaround is to create a temporary table with STORED AS …
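The excerpt is cut off before it names the staging format, so the sketch below rests on assumptions: a tab-delimited TEXTFILE staging table, a file on the local filesystem, and hypothetical table and column names. It only generates the HiveQL statements for the staging approach (load the text file into a text table, then copy into the ORC table with INSERT ... SELECT); running them is left to whatever Hive client is in use.

```python
def orc_load_statements(target_table, staging_table, columns_ddl, local_path):
    """Return HiveQL statements that load a text file into an ORC table
    by way of a plain-text staging table (assumed tab-delimited)."""
    return [
        f"CREATE TABLE {staging_table} ({columns_ddl}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS TEXTFILE",
        f"LOAD DATA LOCAL INPATH '{local_path}' INTO TABLE {staging_table}",
        # Hive rewrites the rows as ORC during this INSERT ... SELECT.
        f"INSERT INTO TABLE {target_table} SELECT * FROM {staging_table}",
        f"DROP TABLE {staging_table}",
    ]


# Table names and columns here are hypothetical.
statements = orc_load_statements(
    target_table="test_details_orc",
    staging_table="test_details_txt",
    columns_ddl="name STRING, id INT",
    local_path="/home/user/test_details.txt",
)
for statement in statements:
    print(statement + ";")
```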

Hive Alter table change Column Name

August 26, 2023 by Tarik

Change Column Name/Type/Position/Comment:

    ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Example:

    CREATE TABLE test_change (a int, b int, c int);
    -- will change column a's name to a1
    ALTER TABLE test_change CHANGE a a1 INT;

what is HiveServer and Thrift server [closed]

August 24, 2023 by Tarik

HiveServer2 (HS2) is a service that enables clients to execute queries against Hive. HiveServer2 is the successor to HiveServer1, which has been deprecated. HS2 supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC. You can find more details about HiveServer2 at https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Overview Hive Service …