Query a Hive table in PySpark

We cannot pass just a Hive table name to the HiveContext's sql method, because that method expects a SQL statement, not a bare table name. One way to read a Hive table in the pyspark shell is:

from pyspark.sql import HiveContext

# sc is the SparkContext that the pyspark shell creates for you
hive_context = HiveContext(sc)

# Load the Hive table default.bank as a DataFrame
bank = hive_context.table("default.bank")
bank.show()

To run SQL against the Hive table, first register the DataFrame obtained from reading the table as a temporary table, then run the query against it:

bank.registerTempTable("bank_temp")
hive_context.sql("select * from bank_temp").show()
