We cannot pass the Hive table name directly to the HiveContext sql method, since that method expects a SQL query, not a bare table name. One way to read a Hive table in the PySpark shell is:
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
bank = hive_context.table("default.bank")
bank.show()
To run SQL on the Hive table:
First, we need to register the DataFrame we get from reading the Hive table as a temporary table.
Then we can run the SQL query against that temporary table.
bank.registerTempTable("bank_temp")
hive_context.sql("select * from bank_temp").show()