hadoop – Page 2 – Tarik Billa

What does msck stand for in the MSCK REPAIR command?

December 21, 2023 by Tarik

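Similar to how fsck stands for file system consistency check, msck stands for Hive’s metastore consistency check. MSCK REPAIR TABLE uses it to add partitions that exist as directories on the file system but are not yet registered in the metastore.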
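Here is a minimal Python sketch of the kind of inconsistency MSCK REPAIR TABLE looks for: partition directories named key=value under the table location that the metastore does not know about. The function name and the plain list standing in for the metastore are hypothetical. The real command talks to the Hive metastore and the table's file system, and this sketch only handles single-level partitions.

```python
from pathlib import Path


def find_unregistered_partitions(table_dir, registered):
    """Return partition specs (e.g. 'dt=2023-12-21') that exist as
    directories under table_dir but are missing from `registered`.

    `registered` is a hypothetical stand-in for the metastore's
    partition list; only single-level key=value directories are checked.
    """
    on_disk = {
        p.name
        for p in Path(table_dir).iterdir()
        if p.is_dir() and "=" in p.name  # partition dirs look like key=value
    }
    # Partitions on disk that the "metastore" has never heard of.
    return sorted(on_disk - set(registered))
```

Directories without a key=value name (for example temporary directories) are ignored, which is why the check keys on the "=" in the name.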

No data nodes are started

December 20, 2023 by Tarik

That error you are getting in the DataNode (DN) log, java.io.IOException: Incompatible namespaceIDs, is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids

From that page: “At the moment, there seem to be two workarounds as described below. Workaround 1: Start from scratch. I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude …”
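The DataNode refuses to start when the namespaceID recorded in its storage directory differs from the NameNode's, which typically happens after the NameNode has been reformatted. Below is a minimal Python sketch of that comparison. It assumes the Hadoop 1.x layout targeted by the linked tutorial, where each storage directory has a current/VERSION file in Java properties format with a namespaceID=... line. The function names and sample values are hypothetical.

```python
def parse_version_file(text):
    """Parse a Hadoop storage VERSION file (Java properties style:
    key=value lines, '#' comment lines) into a dict of strings."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the timestamp comment
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def namespace_ids_match(namenode_version, datanode_version):
    """True only if both VERSION texts carry the same namespaceID."""
    nn_id = parse_version_file(namenode_version).get("namespaceID")
    dn_id = parse_version_file(datanode_version).get("namespaceID")
    return nn_id is not None and nn_id == dn_id
```

To use it, read current/VERSION from the directories configured as dfs.name.dir (NameNode) and dfs.data.dir (DataNode) and pass the two file contents in. A False result corresponds to the "Incompatible namespaceIDs" situation.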

Is there an equivalent to `pwd` in hdfs?

December 20, 2023 by Tarik

hdfs dfs -pwd does not exist because there is no “working directory” concept in HDFS when you run commands from the command line. You cannot execute hdfs dfs -cd in an HDFS shell and then run commands from there, since neither an interactive HDFS shell nor an hdfs dfs -cd command exists, thus making the idea of …
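Even without cd, relative paths passed to hdfs dfs do have a fixed anchor: they are resolved against the user's HDFS home directory, /user/<username> by default. The following minimal Python sketch mimics that resolution. The function name is hypothetical, it assumes the default /user home prefix, and it handles plain paths only, not full hdfs:// URIs.

```python
import posixpath


def resolve_hdfs_path(path, username, home_prefix="/user"):
    """Resolve a path the way hdfs dfs treats relative paths:
    absolute paths are kept as-is, relative ones are joined to the
    user's home directory (assumed to be <home_prefix>/<username>)."""
    home = posixpath.join(home_prefix, username)
    # posixpath.join discards `home` when `path` is absolute;
    # normpath collapses '.', '..' and trailing slashes.
    return posixpath.normpath(posixpath.join(home, path))
```

For example, for user alice, data/logs resolves to /user/alice/data/logs, while /tmp/x stays /tmp/x.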

How to make shark/spark clear the cache?

December 19, 2023 by Tarik

To remove all cached data:

sqlContext.clearCache()

Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html

If you want to remove a specific DataFrame from the cache:

df.unpersist()

Spark-submit not working when application jar is in hdfs

December 16, 2023 by Tarik

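The only way it worked for me was when I was using --master yarn-cluster. (In Spark 2.0 and later the yarn-cluster master is deprecated; the equivalent is --master yarn --deploy-mode cluster.)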

How to rename a hive table without changing location?

December 14, 2023 by Tarik

Yes, we can do that. You just need to run the three commands below in sequence. Let’s say you have an external table test_1 in Hive, and you want to rename it to test_2 so that it points to the test_2 location, not test_1. First, convert this table into a managed table using the command below. test_1 -> pointing …

Add a column to a table in HiveQL

December 14, 2023 by Tarik

You cannot add a column with a default value in Hive. You have the right syntax for adding the column:

ALTER TABLE test1 ADD COLUMNS (access_count1 int);

You just need to get rid of default sum(max_count). No changes to the files backing your table will happen as a result of adding the column. Hive handles …
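Because Hive applies the table schema when data is read, rows written before an ADD COLUMNS have no value for the new column, and Hive returns NULL for it. Here is a minimal Python sketch of that schema-on-read behaviour for a delimited text file. It assumes Hive's default \x01 (Ctrl-A) field delimiter and uses None as a stand-in for NULL. The function name is hypothetical, and the column names are taken from the example above.

```python
def read_row(line, columns, delimiter="\x01"):
    """Map one delimited line onto the table's *current* column list.

    Fields missing from files written before a column was added come
    back as None (standing in for Hive's NULL); any extra fields beyond
    the schema are ignored by this sketch.
    """
    fields = line.rstrip("\n").split(delimiter)
    return {
        name: fields[i] if i < len(fields) else None
        for i, name in enumerate(columns)
    }
```

In this sketch, an old line such as "1\x0142" read with the columns [id, max_count, access_count1] gives access_count1 = None, and the file itself is never rewritten.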

Find port number where HDFS is listening

December 11, 2023 by Tarik

The command below, available from Apache Hadoop 2.7.0 onwards, can be used to get the values of Hadoop configuration properties. fs.default.name is deprecated in Hadoop 2.0; fs.defaultFS is the updated property. I am not sure whether this will work in the case of maprfs.

hdfs getconf -confKey fs.defaultFS # (new property)

or

hdfs getconf -confKey fs.default.name …
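The value printed for fs.defaultFS is a URI such as hdfs://namenode:8020, so the port is the part after the host name. Below is a minimal Python sketch that runs the getconf command and splits the URI into host and port. It assumes the hdfs client is installed and on the PATH, and both function names are hypothetical. The parsing lives in its own function so it also works on a value you obtained some other way.

```python
import subprocess
from urllib.parse import urlparse


def get_default_fs():
    """Run `hdfs getconf -confKey fs.defaultFS` and return its output.
    Assumes the hdfs CLI is installed and on the PATH."""
    result = subprocess.run(
        ["hdfs", "getconf", "-confKey", "fs.defaultFS"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def namenode_address(default_fs):
    """Split an fs.defaultFS URI into (host, port).
    port is None when the URI does not spell one out, in which case the
    client falls back to its built-in default port."""
    uri = urlparse(default_fs)
    return uri.hostname, uri.port
```

Usage: namenode_address(get_default_fs()) returns something like ("namenode", 8020) on a cluster configured with that URI.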

Select top 2 rows in Hive

December 10, 2023 by Tarik

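Yes, here you can use LIMIT. To get the two highest salaries across the whole table, combine it with ORDER BY:

SELECT * FROM employee_list ORDER BY salary DESC LIMIT 2

SORT BY only sorts rows within each reducer, so SORT BY salary DESC LIMIT 2 is not guaranteed to return the global top two when the query runs with more than one reducer.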
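Why the ordering clause matters: SORT BY sorts rows within each reducer, while ORDER BY produces one total order. The following simplified Python model shows the difference. It assumes two reducers that receive the salaries shown and a final LIMIT that takes rows in the order they arrive, which is one possible outcome and not a description of Hive's exact query plan. The function names and data are hypothetical.

```python
def sort_by_then_limit(reducer_inputs, n):
    """Model of SORT BY salary DESC LIMIT n: each reducer sorts only its
    own rows, the outputs are concatenated, and the first n are taken."""
    output = []
    for rows in reducer_inputs:
        output.extend(sorted(rows, reverse=True))  # per-reducer order only
    return output[:n]


def order_by_then_limit(reducer_inputs, n):
    """Model of ORDER BY salary DESC LIMIT n: every row goes through a
    single total sort before the limit is applied."""
    all_rows = [row for rows in reducer_inputs for row in rows]
    return sorted(all_rows, reverse=True)[:n]


# Hypothetical salaries split across two reducers.
reducers = [[3000, 5000], [9000, 7000]]
```

With this split, the SORT BY model returns [5000, 3000] while the ORDER BY model returns the true top two, [9000, 7000].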

What exactly is hadoop namenode formatting?

December 6, 2023 by Tarik

The Hadoop NameNode is the central component of an HDFS file system: it keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. In short, it holds the file system metadata, while the file data itself lives on the DataNodes. When we format the NameNode, that metadata is wiped and a new, empty namespace is created. By …
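To make the metadata described above concrete, here is a toy Python model. It maps file paths to block IDs, records which DataNodes report each block, and gives format() the effect of wiping that metadata and starting a namespace with a new ID. The class, methods, and fields are all hypothetical. The real NameNode persists the namespace in its fsimage and edit log and rebuilds block locations from DataNode block reports.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of NameNode metadata (not Hadoop's real data structures)."""
    namespace_id: int = 0
    # path -> ordered list of block IDs that make up the file
    files: dict = field(default_factory=dict)
    # block ID -> set of DataNode names reporting a replica
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, blocks):
        self.files[path] = list(blocks)

    def block_report(self, datanode, blocks):
        """A DataNode tells the NameNode which blocks it stores."""
        for block in blocks:
            self.block_locations.setdefault(block, set()).add(datanode)

    def locate(self, path):
        """Where across the cluster each block of a file is kept."""
        return [(b, self.block_locations.get(b, set())) for b in self.files[path]]

    def format(self):
        """Wipe the metadata and start a fresh namespace with a new ID.
        Block data on the DataNodes is not touched by this; it is simply
        no longer referenced by any file."""
        self.files.clear()
        self.block_locations.clear()
        old_id = self.namespace_id
        while self.namespace_id == old_id:
            self.namespace_id = random.randint(1, 2**31 - 1)
```

The new namespace ID after format() is also why DataNodes that still hold data from the old namespace can refuse to join afterwards.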