What does msck stand for in the MSCK REPAIR command?
Similar to how fsck stands for filesystem consistency check, msck is Hive’s metastore consistency check.
The error you are getting in the DataNode log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids From that page: At the moment, there seem to be two workarounds as described below. Workaround 1: Start from scratch I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude …
hdfs dfs -pwd does not exist because HDFS has no “working directory” concept when you run commands from the command line. You cannot execute hdfs dfs -cd and then run commands from there, since neither an HDFS shell nor an hdfs dfs -cd command exists, thus making the idea of …
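Since there is no working directory to change, a relative path passed to hdfs dfs is resolved against the user's HDFS home directory instead. The following is a minimal Python sketch of that resolution rule, not a call into HDFS itself. The function name resolve_hdfs_path is hypothetical, and the /user prefix assumes the default home directory layout (/user/<username>), which a cluster may override.

```python
import posixpath


def resolve_hdfs_path(path, user, home_prefix="/user"):
    """Model how a path given to `hdfs dfs` is interpreted.

    Assumption: the home directory is <home_prefix>/<user>, the default layout.
    Absolute paths are used as-is; relative paths are joined onto the home
    directory because there is no working directory to resolve them against.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(home_prefix, user, path))


# A relative path lands under the (assumed) home directory of user "alice".
print(resolve_hdfs_path("data/in.csv", "alice"))
# An absolute path is unaffected by the user.
print(resolve_hdfs_path("/tmp/out", "alice"))
```

Under these assumptions, the usual substitute for a cd command is simply to pass absolute paths on every invocation.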
To remove all cached data: sqlContext.clearCache() Source: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SQLContext.html If you want to remove a specific DataFrame from the cache: df.unpersist()
The only way it worked for me, when I was using --master yarn-cluster
Yes, we can do that. You just need to run the three commands below in sequence. Let's say you have an external table test_1 in Hive, and you want to rename it to test_2, which should point to the test_2 location, not test_1. Then you need to convert this table into a managed table using the command below. test_1 -> pointing …
You cannot add a column with a default value in Hive. You have the right syntax for adding the column, ALTER TABLE test1 ADD COLUMNS (access_count1 int);, you just need to get rid of default sum(max_count). No changes to the files backing your table will happen as a result of adding the column. Hive handles …
The command below is available from Apache Hadoop 2.7.0 onwards and can be used to get the values of Hadoop configuration properties. fs.default.name is deprecated in Hadoop 2.0; fs.defaultFS is the updated property name. Not sure whether this will work in the case of maprfs. hdfs getconf -confKey fs.defaultFS # ( new property ) or hdfs getconf -confKey fs.default.name …
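Where hdfs getconf is not available, the same two property names can be looked up directly in the contents of core-site.xml. The following is a minimal Python sketch of that fallback, not a replacement for hdfs getconf: it reads only the XML it is given and does not apply Hadoop's built-in defaults or other configuration files. The function name default_fs_from_xml and the sample host namenode.example are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property names from the answer above: the current key first, then the deprecated one.
KEYS = ("fs.defaultFS", "fs.default.name")


def default_fs_from_xml(xml_text):
    """Return the default filesystem URI found in core-site.xml content, or None."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    # Prefer the new key and fall back to the deprecated one.
    for key in KEYS:
        if key in props:
            return props[key]
    return None


# Hypothetical sample content; a real file usually lives in the Hadoop conf directory.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example:8020</value>
  </property>
</configuration>"""

print(default_fs_from_xml(SAMPLE))
```

To read an actual file, ET.parse(path).getroot() could stand in for ET.fromstring; the path to core-site.xml depends on the installation.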
Yes, you can use LIMIT here. Try the query below: SELECT * FROM employee_list SORT BY salary DESC LIMIT 2
The Hadoop NameNode is the centralized place of an HDFS file system: it keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. In short, it keeps the metadata related to the DataNodes. When we format the NameNode, it formats the metadata related to the DataNodes. By …