Avro vs. Parquet

Avro is a row-based format. If you want to retrieve the data as a whole, you can use Avro. Parquet is a column-based format. If your data consists of a lot of columns but you are interested in only a subset of them, you can use Parquet. HBase is useful when frequent updating … Read more
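To make the row versus column distinction concrete, here is a minimal pure-Python sketch. It does not use the Avro or Parquet libraries; the sample records and all function names are hypothetical, and the "touched" count is only a rough stand-in for how much data a reader has to scan.

```python
# Hypothetical sample data: three records with three fields each.
records = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 20},
    {"id": 3, "name": "c", "score": 30},
]


def to_row_layout(rows):
    """Row-based layout: all fields of one record are stored together."""
    return [tuple(r.values()) for r in rows]


def to_column_layout(rows):
    """Column-based layout: all values of one field are stored together."""
    return {key: [r[key] for r in rows] for key in rows[0]}


def read_scores_from_rows(store):
    """Read only 'score', but every full record still has to be visited."""
    touched = 0
    scores = []
    for rec in store:
        touched += len(rec)      # the whole record is scanned
        scores.append(rec[2])    # 'score' is the third field
    return scores, touched


def read_scores_from_columns(store):
    """Read only 'score' by visiting just that one column."""
    column = store["score"]
    return list(column), len(column)


row_store = to_row_layout(records)
col_store = to_column_layout(records)

print(read_scores_from_rows(row_store))
print(read_scores_from_columns(col_store))
```

Both calls return the same scores, but the row layout visits all nine stored values while the column layout visits only three, which is the intuition behind choosing a column format when queries read a subset of columns.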

Name node is in safe mode. Not able to leave

To force the NameNode to leave safe mode, run the following command: bin/hadoop dfsadmin -safemode leave. You are getting the Unknown command error because -safemode is not a sub-command of hadoop fs; it is a sub-command of hadoop dfsadmin. After running the command above, I would also suggest that you run hadoop fsck … Read more

Difference between HBase and Hadoop/HDFS

Hadoop is basically three things: a file system (Hadoop Distributed File System, HDFS), a computation framework (MapReduce), and a management bridge (Yet Another Resource Negotiator, YARN). HDFS allows you to store huge amounts of data in a distributed (provides faster read/write access) and redundant (provides better availability) manner. MapReduce allows you to process this huge data in a … Read more

What is the difference between partitioning and bucketing a table in Hive?

Partitioning data is often used for distributing load horizontally. This has a performance benefit and helps in organizing data in a logical fashion. Example: suppose we are dealing with a large employee table and often run queries with WHERE clauses that restrict the results to a particular country or department. For a faster query response … Read more
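The idea can be sketched in plain Python. Partitioning places each row under a directory named after its partition column values, while bucketing assigns a row to one of a fixed number of files by hashing a column. The function names, the table name, and the sample row below are hypothetical, and CRC32 is used only as a stand-in for Hive's own hash function.

```python
import zlib


def partition_path(table, row, partition_cols):
    """Build a Hive-style partition directory such as table/country=US/dept=HR."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table] + parts)


def bucket_index(value, num_buckets):
    """Pick a bucket as hash(value) mod num_buckets.

    Assumption: CRC32 is a stable stand-in here; Hive uses its own hash.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


# Hypothetical employee row.
row = {"id": 7, "country": "US", "dept": "HR"}

print(partition_path("employees", row, ["country", "dept"]))
print(bucket_index(row["id"], 4))
```

A query filtered on country or department only needs to read the matching directory, whereas bucketing keeps the number of files fixed no matter how many distinct values the bucketed column has.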

Apache Spark: The number of cores vs. the number of executors

To hopefully make all of this a little more concrete, here’s a worked example of configuring a Spark app to use as much of the cluster as possible: Imagine a cluster with six nodes running NodeManagers, each equipped with 16 cores and 64GB of memory. The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should probably be set … Read more
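The arithmetic behind this kind of sizing can be sketched as a small Python function. The cluster shape (six nodes, 16 cores, 64GB) comes from the excerpt; everything else is an assumption made for illustration and is not taken from the truncated text: one core and 1GB reserved per node for the OS and Hadoop daemons, five cores per executor, one executor slot left for the YARN application master, and roughly 7% of each container set aside as memory overhead.

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   reserved_cores=1, reserved_mem_gb=1,
                   cores_per_executor=5, overhead_fraction=0.07):
    """Estimate Spark executor settings for a YARN cluster.

    All default values are illustrative assumptions, not recommendations
    from the original text.
    """
    usable_cores = cores_per_node - reserved_cores
    usable_mem_gb = mem_gb_per_node - reserved_mem_gb

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Leave one executor slot free for the application master (assumption).
    num_executors = nodes * executors_per_node - 1

    # Split node memory across executors, then subtract the overhead share.
    mem_per_container_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = int(mem_per_container_gb * (1 - overhead_fraction))

    return {
        "num_executors": num_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": executor_memory_gb,
    }


settings = size_executors(nodes=6, cores_per_node=16, mem_gb_per_node=64)
print(settings)
```

Under these assumptions the function yields 17 executors with 5 cores and 19GB each, which would map to the spark-submit options --num-executors, --executor-cores, and --executor-memory. Different reservation or overhead values will give different numbers.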
