How to read from HBase using Spark

A basic example of reading HBase data using Spark (Scala); you can also write this in Java:

import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark._

object HBaseRead {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    val … Read more
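The excerpt mentions that the same read can be written in Java. A minimal sketch of that Java version, assuming the Spark core and HBase client/server libraries are on the classpath and using a placeholder table name "my_table" (it needs a running HBase and ZooKeeper to actually execute):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadJava {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Point TableInputFormat at the table to scan ("my_table" is a placeholder).
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "my_table");

        // Each record is a (row key, row result) pair.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
                conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);

        System.out.println("Rows read: " + rows.count());
        sc.close();
    }
}
```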

How does Hive compare to HBase?

It’s hard to find much about Hive, but I found this snippet on the Hive site that leans heavily in favor of HBase (bold added): Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting … Read more

Scalable Image Storage

We have been using CouchDB for that, saving images as an “Attachment”. But after a year, the multi-dozen-GB CouchDB database files turned out to be a headache. For example, CouchDB replication still has issues when used with very large document sizes. So we just rewrote our software to use CouchDB for image … Read more

Command like SQL LIMIT in HBase

From the HBase shell you can use LIMIT:

hbase> scan 'test-table', {'LIMIT' => 5}

From the Java API you can use Scan.setMaxResultSize(N) (a cap on result size in bytes) or Scan.setMaxResultsPerColumnFamily(N). See the HBase API docs for Scan.setMaxResultSize and Scan.setMaxResultsPerColumnFamily.

Large-scale data processing: HBase vs Cassandra [closed]

As a Cassandra developer, I’m better at answering the other side of the question: Cassandra scales better. Cassandra is known to scale to over 400 nodes in a cluster; when Facebook deployed Messaging on top of HBase they had to shard it across 100-node HBase sub-clusters. Cassandra supports hundreds, even thousands of ColumnFamilies. “HBase currently … Read more

How to delete all data from Solr and HBase

If you want to clean up a Solr index, you can hit this HTTP URL: http://host:port/solr/[core name]/update?stream.body=<delete><query>*:*</query></delete>&commit=true (replace [core name] with the name of the core you want to delete from). Or, if you are posting XML data, use: <delete><query>*:*</query></delete> Be sure to use commit=true to commit the changes. I don't have much experience with clearing … Read more
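The delete-all URL above can be assembled programmatically; a small sketch, where the host, port, and core name ("localhost", 8983, "mycore") are placeholder values for your deployment:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrDeleteAllUrl {
    // Build the delete-all request URL: the XML delete body goes in the
    // stream.body parameter (URL-encoded), with commit=true appended so
    // the deletion is committed immediately.
    public static String buildUrl(String host, int port, String core) {
        String body = "<delete><query>*:*</query></delete>";
        return "http://" + host + ":" + port + "/solr/" + core
                + "/update?stream.body="
                + URLEncoder.encode(body, StandardCharsets.UTF_8)
                + "&commit=true";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 8983, "mycore"));
    }
}
```

You would then issue an HTTP GET to the printed URL (e.g. with curl or an HTTP client) against your Solr server.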

Difference between HBase and Hadoop/HDFS

Hadoop is basically three things: a file system (the Hadoop Distributed File System, HDFS), a computation framework (MapReduce), and a management layer (YARN, Yet Another Resource Negotiator). HDFS allows you to store huge amounts of data in a distributed (providing faster read/write access) and redundant (providing better availability) manner. And MapReduce allows you to process this huge data in a … Read more
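To make the MapReduce model concrete, here is a toy in-memory word count in plain Java: the map phase emits a (word, 1) pair per word, and the shuffle/reduce phase groups by word and sums the counts. A real job would run the same logic in parallel across HDFS blocks on many nodes; this sketch is only an illustration of the paradigm.

```java
import java.util.*;
import java.util.stream.*;

public class MiniMapReduce {
    public static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // Map phase: split each line into words (emitting one
                // (word, 1) pair per word, conceptually).
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                // Shuffle + reduce phase: group pairs by word and sum the 1s.
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hbase on hdfs", "hdfs stores blocks");
        System.out.println(wordCount(lines)); // prints each word with its count
    }
}
```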

Error!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)