How to read from HBase using Spark

A basic example of reading HBase data using Spark (Scala); you can also write this in Java:

import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark._

object HBaseRead {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    val … Read more
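The excerpt mentions that the same read can be written in Java. A minimal sketch of that Java version, assuming the Spark core and HBase client/server libraries are on the classpath and using a placeholder table name "my_table" (it needs a running HBase and ZooKeeper to actually execute):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadJava {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Point TableInputFormat at the table to scan ("my_table" is a placeholder).
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "my_table");

        // Each record is a (row key, row result) pair.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
                conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);

        System.out.println("Rows read: " + rows.count());
        sc.close();
    }
}
```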

How does Hive compare to HBase?

It’s hard to find much about Hive, but I found this snippet on the Hive site that leans heavily in favor of HBase (bold added): Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting … Read more

Scalable Image Storage

We have been using CouchDB for that, saving images as an “Attachment”. But after a year, the multi-dozen-GB CouchDB database files turned out to be a headache. For example, CouchDB replication still has issues when used with very large document sizes. So we just rewrote our software to use CouchDB for image … Read more

Command like SQL LIMIT in HBase

From the HBase shell you can use LIMIT:

hbase> scan 'test-table', {'LIMIT' => 5}

From the Java API you can use Scan.setMaxResultSize(N) (a cap on result size in bytes) or Scan.setMaxResultsPerColumnFamily(N). See the HBase API docs for Scan.setMaxResultSize and Scan.setMaxResultsPerColumnFamily.

Large-scale data processing: HBase vs Cassandra [closed]

As a Cassandra developer, I’m better at answering the other side of the question: Cassandra scales better. Cassandra is known to scale to over 400 nodes in a cluster; when Facebook deployed Messaging on top of HBase they had to shard it across 100-node HBase sub-clusters. Cassandra supports hundreds, even thousands of ColumnFamilies. “HBase currently … Read more

How to delete all data from Solr and HBase

If you want to clean up a Solr index, you can hit this HTTP URL: http://host:port/solr/[core name]/update?stream.body=<delete><query>*:*</query></delete>&commit=true (replace [core name] with the name of the core you want to delete from). Or, if you are posting XML data, use: <delete><query>*:*</query></delete> Be sure to use commit=true to commit the changes. I don't have much experience with clearing … Read more
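The delete-all URL above can be assembled programmatically; a small sketch, where the host, port, and core name ("localhost", 8983, "mycore") are placeholder values for your deployment:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrDeleteAllUrl {
    // Build the delete-all request URL: the XML delete body goes in the
    // stream.body parameter (URL-encoded), with commit=true appended so
    // the deletion is committed immediately.
    public static String buildUrl(String host, int port, String core) {
        String body = "<delete><query>*:*</query></delete>";
        return "http://" + host + ":" + port + "/solr/" + core
                + "/update?stream.body="
                + URLEncoder.encode(body, StandardCharsets.UTF_8)
                + "&commit=true";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 8983, "mycore"));
    }
}
```

You would then issue an HTTP GET to the printed URL (e.g. with curl or an HTTP client) against your Solr server.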

Difference between HBase and Hadoop/HDFS

Hadoop is basically three things: a file system (the Hadoop Distributed File System, HDFS), a computation framework (MapReduce), and a management layer (YARN, Yet Another Resource Negotiator). HDFS allows you to store huge amounts of data in a distributed (providing faster read/write access) and redundant (providing better availability) manner. And MapReduce allows you to process this huge data in a … Read more
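To make the MapReduce model concrete, here is a toy in-memory word count in plain Java: the map phase emits a (word, 1) pair per word, and the shuffle/reduce phase groups by word and sums the counts. A real job would run the same logic in parallel across HDFS blocks on many nodes; this sketch is only an illustration of the paradigm.

```java
import java.util.*;
import java.util.stream.*;

public class MiniMapReduce {
    public static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // Map phase: split each line into words (emitting one
                // (word, 1) pair per word, conceptually).
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                // Shuffle + reduce phase: group pairs by word and sum the 1s.
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hbase on hdfs", "hdfs stores blocks");
        System.out.println(wordCount(lines)); // prints each word with its count
    }
}
```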

Error!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)