How should international geographical addresses be stored in a relational database?

I will summarize my thoughts from my blog post, A lesson in address storage (on archive.org). On my current project (I work for a logistics company), we’re storing international addresses. While designing this portion of the database, I researched addresses from all over the world. There are a lot of different formats. …

Why is Solr so much faster than Postgres?

First, Solr doesn’t use B-trees. A Lucene index (Lucene is the underlying library used by Solr) is made of read-only segments. For each segment, Lucene maintains a term dictionary, which is the lexicographically sorted list of terms that appear in the segment. Looking up a term in this term dictionary is done using a binary …
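A toy illustration of the lookup described above, assuming a segment is just a sorted list of terms with a postings list (matching document ids) per term. The `Segment` class and its methods are hypothetical; Lucene's real on-disk term dictionary format is considerably more elaborate. The point is only that a sorted term dictionary allows binary search instead of a B-tree walk.

```python
import bisect


class Segment:
    """Toy segment, built once and never modified: sorted term dictionary plus postings."""

    def __init__(self, docs):
        postings = {}
        for doc_id, text in enumerate(docs):
            for term in text.lower().split():
                postings.setdefault(term, set()).add(doc_id)
        # Term dictionary: every distinct term in the segment, lexicographically sorted.
        self.terms = sorted(postings)
        self.postings = [sorted(postings[t]) for t in self.terms]

    def lookup(self, term):
        """Binary-search the term dictionary; return matching doc ids ([] if absent)."""
        i = bisect.bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.postings[i]
        return []


segment = Segment(["the quick brown fox", "the lazy dog"])
print(segment.terms)          # ['brown', 'dog', 'fox', 'lazy', 'quick', 'the']
print(segment.lookup("the"))  # [0, 1]
print(segment.lookup("cat"))  # []
```

Because a segment is never modified after it is written, keeping its terms in one sorted array is cheap; new documents go into new segments instead.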

What exactly is a wide column store?

Let’s start with the definition of a wide column database. Its architecture uses a persistent, sparse-matrix, multi-dimensional mapping (row value, column value and timestamp) in a tabular format meant for massive scalability (beyond the petabyte scale). A relational database is designed to maintain the relationship between the entity and the columns that describe the …
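A minimal in-memory sketch of the (row, column, timestamp) → value mapping described above. The `WideColumnTable` class, its methods, and the sample keys are hypothetical. Real wide column stores add persistence, sorting and distribution on top of this idea. It shows the two properties that matter here: rows only store the columns actually written (sparsity), and each cell can hold several timestamped versions.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy in-memory sparse map: (row key, column key, timestamp) -> value."""

    def __init__(self):
        # row -> column -> {timestamp: value}; only written cells exist, so rows stay sparse.
        self.cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self.cells[row][column][timestamp] = value

    def columns(self, row):
        """Columns actually present in a row (other rows may have entirely different ones)."""
        return sorted(self.cells.get(row, {}))

    def get(self, row, column, as_of=None):
        """Newest value written at or before as_of (newest overall when as_of is None)."""
        versions = self.cells.get(row, {}).get(column, {})
        stamps = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(stamps)] if stamps else None


table = WideColumnTable()
table.put("user:1", "email", 1, "old@example.com")
table.put("user:1", "email", 5, "new@example.com")
table.put("user:2", "phone", 3, "555-0100")

print(table.get("user:1", "email"))           # new@example.com
print(table.get("user:1", "email", as_of=4))  # old@example.com
print(table.columns("user:2"))                # ['phone']
print(table.get("user:2", "email"))           # None
```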

Is there any way to get the column names along with the output when executing a query in Hive?

If we want to see the column names of the table in HiveQL, the following Hive configuration property should be set to true:
hive> set hive.cli.print.header=true;
If you prefer to always see the column names, add the above setting as the first line of the $HOME/.hiverc file. Hive automatically looks for a file named …

Standard use of ‘Z’ instead of NULL to represent missing data?

Sack your contractor. Okay, seriously, this isn’t standard practice. You can see this simply from the fact that every RDBMS I have ever worked with implements NULL and logic for NULL, takes NULL into account in foreign keys, treats NULL differently in COUNT, and so on. I would actually contend that using ‘Z’ or any other place …
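A small SQLite sketch of two of the NULL behaviours mentioned above, using Python's built-in `sqlite3` module. The tables `region`, `customer_null` and `customer_z` and the helpers `count_known_regions` and `fk_accepts` are hypothetical, made up for illustration. It shows that `COUNT(column)` skips NULL but counts a placeholder such as 'Z', and that a NULL foreign key needs no parent row while 'Z' would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE region (code TEXT PRIMARY KEY);
    INSERT INTO region VALUES ('EU'), ('NA');

    -- Same data twice: unknown region stored as NULL vs. as the placeholder 'Z'.
    CREATE TABLE customer_null (name TEXT, region TEXT REFERENCES region(code));
    -- No FK here: 'Z' is not a real region code, so it could not satisfy one.
    CREATE TABLE customer_z (name TEXT, region TEXT);
    INSERT INTO customer_null VALUES ('Ann', 'EU'), ('Bob', NULL);
    INSERT INTO customer_z VALUES ('Ann', 'EU'), ('Bob', 'Z');
""")


def count_known_regions(table):
    # COUNT(column) skips NULLs, but it has no way to know 'Z' is meant as "missing".
    return conn.execute(f"SELECT COUNT(region) FROM {table}").fetchone()[0]


def fk_accepts(region):
    """Try inserting a customer with this region into the FK-constrained table."""
    try:
        conn.execute("INSERT INTO customer_null VALUES ('Cy', ?)", (region,))
        return True
    except sqlite3.IntegrityError:
        return False


print(count_known_regions("customer_null"))  # 1: Bob's NULL is not counted
print(count_known_regions("customer_z"))     # 2: 'Z' is counted like a real region
print(fk_accepts(None))  # True: NULL needs no matching row in region
print(fk_accepts("Z"))   # False: 'Z' would need a fake 'Z' row in region
```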

Best representation of an ordered list in a database?

Solution: make index a string (because strings, in essence, have infinite “arbitrary precision”). Or, if you use an int, increment index by 100 instead of 1. The performance problem is this: there are no “in between” values between two sorted items.
item      index
-----------------
gizmo     1
          <<------ Oh no! no room between 1 and 2.
…
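A minimal sketch of the "increment index by 100" idea, assuming items are kept in a plain dict of item → integer index. The names `GAP`, `spaced_indexes`, `index_between` and `insert_between` are hypothetical. A new item takes the midpoint of its neighbours' indexes. Only when two neighbours are adjacent integers (the "no room between 1 and 2" case) does every item get renumbered. The string-index variant is not shown here.

```python
GAP = 100  # assumed spacing between neighbours ("increment index by 100 instead of 1")


def spaced_indexes(items):
    """Assign indexes 100, 200, 300, ... to items in their current order."""
    return {item: (pos + 1) * GAP for pos, item in enumerate(items)}


def index_between(before, after):
    """Return an integer strictly between two indexes, or None if there is none."""
    if after - before < 2:
        return None  # e.g. 1 and 2: no "in between" integer exists
    return (before + after) // 2


def insert_between(indexes, new_item, prev_item, next_item):
    """Give new_item an index between its neighbours; renumber all only if the gap is used up."""
    idx = index_between(indexes[prev_item], indexes[next_item])
    if idx is None:
        # Gap exhausted: respace every existing item, then retry once.
        ordered = sorted(indexes, key=indexes.get)
        indexes.clear()
        indexes.update(spaced_indexes(ordered))
        idx = index_between(indexes[prev_item], indexes[next_item])
    indexes[new_item] = idx
    return idx


order = spaced_indexes(["gizmo", "gadget", "gear"])
print(insert_between(order, "widget", "gizmo", "gadget"))  # 150
print(sorted(order, key=order.get))  # ['gizmo', 'widget', 'gadget', 'gear']

tight = {"gizmo": 1, "gadget": 2}
print(insert_between(tight, "widget", "gizmo", "gadget"))  # 150 (after renumbering)
```

With a gap of 100, roughly six midpoint insertions can land in the same slot before a renumber is needed, so most inserts touch a single row.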