Why do many refer to Cassandra as a column-oriented database?

Question

If you look at the README file in the Apache Cassandra Git repository, it says:

Cassandra is a partitioned row store. Rows are organized into tables
with a required primary key.

Partitioning means that Cassandra can distribute your data across
multiple machines in an application-transparent matter. Cassandra will
automatically repartition as machines are added and removed from the
cluster.

Row store means that like relational databases, Cassandra organizes
data by rows and columns.
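
To make the partitioning idea from the README excerpt concrete, here is a minimal sketch in Python. It is a deliberate simplification: it hashes the primary key and takes the result modulo the number of nodes, whereas Cassandra itself assigns token ranges to nodes on a ring (consistent hashing) so that adding or removing a machine moves only part of the data. The function name `node_for_key` and the node names are hypothetical.

```python
import hashlib

def node_for_key(primary_key, nodes):
    """Pick a node by hashing the key (simplified; not Cassandra's token ring)."""
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names

# The application only supplies the key; placement is decided by the hash.
placement = {key: node_for_key(key, nodes) for key in (1, 2, 3)}
print(placement)
```

The same key always maps to the same node, which is what lets the application stay unaware of where a row actually lives.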

Column-oriented (columnar) databases, by contrast, store their data on disk column by column.

For example, consider a Bonuses table:

  ID         Last    First   Bonus
  1          Doe     John    8000
  2          Smith   Jane    4000
  3          Beck    Sam     1000

In a row-oriented database management system, the data would be stored like this:

  1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

In a column-oriented database management system, the data would be stored like this:

  1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;

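
The two layouts can be reproduced with a short Python sketch. It only illustrates the ordering of values; real storage engines use binary formats, and the function names `row_oriented` and `column_oriented` are made up for this example.

```python
# Rows of the Bonuses table from the example above.
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]

def row_oriented(rows):
    """Serialize one complete row after another."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)

def column_oriented(rows):
    """Serialize one complete column after another."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)

row_layout = row_oriented(rows)
column_layout = column_oriented(rows)
print(row_layout)
print(column_layout)
```

Reading every value of the Bonus column touches one contiguous run in the column-oriented string, but requires skipping through every row in the row-oriented one.
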
Cassandra is essentially a column-family store.
It would store the above data as:

     "Bonuses" : {
           row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
           row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000}
           ...
     }

Also, the number of columns in each row doesn’t have to be the same. One row can have 100 columns and the next row can have only 1 column.
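
As a rough mental model only (not Cassandra's actual on-disk format), a column family can be pictured as a map from row key to a map of column names to values. The sketch below assumes that model; the helper `get_column` and the sparse `row4` are hypothetical additions for illustration.

```python
# Hypothetical in-memory model of a column family:
# row key -> {column name: value}.
bonuses = {
    "row1": {"ID": 1, "Last": "Doe", "First": "John", "Bonus": 8000},
    "row2": {"ID": 2, "Last": "Smith", "First": "Jane", "Bonus": 4000},
    "row3": {"ID": 3, "Last": "Beck", "First": "Sam", "Bonus": 1000},
    # Made-up row with a single column, to show that rows need not match.
    "row4": {"ID": 4},
}

def get_column(family, row_key, column, default=None):
    """Look up one column in one row; a missing column yields `default`."""
    return family.get(row_key, {}).get(column, default)

column_counts = {key: len(cols) for key, cols in bonuses.items()}
print(column_counts)
print(get_column(bonuses, "row4", "Bonus"))
```

Because each row carries its own set of columns, a row with one column stores nothing for the columns it lacks.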
Read this for more details.
