Why do many refer to Cassandra as a column-oriented database?

  • The README file in the Apache Cassandra git repo says:

Cassandra is a partitioned row store. Rows are organized into tables
with a required primary key.

Partitioning means that Cassandra can distribute your data across
multiple machines in an application-transparent matter. Cassandra will
automatically repartition as machines are added and removed from the
cluster.

Row store means that like relational databases, Cassandra organizes
data by rows and columns.

  • Column-oriented (columnar) databases store their data on disk column by column.

    e.g., the Bonuses table:

      ID         Last    First   Bonus
      1          Doe     John    8000
      2          Smith   Jane    4000
      3          Beck    Sam     1000
    
  • In a row-oriented database management system, the data would be stored like this:
    1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

  • In a column-oriented database management system, the data would be stored like this:
    1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;
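
The two layouts above can be reproduced with a minimal Python sketch. The function names `row_oriented` and `column_oriented` are made up for illustration, and the semicolon-delimited strings only mimic the example. Real storage engines use binary formats.

```python
# Bonuses table from the example, one tuple per row.
rows = [
    (1, "Doe", "John", 8000),
    (2, "Smith", "Jane", 4000),
    (3, "Beck", "Sam", 1000),
]

def row_oriented(rows):
    """Serialize row by row: all values of one row are adjacent."""
    return "".join(",".join(str(v) for v in row) + ";" for row in rows)

def column_oriented(rows):
    """Serialize column by column: all values of one column are adjacent."""
    columns = zip(*rows)  # transpose rows into columns
    return "".join(",".join(str(v) for v in col) + ";" for col in columns)

row_layout = row_oriented(rows)
column_layout = column_oriented(rows)
print(row_layout)
print(column_layout)
```

Keeping a column's values adjacent is what makes scans over a single column (for example, summing every Bonus) cheap in a columnar store, while the row layout favors reading or writing one whole record at a time.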

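  • Cassandra is basically a column-family store, which is why it is often loosely called column oriented, even though it does not lay data out column by column on disk.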

  • Conceptually, Cassandra would store the above data as:

     "Bonuses" : {
           row1 : { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000},
           row2 : { "ID":2, "Last":"Smith", "First":"Jane", "Bonus":4000}
           ...
     }
  • Also, the number of columns in each row doesn’t have to be the same. One row can have 100 columns and the next row can have only 1 column.
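
As a rough illustration of the column-family idea, the nested structure above can be modeled as a Python dict of dicts. This is only a conceptual sketch, not Cassandra's actual on-disk format. The `row_sparse` entry and the `get_column` helper are hypothetical additions, there to show a row with fewer columns.

```python
# Column family modeled as: row key -> {column name -> value}.
bonuses = {
    "row1": {"ID": 1, "Last": "Doe", "First": "John", "Bonus": 8000},
    "row2": {"ID": 2, "Last": "Smith", "First": "Jane", "Bonus": 4000},
    # Hypothetical sparse row: it simply carries fewer columns.
    "row_sparse": {"ID": 99},
}

def get_column(family, row_key, column, default=None):
    """Look up one column of one row; absent columns yield the default."""
    return family.get(row_key, {}).get(column, default)

# Each row keeps only the columns it actually has.
column_counts = {key: len(cols) for key, cols in bonuses.items()}
print(column_counts)
print(get_column(bonuses, "row_sparse", "Bonus"))
```

In this model a missing column is just absent from that row's map, so no placeholder value has to be stored for it.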

  • Read this for more details.
