
Apache Cassandra Quick tour March 22, 2010

Posted by Terry.Cho in Distributed System.

Cassandra is a distributed database system. Facebook released it as open source in 2008, and it is now developed as an Apache project. Cassandra combines the Google Bigtable data model with the distributed architecture of Amazon Dynamo. It does not use SQL and is optimized for handling very large volumes of data and transactions. Although Cassandra is implemented in Java, clients can be written in other languages (Ruby, Perl, Python, Scala, PHP, etc.).

It is used by large-scale social networking services such as Facebook, Digg, and Twitter. It does not support complex relationships such as foreign keys; it provides only key/value relationships, much like a Java HashMap. It is very easy to install and use.

Let’s look at the data model of Cassandra.

Data Model

Cassandra is based on the Google Bigtable data model. This kind of database is often called a “column DB”, and it is quite different from a traditional RDBMS.

Column

A column is the basic data structure. It consists of a column name and a column value.

{name: emailAddress, value:cassandra@apache.org}
{name:age , value:20}
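As a rough illustration, the two columns above can be modeled in Python as simple name/value records. The `Column` class here is a hypothetical stand-in used only for this sketch; it is not part of Cassandra or of any Cassandra client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical model of a column: a name paired with a value."""
    name: str
    value: str


# The two example columns from the text above.
email = Column(name="emailAddress", value="cassandra@apache.org")
age = Column(name="age", value="20")

print(email.name, "->", email.value)
print(age.name, "->", age.value)
```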

Column Family

A column family is a set of rows. It is similar to a table in an RDBMS, not to a single row. Each row in a column family is identified by a key, and each row holds a number of columns.

For example, one row is

Cassandra = { emailAddress:cassandra@apache.org , age:20}

“Cassandra” is the key for the row, and the row has two columns. The column names are “emailAddress” and “age”, and the column values are “cassandra@apache.org” and “20”.

Let’s look at a column family that has a number of rows.

UserProfile={
  Cassandra={ emailAddress:"cassandra@apache.org" , age:"20"}
  TerryCho= { emailAddress:"terry.cho@apache.org" , gender:"male"}
  Cath= { emailAddress:"cath@apache.org" , age:"20", gender:"female", address:"Seoul"}
}

One interesting point is that each row can have a different schema. The Cassandra row has the “emailAddress” and “age” columns, while the TerryCho row has the “emailAddress” and “gender” columns. This characteristic is called “schemaless”: the data structure of each row in a column family can be different.
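The schemaless behavior can be sketched with nested Python dictionaries, where the outer dictionary maps a row key to that row's columns. This is only an in-memory model for illustration, and the helper `column_names` is a hypothetical name, not a Cassandra API.

```python
# Column family modeled as: row key -> {column name: column value}.
user_profile = {
    "Cassandra": {"emailAddress": "cassandra@apache.org", "age": "20"},
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {
        "emailAddress": "cath@apache.org",
        "age": "20",
        "gender": "female",
        "address": "Seoul",
    },
}


def column_names(column_family, row_key):
    """Return the sorted column names stored in one row."""
    return sorted(column_family[row_key])


# Each row carries its own set of columns; nothing forces them to match.
for key in user_profile:
    print(key, column_names(user_profile, key))
```

Note that no row declares its columns in advance; a row simply holds whatever columns were written to it.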

KeySpace

A keyspace is a logical grouping of column families for management purposes. It does not affect the data structure.

Super Column & Super Column Family

A column value can itself contain columns; such a column is called a super column. (This is similar to a Java Hashtable whose values are ValueObject instances stored as type Object.)

{name:"username",
 value: {
  firstname: {name:"firstname", value:"Terry"},
  lastname: {name:"lastname", value:"Cho"}
 }
}

In the same way, a column family can hold super columns in each row; this is called a super column family. For example:

UserList={
  Cath:{
    username:{firstname:"Cath", lastname:"Yoon"}
    address:{city:"Seoul", postcode:"1234"}
  }
  Terry:{
    username:{firstname:"Terry", lastname:"Cho"}
    account:{bank:"hana", accounted:"1234"}
  }
}

The UserList column family has two rows, with keys “Cath” and “Terry”. Each row has two super columns: the “Cath” row has “username” and “address”, and the “Terry” row has “username” and “account”.
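The extra level of nesting in the UserList example can be sketched in Python as a dictionary three levels deep: row key, then super column name, then column name. This is an in-memory illustration only; `get_value` is a hypothetical helper and does not correspond to a real Cassandra client call.

```python
# Super column family modeled as:
# row key -> {super column name -> {column name: column value}}.
user_list = {
    "Cath": {
        "username": {"firstname": "Cath", "lastname": "Yoon"},
        "address": {"city": "Seoul", "postcode": "1234"},
    },
    "Terry": {
        "username": {"firstname": "Terry", "lastname": "Cho"},
        "account": {"bank": "hana", "accounted": "1234"},
    },
}


def get_value(super_cf, row_key, super_column, column):
    """Look up one value by row key, super column name, and column name."""
    return super_cf[row_key][super_column][column]


print(get_value(user_list, "Cath", "address", "city"))
print(sorted(user_list["Terry"]))
```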

Cassandra Quick Test

Download Cassandra from http://incubator.apache.org/cassandra/ then extract the zip file and run bin/cassandra.bat

We will connect to the Cassandra node with the CLI. It is located in /bin/cassandra-cli.bat

The default TCP port number is 9160. You can change the port number in “conf/storage-conf.xml”

In the “/conf/storage-conf.xml” file, a default keyspace named “Keyspace1” is defined, together with the types of its column families.

Let’s put a new row with the key “Terry”, which has one column (name = “gender”, value = “Male”).
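Conceptually, this write touches four levels: keyspace, column family, row key, and column. The sketch below models that in plain Python with nested dictionaries. It is not the Cassandra CLI or a client API; the functions `new_keyspace`, `put`, and `get` and the column family name "UserProfile" are assumptions made for illustration.

```python
from collections import defaultdict


def new_keyspace():
    """Hypothetical in-memory keyspace: column family -> row key -> columns."""
    return defaultdict(lambda: defaultdict(dict))


def put(keyspace, column_family, row_key, column, value):
    """Store one column value; the row is created on first write."""
    keyspace[column_family][row_key][column] = value


def get(keyspace, column_family, row_key, column):
    """Read one column value back by its full path."""
    return keyspace[column_family][row_key][column]


keyspace1 = new_keyspace()

# Row key "Terry" with a single column: name "gender", value "Male".
put(keyspace1, "UserProfile", "Terry", "gender", "Male")

print(get(keyspace1, "UserProfile", "Terry", "gender"))
```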

Comments»

1. joshy - June 9, 2010

Great article for me or anybody who needs a quick review of the column family model.

Terry.Cho - June 9, 2010

thanx 🙂

2. mallikarjungunda - August 18, 2010

Hi,

I have a requirement to use Cassandra in my application. The application has one table with a lot of data, and most of the application uses that table. Because of the amount of data, performance is degrading while that table is in Oracle.

So I have decided to use Cassandra for that one table and keep all the other tables in Oracle. A lot of business logic depends on that table.

Now my question is: can I use Cassandra for a table that has a lot of business logic?

I am unable to implement many of the where clauses against Cassandra.

Is there any supporting tool for using Cassandra efficiently?

Please let me know…
This is urgent.

Thanks in advance

By Mallik

Terry.Cho - August 19, 2010

First, in the case of Cassandra, there are no tools such as an admin console or developer toolkit. As far as I know, you have to develop them yourself.
Cassandra is designed to handle huge amounts of data quickly, but it is hard to handle complex or relational data with it.
If you have to handle complex business logic, I recommend using an RDBMS with database partitioning plus a data grid for caching (memcached or Oracle Coherence).
Cheers
-Terry

3. Sridhar - August 26, 2010

Hi Terry.Cho

I am a newbie to Cassandra.
We are planning to migrate from MySQL to Cassandra.
First of all, I am writing a sample application in which I insert data into two tables (column families) created in Cassandra.
My question is that I want the data in the tables to be deleted automatically once it is ‘n’ days old (I mean I want only the last ‘x’ days of data to be present in the DB).
Is there anything like stored procedures in Cassandra?

How should I handle this kind of issue?
Also, is there any trigger-like support in Cassandra?
Any help with this is greatly appreciated.

4. vignesh - September 24, 2010

Actually, I am developing a report in Pentaho Report Designer, and my backend is MySQL. We are now migrating to Cassandra, but Pentaho Report Designer can access only JDBC-supported databases, and Cassandra does not support JDBC. Does it support ODBC? Is there any other solution?

5. vignesh - September 24, 2010

Actually, I am developing reports in Pentaho Report Designer, and my database is MySQL. We are now migrating our database to Cassandra. I have one issue: Pentaho Report Designer supports only JDBC-supported databases, and Cassandra does not support JDBC/ODBC. Is there any other way to access Cassandra from Pentaho Report Designer?

6. Christophe - November 15, 2010

Very good introduction ! Thanks.

7. Jafar Mortazavian - November 21, 2010

It was simply complex! many thanks.

8. Ekrem SABAN - March 28, 2011

A column family does not correspond to a “row” in an RDBMS, but to a “table”:
“In analogy with relational databases, a column family is as a “table”, each key-value pair being a “row”.” (Wikipedia with references)

9. lycog - April 12, 2011

It is very good for a beginner like me. Thanks.

10. Q - July 13, 2011

very good introduction for a newbie like me ^_^

11. Siddhartha - August 5, 2011

Nice article. Thanks.

12. dinesh kumar - September 4, 2011

It is very easy for a beginner to understand.

13. praveen - December 19, 2011

It is awesome … expecting many more examples

14. RG - March 16, 2012

a very good intro article!!

15. HashFold › Useful bookmarks on Cassandra - January 15, 2013

[…] Data & Setup: https://javamaster.wordpress.com/2010/03/22/apache-cassandra-quick-tour/ […]

16. satyanand - March 13, 2013

Excellent article for understanding the why and how of Cassandra

17. Big Data annd No SQL links | Fresh Water Perl - August 17, 2013

[…] Cassandra quick tour […]

18. Cassandra LInks | Fresh Water Perl - August 19, 2013

[…] Cassandra quick tour […]

