
Apache Cassandra Quick tour March 22, 2010

Posted by Terry.Cho in Distributed System.

Cassandra is a distributed database system. Facebook released it as open source in 2008 and later donated it to the Apache Software Foundation. Cassandra combines the data model of Google’s Bigtable with the distributed architecture of Amazon’s Dynamo. It does not use SQL and is optimized for handling very large volumes of data and transactions. Although Cassandra itself is implemented in Java, clients can be written in other languages (Ruby, Perl, Python, Scala, PHP, etc.).

It is used by large-scale social networking services such as Facebook, Digg, and Twitter. It does not support complex relationships such as foreign keys; it simply provides a key-to-value relationship, like a Java HashMap. It is very easy to install and use.

Let’s look at the data model of Cassandra.

Data Model

Cassandra is based on the Google Bigtable data model, which is often called a “column DB”. It is quite different from a traditional RDBMS.


A column is the basic data structure. It consists of a column name and a column value.

{name: "emailAddress", value: "cassandra@apache.org"}
{name: "age", value: "20"}

Column Family

A column family is a set of rows, and each row is a set of columns, so it is roughly comparable to a table in an RDBMS. Each row in a column family is identified by a key and can hold any number of columns. How these rows differ from rows in an RDBMS is explained below.

For example, one row looks like this:

Cassandra = {emailAddress: "cassandra@apache.org", age: "20"}

“Cassandra” is the key of the row, and the row has two columns. The column names are “emailAddress” and “age”, and the column values are “cassandra@apache.org” and “20”.

Now let’s look at a column family that has several rows.

Cassandra = {emailAddress: "cassandra@apache.org", age: "20"}
TerryCho = {emailAddress: "terry.cho@apache.org", gender: "male"}
Cath = {emailAddress: "cath@apache.org", age: "20", gender: "female", address: "Seoul"}

One interesting point is that each row can have a different schema. The Cassandra row has “emailAddress” and “age” columns, while the TerryCho row has “emailAddress” and “gender” columns. This characteristic is called “schemaless”: the data structure of each row in a column family can be different.
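The rows above can be modelled as plain nested maps. The following is a minimal Python sketch of that idea only, not Cassandra’s client API; the names `column_family` and `columns_of` are made up for illustration.

```python
# Illustrative model only: a column family as a map of row key -> {column name: column value}.
# This is not the Cassandra client API; the variable and function names are hypothetical.
column_family = {
    "Cassandra": {"emailAddress": "cassandra@apache.org", "age": "20"},
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {
        "emailAddress": "cath@apache.org",
        "age": "20",
        "gender": "female",
        "address": "Seoul",
    },
}


def columns_of(row_key):
    """Return the sorted column names stored under one row key."""
    return sorted(column_family[row_key])


# Each row carries its own set of columns, so no shared schema is required.
print(columns_of("Cassandra"))  # ['age', 'emailAddress']
print(columns_of("TerryCho"))   # ['emailAddress', 'gender']
```

Looking up a row by key returns whatever columns that row happens to have, which is all that “schemaless” means here.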


A keyspace is a logical grouping of column families for management purposes. It does not affect the data structure.

Super Column & Super Column Family

A column value can itself contain columns; such a column is called a super column. (This is similar to a Java Hashtable whose values are ValueObject instances stored as type ‘Object’.)

value: firstname {name: "firstname", value: "Terry"}
value: lastname {name: "lastname", value: "Cho"}

In the same way, a column family can contain column families; this is called a super column family.


For example, a UserList column family has two rows with the keys “Cath” and “Terry”. Each row has two column families: the “Cath” row has “username” and “address”, and the “Terry” row has “username” and “account”.
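The UserList structure can be pictured as one more level of nesting. The sketch below is only an illustration in Python, not Cassandra’s API. The name `get_sub_column` is hypothetical, and apart from Terry’s first and last name every sub-column is a labelled placeholder, because the original figure is not available.

```python
# Illustrative model only: row key -> super column name -> {sub-column name: value}.
# Sub-columns marked "(placeholder)" are invented for the sketch.
user_list = {
    "Cath": {
        "username": {"firstname": "Cath", "lastname": "(placeholder)"},
        "address": {"city": "Seoul"},
    },
    "Terry": {
        "username": {"firstname": "Terry", "lastname": "Cho"},
        "account": {"bank": "(placeholder)"},
    },
}


def get_sub_column(row_key, super_column, column):
    """Follow the three-level path: row key, super column, then column."""
    return user_list[row_key][super_column][column]


print(get_sub_column("Terry", "username", "lastname"))  # Cho
print(sorted(user_list["Cath"]))                        # ['address', 'username']
```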

Cassandra Quick Test

Download Cassandra from http://incubator.apache.org/cassandra/, extract the zip file, and run bin/cassandra.bat.

Next, connect to the Cassandra node with the CLI, which is located at bin/cassandra-cli.bat.

The default TCP port number is 9160. You can change the port number in “conf/storage-conf.xml”.

The “conf/storage-conf.xml” file also defines a default keyspace named “Keyspace1”, along with the column families that belong to it.

Let’s put a new row with the key “Terry” that has one column (key=“gender”, value=“Male”).
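Conceptually, this write addresses a value by keyspace, column family, row key, and column name. The Python sketch below models that path with nested dictionaries; it is not the CLI syntax or a client library. The function names are hypothetical, and “Standard1” is an assumed column family name used only for illustration.

```python
# In-memory stand-in for a node, not a real client:
# keyspace -> column family -> row key -> {column name: value}.
# "Standard1" is an assumed column family name used only for illustration.
store = {"Keyspace1": {"Standard1": {}}}


def put(keyspace, column_family, row_key, column_name, value):
    """Create the row on first write, then set (or overwrite) one column."""
    row = store[keyspace][column_family].setdefault(row_key, {})
    row[column_name] = value


def get(keyspace, column_family, row_key, column_name):
    """Read a single column value back by its full path."""
    return store[keyspace][column_family][row_key][column_name]


put("Keyspace1", "Standard1", "Terry", "gender", "Male")
print(get("Keyspace1", "Standard1", "Terry", "gender"))  # Male
```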


REST Architecture overview November 12, 2009

Posted by Terry.Cho in Uncategorized.

REST Architecture

REST was introduced in 2000 in the doctoral dissertation of Roy Fielding, one of the principal authors of the HTTP specification. Because the web architecture in use at the time did not take full advantage of the strengths of the original web design, Representational State Transfer (REST) was presented as a network-based architecture that makes the best use of the web’s merits.

Basics of REST

Simply put, REST is an HTTP URI plus an HTTP method: the URI clearly identifies the target resource, and the method defines the operation performed on that resource.


One of the key features of REST is that everything is represented as a ‘Resource’, and each resource is identified by an HTTP URL. For example, a user named ‘bcho’ on the javastudy website can be expressed as http://www.javastudy.co.kr/users/bcho, while an HP printer located on the 9th floor of a Gangnam office can be expressed as http://printers/localtion/seoul/kangnamgu/9f/hp. In this way, every resource can be expressed through an HTTP URL.


Then how are the operations on a resource represented? This is where the HTTP methods come in.

* To retrieve the member information of bcho from the javastudy website:
URI : http://www.javastudy.co.kr/users/bcho
Method : GET

* To create that member:
URI : http://www.javastudy.co.kr/users/bcho
Method : POST
<name>Cho Dae Hyup</name>

* To delete that member:
URI : http://www.javastudy.co.kr/users/bcho
Method : DELETE

* To change the information of that member:
URI : http://www.javastudy.co.kr/users/bcho
Method : PUT
<name>Cho Dae Hyup</name>

In other words, the four HTTP methods define the CRUD operations on a resource.

HTTP Method    Meaning
POST           Create
GET            Select
PUT            Create or Update
DELETE         Delete
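The mapping in the table can be sketched as a small dispatcher. This is a minimal in-memory Python illustration, not a real HTTP server. The function name `handle` and the status codes it returns are choices made for the sketch.

```python
# In-memory sketch: the URI path names the resource, the HTTP method picks the operation.
# The function name and status codes are illustrative and not tied to any web framework.
resources = {}


def handle(method, path, body=None):
    """Apply one HTTP-style request to the in-memory resource table."""
    if method == "GET":  # Select
        if path in resources:
            return 200, resources[path]
        return 404, None
    if method == "POST":  # Create
        resources[path] = body
        return 201, body
    if method == "PUT":  # Create or Update
        status = 200 if path in resources else 201
        resources[path] = body
        return status, body
    if method == "DELETE":  # Delete
        if path not in resources:
            return 404, None
        del resources[path]
        return 204, None
    return 405, None  # any other method is outside this mapping


print(handle("POST", "/users/bcho", {"name": "Cho Dae Hyup"}))  # (201, {'name': 'Cho Dae Hyup'})
print(handle("GET", "/users/bcho"))                              # (200, {'name': 'Cho Dae Hyup'})
print(handle("DELETE", "/users/bcho"))                           # (204, None)
print(handle("GET", "/users/bcho"))                              # (404, None)
```

The URL stays the same in all four calls; only the method changes, which is the whole point of the design.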

Shortcomings of REST

However, REST has several shortcomings. One is that only four methods are available, so operations such as ‘send email’ or ‘write log’ cannot be expressed clearly with an HTTP method.

Existing programming styles take an operation-oriented approach centered on functions or methods, which conflicts with the resource-based approach of REST. REST is called an architecture rather than a simple protocol because it applies well to resources that support CRUD (e.g. a DBMS).

Then how can these shortcomings be resolved?

As a matter of fact, CRUD is not enough to express every operation. Operations with a control or functional character are expressed either with HTTP PUT or POST, or by rephrasing them as an operation on a resource. For example, a ‘send mail’ operation can be reinterpreted as ‘create a mail to be sent to someone’ and expressed as HTTP POST http://www.xxx./sendmail/to/{emailaddress}.

Still, there are cases where the operation cannot be rephrased this way.

In these cases, HTTP PUT together with the URL is used to carry the meaning of the control operation. For example, the grade of the user ID bcho can be changed through the following URL:

http://www.xxx/users/bcho/upgrade
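Both workarounds, recasting ‘send mail’ as creating a mail resource and expressing ‘upgrade’ as a sub-path of the user, can be sketched as two routes. This Python example is only an illustration: the handler names, the in-memory `outbox` and `grades`, and the grade values "basic" and "premium" are all assumptions.

```python
import re

# Hypothetical in-memory state used only for this sketch.
outbox = []
grades = {"bcho": "basic"}


def send_mail(emailaddress, body):
    """Model 'send mail' as creating a mail resource addressed to someone."""
    outbox.append({"to": emailaddress, "body": body})
    return 201


def upgrade_user(user_id, body):
    """Model a control operation as a sub-path of the user resource."""
    grades[user_id] = "premium"
    return 200


# Each route pairs an HTTP method and a URL pattern with a handler.
ROUTES = [
    ("POST", re.compile(r"^/sendmail/to/(?P<emailaddress>[^/]+)$"), send_mail),
    ("PUT", re.compile(r"^/users/(?P<user_id>[^/]+)/upgrade$"), upgrade_user),
]


def dispatch(method, path, body=None):
    """Find the first route matching both method and path, then call its handler."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if method == route_method and match:
            return handler(body=body, **match.groupdict())
    return 404


print(dispatch("POST", "/sendmail/to/someone@example.com", "hello"))  # 201
print(dispatch("PUT", "/users/bcho/upgrade"))                         # 200
print(grades["bcho"])                                                 # premium
```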

In fact, the most difficult challenge in designing a REST-based architecture is defining these URLs. One of the merits of REST is that the meaning of a request is easy to grasp from its URL and HTTP method, so considerable effort should go into URL design.

Pros & Cons


The existing web infrastructure can be used intact.

This is one of the biggest advantages. Because plain HTTP is used, there is no need to open additional firewall ports for remote calls, and load balancing equipment such as L4 switches can be used as is.

Web caches can also be used as they are. Because every resource has a unique URL, its representation can be stored in a web cache, and read-only (select) operations in particular can be answered from the cache without running the actual business transaction. This is a strong benefit in terms of performance and resource utilization.
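The caching benefit follows directly from the unique URL: the URL itself can serve as the cache key. Below is a minimal Python sketch of that idea, not a description of how any real web cache is implemented. `origin_fetch` stands in for the actual business transaction, and all names are hypothetical.

```python
# Sketch of why unique URLs make caching simple: the URL is the cache key.
# `origin_fetch` stands in for the real business transaction; all names are hypothetical.
cache = {}
origin_calls = []


def origin_fetch(url):
    """Pretend to run the expensive back-end operation and record that it ran."""
    origin_calls.append(url)
    return "representation of " + url


def cached_get(url):
    """Serve a GET from the cache when possible, otherwise go to the origin once."""
    if url not in cache:
        cache[url] = origin_fetch(url)
    return cache[url]


def invalidate(url):
    """A write (PUT, POST, DELETE) to the URL drops the cached copy."""
    cache.pop(url, None)


url = "http://www.javastudy.co.kr/users/bcho"
cached_get(url)
cached_get(url)
print(len(origin_calls))  # 1, the second GET was answered from the cache
invalidate(url)
cached_get(url)
print(len(origin_calls))  # 2, the origin was consulted again after the write
```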


Compared with web services, which involve many cumbersome specifications such as WS-I, WS-ReliableMessaging, and WS-Transaction, REST does not need a separate specification. REST is generally called a ‘de facto standard’ that only requires proper use of HTTP URLs and methods.


No standard and, therefore, hard to manage

REST is drawing attention these days because non-enterprise companies such as Google, Yahoo, and Amazon are turning away from the complexity and heavy standards of web services. Since the data they exchange is usually not tied to mission-critical business requirements, a standard that makes data exchange easy is sufficient; it does not need to be enterprise grade. In addition, no company, such as a vendor, wants to take the lead in creating a standard.

There are only conventions that look like standards because they are used frequently and have emerged tacitly. (These are called ‘de facto standards’.)

Because the standards are ambiguous, they are hard to manage during development. A clear-cut standard lets a development process and patterns form around the specifications, so a team should first establish its own in-house REST standard and then use it to design a REST-based system. Moreover, a misunderstanding of the REST concept can put the REST label on the wrong architecture. In fact, Flickr.com, a leading Web 2.0 site, once stirred up controversy by giving the name ‘hybrid REST’ to an API that was designed in an RPC style without adopting the characteristics of REST.

Note. When Flickr’s hybrid REST API processes an operation, the method is passed in the query string in the form http://URL/operation?name=operationname. At a glance this looks like a RESTful design, but the URL is actually the same for every resource, and the operations are distinguished only by the query string. This goes against the original design principle of REST, which gives every resource a unique URL.
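The difference is easy to see by pulling the URLs apart. In the Python sketch below, the host `api.example.com` and the operation names `getUser` and `deleteUser` are placeholders invented for illustration; they are not Flickr’s actual API.

```python
from urllib.parse import urlsplit, parse_qs

# The host and operation names are placeholders, not a real API.
rpc_style = [
    "http://api.example.com/operation?name=getUser&id=bcho",
    "http://api.example.com/operation?name=deleteUser&id=bcho",
]
rest_style = [
    ("GET", "http://api.example.com/users/bcho"),
    ("DELETE", "http://api.example.com/users/bcho"),
]


def resource_path(url):
    """Return only the path component, which is what should identify a resource."""
    return urlsplit(url).path


def operation_name(url):
    """Return the operation hidden in the query string, if there is one."""
    return parse_qs(urlsplit(url).query).get("name", [None])[0]


# RPC style: one path for everything, the operation lives in the query string.
print({resource_path(u) for u in rpc_style})      # {'/operation'}
print([operation_name(u) for u in rpc_style])     # ['getUser', 'deleteUser']

# REST style: the path names the resource, the HTTP method names the operation.
print({resource_path(u) for _, u in rest_style})  # {'/users/bcho'}
print([m for m, _ in rest_style])                 # ['GET', 'DELETE']
```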

It is worrisome that many developers in Korea mistake this kind of design for REST, and that some portals actually offer services of this kind.

As mentioned briefly above, REST is not a protocol like web services; it is an architecture. Because it is a resource-based architecture, the system must be designed to suit the REST concept.

Use of Alternative Key

For example, a resource usually corresponds to one row in the DB. In a DB, primary keys are often composite keys (multiple columns combined into one PK). Although this can be a valid DB design, an HTTP URL has a hierarchical structure delimited by /, so such a key is awkward to represent.

For example, if the PK is defined as “residence registration # of the householder” + “region” + “name”, it poses no problem in the DB. Expressed in REST, however, it becomes unnatural, because it would look like userinfo/{residence registration # of the householder}/{region}/{name}.

Assigning a unique key to a resource raises many other problems as well, and one way to resolve them is to use an alternative key (AK): a unique key with no meaning of its own, stored in a field called ‘AK’ in the DB table. Google’s REST APIs have already adopted an architecture that uses this kind of AK.
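A minimal Python sketch of the AK idea follows. It assumes a random UUID as the meaningless key and a `/userinfo/{ak}` URL shape; both are choices made for illustration, and the table and function names are hypothetical.

```python
import uuid

# Hypothetical table: rows stay keyed by the composite primary key,
# and each row also gets a meaningless alternative key (AK) for use in URLs.
rows_by_pk = {}
pk_by_ak = {}


def insert_row(householder_reg_no, region, name, data):
    """Store a row under its composite PK and return the generated AK."""
    pk = (householder_reg_no, region, name)
    ak = uuid.uuid4().hex  # assumption: a random UUID serves as the AK
    rows_by_pk[pk] = dict(data, ak=ak)
    pk_by_ak[ak] = pk
    return ak


def resource_url(ak):
    """Build a flat URL from the AK instead of the three PK parts."""
    return "/userinfo/" + ak


def lookup(url):
    """Resolve the AK in the URL back to the row stored under the composite PK."""
    ak = url.rsplit("/", 1)[-1]
    return rows_by_pk[pk_by_ak[ak]]


ak = insert_row("000000-0000000", "Seoul", "bcho", {"grade": "basic"})
print(resource_url(ak).count("/"))        # 2, a single path segment after /userinfo
print(lookup(resource_url(ak))["grade"])  # basic
```

The DB keeps its composite key, while the URL exposes only one opaque segment.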

However, adding an AK field means changing the overall DB design. REST therefore calls for an architectural approach, because the architecture of the whole system must change to accommodate it.

Other techniques for making a design RESTful include 1) expressing relations between resources through href (links), 2) versioning, 3) naming rules, 4) handling cross-cutting concerns with an ESB, and 5) routing. These design methods and more advanced REST will be covered in the next contribution, titled ‘Highly-Advanced REST Architecture’.

Misunderstanding on REST

REST = HTTP + XML Protocol?

I once proposed a discussion on REST at a local community web site, in preparation for designing an advanced REST system for a project. I came to realize that most people understood REST simply as sending XML over HTTP.

REST is an architecture (not a protocol) that represents resources by making the best use of the web’s features. It must, of course, use HTTP, but XML is not mandatory; other formats such as JSON or YAML are fine. Whether the REST concept has been understood correctly shows in how well the resources are represented and how well the web’s features are used.
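To make the point concrete, the same resource can be rendered in more than one format without changing its URL. The Python sketch below uses only the standard library. The `<user>` wrapper element and the function names are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# One resource, independent of how it is serialized.
user = {"name": "Cho Dae Hyup"}


def to_json(resource):
    """Render the resource as a JSON document."""
    return json.dumps(resource)


def to_xml(resource, root_tag="user"):
    """Render the resource as XML; the root tag is an assumed wrapper."""
    root = ET.Element(root_tag)
    for key, value in resource.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def represent(resource, accept):
    """Pick a representation from the requested media type, defaulting to XML."""
    if accept == "application/json":
        return "application/json", to_json(resource)
    return "application/xml", to_xml(resource)


print(represent(user, "application/json")[1])  # {"name": "Cho Dae Hyup"}
print(represent(user, "application/xml")[1])   # <user><name>Cho Dae Hyup</name></user>
```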

I am writing this article because I could not find a document that explains the concepts and principles of REST well enough.

Is REST easier than WebService?

Not exactly. Of course, REST is much easier than web services when developing from scratch, because all that is required is to define an in-house standard, design a message as simple XML, and send that message over HTTP. However, this is the service provider’s perspective. Those who consume the service must parse the XML or JSON data returned to the HTTP client themselves, one response at a time.
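The consumer-side burden looks roughly like this: without a generated stub, the client has to know each payload’s shape and pull the fields out by hand. This is a minimal Python sketch; the payload shapes (a `name` field, a `<user>` wrapper element) and the function name are assumptions for illustration.

```python
import json
import xml.etree.ElementTree as ET


def parse_user_name(content_type, payload):
    """Extract the user's name from a response body, depending on its format."""
    if content_type == "application/json":
        return json.loads(payload)["name"]
    if content_type == "application/xml":
        return ET.fromstring(payload).findtext("name")
    raise ValueError("unsupported content type: " + content_type)


print(parse_user_name("application/json", '{"name": "Cho Dae Hyup"}'))               # Cho Dae Hyup
print(parse_user_name("application/xml", "<user><name>Cho Dae Hyup</name></user>"))  # Cho Dae Hyup
```

Every new message type needs another hand-written branch like these, which is the cost the paragraph above describes.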

What about web services? Thanks to the clear-cut standards, a web service is generated automatically once the code is written as a POJO with JAX-WS. In particular, the client stub can be generated automatically from the WSDL that describes the service contract. As a result, users can call a web service as easily as a Java library, even if they do not know the protocol specification at all.

In my opinion, developing a web service looks easier than developing with REST. The simplest and most productive option is a WS-I based web service that is kept simple and well organized.

Prospect of REST

There is not much demand for REST in Korea yet. This may be because users try to avoid designing highly complicated systems, or because there has been little demand for open-style systems.

However, prestigious overseas web sites already offer services built on a REST-based architecture. Amazon, one of the representative open API companies, is planning to migrate its open APIs, previously developed as web services, to REST. Although REST is only a de facto standard, it is becoming one that is difficult to avoid in the web world.

In addition, REST, which grew out of the open community, is about to be included in the next Java EE release (Java EE 6) as JAX-RS (JSR 311), one of the Java EE specifications. (In open source, REST support is included in Apache CXF and in Sun’s ‘Jersey’ framework. Even though REST is being written into a specification, it is a specification of ‘how’ to implement it, so its flexibility will remain unchanged.) A specification called ‘WADL’, similar to the ‘WSDL’ of web services, has been proposed for describing REST services. However, WADL mainly indicates the URLs of a REST service and does not yet fully represent the schema of every message used in the actual operations. For this reason, it is fair to say that there is no standardized service contract for REST yet, and it is difficult to judge whether this will turn out to be an advantage or a disadvantage.


Thanks to its simplicity and its full use of the web’s features, REST is spreading surprisingly fast overseas. Unusually for a technology that emerged from the open community, REST has even been adopted as a standard technology (JSR 311).

Still, because of its great flexibility, REST is not easy to control or manage. For this reason it has not yet been widely used in enterprise systems and is mostly used in service systems. In Korea, a lack of understanding of the REST concept and the low level of open API activity mean that REST is not yet a popular option.

However, it is time for local developers to prepare to use REST, as it is becoming mainstream among prestigious international service companies.

In the next contribution, a more sophisticated REST architecture will be covered.