
Replication architecture in Cassandra and HBase March 19, 2010

Posted by Terry.Cho in Distributed System.
I am currently researching distributed database architectures.
I found a very interesting article.
Apache Cassandra and Hadoop HBase are among the most popular distributed databases.
Twitter and Facebook use Cassandra.
Both solutions started from Google Bigtable, so their data models are very similar.
The data model is called a “column database”. I will introduce the model in a later post.
However, my concern here is how to replicate data across regions (data centers in different regions).
Here is some very interesting information.
In Cassandra, data is replicated on every transaction. A coordinator captures the changes and propagates them to the other nodes.
But a fiber-based, low-latency network is required, and there is no reference deployment yet.
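
To make the coordinator idea concrete, here is a minimal in-memory sketch. It is not Cassandra's actual code or API: the `Replica` and `Coordinator` classes, the `required_acks` parameter, and the last-write-wins rule are all assumptions chosen for illustration. The coordinator forwards each write to every replica and reports success once enough replicas acknowledge it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica; a real node would persist to disk."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)

    def apply(self, key, value, timestamp):
        """Apply a write and return True as an acknowledgment, or False if down."""
        if not self.up:
            return False
        current = self.store.get(key)
        # Assumed conflict rule: keep the value with the newest timestamp.
        if current is None or timestamp >= current[1]:
            self.store[key] = (value, timestamp)
        return True


class Coordinator:
    """Hypothetical coordinator that propagates each write to all replicas."""

    def __init__(self, replicas, required_acks):
        self.replicas = replicas
        self.required_acks = required_acks

    def write(self, key, value, timestamp):
        # Send the write to every replica and count the acknowledgments.
        acks = sum(1 for r in self.replicas if r.apply(key, value, timestamp))
        return acks >= self.required_acks


# Two replicas in one data center and one unreachable replica in another.
replicas = [Replica("dc1-a"), Replica("dc1-b"), Replica("dc2-a", up=False)]
coordinator = Coordinator(replicas, required_acks=2)
ok = coordinator.write("user:1", "terry", timestamp=1)
```

In this sketch the write still succeeds while the remote replica is down, because two acknowledgments are enough. How many acknowledgments to wait for is the design choice that decides how much the latency of the link between data centers matters.
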
HBase's data replication architecture looks very practical.
It captures the change log and puts it into a replication queue. The replication messages are then passed to the other nodes.
This mechanism is very similar to CDC (Change Data Capture).
Oracle GoldenGate, Quest SharePlex, and MySQL geo-replication use this mechanism.
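
The log-and-queue mechanism can be sketched as follows. This is only an illustration of the general CDC pattern, not HBase's real implementation: the `SourceCluster`, `SinkCluster`, and `ship` names, the sequence numbers, and the `batch_size` parameter are all hypothetical.

```python
from collections import deque


class SourceCluster:
    """Hypothetical source side: every write is logged and queued for shipping."""

    def __init__(self):
        self.data = {}
        self.change_log = []               # append-only log of every edit
        self.replication_queue = deque()   # edits not yet shipped

    def put(self, key, value):
        edit = (len(self.change_log), key, value)  # (sequence number, key, value)
        self.change_log.append(edit)
        self.data[key] = value
        self.replication_queue.append(edit)


class SinkCluster:
    """Hypothetical remote side: replays shipped edits in order."""

    def __init__(self):
        self.data = {}
        self.last_applied = -1

    def replay(self, edits):
        for seq, key, value in edits:
            if seq <= self.last_applied:
                continue  # already applied, so a re-sent batch is harmless
            self.data[key] = value
            self.last_applied = seq


def ship(source, sink, batch_size=2):
    """Move up to batch_size queued edits from the source to the sink."""
    batch = []
    while source.replication_queue and len(batch) < batch_size:
        batch.append(source.replication_queue.popleft())
    sink.replay(batch)
    return len(batch)
```

Because the write path only appends to the local log and queue, a slow link between data centers delays replication without slowing down the original write. The cost is that the remote cluster lags behind until the queue is drained.
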

HBase replication looks more reasonable. It uses a common architecture, and it has a reference deployment.

===

After I wrote this article, I received some feedback. According to the comment, the article I referenced was written by an HBase fan. Cassandra supports geo-replication and has a reference deployment at Facebook, and Digg will deploy Cassandra across different data centers.

But as far as I know, even though Facebook has two data centers, they have a fiber link between them, so it is not real geo-replication. I will do more research on Cassandra's data replication features and post about this issue again later.


Comments»

1. Jonathan Ellis - March 19, 2010

Hi Terry,

The “nosql battle” post you cite is written by an HBase fanboy and it’s unfortunately basically an anti-Cassandra FUD piece. Most of what it says about Cassandra, and some of what it says about HBase, is completely wrong.

Cassandra replication works very well in realtime across normal WAN links; Facebook’s largest Cassandra cluster spans East and West coast data centers, and Digg is deploying to 2 DCs soon.

Section 5.2 of the Cassandra whitepaper covers how this works in more detail: http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf

Terry.Cho - March 20, 2010

Thank you very much for the information.
I will review the white paper.


