Posted by Terry.Cho in Distributed System.
Tags: Cassandra, Distributed System, geo replication, Hadoop, HBase
I'm currently researching distributed database architecture.
I found a very interesting article.
Apache Cassandra and Hadoop HBase are the most popular distributed databases.
Twitter and Facebook are using Cassandra.
Both solutions are derived from Google Bigtable, so their data models are very similar.
This data model is called a “column database” model. I will introduce it in a later post.
However, my main concern is how to replicate data across regions, that is, between data centers located in different regions.
The article contains some very interesting information on this.
Cassandra replicates data on every write: a coordinator node captures the change and propagates it to the other replica nodes.
However, this approach requires a low-latency, fiber-based network between the sites, and there are no reference deployments yet.
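To see why per-write replication depends so heavily on the network between sites, here is a minimal sketch of a coordinator that forwards each write to every replica and waits for a number of acknowledgements. It is a simplified model, not Cassandra's actual code: the names `Replica`, `Coordinator`, `rtt_ms` and `required_acks` are hypothetical, and the round-trip times are made-up numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    rtt_ms: float  # hypothetical round-trip time from the coordinator
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        """Store the value and return the simulated time until the ack arrives."""
        self.data[key] = value
        return self.rtt_ms


class Coordinator:
    """Toy coordinator: sends each write to every replica and reports
    success once `required_acks` acknowledgements have arrived."""

    def __init__(self, replicas, required_acks):
        if not 1 <= required_acks <= len(replicas):
            raise ValueError("required_acks must be between 1 and the number of replicas")
        self.replicas = replicas
        self.required_acks = required_acks

    def write(self, key, value):
        # Every replica receives the change; sort the ack arrival times.
        ack_times = sorted(r.apply(key, value) for r in self.replicas)
        # The client sees the write complete when the N-th fastest ack arrives.
        return ack_times[self.required_acks - 1]


replicas = [Replica("dc1-a", 1.0), Replica("dc1-b", 2.0), Replica("dc2-a", 80.0)]
print(Coordinator(replicas, required_acks=3).write("user:1", "terry"))  # 80.0
print(Coordinator(replicas, required_acks=2).write("user:1", "terry"))  # 2.0
```

When the coordinator must wait for a replica in a remote data center, every write pays the cross-region round trip, which is why this design calls for a fast, fiber-based link between sites.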
HBase's data replication architecture looks much more practical.
It captures the change log and puts the changes into a replication queue. The queued replication messages are then shipped to the nodes of the other cluster.
This mechanism is very similar to CDC (Change Data Capture).
Oracle GoldenGate, Quest SharePlex, and MySQL replication all use this mechanism.
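The log-and-queue approach can be sketched as follows. This is a minimal illustration of the general change-capture pattern, assuming a single source cluster and a single peer; it is not HBase's actual implementation (in HBase the change log is the write-ahead log), and the class and method names (`SourceCluster`, `PeerCluster`, `ship`, `max_entries`) are hypothetical.

```python
from collections import deque


class SourceCluster:
    """Toy source: writes are applied locally at once, recorded in a change
    log, and queued for later shipment to the peer."""

    def __init__(self):
        self.data = {}
        self.change_log = []  # ordered (seq, key, value) entries
        self.replication_queue = deque()

    def write(self, key, value):
        self.data[key] = value  # returns without waiting for the remote peer
        entry = (len(self.change_log), key, value)
        self.change_log.append(entry)
        self.replication_queue.append(entry)

    def ship(self, peer, max_entries=None):
        """Send queued changes to the peer in log order; return how many were sent."""
        shipped = 0
        while self.replication_queue and (max_entries is None or shipped < max_entries):
            peer.apply(self.replication_queue.popleft())
            shipped += 1
        return shipped


class PeerCluster:
    """Toy remote cluster that replays changes in sequence order."""

    def __init__(self):
        self.data = {}
        self.last_applied_seq = -1

    def apply(self, entry):
        seq, key, value = entry
        if seq <= self.last_applied_seq:
            return  # skip entries that were already applied
        self.data[key] = value
        self.last_applied_seq = seq


src, peer = SourceCluster(), PeerCluster()
src.write("a", 1)
src.write("b", 2)
print(peer.data)                        # {} : the peer lags behind
print(src.ship(peer, max_entries=1))    # 1
print(peer.data)                        # {'a': 1}
src.ship(peer)
print(peer.data)                        # {'a': 1, 'b': 2}
```

The trade-off is visible here: local writes never wait for the remote site, so a slow cross-region link only increases replication lag instead of write latency.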
HBase replication looks more reasonable to me: it uses a common, proven architecture, and there are reference deployments.
After I wrote this article, I received some feedback. According to the comment, the article I referenced was written by an HBase fan. Cassandra does support geo replication and has a reference deployment at Facebook, and Digg plans to deploy Cassandra across different data centers.
However, as far as I know, although Facebook has two data centers, they are connected by a fiber link, so this is not real geo replication. I will research Cassandra's data replication features further and post about this issue again later.