Monday, April 20, 2009

Scaling Out – Facebook

1.       What is the problem? Is the problem real?

As web operations grow in demand and scale, a single datacenter can no longer handle the load. Keeping everything in one physical location also makes it a single point of failure, whether the failure is transient (power or network outages) or permanent (an earthquake). The problem is how to keep data consistent across datacenters without compromising responsiveness or correctness. This is a very real problem.

 

2.       What is the solution's main idea (nugget)?

In a standard (web server, memcache, database) architecture, the issue with multiple datacenters is replication lag when data is updated. The modified value has to be propagated to every database, and the stale copy has to be removed from each cache. The solution addresses exactly that by adding extra information to the replication stream, so that stale data is removed from a datacenter's cache when its database applies the update. Also, since only the master database can accept writes, the layer 7 load balancers decide where to send a user (to the master or to a slave database) based on the URI, which indicates the operation.
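The two mechanisms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the `ReplicationEvent` format, the `Datacenter` class, the `route` function, and the URI prefixes in `WRITE_PREFIXES` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationEvent:
    """One entry in the replication stream (hypothetical format)."""
    key: str
    value: str
    # Extra information piggybacked on the stream: the cache keys that
    # this write makes stale.
    stale_cache_keys: list = field(default_factory=list)


@dataclass
class Datacenter:
    name: str
    database: dict = field(default_factory=dict)
    cache: dict = field(default_factory=dict)

    def read(self, key):
        # Serve from cache when possible, otherwise fill from the local database.
        if key not in self.cache:
            self.cache[key] = self.database.get(key)
        return self.cache[key]

    def apply(self, event):
        # Applying the event updates the local database and, in the same
        # step, drops the cache entries the event names.
        self.database[event.key] = event.value
        for cache_key in event.stale_cache_keys:
            self.cache.pop(cache_key, None)


# Hypothetical URI prefixes that imply a write operation.
WRITE_PREFIXES = ("/edit", "/post", "/delete")


def route(uri):
    """Layer 7 decision: writes go to the master, reads go to a slave."""
    return "master" if uri.startswith(WRITE_PREFIXES) else "slave"


master = Datacenter("master")
slave = Datacenter("slave")
for dc in (master, slave):
    dc.database["user:1:name"] = "Alice"

slave.read("user:1:name")  # warms the slave cache with "Alice"

event = ReplicationEvent("user:1:name", "Alicia", ["user:1:name"])
master.apply(event)                # the write commits on the master first
stale = slave.read("user:1:name")  # during replication lag: still "Alice"
slave.apply(event)                 # the stream arrives and invalidates the cache
fresh = slave.read("user:1:name")  # now "Alicia"
```

In this sketch the slave keeps serving the old value until the event reaches it, which is the replication-lag window discussed under the trade-offs.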

 

3.       Why is the solution different from previous work?

It is a practical implementation of some distributed systems concepts for maintaining consistency, and it has been working well for nine months.

 

4.       Does the paper (or do you) identify any fundamental/hard trade-offs?

(i) Simplicity vs. performance: Having just one master database that accepts writes simplifies write operations and update propagation, but it may add latency for users when they write, and it probably raises scaling issues as well.

(ii) Performance vs. correctness: During the replication lag, users reading from a slave datacenter may still be served the old data. A write returns as soon as the master has committed it, which makes writes more responsive. This is in contrast to a scheme that completes a write only after all databases have committed the value. The scheme suits Facebook because its workload is probably not write-intensive.
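A little arithmetic makes trade-off (ii) concrete. The sketch below is a simplified model, not a measurement: the function names are hypothetical and the millisecond figures are made up purely for illustration.

```python
def async_write_latency_ms(master_commit_ms, replica_delays_ms):
    """Write returns once the master alone has committed."""
    return master_commit_ms


def sync_write_latency_ms(master_commit_ms, replica_delays_ms):
    """Write returns only after the slowest replica has also committed."""
    return master_commit_ms + max(replica_delays_ms, default=0)


def stale_window_ms(replica_delays_ms):
    """Longest time a replica can still serve the old value after an async write."""
    return max(replica_delays_ms, default=0)


master_commit_ms = 5          # made-up number
replica_delays_ms = [70, 90]  # made-up numbers, one per remote datacenter

async_ms = async_write_latency_ms(master_commit_ms, replica_delays_ms)
sync_ms = sync_write_latency_ms(master_commit_ms, replica_delays_ms)
window_ms = stale_window_ms(replica_delays_ms)
print(async_ms, sync_ms, window_ms)  # 5 95 90
```

Under these assumed numbers, the asynchronous scheme makes the write much faster but leaves a window in which remote readers can see the old value.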

 

5.       Do you think the work will be influential in 10 years? Why or why not?

Even if this exact design is not, the principles of CAP will remain influential in the rapidly growing world of web operations.

