Monday, April 20, 2009

WheelFS

1.       What is the problem? Is the problem real?

Applications have varying requirements for consistency, replication, availability, and so on, and would often like control over these settings. WheelFS is a distributed file system built to that end. This is a very practical problem, since different systems want to turn these knobs differently.

2.       What is the solution's main idea (nugget)?

The key contribution of this paper is a set of cues with which applications can tell the distributed file system which properties they prefer. These cues give applications control over the trade-offs among consistency, availability, and data placement.
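As I understand it, the cues are embedded directly in pathnames, so an ordinary application can opt into relaxed semantics just by choosing where it reads and writes. Below is a minimal sketch of that idea in Python; the /wfs mount point and the exact cue spellings (.EventualConsistency, .MaxTime=, .HotSpot) are assumptions on my part rather than a precise transcription of the paper.

```python
import os

# Illustrative only: WheelFS exposes its cues as extra path components, so an
# unmodified POSIX application chooses its semantics by choosing a path. The
# /wfs mount point and these cue spellings are assumptions for this sketch.
CACHE_DIR = "/wfs/cache/.EventualConsistency/.MaxTime=1000/.HotSpot"

def put(name, data):
    """Write a cached object for which relaxed consistency is acceptable."""
    with open(os.path.join(CACHE_DIR, name), "wb") as f:
        f.write(data)

def get(name):
    """Read it back, tolerating stale data and bounding the operation's delay."""
    try:
        with open(os.path.join(CACHE_DIR, name), "rb") as f:
            return f.read()
    except IOError:
        return None  # treat a timeout or missing file as a cache miss
```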

3.       Why is solution different from previous work?

The solution differs from previous work in that WheelFS exposes explicit, high-level knobs for controlling how the system behaves, rather than baking a single policy into the file system. That said, I found it hard to pin down the contrast with prior work precisely.

4.       Do you think the work will be influential in 10 years? Why or why not?

Yes, I definitely think these principles will be influential. To be fair, though, many of these “cues” have already been deployed in specialized settings, and it will be interesting to see how people react to this “generic” architecture. In my opinion, the preference will still be for a specialized solution that is tailored to the scenario at hand and is known to work.

 

Scaling Out – Facebook

1.       What is the problem? Is the problem real?

As web operations grow in demand and scale, a single datacenter is no longer sufficient to handle the load. Being in one physical location also makes it a single point of failure, whether transient (power or network failures) or permanent (an earthquake). The problem is how to maintain consistency among datacenters without compromising responsiveness or correctness. A very real problem!

 

2.       What is the solution's main idea (nugget)?

In a standard (web server, memcache, database) architecture, the issue with multiple datacenters is the replication lag when data is updated: the modified value has to propagate to all the databases, and stale data has to be removed from the caches. The solution addresses exactly that by adding extra information to the replication stream, so that the corresponding cache entries are invalidated when the replicated database update is applied. Also, since only the master database can accept writes, the layer 7 load balancers decide whether to send a user to the master or to a slave database depending on the URI (which indicates the kind of operation).
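A minimal sketch of that invalidation idea, assuming a made-up event format: each replicated write carries the memcache keys it makes stale, and a small consumer at the replica site deletes those keys so the next read refills the cache from the now-updated database. The field names and the dict standing in for memcache are illustrative, not Facebook's actual implementation.

```python
local_memcache = {}  # stand-in for the memcache tier at a replica datacenter

def apply_replication_event(event):
    """Replay one replicated write locally, then invalidate the cache keys it names."""
    apply_to_local_database(event["sql"])          # the normal replication apply step
    for key in event.get("stale_cache_keys", []):  # the extra info added to the stream
        local_memcache.pop(key, None)              # delete; the next read refills from the DB

def apply_to_local_database(sql):
    pass  # placeholder for the replica's ordinary statement replay
```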

 

3.       Why is solution different from previous work?

It is a practical implementation of well-known distributed systems techniques for maintaining consistency, and it had been working well in production for nine months at the time of writing.

 

4.       Does the paper (or do you) identify any fundamental/hard trade-offs?

(i) Simplicity vs. performance: Having just one master database that accepts writes simplifies write handling and update propagation, but it can add latency for users who are far from the master whenever they write, and it probably raises scaling issues as well (see the routing sketch after this list).

(ii) Performance vs. correctness: During the replication lag, users may still be directed to old data. A write returns as soon as the master has applied it, which makes the system more responsive; this contrasts with a scheme that completes a write only after all databases have committed the value. The chosen scheme suits Facebook because the workload is probably not write-intensive.
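To make trade-off (i) concrete, here is a rough sketch of the layer 7 routing decision: anything that looks like a write goes to the master datacenter, everything else to the nearest replica. The URI prefixes are invented for illustration; the article does not spell out the actual rules.

```python
# Hypothetical write-indicating URI prefixes; the real rules are not given in the article.
WRITE_URI_PREFIXES = ("/ajax/updatestatus", "/ajax/addcomment", "/settings/save")

def pick_datacenter(uri, master_dc, nearest_replica_dc):
    """Route write-like requests to the master; serve reads from the nearest replica."""
    if uri.startswith(WRITE_URI_PREFIXES):
        return master_dc           # extra latency for writers, but only one place accepts writes
    return nearest_replica_dc      # fast local reads, possibly slightly stale
```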

 

5.       Do you think the work will be influential in 10 years? Why or why not?

Even if not this exact design, the principles behind the CAP trade-off will be influential in the rapidly growing world of web operations.


Wednesday, April 15, 2009

Portable Cloud Computing, Google AppEngine

I will group the other articles for this class together, as they touch upon the same theme.

Two commercial options for using the cloud are available now – Amazon’s S3/EC2 and Google’s AppEngine. The former essentially provides raw machines and resources and lets users do whatever they want. The latter is a more structured approach that gives users a set of APIs for the cloud’s facilities (for example, the Google Query Language (GQL) for accessing the datastore) on which to host applications. While AppEngine is particularly attractive because applications automatically get all the nice scalability features, there is a warning that this has the potential to tie applications to the Google API for using clouds. For example, you cannot take an arbitrary EC2 service and run it on AppEngine, whereas the reverse is possible. While AppDrop does help AppEngine applications run unmodified on EC2, it comes at the cost of scalability. True, someone could still hack in all the database and scalability support, but that is an ugly and potentially dangerous way to move forward.
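A small example of the kind of code that creates this lock-in, using the 2009-era AppEngine Python datastore API; the model and query are my own illustration. It runs only where the google.appengine libraries exist – on AppEngine itself, or under a shim such as AppDrop on EC2.

```python
from google.appengine.ext import db

class Greeting(db.Model):
    author = db.StringProperty()
    content = db.StringProperty()
    date = db.DateTimeProperty(auto_now_add=True)

def latest_greetings():
    # GQL looks like SQL, but it queries Google's proprietary datastore, which
    # you cannot simply take with you to EC2 or to another provider.
    return db.GqlQuery("SELECT * FROM Greeting ORDER BY date DESC LIMIT 10").fetch(10)
```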

This calls for the community to take stock of the situation and push towards a standard and open cloud API, with open source implementations. If you are looking for an inspirational model, there is always LAMP! :-)

The Open Cloud Manifesto

1.       What is the problem? Is the problem real?

Cloud computing is in its infancy, and users of the cloud range from big corporations to small users who rely on it simply for hosting. This paper aims to start a discussion on the benefits and risks of cloud computing – a very real problem!

 2.       What is the solution's main idea (nugget)?

It is important for the community to come up with a set of open standards that enable innovation below the API, so that different organizations can deploy different techniques without tying applications to any particular interface. Applications should be able to “shift” seamlessly across clouds. Also, if clouds are to become a “service”, it is imperative that there be tight security guarantees as well as proper metering and monitoring systems.
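As a thought experiment, here is a minimal sketch of what “innovation below the API” could mean in practice: applications program against a small provider-neutral interface, and each cloud supplies its own implementation underneath. The interface is hypothetical – it is not any real or proposed standard.

```python
from abc import ABC, abstractmethod

class CloudBlobStore(ABC):
    """A tiny provider-neutral storage interface (hypothetical, for illustration)."""

    @abstractmethod
    def put(self, key, data):
        """Store a blob of bytes under a key."""

    @abstractmethod
    def get(self, key):
        """Return the blob for a key, or None if it is missing."""

class InMemoryStore(CloudBlobStore):
    """Toy backend; a real provider would map these calls to S3, the datastore, etc."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs.get(key)
```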

 3.       Does the paper (or do you) identify any fundamental/hard trade-offs?

While third-party cloud providers (even proprietary ones) greatly reduce the overhead for startups, in the long term they risk tying an application to the specific set of interfaces needed to use that cloud. Likewise, the guarantees that third-party clouds give against data leakage and other security problems are not strong, which makes the prospect of being tied to one particular cloud provider even shakier.

 4.       Do you think the work will be influential in 10 years? Why or why not?

I think this will be influential. The emergence of an open standard for cloud providers seems imperative, all the more so because deployment is progressing hand-in-hand with a reasonable revenue model. Also, the fact that this paper pushes for good monitoring and metering suggests that it is serious about commercial viability.

 5.       Others:

a. Third-party clouds shared by different corporations and users present a great opportunity to reduce wasted power.