By John | April 16, 2008
I was reading an interesting article this morning by Dries Buytaert, the founder of Drupal, and in my opinion he is a genius. In his article “Drupal in the Clouds”, he discusses opportunities for Drupal to move up the stack and take advantage of the new cloud technologies. One of the issues with a highly scalable Drupal delivery is the care, feeding, and tuning of the stack required to run Drupal (i.e., the LAMP stack). A well-designed, highly scalable Drupal implementation requires load balancers, multiple web servers, distributed memory caching, database clustering, and possibly even CDN integration with something like Akamai. A Drupal implementation can start to sound more like a data center, with all of its glorious care and feeding.
Drift in the clouds. Can cloud technology improve a Drupal implementation? In Dries’s opinion, services like Amazon’s EC2 will make server resource planning (i.e., capacity planning) easier, and cloud database vendors like Elastra will make cloud relational databases more scalable. Dries also talks about what CouchDB and Hadoop might provide for Drupal in the future. I am all in on this. I would love to see Mr. Buytaert and gang take on these tasks and make the Drupal experience even better than it already is.
However, in my opinion there are three specific areas where cloud technology can help with Drupal’s infrastructure and scalability issues: provisioning, elasticity, and the newer cloud data technologies.
The key to a well-designed cloud is its ability to provision resources rapidly. Products like Amazon’s EC2 can provision one to many servers in less than a minute. The idea is that the cloud provider has a set of predefined catalog images (i.e., stacks) that can be selected and started on demand. In the case of Drupal, most cloud providers have a catalog of LAMP stack images that a cloud customer can spin up within minutes. However, in most cases these stacks don’t include the Drupal core, let alone any selected Drupal modules. Also, the cloud customer still has to add all the secret sauce Dries describes above. Vendors like CohesiveFT and rPath make this process a little easier because they provide factory-like interfaces for customers to create and share their own pre-built stacks. However, this still leaves the customer with a single-image delivery model. Moving up the food chain of provisioning, vendors like RightScale and 3Tera are disrupting the base cloud technologies. These vendors provide multiple-server infrastructure images like the ones depicted in the following example.
3Tera and RightScale give a cloud customer the ability to pick from a catalog of pre-built infrastructures. In some of the testing I have done on 3Tera, I have been able to spin up an infrastructure like the one depicted in the previous picture in less than 10 minutes. 3Tera does not run on Amazon’s EC2. RightScale, which has a similar interface to 3Tera, today runs only on top of EC2. I have interviewed a few of these vendors and users, and I am finding that there is an ecosystem of players building “Best Practice” infrastructures based on these complex platforms. The idea is that a consultant who is really good at MySQL clustering can create a 3Tera or RightScale catalog image and charge a reasonable uplift for his or her efforts. The end result is that, as a Drupal consumer, I get to spin up a very complex infrastructure designed by a subject matter expert in a matter of minutes. When gentlemen like Buytaert and experts like the Tag1 consulting company he discusses start creating 3Tera and RightScale images, that’s when the game will start to change.
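To make the catalog idea a little more concrete, here is a minimal sketch of how a pre-built, multi-server infrastructure might be described as data. Everything in it is hypothetical: the class names, the image names, and the server counts are made up for illustration and are not taken from the 3Tera or RightScale catalogs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tier:
    name: str   # role in the stack, e.g. "web" or "database"
    image: str  # hypothetical catalog image name
    count: int  # number of servers to start for this tier


@dataclass
class InfrastructureTemplate:
    name: str
    author: str  # the subject matter expert who built the template
    tiers: List[Tier] = field(default_factory=list)

    def total_servers(self) -> int:
        """Count every server the template would start."""
        return sum(t.count for t in self.tiers)

    def scaled(self, tier_name: str, count: int) -> "InfrastructureTemplate":
        """Return a copy with one tier resized; the original is left intact."""
        if not any(t.name == tier_name for t in self.tiers):
            raise KeyError(tier_name)
        tiers = [
            Tier(t.name, t.image, count if t.name == tier_name else t.count)
            for t in self.tiers
        ]
        return InfrastructureTemplate(self.name, self.author, tiers)


# A made-up Drupal template covering the pieces listed earlier in the post.
drupal_stack = InfrastructureTemplate(
    name="drupal-ha",
    author="example consultant",
    tiers=[
        Tier("load-balancer", "lb-image", 2),
        Tier("web", "lamp-drupal-image", 4),
        Tier("cache", "memcached-image", 2),
        Tier("database", "mysql-cluster-image", 2),
    ],
)
```

The point of the sketch is that the whole infrastructure, not a single server image, becomes the unit a customer picks from the catalog, and a call such as `drupal_stack.scaled("web", 8)` would describe the same design with a larger web tier.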
Provisioning servers in minutes and infrastructures in less than 10 minutes, as cool as it sounds, is by itself really just a nice-to-have. Without elasticity (i.e., the ability to dynamically grow or shrink the number of servers), it is useless in a highly scalable infrastructure. A state-of-the-art cloud provider must provide elasticity. A Drupal customer needs to have an infrastructure in place that allows for bursts of activity. The classic example is the Slashdot effect; however, better examples are sites that promote special events. Recently a relatively unknown author was spotlighted on the Oprah show, and millions of visitors hit his two poor little servers. He probably lost a lot of potential customers that first time. He has since hooked up with 3Tera, and that mistake should not happen again. Flexiscale over in the UK told me that they got into the cloud business because one of their customers ran a video stream for a well-known UK rock band, and it took down their complete infrastructure. They decided they would never let that happen again and came up with a cloud a year before clouds were invented. The key to implementing elasticity is providing autonomic-like provisioning. RightScale uses CollectD to integrate performance metrics with its provisioning scripting language. A cloud customer can select predefined metrics to make on-demand provisioning decisions. Here is an example of one that can be used in the RightScale infrastructure for adding new servers.
Here is a sample alert template from RightScale.
high network tx activity
if interface/if_octets-eth0.tx > '50000000' for 30 min then escalate to 'critical'.
Escalations are the actions defined to be taken when an alert fires.
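The logic behind an alert like the one above can be sketched in a few lines. This is only an illustration of the "above a threshold for a sustained period" idea, not RightScale's implementation: the `AlertRule` class and the `should_escalate` function are hypothetical names, and the samples are assumed to be (unix time, value) pairs in time order, such as a CollectD-style collector might produce.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class AlertRule:
    """Hypothetical stand-in for one alert template; not RightScale's format."""
    metric: str        # e.g. "interface/if_octets-eth0.tx"
    threshold: float   # value the metric must exceed
    duration_s: int    # how long it must stay above the threshold, in seconds
    escalation: str    # name of the escalation to trigger


def should_escalate(rule: AlertRule, samples: Iterable[Tuple[int, float]]) -> bool:
    """Return True once the metric has stayed above the threshold for the
    rule's full duration. `samples` are (unix_time, value) pairs in time order."""
    breach_start = None
    for ts, value in samples:
        if value > rule.threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start >= rule.duration_s:
                return True
        else:
            breach_start = None  # any dip below the threshold resets the clock
    return False


# The sample alert from the post, expressed with the hypothetical rule class.
rule = AlertRule("interface/if_octets-eth0.tx", 50_000_000, 30 * 60, "critical")
```

Requiring the breach to last the full 30 minutes is what keeps a short traffic spike from starting new servers; only sustained load would trigger the 'critical' escalation.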
Here again, the disruptive opportunity comes when individuals like Dries Buytaert, companies like Zenoss, and cloud providers like RightScale start to join forces.
Actually, everything we have discussed thus far about provisioning and elasticity can be done today. The question for the future, in my opinion, is how products like Drupal can take advantage of things like CouchDB, Hadoop, and cloud table databases like Amazon’s SimpleDB. Will there be ways to marry enterprise search technologies like Hadoop to highly scalable taxonomy/category searches? There are already a lot of interesting activities happening around the Apache Lucene project with Nutch and Solr. It’s only a matter of time before some smart guy or girl gets it all glued together. In fact, Robert Douglass provides a great presentation on integrating Solr with Drupal to improve on Drupal’s basic search. A lot more work needs to be done in this area. Another area of opportunity might open up if someone starts thinking about porting some of the relational database tables to some of the new cloud database technologies like SimpleDB. Buytaert discusses the opportunities of using CouchDB with Drupal. In my opinion that would enhance the retrieval and management of objects (e.g., documents). However, things like Amazon’s SimpleDB, or possibly in the future Google’s Bigtable, might be used as replacements for some of today’s slower relational tables.
All in all, the future looks very interesting for Drupal, Dries, and the Clouds…
For more on clouds, see my Cloud Vendors A to Z post.