IT Management and Cloud Blog


The Night the NYT Used Hadoop and EC2 to Convert 4TB’s

By John | March 24, 2008

Self-service, Prorated Super Computing Fun!

I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. (In fact, it worked so well that we ran it twice, since after we were done we noticed an error in the PDFs.)
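To put the "it could take some time" remark in perspective, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes the work scales linearly with the number of machines and that each instance processes articles at the same steady rate, neither of which the quote confirms; the function names are mine, purely for illustration.

```python
# Rough estimate from the quoted figures. Assumes perfectly linear scaling
# and identical per-instance throughput (both are simplifying assumptions).
ARTICLES = 11_000_000   # article PDFs generated
INSTANCES = 100         # EC2 instances used
HOURS = 24              # "just under 24 hours", rounded up


def per_instance_rate(articles, instances, hours):
    """Articles processed per instance per hour."""
    return articles / (instances * hours)


def hours_needed(articles, instances, rate):
    """Wall-clock hours for a given instance count at a fixed per-instance rate."""
    return articles / (instances * rate)


rate = per_instance_rate(ARTICLES, INSTANCES, HOURS)
four_machine_hours = hours_needed(ARTICLES, 4, rate)

print(f"{rate:.0f} articles per instance-hour")
print(f"{four_machine_hours:.0f} hours (~{four_machine_hours / 24:.0f} days) on 4 machines")
```

Under those assumptions the same job on four machines works out to about 600 hours, or roughly 25 days, which is why renting 100 instances for a day was the appealing option.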

Topics: 7core, amazon, aws, cloud computing, ec2 | 4 Comments »

4 Responses to “The Night the NYT Used Hadoop and EC2 to Convert 4TB’s”

  1. Matching the Customer With the Right Cloud (Part 1) | John M Willis ESM Blog Says:
    March 26th, 2008 at 6:51 am

    [...] is a great platform for proto-typing and/or on demand resources. I think the NYT story of converting 4TB in one evening is a great advertisement for S3/EC2 as a one-off solution platform. If you are a [...]

  2. Cloud Review | John M Willis ESM Blog Says:
    March 30th, 2008 at 7:12 am

    [...] The Night the NYT Used Hadoop and EC2 to Convert 4TB’s [...]

  3. Cloud Favorites | IT Management and Cloud Blog Says:
    September 26th, 2008 at 6:37 am

    [...] The Night the NYT Used Hadoop and EC2 to Convert 4TB’s [...]

