The Night the NYT Used Hadoop and EC2 to Convert 4TB’s
By John | March 24, 2008
Self-service, Prorated Super Computing Fun!
I then began some rough calculations and determined that if I used only four machines, it could take some time to generate all 11 million article PDFs. But thanks to the swell people at Amazon, I got access to a few more machines and churned through all 11 million articles in just under 24 hours using 100 EC2 instances, and generated another 1.5TB of data to store in S3. (In fact, it worked so well that we ran it twice, since after we were done we noticed an error in the PDFs.)
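For a sense of scale, the "rough calculations" can be approximated from the figures quoted above. This is only a back-of-the-envelope sketch, not the original author's arithmetic: it assumes the work scales linearly with the number of machines, treats 24 hours as an upper bound on the actual run time, and uses variable names that are purely illustrative.

```python
# Figures taken from the quoted passage.
ARTICLES = 11_000_000   # article PDFs generated
INSTANCES_USED = 100    # EC2 instances in the actual run
HOURS_TAKEN = 24        # "just under 24 hours", so this is an upper bound

# Assumption: perfectly linear scaling, so total work is instances * hours.
total_instance_hours = INSTANCES_USED * HOURS_TAKEN

# Average throughput of a single instance under that assumption.
articles_per_instance_hour = ARTICLES / total_instance_hours

# The same workload spread over only four machines.
four_machine_hours = total_instance_hours / 4
four_machine_days = four_machine_hours / 24

print(f"Total work: {total_instance_hours} instance-hours")
print(f"Per-instance rate: about {articles_per_instance_hour:.0f} articles/hour")
print(f"Four machines: about {four_machine_days:.0f} days")
```

Under these assumptions, four machines would need at most about 25 days for a single pass, which is why renting 100 instances for a day was the more attractive option.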
Topics: 7core, amazon, aws, cloud computing, ec2


March 26th, 2008 at 6:51 am
[...] is a great platform for proto-typing and/or on demand resources. I think the NYT story of converting 4TB in one evening is a great advertisement for S3/EC2 as a one-off solution platform. If you are a [...]
March 30th, 2008 at 7:12 am
[...] The Night the NYT Used Hadoop and EC2 to Convert 4TB’s [...]
September 26th, 2008 at 6:37 am
[...] The Night the NYT Used Hadoop and EC2 to Convert 4TB’s [...]