MapReduce cookbook for machine learning
Here’s a paper from Stanford showing how to use MapReduce to scalably implement ten different machine learning algorithms!
Cloud: commodity or proprietary?
A few days ago Google announced its App Engine, which lets folks build applications that run in Google’s cloud. Amazon has for some time offered a number of services that let folks run applications in Amazon’s...
Hadoop Sorts a Petabyte
Woot! Owen and Arun have posted new Hadoop sort benchmark results. This is a great milestone for both throughput (a petabyte in ~16 hours) and latency (a terabyte in ~1 minute).
Some early Avro benchmarks
Avro is my current project. It’s a slightly different take on data serialization. Most data serialization systems, like Thrift and Protocol Buffers, rely on code generation, which can be awkward with...
Joining Cloudera
I will be leaving Yahoo! at the end of this month to join Cloudera. About five years ago I was working with Mike Cafarella on Apache Nutch, an open-source web-search engine. Initially we were able to...