Facebook uses Hadoop to gain insights about its members. It held the second biggest Hadoop database behind Yahoo! [4] By May 2010 Facebook had a bigger Hadoop cluster than Yahoo! [2] Google is not known to use Hadoop, because it is a competing product to GFS and Google's BigTable.

Apache Hadoop was voted top innovator in 2011. [9]

Facebook’s Hadoop Cluster

This is absolutely mind-blowing.

In 2008 Facebook had

  • 2500 cores per cluster
  • 1 PB compressed data per cluster (over 2 PB of uncompressed data)
  • hundreds of jobs running

[3]

In 2010 it had

  • 2000 machines with 32 GB RAM each
    • 800 with 16 cores
    • 1200 with 8 cores
  • mostly 12 TB of disk space per machine
    • some even with 24 TB
  • 21 PB data in a single HDFS cluster

[2] Amazing!

But Yahoo!’s cluster still held 14 PB of data. That is also a huge amount of data, and very hard to manage. [2]

Facebook’s Hadoop Cluster is its Data Warehouse

Every single day Facebook

  • adds 12 TB of compressed new data
  • scans 800 TB of compressed data,
  • manages 65 M files in its HDFS and
  • manages 30,000 clients in its HDFS NameNodes

It’s absolutely amazing, isn’t it? [2]

Consider that a thick book of about 500 pages is nearly one MB of text (1,000 KB). So the data added to Facebook every day equals 12 M books. With one book being 8 centimeters / 3 inches thick [7], that data would form a stack 960 km / 3.15 M feet high. That’s a stack of books longer than the distance from Berlin to Paris (which is “just” 91% of it). Every day.

In 6 weeks the stack of books would reach once around the globe, and within a year about 8.75 times. So within a year we could stack it almost all the way from the earth to the moon. Every single year. And still growing. Amazing.

Note: the real distances were taken from Wolfram Alpha. [8]
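
For those who like to check the math, here is a minimal Python sketch of the calculation. The daily data volume, book size and book thickness come from the figures above [2][7]; the reference distances (Berlin–Paris, the earth’s circumference, the mean earth–moon distance) are rounded values I assumed myself, not the exact Wolfram Alpha results used for the post.

# Back-of-the-envelope check of the book-stack comparison above.
TB = 10**12                      # bytes (decimal terabyte)
MB = 10**6                       # bytes (decimal megabyte)

daily_data = 12 * TB             # compressed data Facebook adds per day [2]
book_size = 1 * MB               # a ~500-page book is roughly 1 MB of text [7]
book_thickness_cm = 8            # one book is about 8 cm / 3 in thick [7]

books_per_day = daily_data / book_size                             # ~12 million books
stack_km_per_day = books_per_day * book_thickness_cm / 100 / 1000  # ~960 km

# Approximate reference distances (my own rounded assumptions):
berlin_paris_km = 878            # straight-line Berlin-Paris distance
earth_circumference_km = 40_075  # equatorial circumference of the earth
earth_moon_km = 384_400          # mean earth-moon distance

print(f"books per day:                    {books_per_day:,.0f}")
print(f"stack height per day:             {stack_km_per_day:,.0f} km")
print(f"Berlin-Paris vs. daily stack:     {berlin_paris_km / stack_km_per_day:.0%}")
print(f"days for one lap around the globe: {earth_circumference_km / stack_km_per_day:.1f}")
print(f"laps around the globe per year:   {365 * stack_km_per_day / earth_circumference_km:.2f}")
print(f"of the earth-moon distance per year: {365 * stack_km_per_day / earth_moon_km:.0%}")

Running it gives roughly 12 million books and 960 km per day, about 42 days (6 weeks) per lap around the globe, 8.74 laps per year, and about 91% of the mean earth–moon distance per year.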

[1] http://www.heise.de/open/meldung/Apache-Hadoop-ist-Innovator-Of-The-Year-1215111.html
[2] http://hadoopblog.blogspot.com/2010/05/facebook-has-worlds-largest-hadoop.html
[3] http://www.insidefacebook.com/2008/06/06/facebook-using-hadoop-for-large-scale-internal-analytics/
[4] http://highscalability.com/facebook-hadoop-and-hive
[5] http://www.guardian.co.uk/megas/winners-2011
[6] http://labs.google.com/papers/gfs.html
[7] http://www.wisegeek.com/how-much-text-is-in-a-kilobyte-or-megabyte.htm
[8] http://www.wolframalpha.com
[9] http://www.danielschulz.it/?p=388