DMSIG – Analytics at Petabyte Scale: Cloudera and Facebook on Hadoop and Hive, 1/25/2010
THIS EVENT HAS ALREADY OCCURRED.
THE PRESENTATIONS ARE AVAILABLE:
- Hadoop: Distributed Data Processing (Amr Awadallah)
- Facebook’s Petabyte Scale Data Warehouse (Ashish Thusoo)
Location: LinkedIn, 2027 Stierlin Ct., Mountain View, CA 94043. Notice: NEW MEETING LOCATION for 2010. This is NOT the main LinkedIn building (2029); it is their new building to the right (2027). For PARKING, you may have to go to the back of the new building (2027).
Date: Monday, January 25, 2010, 6:30 pm. Notice: NEW MEETING day of the month for 2010 – the fourth Monday of each month!
Cost: Free and open to all who wish to attend, but membership is only $20/year. Anyone may join our mailing list at no charge, and receive announcements of upcoming events.
Speakers: Amr Awadallah, Cloudera, and Ashish Thusoo, Facebook
TITLE 1: “Hadoop: Distributed Data Processing”
Hadoop is an open-source distributed platform designed to economically store and process data using clustered commodity hardware. Hadoop is Apache’s implementation of the MapReduce/GFS frameworks popularized by Google. In this talk we will demystify this powerful platform, and describe how it enables you to consolidate many different data storage and processing needs in an economically scalable cloud resource.
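
As a rough illustration of the MapReduce programming model mentioned above, here is a minimal single-process sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the word-count task are assumptions chosen only for illustration. On a real cluster the map and reduce steps would run in parallel across many machines, with the input stored in a distributed file system.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially in one process (hypothetical driver)."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across commodity machines without the steps needing to coordinate with each other.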

SPEAKER BIOGRAPHY
Dr. Amr Awadallah is Chief Technical Officer and Founder of Cloudera, Inc. Before Cloudera, he was vice president of product intelligence engineering at Yahoo! Inc., where he had worked since June 2000, after Yahoo acquired his first startup (VivaSmart). Dr. Awadallah received his PhD from Stanford University in 2007 and his BS/MS degrees from Cairo University in 1992 and 1995, respectively.
TITLE 2: “Facebook’s Petabyte Scale Data Warehouse Using Hive and Hadoop”
Hive is an open-source, petabyte-scale data warehousing framework built on top of Hadoop that enables scalable analytics on large data sets using SQL and some language extensions. Scalable analysis of large data sets has been core to the functions of a number of teams at Facebook – both engineering and non-engineering. This talk will highlight how Hive and Hadoop allow us at Facebook to offer a cheap, scalable and flexible infrastructure for different kinds of analysis. We will talk about the architecture, applications and capabilities of this infrastructure, which handles close to 8000 jobs a day and stores nearly 2.5PB of compressed data.
SPEAKER BIOGRAPHY
Ashish Thusoo has been with Facebook for the last couple of years and, in his most recent role, manages the Facebook data infrastructure team. He started the Hive project at Facebook along with Joydeep and serves as the project lead for Hive at Apache. He is also part of the Hadoop PMC at Apache and has presented Hive at a number of conferences, forums and panels. Ashish has deep expertise in data processing and parallel processing technologies, and in the infrastructure and applications built on them. In the past he worked at Oracle in the areas of Parallel Query Execution and XML Databases. At Oracle he built many core data warehousing and query processing features and was recognized as one of the leaders in the Parallel Execution team. These features are regularly used in most Oracle-based data warehouses.
Our schedule for meetings in 2010 is the 4th Monday of the Month. Specifically:
Mon 1/25
Mon 2/22
Mon 3/22
Mon 4/26
Mon 5/24
Mon 6/28
Mon 7/26
Mon 8/23
Mon 9/27
Mon 10/25
Mon 11/29
Thanks, Greg