DMSIG – Algorithmic and Statistical Perspectives on Large-Scale Data Analysis, 2/22/2010
Location: LinkedIn, 2027 Stierlin Ct., Mountain View, CA 94043. Notice: NEW MEETING LOCATION for 2010
Date: Monday, Feb 22, 2010, 6:30 pm. Notice: NEW MEETING DAY for 2010: the fourth Monday of each month!
Cost: Free and open to all who wish to attend, but membership is only $20/year. Anyone may join our mailing list at no charge, and receive announcements of upcoming events.
Speaker: Michael W. Mahoney, Stanford University
TITLE: “Algorithmic and Statistical Perspectives on Large-Scale Data Analysis”
DESCRIPTION:
Computer scientists and statisticians have historically adopted quite different views on data and thus on data analysis. In recent years, however, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are also useful in practice for solving large-scale scientific and Internet data analysis problems. After reviewing these two complementary perspectives on data, I will describe two recent examples of improved algorithms that used ideas from both areas in novel ways. The first example has to do with improved methods for structure identification from large-scale DNA SNP data, a problem which can be viewed as trying to find good columns or features from a large data matrix. The second example has to do with selecting good clusters or communities from a data graph, or demonstrating that there are none, a problem that has wide application in the analysis of social and information networks. Understanding how statistical ideas are useful for obtaining improved algorithms in these two applications may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale scientific and Internet data analysis problems more generally.
SPEAKER BIOGRAPHY
Dr. Mahoney is currently at Stanford University. His research interests focus on theoretical and applied aspects of algorithms for large-scale data problems in scientific and Internet applications. Currently, he is working on geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. In the past, he has worked on randomized matrix algorithms and applications in genetics and medical imaging. He has been a faculty member at Yale University and a researcher at Yahoo Research, and his PhD was in computational statistical mechanics at Yale University. See also http://cs.stanford.edu/people/mmahoney/
He is also involved in running the MMDS 2010 meeting on June 15-18, 2010. Details will be posted soon at http://mmds.stanford.edu/; in the meantime, see the details of the prior year’s Workshop on Algorithms for Modern Massive Data Sets.

