Randomized algorithms for matrices and data

Date

Monday, March 26, 2012 - 6:30pm - 8:00pm

Venue

Intuit

Intuit
2600 Casey Ave MTV-09 Innovation and Invention
Mountain View, CA 94043
Speaker: 
Michael W. Mahoney

Event Details

Randomized algorithms for very large matrix problems (such as matrixmultiplication, least-squares regression, the Singular ValueDecomposition, etc.) have received a great deal of attention in recentyears.  Much of this work was motivated by problems in large-scaledata analysis; this approach provides a novel paradigm andcomplementary perspective to traditional numerical linear algebraapproaches to matrix computations; and the success of this line ofwork opens the possibility of performing matrix-based computationswith truly massive data sets.  Originating within theoretical computerscience, this work was subsequently extended and applied in importantways by researchers from numerical linear algebra, statistics, appliedmathematics, data analysis, and machine learning, as well as domainscientists.In this talk, we will provide an overview of this approach, with anemphasis on a few simple core ideas that underlie not only recenttheoretical advances but also the usefulness of these tools inlarge-scale data analysis applications.  Crucial in this context isthe connection with the concept of statistical leverage.Historically, this notion, and in particular the diagonal elements ofthe so-called hat matrix, has been used in regression diagnostics toidentify errors and outliers.  Recently, however, the connection withstatistical leverage has proved crucial in the development of improvedmatrix algorithms that come with worst-case guarantees, that areamenable to high-quality numerical implementation, and that are alsouseful to domain scientists.  These developments, how to approximatevery precisely the statistical leverage scores in time qualitativelyfaster than the usual naive method, and an example of how these ideascan be applied in large-scale distributed and parallel computationalenvironments will all be described.

Speaker Bio

Michael Mahoney is at Stanford University.  His research interestscenter around algorithms for very large-scale statistical dataanalysis, including both theoretical and applied aspects of problemsin scientific and Internet domains.  His current resear ch interestsinclude geometric network analysis; developing approximate computationand regularization methods for large informatics graphs; andapplications to community detection, clustering, and informationdynamics in large social and information networks.  He has also workedon randomized matrix algorithms and their applications to genetics,medical imaging, and Internet problems.  He has been a faculty memberat Yale University and a researcher at Yahoo, and his PhD was iscomputational statistical mechanics at Yale University.

Event page provided by ACM