BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Association for Computing Machinery - ECPv6.1.3//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Association for Computing Machinery
X-ORIGINAL-URL:http://www.sfbayacm.org
X-WR-CALDESC:Events for Association for Computing Machinery
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20190310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20191103T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190225T183000
DTEND;TZID=America/Los_Angeles:20190225T210000
DTSTAMP:20240418T233329
CREATED:20181202T184231Z
LAST-MODIFIED:20190224T195230Z
UID:909-1551119400-1551128400@www.sfbayacm.org
SUMMARY:Why Deep Learning Works: Implicit Self-Regularization in Deep Neural Networks
DESCRIPTION:6:30 arrive\, register\, pizza\, network7:00 ACM intro7:10 to 8:30-ish talk7:10 Live stream Click the link to set your PC up and get readyhttps://www.youtube.com/watch?v=2qF8TezRwS0 \nDescription/Abstract:Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs)\, including both production quality\, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization\, implicitly sculpting a more regularized energy or penalty landscape. In particular\, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models\, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT\, most notably its extension to Universality classes of Heavy-Tailed matrices\, and applying them to these empirical results\, we develop a theory to identify 5+1 Phases of Training\, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs\, this implicit self-regularization is like traditional Tikhonov regularization\, in that there appears to be a “size scale” separating signal from noise. For state-of-the-art DNNs\, however\, we identify a novel form of heavy-tailed self-regularization\, similar to the self-organization seen in the statistical physics of disordered systems. This implicit self-regularization can depend strongly on the many knobs of the training process. In particular\, by exploiting the generalization gap phenomena\, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. This demonstrates that—all else being equal—DNN optimization with larger batch sizes leads to less-well implicitly-regularized models\, and it provides an explanation for the generalization gap phenomena. Joint work with Charles Martin of Calculation Consulting\, Inc. \nBio: https://www.stat.berkeley.edu/~mmahoney/Michael W. Mahoney is at the UCB in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning\, including randomized matrix algorithms and randomized numerical linear algebra\, geometric network analysis tools for structure extraction in large informatics graphs\, scalable implicit regularization methods\, and applications in genetics\, astronomy\, medical imaging\, social network analysis\, and internet data analysis. He received him PhD from Yale University with a dissertation in computational statistical mechanics. He has worked and taught at Yale University in the Math department\, Yahoo Research\, and Stanford University in the Math department. Among other things\, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI)\, He was on the National Research Council’s Committee on the Analysis of Massive Data. He co-organized the Simons Institute’s fall 2013 program on the Theoretical Foundations of Big Data Analysis\, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is currently the lead PI for the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. He holds several patents for work done at Yahoo Research and as Lead Data Scientist for Vieu Labs\, Inc.\, a startup re-imagining consumer video for billions of users. More information is available at https://www.stat.berkeley.edu/~mmahoney/ \nLong version of the paper (upon which the talk is based):https://arxiv.org/abs/1810.01075 \nPrevious presentations of the subject:https://www.youtube.com/watch?v=_Ni5UDrVwYU&feature=youtu.be (Martin at LBNL\, June 2018)
URL:http://www.sfbayacm.org/event/why-deep-learning-works-implicit-self-regularization-in-deep-neural-networks/
LOCATION:Walmart Labs\, 860 W California Ave. Fireside Café\, Sunnyvale\, CA\, US
CATEGORIES:featured
ATTACH;FMTTYPE=image/png:http://www.sfbayacm.org/humphrey/wp-content/uploads/2018/12/909_image_highres_475806850-2.png
END:VEVENT
END:VCALENDAR