Today's large-scale web services operate in warehouse-sized datacenters and run on clusters of machines that are shared across many kinds of interactive and batch jobs.
In the first part of this talk, I'll describe a collection of techniques and practices for lowering response times (especially in the tail of the latency distribution) in large distributed systems whose components run on shared clusters of machines, where pieces of these systems are subject to interference by other tasks, and where unpredictable latency hiccups are the norm, not the exception.
In the second part of the talk, I'll highlight some recent work on using large-scale distributed systems for training deep neural networks. I'll discuss how we can utilize both model-level parallelism and data-level parallelism in order to train large models on large datasets more quickly. I'll also highlight how we have applied this work to a variety of problems in domains such as speech recognition, object recognition, and language modeling.
Jeff joined Google in 1999 and is currently a Google Fellow in Google's Knowledge Group. He has co-designed/implemented five generations of Google's crawling, indexing, and query serving systems, and co-designed/implemented major pieces of Google's initial advertising and AdSense for Content systems. He is also a co-designer and co-implementor of Google's distributed computing infrastructure, including the MapReduce, BigTable and Spanner systems, protocol buffers, LevelDB, systems infrastructure for statistical machine translation, and a variety of internal and external libraries and developer tools. He is currently working on large-scale distributed systems for machine learning. He is a Fellow of the ACM, a member of the U.S. National Academy of Engineering, and a recipient of the Mark Weiser Award and the ACM-Infosys Foundation Award in the Computing Sciences.
Event page provided by ACM