Effectively Scaling Deep Learning Frameworks
(To 40 GPUs and Beyond)
Welcome everyone! I’m excited to be here today and to have the opportunity to present some of the work we’ve been doing at SVAIL, the Baidu Silicon Valley AI Lab. This talk describes a change in the way we’ve been training most of our models over the past year or so, and some of the work it took to get there — namely, how we have managed to train our models on dozens of GPUs using common deep learning frameworks such as TensorFlow.