
A Convenient Framework for Efficient Parallel Multipass Algorithms - PowerPoint PPT Presentation



  1. A Convenient Framework for Efficient Parallel Multipass Algorithms
     Markus Weimer
     Joint work with Sriram Rao and Martin Zinkevich

  2. Intro / Point of view taken
     • ML is data compression: from large training data to a small model
     • We typically iterate over the training data
     • The state shared between iterations is relatively small: O(model)
     • Many algorithms can be expressed as data-parallel loops with synchronization
     12/11/10

  3. In MapReduce
     [Diagram labels: Data, Pass, Result]
     Overhead per iteration (each pass):
     • Job setup
     • Data loading
     • Disk I/O

  4. Worker/Aggregator
     [Diagram labels: Initial Data, Final Result]
     Advantages:
     • Schedule once per job
     • Data stays in memory
     • P2P communication
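
The contrast between the MapReduce slide and the Worker/Aggregator slide can be made concrete with a back-of-the-envelope cost model. The sketch below is illustrative only: the function names and all cost figures are hypothetical and do not come from the talk. It assumes that MapReduce pays job setup and data loading on every pass, while Worker/Aggregator pays them once per job.

```python
def mapreduce_time(passes, setup, load, compute):
    """Total time when every pass is its own job: all costs recur per pass."""
    return passes * (setup + load + compute)


def worker_aggregator_time(passes, setup, load, compute):
    """Total time when the job is scheduled once and data stays in memory."""
    return setup + load + passes * compute


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units (assumptions, not measurements).
    costs = dict(setup=30, load=60, compute=10)
    for passes in (1, 10, 100):
        mr = mapreduce_time(passes, **costs)
        wa = worker_aggregator_time(passes, **costs)
        print(f"{passes:>3} passes: MapReduce={mr}, Worker/Aggregator={wa}")
```

Under this model the two approaches cost the same for a single pass, and the gap widens with every additional pass because only the compute term recurs.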

  5. Worker
     1. Load data
     2. Iterate:
        1. Iterate over data
        2. Communicate state
        3. Wait for the input state of the next pass

  6. Worker
     1. Load data
     2. Iterate:
        1. Iterate over data (user-supplied function)
        2. Communicate state
        3. Wait for the input state of the next pass
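
The worker steps listed above can be sketched as a single loop. This is a minimal single-process illustration, not the framework's actual API: `run_worker`, `load_data`, `work`, `send_state`, and `receive_state` are hypothetical names, and the communication channel is passed in as plain callables.

```python
from typing import Callable, Iterable, List, TypeVar

State = TypeVar("State")
Record = TypeVar("Record")


def run_worker(
    load_data: Callable[[], Iterable[Record]],
    work: Callable[[List[Record], State], State],
    send_state: Callable[[State], None],
    receive_state: Callable[[], State],
    initial_state: State,
    num_passes: int,
) -> State:
    """Run the worker loop; all parameter names are illustrative assumptions."""
    data = list(load_data())        # 1. load data once; it stays in memory
    state = initial_state
    for _ in range(num_passes):     # 2. iterate
        local = work(data, state)   # 2.1 user-supplied pass over the data
        send_state(local)           # 2.2 communicate state
        state = receive_state()     # 2.3 wait for the input state of the next pass
    return state


if __name__ == "__main__":
    # Loopback "channel": whatever the worker sends comes straight back.
    mailbox = []
    final = run_worker(
        load_data=lambda: [1, 2, 3],
        work=lambda data, s: s + sum(data),
        send_state=mailbox.append,
        receive_state=mailbox.pop,
        initial_state=0,
        num_passes=3,
    )
    print(final)
```

In a real deployment the two channel callables would talk to the aggregator over the network; the loopback list only stands in for that here.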

  7. Aggregator
     • Receive state from the workers
     • Aggregate state
     • Send state to all workers
     Yahoo! Presentation, Confidential

  8. Aggregator
     • Receive state from the workers
     • Aggregate state (user supplied)
     • Send state to all workers
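
One round of the aggregator described above might look like the following sketch. The names `run_aggregator_round`, `receive_from_workers`, `aggregate`, and `broadcast` are hypothetical and chosen for illustration; only the three steps themselves come from the slide.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def run_aggregator_round(
    receive_from_workers: Callable[[], Sequence[State]],
    aggregate: Callable[[Sequence[State]], State],
    broadcast: Callable[[State], None],
) -> State:
    """Perform one synchronization round; parameter names are assumptions."""
    states = receive_from_workers()  # receive state from the workers
    combined = aggregate(states)     # user-supplied aggregation
    broadcast(combined)              # send state to all workers
    return combined


if __name__ == "__main__":
    sent = []
    result = run_aggregator_round(
        receive_from_workers=lambda: [1.0, 2.0, 3.0],
        aggregate=lambda states: sum(states) / len(states),
        broadcast=sent.append,
    )
    print(result, sent)
```

Keeping `aggregate` as a parameter mirrors the slide's point that only the aggregation logic is supplied by the user, while receiving and broadcasting belong to the framework.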

  9. Failure Handling in the Framework
     • Worker
       › Meh (SGD)
       › Restart on different machine (else)
     • Aggregator
       › Restart on different machine
       › Re-request data from workers
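
The failure policy on this slide can be written down as a small decision function. This sketch rests on an interpretation: it reads "Meh (SGD)" as meaning that a lost worker can simply be tolerated when the algorithm is SGD. The `Action` enum, the `on_failure` function, and the `tolerates_lost_worker` flag are hypothetical names, not part of the framework.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore the failure"
    RESTART_ELSEWHERE = "restart on a different machine"
    RESTART_AND_REREQUEST = (
        "restart on a different machine and re-request data from workers"
    )


def on_failure(role: str, tolerates_lost_worker: bool = False) -> Action:
    """Pick a recovery action; the tolerance flag is an assumed reading of 'Meh'."""
    if role == "aggregator":
        return Action.RESTART_AND_REREQUEST
    if role == "worker":
        if tolerates_lost_worker:  # e.g. SGD, under the interpretation above
            return Action.IGNORE
        return Action.RESTART_ELSEWHERE
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    print(on_failure("worker", tolerates_lost_worker=True).value)
    print(on_failure("worker").value)
    print(on_failure("aggregator").value)
```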

  10. Experiments: Parallel Stochastic Gradient Descent
      • Work(): Stochastic Gradient Descent pass
      • Aggregate(): Average Models
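
The Work()/Aggregate() pairing above can be sketched in a few lines of plain Python. The slide does not say which loss or data set was used, so this example assumes squared loss on a small synthetic linear-regression problem; the function names, the data, and the step size are illustrative assumptions, and the workers run one after another here rather than in parallel.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (features, target)


def work(shard: Sequence[Example], model: List[float], eta: float) -> List[float]:
    """One SGD pass over a worker's shard (squared loss is an assumption)."""
    w = list(model)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - eta * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(models: Sequence[List[float]]) -> List[float]:
    """Average the workers' models coordinate by coordinate."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def parallel_sgd(shards, dim: int, eta: float, passes: int) -> List[float]:
    model = [0.0] * dim
    for _ in range(passes):
        # In the framework each call to work() would run on its own machine.
        local_models = [work(shard, model, eta) for shard in shards]
        model = aggregate(local_models)
    return model


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [2.0, -1.0]  # hypothetical ground truth for the synthetic data
    data = []
    for _ in range(400):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, sum(t * xi for t, xi in zip(true_w, x))))
    shards = [data[i::4] for i in range(4)]  # four simulated workers
    print(parallel_sgd(shards, dim=2, eta=0.1, passes=20))
```

The only state exchanged between passes is the model vector, which is what keeps the communicated state small relative to the training data.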

  11. Does it work? – Objective over #Passes
      [Plot: objective versus number of passes. Series: Parallel, eta=0.8; Sequential, eta=0.1; Sequential, eta=0.8; Parallel, eta=6.4]

  12. Is it fast? – Time per pass (8 machines)
      • Sequential: 1.00
      • MapReduce: 0.45
      • W/A 10 Passes: 0.06
      • W/A 100 Passes: 0.03
      • W/A Limit: 0.03
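
The relative speedups implied by this chart can be worked out directly. The sketch below assumes that the five values on the slide pair with the five labels in the order they appear (Sequential 1.00, MapReduce 0.45, W/A 10 Passes 0.06, W/A 100 Passes 0.03, W/A Limit 0.03); the `speedup` helper is a hypothetical name introduced for illustration.

```python
# Time per pass as read off the slide, assuming values and labels pair in order.
time_per_pass = {
    "Sequential": 1.00,
    "MapReduce": 0.45,
    "W/A 10 Passes": 0.06,
    "W/A 100 Passes": 0.03,
    "W/A Limit": 0.03,
}


def speedup(baseline: str, system: str) -> float:
    """How many times faster `system` is per pass than `baseline`."""
    return time_per_pass[baseline] / time_per_pass[system]


if __name__ == "__main__":
    for name in time_per_pass:
        print(
            f"{name}: {speedup('Sequential', name):.1f}x vs Sequential, "
            f"{speedup('MapReduce', name):.1f}x vs MapReduce"
        )
```

Under that pairing, Worker/Aggregator in the limit is about 15 times faster per pass than MapReduce and about 33 times faster than the sequential baseline.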

  13. Markus Weimer, Yahoo! Labs, weimer@yahoo-inc.com
