
Parallelizing Semi-Supervised Learning Algorithms with MapReduce

Parallelizing Semi-Supervised Learning Algorithms with MapReduce. Nick Gauthier. Advisor: Dr. Rakesh Verma. ReDAS Lab, Computer Science. Motivation: Semi-supervised learning (SSL) algorithms are slow when working with large-scale data.


  1. Parallelizing Semi-Supervised Learning Algorithms with MapReduce • Nick Gauthier • Advisor: Dr. Rakesh Verma • ReDAS Lab, Computer Science

  2. Motivation • Semi-supervised learning (SSL) algorithms are slow when working with large-scale data. • Some SSL methods have already been parallelized, but not all.

  3. Goal • Parallelize SSL algorithms using MapReduce. • Learn the research paradigm.

  4. Objectives • Introduce an efficient and significantly faster way to work with large-scale data for some SSL methods by parallelizing them within the MapReduce framework.

  5. Expected Impact • Parallelize semi-supervised algorithms that have not yet been parallelized. • Improve the runtime and efficiency of a semi-supervised algorithm by removing bottlenecks.

  6. Deliverables • Report • Poster presentation • Documentation of the process • Potentially software

  7. Methods: Objective 1 • Study the algorithms • Write pseudocode for the typical semi-supervised method • Convert the typical semi-supervised pseudocode to MapReduce pseudocode

  8. Methods: Objective 1 • Implement the MapReduce pseudocode in Python 3, with four components: • Mapper • Combiner • Reducer • Driver program
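To make the four components on this slide concrete, here is a minimal in-process sketch of how a mapper, combiner, reducer, and driver could fit together in Python 3. It is not the project's implementation: the function names, the `(label, value)` record layout, and the per-class mean task are all assumptions chosen for illustration, and the driver simulates the shuffle step in memory rather than running on a cluster.

```python
from collections import defaultdict


def mapper(record):
    """Emit (label, (value, 1)) for one labeled record (hypothetical layout)."""
    label, value = record
    yield label, (value, 1)


def combiner(pairs):
    """Pre-aggregate one split's mapper output into a partial sum and count per key."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, (value, count) in pairs:
        partial[key][0] += value
        partial[key][1] += count
    for key, (total, count) in partial.items():
        yield key, (total, count)


def reducer(key, values):
    """Merge the partial sums and counts for one key into a mean."""
    total = sum(v for v, _ in values)
    count = sum(c for _, c in values)
    return key, total / count


def run_job(records, num_splits=2):
    """Driver: split the input, map and combine per split, shuffle by key, reduce."""
    splits = [records[i::num_splits] for i in range(num_splits)]
    shuffled = defaultdict(list)
    for split in splits:
        mapped = (pair for record in split for pair in mapper(record))
        for key, value in combiner(mapped):
            shuffled[key].append(value)
    return dict(reducer(key, values) for key, values in shuffled.items())


data = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("a", 5.0)]
class_means = run_job(data)
```

The combiner emits sums and counts rather than means so that partial results from different splits can be merged without bias; the result does not depend on how the input is split.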

  9. Methods: Objective 1 • Test, debug, test again, etc. until the bugs are worked out. • Repeat testing until conclusion is reached.

  10. Methods: Objective 1 • Example: Semi-Supervised Expectation Maximization (SS-EM) • Define parameters • E-step: assign expected labels • M-step: calculate the probability of the newly assigned labels • Repeat the E-step and M-step until convergence is reached.
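The SS-EM loop on this slide can be sketched sequentially as follows. This is a textbook-style formulation under stated assumptions, not necessarily the variant used in the project: the data are one-dimensional, each class is a Gaussian with a fixed shared variance `var`, labeled points keep their labels, every class has at least one labeled example, and the M-step re-estimates class means and priors from the expected labels. The name `ss_em` and all parameters are hypothetical.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def ss_em(labeled, unlabeled, classes=(0, 1), var=1.0, tol=1e-6, max_iter=100):
    # Define parameters: initialize priors and means from the labeled data only.
    means, priors = {}, {}
    for c in classes:
        xs = [x for x, y in labeled if y == c]
        means[c] = sum(xs) / len(xs)
        priors[c] = len(xs) / len(labeled)

    n = len(labeled) + len(unlabeled)
    resp = []
    for _ in range(max_iter):
        # E-step: assign expected (soft) labels to the unlabeled points.
        resp = []
        for x in unlabeled:
            scores = {c: priors[c] * gaussian(x, means[c], var) for c in classes}
            total = sum(scores.values())
            resp.append({c: scores[c] / total for c in classes})

        # M-step: re-estimate parameters from labeled points (weight 1)
        # and unlabeled points (weight = expected label).
        new_means, new_priors = {}, {}
        for c in classes:
            weight = sum(1.0 for _, y in labeled if y == c)
            weight += sum(r[c] for r in resp)
            weighted_sum = sum(x for x, y in labeled if y == c)
            weighted_sum += sum(r[c] * x for r, x in zip(resp, unlabeled))
            new_means[c] = weighted_sum / weight
            new_priors[c] = weight / n

        # Convergence check: stop when the means stop moving.
        shift = max(abs(new_means[c] - means[c]) for c in classes)
        means, priors = new_means, new_priors
        if shift < tol:
            break
    return means, priors, resp


labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 1.5, 8.5, 9.5]
means, priors, resp = ss_em(labeled, unlabeled)
```

Each pass over the unlabeled data is independent per point, which is the part of the loop that lends itself to parallelization.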

  11. Methods: Objective 1 https://www.guru99.com/introduction-to-mapreduce.html
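As one possible way to connect the SS-EM example from slide 10 to the map and reduce phases, the sketch below expresses a single EM iteration as a map step that emits per-class sufficient statistics and a reduce step that sums them. It is an illustration only, under the same simplifying assumptions as a basic 1-D Gaussian model with a fixed shared variance; the names `e_step_mapper`, `m_step_reducer`, and `em_iteration`, and the use of `None` to mark unlabeled records, are hypothetical.

```python
import math
from collections import defaultdict
from functools import reduce


def e_step_mapper(record, means, priors, var=1.0):
    """Map one (x, label) record to (class, (weight, weight * x)) pairs.

    label is None for unlabeled points; labeled points keep weight 1.
    """
    x, label = record
    if label is not None:
        yield label, (1.0, x)
        return
    # The Gaussian normalizing constant cancels because var is shared.
    scores = {c: priors[c] * math.exp(-(x - means[c]) ** 2 / (2 * var))
              for c in means}
    total = sum(scores.values())
    for c, score in scores.items():
        weight = score / total
        yield c, (weight, weight * x)


def m_step_reducer(a, b):
    """Associative merge of two (weight, weighted sum) pairs."""
    return a[0] + b[0], a[1] + b[1]


def em_iteration(records, means, priors, var=1.0):
    """One EM pass: map every record, group by class, reduce, update parameters."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in e_step_mapper(record, means, priors, var):
            grouped[key].append(value)
    stats = {c: reduce(m_step_reducer, values) for c, values in grouped.items()}
    n = sum(weight for weight, _ in stats.values())
    new_means = {c: total / weight for c, (weight, total) in stats.items()}
    new_priors = {c: weight / n for c, (weight, _) in stats.items()}
    return new_means, new_priors


records = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1), (0.5, None), (9.5, None)]
new_means, new_priors = em_iteration(
    records, means={0: 0.5, 1: 9.5}, priors={0: 0.5, 1: 0.5}
)
```

Because the reducer is an associative sum, the same function could also serve as a combiner on each split; a driver would call `em_iteration` repeatedly until the means stop changing.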

  12. Results: Objective 1 • I am currently converting the semi-supervised pseudocode to MapReduce pseudocode.

  13. Remaining Work • Write and implement the MapReduce code in Python 3. • Test the code once it is implemented. • Create the report, poster, and documentation.

  14. Acknowledgements The REU project is sponsored by NSF under award NSF-1659755. Special thanks to the following UH offices for providing financial support to the project: Department of Computer Science; College of Natural Sciences and Mathematics; Dean of Graduate and Professional Studies; VP for Research; and the Provost's Office. The views and conclusions contained in this presentation are those of the author and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.
