SLIDE 1

Parallelizing Semi-Supervised Learning Algorithms with MapReduce

Nick Gauthier Advisor: Dr. Rakesh Verma

COMPUTER SCIENCE ReDAS Lab

SLIDE 2

Motivation

  • Semi-supervised learning (SSL) algorithms are slow when working with large-scale data.
  • Some SSL methods have already been parallelized, but not all.

SLIDE 3

Goal

  • Parallelize SSL algorithms using MapReduce.
  • Learn the research paradigm.
SLIDE 4

Objectives

  • Introduce an efficient and significantly faster way to work with large-scale data for some SSL methods by parallelizing them within the MapReduce framework.

SLIDE 5

Expected Impact

  • Parallelize semi-supervised algorithms that have not yet been parallelized.
  • Improve the runtime and efficiency of a semi-supervised algorithm by removing bottlenecks.

SLIDE 6

Deliverables

  • Report
  • Poster presentation
  • Documentation of the process
  • Potentially software
SLIDE 7

Methods: Objective 1

  • Study the algorithms
  • Write pseudocode for the typical semi-supervised method
  • Convert the typical semi-supervised pseudocode to MapReduce pseudocode

SLIDE 8

Methods: Objective 1

  • Implement the MapReduce pseudocode in Python 3.

  • Mapper
  • Combiner
  • Reducer
  • Driver program.
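
The four components listed above can be sketched in plain Python 3. This is only an in-process illustration, not the project's implementation and not tied to Hadoop or any other MapReduce runtime. The function names (`mapper`, `combiner`, `reducer`, `group_by_key`, `driver`), the toy task (a per-label mean), and the sample data are all assumptions chosen for illustration.

```python
from collections import defaultdict


def mapper(record):
    """Emit (label, (value, 1)) for one (label, value) record."""
    label, value = record
    yield label, (value, 1)


def combiner(key, values):
    """Pre-aggregate partial (sum, count) pairs inside one input split."""
    total = sum(v for v, _ in values)
    count = sum(c for _, c in values)
    yield key, (total, count)


def reducer(key, values):
    """Merge partial (sum, count) pairs and emit the per-label mean."""
    total = sum(v for v, _ in values)
    count = sum(c for _, c in values)
    yield key, total / count


def group_by_key(pairs):
    """Stand-in for the shuffle phase: collect all values for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def driver(splits):
    """Run map and combine on each split, then shuffle and reduce."""
    combined = []
    for split in splits:
        mapped = [pair for record in split for pair in mapper(record)]
        for key, values in group_by_key(mapped).items():
            combined.extend(combiner(key, values))
    result = {}
    for key, values in group_by_key(combined).items():
        for out_key, out_value in reducer(key, values):
            result[out_key] = out_value
    return result


# Hypothetical toy input: two splits of (label, value) records.
splits = [
    [("a", 1.0), ("b", 4.0), ("a", 3.0)],
    [("a", 5.0), ("b", 6.0)],
]
means = driver(splits)
print(means)
```

The combiner emits (sum, count) pairs rather than means so that partial results from different splits can be merged correctly in the reducer.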
SLIDE 9

Methods: Objective 1

  • Test, debug, and retest until the bugs are worked out.
  • Repeat testing until a conclusion is reached.
SLIDE 10

Methods: Objective 1

  • Example: Semi-Supervised Expectation Maximization (SS-EM)
  • Define parameters
  • E-Step
  • Assign expected labels
  • M-Step
  • Calculate probability of newly assigned labels
  • Repeat E & M steps until convergence is reached.
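
A minimal sequential sketch of the SS-EM loop outlined above, before any parallelization. The slides do not specify the underlying model, so this assumes a one-dimensional Gaussian mixture with one component per class. The function names, the variance floor, the tolerance, and the toy data are hypothetical. The M-step here follows the usual EM formulation and re-estimates the class priors, means, and variances from the expected labels.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def ss_em(labeled, unlabeled, n_classes=2, tol=1e-6, max_iter=200):
    """labeled: list of (x, class_index); unlabeled: list of x.

    Assumes every class has at least one labeled point.
    """
    # Define parameters: initialise from the labeled data only.
    priors, means, variances = [], [], []
    for k in range(n_classes):
        xs = [x for x, y in labeled if y == k]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        priors.append(len(xs) / len(labeled))
        means.append(mean)
        variances.append(max(var, 1e-3))  # floor avoids a zero variance

    # Labeled points keep fixed one-hot labels throughout.
    fixed = [(x, [1.0 if k == y else 0.0 for k in range(n_classes)])
             for x, y in labeled]
    resp, prev_ll = [], -math.inf
    for _ in range(max_iter):
        # E-step: assign expected (soft) labels to the unlabeled points.
        resp, ll = [], 0.0
        for x in unlabeled:
            weighted = [priors[k] * gaussian_pdf(x, means[k], variances[k])
                        for k in range(n_classes)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
            ll += math.log(total)
        for x, y in labeled:
            ll += math.log(priors[y] * gaussian_pdf(x, means[y], variances[y]))

        # M-step: re-estimate parameters from fixed and expected labels.
        points = fixed + list(zip(unlabeled, resp))
        for k in range(n_classes):
            weight = sum(r[k] for _, r in points)
            means[k] = sum(r[k] * x for x, r in points) / weight
            var = sum(r[k] * (x - means[k]) ** 2 for x, r in points) / weight
            variances[k] = max(var, 1e-3)
            priors[k] = weight / len(points)

        # Repeat until the log-likelihood stops changing.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return priors, means, variances, resp


# Hypothetical toy data: two well-separated clusters.
labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 1.5, -0.5, 8.5, 9.5, 10.5]
priors, means, variances, resp = ss_em(labeled, unlabeled)
predicted = [max(range(2), key=lambda k: r[k]) for r in resp]
print(means, predicted)
```

In this formulation the sums in the M-step are per-point contributions added together, which is the part that would map naturally onto mapper, combiner, and reducer functions.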

SLIDE 11

Methods: Objective 1

https://www.guru99.com/introduction-to-mapreduce.html

SLIDE 12

Results: Objective 1

  • I am currently converting the semi-supervised pseudocode to MapReduce pseudocode.

SLIDE 13

Remaining Work

  • I have yet to write and implement the MapReduce code in Python 3.
  • Test the code once implemented.
  • Create the report, poster, and documentation.
SLIDE 14

Acknowledgements

The REU project is sponsored by NSF under award NSF-1659755. Special thanks to the following UH offices for providing financial support to the project: Department of Computer Science; College of Natural Sciences and Mathematics; Dean of Graduate and Professional Studies; VP for Research; and the Provost's Office. The views and conclusions contained in this presentation are those of the author and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.