ParallelFX : parallelism made easy - Jérémie Laval (Garuma)


SLIDE 1

ParallelFX : parallelism made easy

Jérémie Laval (Garuma) jeremie.laval@gmail.com http://garuma.wordpress.com

SLIDE 2

What ?

  • Ease the development of parallel (multi-threaded) applications that take advantage of multi-core processors.
  • Written by Microsoft and running only on .NET.
  • 3 main components :
      • A Task API similar in usage to the classic Thread
      • Parallel loops : for, foreach...
      • PLINQ (Parallel LINQ) : allows LINQ queries to run in parallel

The goal : create an open-source cross-platform implementation of ParallelFX running on Mono.
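
To give a feel for the three kinds of building block listed on this slide, here is a rough sketch using Java's standard library as a stand-in. None of it is ParallelFX API: the deck targets .NET and Mono, and the class and method names below are purely illustrative. It only shows the shape of a task, a parallel loop and a parallel query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Java analogues of the three building blocks; none of this is ParallelFX API.
class ThreeComponentsSketch {

    // 1. Task: start a unit of work and read its result later.
    static int taskExample() {
        CompletableFuture<Integer> task = CompletableFuture.supplyAsync(() -> 6 * 7);
        return task.join(); // blocks until the task has finished
    }

    // 2. Parallel loop: iterations are spread over the available cores.
    static long[] parallelLoopExample(int n) {
        long[] squares = new long[n];
        // Each iteration writes to its own slot, so no synchronization is needed.
        IntStream.range(0, n).parallel().forEach(i -> squares[i] = (long) i * i);
        return squares;
    }

    // 3. Parallel query: a filter/map pipeline evaluated in parallel.
    static List<Integer> parallelQueryExample(List<Integer> input) {
        return input.parallelStream()
                .filter(x -> x % 2 == 0)
                .map(x -> x * 10)
                .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        System.out.println(taskExample());
        System.out.println(Arrays.toString(parallelLoopExample(5)));
        System.out.println(parallelQueryExample(Arrays.asList(1, 2, 3, 4)));
    }
}
```

In all three cases the caller describes the work and leaves thread management to the runtime, which is the point the slide makes about ParallelFX.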

SLIDE 3

Why ?

  • Today's trend is to increase the number of cores in CPUs, not their individual speed.
  • Theoretically this should give a ×n performance boost (n being the number of cores), but in practice it does not.
  • Reason : applications are designed to be single-threaded.
  • Second reason : doing multi-threading "by hand" is usually hard and inefficient.

SLIDE 4

How ? (current design)

[Diagram] Mono application → ParallelFX library (scheduler) with a shared work pool. The scheduler manages several thread workers, each with a local work pool and its own OS thread. Arrow labels : Steal, Retrieve, Manage.
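
One possible reading of this diagram, as a minimal Java sketch. The lookup order (own pool, then shared pool, then stealing) is an assumption, not something the slide states, and every name below is hypothetical. JDK collections stand in for the real pools, and the workers are driven by hand instead of by OS threads so that the flow is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical model of the diagram; all names here are illustrative.
class SchedulerSketch<T> {
    // Shared work pool: work that no worker owns yet.
    final ConcurrentLinkedDeque<T> sharedPool = new ConcurrentLinkedDeque<>();
    final List<Worker> workers = new ArrayList<>();

    // In the diagram each worker sits on an OS thread; here it is driven by hand.
    class Worker {
        final ConcurrentLinkedDeque<T> localPool = new ConcurrentLinkedDeque<>();

        // Assumed lookup order: own pool, then shared pool, then steal.
        T nextWork() {
            T work = localPool.pollLast();               // own work, newest first
            if (work == null) {
                work = sharedPool.pollLast();            // "Retrieve"
            }
            if (work == null) {
                for (Worker other : workers) {           // "Steal"
                    if (other == this) continue;
                    work = other.localPool.pollFirst();  // victim's oldest item
                    if (work != null) break;
                }
            }
            return work; // null means no work was found anywhere
        }
    }

    // "Manage": the scheduler creates and tracks the workers.
    Worker addWorker() {
        Worker worker = new Worker();
        workers.add(worker);
        return worker;
    }

    public static void main(String[] args) {
        SchedulerSketch<String> scheduler = new SchedulerSketch<>();
        SchedulerSketch<String>.Worker a = scheduler.addWorker();
        SchedulerSketch<String>.Worker b = scheduler.addWorker();
        a.localPool.addLast("a1");
        a.localPool.addLast("a2");
        scheduler.sharedPool.addLast("shared");
        System.out.println(a.nextWork()); // a's own newest item
        System.out.println(b.nextWork()); // b has no local work, so it retrieves
        System.out.println(b.nextWork()); // shared pool is empty, so b steals
    }
}
```

An idle worker only touches another worker's pool as a last resort, which keeps contention low while every worker still has work of its own.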

SLIDE 5

Peeking into the "magic"

Shared work pool

  • Stack with a back-off layer.
  • Uses CAS for atomic operations, completely lock-free.
  • Back-off layer allows decent performance at high load.

Local work pool

  • Deque-like (3 operations : pushBottom, popBottom, popTop).
  • Worker uses pushBottom & popBottom (LIFO-style).
  • popTop used by stealers (FIFO-style).
  • Minimizes CAS and uses no lock.
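
The two pools can be sketched in Java as follows, under stated assumptions. BackoffStack is a plain CAS-based stack whose back-off is a simple randomized exponential pause; the slide does not say what its back-off layer does, so that policy is a guess. LocalDeque only shows the access pattern (owner at the bottom, stealers at the top) on top of the JDK's ConcurrentLinkedDeque; it is not the CAS-minimizing deque the slide refers to. All class and method names other than pushBottom, popBottom and popTop are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

class PoolsDemo {
    // Pushes from several threads at once, then counts what comes back out.
    static int concurrentPushThenCount(int threads, int perThread) {
        BackoffStack<Integer> stack = new BackoffStack<>();
        Thread[] pushers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pushers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) stack.push(i);
            });
            pushers[t].start();
        }
        for (Thread pusher : pushers) {
            try {
                pusher.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        int count = 0;
        while (stack.pop() != null) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(concurrentPushThenCount(4, 1000));
    }
}

// Shared pool sketch: lock-free stack, every update is one CAS on the head.
class BackoffStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        int failures = 0;
        while (true) {
            Node<T> oldHead = head.get();
            if (head.compareAndSet(oldHead, new Node<>(value, oldHead))) return;
            backOff(++failures); // another thread won the race: pause, then retry
        }
    }

    T pop() {
        int failures = 0;
        while (true) {
            Node<T> oldHead = head.get();
            if (oldHead == null) return null; // empty stack
            if (head.compareAndSet(oldHead, oldHead.next)) return oldHead.value;
            backOff(++failures);
        }
    }

    // Assumed policy: randomized exponential pause, capped at about 1 ms.
    private static void backOff(int failures) {
        long maxNanos = 1_000L << Math.min(failures, 10);
        LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(1, maxNanos + 1));
    }
}

// Local pool sketch: access pattern only, built on a JDK deque as a stand-in.
class LocalDeque<T> {
    private final ConcurrentLinkedDeque<T> items = new ConcurrentLinkedDeque<>();

    void pushBottom(T item) { items.addLast(item); }  // owner adds at the bottom
    T popBottom() { return items.pollLast(); }        // owner: newest first (LIFO)
    T popTop() { return items.pollFirst(); }          // stealers: oldest first (FIFO)
}
```

Because the owner and the stealers work at opposite ends of the deque, they only compete when it is nearly empty, which is why such a structure can get by with very few CAS operations.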
SLIDE 6

Example

  • A classic ray-tracer implementation written with no multithreading in mind.
  • Processing time on my computer : ~31s.
SLIDE 7

Example (next)

  • Using the current implementation of ParallelFX. The color mask shows which ThreadWorker did the work.
  • Processing time on my computer : ~18s, about 42% less time than before (roughly a 1.7× speed-up).

SLIDE 8

Thanks for your attention

Questions ?