CnC for Tuning Hints on OCR: Nick Vrvilo, Rice University



SLIDE 1

CnC for Tuning Hints on OCR

Nick Vrvilo, Rice University
The 7th Annual CnC Workshop
September 8, 2015

SLIDE 2

Acknowledgements

This work was done as part of my internship with the OCR team, part of Intel Federal, LLC at Jones Farm (Hillsboro, OR).

Mentors (Intel): Josh Fryman and Romain Cledat

Habanero Team (Rice): Vivek Sarkar, Kath Knobe, Zoran Budimlić, and Sanjay Chatterjee

SLIDE 3

Objective

Demonstrate the effectiveness of OCR tuning hints by way of code generation from a higher-level programming model (CnC).

SLIDE 4

Objective

[Diagram with components labeled: CnC Graph, CnC App Code, CnC-OCR Scaffolding, OCR Tunings, hints handler]

SLIDE 5

Open Community Runtime (OCR)*

OCR project goals:

  • Provide effective abstraction for diverse hardware
  • Typify future task-based execution models
  • Handle large-scale parallelism efficiently
  • Maintain a separation of concerns (application/scheduling/resources)

  • Open source (encourage collaboration)

* OCR ==> X-Stack Traleika Glacier project’s implementation

SLIDE 6

Outline

  • Introduction
  • OCR Hints API
  • CnC on OCR
  • Tuning Hints Implementation and Analysis

SLIDE 7

CnC / OCR Concept Mapping

Concept                     | OCR construct            | CnC construct
Task classes (code)         | EDT template             | Step collection
Task instance               | EDT                      | Step instance
Data classes                | All DBs have type void*  | Item collection
Data instance               | Datablock                | Item instance
Unique instance identifier  | GUID                     | Tag (step tag / item key)
Dependence registration     | Event add dependence     | Item get
Dependence satisfaction     | Event satisfy            | Item put

(Keeping track of individual DBs’ types is the app programmer's responsibility.)

SLIDE 8

OCR Hints API: Example

    // Assume we have a template and a datablock
    ocrGuid_t edt;
    ocrEdtCreate(&edt, template, 0, NULL, 1, NULL,
                 EDT_PROP_NONE, NULL_GUID, NULL);
    { // Set an OCR hint
        ocrHint_t stepHints;
        ocrHintInit(&stepHints, OCR_HINT_EDT_T);
        ocrGetHint(edt, &stepHints);
        ocrSetHintValue(&stepHints, OCR_HINT_EDT_PRIORITY, 100);
        ocrSetHint(edt, &stepHints);
    }
    ocrAddDependence(datablock, edt, 0, DB_DEFAULT_MODE);

SLIDE 9

OCR Hints API:

Pros

  • Generic
  • Conceptually decoupled
  • Light-weight

Cons

  • Verbose
  • Placed in app source code
  • Limited expressiveness

SLIDE 10

Outline

  • Introduction
  • OCR Hints API
  • CnC on OCR
  • Tuning Hints Implementation and Analysis

SLIDE 11

CnC-OCR Developer Workflow

  1. Write graph spec
  2. Run translator tool (produces skeleton project)
  3. Flesh out skeleton code
  4. Run program (functionality check); debug and repeat as needed
  5. Write tuning spec(s)
  6. Re-run translator tool (updates scaffolding code)
  7. Re-run program (performance check); fine-tune and repeat as needed

SLIDE 12

CnC-OCR + Tuning

[Diagram with components labeled: CnC Graph, CnC App Code, CnC-OCR Scaffolding, OCR Tunings, hints handler]

SLIDE 13

Separation of Concerns in CnC

  • Graph specification can be written without implementation details
  • Step function implementations written without knowledge of the external graph (only its own inputs and outputs)
  • Tuning specification given in a separate file
  • Easy to mix in different tunings for performance testing
  • Try combinations of tunings until you find the ideal configuration

SLIDE 14

Outline

  • Introduction
  • OCR Hints API
  • CnC on OCR
  • Tuning Hints Implementation and Analysis

SLIDE 15

Tuning Hints Overview

  1. Step / item distribution
  2. Step affinity with input
  3. Step priority
  4. Scheduler throttling
  5. Partial item requests

SLIDE 16

Hint #1: Step / Item Distribution Functions

  • What?

Declare a function for mapping individual step / item instances from a collection onto the set of OCR policy domains.

  • Why?

– Distributed OCR currently lacks advanced scheduling/placement heuristics.
– Need control of distribution for a reasonable baseline.

SLIDE 17

Smith-Waterman Sequence Alignment

  • Each input sequence length ~200k
  • Dynamic programming optimization on ~40-billion cell matrix

  • Tiles of 177x153 cells
  • Total of 1138x1322 tiles
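
As a quick consistency check on these figures, the tile dimensions and tile counts can be multiplied out. The sketch below is only illustrative: the function names are made up here, and it assumes the 177-cell tile side pairs with the 1138-tile dimension and the 153-cell side with the 1322-tile dimension, which the slide does not state.

```c
#include <stdint.h>

/* Cells along one matrix dimension: cells per tile times tile count. */
uint64_t cells_along(uint64_t tile_cells, uint64_t tile_count) {
    return tile_cells * tile_count;
}

/* Total cells in the tiled dynamic-programming matrix. */
uint64_t total_cells(uint64_t tile_h, uint64_t tile_w,
                     uint64_t tiles_h, uint64_t tiles_w) {
    return cells_along(tile_h, tiles_h) * cells_along(tile_w, tiles_w);
}
```

Under that pairing, cells_along(177, 1138) is 201,426 and cells_along(153, 1322) is 202,266, both close to the stated sequence length of ~200k, and total_cells(177, 153, 1138, 1322) is 40,741,631,316, in line with the ~40-billion figure.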

SLIDE 18

Smith-Waterman Specification

Graph Specification

    [ int above[] : i, j ];
    [ int left[] : i, j ];
    [ SeqData *data : () ];
    ( swStep: i, j )
        <- [ data: () ],
           [ above: i, j ] $when(i > 0),
           [ left: i, j ] $when(j > 0)
        -> [ below @ above: i+1, j ],
           [ right @ left: i, j+1 ],
           ( swStep: i+1, j ) $when(i+1 < #nth);

Tuning Specification

    [ above ]: { distfn: (i / 16) % $RANKS };
    [ left ]: { distfn: (i / 16) % $RANKS };
    ( swStep ): { distfn: (i / 16) % $RANKS };
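
The distfn expression in the tuning spec is ordinary integer arithmetic, so its effect is easy to check in isolation. The following is a minimal sketch in plain C, not the code the CnC-OCR translator emits; the names row_block_rank and ROWS_PER_BLOCK are hypothetical, and ranks stands in for $RANKS.

```c
#include <assert.h>

/* Rows per block; matches the "/ 16" in the tuning spec. */
enum { ROWS_PER_BLOCK = 16 };

/* Map tile row i to a rank: consecutive blocks of 16 rows go to
 * consecutive ranks, wrapping around after the last rank. */
int row_block_rank(int i, int ranks) {
    assert(i >= 0 && ranks > 0);
    return (i / ROWS_PER_BLOCK) % ranks;
}
```

With 4 ranks, rows 0 through 15 map to rank 0, rows 16 through 31 to rank 1, and row 64 wraps back to rank 0. Because the column index j does not appear in the expression, every tile in a given row lands on the same rank.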

SLIDE 19

Smith-Waterman Sequence Alignment

  • Each input sequence length ~200k
  • Dynamic programming optimization on ~40-billion cell matrix
  • Tiles of 177x153 cells
  • Total of 1138x1322 tiles
  • Default: CnC default distribution
  • Row-block: Rows in blocks of 16
  • 10 runs per configuration

[Figure: average execution time (seconds) versus node count (1, 2, 4, 8) for CnC-OCR Default, CnC-OCR Row-Block, and iCnC Row-Block. The values 115.40 and 141.49 appear as data labels.]
SLIDE 20

Hint #2: Step Affinity with Input Item

  • What?

Declare that a step instance be affinitized with one of its input items.

  • Why?

– OCR can use this affinity to improve scheduling heuristics.
– More expressive way to specify tunings like hint #1.

SLIDE 21

Smith-Waterman Specification

Graph Specification

    [ int above[] : i, j ];
    [ int left[] : i, j ];
    [ SeqData *data : () ];
    ( swStep: i, j )
        <- [ data: () ],
           [ above: i, j ] $when(i > 0),
           [ left: i, j ] $when(j > 0)
        -> [ below @ above: i+1, j ],
           [ right @ left: i, j+1 ],
           ( swStep: i+1, j ) $when(i+1 < #nth);

Tuning Specification

    [ above ]: { distfn: (i / 16) % $RANKS };
    [ left ]: { distfn: (i / 16) % $RANKS };
    ( swStep ): { placeWith: above };
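
One way to read placeWith is that the step no longer needs its own distribution function: it inherits the placement of the named input item. The sketch below illustrates that reading in plain C. It is an assumption about the semantics, not the generated scaffolding; the function names are hypothetical, and it simply evaluates the same expression for row 0, where the step has no above input.

```c
#include <assert.h>

/* Rank that owns item [above: i, j] under the distfn from the tuning spec. */
int above_item_rank(int i, int j, int ranks) {
    (void)j; /* the distribution function ignores the column index */
    assert(i >= 0 && ranks > 0);
    return (i / 16) % ranks;
}

/* Hypothetical lowering of `placeWith: above`: (swStep: i, j) reads
 * [above: i, j], so it runs wherever that item lives. */
int sw_step_rank(int i, int j, int ranks) {
    return above_item_rank(i, j, ranks);
}
```

Under this reading the step placement matches the explicit distfn version, but the tuning spec states the intent (run next to the data) rather than repeating the formula.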

SLIDE 22

Hint #3: Step Priority Weights

  • What?

Express a priority weight for a given CnC step, such that steps with heavier weights should execute earlier.

  • Why?

– Search problems: prioritize paths likely to find the answer sooner
– Enable concurrency: prefer task with high-demand output (many consumers)

SLIDE 23

N-Queens Puzzle

  • Board size: 13x13
  • Solutions possible: 73,712

[Figure: chessboard with queens placed]

SLIDE 24

N-Queens Specification

  • Graph:

    [ u64 solutions[4]: i ];
    ( placeQueen: row, board )
        -> ( placeQueen: row+1, board_prime ),
           [ solutions: ? ];

  • Tuning:

    ( placeQueen /* row, board */ ): { priority: row };

SLIDE 25

Implementation of Step Priority Weights

Description                        | Default Scheduler | Priority Scheduler | Location
Base data structure                | deque             | bin-heap           | utils/
Scheduler interface wrapper        | deque             | bin-heap           | scheduler-object/
Scheduler (aggregate) root object  | wst               | pr-wsh             | scheduler-object/
Scheduler heuristic behavior       | hc                | priority           | scheduler-heuristic/
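
To make the "bin-heap" base data structure concrete, here is a minimal sketch of a fixed-capacity binary max-heap keyed on a priority weight. It is a generic textbook heap written for illustration, not the OCR implementation; the Task and TaskHeap types, the capacity, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { HEAP_CAPACITY = 64 };

typedef struct { int priority; int id; } Task;
typedef struct { Task items[HEAP_CAPACITY]; size_t size; } TaskHeap;

static void swap_tasks(Task *a, Task *b) { Task t = *a; *a = *b; *b = t; }

/* Insert a task, sifting it up until its parent is at least as heavy. */
void heap_push(TaskHeap *h, Task t) {
    assert(h->size < HEAP_CAPACITY);
    size_t i = h->size++;
    h->items[i] = t;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->items[parent].priority >= h->items[i].priority) break;
        swap_tasks(&h->items[parent], &h->items[i]);
        i = parent;
    }
}

/* Remove and return the heaviest task, sifting the moved tail element down. */
Task heap_pop(TaskHeap *h) {
    assert(h->size > 0);
    Task top = h->items[0];
    h->items[0] = h->items[--h->size];
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, largest = i;
        if (left < h->size &&
            h->items[left].priority > h->items[largest].priority) largest = left;
        if (right < h->size &&
            h->items[right].priority > h->items[largest].priority) largest = right;
        if (largest == i) break;
        swap_tasks(&h->items[i], &h->items[largest]);
        i = largest;
    }
    return top;
}
```

If the priority field is set to the row index, as in the N-Queens tuning `priority: row`, popping always returns a task from the deepest row available, which approximates a depth-first search order.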

SLIDE 26

N-Queens Puzzle

  • Board size: 13x13
  • Solutions possible: 73,712
  • Solutions sought: 5,000
  • DEQ: Default work-stealing deque
  • DFS: Prioritize deep rows
  • BFS: Prioritize shallow rows
  • 50 runs per configuration

[Figure: bar chart of average execution time (seconds) for the DEQ, DFS, and BFS configurations]
SLIDE 27

Hint #4: Stoker Step (Scheduler Throttling)

  • What?

Annotate the work-creating steps (which we call stokers) so that the runtime can differentiate them from non-work-creating steps (which we call quenchers).

  • Why?

– If the scheduler has plenty of work to do, we can throttle by not running any more stoker steps for the time being.
– For work stealing, we can prioritize stoker steps for stealing, which mitigates the need for more stealing in the near term.
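
The throttling idea can be pictured as a simple predicate the scheduler consults before running a ready step. The sketch below is one possible policy under assumed names; should_run_now and the high_water threshold are hypothetical and are not taken from the OCR scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical throttle: defer stoker steps while the ready queue
 * already holds at least high_water tasks. Quenchers always run. */
bool should_run_now(bool is_stoker, size_t ready_tasks, size_t high_water) {
    assert(high_water > 0);
    if (!is_stoker) return true;
    return ready_tasks < high_water;
}
```

For example, with a threshold of 10, a stoker is held back while 1000 tasks are ready but runs once only 3 remain, so new work is created only when the queue is close to draining.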

SLIDE 28

Task-Bomb (Synthetic Example)

  • Root step creates Z=32 stoker steps
  • Each stoker creates
    – Y=100 quencher tasks
    – One stoker task
  • Recursion creates X=200 levels
  • Since the stoker is always created last, we would expect all of the stokers to run in a depth-first manner when using the standard work-stealing deque scheduler

    $initialize
      stoker(0,0)
        quencher(0,0,0) … quencher(0,0,Y)
        stoker(0,1)
          quencher(0,1,0) … quencher(0,1,Y)
          stoker(0,2)
            …
      …
      stoker(Z,0)
        quencher(Z,0,0) … quencher(Z,0,Y)
        stoker(Z,1)
          …

SLIDE 29

Task-Bomb CnC Graph Spec

    [ void *done: () ];
    ( stoker: i, j )
        -> ( quencher: i, j, $rangeTo(Y) ),
           ( stoker: i, j+1 ) $when(j<X);
    ( quencher: i, j, k )
        -> [ done: () ] $when(i==0 && j==X && k==Y);
    ( $initialize: () ) -> ( stoker: $range(Z), 0 );
    ( $finalize: () ) <- [ done: () ];
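
The graph spec above is enough to estimate how many step instances the benchmark creates. The sketch below does the arithmetic under two assumptions about the range operators that the slides do not spell out: $range(Z) yields 0 through Z-1, and $rangeTo(Y) yields 0 through Y inclusive. The TaskCounts type and function name are illustrative only.

```c
#include <stdint.h>

typedef struct { uint64_t stokers; uint64_t quenchers; } TaskCounts;

/* Count step instances created by the Task-Bomb graph, assuming
 * $range(Z) yields 0..Z-1 and $rangeTo(Y) yields 0..Y inclusive. */
TaskCounts task_bomb_counts(uint64_t x, uint64_t y, uint64_t z) {
    TaskCounts c;
    c.stokers = z * (x + 1);           /* j runs 0..X in each of Z chains */
    c.quenchers = c.stokers * (y + 1); /* k runs 0..Y under every stoker */
    return c;
}
```

With X=200, Y=100, and Z=32 this gives 6,432 stoker instances and 649,632 quencher instances under those assumptions, which gives a sense of how much work can pile up if stokers run ahead of quenchers.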

SLIDE 30

Task-Bomb CnC Tunings

Alternative 1: Stoker / Quencher

( stoker ): { stoker: true };

Alternative 2: Priorities

( stoker ): { priority: -1 };

SLIDE 31

Task-Bomb (Synthetic Example)

  • Root step creates Z=32 stoker steps
  • Each stoker creates
    – Y=100 quencher tasks
    – One stoker task
  • Recursion creates X=200 levels
  • Default scheduler dies (deque overflow)
  • Stoker hint allows for throttling
  • Similar performance via priorities

[Figure: bar chart of average execution time (seconds) for the Default, Priority, and Stoker configurations]
SLIDE 32

Hint #5: Partial Item Inputs

  • What?

Allow the programmer to specify that a step only accesses a sub-range of the bytes of an input item.

  • Why?

– For distributed memory, can transfer just the part that will be accessed when an item is an input to a remote step.

  • Work In Progress
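
Since this hint is still in progress, the following is only a sketch of the underlying idea, not a proposed design: describe the bytes a step will read as an offset and a length, then move just that slice. The ByteRange type and copy_partial_item function are hypothetical, and a local memcpy stands in for the remote transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the bytes a step will actually read. */
typedef struct { size_t offset; size_t length; } ByteRange;

/* Copy only the requested sub-range of an item into dst, standing in for
 * the transfer to a remote step. Returns the number of bytes copied. */
size_t copy_partial_item(void *dst, const void *item, size_t item_size,
                         ByteRange range) {
    assert(range.offset <= item_size);
    assert(range.length <= item_size - range.offset);
    memcpy(dst, (const unsigned char *)item + range.offset, range.length);
    return range.length;
}
```

For a 10-byte item and a range of offset 3, length 3, only 3 bytes are moved instead of 10; for large items consumed by remote steps, the saving would scale with the unused portion.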

SLIDE 33

Summary

  • OCR hints demonstrated via CnC tuning and code generation:
    – Step / item distribution
    – Scheduler throttling
    – Step affinity with input
    – Partial item requests
    – Step priority
  • CnC provides benefits of high-level paradigm to OCR:
    – Expressiveness
    – Separation of concerns
  • OCR design strategy makes it possible to add new hint handlers with a reasonable amount of effort
