
CnC for Tuning Hints on OCR
Nick Vrvilo, Rice University
The 7th Annual CnC Workshop, September 8, 2015

Acknowledgements
This work was done as part of my internship with the OCR team, part of Intel Federal, LLC at Jones Farm (Hillsboro, OR).
• Mentors (Intel): Josh Fryman and Romain Cledat
• Habanero Team (Rice): Vivek Sarkar, Kath Knobe, Zoran Budimlić, and Sanjay Chatterjee

Objective
Demonstrate the effectiveness of OCR tuning hints by way of code generation from a higher-level programming model (CnC).

Objective
[Figure: toolchain diagram with the labels CnC Graph, CnC Tunings, App Code, CnC-OCR, Scaffolding, hints, handler, and OCR]

Open Community Runtime (OCR)*
OCR project goals:
• Provide effective abstraction for diverse hardware
• Typify future task-based execution models
• Handle large-scale parallelism efficiently
• Maintain a separation of concerns (application / scheduling / resources)
• Open source (encourage collaboration)
* OCR here refers to the X-Stack Traleika Glacier project's implementation

Outline
• Introduction
• OCR Hints API
• CnC on OCR
• Tuning Hints Implementation and Analysis

CnC / OCR Concept Mapping
Concept | OCR construct | CnC construct
Task classes (code) | EDT template | Step collection
Task instance | EDT | Step instance
Data classes | All DBs have type void* (keeping track of individual DBs' types is the app programmer's responsibility) | Item collection
Data instance | Datablock | Item instance
Unique instance identifier | GUID | Tag (step tag / item key)
Dependence registration | Event add dependence | Item get
Dependence satisfaction | Event satisfy | Item put

OCR Hints API: Example

    // Assume we have a template and a datablock
    ocrGuid_t edt;
    ocrEdtCreate(&edt, template, 0, NULL, 1, NULL, EDT_PROP_NONE, NULL_GUID, NULL);
    {
        // Set an OCR hint
        ocrHint_t stepHints;
        ocrHintInit(&stepHints, OCR_HINT_EDT_T);
        ocrGetHint(edt, &stepHints);
        ocrSetHintValue(&stepHints, OCR_HINT_EDT_PRIORITY, 100);
        ocrSetHint(edt, &stepHints);
    }
    ocrAddDependence(datablock, edt, 0, DB_DEFAULT_MODE);

OCR Hints API: Pros and Cons
Pros:
• Generic
• Conceptually decoupled
• Light-weight
Cons:
• Verbose
• Placed in app source code
• Limited expressiveness

CnC-OCR Developer Workflow
1. Write graph spec
2. Run translator tool (produces skeleton project)
3. Flesh out skeleton code
4. Run program (functionality check); debug and repeat as needed
5. Write tuning spec(s)
6. Re-run translator tool (updates scaffolding code)
7. Re-run program (performance check); fine-tune and repeat as needed

CnC-OCR + Tuning
[Figure: toolchain diagram with the labels CnC Graph, CnC Tunings, App Code, CnC-OCR, Scaffolding, hints, handler, and OCR]

Separation of Concerns in CnC
• The graph specification can be written without implementation details
• Step function implementations are written without knowledge of the external graph (each step knows only its own inputs and outputs)
• The tuning specification is given in a separate file
• Easy to mix in different tunings for performance testing
• Try combinations of tunings until you find the ideal configuration

Tuning Hints Overview
1. Step / item distribution
2. Step affinity with input
3. Step priority
4. Scheduler throttling
5. Partial item requests

Hint #1: Step / Item Distribution Functions
• What? Declare a function for mapping individual step / item instances from a collection onto the set of OCR policy domains.
• Why?
  – Distributed OCR currently lacks advanced scheduling / placement heuristics.
  – Need control of distribution for a reasonable baseline.

Smith-Waterman Sequence Alignment
• Each input sequence length ~200k
• Dynamic programming optimization on ~40-billion cell matrix
• Tiles of 177x153 cells
• Total of 1138x1322 tiles

Smith-Waterman Specification

Graph Specification:

    [ int above[] : i, j ];
    [ int left[] : i, j ];
    [ SeqData *data : () ];
    ( swStep: i, j )
        <- [ data: () ],
           [ above: i, j ] $when(i > 0),
           [ left: i, j ] $when(j > 0)
        -> [ below @ above: i+1, j ],
           [ right @ left: i, j+1 ],
           ( swStep: i+1, j ) $when(i+1 < #nth);

Tuning Specification:

    [ above ]: {
        distfn: (i / 16) % $RANKS
    };
    [ left ]: {
        distfn: (i / 16) % $RANKS
    };
    ( swStep ): {
        distfn: (i / 16) % $RANKS
    };
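
The row-block distfn in the tuning specification can be read as ordinary integer arithmetic. The following is a minimal C sketch of that mapping, not code generated by CnC-OCR; the function name `row_block_rank` is hypothetical, and it assumes the `/` in the distfn is truncating integer division as in C.

```c
#include <assert.h>

/* Hypothetical helper mirroring the distfn "(i / 16) % $RANKS":
 * rows are grouped into blocks of 16, and the blocks are dealt
 * out to the ranks in round-robin order. */
int row_block_rank(int i, int ranks) {
    assert(i >= 0 && ranks > 0);
    return (i / 16) % ranks;
}
```

With 4 ranks, rows 0 to 15 map to rank 0, rows 16 to 31 to rank 1, and row 64 wraps back to rank 0. Because the same expression is used for `above`, `left`, and `swStep`, a step and the items in its row block land on the same rank.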

Smith-Waterman Sequence Alignment
• Each input sequence length ~200k
• Dynamic programming optimization on ~40-billion cell matrix
• Tiles of 177x153 cells
• Total of 1138x1322 tiles
• Default: CnC default distribution
• Row-block: Rows in blocks of 16
• 10 runs per configuration
[Bar chart: average execution time (seconds, axis 0 to 50) vs. node count (1, 2, 4, 8) for CnC-OCR Default, CnC-OCR Row-Block, and iCnC Row-Block; the values 115.40 and 141.49 are also labeled on the chart]

Hint #2: Step Affinity with Input Item
• What? Declare that a step instance be affinitized with one of its input items.
• Why?
  – OCR can use this affinity to improve scheduling heuristics.
  – More expressive way to specify tunings like hint #1.

Smith-Waterman Specification

Graph Specification:

    [ int above[] : i, j ];
    [ int left[] : i, j ];
    [ SeqData *data : () ];
    ( swStep: i, j )
        <- [ data: () ],
           [ above: i, j ] $when(i > 0),
           [ left: i, j ] $when(j > 0)
        -> [ below @ above: i+1, j ],
           [ right @ left: i, j+1 ],
           ( swStep: i+1, j ) $when(i+1 < #nth);

Tuning Specification:

    [ above ]: {
        distfn: (i / 16) % $RANKS
    };
    [ left ]: {
        distfn: (i / 16) % $RANKS
    };
    ( swStep ): {
        placeWith: above
    };

Hint #3: Step Priority Weights
• What? Express a priority weight for a given CnC step, such that steps with heavier weights should execute earlier.
• Why?
  – Search problems: prioritize paths likely to find the answer sooner
  – Enable concurrency: prefer tasks with high-demand output (many consumers)

N-Queens Puzzle
• Board size: 13x13
• Solutions possible: 73,712

N-Queens Specification
• Graph:

    [ u64 solutions[4]: i ];
    ( placeQueen: row, board )
        -> ( placeQueen: row+1, board_prime ),
           [ solutions: ? ];

• Tuning:

    ( placeQueen /* row, board */ ): {
        priority: row
    };

Implementation of Step Priority Weights
Description | Default Scheduler | Priority Scheduler | Location
Base data structure | deque | bin-heap | utils/
Scheduler interface wrapper | deque | bin-heap | scheduler-object/
Scheduler (aggregate) root object | wst | pr-wsh | scheduler-object/
Scheduler heuristic behavior | hc | priority | scheduler-heuristic/
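
To illustrate why swapping the deque for a binary heap changes execution order, here is a minimal, self-contained C sketch of a max-heap keyed on step priority. It is not the OCR scheduler's code: `Step`, `StepHeap`, `heap_push`, `heap_pop`, `dfs_priority`, and `bfs_priority` are hypothetical names, and using the negated row for a breadth-first order is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 64  /* fixed capacity keeps the sketch short */

typedef struct { int priority; int row; } Step;
typedef struct { Step items[HEAP_CAP]; size_t size; } StepHeap;

static void swap_steps(Step *a, Step *b) { Step t = *a; *a = *b; *b = t; }

/* Insert a step and sift it up until its parent has priority >= its own. */
void heap_push(StepHeap *h, Step s) {
    assert(h->size < HEAP_CAP);
    size_t i = h->size++;
    h->items[i] = s;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->items[parent].priority >= h->items[i].priority) break;
        swap_steps(&h->items[parent], &h->items[i]);
        i = parent;
    }
}

/* Remove and return the step with the heaviest priority weight. */
Step heap_pop(StepHeap *h) {
    assert(h->size > 0);
    Step top = h->items[0];
    h->items[0] = h->items[--h->size];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, best = i;
        if (l < h->size && h->items[l].priority > h->items[best].priority) best = l;
        if (r < h->size && h->items[r].priority > h->items[best].priority) best = r;
        if (best == i) break;
        swap_steps(&h->items[i], &h->items[best]);
        i = best;
    }
    return top;
}

/* "priority: row" favors deep rows (depth-first style). */
int dfs_priority(int row) { return row; }
/* Assumed counterpart favoring shallow rows (breadth-first style). */
int bfs_priority(int row) { return -row; }
```

Pushing steps for rows 1, 3, and 2 with `dfs_priority` and then popping yields rows 3, 2, 1; with `bfs_priority` the order is 1, 2, 3.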

N-Queens Puzzle
• Board size: 13x13
• Solutions possible: 73,712
• Solutions sought: 5,000
• DEQ: Default work-stealing deque
• DFS: Prioritize deep rows
• BFS: Prioritize shallow rows
• 50 runs per configuration
[Bar chart: average execution time (seconds, axis 0 to 4) for DEQ, DFS, and BFS]

Hint #4: Stoker Step (Scheduler Throttling)
• What? Annotate the work-creating steps (which we call stokers) so that the runtime can differentiate them from non-work-creating steps (which we call quenchers).
• Why?
  – If the scheduler has plenty of work to do, we can throttle by not running any more stoker steps for the time being.
  – For work stealing, we can prioritize stoker steps for stealing, which mitigates the need for more stealing in the near term.

Task-Bomb (Synthetic Example)
• Root step creates Z=32 stoker steps
• Each stoker creates
  – Y=100 quencher tasks
  – One stoker task
• Recursion creates X=200 levels
• Since the stoker is always created last, we would expect all of the stokers to run in a depth-first manner when using the standard work-stealing deque scheduler
[Diagram: task tree rooted at $initialize, with stoker chains stoker(0,0), stoker(0,1), stoker(0,2), … through stoker(Z,0), stoker(Z,1), …; each stoker(i,j) spawns quencher(i,j,0) … quencher(i,j,Y)]

Task-Bomb CnC Graph Spec

    [ void *done: () ];
    ( stoker: i, j )
        -> ( quencher: i, j, $rangeTo(Y) ),
           ( stoker: i, j+1 ) $when(j<X);
    ( quencher: i, j, k )
        -> [ done: () ] $when(i==0 && j==X && k==Y);
    ( $initialize: () )
        -> ( stoker: $range(Z), 0 );
    ( $finalize: () )
        <- [ done: () ];

Task-Bomb CnC Tunings

Alternative 1: Stoker / Quencher

    ( stoker ): {
        stoker: true
    };

Alternative 2: Priorities

    ( stoker ): {
        priority: -1
    };
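
The effect of the `stoker: true` annotation can be sketched with a toy single-worker model. This is an illustrative C sketch under stated assumptions, not OCR's scheduler: `choose_next`, `peak_ready_quenchers`, and the `high_water` threshold are hypothetical, each stoker is assumed to spawn exactly `y` quenchers, and each of the `z` chains runs `x + 1` stokers (following `$when(j<X)` in the graph spec).

```c
#include <assert.h>
#include <limits.h>

typedef enum { RUN_NONE, RUN_QUENCHER, RUN_STOKER } Choice;

/* Pick the next step type. When at least high_water quenchers are
 * ready, stokers are held back (throttled) so no new work is created. */
Choice choose_next(long ready_stokers, long ready_quenchers, long high_water) {
    assert(ready_stokers >= 0 && ready_quenchers >= 0);
    if (ready_quenchers > 0 && ready_quenchers >= high_water) return RUN_QUENCHER;
    if (ready_stokers > 0) return RUN_STOKER;
    if (ready_quenchers > 0) return RUN_QUENCHER;
    return RUN_NONE;
}

/* Simulate the task bomb on one worker and return the peak number of
 * ready quenchers. Chains are advanced one at a time for simplicity. */
long peak_ready_quenchers(int z, int x, int y, long high_water) {
    long ready_stokers = z;      /* one ready stoker per unfinished chain */
    long ready_quenchers = 0;
    long peak = 0;
    int level = 0;               /* level j of the chain being advanced */
    for (;;) {
        Choice c = choose_next(ready_stokers, ready_quenchers, high_water);
        if (c == RUN_NONE) break;
        if (c == RUN_QUENCHER) { ready_quenchers--; continue; }
        ready_quenchers += y;    /* the stoker spawns y quenchers */
        if (ready_quenchers > peak) peak = ready_quenchers;
        if (level < x) {
            level++;             /* and its successor stoker(i, j+1) */
        } else {
            level = 0;           /* chain finished; move to the next one */
            ready_stokers--;
        }
    }
    return peak;
}
```

Passing `LONG_MAX` as `high_water` disables throttling, so every stoker runs before any quencher and the ready queue grows to `z * (x + 1) * y` entries. A small threshold keeps the peak near `y`, which is the kind of bound that would avoid overflowing a fixed-size deque.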

Task-Bomb (Synthetic Example)
• Root step creates Z=32 stoker steps
• Each stoker creates
  – Y=100 quencher tasks
  – One stoker task
• Recursion creates X=200 levels
• Default scheduler dies (deque overflow) ☠
• Stoker hint allows for throttling
• Similar performance via priorities
[Bar chart: average execution time (seconds, axis 0 to 4) for Default, Priority, and Stoker]

Hint #5: Partial Item Inputs
• What? Allow the programmer to specify that a step only accesses a sub-range of the bytes of an input item.
• Why?
  – For distributed memory, can transfer just the part that will be accessed when an item is an input to a remote step.
• Work in progress
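
Since this hint is described as work in progress, the following C sketch only illustrates the idea of shipping a byte sub-range instead of a whole item. `ByteRange` and `pack_partial_item` are hypothetical names, not part of CnC-OCR or OCR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the bytes a step declares it will read. */
typedef struct { size_t offset; size_t length; } ByteRange;

/* Copy only the declared sub-range of an item into a send buffer and
 * return the number of bytes that would be transferred. */
size_t pack_partial_item(void *dst, const void *item, size_t item_size, ByteRange r) {
    /* The range must lie entirely inside the item. */
    assert(r.offset <= item_size && r.length <= item_size - r.offset);
    memcpy(dst, (const char *)item + r.offset, r.length);
    return r.length;
}
```

For an 8-byte item and the range `{2, 3}`, only 3 bytes are copied rather than all 8; the saving grows with the item size when a remote step touches just a small part of it.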
