Multi-Threaded Composition of Finite-State Automata
Bryan Jurish, Berlin-Brandenburg Academy of Sciences, jurish@bbaw.de
Kay-Michael Würzner, University of Potsdam, wuerzner@uni-potsdam.de
FSMNLP 2013, St. Andrews, 17th July 2013
FSMNLP 2013 / Jurish | Würzner / Multi-threaded composition – p. 1/25

Overview
- The Big Idea
  - The Situation
  - The Approach
- Parallel Composition Algorithms
  - Master-Slave
  - Peer-to-Peer
- Experiments
  - Materials
  - Method
  - Results
- Concluding Remarks

— The Big Idea —

The Situation

No Free Lunch (anymore)
- CPU frequency growth stagnating
- Multiprocessor systems increasingly popular
  ⇝ "horizontal" scaling / multi-threading

(W)FST Composition: $T_3 = (T_1 \circ T_2)$
- Online: lexical lookup, Viterbi decoding, parsing, ...
- Offline: lexicon compilation, statistical modelling, ...
- no generic parallel implementation (that we know of)

Amdahl's Law: $S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$
- Not all algorithms scale well horizontally ($P \ll 1$)
- For FSTs, $P$ may depend on FST topology
  ⇝ not all FST compositions scale horizontally!
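
To make the scaling limit concrete, here is a minimal Python sketch that evaluates Amdahl's Law as stated on the slide for a few thread counts. The function name `amdahl_speedup` and the sample parallel fraction 0.9 are illustrative assumptions, not values from the talk.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup S(N) = 1 / ((1 - P) + P / N) for parallel fraction P on N threads."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction P must lie in [0, 1]")
    if n < 1:
        raise ValueError("thread count N must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # Hypothetical parallel fraction; the thread counts mirror a typical 16-core sweep.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 3))
```

As N grows, the speedup approaches 1 / (1 - P), so a small serial remainder caps the benefit of adding threads.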

The Basics

Definition
Given two ε-free FSTs $T_1 = \langle \Sigma, \Gamma, Q_1, q_{0_1}, F_1, E_1 \rangle$ and $T_2 = \langle \Gamma, \Delta, Q_2, q_{0_2}, F_2, E_2 \rangle$, $T_3 = (T_1 \circ T_2)$ is itself an FST with:

$T_3 = \langle \Sigma, \Delta, Q_1 \times Q_2, (q_{0_1}, q_{0_2}), F_1 \times F_2, E_3 \rangle$
$E_3 = \{ ((q_1, q_2), (r_1, r_2), a, c) \mid (q_1, r_1, a, b) \in E_1,\ (q_2, r_2, b, c) \in E_2 \}$
$[\![T_3]\!] = [\![T_1]\!] \circ [\![T_2]\!] = \{ (x, z) \mid \exists y : (x, y) \in [\![T_1]\!] \ \&\ (y, z) \in [\![T_2]\!] \}$

Properties
- simple construction
- requires ε-free FSTs
- worst-case time: $O(|E_1 \times E_2|)$

Serial Algorithm

compose(T_1 = ⟨Σ, Γ, Q_1, q_{0_1}, F_1, E_1⟩, T_2 = ⟨Γ, Δ, Q_2, q_{0_2}, F_2, E_2⟩)
 1  Q ← {(q_{0_1}, q_{0_2})}                                   /* initialize */
 2  V ← {(q_{0_1}, q_{0_2})}                                   /* visitation queue */
 3  while V ≠ ∅ do
 4    (q_1, q_2) ← pop(V)                                      /* visit state */
 5    if (q_1, q_2) ∈ F_1 × F_2 then                           /* final state */
 6      F ← F ∪ {(q_1, q_2)}
 7    foreach (e_1, e_2) ∈ E[q_1] × E[q_2] with o[e_1] = i[e_2] do   /* align edges */
 8      if (n[e_1], n[e_2]) ∉ Q then
 9        Q ← Q ∪ {(n[e_1], n[e_2])}
10        V ← V ∪ {(n[e_1], n[e_2])}                           /* enqueue for visitation */
11      E ← E ∪ {((q_1, q_2), (n[e_1], n[e_2]), i[e_1], o[e_2])}
12  return T_3 = ⟨Σ, Δ, Q, (q_{0_1}, q_{0_2}), F, E⟩
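
The serial algorithm can be sketched in Python as follows. This is a minimal illustration, assuming unweighted ε-free transducers whose arcs are given as `(source, target, input, output)` tuples; the function name, argument order, and data layout are assumptions made for this sketch rather than the authors' C++ implementation.

```python
from collections import defaultdict, deque


def compose(arcs1, start1, finals1, arcs2, start2, finals2):
    """Serial composition of two epsilon-free FSTs.

    Arcs are (source, target, input, output) tuples.
    Returns (Q, start, F, E) for the composed transducer.
    """
    # Index outgoing arcs by source state: E[q] in the pseudocode.
    out1, out2 = defaultdict(list), defaultdict(list)
    for arc in arcs1:
        out1[arc[0]].append(arc)
    for arc in arcs2:
        out2[arc[0]].append(arc)

    start = (start1, start2)
    Q = {start}              # states discovered so far
    V = deque([start])       # FIFO visitation queue, giving breadth-first order
    F, E = set(), set()

    while V:
        q1, q2 = V.popleft()                      # visit state
        if q1 in finals1 and q2 in finals2:       # final state
            F.add((q1, q2))
        for (_, n1, i1, o1) in out1[q1]:
            for (_, n2, i2, o2) in out2[q2]:
                if o1 != i2:                      # arcs must align on the shared label
                    continue
                nxt = (n1, n2)
                if nxt not in Q:
                    Q.add(nxt)
                    V.append(nxt)                 # enqueue for visitation
                E.add(((q1, q2), nxt, i1, o2))
    return Q, start, F, E
```

For example, composing a transducer that maps `a b` to `b c` with one that maps `b c` to `x y` yields a three-state transducer mapping `a b` to `x y`.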

The Approach

Parallel State Visitation (lines 4–11)
- breadth-first search of output states ($V$: FIFO)
- distributed output data ($Q, F, E$)
- shared visitation queue ($V$)

Amdahl's Law Revisited
$S_{\max} \approx \frac{|Q|}{1 + \mathrm{depth}(T_3)} = \frac{|Q|}{1 + \max_{q \in Q} \min_{\pi \in \Pi(q_0, q)} |\pi|}$
$1 - P = \frac{1}{S_{\max}}$
- assumes constant (average) state complexity
- worst-case breadth-first visitation

[Figure: two example automata, one with states 0 to 3 ($S_{\max} = 1$; $P = 0$) and one with states 0 to 5 ($S_{\max} = \frac{3}{2}$; $P = \frac{1}{3}$)]
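
The depth-based bound can be checked with a short Python sketch that measures depth by breadth-first search and then applies the two formulas from this slide. The function name `max_speedup` and the edge-list input format are assumptions for illustration; only states reachable from the start state are counted.

```python
from collections import deque


def max_speedup(arcs, start):
    """Return (S_max, P), with S_max = |Q| / (1 + depth) and P = 1 - 1 / S_max.

    arcs is an iterable of (source, target) pairs; labels do not affect the bound.
    """
    succ = {}
    for src, dst in arcs:
        succ.setdefault(src, []).append(dst)

    # Breadth-first search: dist[q] is the length of a shortest path from start to q.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for r in succ.get(q, []):
            if r not in dist:
                dist[r] = dist[q] + 1
                queue.append(r)

    depth = max(dist.values())
    s_max = len(dist) / (1 + depth)
    return s_max, 1.0 - 1.0 / s_max


if __name__ == "__main__":
    # A 4-state chain: every state sits on its own level, so nothing can run in parallel.
    print(max_speedup([(0, 1), (1, 2), (2, 3)], 0))
    # A hypothetical 6-state graph of depth 3 (not necessarily the one drawn on the slide).
    print(max_speedup([(0, 1), (1, 2), (2, 3), (0, 4), (4, 5)], 0))
```

The chain reproduces the slide's first case (S_max = 1, P = 0); any six-state automaton of depth 3 reproduces the second (S_max = 3/2, P = 1/3).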

— Algorithms —

Algorithm (Sketch): Master-Slave

[Diagram: one master thread connected to slave0, slave1, slave2, slave3]

Superordinate Distribution of Work
- state-pairs $(q_1, q_2)$ passed to slaves for visitation

Slave Tasks
- align & expand transitions, globally enqueue visitation requests

Shared Global Data
- $V \subseteq Q_1 \times Q_2$: visitation queue
- $Q \subseteq Q_1 \times Q_2$: visited states
- n_q: output state counter (for serialization)
- n_up: number of tasks currently assigned (for termination)
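
A rough Python sketch of this scheme is given below, assuming a thread pool plays the role of the slaves and the calling thread plays the master. The names `compose_master_slave`, `visit`, and `pending` are hypothetical, and the single lock around all shared data is a simplification; the slides do not specify the locking strategy. Because of CPython's global interpreter lock, this illustrates the control flow only and should not be expected to show the speedups discussed in the talk.

```python
import threading
from collections import defaultdict, deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def compose_master_slave(arcs1, arcs2, start, finals1, finals2, n_slaves=4):
    """Arcs are (source, target, input, output) tuples; returns (Q, F, E)."""
    out1, out2 = defaultdict(list), defaultdict(list)
    for arc in arcs1:
        out1[arc[0]].append(arc)
    for arc in arcs2:
        out2[arc[0]].append(arc)

    lock = threading.Lock()      # guards all shared data below
    V = deque([start])           # shared visitation queue
    Q = {start: 0}               # visited states -> serial id, so n_q == len(Q)
    F, E = set(), set()

    def visit(q1, q2):
        # Slave task: align matching arcs without holding the lock ...
        aligned = [((q1, q2), (n1, n2), i1, o2)
                   for (_, n1, i1, o1) in out1.get(q1, ())
                   for (_, n2, i2, o2) in out2.get(q2, ())
                   if o1 == i2]
        # ... then publish results and globally enqueue new visitation requests.
        with lock:
            if q1 in finals1 and q2 in finals2:
                F.add((q1, q2))
            for arc in aligned:
                E.add(arc)
                if arc[1] not in Q:
                    Q[arc[1]] = len(Q)
                    V.append(arc[1])

    pending = set()              # outstanding tasks, so n_up == len(pending)
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        while True:
            with lock:           # master drains the queue and hands out work
                batch = list(V)
                V.clear()
            for q1, q2 in batch:
                pending.add(pool.submit(visit, q1, q2))
            if not pending:      # nothing queued and nothing running: done
                break
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from a slave
    return Q, F, E
```

Termination relies on the same idea as the `n_up` counter: the master stops only when the queue is empty and no task is still assigned.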

Algorithm (Sketch): Peer-to-Peer

[Diagram: peer0, peer1, peer2, peer3]

State Partitioning Function
- $r : (q_1, q_2) \mapsto \left\lfloor \frac{q_1 + q_2}{2} \right\rfloor \bmod N$
- peer $i$ visits states with $r(q_1, q_2) = i$

Peer-to-Peer Message Passing
- $V \in \wp(E_1 \times E_2)^{N \times N}$
- messages are aligned transitions $(e_1, e_2)$
- sender: $r(p[e_1], p[e_2])$ ⇝ receiver: $r(n[e_1], n[e_2])$

Shared Global Data
- n_q: output state counter (for serialization)
- n_up: number of messages currently enqueued (for termination)
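
The partitioning and routing rule can be illustrated with a few lines of Python. This sketch assumes integer state identifiers and reads the partitioning function as floor((q1 + q2) / 2) mod N; the function names `partition` and `route` and the `(source, target, input, output)` arc layout are assumptions for illustration.

```python
def partition(q1: int, q2: int, n_peers: int) -> int:
    """r(q1, q2) = floor((q1 + q2) / 2) mod N: index of the peer that visits (q1, q2)."""
    return ((q1 + q2) // 2) % n_peers


def route(e1, e2, n_peers):
    """Return (sender, receiver) peer indices for an aligned transition pair.

    Each arc is (source, target, input, output), so p[e] is arc[0] and n[e] is arc[1].
    The pair (sender, receiver) selects one cell of the N x N message-queue matrix V.
    """
    sender = partition(e1[0], e2[0], n_peers)
    receiver = partition(e1[1], e2[1], n_peers)
    return sender, receiver
```

Since each cell of the matrix has exactly one writer and one reader, peers need to coordinate only pairwise instead of contending for a single shared queue.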

— Experiments —

Experiments

Materials
- 2,266 randomly generated WFSTs $T$ (trie spine + random arcs, $\mathrm{depth}(T) \le 32$)
- (piecewise-)uniform sampling of $|Q_T|$, $|E_T|$, $|\Sigma|$
- "embarrassingly parallel" topology: $P(T^{-1} \circ T) > 99\%$
- algorithms implemented in C++ (g++ v4.4.5)
- hexadecacore test machine (16 hardware cores)

Method
- for each generated $T$, compute $(T^{-1} \circ T)$
- sample selection filter: $\frac{1}{64}\,\mathrm{sec} \le t_{\mathrm{serial}} \le 8\,\mathrm{sec}$
- varied number of threads $N \in \{1, 2, 4, 8, 16\}$

Evaluation
- average running time (8 iterations per configuration)
- structural properties of $T$, $(T^{-1} \circ T)$: $|Q|$, $|E|$, ...
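
The evaluation procedure (average running time over 8 iterations, compared against the serial time) could be reproduced with a small timing harness along these lines. This is a sketch only: `mean_runtime` and `speedup` are hypothetical helper names, and the slides do not say which clock or averaging method the authors used.

```python
import time


def mean_runtime(fn, iterations=8):
    """Average wall-clock time in seconds of calling fn() `iterations` times."""
    total = 0.0
    for _ in range(iterations):
        t0 = time.perf_counter()
        fn()
        total += time.perf_counter() - t0
    return total / iterations


def speedup(t_serial, t_parallel):
    """Speedup S = t_serial / t_parallel; values below 1 indicate a slowdown."""
    return t_serial / t_parallel
```

A typical use would time the serial composition once per sample and then each parallel configuration, reporting `speedup(t_serial, t_parallel)` for every thread count.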

Results: Master-Slave

[Plot: speedup S = t.serial / t.ms (log scale, 0.25 to 8) against E / Q (log scale, 1 to 128); series: serial, ms: 2, ms: 4, ms: 8, ms: 16]

$P \approx -23.5\%$, $\sigma = 82.3\%$

Results: Peer-to-Peer

[Plot: speedup S = t.serial / t.pp (log scale, 1 to 8) against E / Q (log scale, 1 to 128); series: serial, pp: 2, pp: 4, pp: 8, pp: 16]

$P \approx 83.1\%$, $\sigma = 7.18\%$
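
One way to relate a measured speedup to a parallel fraction is to solve Amdahl's Law for P, which gives P = (1 - 1/S) / (1 - 1/N). The Python sketch below does this; it is an illustration under that assumption, since the slides do not state how the reported P and σ values were estimated, and the function name `parallel_fraction` is hypothetical.

```python
def parallel_fraction(speedup: float, n: int) -> float:
    """Solve S = 1 / ((1 - P) + P / N) for P, given a speedup S measured on N > 1 threads."""
    if n < 2:
        raise ValueError("need N > 1 threads to solve for P")
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)
```

Under this reading, a slowdown (S < 1) corresponds to a negative P, which is how a negative estimate such as the master-slave figure can arise.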

So What About NLP?

Lexical Lookup
- many "small" compositions: $\mathrm{Id}(w) \circ T_{\mathrm{Lex}}$ for each $w \in W$
- topology-dependent $S_{\max}$ ⇝ prefer high-level fork() over $W$

Corpus Analysis
- single "large" composition: $A_{\mathrm{Corpus}} \circ T_{\mathrm{Anal}}$
- distributed representation ⇝ serialization overhead

Model Compilation
- offline "large" composition: $T_{\mathrm{Error}} \circ A_{\mathrm{Lex}}$
- partitioning function ⇝ task-dependent tuning

Concluding Remarks

Summary
- No (more) Free Lunch: parallelization of "traditional" serial algorithms
- Amdahl's Law Applied: maximum speedup depends on FST topology
- Sharing (data) Hurts: distributed synchronization improves performance

Future Directions
- improve sampling procedure
- extend to other FST operations: determinization, minimization, cascaded best-path lookup

The End

Thank you for listening!

— Addenda —

2d Plots
- t_serial : S
- E : S
- E/Q : S

3d Plots
- E/Q : N : S
- t_serial : N : S
- Q : E : histogram
- t_serial : E/Q : histogram

Plots: 2d: t_serial : S
[Figure: speedup S = t.serial / t.ms (master-slave) and S = t.serial / t.pp (peer-to-peer) against t.serial; upper panels show series for 2, 4, 8, and 16 threads, lower panels show the 8-thread series only]

Plots: 2d: E : S
[Figure: speedup S = t.serial / t.ms and S = t.serial / t.pp against the number of arcs (axis label: nec, 100000 to 1e+07); upper panels show series for 2, 4, 8, and 16 threads, lower panels show the 8-thread series only]

Plots: 2d: E/Q : S
[Figure: speedup S = t.serial / t.ms and S = t.serial / t.pp against E / Q (1 to 128); upper panels show series for 2, 4, 8, and 16 threads, lower panels show the 8-thread series only]

Plots: 3d: E/Q : N : S
[Figure: two panels, master-slave (S = t.serial / t.ms) and peer-to-peer (S = t.serial / t.pp), showing speedup over E / Q (1 to 128) and thread count N (2 to 16)]

Plots: 3d: t_serial : N : S
[Figure: two panels, master-slave (S = t.serial / t.ms) and peer-to-peer (S = t.serial / t.pp), showing speedup over t.serial and thread count N (2 to 16)]

Plots: 3d: Q : E : histogram
[Figure: two panels, raw and smoothed, showing sample counts over the number of states (axis label: nqc, 10000 to 1e+07) and the number of arcs (axis label: nec, 100000 to 1e+07)]
