SLIDE 1

Multi-Threaded Composition of Finite-State Automata

Bryan Jurish Kay-Michael Würzner

Berlin-Brandenburg Academy of Sciences University of Potsdam jurish@bbaw.de wuerzner@uni-potsdam.de

FSMNLP 2013

  • St. Andrews, 17th July, 2013

FSMNLP 2013 / Jurish|Würzner / Multi-threaded composition – p. 1/25

SLIDE 2

Overview

The Big Idea
  • The Situation
  • The Approach
Parallel Composition Algorithms
  • Master-Slave
  • Peer-to-Peer
Experiments
  • Materials
  • Method
  • Results
Concluding Remarks

SLIDE 3

— The Big Idea —

SLIDE 4

The Situation

No Free Lunch (anymore)
  • CPU frequency growth stagnating
  • Multiprocessor systems increasingly popular
  • → “horizontal” scaling / multi-threading

(W)FST Composition: $T_3 = (T_1 \circ T_2)$
  • Online: lexical lookup, Viterbi decoding, parsing, ...
  • Offline: lexicon compilation, statistical modelling, ...
  • no generic parallel implementation (that we know of)

Amdahl’s Law

$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$

  • Not all algorithms scale well horizontally (P ≪ 1)
  • For FSTs, P may depend on FST topology
  • → not all FST compositions scale horizontally!
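To make the bound concrete, the following minimal Python sketch evaluates Amdahl's Law for a few parallelizable fractions P and thread counts N. The function name `amdahl_speedup` and the sample values of P are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's Law: S(N) = 1 / ((1 - P) + P / N)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 / ((1.0 - p) + p / n)


# Illustrative values only: P = 0.5, 0.9 and 0.99 on 1 to 16 threads.
for p in (0.5, 0.9, 0.99):
    row = ", ".join(f"N={n}: {amdahl_speedup(p, n):.2f}" for n in (1, 2, 4, 8, 16))
    print(f"P={p}: {row}")
```

Even with P = 0.9 the speed-up can never exceed 1 / (1 - P) = 10, however many threads are added, which is why the topology-dependent value of P matters for composition.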

SLIDE 5

The Basics

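Definition

Given two ε-free FSTs $T_1 = \langle \Sigma, \Gamma, Q_1, q_{0_1}, F_1, E_1 \rangle$ and $T_2 = \langle \Gamma, \Delta, Q_2, q_{0_2}, F_2, E_2 \rangle$, $T_3 = (T_1 \circ T_2)$ is itself an FST with:

  • $T_3 = \langle \Sigma, \Delta, Q_1 \times Q_2, (q_{0_1}, q_{0_2}), F_1 \times F_2, E_3 \rangle$
  • $E_3 = \bigcup_{(q_1, r_1, a, b) \in E_1,\ (q_2, r_2, b, c) \in E_2} \{ ((q_1, q_2), (r_1, r_2), a, c) \}$
  • as a relation: $T_3 = \{ (x, z) \mid \exists y : (x, y) \in T_1 \ \&\ (y, z) \in T_2 \} = T_1 \circ T_2$

Properties

  • simple construction
  • requires ε-free FSTs
  • worst-case $O_{time} = O(|E_1 \times E_2|)$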
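The edge-set definition can be transcribed almost literally. The sketch below is a brute-force illustration and assumes a hypothetical encoding of each transition as a tuple `(source, target, input, output)`. It pairs every edge of E1 with every edge of E2, which is exactly where the O(|E1 × E2|) worst case comes from.

```python
from itertools import product


def compose_edges(E1, E2):
    """E3 = {((q1, q2), (r1, r2), a, c) | (q1, r1, a, b) in E1, (q2, r2, b, c) in E2}."""
    return {
        ((q1, q2), (r1, r2), a, c)
        for (q1, r1, a, b), (q2, r2, b2, c) in product(E1, E2)
        if b == b2  # output label of T1 must match input label of T2
    }


# Toy transducers (hypothetical data): T1 maps a->x, b->y; T2 maps x->A, y->B.
E1 = {(0, 1, "a", "x"), (1, 1, "b", "y")}
E2 = {(0, 0, "x", "A"), (0, 1, "y", "B")}
E3 = compose_edges(E1, E2)
print(sorted(E3))
```

Unlike a queue-driven construction, this version also produces edges between state pairs that may be unreachable from the initial pair.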

SLIDE 6

Serial Algorithm

compose(T1 = ⟨Σ, Γ, Q1, q01, F1, E1⟩, T2 = ⟨Γ, ∆, Q2, q02, F2, E2⟩)

     1  Q ← {(q01, q02)}                                          /* initialize */
     2  V ← {(q01, q02)}                                          /* visitation queue */
     3  while V ≠ ∅ do
     4      (q1, q2) ← pop(V)                                     /* visit state */
     5      if (q1, q2) ∈ F1 × F2 then                            /* final state */
     6          F ← F ∪ {(q1, q2)}
     7      foreach (e1, e2) ∈ E[q1] × E[q2] with o[e1] = i[e2] do    /* align edges */
     8          if (n[e1], n[e2]) ∉ Q then
     9              Q ← Q ∪ {(n[e1], n[e2])}
    10              V ← V ∪ {(n[e1], n[e2])}                      /* enqueue for visitation */
    11          E ← E ∪ {((q1, q2), (n[e1], n[e2]), i[e1], o[e2])}
    12  return T3 = ⟨Σ, ∆, Q, (q01, q02), F, E⟩
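A runnable rendering of the serial algorithm might look as follows. This is a sketch under assumptions: the dictionary layout (`start`, `finals`, `edges`) and the tuple encoding `(source, target, input, output)` are hypothetical choices made for illustration, and weights are ignored.

```python
from collections import defaultdict, deque


def compose(t1, t2):
    """Serial composition of two epsilon-free FSTs, building reachable states only."""
    out1, out2 = defaultdict(list), defaultdict(list)
    for e in t1["edges"]:
        out1[e[0]].append(e)              # index outgoing edges by source state
    for e in t2["edges"]:
        out2[e[0]].append(e)

    start = (t1["start"], t2["start"])
    Q, F, E = {start}, set(), set()
    V = deque([start])                    # FIFO queue: breadth-first visitation
    while V:
        q1, q2 = V.popleft()              # visit state
        if q1 in t1["finals"] and q2 in t2["finals"]:
            F.add((q1, q2))
        for (_, n1, a, b) in out1[q1]:    # align edges with o[e1] = i[e2]
            for (_, n2, b2, c) in out2[q2]:
                if b != b2:
                    continue
                nxt = (n1, n2)
                if nxt not in Q:
                    Q.add(nxt)
                    V.append(nxt)         # enqueue for visitation
                E.add(((q1, q2), nxt, a, c))
    return {"start": start, "states": Q, "finals": F, "edges": E}


# Toy input (hypothetical data).
T1 = {"start": 0, "finals": {1}, "edges": [(0, 1, "a", "x"), (1, 1, "b", "y")]}
T2 = {"start": 0, "finals": {0, 1}, "edges": [(0, 0, "x", "A"), (0, 1, "y", "B")]}
T3 = compose(T1, T2)
print(sorted(T3["states"]), sorted(T3["finals"]))
```

The body of the `while` loop corresponds to lines 4 to 11 of the pseudocode, the part that a parallel variant would distribute across threads.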

SLIDE 7

The Approach

Parallel State Visitation (lines 4–11)
  • breadth-first search of output states (V : FIFO)
  • distributed output data (Q, F, E)
  • shared visitation queue (V)

Amdahl’s Law Revisited

$S_{max} :\approx \frac{|Q|}{1 + depth(T_3)} = \frac{|Q|}{1 + \max_{q \in Q} \min_{\pi \in \Pi(q_0, q)} |\pi|}$

$P = 1 - \frac{1}{S_{max}}$

  • assumes constant (average) state complexity
  • worst-case breadth-first visitation

[Figure: two example automata, the first with $S_{max} = 1$; $P = 0$, the second with $S_{max} = \frac{3}{2}$; $P = \frac{1}{3}$]
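The estimate can be computed with a single breadth-first search. The sketch below is illustrative only: the two toy topologies (a three-state chain and a root with two successors) are assumptions chosen because they reproduce the values Smax = 1 and Smax = 3/2; they are not necessarily the automata drawn on the slide.

```python
from collections import deque


def bfs_depth_and_size(start, edges):
    """Return (depth, number of reachable states), where depth is the
    longest shortest-path distance from start."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for r in succ.get(q, []):
            if r not in dist:
                dist[r] = dist[q] + 1
                queue.append(r)
    return max(dist.values()), len(dist)


def s_max(start, edges):
    """S_max estimate: |Q| / (1 + depth)."""
    depth, n_states = bfs_depth_and_size(start, edges)
    return n_states / (1 + depth)


def parallel_fraction(smax):
    """P = 1 - 1 / S_max."""
    return 1 - 1 / smax


chain = [(1, 2), (2, 3)]   # 1 -> 2 -> 3: every state waits for its predecessor
fork = [(1, 2), (1, 3)]    # 1 -> 2 and 1 -> 3: states 2 and 3 can be visited together
for name, edges in (("chain", chain), ("fork", fork)):
    s = s_max(1, edges)
    print(name, s, parallel_fraction(s))
```

In the chain nothing can be visited concurrently, so P = 0; in the fork the two successors form one breadth-first level, giving P = 1/3.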

SLIDE 8

— Algorithms —

SLIDE 9

Algorithm (Sketch): Master-Slave

[Figure: master, slave0, slave1, slave2, slave3]

Superordinate Distribution of Work
  • state-pairs (q1, q2) passed to slaves for visitation

Slave Tasks
  • align & expand transitions, globally enqueue visitation requests

Shared Global Data
  • V : visitation queue, V ⊆ Q1 × Q2
  • Q : visited states, Q ⊆ Q1 × Q2
  • n_q : output state counter (for serialization)
  • n_up: number of tasks currently assigned (for termination)
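The shared-data design can be approximated with Python's standard `queue` and `threading` modules. This is a loose sketch, not the authors' C++ implementation: the master's role is reduced to a shared `queue.Queue`, whose internal count of unfinished tasks stands in for `n_up`, a single lock guards Q, F and E, and the counter `n_q` is omitted. The FST encoding (dictionaries with `start`, `finals`, `edges`) is a hypothetical choice, and because of CPython's global interpreter lock the example shows the synchronization pattern rather than an actual speed-up.

```python
import queue
import threading
from collections import defaultdict


def compose_shared_queue(t1, t2, n_workers=4):
    out1, out2 = defaultdict(list), defaultdict(list)
    for e in t1["edges"]:
        out1[e[0]].append(e)
    for e in t2["edges"]:
        out2[e[0]].append(e)

    start = (t1["start"], t2["start"])
    Q, F, E = {start}, set(), set()
    lock = threading.Lock()               # guards the shared sets Q, F and E
    V = queue.Queue()                     # shared visitation queue
    V.put(start)

    def worker():
        while True:
            item = V.get()
            if item is None:              # stop signal
                V.task_done()
                return
            q1, q2 = item
            is_final = q1 in t1["finals"] and q2 in t2["finals"]
            aligned = [((q1, q2), (n1, n2), a, c)
                       for (_, n1, a, b) in out1.get(q1, ())
                       for (_, n2, b2, c) in out2.get(q2, ())
                       if b == b2]
            with lock:
                if is_final:
                    F.add((q1, q2))
                for edge in aligned:
                    E.add(edge)
                    if edge[1] not in Q:
                        Q.add(edge[1])
                        V.put(edge[1])    # enqueued before task_done() below
            V.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    V.join()                              # returns once every enqueued state was visited
    for _ in threads:
        V.put(None)
    for t in threads:
        t.join()
    return {"start": start, "states": Q, "finals": F, "edges": E}


# Toy input (hypothetical data).
T1 = {"start": 0, "finals": {1}, "edges": [(0, 1, "a", "x"), (1, 1, "b", "y")]}
T2 = {"start": 0, "finals": {0, 1}, "edges": [(0, 0, "x", "A"), (0, 1, "y", "B")]}
result = compose_shared_queue(T1, T2, n_workers=4)
print(sorted(result["states"]))
```

Termination relies on successor states being enqueued before the current task is marked done, so the count of unfinished tasks cannot reach zero while work remains.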

SLIDE 10

Algorithm (Sketch): Peer-to-Peer

[Figure: peer0, peer1, peer2, peer3]

State Partitioning Function

$r : (q_1, q_2) \mapsto \left\lfloor \frac{q_1 + q_2}{2} \right\rfloor \bmod N$

  • peer i visits states with r(q1, q2) = i

Peer-to-Peer Message Passing

$V \in \wp(E_1 \times E_2)^{N \times N}$

  • messages are aligned transitions (e1, e2)
  • sender: r(p[e1], p[e2]) → receiver: r(n[e1], n[e2])

Shared Global Data
  • n_q : output state counter (for serialization)
  • n_up: number of messages currently enqueued (for termination)
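The routing rule can be illustrated in a few lines. The sketch assumes integer state identifiers and reads the partitioning function as the floor of (q1 + q2) / 2, taken modulo N; the tuple encoding `(p, n, i, o)` for a transition (source, target, input, output) and the nested-list mailbox `V` are hypothetical stand-ins for the N × N message queues.

```python
def r(q1: int, q2: int, n_peers: int) -> int:
    """Assumed partitioning function: floor((q1 + q2) / 2) mod N."""
    return ((q1 + q2) // 2) % n_peers


def route(e1, e2, n_peers):
    """Return (sender, receiver) for a pair of aligned transitions.

    The sender owns the source state pair, the receiver owns the target pair.
    """
    sender = r(e1[0], e2[0], n_peers)
    receiver = r(e1[1], e2[1], n_peers)
    return sender, receiver


N = 4
# V[sender][receiver] holds the messages sent from one peer to another.
V = [[[] for _ in range(N)] for _ in range(N)]

e1 = (0, 5, "a", "x")   # T1 transition 0 -> 5 with labels a:x
e2 = (2, 8, "x", "A")   # T2 transition 2 -> 8 with labels x:A
sender, receiver = route(e1, e2, N)
V[sender][receiver].append((e1, e2))
print(sender, receiver)  # prints: 1 2
```

With one queue per (sender, receiver) pair, each queue has a single writer and a single reader, so peers need not contend for one global visitation queue.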

SLIDE 11

— Experiments —

SLIDE 12

Experiments

Materials
  • 2,266 randomly generated WFSTs T
      • trie spine + random arcs
      • depth(T) ≤ 32
  • (piecewise-) uniform sampling
      • $|Q_T|, |E_T|, |\Sigma|$
  • “embarrassingly parallel” topology
      • $P(T^{-1} \circ T) > 99\%$
  • algorithms implemented in C++
      • g++ v4.4.5
  • hexadecacore test machine
      • 16 hardware cores

Method
  • for each generated T, compute $(T^{-1} \circ T)$
  • sample selection filter
      • $\frac{1}{64}$ sec ≤ $t_{serial}$ ≤ 8 sec
  • varied number of threads
      • N ∈ {1, 2, 4, 8, 16}

Evaluation
  • average running time
      • 8 iterations per configuration
  • structural properties of T, $(T^{-1} \circ T)$
      • |Q|, |E|, ...

SLIDE 13

Results: Master-Slave

[Plot: speed-up S = t.serial / t.ms (y-axis, 0.25 to 8) against E / Q (x-axis, 1 to 128); series: serial, ms: 2, ms: 4, ms: 8, ms: 16]

P ≈ −23.5% σ = 82.3%

SLIDE 14

Results: Peer-to-Peer

[Plot: speed-up S = t.serial / t.pp (y-axis, 1 to 8) against E / Q (x-axis, 1 to 128); series: serial, pp: 2, pp: 4, pp: 8, pp: 16]

P ≈ 83.1% σ = 7.18%
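The slides do not state how P was estimated. One plausible reading, shown here purely as an assumption, is that Amdahl's Law is inverted for each measured speed-up S at thread count N, giving P = (1 - 1/S) / (1 - 1/N). Under that reading a negative P simply means the parallel run was slower than the serial one (S < 1). The function names below are illustrative.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's Law: S(N) = 1 / ((1 - P) + P / N)."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(s: float, n: int) -> float:
    """Solve S = 1 / ((1 - P) + P / N) for P; requires N > 1 and S > 0."""
    if n <= 1 or s <= 0:
        raise ValueError("need n > 1 and s > 0")
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


# A slowdown (S < 1) yields a negative P: hypothetical S = 0.5 on N = 2 threads.
print(parallel_fraction(0.5, 2))

# Speed-ups that Amdahl's Law would predict for P = 0.831 (model values, not measurements).
for n in (2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.831, n), 2))
```

The predicted values are model output only; the measured speed-ups are those shown in the plots.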

SLIDE 15

So What About NLP?

Lexical Lookup
  • many “small” compositions: $\{ Id(w) \circ T_{Lex} \}_{w \in W}$
  • topology-dependent $S_{max}$
  • → prefer high-level fork() over W

Corpus Analysis
  • single “large” composition: $A_{Corpus} \circ T_{Anal}$
  • distributed representation
  • → serialization overhead

Model Compilation
  • offline “large” composition: $T_{Error} \circ A_{Lex}$
  • partitioning function
  • → task-dependent tuning

SLIDE 16

Concluding Remarks

Summary
  • No (more) Free Lunch: parallelization of “traditional” serial algorithms
  • Amdahl’s Law Applied: maximum speedup depends on FST topology
  • Sharing (data) Hurts: distributed synchronization improves performance

Future Directions
  • improve sampling procedure
  • extend to other FST operations
      • determinization
      • minimization
      • cascaded best-path lookup

SLIDE 17

The End

Thank you for listening!

SLIDE 18

— Addenda —

2d Plots
  • tserial : S
  • E : S
  • E/Q : S

3d Plots
  • E/Q : N : S
  • tserial : N : S
  • Q : E : histogram
  • tserial : E/Q : histogram

SLIDE 19

Plots: 2d: tserial : S

[Four plots: speed-up S = t.serial / t.ms and S = t.serial / t.pp against t.serial (x-axis, 0.1 to 1). Two panels show the series serial and 2, 4, 8, 16 threads; two panels show the series serial and ms: 8 / pp: 8 only]

SLIDE 20

Plots: 2d: E : S

[Four plots: speed-up S = t.serial / t.ms and S = t.serial / t.pp against nec (x-axis, 100000 to 1e+07). Two panels show the series serial and 2, 4, 8, 16 threads; two panels show the series serial and ms: 8 / pp: 8 only]

SLIDE 21

Plots: 2d: E/Q : S

[Four plots: speed-up S = t.serial / t.ms and S = t.serial / t.pp against E / Q (x-axis, 1 to 128). Two panels show the series serial and 2, 4, 8, 16 threads; two panels show the series serial and ms: 8 / pp: 8 only]

SLIDE 22

Plots: 3d: E/Q : N : S

[Two 3d plots, master-slave and peer-to-peer: S = t.serial / t.ms and S = t.serial / t.pp (1 to 6) over E / Q (1 to 128) and N (2 to 16)]

SLIDE 23

Plots: 3d: tserial : N : S

[Two 3d plots, master-slave and peer-to-peer: S = t.serial / t.ms and S = t.serial / t.pp (1 to 6) over t.serial (0.1 to 1) and N (2 to 16)]

SLIDE 24

Plots: 3d: Q : E : histogram

[Two 3d histograms, raw and smoothed, over nqc (10000 to 1e+07) and nec (100000 to 1e+07)]

SLIDE 25

Plots: 3d: tserial : E/Q : histogram

[Two 3d histograms, raw and smoothed, over t.serial (0.1 to 1) and E / Q (1 to 128)]