1IC43AE MA4A311 Minjoon Seo 1,2* , Sewon - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1IC43AE MA4A311

Minjoon Seo1,2*, Sewon Min3*, Ali Farhadi2,4,5, Hannaneh Hajishirzi2 1,.3-CM5EAMIANAEE4C1AEC5EAMIAN ,CCEEAI,8123,* 0N ECEIAAE

UWNLP

slide-2
SLIDE 2
RNN

[Figure: an RNN unrolled over the input words “Intelligent”, “and”, “invigorating”, “film”, with an input and a hidden state at each step.]

slide-3
SLIDE 3

A

slide-4
SLIDE 4

2C:AG

  • 2C:AGA1
  • 2A7:C::
  • 1C:AGAA
  • AA-01:FC:GAC
  • 2::GACAC::2(
  • AA:):A7:
  • :7AA7:

FLOP = floating-point operations, i.e., the number of computations
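As a rough illustration of counting FLOPs, the sketch below estimates the cost of one LSTM time step. The counting convention is an assumption, not something stated on the slide: every multiply and every add counts as one FLOP, and only the four gate projections are counted. The function names are hypothetical.

```python
def matvec_flops(rows: int, cols: int) -> int:
    """FLOPs for a (rows x cols) matrix times a length-cols vector.

    Assumed convention: each multiply and each add counts as one FLOP.
    """
    return 2 * rows * cols


def lstm_step_flops(input_size: int, hidden_size: int) -> int:
    """Approximate FLOPs for one LSTM time step.

    Counts only the four gate projections applied to [x_t, h_{t-1}];
    biases and element-wise operations are ignored (an assumption).
    """
    return 4 * matvec_flops(hidden_size, input_size + hidden_size)


if __name__ == "__main__":
    print(lstm_step_flops(input_size=100, hidden_size=100))
```

Under this convention the cost grows roughly quadratically with the hidden size, which is why a smaller hidden state is much cheaper per step.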

slide-5
SLIDE 5

1CG

  • 1CG0
  • 17C
  • )0CG
  • 20FCGC
  • 17GCC71(
  • C:CC
  • 7)-
slide-6
SLIDE 6

.....-

  • ...-..
  • -.....
  • .
  • ...

How can we make RNNs faster on CPUs?

slide-7
SLIDE 7

.

  • :891

919: various levels 11)(&

  • :9::991 ,,
  • Skim:99::991
  • Fully read0::991

Just & Carpenter. “A theory of reading: From eye fixations to comprehension.” Psychological Review 87.4 (1980): 329.

slide-8
SLIDE 8
[Figure (build step): the first input word, “Intelligent”.]

slide-9
SLIDE 9
[Figure (build step): the first input word, “Intelligent”.]

slide-10
SLIDE 10
[Figure (build step): label READ; input “Intelligent” with decision value 1.]

slide-11
SLIDE 11
[Figure (build step): labels Big RNN, READ; input “Intelligent” with decision value 1.]

slide-12
SLIDE 12
[Figure (build step): labels Big RNN, READ; inputs “Intelligent”, “and”; decision value 1.]

slide-13
SLIDE 13
[Figure (build step): labels Big RNN, READ, SKIM; inputs “Intelligent”, “and”; decision values 1 and 2.]

slide-14
SLIDE 14
[Figure (build step): labels Big RNN, Small RNN, READ, SKIM; inputs “Intelligent”, “and”; decision values 1 and 2.]

slide-15
SLIDE 15
[Figure (build step): labels Big RNN, Small RNN, COPY, READ, SKIM; inputs “Intelligent”, “and”; decision values 1 and 2.]

slide-16
SLIDE 16
[Figure: Skim-RNN over the input words “Intelligent”, “and”, “invigorating”, “film”. Labels: Big RNN, Small RNN, COPY, READ, SKIM. Each word has a decision with value 1 or 2; the four decisions shown are 1, 2, 1, 2.]
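The READ / SKIM / COPY labels in the figure can be sketched as a single time-step update. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a plain tanh cell in place of an LSTM, it assumes the small cell reads the full previous state but writes only the first `d_small` entries, and all names and sizes (`skim_rnn_step`, `W_big`, `W_small`, `d_small`) are hypothetical. Decisions are passed in as given values.

```python
import numpy as np

READ, SKIM = 1, 2  # decision values, matching the 1/2 labels in the figure


def tanh_cell(x, h, W):
    """Vanilla tanh RNN cell used as a stand-in for an LSTM (assumption)."""
    return np.tanh(W @ np.concatenate([x, h]))


def skim_rnn_step(x, h_prev, decision, W_big, W_small):
    """One step: READ updates every entry, SKIM updates only the first d_small."""
    if decision == READ:
        return tanh_cell(x, h_prev, W_big)
    d_small = W_small.shape[0]
    h_new = h_prev.copy()  # COPY: entries beyond d_small are carried over
    h_new[:d_small] = tanh_cell(x, h_prev, W_small)
    return h_new


rng = np.random.default_rng(0)
d_in, d_big, d_small = 4, 6, 2  # hypothetical sizes
W_big = rng.normal(size=(d_big, d_in + d_big))
W_small = rng.normal(size=(d_small, d_in + d_big))

h = np.zeros(d_big)
for decision in [READ, SKIM, READ, SKIM]:  # one decision per word
    x = rng.normal(size=d_in)
    h = skim_rnn_step(x, h, decision, W_big, W_small)
print(h.shape)
```

Because a skimmed step only multiplies by the smaller matrix `W_small`, it costs a fraction of a full step while keeping the hidden state the same size for downstream layers.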

slide-17
SLIDE 17
slide-18
SLIDE 18

CB

  • FBLLN
  • Big RNNB==>FL:L>BP>,=
  • Small RNNB==>FL:L>BP>,=
  • = --=>=,(=,#
  • 2B==>FL:L>B:>=>LN>>FL>
  • .BH=:L>L>>FLB>B==>FL:L>
  • :DDH=:L>FDO::DDHLBFL>B==>FL:L>
  • >FBF:DD L>BF>>F>>B>:DD>

15

  • 5=)#--5==#
  • 0OF:B:DDO:C>=>BBFFNBBP>L>
slide-19
SLIDE 19

)()

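\[ \mathbb{E}[L(\theta)] = \sum_{Q \in \mathcal{Q}} L(\theta; Q) \Pr(Q) \]

\[ Q_t \sim \mathrm{Multinomial}(p_t), \qquad Q = [Q_1, Q_2, \dots, Q_T] \]

\[ p_t = \mathrm{softmax}(\alpha(x_t, h_{t-1})) \]

x_t: input
h_{t-1}: previous hidden state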

But the sample space is exponentially large!
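To make the decision distribution and the size of the sample space concrete, here is a small sketch. It assumes that the scoring function inside the softmax is a single linear map `W` over the concatenated input and previous hidden state, and that there are k = 2 choices per step; `decision_probs`, `sample_decision`, and `num_decision_sequences` are hypothetical names.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def decision_probs(x_t, h_prev, W):
    """p_t = softmax(W [x_t; h_{t-1}]); the linear form is an assumption."""
    return softmax(W @ np.concatenate([x_t, h_prev]))


def sample_decision(p_t, rng):
    """Draw one decision from p_t; returns 1 or 2 when p_t has two entries."""
    return int(rng.choice(len(p_t), p=p_t)) + 1


def num_decision_sequences(k, T):
    """Number of distinct decision sequences for T steps with k choices each."""
    return k ** T


rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3 + 5))  # hypothetical: 3-dim input, 5-dim hidden state
p = decision_probs(rng.normal(size=3), np.zeros(5), W)
print(p, sample_decision(p, rng))
print(num_decision_sequences(2, 30))
```

With two choices per step, a 30-word input already has more than a billion possible decision sequences, so the sum over all sequences cannot be enumerated in practice.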

slide-20
SLIDE 20

2HOLHLJG,

  • .HFILGBBJ?GLGLJLE
  • HEBJ?GL;EEF(
  • 9079.
  • G?BJ?GLLFLHG
  • 2BCNJGCJ?LHLJG
  • 1FEHLFP GBLE()
  • -?LFLHG
  • HONJGBHH?FIJEJEL
  • 0EE?JGLE?JGBLJGGBNJIJFLJRLHG
slide-21
SLIDE 21

(),(112

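\[ \mathbb{E}[L(\theta)] = \sum_{Q \in \mathcal{Q}} L(\theta; Q) \Pr(Q) \]

\[ \nabla \mathbb{E}[L(\theta)] = \mathbb{E}\left[ \nabla L(\theta; Q) + L(\theta; Q) \, \nabla \log \Pr(Q) \right] \]

The gradient can be sampled, but the sample space is exponentially large!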
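A toy example can show what "sampling the gradient" means. The sketch below compares a Monte Carlo score-function estimate with the exact gradient obtained by enumerating every decision sequence. Everything in it is an assumption made for illustration: decisions are independent binary variables with logits `theta`, and the hypothetical loss depends on the decisions only, so the term with the direct gradient of the loss is zero here.

```python
import itertools
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(q):
    """Hypothetical loss of decision sequences q (last axis indexes time)."""
    return np.sum(q, axis=-1) ** 2


def exact_gradient(theta):
    """Enumerate all 2^T sequences: sum_Q L(Q) Pr(Q) grad log Pr(Q)."""
    p = sigmoid(theta)
    grad = np.zeros_like(theta)
    for bits in itertools.product([0, 1], repeat=len(theta)):
        q = np.array(bits, dtype=float)
        prob = np.prod(np.where(q == 1, p, 1 - p))
        # For a Bernoulli with logit theta_t, d/dtheta_t log Pr(Q) = q_t - p_t.
        grad += loss(q) * prob * (q - p)
    return grad


def sampled_gradient(theta, num_samples, rng):
    """Monte Carlo estimate: average of L(Q) grad log Pr(Q) over sampled Q."""
    p = sigmoid(theta)
    q = (rng.random((num_samples, len(theta))) < p).astype(float)
    return np.mean(loss(q)[:, None] * (q - p), axis=0)


theta = np.array([0.0, 0.5, -0.5, 1.0])
print(exact_gradient(theta))
print(sampled_gradient(theta, 200_000, np.random.default_rng(0)))
```

Enumeration is only feasible here because T = 4; the sampled estimate avoids the exponential sum but needs many samples to be accurate, since each sample is a noisy estimate.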

slide-22
SLIDE 22

2IPIEH,

  • .IGJNEHCCEHELEH?F
  • IFE?RCEHEFFEGL(
  • 9079.
  • ;HELLEGEIH
  • 2ECOEH?IEH
  • 1NGFIBG HCF()
  • -ELLEGEIH
  • IPOEH?CIIGJEE?FLNFL
  • 0NFFREBBHEFNEHCEHEHCOEJGESEIH
slide-23
SLIDE 23

,201 111 -171&)(

  • 01
  • AA1p
  • 1112g ,2A2
  • 0G111A!17A2A
  • .111171A2
  • 0177 7AA1AG
  • 111 1A1A1AG
  • 011
slide-24
SLIDE 24

)ACB

  • 2BB42C2BB
  • C2ACCA2
  • CC2CB
  • (
  • EB
  • DBCBEA2BB
  • C2ADBCBEA(2C2BCD(
  • ABC
slide-25
SLIDE 25
  • 0C0-12.
  • -12.,B()
  • -CB B
  • CB00,C ()
  • CC77B7
slide-26
SLIDE 26
Model             SST                 Rotten Tomatoes
Baseline (LSTM)   86.4%               82.5%
LSTM-Jump         -                   79.3% / 1.6x Speed
VCRNN             81.9% / 2.6x FLOP   -
Skim-RNN          86.4% / 3.0x FLOP   84.2% / 1.3x Speed

slide-27
SLIDE 27
Model                 F1      EM      FLOP-R
Baseline (LSTM+Att)   75.5%   67.0%   1.0x
VCRNN                 74.9%   65.4%   1.0x
Skim-RNN              75.0%   66.0%   2.3x

slide-28
SLIDE 28

1..

[Figure: F1 versus inverse Flop-R, with one curve for Skim-RNN and one for the baseline. Labeled points: B(50), B(60), B(1-lstm), S(20-0.05), S(20-0.1), S(20-0.2), S(50-0.1), S(50-0.2).]

slide-29
SLIDE 29

. )*(

,

slide-30
SLIDE 30

.A. ,.*(..

...))..

slide-31
SLIDE 31
the successful scheduling, budgeting, construction-site safety, availability and transportation of building materials, logistics, inconvenience to the public caused by construction delays and bidding, etc. The largest construction projects are referred to as megaprojects

Rows: 1 fw, 1 bw, 2 fw, 2 bw

())!

slide-32
SLIDE 32

#

[Figure: F1 versus Flop-R (floating-point operation reduction), with curves for d’ = 10 and d’ = 0.]
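To show how a reduction factor like Flop-R could relate to how often the model skims, here is a back-of-the-envelope sketch. It rests on assumptions that the slides do not state: the per-step cost of a cell is taken as proportional to the size of its weight matrix, the small cell reads the full previous state, and the cost of making the skim decision is ignored. `step_cost`, `flop_reduction`, and `skim_fraction` are hypothetical names.

```python
def step_cost(input_size: int, state_size: int, output_size: int) -> int:
    """Per-step cost, taken as proportional to the weight-matrix size (assumption)."""
    return output_size * (input_size + state_size)


def flop_reduction(input_size: int, d: int, d_small: int, skim_fraction: float) -> float:
    """Ratio of the always-read cost to the average cost when some steps are skimmed."""
    big = step_cost(input_size, d, d)
    small = step_cost(input_size, d, d_small)
    average = (1 - skim_fraction) * big + skim_fraction * small
    if average == 0:
        return float("inf")  # every step skimmed with a zero-size small cell
    return big / average


if __name__ == "__main__":
    for frac in (0.0, 0.5, 0.9):
        print(frac, flop_reduction(input_size=100, d=100, d_small=10, skim_fraction=frac))
```

Under these assumptions, setting `d_small` to 0 makes a skimmed step free, so the reduction depends only on the fraction of skimmed steps, while a nonzero `d_small` trades some of that reduction for a small update on every skimmed word.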

slide-33
SLIDE 33

:

  • Skim-RNN---..---
  • ----:-
  • ---:
  • -:-.:.latency.
slide-34
SLIDE 34
  • --(
  • ----
  • )
  • -
slide-35
SLIDE 35

.!/

  • /:
  • .//./