Spectral Methods in our SPiCe'16 Submission -- PowerPoint PPT Presentation



SLIDE 1

The UK's European university

Farhana Ferdousi Liza and Marek Grzes
School of Computing, University of Kent, UK

Spectral Methods in our SPiCe'16 Submission
SLIDE 2

Our Team

  • Hidden Markov Models
  • Natural Language Processing
  • Spectral Learning

SLIDE 3

Our Score

The highest score among methods that did not use Neural Networks

SLIDE 4

Initial Attempts

SLIDE 5

Spectral learning for HMMs (Hsu et al. 2012)

  • Observable Operator Model for HMMs
  • Empirical moment calculation
  • Transformed operators for HMMs

U defines an m-dimensional subspace that preserves the state dynamics.
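The pipeline on this slide (empirical moments, then an SVD of the bigram matrix whose top-m left singular vectors give U) can be sketched in NumPy as below. This is a hedged reconstruction of the Hsu et al. (2012) construction, not the authors' submission code; all function and variable names are illustrative.

```python
import numpy as np

def spectral_hmm(sequences, n_symbols, m):
    """Estimate observable operators for an HMM from symbol sequences.

    Follows the Hsu et al. (2012) construction: empirical moments
    P1 (unigrams), P21 (bigrams), P3x1 (trigrams sliced by the middle
    symbol), then an SVD of P21 whose top-m left singular vectors span
    the m-dimensional subspace U that preserves the state dynamics.
    """
    P1 = np.zeros(n_symbols)
    P21 = np.zeros((n_symbols, n_symbols))
    P3x1 = np.zeros((n_symbols, n_symbols, n_symbols))  # [x2][x3, x1]
    for s in sequences:
        for t in range(len(s)):
            P1[s[t]] += 1
            if t + 1 < len(s):
                P21[s[t + 1], s[t]] += 1
            if t + 2 < len(s):
                P3x1[s[t + 1]][s[t + 2], s[t]] += 1
    P1 /= P1.sum()
    P21 /= P21.sum()
    P3x1 /= max(P3x1.sum(), 1.0)

    # U: top-m left singular vectors of P21
    U, _, _ = np.linalg.svd(P21)
    U = U[:, :m]

    # Transformed (observable) operators: B_x = (U^T P3x1) (U^T P21)^+
    pinv = np.linalg.pinv(U.T @ P21)
    B = [U.T @ P3x1[x] @ pinv for x in range(n_symbols)]
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    return b1, B, binf

def sequence_prob(seq, b1, B, binf):
    """Estimated sequence probability: binf^T B_{x_t} ... B_{x_1} b1."""
    v = b1
    for x in seq:
        v = B[x] @ v
    return float(binf @ v)
```

For an i.i.d. source the bigram matrix has rank 1, so m=1 already recovers the symbol probabilities; for a genuine HMM, m plays the role of the number-of-states parameter discussed on the next slide.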

SLIDE 6

The Main Parameters of the Method

  • The number of hidden states
SLIDE 7

Main Methods

SLIDE 8

Weighted Finite Automata and Sequence Prediction

Balle et al. (EMNLP 2014)
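For reference, in the usual formulation from this line of work (notation assumed here, since the slide body was lost), a weighted finite automaton with m states scores a string through one operator matrix per symbol:

```latex
f(x_1 x_2 \cdots x_t) = \alpha_0^{\top} A_{x_1} A_{x_2} \cdots A_{x_t}\, \alpha_{\infty},
\qquad \alpha_0, \alpha_{\infty} \in \mathbb{R}^{m},\; A_{\sigma} \in \mathbb{R}^{m \times m}
```

Here alpha_0 holds the initial weights, alpha_infinity the terminal weights; next-symbol prediction normalises these scores over the alphabet.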

SLIDE 9

Hankel Matrix

Balle et al. (EMNLP 2014)
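The slide body did not survive extraction; the standard definition this title points at (notation assumed) indexes the matrix by prefixes p and suffixes s:

```latex
[H_f]_{p,s} = f(p \cdot s)
```

where f(u) can be taken as the probability or empirical frequency of the string u. The rank of the full Hankel matrix equals the minimal number of WFA states, which is what makes SVD-based learning from a finite sub-block possible.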

SLIDE 10

The Basis

Balle et al. (EMNLP 2014)

SLIDE 11

The Main Parameters of the Method

  • The number of hidden states
  • The basis
  • The basis can be chosen from a sub-block of the Hankel matrix, where the rows and columns correspond to substrings and the cells to the frequencies of those substrings in the data
  • Therefore, the maximum length of the substrings can be considered a parameter
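The sub-block described in these bullets can be sketched as follows. This is an illustrative reconstruction, not the submission's code; in particular the convention for the empty string's frequency is an assumption.

```python
from collections import Counter
from itertools import product

def hankel_subblock(sequences, alphabet, max_len=2):
    """Build a sub-block of the empirical Hankel matrix.

    Rows and columns are indexed by substrings of length up to
    max_len (the basis); cell (p, s) holds the empirical frequency
    of the concatenation p + s in the data, as the slide describes.
    """
    # Candidate basis strings up to max_len, including the empty string
    basis = [()]
    for n in range(1, max_len + 1):
        basis.extend(product(alphabet, repeat=n))

    # Count every substring occurrence up to length 2 * max_len
    counts = Counter()
    positions = 0
    for seq in sequences:
        seq = tuple(seq)
        positions += len(seq)
        for i in range(len(seq)):
            for j in range(i, min(i + 2 * max_len, len(seq))):
                counts[seq[i:j + 1]] += 1
    # Convention (an assumption here): the empty string occurs once
    # per symbol position
    counts[()] = positions

    H = [[counts[p + s] for s in basis] for p in basis]
    return basis, H
```

Raising `max_len` enlarges the basis exponentially in the alphabet size, which is why the maximum substring length is itself a tunable parameter.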

SLIDE 12

Parameter Tuning

  • A combination of (manual) coordinate ascent and random search
  • Why random search? (Bergstra and Bengio, 2012)
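Random search in the sense of Bergstra and Bengio (2012) samples every parameter independently on each trial, which covers the important dimensions better than a grid when only a few parameters matter. A minimal sketch (all names illustrative, not the authors' tuning code):

```python
import random

def random_search(evaluate, space, n_trials=50, seed=0):
    """Random search over hyperparameters.

    `space` maps each parameter name to a list of candidate values;
    every trial samples each parameter independently. `evaluate` is
    assumed to return a score to maximise (e.g. the SPiCe score).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Coordinate ascent can then refine the best random configuration one parameter at a time, which matches the combined strategy on this slide.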

SLIDE 13

Other Methods

  • A 3-gram model with smoothing worked better than spectral learning on 3 problems
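For reference, a smoothed 3-gram model can be sketched as below. The deck's own model used Kneser-Ney smoothing (named on a later slide); plain interpolation is used here as a simpler stand-in, so this is not the submission's method.

```python
from collections import Counter

class SmoothedTrigram:
    """Interpolated 3-gram model: a simplified stand-in for the
    smoothed 3-gram mentioned on the slide (not Kneser-Ney)."""

    def __init__(self, sequences, n_symbols,
                 weights=(0.5, 0.3, 0.15, 0.05)):
        self.n = n_symbols
        self.w3, self.w2, self.w1, self.w0 = weights
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        for s in sequences:
            s = tuple(s)
            for t in range(len(s)):
                self.uni[s[t]] += 1
                if t >= 1:
                    self.bi[s[t - 1:t + 1]] += 1
                if t >= 2:
                    self.tri[s[t - 2:t + 1]] += 1
        self.total = sum(self.uni.values())

    def prob(self, x, context):
        """P(x | context): interpolate trigram, bigram and unigram
        relative frequencies with a uniform floor."""
        c = tuple(context)[-2:]
        p1 = self.uni[x] / self.total if self.total else 0.0
        p2 = (self.bi[c[-1:] + (x,)] / self.uni[c[-1]]
              if c and self.uni[c[-1]] else 0.0)
        p3 = (self.tri[c + (x,)] / self.bi[c]
              if len(c) == 2 and self.bi[c] else 0.0)
        return self.w3 * p3 + self.w2 * p2 + self.w1 * p1 + self.w0 / self.n
```

The uniform floor keeps every symbol's probability strictly positive, which matters for the log-loss-style scoring used in SPiCe-like rankings.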

SLIDE 14

Experimental results (1)

  • The Spectral Method did well on problems 1, 2, 3, 9, and 12
  • Presumably, those problems have small numbers of hidden states

[Figure: no. of states vs. score (small number of states); legend: Datasets 1, 2, 3, 9, and 12]

SLIDE 15

Experimental results (2)

  • Score prediction is invariant to changes in the number of states on problems 4, 5, 7, 8, 10, 11, and 13

[Figure: no. of states vs. score; legend: Datasets 4, 5, 7, 8, 10, 11, and 13]

SLIDE 16

Experimental results (3)

  • On problems 5, 8 and 10, 3-gram with smoothing gave slightly better results than the corresponding spectral approach

[Figure: spectral vs. n-gram score per problem; legend: Spectral, 3-gram with KN smoothing]

SLIDE 17

The Final Parameter Values for WFA

SLIDE 18

THE UK'S EUROPEAN UNIVERSITY

www.kent.ac.uk