Improving historical spelling normalization with bi-directional LSTMs and multi-task learning (PowerPoint PPT Presentation)


SLIDE 1

Problem definition Neural network approach Multi-task learning

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

Marcel Bollmann¹ Anders Søgaard²

¹Ruhr-Universität Bochum, Germany ²University of Copenhagen, Denmark

COLING 2016 December 13, 2016

Marcel Bollmann, Anders Søgaard Historical spelling normalization with bi-LSTMs and MTL

SLIDE 2

Motivation

Sample of a manuscript from Early New High German

SLIDE 3

A corpus of Early New High German

◮ Medieval religious treatise: “Interrogatio Sancti Anselmi de Passione Domini”
◮ > 50 manuscripts and prints (in German)
◮ 14th–16th century
◮ Various dialects: Bavarian, Middle German, Low German, ...

Sample from an Anselm manuscript: http://www.linguistics.rub.de/anselm/

SLIDE 4

Examples of historical spellings

Frau (woman): fraw, frawe, fräwe, frauwe, fraüwe, frow, frouw, vraw, vrow, vorwe, vrauwe, vrouwe
Kind (child): chind, chinde, chindt, chint, kind, kinde, kindi, kindt, kint, kinth, kynde, kynt
Mutter (mother): moder, moeder, mueter, müeter, muoter, muotter, muter, mutter, mvoter, mvter, mweter

SLIDE 5

Dealing with spelling variation

The problems...

◮ Difficult to annotate with tools aimed at modern data
◮ High variance in spelling
◮ None/very little training data

SLIDE 6

Dealing with spelling variation

The problems...

◮ Difficult to annotate with tools aimed at modern data
◮ High variance in spelling
◮ None/very little training data

Normalization...

◮ Removes variance
◮ Enables re-use of existing tools
◮ Useful annotation layer (e.g. for corpus queries)

Normalization: the mapping of historical spellings to their modern-day equivalents.
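As a baseline intuition for this mapping, a pure memorization approach can be sketched in a few lines. This is not the paper's method; the training pairs below are illustrative, not drawn from the Anselm corpus splits.

```python
from collections import Counter, defaultdict

def train_lookup(pairs):
    """Map each historical spelling to its most frequent
    modern equivalent among the training pairs."""
    counts = defaultdict(Counter)
    for hist, norm in pairs:
        counts[hist][norm] += 1
    return {h: c.most_common(1)[0][0] for h, c in counts.items()}

def normalize(word, lookup):
    # Unseen spellings fall back to the unchanged input word
    return lookup.get(word, word)

# Illustrative (historical, modern) training pairs
training = [("vrow", "frau"), ("fraw", "frau"), ("chint", "kind"), ("vrow", "frau")]
lookup = train_lookup(training)
```

Such a lookup only covers spellings seen in training, which is exactly the sparsity problem that motivates the character-level models on the following slides.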

SLIDE 7

Our approach

◮ Character-based sequence labelling

Hist: vrow
Norm: frau

SLIDE 8

Our approach

◮ Character-based sequence labelling

Hist: v r o w
Norm: f r a u

SLIDE 9

Our approach

◮ Character-based sequence labelling

Hist: v r o w
Norm: f r a u

◮ Not all examples are so straightforward...

SLIDE 10

Our approach

Hist: vsfuret
Norm: ausführt

SLIDE 11

Our approach

Hist: v s f u r e t
Norm: a u s f ü h r t

◮ Iterated Levenshtein distance alignment (Wieling et al., 2009)

SLIDE 12

Our approach

Hist: v s f u r e t
Norm: a u s f ü h r ε t

◮ Iterated Levenshtein distance alignment (Wieling et al., 2009)
◮ Epsilon label for “deletions”

SLIDE 13

Our approach

Hist: v s f u r e t
Norm: a u s f üh r ε t

◮ Iterated Levenshtein distance alignment (Wieling et al., 2009)
◮ Epsilon label for “deletions”
◮ Leftward merging of “insertions”

SLIDE 14

Our approach

Hist: _ v s f u r e t
Norm: a u s f üh r ε t

◮ Iterated Levenshtein distance alignment (Wieling et al., 2009)
◮ Epsilon label for “deletions”
◮ Leftward merging of “insertions”
◮ Special “beginning of word” symbol
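A single (non-iterated) pass of this label extraction can be sketched as follows. Wieling et al.'s method additionally iterates the alignment with learned segment weights, which is omitted here, and the tie-breaking order in the backtrace is an assumption.

```python
def align_labels(hist, norm, bos="<BOS>", eps="ε"):
    """Align a historical spelling to its modern form via Levenshtein
    alignment and derive one label per input character: deletions get
    the epsilon label, insertions are merged leftward, and a special
    beginning-of-word symbol absorbs leading insertions."""
    n, m = len(hist), len(norm)
    # Standard edit-distance DP table
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hist[i - 1] == norm[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match/substitution
    # Backtrace, preferring matches/substitutions on ties (an assumption)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        cost = 1 if i == 0 or j == 0 or hist[i - 1] != norm[j - 1] else 0
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + cost:
            pairs.append((hist[i - 1], norm[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((hist[i - 1], None)); i -= 1       # deletion
        else:
            pairs.append((None, norm[j - 1])); j -= 1       # insertion
    pairs.reverse()
    # Turn alignment pairs into per-character labels
    chars, labels = [bos], [""]
    for h, c in pairs:
        if h is None:
            labels[-1] += c          # merge insertions leftward
        else:
            chars.append(h)
            labels.append(c if c is not None else "")
    labels = [lab if lab else eps for lab in labels]
    return list(zip(chars, labels))
```

For example, `align_labels("vrow", "frau")` yields one label per character with ε on the `<BOS>` symbol, matching the slide's vrow → frau picture.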

SLIDE 15

Our model

[Model diagram: the input characters <BOS> v r o w pass through an embedding layer, a stack of bi-LSTM layers, and a prediction layer, producing the output labels ε f r a u.]
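The layers in this diagram can be sketched shape-by-shape in plain Python (no deep-learning framework). Weights are random and untrained, the dimensions are illustrative, and the paper's actual hyperparameters are not reproduced here; this only shows how embeddings, a forward and a backward LSTM, and a prediction layer fit together.

```python
import math, random

random.seed(0)

def rand_mat(r, c):
    return [[random.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    def __init__(self, d_in, d_hid):
        self.d = d_hid
        # One weight matrix per gate: input, forget, output, candidate
        self.W = {g: rand_mat(d_hid, d_in + d_hid) for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # concatenate input with previous hidden state
        i = [sigmoid(v) for v in matvec(self.W["i"], z)]
        f = [sigmoid(v) for v in matvec(self.W["f"], z)]
        o = [sigmoid(v) for v in matvec(self.W["o"], z)]
        g = [math.tanh(v) for v in matvec(self.W["c"], z)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

def run_lstm(cell, xs):
    h, c, out = [0.0] * cell.d, [0.0] * cell.d, []
    for x in xs:
        h, c = cell.step(x, h, c)
        out.append(h)
    return out

def bilstm_tagger(chars, char_vocab, label_vocab, d_emb=8, d_hid=8):
    """Predict one label per input character (untrained weights)."""
    emb = {ch: [random.uniform(-0.1, 0.1) for _ in range(d_emb)] for ch in char_vocab}
    fwd, bwd = LSTMCell(d_emb, d_hid), LSTMCell(d_emb, d_hid)
    W_out = rand_mat(len(label_vocab), 2 * d_hid)
    xs = [emb[ch] for ch in chars]
    hs_f = run_lstm(fwd, xs)
    hs_b = list(reversed(run_lstm(bwd, list(reversed(xs)))))
    # Prediction layer over concatenated forward/backward states
    scores = [matvec(W_out, hf + hb) for hf, hb in zip(hs_f, hs_b)]
    return [label_vocab[max(range(len(s)), key=s.__getitem__)] for s in scores]
```

A real model would of course be trained with cross-entropy over the aligned character labels; only the forward pass is shown.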

SLIDE 16

Evaluation

◮ 44 texts from the Anselm corpus
  ◮ ≈ 4,200–13,200 tokens per text (average: 7,353 tokens)
◮ 1,000 tokens for evaluation
◮ 1,000 tokens for development (not used)
◮ Remaining tokens for training
◮ Pre-processing
  ◮ Remove punctuation
  ◮ Lowercase all words
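The per-text split can be sketched as follows; which end of each text the held-out tokens are taken from is an assumption, since the slide does not say.

```python
def split_text(tokens, n_eval=1000, n_dev=1000):
    """Per-text split: 1,000 tokens for evaluation, 1,000 for
    development, the remainder for training."""
    eval_set = tokens[:n_eval]
    dev_set = tokens[n_eval:n_eval + n_dev]
    train = tokens[n_eval + n_dev:]
    return train, dev_set, eval_set

# For a text of average length (7,353 tokens), 5,353 tokens remain for training
train, dev, ev = split_text(list(range(7353)))
```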

SLIDE 17

Methods for comparison

◮ Norma (Bollmann, 2012)
  ◮ Developed on the same corpus
  ◮ Methods: automatically learned “replacement rules”; weighted Levenshtein distance
  ◮ Requires a lexical resource
◮ CRFsuite (Okazaki, 2007)
  ◮ Same input as the bi-LSTM model
  ◮ Features: two surrounding characters
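A feature extractor in the spirit of the CRFsuite baseline, describing each character by itself plus the two characters on either side, might look like this; the exact feature names and padding symbol are assumptions.

```python
def char_features(word, i, pad="_"):
    """Features for the i-th character of word: the character itself
    and a window of two characters to each side."""
    chars = pad * 2 + word + pad * 2
    i += 2  # shift index into the padded string
    return {
        "char": chars[i],
        "char-2": chars[i - 2], "char-1": chars[i - 1],
        "char+1": chars[i + 1], "char+2": chars[i + 2],
    }

def word_to_features(word):
    return [char_features(word, i) for i in range(len(word))]
```

CRFsuite itself consumes such per-position attribute sets as labelled training sequences.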

SLIDE 18

Results

ID    Region         Norma    CRF      Bi-LSTM
B2    West Central   76.10%   74.60%   82.00%
D3    East Central   80.50%   77.20%   80.10%
M     East Upper     74.30%   72.80%   83.90%
M5    East Upper     80.60%   76.40%   77.70%
St2   West Upper     73.20%   73.20%   78.20%
...
Average              77.83%   75.73%   79.90%

SLIDE 19

Multi-task learning

[Diagram: embedding layer → stack of bi-LSTMs → prediction layer]

SLIDE 20

Multi-task learning

[Diagram: embedding layer → shared stack of bi-LSTMs → separate prediction layers for tasks A and B]

SLIDE 21

Multi-task learning

[Diagram: the input characters <BOS> v r o w pass through the embedding layer and the shared stack of bi-LSTMs; the prediction layer for task A outputs ε f r a u, alongside a separate prediction layer for task B.]

SLIDE 22

Multi-task learning

[Diagram: the input characters <BOS> f r a w pass through the shared stack of bi-LSTMs; the prediction layer for this task outputs ε f r a u.]

SLIDE 23

One prediction layer for each text

[Diagram: a shared embedding layer and bi-LSTM stack feed separate prediction layers, one per text: Predict (B2), Predict (D3), Predict (M5), ..., Predict (St2).]

SLIDE 24

Evaluation

◮ Each of the 44 texts as a separate task
  ◮ Training: randomly sample from all texts
  ◮ Evaluation: use the prediction layer for the current task
◮ For comparison: Norma/CRF
  ◮ Augment training set with 10,000 randomly sampled instances
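The multi-task training loop can be sketched as follows: a shared encoder, one prediction head per text, and batches sampled at random across all tasks. The encoder and head objects here are stand-ins, not the actual bi-LSTM implementation.

```python
import random

class MultiTaskModel:
    """Stand-in for a model with shared parameters and per-task heads."""
    def __init__(self, task_ids):
        self.shared_params = {}                 # shared embedding + bi-LSTM stack
        self.heads = {t: {} for t in task_ids}  # one prediction layer per text

    def train_step(self, task, example):
        # A real step would update the shared parameters and only
        # this task's prediction layer; here we just count updates.
        self.heads[task]["updates"] = self.heads[task].get("updates", 0) + 1

def train(model, data_by_task, steps, rng):
    tasks = list(data_by_task)
    for _ in range(steps):
        task = rng.choice(tasks)                  # sample a task at random...
        example = rng.choice(data_by_task[task])  # ...then an instance from it
        model.train_step(task, example)
```

At evaluation time, each text is decoded with its own head, while the shared layers carry whatever was learned from all 44 texts.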

SLIDE 25

Results

ID    Region         Norma              Bi-LSTM
                     Plain     Aug.     Plain     MTL
B2    West Central   76.10%    77.60%   82.00%    79.60%
D3    East Central   80.50%    80.20%   80.10%    81.20%
M     East Upper     74.30%    74.40%   83.90%    80.90%
M5    East Upper     80.60%    80.70%   77.70%    82.90%
St2   West Upper     73.20%    73.40%   78.20%    79.90%
...
Average              77.83%    77.48%   79.90%    80.55%

SLIDE 27

Conclusion

◮ Deep learning works for historical spelling normalization
  ◮ ...despite small datasets (≈ 4,200–13,200 tokens per text)
◮ Outperforms the Norma & CRF baselines
  ◮ ...despite not using a lexical resource (unlike Norma)
◮ Multi-task learning setup improves results
  ◮ A way to deal with the data sparsity problem
◮ Many improvements conceivable

SLIDE 28

Thank you for listening!

SLIDE 29

References

Bollmann, M. (2012). (Semi-)automatic normalization of historical texts using distance measures and the Norma tool. In Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2). Lisbon, Portugal.

Okazaki, N. (2007). CRFsuite: A fast implementation of conditional random fields (CRFs). http://www.chokkan.org/software/crfsuite/

Wieling, M., Prokić, J., & Nerbonne, J. (2009). Evaluating the pairwise string alignment of pronunciations. In Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH – SHELT&R 2009) (pp. 26–34). Athens, Greece.