

SLIDE 1

Language Processing for Different Domains and Genres: Machine Learning Introduction

Caroline Sporleder, Ines Rehbein

Universität des Saarlandes

Wintersemester 2009/10, 5.11.2009

Caroline Sporleder, Ines Rehbein Introduction

SLIDE 2

Machine Learning Basics

Goal: develop computer programs that automatically improve with experience by learning from representative input (and output) data.

Motivation: for many problems, the best way of computing the correct output from the input is not known; manually determining input-output rules by (informed) trial and error is time-consuming and typically results in low coverage (but high precision).

Example: predicting the plural form of a German noun

SLIDE 3

Example: German Plural Formation

Nine possibilities:

1. no ending, no umlaut: das Zimmer - die Zimmer (rooms)
2. no ending, but umlaut: der Faden - die Fäden (threads)
3. -e: der Hund - die Hunde (dogs)
4. -e plus umlaut: der Stuhl - die Stühle (chairs)
5. -er: das Kind - die Kinder (children)
6. -er plus umlaut: das Lamm - die Lämmer (lambs)
7. -n: die Straße - die Straßen (streets)
8. -en: die Bank - die Banken (banks)
9. -s: das Trio - die Trios (trios)


SLIDE 15

Hand-crafting Rules

Hypothesis 1: the ending is determined by the noun's grammatical gender

Rule 1: masculine ⇒ -e, neuter ⇒ -er, feminine ⇒ -en

Applying Rule 1:
das Kind (n) ⇒ die Kinder
der Hund (m) ⇒ die Hunde
die Bank (f) ⇒ die Banken
das Zimmer (n) ⇒ *die Zimmerer (correct: die Zimmer)


SLIDE 23

Hand-crafting Rules (2)

Hypothesis 2: the morpho-phonological form also influences the ending

Rule 2: don't add an ending if the noun already ends in -e, -en, or -er

Applying Rules 1 and 2:
das Zimmer (n) ⇒ die Zimmer
die Ampel (f) ⇒ *die Ampelen (correct: die Ampeln)


SLIDE 30

Hand-crafting Rules (3)

Rule 3: -e and -en become ø and -n if the last syllable of the singular contains a schwa.

Applying Rules 1, 2 and 3:
die Ampel (f) ⇒ die Ampeln
der Nachbar (m) ⇒ *die Nachbare (correct: die Nachbarn)
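Rules 1-3 can be sketched as a small cascade. The function below is an illustrative assumption rather than code from the course; in particular, the schwa test is crudely approximated by spelling:

```python
# Illustrative sketch of the hand-crafted rule cascade (Rules 1-3).
# The schwa test is approximated by the spellings -el/-em, an assumption
# made here to keep the example small.

def pluralize(noun, gender):
    """Predict a German plural by gender (Rule 1), suppressing or
    reducing the ending via the morpho-phonological Rules 2 and 3."""
    ending = {"m": "e", "n": "er", "f": "en"}[gender]   # Rule 1
    if noun.endswith(("e", "en", "er")):                # Rule 2: no ending
        return noun
    if noun.endswith(("el", "em")):                     # Rule 3 (schwa approx.)
        ending = {"e": "", "en": "n"}.get(ending, ending)
    return noun + ending

print(pluralize("Kind", "n"))     # Kinder
print(pluralize("Zimmer", "n"))   # Zimmer   (Rule 2 fires)
print(pluralize("Ampel", "f"))    # Ampeln   (Rule 3 fires)
print(pluralize("Nachbar", "m"))  # *Nachbare -- the rules still fail here
```

As on the slide, the cascade still produces *Nachbare, showing why hand-crafted rules tend toward high precision but low coverage.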

SLIDE 31

Machine Learning German Plural Formation

Learn from input-output pairs:

Zimmer, Zimmer; Faden, Fäden; Hund, Hunde; Stuhl, Stühle; Kind, Kinder; Lamm, Lämmer; Straße, Straßen; Bank, Banken; Trio, Trios; Ampel, Ampeln; Nachbar, Nachbarn; Maus, Mäuse

⇒ Input is typically represented as a feature vector.


SLIDE 33

Machine Learning German Plural Formation (2)

What information needs to be represented for the task to be learnable (i.e., which features need to be modelled)?

1. Zimmer: <n>, Zimmer
2. Faden: <m>, Fäden
3. Hund: <m>, Hunde
4. Stuhl: <m>, Stühle
5. Kind: <n>, Kinder
6. Lamm: <n>, Lämmer
7. Straße: <f>, Straßen
8. Bank: <f>, Banken
9. Trio: <n>, Trios
10. Ampel: <f>, Ampeln
11. Nachbar: <m>, Nachbarn
12. Maus: <f>, Mäuse

SLIDE 34

Machine Learning German Plural Formation (2)

What information needs to be represented for the task to be learnable (i.e., which features need to be modelled)?

1. Zimmer: <n, er>, Zimmer
2. Faden: <m, en>, Fäden
3. Hund: <m, ø>, Hunde
4. Stuhl: <m, ø>, Stühle
5. Kind: <n, ø>, Kinder
6. Lamm: <n, ø>, Lämmer
7. Straße: <f, e>, Straßen
8. Bank: <f, ø>, Banken
9. Trio: <n, ø>, Trios
10. Ampel: <f, ø>, Ampeln
11. Nachbar: <m, ø>, Nachbarn
12. Maus: <f, ø>, Mäuse

SLIDE 35

Machine Learning German Plural Formation (2)

What information needs to be represented for the task to be learnable (i.e., which features need to be modelled)?

1. Zimmer: <n, er, a-schwa>, Zimmer
2. Faden: <m, en, e-schwa>, Fäden
3. Hund: <m, ø, no schwa>, Hunde
4. Stuhl: <m, ø, no schwa>, Stühle
5. Kind: <n, ø, no schwa>, Kinder
6. Lamm: <n, ø, no schwa>, Lämmer
7. Straße: <f, e, e-schwa>, Straßen
8. Bank: <f, ø, no schwa>, Banken
9. Trio: <n, ø, no schwa>, Trios
10. Ampel: <f, ø, e-schwa>, Ampeln
11. Nachbar: <m, ø, no schwa>, Nachbarn
12. Maus: <f, ø, no schwa>, Mäuse
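A minimal learner over these vectors can be sketched as one-nearest-neighbour on the three nominal features (k-NN appears later among the supervised learners). Treating the nine plural patterns from the earlier slide as class labels, rather than predicting the full plural form, is a simplification made here:

```python
# Sketch: the twelve instances as <gender, singular ending, schwa> feature
# vectors labelled with their plural pattern, plus a one-nearest-neighbour
# classifier over matching feature values (ties go to the earlier instance).

TRAIN = [
    (("n", "er", "a-schwa"),  "no ending"),           # Zimmer  -> Zimmer
    (("m", "en", "e-schwa"),  "no ending + umlaut"),  # Faden   -> Fäden
    (("m", "ø",  "no schwa"), "-e"),                  # Hund    -> Hunde
    (("m", "ø",  "no schwa"), "-e + umlaut"),         # Stuhl   -> Stühle
    (("n", "ø",  "no schwa"), "-er"),                 # Kind    -> Kinder
    (("n", "ø",  "no schwa"), "-er + umlaut"),        # Lamm    -> Lämmer
    (("f", "e",  "e-schwa"),  "-n"),                  # Straße  -> Straßen
    (("f", "ø",  "no schwa"), "-en"),                 # Bank    -> Banken
    (("n", "ø",  "no schwa"), "-s"),                  # Trio    -> Trios
    (("f", "ø",  "e-schwa"),  "-n"),                  # Ampel   -> Ampeln
    (("m", "ø",  "no schwa"), "-n"),                  # Nachbar -> Nachbarn
    (("f", "ø",  "no schwa"), "-e + umlaut"),         # Maus    -> Mäuse
]

def predict(features):
    """Return the label of the training instance sharing most feature values."""
    def overlap(inst):
        return sum(a == b for a, b in zip(inst[0], features))
    return max(TRAIN, key=overlap)[1]

print(predict(("f", "e", "e-schwa")))   # -n        (nearest: Straße)
print(predict(("n", "er", "a-schwa")))  # no ending (nearest: Zimmer)
```

Note that several instances (e.g. Hund and Stuhl) share a feature vector but carry different labels, so these three features alone cannot fully determine the class: this is exactly the question the slide asks.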

SLIDE 36

Basic Terminology

instance: one input-output pair, where the input is a feature vector, e.g. <Hund, m, ø, no schwa>, Hunde

label (or class) to be predicted: the output value, e.g. Hunde; labels can be nominal, numeric, binary, etc.

features: the types of information encoded in the input, e.g. singular form, gender, ending of the singular, schwa information

feature values: the values the features assume for a given instance, e.g. Hund, m, ø, no schwa; values can be nominal, numeric, binary, etc.
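The terminology can be mirrored directly in code; the field names below are illustrative, not from the slides:

```python
from typing import NamedTuple

# One instance: a feature vector plus the label to be predicted.
class Instance(NamedTuple):
    singular: str   # feature: singular form
    gender: str     # feature: grammatical gender (nominal value)
    ending: str     # feature: ending of the singular
    schwa: str      # feature: schwa information
    label: str      # label/class to be predicted: the plural form

inst = Instance("Hund", "m", "ø", "no schwa", "Hunde")
print(inst.label)   # the output value: Hunde
print(inst[:4])     # the feature values
```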

SLIDE 37

Basic Terminology (2)

training set: the set of manually labelled instances from which the target function (the mapping from input to output) is learnt

test set: the set of instances to which the trained machine learner is applied; the test-set labels are used to compute the performance of the classifier, but are not known to the classifier

development set: the set of instances used to choose the best parameter settings (typically those that optimise performance on the development set)
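A common way to obtain the three sets is a random split of the labelled data. The 80/10/10 proportions below are a typical choice, assumed here for illustration rather than prescribed by the slides:

```python
import random

random.seed(0)                   # reproducible shuffle
instances = list(range(100))     # stand-in for 100 labelled instances
random.shuffle(instances)

n = len(instances)
train = instances[: int(0.8 * n)]               # learn the target function
dev   = instances[int(0.8 * n): int(0.9 * n)]   # choose parameter settings
test  = instances[int(0.9 * n):]                # final performance estimate

print(len(train), len(dev), len(test))   # 80 10 10
```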

SLIDE 38

Basic Types of Machine Learning

Supervised machine learning: the system learns the target function (mapping from input to output) from a labelled training set (decision-tree learners, Naive Bayes, k-NN, etc.)

Unsupervised machine learning: no labelled training set; the system searches for the best model to account for the unlabelled test set (clustering, etc.)

Semi-supervised machine learning: a combination of supervised and unsupervised; the training set consists of a small labelled seed set and a large unlabelled set

SLIDE 39

Machine Learning Spaces

Instances are points (vectors) in n-dimensional space, where n is the number of features.

Example: predict the customer satisfaction (happy vs. not happy) of the clients of a call centre from (i) their age and (ii) the number of minutes they were kept in the waiting loop.

SLIDE 40

Machine Learning Spaces

[Scatter plot: instances in the 2-d feature space, waiting time (minutes) against age]
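Viewing instances as points makes geometric notions such as distance available to the learner. The two clients below use invented values for illustration:

```python
import math

# Two call-centre clients as points in the 2-d feature space
# (age in years, waiting time in minutes); values are invented.
a = (30, 20)
b = (60, 40)

# Euclidean distance between the two instances in feature space
print(round(math.dist(a, b), 2))   # 36.06
```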

SLIDE 41

Supervised Machine Learning (1)

The classifier uses the training set to try to find the correct decision boundary between the classes.

[Scatter plot: the labelled training set in the age/waiting-time space]


SLIDE 43

Separability

Whether and how two sets can be separated depends on:
- the machine learning algorithm (linear or non-linear)
- the dimensionality of the space (the number of features)
- the suitability of the features

Data sets which can be separated by a line (2-d), plane (3-d), or hyperplane (higher-dimensional space) are called linearly separable.

[Scatter plot: a linearly separable data set in the age/waiting-time space]
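A linear decision boundary in this 2-d space is just a weighted sum compared against a threshold. The weights below are invented for illustration, not learnt from data:

```python
# A hand-set linear decision boundary in the (age, waiting time) space:
# predict "not happy" iff -0.1*age + 0.5*minutes - 5 > 0.
def classify(age, minutes):
    score = -0.1 * age + 0.5 * minutes - 5.0
    return "not happy" if score > 0 else "happy"

print(classify(30, 5))    # happy: short wait
print(classify(25, 45))   # not happy: long wait
```

A non-linear algorithm would replace this weighted sum with a more flexible decision surface.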


SLIDE 46

Generalisation (1)

All machine learners generalise (the opposite of rote learning). If they didn't generalise, they wouldn't be able to:
- label unseen instances
- ignore outliers (e.g., mislabelled instances in the training set) ⇒ overfitting

However: different machine learners generalise in different ways (inductive bias).

SLIDE 47

Generalisation (2)

[Scatter plot: training data in the age/waiting-time space with a decision boundary]


SLIDE 49

Example: Decision Tree Learning (1)

Training Data for the "Play Tennis" Task

day   weather  temperature  wind  play tennis?
Tues  sun      warm         no    no
Sun   rain     cold         no    no
Mon   sun      medium       no    yes
Wed   rain     warm         yes   no
Sat   sun      warm         yes   yes
Wed   sun      warm         yes   yes
Mon   sun      warm         yes   yes
Sun   sun      warm         yes   yes

SLIDE 50

Example: Alternative Decision Trees (2)

Weather?
  sun  → Wind?
           yes → yes
           no  → Temperature?
                   medium → yes
                   warm   → no
                   cold   → no
  rain → no

SLIDE 51

Example: Alternative Decision Trees (2)

[Diagram: two further decision trees consistent with the data, one rooted at Wind (with Temperature subtrees) and one rooted at Day (with a Weather subtree)]

SLIDE 52

Example: Alternative Decision Trees (2)

Weather?
  sun  → yes
  rain → no
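The one-node tree (weather: sun → yes, rain → no) can be checked directly against the training table. Encoding the table and counting correct predictions is a sketch, with the tree hard-coded rather than induced from the data:

```python
# "Play Tennis" training table from the earlier slide.
DATA = [  # (day, weather, temperature, wind) -> play tennis?
    (("Tues", "sun",  "warm",   "no"),  "no"),
    (("Sun",  "rain", "cold",   "no"),  "no"),
    (("Mon",  "sun",  "medium", "no"),  "yes"),
    (("Wed",  "rain", "warm",   "yes"), "no"),
    (("Sat",  "sun",  "warm",   "yes"), "yes"),
    (("Wed",  "sun",  "warm",   "yes"), "yes"),
    (("Mon",  "sun",  "warm",   "yes"), "yes"),
    (("Sun",  "sun",  "warm",   "yes"), "yes"),
]

def weather_only(day, weather, temperature, wind):
    """The one-node tree: sun -> yes, rain -> no."""
    return "yes" if weather == "sun" else "no"

correct = sum(weather_only(*x) == y for x, y in DATA)
print(f"{correct}/{len(DATA)}")   # 7/8: very simple, but one instance is missed
```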

SLIDE 53

Example: Alternative Decision Trees (2)

Weather?
  sun  → Wind?
           yes → yes
           no  → Temperature?
                   medium → yes
                   warm   → no
                   cold   → no
  rain → no

Inductive bias: the simplest tree that fits the data (Occam's razor)

SLIDE 54

Applying the Classifier (1)

Assumption: the decision boundary learnt from the training set is also a good decision boundary for any test set, because the test set is drawn from the same distribution as the training set (i.e., the two sets are not fundamentally different).

[Scatter plot: test set in the age/waiting-time space, fitted by the learnt decision boundary]


SLIDE 56

Applying the Classifier (2)

What happens if the test data are differently distributed?

[Scatter plot: a test set in the age/waiting-time space drawn from a different distribution]


SLIDE 58

Semi-Supervised Machine Learning (1)

Co-training:
- two classifiers, representing different views of the data
- train both on a small labelled seed set
- apply both to a large unlabelled set
- classifier A trains classifier B and vice versa

Example: Named Entity Recognition
"Peter saw a documentary on United Cakes Ltd. It was said they were planning to buy shares in Henderson."


SLIDE 61

Co-Training: Example

[Diagram: Classifier A and Classifier B, a labelled set and an unlabelled set; the loop iterates: train both classifiers on the labelled set → apply them to the unlabelled set → select new training examples → train again]
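The loop in the diagram can be sketched with toy stand-ins: each "view" is a single feature, each classifier a lookup table, and the NE-flavoured feature values are invented. A real co-trainer would also move only the most confident predictions across, which this sketch omits:

```python
# Toy co-training loop: two one-feature "views", lookup-table classifiers.

def train_view(view, labelled):
    """Remember the first label seen for each value of this view's feature."""
    table = {}
    for features, label in labelled:
        table.setdefault(features[view], label)
    return table

labelled = [(("capitalised", "follows-Ltd"), "ORG"),   # small labelled seed set
            (("lowercase",   "no-context"),  "O")]
unlabelled = [("capitalised", "no-context"),           # large unlabelled set
              ("lowercase",   "follows-Ltd")]

for _ in range(2):                       # co-training rounds
    a = train_view(0, labelled)          # classifier A: spelling view
    b = train_view(1, labelled)          # classifier B: context view
    for inst in list(unlabelled):
        # whichever classifier can label the instance extends the shared
        # labelled set, i.e. A trains B and vice versa
        guess = a.get(inst[0]) or b.get(inst[1])
        if guess is not None:
            labelled.append((inst, guess))
            unlabelled.remove(inst)

print(len(labelled), len(unlabelled))    # 4 0: the pool has been labelled
```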

SLIDE 66

Semi-Supervised Machine Learning (2)

Self-training:
- similar to co-training, but with only one classifier
- it is not clear whether it works
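A self-training loop differs from co-training only in that a single classifier consumes its own predictions. The token-shape rule and the data below are invented around the earlier NER example sentence:

```python
# Toy self-training: one classifier (a lookup over token shape) labels
# the unlabelled pool and retrains on its own output.

def shape(token):
    return "cap" if token[0].isupper() else "low"

labelled = [("Peter", "NAME"), ("shares", "O")]   # small labelled seed set
pool = ["Henderson", "documentary"]               # unlabelled set

for _ in range(2):                                # self-training iterations
    model = {}
    for tok, lab in labelled:                     # (re)train on own labels
        model.setdefault(shape(tok), lab)
    for tok in list(pool):
        labelled.append((tok, model[shape(tok)]))  # trust own prediction
        pool.remove(tok)

print(labelled[2:])   # [('Henderson', 'NAME'), ('documentary', 'O')]
```

The loop trusts every one of its own guesses, which illustrates why self-training can amplify its own errors and why its benefit is uncertain.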