Computational linguistics and NLP: How far from generic linguistics?

Andrey Kutuzov, University of Oslo, Language Technology Group (with thanks to Joachim Nivre and Abigail See)

January 17, 2018


Contents

1. What is NLP?
2. Case 1: Redefining parts of speech
3. Case 2: Tracing diachronic semantic shifts

Defining the field

◮ Computational Linguistics (CL);
◮ Natural Language Processing (NLP);
◮ Natural Language Understanding (NLU);
◮ More or less the same academic field:
  ◮ the scientific study of language from a computational perspective.
◮ Dates back probably to medieval mystics looking for regularities in sacred texts;
◮ In the modern sense of the word, the field starts in the 20th century:
  ◮ George Zipf (studied the statistics of natural language);
  ◮ Noam Chomsky (introduced transformational grammar);
  ◮ the machine translation hype of the 1950s.

NLP/CL is booming

[Figure: the number of submissions to the annual Association for Computational Linguistics (ACL) conference]

Recent boost

◮ In the last 20 years, NLP has seen an incredible boost.
◮ The main reason is information, the 'oil of the 21st century':
  ◮ businesses want to process information (especially IT companies);
  ◮ information very often comes in the form of (digital) texts.
◮ Important: NLP is both an academic and an industrial field!
  ◮ People drift from universities to companies and back.
◮ Computational linguists contribute to many working systems:
  ◮ machine translation
  ◮ speech recognition
  ◮ web search engines
  ◮ grammar and spell checking
  ◮ virtual personal assistants (Siri, Alexa, Cortana)
  ◮ etc.

Is it linguistics at all?

Differences from 'traditional' or 'generic' linguistics:

◮ Traditional linguistics usually describes and compares languages.
◮ NLP is closer to mathematics and engineering: we calculate.
◮ We build computational models of linguistic phenomena:
  1. 'rule-based' ('hand-crafted');
  2. 'data-driven' (statistical).
◮ Statistics is at the core of today's NLP.
◮ We run experiments to test hypotheses, e.g.:
  ◮ 'there are 10 parts of speech in this language';
  ◮ 'word co-occurrence information improves document classification' (see the sketch below).
◮ Replicability: the same experiment must always yield the same result;
◮ Reproducibility: similar experiments should yield comparable results.
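As a toy illustration of such hypothesis testing (not from the slides; the corpus, categories and hyperparameters are illustrative assumptions), here is a minimal, replicable experiment checking whether adding bigram co-occurrence features improves document classification. The random seed is fixed, so rerunning it yields the same result:

```python
# Minimal sketch of a replicable NLP experiment (scikit-learn assumed;
# the 20 Newsgroups corpus and all settings are illustrative choices).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)  # fixed seed: replicability

for ngrams in [(1, 1), (1, 2)]:  # unigrams only vs. unigrams + bigrams
    vec = CountVectorizer(ngram_range=ngrams, min_df=2)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(X_train), y_train)
    pred = clf.predict(vec.transform(X_test))
    print(ngrams, round(f1_score(y_test, pred, average="macro"), 3))
```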

Stress on practice

◮ Research should be practical.
◮ 'Show me your code!'
◮ 'Show me the scores of your system!'
◮ Empirical evaluation on particular problems.
◮ Test data sets.
◮ Shared tasks (competitions).

Publishing activities

◮ Conferences:
  ◮ ACL
  ◮ EMNLP
  ◮ EACL
  ◮ NAACL
  ◮ COLING
  ◮ LREC...
◮ Journals:
  ◮ 'Computational Linguistics' (CL);
  ◮ 'Transactions of the Association for Computational Linguistics' (TACL).
◮ Unlike in other fields, journals are not that important.
◮ Most papers can be found in the Association for Computational Linguistics (ACL) Anthology:
  ◮ https://aclanthology.info/
◮ Double-blind peer review almost everywhere...
◮ ...in recent years, open preprints have been published online:
  ◮ https://arxiv.org/list/cs.CL/recent

Machine learning

◮ NLP is now being rapidly transformed by other fields:
  ◮ data science and machine learning.
◮ Some problems are so complex that we can't formulate exact algorithms for them.
◮ To solve such problems, one can use machine learning:
  ◮ programs which learn to make correct decisions on some training material and improve with experience;
  ◮ thus, we train our systems on linguistic data (usually large text collections: corpora).
◮ Artificial neural networks are one of the popular machine learning approaches for language modeling (see the sketch below).

Deep learning renaissance

◮ 'Deep learning' means training and using multi-layered artificial neural networks.
◮ After a long 'winter' (since the 1960s and 1970s), it is now popular again.
◮ Deep neural approaches are very efficient in NLP.
◮ 'Do we need anything except neural networks now?'
◮ Another reason for the recent boost of interest in our discipline.
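To make 'training a system on linguistic data' concrete, here is a minimal sketch of a neural language model (not from the slides; PyTorch is assumed, and the toy corpus, dimensionality and learning rate are arbitrary illustrations):

```python
# A tiny neural language model: learn to predict the next word from the
# previous one on a toy corpus. Everything here is an illustrative assumption.
import torch
import torch.nn as nn

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
data = torch.tensor([idx[w] for w in corpus])
x, y = data[:-1], data[1:]  # input word -> next word

model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):  # the model 'improves with experience' on the training data
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

probs = torch.softmax(model(torch.tensor([idx["the"]])), dim=-1)
print(vocab[int(probs.argmax())])  # the model's guess for the word after 'the'
```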

Problems and challenges

NLP has its problems:

◮ equity and diversity;
◮ traditional reviewing schemes conflicting with arXiv:
  ◮ how to preserve anonymity?
  ◮ preprint publishing is good at disseminating science and making it open, but...
  ◮ ...people can use arXiv for flag-planting, and to simply circumvent the peer-review process.
◮ machine learning models amplifying biases and discrimination present in the data [Zhao et al., 2017];
◮ sometimes research success depends on computational power:
  ◮ '...do we have enough GPUs?'

Science?

◮ People wonder:
  ◮ 'What are your research questions?'
  ◮ 'Just lots of numbers with very small differences?'
◮ Is it a science or an engineering discipline?
◮ Or maybe CL is a science and NLP is its application to empirical problems?
◮ Motivation for research can differ:
  1. trying to provide a computational explanation for a linguistic or psycholinguistic phenomenon;
  2. trying to provide a working component of a speech or natural language system.
◮ Do our top-tier conferences belong to CL or to NLP then?
◮ The overwhelming majority of papers today are empirical.
◮ No final answer yet.

Language IS complicated

'...human language is magnificent, and complex, and challenging. It has tons of nuances, and corners, and oddities, and surprises. While natural language processing researchers, and natural language generation researchers—and linguists! who do a lot of the heavy lifting—made some impressive advances towards our understanding of language and how to process it, we are still just barely scratching the surface on this.'

Interaction with traditional linguistics

Linguistics is back:

◮ NLP is now re-embracing linguistic structure;
◮ even the strongest proponents of purely data-driven approaches acknowledge it;
◮ linguistic structure built into machine learning systems reduces the search space, bringing improvements [Dyer, 2017];
◮ language is not just sequences of words / characters / bytes.
◮ But what can NLP give to traditional linguistics?
◮ Or to the humanities in general?
◮ I will now outline two case studies from my own research.

'Redefining parts of speech with word embeddings'

(Presented at CoNLL 2016, [Kutuzov et al., 2016])

'Grammatical categories exist along a continuum which does not exhibit sharp boundaries between the categories' [Houston, 1985]

◮ In natural languages, part-of-speech boundaries are flexible:
  ◮ participles in English are in many respects both verbs and adjectives;
  ◮ determiners and possessive pronouns overlap.
◮ Finding groups of words 'on the verge' between different PoS can reveal inconsistencies in corpus annotation.
◮ For that, we employed word embeddings (as in word2vec).

Brief recap on distributional semantic models

◮ based on distributions of word co-occurrences in large training corpora;
◮ represent word meaning as dense lexical vectors (word embeddings);
◮ words occurring in similar contexts have similar vectors;
◮ vector representations are continuous:
  ◮ words live in a common vector space and can be more or less close to each other;
  ◮ one can find the nearest semantic neighbors of a given word by calculating the cosine similarity between vectors (see the sketch below).
◮ more on word embeddings elsewhere:
  ◮ https://www.academia.edu/35685709/Teaching_computers_what_words_mean_modern_word_embedding_models
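A minimal sketch of such nearest-neighbor queries (not from the slides; gensim and a pre-trained model in word2vec format are assumed, and the file name is a placeholder):

```python
from gensim.models import KeyedVectors

# Hypothetical pre-trained embedding model in word2vec binary format:
model = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

# Cosine similarity between two word vectors:
print(model.similarity("car", "automobile"))

# The ten nearest semantic neighbors of a word in the vector space:
for word, sim in model.most_similar("linguistics", topn=10):
    print(f"{word}\t{sim:.3f}")
```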

Try yourself!

Word embedding models for English and Norwegian online: you can try our WebVectors demo service at http://vectors.nlpl.eu/explore/embeddings/ (mobile-friendly).

Case 1: Redefining parts of speech

General idea

◮ PoS can be inferred from word embeddings;
◮ it is a classification problem: word vector as input, PoS as output;
◮ words with incorrect predictions are 'outliers': their distributional patterns differ from those of other words in the same class.

Data

British National Corpus (BNC):

◮ about 89M words;
◮ we replaced words with their lemmas and Universal PoS tags:
  ◮ 'love_VERB'
  ◮ 'love_NOUN'
◮ tags used: ADJ, ADP, ADV, AUX, CONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, SCONJ, SYM, VERB, X.

Workflow

◮ Continuous Skip-gram model [Mikolov et al., 2013] to learn word embeddings from the BNC;
◮ a multinomial logistic regression classifier trained to predict PoS from the embeddings (sketched below):
  ◮ training set: the 10,000 most frequent BNC words;
  ◮ test set: the next 17,000 most frequent words;
  ◮ second test set: tokens from the Universal Dependencies treebank.
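A rough sketch of this workflow (gensim and scikit-learn assumed; the corpus file name, hyperparameters and frequency handling are placeholders, not the authors' exact setup):

```python
# Train Skip-gram embeddings on a lemma_TAG corpus, then predict the PoS tag
# of a word from its embedding with multinomial logistic regression.
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Each corpus line holds lemma_TAG tokens such as 'love_VERB' (as in the slides);
# 'bnc_lemmas_pos.txt' is a hypothetical file name.
sentences = [line.split() for line in open("bnc_lemmas_pos.txt", encoding="utf-8")]

emb = Word2Vec(sentences, sg=1, vector_size=300, window=5, min_count=5)  # sg=1: Skip-gram

# Order the vocabulary by corpus frequency; the PoS tag is the class label.
by_freq = sorted(emb.wv.key_to_index,
                 key=lambda w: emb.wv.get_vecattr(w, "count"), reverse=True)
words = [w for w in by_freq if "_" in w][:27000]
X = [emb.wv[w] for w in words]
y = [w.rsplit("_", 1)[1] for w in words]

clf = LogisticRegression(max_iter=1000)  # multinomial with the default lbfgs solver
clf.fit(X[:10000], y[:10000])            # the 10,000 most frequent words
print(clf.score(X[10000:], y[10000:]))   # accuracy on the next 17,000
```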

Classification results

Classifier                                   F-score
-----------------------------------------------------
Training set
  Baseline 1-feature classifier              0.22
  Logistic regression                        0.98
Test set
  Logistic regression                        0.91
UD Treebank test set
  Logistic regression (OOV words omitted)    0.99

Not from this crowd: analyzing outliers

Interesting errors with high frequency

◮ VERB → ADJ: a set of verbs dominantly used in the passive: 'to intertwine', 'to disillusion';
◮ NOUN → NUM reveals amounts and percentages ('£70', '33%', '$1') tagged as nouns: a controversial decision;
◮ NUM → NOUN is mostly years and decades ('the sixties') tagged in the BNC as numerals:
  ◮ in the BNC, '£50' is a noun, but '1776' is a numeral!
  ◮ possible minor inconsistencies in the annotation strategy;
  ◮ a similar problem exists in the Penn Treebank [Manning, 2011].
◮ ADV → ADJ is a systematic error: adjectives like 'plain', 'clear' or 'sharp' erroneously tagged in the corpus as adverbs.

Interesting errors with high coverage

◮ SCONJ → ADV: 'seeing' and 'immediately'. Clear tagging errors, mostly in sentence-initial positions: 'Immediately, she lowered the gun';
◮ ADP → ADJ: a separate word group: 'cross', 'pre' and 'pro' ('Did anyone encounter any trouble from Hibs fans in Edinburgh pre season?'). Closer to adjectives or adverbs than to prepositions?

Intermediate findings

1. 'Boundary cases' detected by classifying embeddings reveal sub-classes of words on the verge between different PoS.
2. We can quickly discover systematic errors or inconsistencies in PoS annotation, whether automatic or manual (see the sketch below).
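Continuing the hypothetical workflow sketch above, such outliers can be surfaced by listing confident disagreements between the classifier and the corpus tags (the 0.9 confidence threshold is an arbitrary assumption):

```python
import numpy as np

# clf, X, y and words come from the workflow sketch above.
probs = clf.predict_proba(X[10000:])
pred = clf.classes_[probs.argmax(axis=1)]
for w, gold, p, conf in zip(words[10000:], y[10000:], pred, probs.max(axis=1)):
    if p != gold and conf > 0.9:  # confident disagreement with the annotation
        print(f"{w}: tagged {gold}, classifier predicts {p} ({conf:.2f})")
```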

Embeddings as PoS predictors

What if we employ a KNN classifier instead of logistic regression?

◮ Worse: accuracy 0.913 on the training set, 0.81 on the test set;
◮ k-nearest-neighbors fails to separate the important features from all the others and uses all dimensions equally;
◮ logistic regression learns to find the relevant features.

How many features are really important for the classifier?

◮ We ranked all embedding components (features, vector dimensions) by their correlation with the PoS class;
◮ then we trained classifiers on more and more of the top-ranked features;
◮ and measured their accuracy on the training set (sketched below).
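A minimal sketch of this feature-ranking experiment (scikit-learn assumed; the slides do not name the exact correlation statistic, so the ANOVA F-test is used here as a stand-in):

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression

# X, y as in the earlier workflow sketch: embeddings and their gold PoS tags.
X = np.asarray(X)                  # shape: (n_words, 300)
scores, _ = f_classif(X, y)        # per-dimension association with the PoS class
ranked = np.argsort(scores)[::-1]  # dimensions, most relevant first

for k in [10, 50, 100, 200, 300]:  # grow the feature set from the top of the ranking
    clf = LogisticRegression(max_iter=1000).fit(X[:, ranked[:k]], y)
    print(k, clf.score(X[:, ranked[:k]], y))  # accuracy on the training set
```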

[Figure: classifier accuracy depending on the number of vector components used]

Part-of-speech affiliation is distributed among many components of the word embeddings: it is not concentrated in a few features.

Case 1: Redefining parts of speech

Summary for Case 1

◮ Word co-occurrences yield robust data about part-of-speech word clusters;
◮ this is precisely because part-of-speech boundaries are a continuum;
◮ PoS is rather a non-categorical linguistic phenomenon;
◮ this knowledge is distributed among many vector components (at least 100 in our case of a 300-dimensional model).

Armed conflicts in time

'Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants'

(Presented at EMNLP 2017, [Kutuzov et al., 2017])

General overview

We employed diachronic word embedding models in the task of temporal analogical reasoning for armed-conflict relations, spanning over 16 years (1994–2010).

[Figure: UCDP data]

Gold standard

◮ Ground truth: the UCDP/PRIO Armed Conflict Dataset (http://ucdp.uu.se/), maintained by the Uppsala Conflict Data Program and the Peace Research Institute Oslo;
◮ a manually annotated geographical and temporal dataset with information on armed conflicts all over the world from 1946 to the present [Gleditsch et al., 2002];
◮ in particular, the UCDP Conflict Termination dataset [Kreutz, 2010]:
  ◮ starting and ending dates of armed conflicts between 1994 and 2010;
  ◮ two sides in each conflict: sideA is a government, sideB is an insurgent group.
◮ The resulting test set contains 673 conflicts, with 137 unique Location–Insurgent pairs.
◮ We know which armed groups were active and when;
◮ ...now we can try to extract the same data directly from texts.

slide-166
SLIDE 166

Our task: temporal analogical reasoning

Diachronic cultural shifts and one-to-many armed conflict relations between typed named entities.

◮ Locations:

  1. India_2003
  2. India_2003
  3. Uganda_2003
  4. Iraq_2004

◮ Armed groups:

  1. Kashmir Liberation Front_2003
  2. ULFA_2003
  3. Lord’s Resistance Army_2003
  4. ???

(The correct answers: Ansar al-Islam, al-Mahdi Army and Islamic State.)
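To make the task concrete, here is the naive vector-offset phrasing of the query above (India_2003 is to ULFA_2003 as Iraq_2004 is to ???), sketched with gensim. The entity tokens and model file name are hypothetical, and the method actually used in this work is a learned linear projection (see the following slides), not a single offset:

```python
# Naive analogy baseline over a diachronic embedding model. Token and
# file names are made up for illustration.
from gensim.models import KeyedVectors

model = KeyedVectors.load("gigaword_2004.kv")  # assumed yearly model
candidates = model.most_similar(positive=["Iraq_2004", "ULFA_2003"],
                                negative=["India_2003"], topn=10)
for entity, sim in candidates:
    print(entity, round(sim, 3))
```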

slide-167
SLIDE 167

Case 2: Tracing diachronic semantic shifts

Incremental diachronic word embeddings

[Diagram: yearly corpora for 1994, 1995 and 1996 are fed into CBOW training one after another, producing model 1994 → model 1995 → model 1996, each initialized from the previous year’s model]

slide-172
SLIDE 172

Case 2: Tracing diachronic semantic shifts

The essence of the approach

◮ Diachronic CBOW word embedding models [Mikolov et al., 2013];

◮ trained incrementally on the English Gigaword news corpus [Parker et al., 2011];

◮ years 1994–2010 (yearly subcorpora of about 250–320M content words each);

◮ linear projections learned from the embeddings of locations to the embeddings of armed groups in each year;

◮ projections (transformation matrices) are applied to the model from the next year; a training-loop sketch follows below.
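A minimal sketch of this training loop, assuming gensim 4.x and plain-text yearly corpora (the file names are made up): each year's model starts from the previous year's weights, and build_vocab(update=True) expands the vocabulary with newly appearing words instead of retraining from scratch:

```python
# Incremental diachronic CBOW training: one model, updated year by year.
import gensim
from gensim.models.word2vec import LineSentence

model = gensim.models.Word2Vec(
    corpus_file="gigaword_1994.txt",  # assumed corpus file
    sg=0,                             # sg=0 selects CBOW
    vector_size=300, window=5, min_count=100)

for year in range(1995, 2011):
    corpus = LineSentence(f"gigaword_{year}.txt")
    model.build_vocab(corpus, update=True)  # add new words, keep old vectors
    model.train(corpus, total_examples=model.corpus_count,
                epochs=model.epochs)
    model.save(f"cbow_{year}.model")        # one snapshot per year
```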

slide-173
SLIDE 173

Case 2: Tracing diachronic semantic shifts

Location–insurgent relations (‘semantic directions’) in t-SNE

[Figure: t-SNE projections of the year 2000, year 2001 and year 2002 models]

slide-178
SLIDE 178

Case 2: Tracing diachronic semantic shifts

Linear projections

◮ Linear regression minimizing the error in transforming one set of vectors into another:

[Diagram: input vectors Location_1994_1 … Location_1994_n are mapped onto target vectors Armed group_1994_1 … Armed group_1994_n by solving the normal equations; the result is a linear transformation matrix (projection)]

◮ The transformation matrix learned on 1994 data can predict an armed group vector from a location vector for 1995, etc.;

◮ it can be trained either on all the conflicts from the past and present years (up-to-now)...

◮ ...or only on the salient conflicts: those active in the last year (previous).
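Concretely, learning the projection reduces to ordinary least squares over stacked vectors. A sketch under stated assumptions (rows of X are a year's location vectors, rows of Y the matching armed-group vectors):

```python
# Solve min_W ||X @ W - Y||^2: the least-squares (normal-equations)
# solution is the linear transformation matrix (projection).
import numpy as np

def learn_projection(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """X: (n, d) location vectors; Y: (n, d) armed-group vectors."""
    W, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # (d, d) projection matrix

# Prediction for the next year: project a location vector, then search
# its nearest neighbours among armed-group entities in that year's model:
# y_hat = location_vec_1995 @ W_1994
```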

slide-180
SLIDE 180

Case 2: Tracing diachronic semantic shifts

Evaluation scores

                      Only in-vocabulary pairs       All pairs, including OOV
                      up-to-now       previous       up-to-now       previous
Training mode         @1   @5   @10   @1   @5   @10   @1   @5   @10   @1   @5   @10
Separate              0.0  0.7   2.1  0.5  1.1   2.4  0.0  0.5   1.6  0.4  0.8   1.8
Cumulative            1.7  8.3  13.8  2.9  9.6  15.2  1.5  7.4  12.2  2.5  8.5  13.4
Incremental static   54.9 82.8  90.1 60.4 79.6  84.8 20.8 31.5  34.2 23.0 30.3  32.2
Incremental dynamic  32.5 64.5  72.2 42.6 64.8  71.5 28.1 56.1  62.9 37.3 56.7  62.6

◮ Average accuracies of predicting next-year armed groups from locations;

◮ three baselines and the proposed incremental dynamic approach.
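For reference, a hedged sketch of how accuracy@k can be computed for this one-to-many task (the exact scoring protocol here is an assumption): a location counts as a hit if any of its gold armed groups appears among the model's top-k ranked predictions:

```python
# accuracy@k for one-to-many gold labels, reported as a percentage.
from typing import Dict, List

def accuracy_at_k(ranked: Dict[str, List[str]],
                  gold: Dict[str, List[str]], k: int) -> float:
    hits = sum(1 for loc, preds in ranked.items()
               if any(g in preds[:k] for g in gold.get(loc, [])))
    return 100.0 * hits / len(ranked)
```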

slide-184
SLIDE 184

Case 2: Tracing diachronic semantic shifts

Summary for Case 2

  • 1. Word embeddings can be used to trace the temporal dynamics of semantic relations in word pairs.
    ◮ This can help researchers in political science and peace studies automate their data mining.
  • 2. The necessary prerequisites:
    ◮ incremental updating of the models with new textual data (not training from scratch);
    ◮ expanding the models’ vocabulary.

Now you can decide for yourself whether NLP/CL is a science or an engineering discipline :-)

slide-185
SLIDE 185

Q&A

Thank you for your attention! Questions are welcome.

Computational linguistics and NLP: How far are they from generic linguistics?

http://vectors.nlpl.eu/explore/embeddings/

Andrey Kutuzov (andreku@ifi.uio.no)
Language Technology Group, University of Oslo

slide-186
SLIDE 186

References I

Dyer, C. (2017). Should neural network architecture reflect linguistic structure? In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), page 1. Association for Computational Linguistics.

Gleditsch, N. P., Wallensteen, P., Eriksson, M., Sollenberg, M., and Strand, H. (2002). Armed conflict 1946–2001: A new dataset. Journal of Peace Research, 39(5):615–637.

Houston, A. C. (1985). Continuity and change in English morphology: The variable (ING). PhD thesis, University of Pennsylvania.

slide-187
SLIDE 187

References II

Kreutz, J. (2010). How and when armed conflicts end: Introducing the UCDP Conflict Termination dataset. Journal of Peace Research, 47(2):243–250.

Kutuzov, A., Velldal, E., and Øvrelid, L. (2016). Redefining part-of-speech classes with distributional semantic models. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 115–125. Association for Computational Linguistics.

slide-188
SLIDE 188

References III

Kutuzov, A., Velldal, E., and Øvrelid, L. (2017). Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1824–1829. Association for Computational Linguistics.

Manning, C. D. (2011). Part-of-speech tagging from 97% to 100%: Is it time for some linguistics? In Computational Linguistics and Intelligent Text Processing, pages 171–189. Springer.

slide-189
SLIDE 189

References IV

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26.

Parker, R., Graff, D., Kong, J., Chen, K., and Maeda, K. (2011). English Gigaword Fifth Edition LDC2011T07. Technical report, Linguistic Data Consortium, Philadelphia.

Zhao, J., Wang, T., Yatskar, M., Ordonez, V., and Chang, K.-W. (2017). Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2979–2989. Association for Computational Linguistics.