Computational linguistics and NLP: How far from generic linguistics?

Andrey Kutuzov
University of Oslo, Language Technology Group
with thanks to Joachim Nivre and Abigail See
January 17, 2018
Contents

1. What is NLP?
2. Case 1: Redefining parts of speech
3. Case 2: Tracing diachronic semantic shifts
Defining the field

◮ Computational Linguistics (CL);
◮ Natural Language Processing (NLP);
◮ Natural Language Understanding (NLU);
◮ More or less the same academic field:
  ◮ the scientific study of language from a computational perspective.
◮ Arguably dates back to medieval mystics looking for regularities in sacred texts;
◮ In the modern sense of the word, the field starts in the 20th century:
  ◮ George Zipf (studied the statistics of natural language);
  ◮ Noam Chomsky (introduced transformational grammar);
  ◮ the machine translation hype of the 1950s.
NLP/CL is booming

[Figure: the number of submissions to the annual Association for Computational Linguistics (ACL) conference.]
Recent boost

◮ In the last 20 years, NLP has seen an incredible boost.
◮ The main reason is information: the 'oil of the 21st century';
  ◮ businesses want to process information (especially IT companies);
  ◮ information very often comes in the form of (digital) texts.
◮ Important: it is both an academic and an industrial field!
  ◮ people drift from universities to companies and back.
◮ Computational linguists contribute to many working systems:
  ◮ machine translation
  ◮ speech recognition
  ◮ web search engines
  ◮ grammar and spell checking
  ◮ virtual personal assistants (Siri, Alexa, Cortana)
  ◮ etc.
Is it linguistics at all?

Differences from 'traditional' or 'generic' linguistics
◮ Traditional linguistics usually describes and compares languages.
◮ NLP is closer to mathematics and engineering: we calculate.
◮ We build computational models of linguistic phenomena:
  1. 'rule-based' ('hand-crafted');
  2. 'data-driven' (statistical).
◮ Statistics is at the core of today's NLP.
◮ We run experiments to test hypotheses:
  ◮ 'there are 10 parts of speech in this language',
  ◮ 'word co-occurrence information improves document classification'.
◮ Replicability (the same experiment must always yield the same result);
◮ Reproducibility (similar experiments should yield comparable results).
Stress on practice

◮ Research should be practical.
◮ 'Show me your code!'
◮ 'Show me the scores of your system!'
◮ Empirical evaluation on particular problems.
◮ Test data sets.
◮ Shared tasks (competitions).
Publishing activities

◮ Conferences:
  ◮ ACL
  ◮ EMNLP
  ◮ EACL
  ◮ NAACL
  ◮ COLING
  ◮ LREC...
◮ Journals:
  ◮ 'Computational Linguistics' (CL);
  ◮ 'Transactions of the Association for Computational Linguistics' (TACL).
◮ Unlike in other fields, journals are not that important.
Publishing activities

◮ Most of the papers can be found in the Association for Computational Linguistics (ACL) Anthology:
  ◮ https://aclanthology.info/
◮ Double-blind peer review almost everywhere...
◮ ...in recent years, open preprints are published online:
  ◮ https://arxiv.org/list/cs.CL/recent
Machine learning

◮ NLP is now being rapidly transformed by neighboring fields:
  ◮ data science and machine learning.
◮ Some problems are so complex that we can't formulate exact algorithms for them.
◮ To solve such problems, one can use machine learning:
  ◮ programs which learn to make correct decisions from training material and improve with experience;
  ◮ thus, we train our systems on linguistic data (usually large text collections: corpora).
◮ Artificial neural networks are one of the popular machine learning approaches for language modeling.

Deep learning renaissance
◮ 'Deep learning' is training and using multi-layered artificial neural networks.
◮ After a long 'winter' (since the 1960s and 1970s), it is now popular again.
◮ Deep neural approaches are very effective in NLP.
◮ 'Do we need anything except neural networks now?'
◮ Another reason for the recent boost of interest in our discipline.
Problems and challenges

NLP has its problems
◮ equity and diversity;
◮ traditional reviewing schemes conflicting with arXiv:
  ◮ how to preserve anonymity?
  ◮ preprint publishing is good at disseminating science and making it open, but...
  ◮ ...people can use arXiv for flag-planting, and to simply circumvent the peer-review process.
◮ machine learning models amplifying biases and discrimination in data [Zhao et al., 2017];
◮ sometimes research success depends on computational power:
  ◮ '...do we have enough GPUs?'
Science?

◮ People wonder:
  ◮ 'What are your research questions?'
  ◮ 'Just lots of numbers with very small differences?'
◮ Is it a science or an engineering discipline?
◮ Or maybe CL is a science and NLP is its application to empirical problems?
◮ Motivation for research can differ:
  1. trying to provide a computational explanation for a linguistic or psycholinguistic phenomenon;
  2. trying to provide a working component of a speech or natural language system.
◮ Do our top-tier conferences belong to CL or to NLP, then?
◮ The overwhelming majority of papers today are empirical.
◮ No final answer yet.
Language IS complicated

'...human language is magnificent, and complex, and challenging. It has tons of nuances, and corners, and oddities, and surprises. While natural language processing researchers, and natural language generation researchers—and linguists! who do a lot of the heavy lifting—made some impressive advances towards our understanding of language and how to process it, we are still just barely scratching the surface on this.'
Interaction with traditional linguistics

Linguistics is back
◮ NLP is re-embracing linguistic structure now;
◮ even the strongest proponents of purely data-driven approaches acknowledge it;
◮ injecting linguistic structure into machine learning systems reduces the search space, bringing improvements [Dyer, 2017];
◮ language is not just sequences of words / characters / bytes.
◮ But what can NLP give to traditional linguistics?
◮ Or to the humanities in general?
◮ I will now outline two case studies from my own research.
Contents

1. What is NLP?
2. Case 1: Redefining parts of speech
3. Case 2: Tracing diachronic semantic shifts
'Redefining parts of speech with word embeddings'

(Presented at CoNLL 2016, [Kutuzov et al., 2016])

'Grammatical categories exist along a continuum which does not exhibit sharp boundaries between the categories' [Houston, 1985]

◮ In natural languages, part-of-speech boundaries are flexible:
  ◮ participles in English are in many respects both verbs and adjectives;
  ◮ determiners and possessive pronouns overlap.
◮ Finding groups of words 'on the verge' between different PoS can reveal inconsistencies in corpus annotation.
◮ For that, we employed word embeddings (as in word2vec).
Brief recap on distributional semantic models

◮ based on distributions of word co-occurrences in large training corpora;
◮ represent word meaning as dense lexical vectors (word embeddings);
◮ words occurring in similar contexts have similar vectors;
◮ vector representations are continuous:
  ◮ words live in a common vector space and can be more or less close to each other;
  ◮ one can find the nearest semantic neighbors of a given word by calculating cosine similarity between vectors (see the sketch below).
◮ more on word embeddings elsewhere:
  ◮ https://www.academia.edu/35685709/Teaching_computers_what_words_mean_modern_word_embedding_models
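A minimal sketch of such a nearest-neighbor query, using the gensim library; the model file name and query words here are placeholders, not part of the original slides:

```python
# Minimal sketch: nearest semantic neighbors by cosine similarity,
# using gensim. The model path and query words are placeholders.
from gensim.models import KeyedVectors

# Load a pre-trained word2vec-format model (binary=True for .bin files).
model = KeyedVectors.load_word2vec_format("model.bin", binary=True)

# Words occurring in similar contexts have similar vectors, so the
# top cosine-similarity neighbors are the semantic associates.
for neighbor, sim in model.most_similar("language", topn=5):
    print(f"{neighbor}\t{sim:.3f}")

# Cosine similarity between two specific words:
print(model.similarity("language", "speech"))
```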
Try yourself!

Word embedding models for English and Norwegian online
You can try our WebVectors demo service: http://vectors.nlpl.eu/explore/embeddings/ (mobile-friendly)
Case 1: Redefining parts of speech

General idea
◮ PoS can be inferred from word embeddings;
◮ classification problem: word vector as input, PoS as output;
◮ words with incorrect predictions are 'outliers': their distributional patterns differ from other words in the same class.

Data
British National Corpus (BNC):
◮ about 89M words;
◮ we replaced words with their lemmas plus Universal PoS tags (as sketched below):
  ◮ 'love_VERB'
  ◮ 'love_NOUN'
◮ tags used: ADJ, ADP, ADV, AUX, CONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, SCONJ, SYM, VERB, X.
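For illustration, a hypothetical sketch of this 'lemma_PoS' preprocessing; the original experiments used the BNC's own annotation, while here spaCy stands in as the tagger and lemmatizer:

```python
# Hypothetical sketch of the 'lemma_PoS' token format; spaCy stands in
# for the BNC's own annotation used in the actual experiments.
import spacy

nlp = spacy.load("en_core_web_sm")

def to_lemma_pos(text):
    # Replace each word with its lemma plus Universal PoS tag,
    # e.g. 'loves' -> 'love_VERB', 'love' (the noun) -> 'love_NOUN'.
    return [f"{tok.lemma_}_{tok.pos_}" for tok in nlp(text) if not tok.is_space]

print(to_lemma_pos("She loves her new love."))
# e.g. ['she_PRON', 'love_VERB', 'her_PRON', 'new_ADJ', 'love_NOUN', '._PUNCT']
```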
Workflow

◮ Continuous Skip-gram model [Mikolov et al., 2013] to learn word embeddings from the BNC;
◮ multinomial logistic regression classifier trained to predict PoS from the embeddings (see the sketch below):
  ◮ training set: the 10,000 most frequent BNC words;
  ◮ test set: the next 17,000 most frequent words;
  ◮ second test set: tokens from the Universal Dependencies Treebank.
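A sketch of this classification setup with scikit-learn; the embedding file name and the exact slicing of the vocabulary are assumptions based on the slide, not the paper's released code:

```python
# Sketch of the classification step: word vector in, PoS tag out.
# The embedding file name is a placeholder; the splits follow the slide.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

emb = KeyedVectors.load_word2vec_format("bnc_lemma_pos.bin", binary=True)

def pos_of(token):
    # Tokens look like 'love_VERB'; the tag is the classification target.
    return token.rpartition("_")[2]

# gensim keeps the vocabulary sorted by frequency, so slicing gives the
# 10,000 most frequent words (train) and the next 17,000 (test).
vocab = emb.index_to_key
train, test = vocab[:10000], vocab[10000:27000]

X_train = np.array([emb[w] for w in train])
y_train = [pos_of(w) for w in train]
X_test = np.array([emb[w] for w in test])
y_test = [pos_of(w) for w in test]

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))

# Words where the prediction disagrees with the corpus tag are the
# 'outliers' whose distributional behavior differs from their class.
```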
Classification results

Classifier                                   F-score
Training set:
  Baseline 1-feature classifier              0.22
  Logistic regression                        0.98
Test set:
  Logistic regression                        0.91
UD Treebank test set:
  Logistic regression (OOV words omitted)    0.99
Not from this crowd: analyzing outliers

Interesting errors with high frequency
◮ VERB → ADJ: a set of verbs dominantly used in the passive: 'to intertwine', 'to disillusion';
◮ NOUN → NUM: reveals amounts and percentages ('£70', '33%', '$1') tagged as nouns: a controversial decision;
◮ NUM → NOUN: mostly years and decades ('the sixties') tagged in the BNC as numerals;
  ◮ in the BNC, '£50' is a noun, but '1776' is a numeral!
  ◮ possible minor inconsistencies in the annotation strategy;
  ◮ a similar problem exists in the Penn Treebank [Manning, 2011].
◮ ADV → ADJ: a systematic error: adjectives like 'plain', 'clear' or 'sharp' erroneously tagged in the corpus as adverbs.
Not from this crowd: analyzing outliers

Interesting errors with high coverage
◮ SCONJ → ADV: 'seeing' and 'immediately'. Clear tagging errors, mostly in initial positions: 'Immediately, she lowered the gun';
◮ ADP → ADJ: a separate word group: 'cross', 'pre' and 'pro' ('Did anyone encounter any trouble from Hibs fans in Edinburgh pre season?'). Closer to adjectives or adverbs than to prepositions?

Intermediate findings
1. 'Boundary cases' detected by classifying embeddings reveal sub-classes of words on the verge between different PoS.
2. We can quickly discover systematic errors or inconsistencies in PoS annotations, whether automatic or manual.
Embeddings as PoS predictors

What if we employ a KNN classifier instead of logistic regression?
◮ Worse: accuracy 0.913 on the training set, 0.81 on the test set;
◮ k-nearest-neighbors fails to separate the important features from all the others and uses all dimensions equally;
◮ logistic regression learns to find the relevant features.

How many features are really important for the classifier?
◮ We ranked all embedding components (features, vector dimensions) by their correlation with the PoS class;
◮ then trained classifiers on more and more of the top-ranked features (see the sketch below);
◮ and measured their accuracy on the training set.
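A sketch of this feature-ranking experiment, reusing X_train/y_train from the earlier classification sketch; the slide does not name the exact correlation measure, so scikit-learn's ANOVA F-test stands in here as an assumption:

```python
# Sketch: rank embedding components by their association with the PoS
# class, then retrain on growing subsets of top-ranked components.
# The ANOVA F-value stands in for the slide's unnamed correlation measure.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression

scores, _ = f_classif(X_train, y_train)   # one score per vector dimension
ranked = np.argsort(scores)[::-1]         # most class-relevant first

for k in (10, 50, 100, 300):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train[:, ranked[:k]], y_train)
    acc = clf.score(X_train[:, ranked[:k]], y_train)   # training accuracy, as on the slide
    print(f"top {k} components -> training accuracy {acc:.3f}")
```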
Embeddings as PoS predictors

[Figure: classifier accuracy as a function of the number of vector components used.]

Part-of-speech affiliation is distributed among many components of the word embeddings: it is not concentrated in a few features.
Case 1: Redefining parts of speech

Summary for Case 1
◮ Word co-occurrences yield robust data about part-of-speech word clusters;
◮ this is precisely because part-of-speech boundaries are a continuum;
◮ PoS is rather a non-categorical linguistic phenomenon;
◮ this knowledge is distributed among many vector components (at least 100 in our case of a 300-dimensional model).
Contents

1. What is NLP?
2. Case 1: Redefining parts of speech
3. Case 2: Tracing diachronic semantic shifts
Armed conflicts in time

'Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants'
(presented at EMNLP 2017, [Kutuzov et al., 2017])

General overview
We employed diachronic word embedding models in the task of temporal analogical reasoning for armed-conflict relations spanning over 16 years (1994–2010).

[Figure: UCDP data]
Armed conflicts in time

Gold standard
◮ Ground truth: the UCDP/PRIO Armed Conflict Dataset (http://ucdp.uu.se/), maintained by the Uppsala Conflict Data Program and the Peace Research Institute Oslo.
◮ A manually annotated geographical and temporal dataset with information on armed conflicts all over the world from 1946 to the present [Gleditsch et al., 2002].
◮ In particular, the UCDP Conflict Termination dataset [Kreutz, 2010]:
  ◮ starting and ending dates of armed conflicts between 1994 and 2010;
  ◮ two sides in each conflict: sideA is a government, sideB an insurgent group.
◮ The resulting test set contains 673 conflicts, with 137 unique Location–Insurgent pairs.
◮ We know which armed groups were active and when;
◮ ...now we can try to extract the same data directly from texts.
Our task: temporal analogical reasoning

Diachronic cultural shifts and one-to-many armed-conflict relations between typed named entities.

◮ Locations:
  1. India_2003
  2. India_2003
  3. Uganda_2003
  4. Iraq_2004
◮ Armed groups:
  1. Kashmir Liberation Front_2003
  2. ULFA_2003
  3. Lord's Resistance Army_2003
  4. ???

(the correct answers: Ansar al-Islam, al-Mahdi Army and Islamic State).
Case 2: Tracing diachronic semantic shifts

Incremental diachronic word embeddings
[Diagram: yearly corpora for 1994, 1995 and 1996 are each fed to CBOW training, producing model 1994, model 1995 and model 1996.]
Case 2: Tracing diachronic semantic shifts

The essence of the approach
◮ Diachronic CBOW word embedding models [Mikolov et al., 2013];
◮ trained incrementally on the English Gigaword news corpus [Parker et al., 2011] (see the training sketch below);
◮ years 1994–2010 (yearly subcorpora of about 250–320M content words each);
◮ linear projections learned from the embeddings of locations to the embeddings of armed groups in each year;
◮ the projections (transformation matrices) are then applied to the model from the next year.
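A sketch of this incremental regime with gensim's Word2Vec (sg=0 selects CBOW); the corpus variables and hyperparameters are placeholders, not the paper's exact settings:

```python
# Sketch of incremental diachronic training with gensim CBOW (sg=0).
# corpus_1994 etc. are placeholders: iterables of tokenized sentences.
from gensim.models import Word2Vec

model = Word2Vec(sentences=corpus_1994, sg=0, vector_size=300, window=5)
model.save("cbow_1994.model")

for year, corpus in [(1995, corpus_1995), (1996, corpus_1996)]:  # ... up to 2010
    # Expand the vocabulary with words first seen this year...
    model.build_vocab(corpus, update=True)
    # ...and continue training from the previous year's weights,
    # instead of training a new model from scratch.
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    model.save(f"cbow_{year}.model")   # one model snapshot per year
```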
Case 2: Tracing diachronic semantic shifts

[Figure: location–insurgent relations ('semantic directions') visualized with t-SNE, for the year 2000, 2001 and 2002 models.]
Case 2: Tracing diachronic semantic shifts

Linear projections
◮ Linear regression minimizing the error in transforming one set of vectors into another:

  Input vectors       Learning transformation    Target vectors        Result
  Location1994_1      solving the normal         Armed group1994_1     linear transformation
  Location1994_2      equations                  Armed group1994_2     matrix (projection)
  ...                                            ...
  Location1994_n                                 Armed group1994_n

◮ The transformation matrix learned on 1994 can predict an armed-group vector from a location vector for 1995, etc. (sketched below);
◮ it can be trained either on all the conflicts from past and present years (up-to-now)...
◮ ...or only on the salient conflicts: those active in the last year (previous).
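A sketch of the projection step under these definitions; the vector arrays are placeholders, and the plain least-squares solution (no bias term or regularization) is an assumption:

```python
# Sketch: learn a linear map W from location vectors to armed-group
# vectors by ordinary least squares, then apply it to next year's model.
# The vector arrays are placeholders; no bias/regularization is assumed.
import numpy as np

def learn_projection(locations, groups):
    # locations, groups: (n_pairs, dim) embeddings of known
    # Location -> Insurgent pairs from one year. Returns the (dim, dim)
    # matrix W minimizing ||locations @ W - groups||^2.
    W, *_ = np.linalg.lstsq(locations, groups, rcond=None)
    return W

W = learn_projection(loc_vectors_1994, group_vectors_1994)

# Predict an insurgent for a location in the next year's model: project
# the location vector, then look up its nearest neighbors among
# armed-group words, e.g. with gensim:
predicted = location_vector_1995 @ W
# candidates = model_1995.similar_by_vector(predicted, topn=10)
```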
Case 2: Tracing diachronic semantic shifts

Evaluation scores

                        Only in-vocabulary pairs           All pairs, including OOV
                        up-to-now        previous          up-to-now        previous
  Training mode         @1   @5   @10    @1   @5   @10     @1   @5   @10    @1   @5   @10
  Separate              0.0  0.7   2.1   0.5  1.1   2.4    0.0  0.5   1.6   0.4  0.8   1.8
  Cumulative            1.7  8.3  13.8   2.9  9.6  15.2    1.5  7.4  12.2   2.5  8.5  13.4
  Incremental static   54.9 82.8  90.1  60.4 79.6  84.8   20.8 31.5  34.2  23.0 30.3  32.2
  Incremental dynamic  32.5 64.5  72.2  42.6 64.8  71.5   28.1 56.1  62.9  37.3 56.7  62.6

◮ Average accuracies of predicting next-year armed groups from locations.
◮ Three baselines and the proposed incremental dynamic approach.
Case 2: Tracing diachronic semantic shifts

Summary for Case 2
1. Word embeddings can be used to trace the temporal dynamics of semantic relations in word pairs.
  ◮ This can help people in political science and peace studies automate the mining of their data.
2. The necessary prerequisites:
  ◮ incremental updating of the models with new textual data (not training from scratch);
  ◮ expanding the models' vocabulary.

Now you can decide for yourself whether NLP/CL is a science or an engineering discipline :-)
Q&A

Thank you for your attention! Questions are welcome.

Computational linguistics and NLP: How far are they from generic linguistics?
http://vectors.nlpl.eu/explore/embeddings/
Andrey Kutuzov (andreku@ifi.uio.no)
Language Technology Group, University of Oslo
References

Dyer, C. (2017). Should neural network architecture reflect linguistic structure? In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), page 1. Association for Computational Linguistics.

Gleditsch, N. P., Wallensteen, P., Eriksson, M., Sollenberg, M., and Strand, H. (2002). Armed conflict 1946–2001: A new dataset. Journal of Peace Research, 39(5):615–637.

Houston, A. C. (1985). Continuity and change in English morphology: The variable (ING). PhD thesis, University of Pennsylvania.

Kreutz, J. (2010). How and when armed conflicts end: Introducing the UCDP Conflict Termination dataset. Journal of Peace Research, 47(2):243–250.

Kutuzov, A., Velldal, E., and Øvrelid, L. (2016). Redefining part-of-speech classes with distributional semantic models. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pages 115–125. Association for Computational Linguistics.

Kutuzov, A., Velldal, E., and Øvrelid, L. (2017). Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1824–1829. Association for Computational Linguistics.

Manning, C. D. (2011). Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In Computational Linguistics and Intelligent Text Processing, pages 171–189. Springer.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26.

Parker, R., Graff, D., Kong, J., Chen, K., and Maeda, K. (2011). English Gigaword fifth edition, LDC2011T07. Technical report, Linguistic Data Consortium, Philadelphia.

Zhao, J., Wang, T., Yatskar, M., Ordonez, V., and Chang, K.-W. (2017). Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2979–2989. Association for Computational Linguistics.