Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies (PowerPoint PPT presentation)

SLIDE 1

Presenting TWITTIRÒ-UD

An Italian Twitter Treebank in Universal Dependencies

Alessandra Teresa Cignarella (a,b), Cristina Bosco (b) and Paolo Rosso (a)

a. Universitat Politècnica de València b. Università degli Studi di Torino

SLIDE 2

Motivation

SLIDE 8

Motivation

  • 1. Sentiment Analysis and Opinion Mining

→ irony, sarcasm, stance, hate speech, misogyny...

  • 2. Dealing with social media texts

→ hard!!

  • 3. Syntax

→ Universal Dependencies are cool!

SLIDE 9

Research Questions

SLIDE 14

Research Questions

  • 1. How can we automatically detect irony?
  • 2. Could syntactic information help in the detection of irony?

...and maybe help in other detection tasks too?

Our approach:

Let’s build a corpus and find out!

SLIDE 15

What is TWITTIRÒ-UD?

SLIDE 20

What is TWITTIRÒ-UD?

A treebank of Italian tweets in Universal Dependencies, annotated for irony and sarcasm.

SLIDE 21

Related Work

SLIDE 23

Related Work

Social media & Twitter:

  • Tagging the Twitterverse (Foster et al., 2011)
  • The French Social Media Bank (Seddah et al., 2012)
  • TWEEBANK (Kong et al., 2014)
  • TWEEBANK v2 (Liu et al., 2018)
  • Arabic (Albogamy and Ramsay, 2017)
  • African-American English (Blodgett et al., 2018)
  • Hindi-English (Bhat et al., 2018)
SLIDE 28

Related Work

Two main references for our work:

  • UD_Italian treebank (Simi et al., 2014)
  • PoSTWITA-UD (Sanguinetti et al., 2018)
SLIDE 29

Data

SLIDE 34

Data

  • 1,424 tweets from TWITTIRÒ (Cignarella et al., 2018)
  • fine-grained irony annotation (Karoui et al., 2017)
  • sarcasm annotation (EVALITA 2018)

Irony types: 1. EXPLICIT, 2. IMPLICIT
Irony categories: 1. ANALOGY, 2. EUPHEMISM, 3. RHETORICAL QUESTION, 4. OXYMORON or PARADOX, 5. FALSE ASSERTION, 6. CONTEXT SHIFT, 7. HYPERBOLE or EXAGGERATION, 8. OTHER

SLIDE 35

Annotation

SLIDE 39

Annotation

# text = Presentato il nuovo iPhone. È già al 36% di batteria.
# irony = EXPLICIT OXYMORON/PARADOX
# sarcasm = 1

Translation: The new iPhone has been launched. Battery is already at 36%.
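The annotation above uses CoNLL-U-style sentence-level comment lines. A minimal sketch of how such metadata can be read back, assuming `# key = value` comments as shown on the slide (the function and sample are illustrative, not an official CoNLL-U library API):

```python
# Minimal sketch: extract sentence-level metadata ("# key = value" comment
# lines) from one CoNLL-U sentence block. The keys (text, irony, sarcasm)
# mirror the slide; token lines are ignored.

def parse_metadata(conllu_block: str) -> dict:
    """Return a dict of the '# key = value' comments in one sentence block."""
    meta = {}
    for line in conllu_block.splitlines():
        line = line.strip()
        if line.startswith("#") and "=" in line:
            key, _, value = line.lstrip("# ").partition("=")
            meta[key.strip()] = value.strip()
    return meta

tweet = """\
# text = Presentato il nuovo iPhone. È già al 36% di batteria.
# irony = EXPLICIT OXYMORON/PARADOX
# sarcasm = 1
1\tPresentato\tpresentare\tVERB
"""

meta = parse_metadata(tweet)
```

Each annotation layer then stays attached to its tweet without altering the token lines themselves.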

SLIDE 40

Data

SLIDE 45

Data

With the tool UDPipe:

  • tokenization
  • lemmatization
  • PoS-tagging
  • dependency parsing

→ 1,424 tweets (17,933 tokens)

Full release in the UD repository: November 2019
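Corpus-level counts like those above (tweets and tokens) fall out directly from the CoNLL-U layout: sentences are blank-line-separated blocks, and token lines carry a plain integer ID. A sketch under those assumptions (the inline sample is illustrative, not corpus data):

```python
# Sketch: count sentences and tokens in CoNLL-U text. Multiword ranges
# ("1-2"), empty nodes ("1.1"), and comment lines are not counted as tokens.

def count_conllu(text: str) -> tuple[int, int]:
    sentences, tokens = 0, 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            in_sentence = False      # blank line ends the current block
            continue
        if not in_sentence:
            sentences += 1           # first non-blank line opens a block
            in_sentence = True
        first = line.split("\t", 1)[0]
        if first.isdigit():          # skips comments, "1-2", "1.1"
            tokens += 1
    return sentences, tokens

sample = (
    "# text = ciao mondo\n"
    "1\tciao\tciao\tINTJ\n"
    "2\tmondo\tmondo\tNOUN\n"
    "\n"
    "# text = xkè?\n"
    "1-2\txkè?\t_\t_\n"
    "1\txkè\tperché\tADV\n"
    "2\t?\t?\tPUNCT\n"
)

counts = count_conllu(sample)
```

On the sample this yields 2 sentences and 4 tokens; note the multiword line `1-2` is correctly excluded.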

SLIDE 51

Data

  • 1. Fine-grained annotation for irony
  • 2. Morpho-syntactic information
SLIDE 52

Issues Encountered and Lessons Learned

SLIDE 60

Issues Encountered and Lessons Learned

  • Tokenization errors caused by misspelled words (e.g. xkè → perché)
  • Punctuation used irregularly
  • Twitter marks (#hashtags, @mentions)
  • No sentence splitting
  • Single-root constraint
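Two of the issues above (misspellings and Twitter marks) are commonly handled before parsing. A sketch, assuming a small normalisation lexicon (only xkè → perché comes from the slides; further entries would be guesses) and a regex tokenizer that keeps hashtags and mentions as single tokens:

```python
import re

# Sketch of Twitter-aware preprocessing: normalise known misspellings and
# keep #hashtags / @mentions whole. Not the authors' actual pipeline.

NORMALIZE = {"xkè": "perché"}  # slide example; extend with care

# One token = a #hashtag/@mention, or a word, or a single punctuation mark.
TOKEN_RE = re.compile(r"[#@]\w+|\w+|[^\w\s]")

def tokenize(text: str) -> list[str]:
    tokens = TOKEN_RE.findall(text)
    return [NORMALIZE.get(t.lower(), t) for t in tokens]

result = tokenize("xkè no?? @utente #irony")
```

Irregular punctuation still comes out as separate tokens (each `?` on its own), which keeps the choice of how to attach it to the dependency tree downstream.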

SLIDE 67

Other Highlights

SLIDE 70

Other Highlights

  • Punctuation is exploited more extensively in the two social media datasets than in UD_Italian.
  • Mentions and hashtags have a similar distribution in the two social media datasets.
  • The use of the passive voice (aux:pass) is low in PoSTWITA-UD and TWITTIRÒ-UD, indicating a preference for the active voice, as in spoken language.
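Distributional observations like the aux:pass comparison above reduce to relative frequencies of a dependency relation over token lines. A sketch over CoNLL-U text (column 8 is DEPREL; the inline sample is illustrative, not treebank data):

```python
# Sketch: relative frequency of one dependency relation (DEPREL, field 8)
# over all token lines of a CoNLL-U fragment.

def deprel_frequency(conllu: str, deprel: str) -> float:
    total = hits = 0
    for line in conllu.splitlines():
        fields = line.split("\t")
        if len(fields) >= 8 and fields[0].isdigit():
            total += 1
            hits += fields[7] == deprel
    return hits / total if total else 0.0

sample = "\n".join([
    "1\tÈ\tessere\tAUX\t_\t_\t3\taux:pass\t_\t_",
    "2\tstato\tessere\tAUX\t_\t_\t3\taux:pass\t_\t_",
    "3\tpresentato\tpresentare\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

punct_rate = deprel_frequency(sample, "punct")
```

Running the same function over each of the three treebanks would reproduce the kind of cross-corpus comparison summarised in the bullets.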

SLIDE 71

A Parsing Experiment

SLIDE 76

A Parsing Experiment

We evaluated UDPipe using the TWITTIRÒ-UD gold corpus as a test set, under the following settings:

  • 1. training UDPipe using only UD_Italian
  • 2. training UDPipe using only PoSTWITA-UD
  • 3. training UDPipe using both resources
SLIDE 80

A Parsing Experiment

Results are in line with the state of the art (PoSTWITA-UD; Sanguinetti et al., 2018).
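The standard metrics behind such parsing evaluations are UAS (share of tokens with the correct head) and LAS (correct head and relation label). A minimal sketch; the gold/predicted pairs are invented for illustration, not experiment data:

```python
# Sketch: unlabeled and labeled attachment scores over aligned token lists,
# where each token is a (head, deprel) pair.

def uas_las(gold, pred):
    """gold, pred: lists of (head, deprel) per token, aligned one-to-one."""
    assert len(gold) == len(pred) and gold
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "punct")]

scores = uas_las(gold, pred)
```

Here every head is correct but one label differs, so UAS is 1.0 while LAS drops to 0.75, which is exactly the gap the labeled metric is meant to expose.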

SLIDE 81

Conclusions

SLIDE 84

Conclusions

  • We discuss the annotation of this resource, which encompasses a fine-grained representation of irony and the UD morpho-syntactic analysis.
  • The release of the complete resource (1,424 tweets) is planned for November 2019.
  • It enriches the set of available resources for a text genre that is especially hard to parse (social media texts).

SLIDE 85

Future Work

SLIDE 89

Future Work

  • Investigation of possible relationships between the syntax and the semantics of figurative language (irony in particular)

→ ongoing experiments...

  • A resource whose annotation encompasses both UD relations and a fine-grained description of irony may pave the way for investigating whether syntactic knowledge can help in Sentiment Analysis and other related tasks

→ new NLP features for Sentiment Analysis?

SLIDE 90

Thank you!

cigna@di.unito.it