Measuring Semantic Coherence of a Conversation (Svitlana Vakulenko et al.), presentation slides




SLIDE 1

Measuring Semantic Coherence of a Conversation

Svitlana Vakulenko, Maarten de Rijke, Michael Cochez, Vadim Savenkov, Axel Polleres

6 JUNE 2018

SLIDE 2

Semantic coherence

  • An essential property of a conversation, “continuity of senses”

https://pixabay.com/en/fishing-net-red-thread-network-node-1526496/

SLIDE 3

Research goal

▪ See if we can detect holes in conversations
▪ Evaluate existing knowledge models
▪ Propose an approach to measure these holes (incoherence)
▪ Why: dialogue system design, knowledge engineering

SLIDE 4

Semantic models

▪ Knowledge Graphs

https://commons.wikimedia.org/wiki/File:Wikidata-gun-ontology-2017-05-11.png

SLIDE 5

Semantic models

▪ Word embeddings

https://commons.wikimedia.org/wiki/File:2016_02_mini_embedding.png

SLIDE 6

Semantic models

▪ Knowledge Graphs
▪ Word embeddings

https://commons.wikimedia.org/wiki/File:Wikidata-gun-ontology-2017-05-11.png
https://commons.wikimedia.org/wiki/File:2016_02_mini_embedding.png

SLIDE 7

Semantic models

▪ Knowledge Graphs
▪ Word embeddings
▪ Knowledge Graph embeddings

https://commons.wikimedia.org/wiki/File:Wikidata-gun-ontology-2017-05-11.png
https://commons.wikimedia.org/wiki/File:2016_02_mini_embedding.png
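The two families of semantic models named on these slides can be contrasted with a small sketch. A knowledge graph stores explicit links between entities, while an embedding places items in a vector space where closeness stands in for relatedness. All triples, vectors and helper names below are hypothetical toy values chosen for illustration; none of them come from DBpedia, Wikidata or a trained model.

```python
import math

# Hypothetical toy triples in the style of a knowledge graph (subject, predicate, object).
TRIPLES = [
    ("Gedit", "genre", "Text_editor"),
    ("Gedit", "partOf", "GNOME"),
    ("GNOME", "type", "Desktop_environment"),
]


def neighbours(entity):
    """Entities directly linked to `entity`, ignoring edge direction."""
    linked = set()
    for subject, _, obj in TRIPLES:
        if subject == entity:
            linked.add(obj)
        elif obj == entity:
            linked.add(subject)
    return linked


# Hypothetical 3-dimensional embeddings; trained models use far more dimensions.
EMBEDDINGS = {
    "gedit": [0.9, 0.1, 0.0],
    "editor": [0.8, 0.2, 0.1],
    "kernel": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


if __name__ == "__main__":
    print(sorted(neighbours("Gedit")))
    print(cosine(EMBEDDINGS["gedit"], EMBEDDINGS["editor"]))
    print(cosine(EMBEDDINGS["gedit"], EMBEDDINGS["kernel"]))
```

In the graph view, relatedness is read off explicit edges; in the embedding view, it is a similarity score between vectors. Knowledge Graph embeddings combine the two by learning vectors for graph entities.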

SLIDE 8

Linking dialogue

▪ Take existing knowledge models
▪ See if we can detect holes in conversations through these models
▪ Propose an approach to measure these holes (incoherence)

https://pxhere.com/en/photo/1101883

SLIDE 9

VAKULENKO ET AL. MEASURING SEMANTIC COHERENCE OF A CONVERSATION.

Dialog graph

[Figure: dialog graph with nodes p1, p2; u1 to u4; w1 to w5; c1 to c4 and c*]

mdg: gksudo gedit /etc/apt/source.list (type from command line)
crunchbang666: the text editor has opened the file source.list but there is no content i typed source instead of sources ... ok so i have it open
mdg: see the line # deb http://gb.archive.ubuntu all you have to do is delete the "#" character
crunchbang666: just the deb or the deb-src line too?

Entities: dbr:Ubuntu(OS), dbr:Deb(file format), dbr:Text editor, dbr:Gedit, dbr:GNOME
Edge labels: wikiPageWikiLink, genre
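The dialog graph on this slide can be sketched as a small layered structure. The sketch assumes that the node labels stand for participants (p), utterances (u), words (w) and concepts (c); the slide does not spell this out. The mention-to-concept table and both function names are hypothetical, and the substring matching is a crude stand-in for entity linking.

```python
from collections import defaultdict

# Utterances from the slide (the second one abridged), paired with their speaker.
UTTERANCES = [
    ("mdg", "gksudo gedit /etc/apt/source.list (type from command line)"),
    ("crunchbang666", "the text editor has opened the file source.list"),
    ("mdg", "see the line # deb http://gb.archive.ubuntu"),
    ("crunchbang666", "just the deb or the deb-src line too?"),
]

# Assumed mention-to-concept links, using the entity labels printed on the slide.
MENTIONS = {
    "gedit": "dbr:Gedit",
    "text editor": "dbr:Text editor",
    "ubuntu": "dbr:Ubuntu(OS)",
    "deb": "dbr:Deb(file format)",
}


def build_dialog_graph(utterances, mentions):
    """Link participant -> utterance -> word -> concept as typed nodes."""
    graph = defaultdict(set)
    for index, (speaker, text) in enumerate(utterances, start=1):
        utterance = ("utterance", f"u{index}")
        graph[("participant", speaker)].add(utterance)
        graph.setdefault(utterance, set())  # keep utterances without mentions
        lowered = text.lower()
        for mention, concept in mentions.items():
            # Substring match: simplistic, would also match e.g. "debug".
            if mention in lowered:
                graph[utterance].add(("word", mention))
                graph[("word", mention)].add(("concept", concept))
    return dict(graph)


def concepts_per_utterance(graph):
    """Collect the concepts reachable from each utterance through its words."""
    result = {}
    for node, targets in graph.items():
        if node[0] == "utterance":
            result[node[1]] = sorted(
                concept[1]
                for word in targets
                for concept in graph.get(word, ())
            )
    return result


if __name__ == "__main__":
    graph = build_dialog_graph(UTTERANCES, MENTIONS)
    for utterance, concepts in sorted(concepts_per_utterance(graph).items()):
        print(utterance, concepts)
```

Once utterances are tied to concepts this way, the concepts can be connected to each other through a knowledge graph, which is where the relations such as wikiPageWikiLink on the slide come in.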

SLIDE 10

Experiments

▪ Ubuntu Dialogue Corpus
▪ DBpedia Spotlight API
▪ Knowledge Graphs: DBpedia+Wikidata HDT
▪ Knowledge Graph embeddings: RDF2Vec, KGloVe
▪ Word embeddings: word2vec, GloVe

https://github.com/rkadlec/ubuntu-ranking-dataset-creator
https://en.wikipedia.org/wiki/File:DBpediaSpotlight.jpg
https://en.wikipedia.org/wiki/Wikidata
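As an illustration of the entity-linking step, the sketch below prepares a request to the public DBpedia Spotlight annotation service and extracts entity URIs from a JSON reply. The endpoint URL, the default confidence value, the helper names and the sample reply are assumptions made for this sketch; the slides do not say which endpoint or settings were used in the experiments.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; a self-hosted Spotlight instance would use another URL.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Prepare (but do not send) an annotation request that asks for JSON."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def extract_entities(response):
    """Return (surface form, URI) pairs from a parsed Spotlight JSON reply."""
    return [
        (resource["@surfaceForm"], resource["@URI"])
        for resource in response.get("Resources", [])
    ]


def annotate(text, confidence=0.5):
    """Send the request; needs network access, so it is not called below."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=10) as reply:
        return extract_entities(json.load(reply))


# Hand-written reply in the shape of a Spotlight response, for offline illustration.
SAMPLE_RESPONSE = {
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Gedit", "@surfaceForm": "gedit"},
    ]
}

if __name__ == "__main__":
    print(build_request("gksudo gedit /etc/apt/source.list").full_url)
    print(extract_entities(SAMPLE_RESPONSE))
```

A reply without a `Resources` key yields an empty list, so utterances with no linked entities are handled without special casing.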

SLIDE 11

Subgraph induction

SLIDE 12

top-k shortest path

PREFIX ppf: <java:at.ac.wu.arqext.path.>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT * WHERE {
  ?X ppf:topk ("--source" dbr:Directory_service dbr:Gnome dbr:GNOME dbr:Desktop_environment
               "--target" dbr:Desktop_computer
               "--k" 5 "--maxlength" 9 "--timeout" 2000)
}

http://wikidata.communidata.at
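The idea behind the query above, finding the k shortest paths from a set of source entities to a target entity, can be sketched in plain Python on a toy graph. This is not the implementation behind `ppf:topk`: the breadth-first enumeration, the reading of the maximum length as a number of edges, and the toy edges are all assumptions, and the timeout parameter is left out.

```python
from collections import deque


def top_k_shortest_paths(graph, sources, target, k, max_length):
    """Return up to k shortest simple paths (by edge count) from any source to target.

    Paths are explored breadth-first, so they come out in order of
    non-decreasing length. Suitable for small graphs only.
    """
    found = []
    queue = deque([source] for source in sources)
    while queue and len(found) < k:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_length:
            continue  # extending this path would exceed the edge limit
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [neighbour])
    return found


# Hypothetical undirected toy graph; the edges are not taken from DBpedia.
GRAPH = {
    "dbr:GNOME": {"dbr:Desktop_environment", "dbr:Gedit"},
    "dbr:Desktop_environment": {"dbr:GNOME", "dbr:Desktop_computer"},
    "dbr:Gedit": {"dbr:GNOME", "dbr:Text_editor"},
    "dbr:Text_editor": {"dbr:Gedit", "dbr:Desktop_computer"},
    "dbr:Desktop_computer": {"dbr:Desktop_environment", "dbr:Text_editor"},
}

if __name__ == "__main__":
    paths = top_k_shortest_paths(
        GRAPH, ["dbr:GNOME"], "dbr:Desktop_computer", k=2, max_length=9
    )
    for path in paths:
        print(" -> ".join(path))
```

Passing several sources, as the query does, seeds the search with one starting path per source, so the shortest connection from any of them to the target is returned first.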

SLIDE 13

Subgraph statistics

SLIDE 14

Shortest paths

SLIDE 15

Negative sampling

▪ random uniform (RUf)
▪ vocabulary distribution (VoD)
▪ sequence disorder (SqD)
▪ horizontal split (HSp)
▪ vertical split (VSp)
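The sampling strategies listed on this slide are named but not defined, so the sketch below rests on assumed readings of four of them: uniform draws from the vocabulary (RUf), draws weighted by corpus frequency (VoD), shuffling a real dialogue (SqD), and joining the first half of one dialogue to the second half of another (HSp). Vertical split is left out because it appears to depend on speaker information that the slide does not describe. All function names are hypothetical, and dialogues are represented as plain lists of entity labels.

```python
import random
from collections import Counter


def random_uniform(vocabulary, length, rng):
    """RUf (assumed reading): draw entities uniformly from the vocabulary."""
    vocab = sorted(vocabulary)  # sorted for reproducible draws
    return [rng.choice(vocab) for _ in range(length)]


def vocabulary_distribution(dialogues, length, rng):
    """VoD (assumed reading): draw entities in proportion to corpus frequency."""
    counts = Counter(entity for dialogue in dialogues for entity in dialogue)
    entities = sorted(counts)
    weights = [counts[entity] for entity in entities]
    return rng.choices(entities, weights=weights, k=length)


def sequence_disorder(dialogue, rng):
    """SqD (assumed reading): keep the entities, shuffle their order."""
    shuffled = list(dialogue)
    rng.shuffle(shuffled)
    return shuffled


def horizontal_split(first, second):
    """HSp (assumed reading): first half of one dialogue, second half of another."""
    return first[: len(first) // 2] + second[len(second) // 2:]


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [
        ["dbr:Gedit", "dbr:Text editor", "dbr:Ubuntu(OS)", "dbr:Deb(file format)"],
        ["dbr:vlc", "dbr:totem", "dbr:desktop", "dbr:Ubuntu(OS)"],
    ]
    vocabulary = {entity for dialogue in corpus for entity in dialogue}
    print(random_uniform(vocabulary, 4, rng))
    print(vocabulary_distribution(corpus, 4, rng))
    print(sequence_disorder(corpus[0], rng))
    print(horizontal_split(corpus[0], corpus[1]))
```

Under these readings the strategies differ in difficulty: random draws share little with real dialogues, while a shuffled or spliced dialogue keeps realistic vocabulary and disturbs only order or topic continuity.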

SLIDE 16

Shortest paths

SLIDE 17

Binary classification

▪ Convolutional Neural Network (CNN)
▪ Input: sequence of words/entities
▪ Output: coherence score in [0, 1]

[Figure: word embeddings -> convolutional layer (250 filters, size 3, step 1) -> max pool -> hidden layer -> output (example score 0.8); activations shown: ReLU, ReLU, sigmoid]
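The layer stack in the figure can be traced with a dependency-free forward pass. This is a sketch of the data flow only: the weights are random and untrained, biases are omitted, the hidden layer size is a made-up value, and the placement of the activations (ReLU after the convolution and the hidden layer, sigmoid at the output) is an assumption read off the figure labels. The class name `TinyCoherenceCNN` is hypothetical.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    # Numerically stable form for large positive or negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyCoherenceCNN:
    """Embeddings -> 1D convolution -> max pool -> hidden layer -> sigmoid score."""

    def __init__(self, embedding_dim, filters=250, kernel_size=3, hidden_units=16, seed=0):
        rng = random.Random(seed)
        self.kernel_size = kernel_size
        # Each filter spans kernel_size consecutive embeddings, flattened.
        self.conv = random_matrix(filters, kernel_size * embedding_dim, rng)
        self.hidden = random_matrix(hidden_units, filters, rng)
        self.output = [rng.uniform(-0.5, 0.5) for _ in range(hidden_units)]

    def score(self, embedded_sequence):
        if len(embedded_sequence) < self.kernel_size:
            raise ValueError("sequence is shorter than the convolution window")
        # Convolution with step 1: one flattened window per position.
        windows = [
            [x for vector in embedded_sequence[i:i + self.kernel_size] for x in vector]
            for i in range(len(embedded_sequence) - self.kernel_size + 1)
        ]
        # ReLU on each filter response, then max pooling over all positions.
        pooled = [max(relu(dot(f, w)) for w in windows) for f in self.conv]
        hidden = [relu(dot(row, pooled)) for row in self.hidden]
        return sigmoid(dot(self.output, hidden))


if __name__ == "__main__":
    rng = random.Random(1)
    # Five hypothetical 4-dimensional embeddings standing in for a dialogue.
    sequence = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    print(TinyCoherenceCNN(embedding_dim=4).score(sequence))
```

With untrained weights the score carries no meaning; training on positive dialogues and sampled negatives is what would turn it into a coherence estimate. Max pooling over positions is what lets the same network accept sequences of different lengths.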

SLIDE 18

Binary classification

▪ Convolutional Neural Network (CNN)
▪ Input: sequence of words/entities
▪ Output: coherence score in [0, 1]

[Figure: Knowledge Graph embeddings -> convolutional layer (250 filters, size 3, step 1) -> max pool -> hidden layer -> output (example score 0.8); activations shown: ReLU, ReLU, sigmoid]

Example input entities: dbr:ubuntu (OS), dbr:desktop, dbr:totem, dbr:vlc, dbr:fsck, dbr:ext2, dbr:partition

SLIDE 19

Results

SLIDE 20

Random uniform

SLIDE 21

Horizontal split

SLIDE 22

Semantic spaces

SLIDE 23

Conclusions and future work

▪ GloVe word embeddings show the best performance
▪ Integrating heterogeneous knowledge sources

SLIDE 24

Conclusions and future work

▪ Named entity linking (NEL) is a bottleneck for Knowledge Graph embeddings
▪ End-to-end training (NEL as a neural network layer)

[Figure: the CNN over Knowledge Graph embeddings shown on slide 18]

SLIDE 25

Conclusions and future work

▪ Dialog graph embeddings

[Figure: the dialog graph and example conversation shown on slide 9]