Knowledge Extraction in Text, by Marco Ponza



slide-1
SLIDE 1

Knowledge Extraction in Text

Marco Ponza

slide-2
SLIDE 2

Who am I? Last-year PhD student...

Advanced Algorithms & Applications Lab

supervisor: Prof. Paolo Ferragina

RESEARCH

▷ Data Compression
▷ Searching & Mining
▷ Web & Social Media
▷ Natural Language Understanding

Timeline

▷ Nov 2015: Started PhD
▷ Aug 2017: Moved to Germany
▷ Oct 2018: PhD Thesis Submitted
▷ Mar 2019: PhD Defense

slide-3
SLIDE 3

Natural Language Understanding

Easy for humans, hard for machines.

But today machines need to access, read and understand information stored in very large data archives...

...and this will become more and more crucial with Conversational AI systems!

slide-4
SLIDE 4

Natural Language Understanding

▷ Machines represent texts by their (possibly ambiguous) words

Leonardo is the scientist who painted Mona Lisa

Leonardo scientist paint Mona Lisa
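The word-level representation on this slide can be sketched in a few lines. This is only an illustration, not the pipeline used in the talk: the stop-word list `STOP_WORDS` and the lemma table `LEMMAS` are hypothetical and just large enough to cover the slide's sentence.

```python
import re

# Hypothetical resources, sized for the slide's example only.
STOP_WORDS = {"is", "the", "who", "a", "an", "of"}
LEMMAS = {"painted": "paint"}


def bag_of_words(text: str) -> list[str]:
    """Drop stop words and map the remaining tokens to their lemma, if known."""
    tokens = re.findall(r"[A-Za-z]+", text)
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    return [LEMMAS.get(t.lower(), t) for t in kept]


words = bag_of_words("Leonardo is the scientist who painted Mona Lisa")
print(words)
```

Nothing in the resulting list says which "Leonardo" is meant, which is the ambiguity the talk goes on to address.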

slide-5
SLIDE 5


slide-6
SLIDE 6

Natural Language Understanding

▷ Machines represent texts by their (possibly ambiguous) words

Leonardo is the scientist who painted Mona Lisa

Leonardo scientist paint Mona Lisa

Leonardo: Leonardo DiCaprio, Leonardo (Ninja Turtle), Leonardo da Vinci, Leonardo (Town)

slide-7
SLIDE 7

Knowledge Graph

May 2012

https://www.blog.google/products/search/introducing-knowledge-graph-things-not

slide-8
SLIDE 8
slide-9
SLIDE 9

Understanding the Text by Entities, not Strings

Map ambiguous words to the real-world entities they refer to, and contextualize them with related entities

Leonardo is the scientist who painted Mona Lisa

Entities: Leonardo da Vinci, Mona Lisa (painting)

Related entities: Louvre, Science, Cartography, Art, Italy, Florence, Renaissance
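As a rough illustration of mapping an ambiguous word to an entity, the sketch below picks, among the candidate meanings of "Leonardo" listed earlier in the deck, the one whose description shares the most words with the sentence. The `CANDIDATES` table and its keyword sets are invented for this example, and keyword overlap is a deliberate simplification; it is not the disambiguation method of the systems presented in the talk.

```python
# Hypothetical candidate table: mention -> {entity: words that describe it}.
CANDIDATES = {
    "Leonardo": {
        "Leonardo da Vinci": {"scientist", "painter", "paint", "renaissance", "mona", "lisa"},
        "Leonardo DiCaprio": {"actor", "film", "oscar"},
        "Leonardo (Ninja Turtle)": {"turtle", "ninja", "cartoon"},
        "Leonardo (Town)": {"town", "county"},
    },
}


def link(mention: str, context_words: list[str]) -> str:
    """Return the candidate entity whose keywords overlap most with the context."""
    context = {w.lower() for w in context_words}
    candidates = CANDIDATES[mention]
    return max(candidates, key=lambda entity: len(candidates[entity] & context))


entity = link("Leonardo", ["Leonardo", "scientist", "paint", "Mona", "Lisa"])
print(entity)
```

With a different context, for example words about films, the same mention would resolve to a different entity, which is the point of working with entities rather than strings.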

slide-10
SLIDE 10

Understanding the Text by Entities, not Strings

Map ambiguous words to the real-world entities they refer to, and contextualize them with related entities

Since 2010: Efficient & effective solutions for this problem!

Two Editions: 2010 & 2013

slide-11
SLIDE 11

Understanding the Text by Salient Entities

Hillary Clinton, Barack Obama, George W Bush, Hawaii, People

...and more and more entities!

slide-12
SLIDE 12

Understanding the Text by Salient Entities

▷ Entity Salience Problem

Relevant vs Non-Relevant Entities

Hillary Clinton, Barack Obama, George W Bush

People, Hawaii

...and more and more entities!

slide-13
SLIDE 13

Understanding the Text by Salient Entities

▷ Entity Salience Problem

Relevant vs Non-Relevant Entities

Hillary Clinton, Barack Obama, George W Bush, Hawaii, People

...and more and more entities!

▷ Our Solution
○ Improvements of +12% wrt CMU/Google system
○ Published at

Research Grant 2017

slide-14
SLIDE 14

Understanding the Text by Salient Entities

▷ How? Applying Graph Theory and Algorithms!
○ The graph built from the text is used to derive new features
○ Entities are classified as salient/non-salient via ML

▷ Problem: How can we weight the edges?
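To make the idea of graph-derived features concrete, here is a minimal sketch that treats the entities of one document as nodes, connects them with weighted edges, and uses each node's weighted degree as a feature. Everything specific is an assumption made for illustration: the edge weights in `EDGES` are invented relatedness scores, and weighted degree is just one simple centrality measure, not necessarily the feature set or weighting scheme of the published system.

```python
from collections import defaultdict

# Hypothetical relatedness scores in [0, 1] between entities of one document.
EDGES = [
    ("Barack Obama", "Hillary Clinton", 0.9),
    ("Barack Obama", "George W Bush", 0.8),
    ("Hillary Clinton", "George W Bush", 0.7),
    ("Barack Obama", "Hawaii", 0.3),
    ("People", "Hawaii", 0.1),
]


def weighted_degree(edges):
    """Sum the weights of the edges incident to each node."""
    score = defaultdict(float)
    for u, v, w in edges:
        score[u] += w
        score[v] += w
    return dict(score)


features = weighted_degree(EDGES)
# Entities ordered from most to least central in the graph.
ranking = sorted(features, key=features.get, reverse=True)
print(ranking)
```

In a full pipeline such scores would be one input among many to a trained classifier; how the edge weights are chosen is exactly the open question the slide raises.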

slide-15
SLIDE 15

Understanding the Text by Extraction of Facts

EMNLP 2018, Brussels

▷ How to enrich the Knowledge Graph with information that comes from a text?
○ Identify salient entities
○ Extract facts connecting them

Text → Facts → Knowledge Graph

Text: Leonardo is the scientist who painted Mona Lisa

Facts: (“Leonardo”, “is”, “scientist”), (“Leonardo”, “painted”, “Mona Lisa”)
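The two steps on this slide can be sketched as follows, using the slide's own triples. The `salient` set stands in for the output of a salience step, and the dictionary-based graph and the `enrich` helper are hypothetical names chosen for this example; filtering on the subject alone is a simplifying assumption, not the method presented at EMNLP.

```python
from collections import defaultdict

# Facts as (subject, predicate, object) triples, taken from the slide.
facts = [
    ("Leonardo", "is", "scientist"),
    ("Leonardo", "painted", "Mona Lisa"),
]

# Assumed output of the salience step.
salient = {"Leonardo", "Mona Lisa"}


def enrich(graph, facts, salient):
    """Add each fact whose subject is a salient entity as an outgoing edge."""
    for subject, predicate, obj in facts:
        if subject in salient:
            graph[subject].append((predicate, obj))
    return graph


kg = enrich(defaultdict(list), facts, salient)
print(dict(kg))
```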

slide-16
SLIDE 16

Expert Finding & Profiling

▷ ~1.5K Authors
▷ ~65K Documents (papers’ abstracts)
▷ ~35K Research Topics
▷ More than 1K queries and ~2K profile views in a few months
▷ Currently used by UniPi’s Technology Transfer Office

slide-17
SLIDE 17
slide-18
SLIDE 18

2019

slide-19
SLIDE 19

Future Directions

Moving the paradigms: from text to conversations!

slide-20
SLIDE 20

Thanks!

SYSTEMS

http://swat.d4science.org
http://wiser.d4science.org
https://sobigdata.d4science.org/web/tagme