
ClausIE: Clause-Based Open Information Extraction

Luciano Del Corro, Rainer Gemulla

Max-Planck-Institut für Informatik

May 2013

Del Corro, Gemulla (MPI) ClausIE May 2013 1 / 18


Open Information Extraction: From sentences to propositions

GOAL: Extract information from natural text.

⋆ Bell, a telecommunication company, which is based in Los Angeles, makes and distributes electronic, computer and building products. → (Bell, ’is’, a telecommunication company), (Bell, is based in, Los Angeles), (Bell, makes, electronic products), (Bell, distributes, electronic products), . . .

Most OIE extractors express propositions as triples (arg1, relation, arg2), use verb-based relations, and restrict arguments to noun phrases.


Open Information Extraction: challenges and applications

Challenges/Requirements: domain independent, unbounded set of relations, no filtering of information, structured output, scalable.

Applications: structured search, automatic ontology construction, question answering, semantic role labeling, discourse parsing, ... ?


Outline

1. Information and Representation
2. Open Information Extractors and Language Technology
3. ClausIE: Clauses in the English Language; From clauses to propositions
4. Results
5. Conclusions and Future Directions


Information and Representation


Information and Representation: a two-step approach

Information: What information is expressed? How much of it should be retained? How can it be identified (e.g., non-verb-mediated propositions)?

⋆ Messi, a Golden Ball winner, plays in Barcelona

Representation: What is the form of the relation?

⋆ Messi plays in Barcelona → plays or plays in

Triples or n-ary propositions?

⋆ (Messi, plays football in, Barcelona) or (Messi, plays, football, in Barcelona)

What should be the scope of the arguments?

⋆ Gandhi was vegetarian


We aim to separate these two phases.


Open Information Extractors and Language Technology

Based on chunks/POS tags: TextRunner, WOEpos, Reverb

Based on a dependency parser: Wanderlust, WOEparse, KrakeN, OLLIE


Annotations of the example sentence at three levels (chunks, POS tags, dependency parse):

Tokens: Bell , a telecommunication company , which is based in Los Angeles , makes and distributes electronic , computer and building products

Chunks: B-NP B-NP I-NP I-NP , B-NP B-VP I-VP B-PP B-NP I-NP , B-VP I-VP I-VP B-ADJP , B-NP I-NP I-NP I-NP

POS: NNP DT JJ NN , WDT VBZ VBN IN NNP NNP , VBZ CC VBZ JJ , NN CC NN NNS

Dependency parse (DP) relations: nsubj, det, nn, appos, nsubjpass, auxpass, rcmod, nn, prep_in, conj_and, amod, conj_and, conj_and, dobj, root
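To make the DP-based view concrete, here is a minimal sketch that stores the example's dependency parse as (head, relation, dependent) edges and reads off one clause per verb that has a subject. The slide shows only the relation labels, so the head of each edge below is our own reconstruction; the function `find_clauses` is hypothetical, and its two propagation rules (a relative pronoun resolves to the noun modified via `rcmod`; a verb conjoined via `conj_and` shares the subject) are simplified assumptions, not ClausIE's actual code.

```python
# Hypothetical reconstruction of the example's dependency arcs as
# (head, relation, dependent) triples; the heads are our assumption.
EDGES = [
    ("ROOT", "root", "makes"),
    ("makes", "nsubj", "Bell"),
    ("company", "det", "a"),
    ("company", "nn", "telecommunication"),
    ("Bell", "appos", "company"),
    ("based", "nsubjpass", "which"),
    ("based", "auxpass", "is"),
    ("Bell", "rcmod", "based"),
    ("Angeles", "nn", "Los"),
    ("based", "prep_in", "Angeles"),
    ("makes", "conj_and", "distributes"),
    ("products", "amod", "electronic"),
    ("electronic", "conj_and", "computer"),
    ("electronic", "conj_and", "building"),
    ("makes", "dobj", "products"),
]

SUBJECT_RELATIONS = {"nsubj", "nsubjpass"}
RELATIVE_PRONOUNS = {"which", "who", "that"}


def find_clauses(edges):
    """Map each clause-heading verb to its (resolved) subject."""
    # 1. Every verb with a subject arc heads a clause.
    clauses = {head: dep for head, rel, dep in edges if rel in SUBJECT_RELATIONS}

    # 2. A relative-pronoun subject is replaced by the noun that the
    #    relative clause modifies (the head of the rcmod arc).
    for head, rel, dep in edges:
        if rel == "rcmod" and clauses.get(dep) in RELATIVE_PRONOUNS:
            clauses[dep] = head

    # 3. A verb conjoined to a clause head shares that head's subject,
    #    unless it already has its own.
    for head, rel, dep in edges:
        if rel == "conj_and" and head in clauses and dep not in clauses:
            clauses[dep] = clauses[head]
    return clauses


print(find_clauses(EDGES))
# {'makes': 'Bell', 'based': 'Bell', 'distributes': 'Bell'}
```

The `conj_and` arcs between the adjectives are ignored here because their heads are not clause heads; those coordinations matter later, when arguments are expanded.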

ClausIE Clauses in the English Language

Clause Essentials

A clause is like a simple sentence.

⋆ Paul eats a chocolate bar

A sentence can be composed of more than one clause.

⋆ Anna drinks coffee and Bob plays football

Each clause encodes one or more propositions. Clauses can have optional adverbials.

⋆ He will take the exam in May

A minimal clause is a clause without its optional adverbials.

⋆ He will take the exam


The seven clauses

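1. SVi (intransitive, SV) → Albert Einstein died.

2. SVe A (extended copular, SVA) → Albert Einstein remained in Princeton.

3. SVc C (copular, SVC) → Albert Einstein is smart.

4. SVmt O (monotransitive, SVO) → Albert Einstein has won the Nobel Prize.

5. SVdt Oi Od (ditransitive, SVOO) → RSAS gave Albert Einstein the Nobel Prize.

6. SVct O A (complex transitive, SVOA) → The doorman showed Albert Einstein to his office.

7. SVct O C (complex transitive, SVOC) → Albert Einstein declared the meeting open.

S: Subject, V: Verb, A: Adverbial, C: Complement, Oi: Indirect Object, O (Od in pattern 5): Direct Object. Verb subscripts: i = intransitive, e = extended copular, c = copular, mt = monotransitive, dt = ditransitive, ct = complex transitive.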


By identifying each minimal clause in a sentence, we can identify the essential information it expresses.
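Once optional adverbials are stripped, the seven types differ only in which essential constituents follow the verb. A minimal sketch of that lookup, assuming the essential constituents have already been identified (the hard part, which the decision tree later in the talk addresses); `CLAUSE_TYPES` and `clause_type` are hypothetical names:

```python
# Essential constituents (besides S and V) of each minimal clause type.
CLAUSE_TYPES = {
    frozenset(): "SV",
    frozenset({"A"}): "SVA",
    frozenset({"C"}): "SVC",
    frozenset({"O"}): "SVO",
    frozenset({"Oi", "O"}): "SVOO",
    frozenset({"O", "A"}): "SVOA",
    frozenset({"O", "C"}): "SVOC",
}


def clause_type(essential):
    """Map the essential constituents of a minimal clause to its type."""
    try:
        return CLAUSE_TYPES[frozenset(essential)]
    except KeyError:
        raise ValueError(f"not one of the seven types: {sorted(essential)}") from None


# "Albert Einstein remained in Princeton": the adverbial is essential.
print(clause_type(["A"]))  # SVA
```

Using a `frozenset` key makes the lookup independent of constituent order, which matters because adverbials can appear before the subject.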


The seven clauses: optional adverbials

Some extended patterns (AE = Albert Einstein):

Pattern | Clause type | Example | Derived clauses
SViAA | SV | AE died in Princeton in 1955. | (AE, died); (AE, died, in Princeton); (AE, died, in 1955); (AE, died, in Princeton, in 1955)
SVeAA | SVA | AE remained in Princeton until his death. | (AE, remained, in Princeton); (AE, remained, in Princeton, until his death)
SVcCA | SVC | AE is a scientist of the 20th century. | (AE, is, a scientist); (AE, is, a scientist, of the 20th century)
SVmtOA | SVO | AE has won the Nobel Prize in 1921. | (AE, has won, the Nobel Prize); (AE, has won, the Nobel Prize, in 1921)
ASVmtO | SVO | In 1921, AE has won the Nobel Prize. | (AE, has won, the Nobel Prize); (AE, has won, the Nobel Prize, in 1921)
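The rows above follow one pattern: each derived clause is the minimal clause plus some subset of its optional adverbials. A minimal sketch under that assumption; `derived_clauses` is a hypothetical helper, and ClausIE itself may generate a more selective set:

```python
from itertools import combinations


def derived_clauses(core, optional_adverbials):
    """Extend a minimal clause with every subset of its optional adverbials.

    core: the minimal clause as a tuple, e.g. ("AE", "died").
    optional_adverbials: adverbials that may be dropped, in sentence order.
    """
    results = []
    for k in range(len(optional_adverbials) + 1):
        # combinations() keeps the original order within each subset.
        for subset in combinations(optional_adverbials, k):
            results.append(tuple(core) + subset)
    return results


# The SViAA row: both adverbials are optional.
for clause in derived_clauses(("AE", "died"), ["in Princeton", "in 1955"]):
    print(clause)
```

A fronted adverbial, as in the ASVmtO row, is simply appended after the core, which reproduces that row. With n optional adverbials this yields 2^n clauses, so a practical system would likely cap the number.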

ClausIE From clauses to propositions

From clauses to clause types (I)


⋆ Gandhi was vegetarian.

POS: NNP VBD JJ; DP relations: nsubj (Gandhi), cop (was), root (vegetarian)

Decision path: Q1 Object? No → Q2 Complement? Yes → Copular (SVC)

→ (S: Gandhi, V: was, C: vegetarian)


From clauses to clause types (II)


⋆ Albert Einstein died in Princeton.

Chunks: B-NP I-NP B-VP B-PP B-NP; POS: NNP NNP VBD IN NNP; DP relations: nn, nsubj, prep_in, root

Decision path: Q1 Object? No → Q2 Complement? No → Q3 Candidate adverbial? Yes → Q4 Known non-ext. copular? Yes → Intransitive (SV)

→ (S: AE, V: died), (S: AE, V: died, A: in Princeton)


From clauses to clause types (II)

ClausIE makes use of dictionaries (e.g., known non-extended-copular, extended-copular and potentially complex-transitive verbs).


Decision tree applied to each clause found in the dependency parse (DP):

Q1 Object?
  No → Q2 Complement?
    Yes → Copular (SVC)
    No → Q3 Candidate adverbial?
      No → Intransitive (SV)
      Yes → Q4 Known non-ext. copular?
        Yes → Intransitive (SV)
        No → Q5 Known ext. copular?
          Yes → Extended copular (SVA)
          No → Q6 Conservative?
            Yes → Extended copular (SVA)
            No → Intransitive (SV)
  Yes → Q7 Direct and indirect object?
    Yes → Ditransitive (SVOO)
    No → Q8 Complement?
      Yes → Complex transitive (SVOC)
      No → Q9 Candidate adverbial and direct object?
        No → Monotransitive (SVO)
        Yes → Q10 Potentially complex-transitive?
          No → Monotransitive (SVO)
          Yes → Q11 Conservative?
            Yes → Complex transitive (SVOA)
            No → Monotransitive (SVO)

We first identify the information and then generate the proposition.
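A minimal sketch of the Q1 to Q11 decision tree as we read it off the slide. Assumptions: the boolean features are already extracted from the dependency parse; the three verb sets are tiny hypothetical stand-ins for ClausIE's dictionaries; and `conservative=True` keeps an uncertain adverbial as essential (SVA/SVOA). `Clause` and `classify` are illustrative names, not ClausIE's API.

```python
from dataclasses import dataclass

# Hypothetical verb dictionaries (lemmas); ClausIE's real lists differ.
NON_EXT_COPULAR = {"die", "sleep"}
EXT_COPULAR = {"remain", "live", "be"}
COMPLEX_TRANSITIVE = {"show", "put", "place"}


@dataclass
class Clause:
    verb: str                        # verb lemma
    direct_object: bool = False
    indirect_object: bool = False
    complement: bool = False
    candidate_adverbial: bool = False


def classify(c: Clause, conservative: bool = True) -> str:
    """Walk the decision tree and return the clause type."""
    if not (c.direct_object or c.indirect_object):      # Q1: object?
        if c.complement:                                 # Q2: complement?
            return "SVC"
        if not c.candidate_adverbial:                    # Q3: candidate adverbial?
            return "SV"
        if c.verb in NON_EXT_COPULAR:                    # Q4: known non-ext. copular?
            return "SV"
        if c.verb in EXT_COPULAR:                        # Q5: known ext. copular?
            return "SVA"
        return "SVA" if conservative else "SV"           # Q6: conservative?
    if c.direct_object and c.indirect_object:            # Q7: both objects?
        return "SVOO"
    if c.complement:                                     # Q8: complement?
        return "SVOC"
    if not (c.candidate_adverbial and c.direct_object):  # Q9
        return "SVO"
    if c.verb not in COMPLEX_TRANSITIVE:                 # Q10: potentially complex-trans.?
        return "SVO"
    return "SVOA" if conservative else "SVO"             # Q11: conservative?


print(classify(Clause("die", candidate_adverbial=True)))  # SV
```

The dictionaries only decide the uncertain cases (Q4, Q5, Q10); everything else follows from the parse alone, which is why the approach needs no training data.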

Example

Reverb → (a telecommunication company, is based in, Los Angeles)

Ollie → (Bell, distributes, electronic, computer and building products)

ClausIE → (S: Bell, V: ’is’, C: a telecommunication company), (S: Bell, V: is based, A: in Los Angeles), (S: Bell, V: makes, O: electronic products), (S: Bell, V: makes, O: computer products), (S: Bell, V: makes, O: building products), (S: Bell, V: distributes, O: electronic products), (S: Bell, V: distributes, O: computer products), (S: Bell, V: distributes, O: building products)


Input sentence: Bell, a telecommunication company, which is based in Los Angeles, makes and distributes electronic, computer and building products.
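ClausIE's output in this example distributes the coordinated verbs (makes, distributes) over the coordinated modifiers of "products". A minimal sketch of that expansion, assuming the coordinated items have already been collected (e.g., from the `conj_and` arcs); both helpers are hypothetical, and taking every combination is wrong for collective readings such as "Anna and Bob met":

```python
from itertools import product


def distribute_modifiers(modifiers, head):
    """'electronic, computer and building products' -> one NP per modifier."""
    return [f"{m} {head}" for m in modifiers]


def expand_conjunctions(subjects, verbs, arguments):
    """One proposition per combination of subject, verb and argument."""
    return list(product(subjects, verbs, arguments))


objects = distribute_modifiers(["electronic", "computer", "building"], "products")
for prop in expand_conjunctions(["Bell"], ["makes", "distributes"], objects):
    print(prop)
```

`itertools.product` iterates the last list fastest, so the six propositions come out in the same order as ClausIE's output above.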

Identifying information

ClausIE separates the identification of the information from its representation. It identifies essential and optional arguments in a clause, needs no training data, offers initial support for non-verb-mediated relations, and processes conjunctions (in verbs and in subjects/arguments).

⋆ Messi and Iniesta play in Barcelona → (Messi, plays, in Barcelona), (Iniesta, plays, in Barcelona)

Resolution of relative clauses

⋆ I saw the man whose house you like → (I, saw, the man), (You, like, the man’s house) ...
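A minimal sketch of relative-pronoun resolution, assuming the antecedent is already known (e.g., as the noun the relative clause modifies via `rcmod`); `resolve_relative` is a hypothetical helper, not ClausIE's API:

```python
RELATIVE_PRONOUNS = {"which", "who", "whom", "that"}


def resolve_relative(antecedent, pronoun, possessed=None):
    """Replace a relative pronoun by the noun phrase it refers to."""
    if pronoun == "whose":
        # "whose house" -> "<antecedent>'s house"
        if possessed is None:
            raise ValueError("'whose' needs the possessed noun")
        return f"{antecedent}'s {possessed}"
    if pronoun in RELATIVE_PRONOUNS:
        return antecedent
    raise ValueError(f"unknown relative pronoun: {pronoun}")


print(("you", "like", resolve_relative("the man", "whose", "house")))
print((resolve_relative("Bell", "which"), "is based", "in Los Angeles"))
```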

Proposition Generation: a flexible process

Arbitrary form of relations

⋆ (Messi, plays football in, Barcelona) or (Messi, plays, football in Barcelona)

Propositions can be customized (e.g., triples or n-ary propositions)

⋆ (Messi, plays, football in Barcelona) or (Messi, plays, football, in Barcelona)

Arbitrary argument types (e.g., noun phrases, adjectives)

⋆ (Gandhi, was, vegetarian) or (Gandhi, was, a vegetarian) or (Gandhi from Porbandar, was, a vegetarian)

Optional arguments can be used to generate new propositions

⋆ (Paul, takes, a shower, in the morning) or (Paul, takes, a shower)
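Because the clause and its constituents are identified first, the output form is just a rendering choice. A minimal sketch of three renderings of the same clause, assuming the constituents after the verb are already ordered; `render` and the short preposition list are illustrative assumptions:

```python
# Small illustrative list; a real system would use the parse instead.
PREPOSITIONS = {"in", "on", "at", "to", "from", "of", "with"}


def render(subject, verb, args, style="triple"):
    """Render one clause (subject, verb, ordered arguments) as a proposition.

    style="triple":   (S, V, all arguments joined)
    style="nary":     (S, V, arg1, arg2, ...)
    style="relation": (S, V + all but the last argument [+ its preposition], rest)
    """
    if style == "nary" or not args:
        return (subject, verb, *args)
    if style == "triple":
        return (subject, verb, " ".join(args))
    if style == "relation":
        *inner, last = args
        first, _, rest = last.partition(" ")
        relation = " ".join([verb, *inner])
        if first in PREPOSITIONS and rest:
            return (subject, f"{relation} {first}", rest)
        return (subject, relation, last)
    raise ValueError(f"unknown style: {style}")


for style in ("triple", "nary", "relation"):
    print(render("Messi", "plays", ["football", "in Barcelona"], style))
```

Keeping the rendering separate from clause detection means new output formats can be added without touching the extraction logic.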

Results

Evaluation on 3 datasets

Reverb: Web sentences, very noisy (500 sentences)

New York Times: complex, written by experts (200 sentences)

Wikipedia: simple, written by non-experts (200 sentences)

2 labelers, pessimistic approach; agreement between the labelers: 57%-68%. High precision, high recall.
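A minimal sketch of how such labels could be aggregated. Assumptions: we read "pessimistic" as counting an extraction correct only if both labelers marked it correct; agreement is raw percentage agreement (the slide does not name the measure); and the precision curve is computed over extractions in some fixed ranked order, matching plots of precision against number of extractions. All function names are hypothetical.

```python
def _check(labels_a, labels_b):
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelers must label the same extractions")


def pessimistic_labels(labels_a, labels_b):
    """An extraction counts as correct only if both labelers marked it correct."""
    _check(labels_a, labels_b)
    return [a and b for a, b in zip(labels_a, labels_b)]


def agreement(labels_a, labels_b):
    """Fraction of extractions on which the two labelers gave the same label."""
    _check(labels_a, labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def precision_yield(correct):
    """(number of extractions, precision) after each extraction in ranked order."""
    curve, hits = [], 0
    for n, ok in enumerate(correct, start=1):
        hits += ok
        curve.append((n, hits / n))
    return curve
```

For example, with toy labels `[True, True, False, True]` and `[True, False, False, True]` (not data from the evaluation), the labelers agree on 3 of 4 extractions and the pessimistic precision after all four is 0.5.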


Results I: Reverb sentences

[Figure: precision (y-axis, 0.0 to 1.0) versus number of extractions (x-axis, up to 3000) for ClausIE, ClausIE (non-red.), ClausIE w/o CCs, ClausIE w/o CCs (non-red.), Reverb, OLLIE, TextRunner, TextRunner (Reverb) and WOE]


Results II: Wikipedia and New York Times

[Figure, two panels: Wikipedia (200 sentences) and New York Times (200 sentences). Each panel plots precision (y-axis, 0.0 to 1.0) versus number of extractions (x-axis, up to 1000 for Wikipedia and 1200 for New York Times) for ClausIE, ClausIE (non-red.), ClausIE w/o CC, ClausIE w/o CC (non-red.), Reverb and OLLIE]

Conclusions and Future Directions

Conclusions: ClausIE is a principled approach to OIE. It separates identification and representation, needs no training, is based on dependency parsing (DP), and is publicly available at http://www.mpi-inf.mpg.de/departments/d5/software/clausie/

Future Directions: build dictionaries; incorporate context analysis; post-process arguments; use the output as input to other tasks: discourse processing, SRL, targeted IE, ontology learning, QA, ...


Thank You!