SLIDE 1

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Saurabh Verma

Banaras Hindu University, India

Estevam R. Hruschka Jr.

Federal University of São Carlos, Brazil

ECML/PKDD2012, Bristol, UK, September 26th, 2012

SLIDE 2

http://rtw.ml.cmu.edu

SLIDE 3

NELL: Never-Ending Language Learner

Inputs:

  • initial ontology
  • a handful of examples of each predicate in the ontology
  • the web
  • occasional interaction with human trainers

The task:

  • run 24x7, forever
  • each day:
  • 1. extract more facts from the web to populate the initial ontology
  • 2. learn to read (perform #1) better than yesterday

SLIDE 4

NELL: Never-Ending Language Learner

Goal:

  • run 24x7, forever
  • each day:
  • 1. extract more facts from the web to populate the given ontology
  • 2. learn to read better than yesterday

Today: running 24x7 since January 2010

Input:

  • ontology defining ~800 categories and relations
  • 10-20 seed examples of each
  • 1 billion web pages (ClueWeb – Jamie Callan)

Result:

  • continuously growing KB with more than 1,300,000 extracted beliefs

SLIDE 5

http://rtw.ml.cmu.edu

SLIDE 6

Bayesian Sets (BS)

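Given $\mathcal{D} = \{\mathbf{x}\}$ and $\mathcal{D}_c \subset \mathcal{D}$, rank the elements of $\mathcal{D}$ by how well they would “fit into” a set which includes $\mathcal{D}_c$. Define a score for each $\mathbf{x} \in \mathcal{D}$:

$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{D}_c)}{p(\mathbf{x})}$$

From Bayes' rule, the score can be rewritten as:

$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)}$$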

Ghahramani & Heller; NIPS 2005
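For binary feature vectors, Ghahramani & Heller show that this score has a closed form when each feature is an independent Bernoulli variable with a Beta prior, so that log score(x) = c + q·x is linear in x. The sketch below assumes that model and a prior of the form α_j = k·m_j, β_j = k·(1 − m_j), where m_j is the mean of feature j over D; the function name, the default k = 2 and the small `eps` are illustrative assumptions, not details taken from these slides.

```python
import numpy as np


def bayesian_sets_log_scores(X, query_idx, k=2.0, eps=1e-9):
    """Log Bayesian Sets scores, log p(x | Dc) - log p(x), for binary rows of X.

    Assumes independent Bernoulli features with Beta(alpha_j, beta_j) priors,
    alpha_j = k * m_j and beta_j = k * (1 - m_j), where m_j is the mean of
    feature j over all rows (an assumed prior choice).
    """
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)
    alpha = k * m + eps            # eps keeps the logs finite for constant features
    beta = k * (1.0 - m) + eps

    Dc = X[list(query_idx)]        # the seed set D_c
    n = Dc.shape[0]
    counts = Dc.sum(axis=0)        # how often each feature fires inside D_c
    alpha_post = alpha + counts
    beta_post = beta + n - counts

    # log score(x) = c + x . q  (linear in x)
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + n)
               + np.log(beta_post) - np.log(beta))
    q = np.log(alpha_post) - np.log(alpha) - np.log(beta_post) + np.log(beta)
    return c + X @ q


# Toy example: rows 0 and 1 are the seeds; row 2 shares no active feature with them.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 0]])
scores = bayesian_sets_log_scores(X, [0, 1])
print(np.argsort(-scores))  # [1 0 4 3 2], best first
```

Only a sparse dot product per item is needed, which is what makes the approach attractive for ranking very large candidate sets.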

SLIDE 7

Bayesian Sets (BS)

Intuitively, the score compares the probability that $\mathbf{x}$ and $\mathcal{D}_c$ were generated by the same model with the same unknown parameters θ to the probability that $\mathbf{x}$ and $\mathcal{D}_c$ came from models with different parameters θ and θ’.

Ghahramani & Heller; NIPS 2005

$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)}$$
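The same-θ versus different-θ reading can be checked numerically. The sketch below assumes a single binary feature with a Beta(a, b) prior (a = b = 1, a uniform prior, is an arbitrary illustrative choice) and computes the score as the marginal likelihood with one shared θ divided by the product of the marginal likelihoods with separate θ and θ’; the helper names are hypothetical.

```python
from math import exp, lgamma


def log_beta_fn(a, b):
    """log B(a, b) computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(bits, a, b):
    """log p(bits) for i.i.d. Bernoulli(theta) data with theta ~ Beta(a, b) integrated out."""
    s = sum(bits)
    return log_beta_fn(a + s, b + len(bits) - s) - log_beta_fn(a, b)


def bs_score(x, Dc, a=1.0, b=1.0):
    """score(x) = p(x, Dc) / (p(x) p(Dc)) for one binary feature."""
    same_theta = log_marginal([x] + list(Dc), a, b)                     # x and Dc share theta
    different_theta = log_marginal([x], a, b) + log_marginal(Dc, a, b)  # theta for x, theta' for Dc
    return exp(same_theta - different_theta)


print(bs_score(1, [1, 1, 1]))  # about 1.6: x agrees with the seeds
print(bs_score(0, [1, 1, 1]))  # about 0.4: x disagrees with the seeds
```

A score above 1 means the shared-parameter explanation is more probable than the independent one, i.e. x "fits into" the set.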

SLIDE 11

BS using NELL’s Ontology

Initial ontology:

Everything
  • Person
  • Company
  • Vegetable
  • Sport

SLIDE 12

BS using NELL’s Ontology

Initial ontology:

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 13

BS using NELL’s Ontology

Given a huge web corpus, run BS once

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 14

BS using NELL’s Ontology

Given a huge web corpus, run BS once

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, Alan Turing, Alexander Fleming, …
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, Facebook, DELL, …
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, Beijing, Cairo, …
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, Baseball, Badminton, …

SLIDE 15

BS using NELL’s Ontology

SLIDE 18

Iterative BS using NELL’s Ontology

Given a huge web corpus, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball

Zhang & Liu, 2011

SLIDE 19

Iterative BS using NELL’s Ontology

Given a huge web corpus, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon

Zhang & Liu, 2011

SLIDE 20

Iterative BS using NELL’s Ontology

Given a huge web corpus, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, Alan Turing, Alexander Fleming, …
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, Facebook, DELL, …
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, Beijing, Cairo, …
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, Baseball, Badminton, …

Zhang & Liu, 2011
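Iterative BS, as pictured in these slides, reruns BS after each round and lets the top-ranked new candidates join the seed set. A minimal sketch of such a bootstrapping loop follows, assuming the binary Beta-Bernoulli scorer (its constant term is dropped because it does not change the ranking) and a hypothetical `per_iter` promotion budget; Zhang & Liu's actual selection and stopping rules may differ.

```python
import numpy as np


def log_scores(X, seeds, k=2.0, eps=1e-9):
    """Binary Bayesian Sets log scores (Beta-Bernoulli model, assumed prior)."""
    m = X.mean(axis=0)
    alpha, beta = k * m + eps, k * (1.0 - m) + eps
    n = len(seeds)
    counts = X[sorted(seeds)].sum(axis=0)
    q = (np.log(alpha + counts) - np.log(alpha)
         - np.log(beta + n - counts) + np.log(beta))
    return X @ q  # the constant c is dropped: it does not change the ranking


def iterative_bs(X, seeds, per_iter=1, n_iter=3):
    """Grow a seed set by repeatedly promoting the top-ranked non-seed rows."""
    X = np.asarray(X, dtype=float)
    seeds = set(seeds)
    for _ in range(n_iter):
        s = log_scores(X, seeds)
        candidates = [i for i in np.argsort(-s) if i not in seeds]
        promoted = candidates[:per_iter]
        if not promoted:          # nothing left to add
            break
        seeds.update(int(i) for i in promoted)
    return seeds


# Hypothetical toy data: rows are items, columns are binary context features.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 0]])
print(sorted(iterative_bs(X, {0, 1}, per_iter=1, n_iter=1)))  # [0, 1, 4]
```

Because promoted items immediately become seeds, an early mistake keeps shaping later rankings; nothing in this plain loop guards against that.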

SLIDE 21

Iterative BS using NELL’s Ontology

SLIDE 23

NELL: Coupled semi-supervised training of many functions

SLIDE 24

Coupled Training Type 2:

Structured Outputs, Multitask, Posterior Regularization, Multilabel

Learn functions that take the same input but produce different outputs, where some constraint between those outputs is known

SLIDE 27

Coupled Bayesian Sets (CBS)

SLIDE 33

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 34

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Person,Sport); …
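Each constraint on this slide is a symmetric pair of categories. A small sketch of turning that notation into a lookup table follows, using only the four constraints spelled out on the slide (the “…” elides the rest); the regular expression and the `parse_mutex` name are illustrative assumptions, not part of the original system.

```python
import re
from collections import defaultdict

# Matches e.g. "MutuallyExclusive(Company,Person)" and captures both category names.
PATTERN = re.compile(r"MutuallyExclusive\((\w+),\s*(\w+)\)")


def parse_mutex(text):
    """Map each category to the set of categories it is mutually exclusive with."""
    exclusive = defaultdict(set)
    for a, b in PATTERN.findall(text):
        exclusive[a].add(b)  # the relation is symmetric,
        exclusive[b].add(a)  # so store it in both directions
    return exclusive


constraints = parse_mutex(
    "MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); "
    "MutuallyExclusive(Company,City); MutuallyExclusive(Person,Sport);"
)
print(sorted(constraints["Company"]))  # ['City', 'Person', 'Sport']
```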

SLIDE 42

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 44

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, …, AT&T, Boeing
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 45

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, …, AT&T, Boeing
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf, …, AT&T, Boeing

SLIDE 46

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, …, AT&T, Boeing
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, …, AT&T, Boeing
  • Sport: Basketball, Football, Swimming, Tennis, Golf, …, AT&T, Boeing

SLIDE 47

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 48

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 50

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, …, Brazil Telecom, Texaco
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf

SLIDE 51

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, …, Brazil Telecom, Texaco
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf, …, Brazil Telecom, Texaco

SLIDE 52

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, …, Brazil Telecom, Texaco
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, …, Brazil Telecom, Texaco
  • Sport: Basketball, Football, Swimming, Tennis, Golf, …, Brazil Telecom, Texaco

SLIDE 53

CBS using NELL’s Ontology

Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
  • Sport: Basketball, Football, Swimming, Tennis, Golf
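These slides step through candidates proposed for Company (AT&T and Boeing, then Brazil Telecom and Texaco) being looked up in the rankings of the mutually exclusive categories Person, Sport and City, where they apparently appear only far down the list (the “…”), before being kept in Company. One plausible reading of that check is sketched below: a candidate is promoted for a category only if it is not among the top `top_k` candidates of any category declared mutually exclusive with it. The function name, the `top_k` cutoff and the toy rankings are hypothetical, and the exact CBS criterion may differ.

```python
def coupled_promote(rankings, exclusive, category, top_k=2):
    """Promote top candidates for `category` that no mutually exclusive category also ranks highly.

    rankings:  category -> list of non-seed candidates, best first (hypothetical input)
    exclusive: category -> set of categories mutually exclusive with it
    """
    rivals = exclusive.get(category, set())
    promoted = []
    for item in rankings[category][:top_k]:
        # Reject the candidate if any rival category also places it in its top_k.
        if all(item not in rankings.get(r, [])[:top_k] for r in rivals):
            promoted.append(item)
    return promoted


# Hypothetical rankings produced by running BS separately for each category.
rankings = {
    "Company": ["AT&T", "Adele", "Boeing"],
    "Person": ["Dalai Lama", "Adele", "AT&T"],
    "Sport": ["Soccer", "Volleyball", "Boeing"],
    "City": ["New York", "London", "AT&T"],
}
exclusive = {"Company": {"Person", "Sport", "City"}}
print(coupled_promote(rankings, exclusive, "Company"))  # ['AT&T']
print(coupled_promote(rankings, {}, "Company"))         # ['AT&T', 'Adele']
```

Without constraints the second call lets “Adele” into Company; with them, the high Person rank vetoes it.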

SLIDE 54

CBS using NELL’s Ontology

SLIDE 67

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, …
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, …, Great Britain, Keyboard, Pencil
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, …
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, …

SLIDE 68

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, …
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, …, Great Britain, Keyboard, Pencil
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, …
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, …
  • Not Company

SLIDE 69

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, …
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, …, Great Britain, Keyboard, Pencil
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, …
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, …
  • Not Company: Great Britain, Keyboard, Pencil

SLIDE 71

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Everything
  • Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, …
  • Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, …, Great Britain, Keyboard, Pencil
  • City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, …
  • Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, …
  • Not Company: Great Britain, Keyboard, Pencil

MutuallyExclusive(Company,NotCompany);
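These slides suggest that, when no constraints are given, wrongly expanded items at the tail of the Company list (Great Britain, Keyboard, Pencil) can seed a new Not Company category, together with the new constraint MutuallyExclusive(Company,NotCompany). A minimal sketch of that bookkeeping step follows; taking the bottom `n_negative` items of the expanded ranking as negative seeds is an assumption made for illustration, as are the function and parameter names.

```python
def add_complement_category(ontology, exclusive, category, ranked, n_negative=3):
    """Create a hypothetical Not<category> class from the lowest-ranked items and couple it.

    ranked: the category's expanded (non-seed) candidates, best first (hypothetical input)
    """
    negative = "Not" + category
    ontology[negative] = list(ranked[-n_negative:])      # assumed: bottom of the ranking as negative seeds
    exclusive.setdefault(category, set()).add(negative)  # MutuallyExclusive(category, negative),
    exclusive.setdefault(negative, set()).add(category)  # stored in both directions
    return negative


ontology = {"Company": ["Apple", "Microsoft", "Google", "IBM", "Yahoo"]}
exclusive = {}
ranked = ["AT&T", "Boeing", "Brazil Telecom", "Texaco", "Great Britain", "Keyboard", "Pencil"]
name = add_complement_category(ontology, exclusive, "Company", ranked)
print(name, ontology[name])  # NotCompany ['Great Britain', 'Keyboard', 'Pencil']
```

Once the new category and constraint exist, the coupled expansion can treat NotCompany like any other mutually exclusive category.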

SLIDE 74

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

SLIDE 81

CBS using NELL’s Ontology

What about Semantic Relations?

SLIDE 84

Conclusions

Coupled Bayesian Sets:

  • is a semi-supervised learning approach to extract category instances (e.g., country(USA), city(New York)) from web pages;
  • is based on the original Bayesian Sets;
  • can outperform algorithms such as the original Bayesian Sets, the Naive Bayes classifier, Bas-all, and the coupled semi-supervised logistic regression algorithm (CPL);
  • can be used to automatically generate new constraints for the set expansion task, even when no mutual exclusiveness relationship has been defined beforehand.

SLIDE 85

Acknowledgements

Thanks to:

ECML/PKDD2012 audience! J J

Also thanks to:

  • the Department of Science and Technology, Government of India, under the Indo-Brazil cooperation programme;
  • the Brazilian research agencies CAPES and CNPq;
  • Zoubin Ghahramani and K.A. Heller;
  • the whole Read the Web group.

contact: estevam.hruschka@gmail.com http://rtw.ml.cmu.edu
