HowToKB: Mining HowTo Knowledge from Online Communities


SLIDE 1

HowToKB: Mining HowTo Knowledge from Online Communities

Cuong Chu (MPI Saarbruecken), Niket Tandon (Allen Institute for AI), Gerhard Weikum (MPI Saarbruecken)

[Figure: task frame for "How to paint a wall"]

SLIDE 2

HowToKB: Mining HowTo Knowledge from Online Communities

[Figure: task frame for "How to paint a wall", annotated with its attributes and edges]

SLIDE 3

Related work on HowTo knowledge acquisition

Input Representation

SLIDE 4

Related work on HowTo knowledge acquisition

Input Representation

[Figure: resources placed on two axes, reduced vs. semantic expressivity and generic vs. domain specific: ConceptNet, OpenIE, PropBank, Yang et al. SIGIR’15, syntactic structures]

  • Tasks are not semantic frames

SLIDE 5

Related work on HowTo knowledge acquisition

Input Representation

[Figure: the same chart, now also showing Knowlywood, FrameNet, and VerbNet]

SLIDE 6

Related work on HowTo knowledge acquisition

Input Representation

[Figure: the same chart, now also showing HowToKB, Schank’75, Fillmore’76, and Minsky’74]

SLIDE 7

Related work on HowTo knowledge acquisition

Message: HowToKB’s knowledge representation is different.

Input Representation

[Figure: the same chart, with all systems from the previous slide]

  • No phrases/tasks
  • Manually populated

SLIDE 8

Related work on HowTo knowledge acquisition

Task Model

SLIDE 9

Related work on HowTo knowledge acquisition

Task Model

[Figure: extraction approaches placed on two axes, supervised vs. unsupervised and schema-based vs. schema-free extraction: semantic role labeling, syntactic structures, OpenIE, semantic frame parsing]

SLIDE 10

Related work on HowTo knowledge acquisition

Message: Our task is different.

Task Model

[Figure: the same chart, now also showing HowToKB and Knowlywood]

  • Mapped to WordNet, a closed sense repository

SLIDE 11

WikiHow: our input dataset

SLIDE 12

WikiHow: our input dataset

[Figure: a WikiHow article annotated with its task, sub-tasks (Sub task, Sub task, …), previous task, next task, and participating objects]

SLIDE 13

WikiHow: our input dataset

Message: WikiHow data is very rich and can be exploited.

[Figure: the same annotated article, now also highlighting images and videos]

SLIDE 14

System overview

[Figure: WikiHow → Frame construction → Frame organization → HowToKB]

SLIDE 15

System overview

[Figure: the same pipeline, annotated with the stages below]

Stage 1a: convert unstructured articles into structured task frames.
Stage 1b: sequence the task frames.
Novel knowledge representation

SLIDE 16

System overview

[Figure: the same pipeline, annotated with the stage below]

Stage 2: organize the sequenced task frames.
Novel hierarchical organization with distributional senses
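
The two stages suggest a simple data model: a task frame holds attribute values plus edges to other frames. Below is a minimal Python sketch under our own assumptions: the class `TaskFrame`, its field names, and the helper `sequence_steps` are hypothetical, the example values come from the paint a wall frame on the conclusion slide, and we assume Stage 1b can link frames in the order their steps appear in an article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskFrame:
    """Hypothetical task-frame record: attribute values plus edges to other frames."""
    title: str
    # Attributes (filled during frame construction)
    category: List[str] = field(default_factory=list)
    location: List[str] = field(default_factory=list)
    time: List[str] = field(default_factory=list)
    participating_agents: List[str] = field(default_factory=list)
    participating_objects: List[str] = field(default_factory=list)
    # Edges (links to other task frames, referenced by title)
    parent_tasks: List[str] = field(default_factory=list)
    sub_tasks: List[str] = field(default_factory=list)
    previous_task: Optional[str] = None
    next_task: Optional[str] = None


def sequence_steps(frames: List[TaskFrame]) -> List[TaskFrame]:
    """Assumed Stage 1b: link consecutive frames with previous/next edges."""
    for prev, nxt in zip(frames, frames[1:]):
        prev.next_task = nxt.title
        nxt.previous_task = prev.title
    return frames


wall = TaskFrame(
    title="paint a wall",
    category=["painting and other finishes", "ceilings", "interior walls"],
    participating_objects=["wall", "paint", "masking tape", "cotton rags"],
    parent_tasks=["touch up paint", "paint ceiling"],
    sub_tasks=["move furniture", "choose color", "sand the wall", "fill any hole"],
)
steps = sequence_steps([TaskFrame(title=t) for t in wall.sub_tasks])
print(steps[1].previous_task, "->", steps[1].title, "->", steps[1].next_task)
# prints: move furniture -> choose color -> sand the wall
```

Keeping edges as titles rather than object references keeps the sketch simple; a real KB would more likely use stable frame identifiers.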

SLIDE 17

KB construction: extraction

§ OpenIE naturally suits task frame construction: its output maps easily onto task attributes.

Attribute              OpenIE mapping
location               location
time                   time
participating agent    subject
participating object   subject/object

SLIDE 18

KB construction: extraction

Message: Type checking helps to postprocess OpenIE results.

§ Attribute type-checking increases precision from 75% to 97%.

Attribute              OpenIE mapping    Type check on head (WordNet)
location               location          WN-noun
time                   time              WN-time
participating agent    subject           WN-living
participating object   subject/object    WN-nonliving

SLIDE 19

KB construction: extraction

Message: 1.2M task frames are isolated from each other.

Attribute              OpenIE mapping    Type check (WN: WordNet)
location               location          noun phrase
time                   time              WN-time
participating agent    subject           WN-living
participating object   subject/object    WN-nonliving

Result: 1.2 million task frames
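
The mapping and type check in the table can be sketched as follows. This is a minimal Python illustration, not the system's code: the toy `SEMANTIC_TYPE` table stands in for WordNet lookups, the head word is naively taken as the last token, and all names (`ATTRIBUTE_RULES`, `fill_attributes`) are hypothetical. The rules mirror the table: agents must be living, objects non-living, times temporal, and locations any noun.

```python
# Toy stand-in for WordNet: head word -> coarse semantic type (hypothetical table).
SEMANTIC_TYPE = {
    "you": "living", "painter": "living",
    "roller": "nonliving", "wall": "nonliving", "paint": "nonliving",
    "morning": "time", "weekend": "time",
    "garage": "noun", "bedroom": "noun",
}

# attribute -> (OpenIE slots it may come from, allowed head types; None = any noun)
ATTRIBUTE_RULES = {
    "participating_agent": (("subject",), {"living"}),
    "participating_object": (("subject", "object"), {"nonliving"}),
    "location": (("location",), None),
    "time": (("time",), {"time"}),
}


def head_word(phrase):
    """Naive head finding (assumption): the last token of the phrase."""
    tokens = phrase.lower().split()
    return tokens[-1] if tokens else ""


def type_checks(phrase, allowed):
    semantic_type = SEMANTIC_TYPE.get(head_word(phrase))
    if semantic_type is None:  # unknown head word: reject the value
        return False
    return allowed is None or semantic_type in allowed


def fill_attributes(extraction):
    """Map one OpenIE extraction (slot -> phrase) to task attributes, keeping type-consistent values."""
    frame = {}
    for attribute, (slots, allowed) in ATTRIBUTE_RULES.items():
        for slot in slots:
            phrase = extraction.get(slot)
            if phrase and type_checks(phrase, allowed):
                frame.setdefault(attribute, []).append(phrase)
    return frame


print(fill_attributes({"subject": "the roller", "relation": "covers", "object": "the wall"}))
# {'participating_object': ['the roller', 'the wall']}
```

In the example, "the roller" fills the subject slot but fails the living check, so it is kept only as a participating object; this is the kind of OpenIE error that type checking filters out.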

SLIDE 20

Why KB organization?

Message: KB organization is essential for:
a) better redundancy: aggregating frames such as paint wall and paint ceiling

[Figure: two similar task frames
  task: paint wall; participating objects: brush, paint, …; sub-tasks: clean the surface, dip the roller, …
  task: paint ceiling; participating objects: paint, roller, …; sub-tasks: clean the surface, dip the roller, …]
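
To make the redundancy argument concrete: once paint wall and paint ceiling sit in the same cluster, values that several frames agree on gain support. The Python sketch below simply counts per-value support within one cluster; the dictionary layout and the function `aggregate` are hypothetical, and the frame values are those on the slide with the elided "…" values left out.

```python
from collections import Counter


def aggregate(frames, attribute):
    """Count how many frames in one cluster support each value of an attribute."""
    support = Counter()
    for frame in frames:
        # set(): a frame supports a value at most once
        support.update(set(frame.get(attribute, [])))
    return support


cluster = [
    {"task": "paint wall", "objects": ["brush", "paint"],
     "sub_tasks": ["clean the surface", "dip the roller"]},
    {"task": "paint ceiling", "objects": ["paint", "roller"],
     "sub_tasks": ["clean the surface", "dip the roller"]},
]
print(aggregate(cluster, "sub_tasks")["dip the roller"])  # 2: supported by both frames
print(aggregate(cluster, "objects")["brush"])             # 1: only paint wall mentions it
```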

SLIDE 21

Why KB organization?

Message: KB organization is essential for:
a) better redundancy: aggregating frames such as paint wall and paint ceiling
b) disambiguation of tasks: does use keyboard refer to a piano or a computer?

[Figure: the task use keyboard appears under the categories iPhone, Android, Mac, and Windows, and also under music listening, music appreciation, and visuals]

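SLIDE 22

Approach to KB organization

[Figure: task frames use keyboard and press keystrokes]

§ For the 1.2 million frames, the number of clusters is unknown.
§ Hierarchical clustering is natural, but expensive: it compares frames pairwise, and 1.2 million frames yield about 720 billion pairs.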

SLIDE 23

Approach to KB organization

[Figure: expected organization, in which use keyboard and press keystrokes end up in a common cluster]

SLIDE 24

Approach to KB organization

We propose two-stage clustering:
  • Stage 1: coarse-grained clustering
  • Stage 2: fine-grained clustering

SLIDE 25

Preparing for clustering: Multi-dimensional similarity model

Attribute     Task frame f1     Task frame f2
Task title    paint a wall      color the room ceiling
Location      house, wall, …    bedroom, ceiling, …
…             …                 …
Category      home & garden     house decoration

$$\mathrm{sim}(f_1.\mathrm{title}, f_2.\mathrm{title}) \ast \mathrm{sim}(f_1.\mathrm{category}, f_2.\mathrm{category})$$

SLIDE 26

Preparing for clustering: Multi-dimensional similarity model

Finally, logistic regression over the attributes combines the per-attribute similarities:

$$\mathrm{sim}(f_i, f_j) = \mathrm{sigmoid}\Big(w_0 + \sum_{k} w_k \, \mathrm{sim}_k(f_i, f_j)\Big)$$

where sim_k compares the two frames on attribute k, and w_0, w_k are the regression weights.
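
The logistic-regression combination can be sketched in Python as below. Assumptions not taken from the slides: each per-attribute similarity sim_k is Jaccard overlap of value sets, the weights and bias are made-up placeholders rather than learned values, and f3 ("paint ceiling") is an invented third frame that contrasts with f1 and f2 from the table.

```python
import math


def jaccard(a, b):
    """Set-overlap similarity of two value lists (assumed per-attribute measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def frame_similarity(f1, f2, weights, bias):
    """sigmoid(w0 + sum_k w_k * sim_k(f1, f2)) over the attributes listed in weights."""
    z = bias + sum(w * jaccard(f1.get(k, []), f2.get(k, [])) for k, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights; in practice they would be fit on labelled similar/dissimilar pairs.
WEIGHTS = {"title": 2.0, "location": 1.0, "category": 1.5}
BIAS = -2.5

f1 = {"title": ["paint", "wall"], "location": ["house", "wall"], "category": ["home & garden"]}
f2 = {"title": ["color", "room", "ceiling"], "location": ["bedroom", "ceiling"],
      "category": ["house decoration"]}
f3 = {"title": ["paint", "ceiling"], "location": ["house", "ceiling"], "category": ["home & garden"]}

s12 = frame_similarity(f1, f2, WEIGHTS, BIAS)
s13 = frame_similarity(f1, f3, WEIGHTS, BIAS)
print(round(s12, 3), round(s13, 3))
```

With no overlap on any attribute, f1 and f2 score sigmoid(w_0); the overlapping f3 scores higher, and the learned weights decide how much each attribute counts.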

SLIDE 27

Preparing for clustering: Multi-dimensional similarity model

Message: A pair of task frames is dissimilar, with an empirical confidence of 99.9%, if a combination of their categorical and lexical similarity is below a threshold.
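
The pruning rule in the message can be sketched as a cheap filter that decides which pairs ever reach the full similarity model. This Python version is illustrative only: we assume the combination is the product marked with "∗" on Slide 25, use Jaccard overlap for both terms, and pick a made-up threshold; the frames, categories, and the function `surviving_pairs` are hypothetical. This naive version still enumerates all pairs; its point is only the decision rule.

```python
from itertools import combinations


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def surviving_pairs(frames, threshold):
    """Keep only pairs whose lexical x categorical similarity reaches the threshold.

    Pruned pairs are treated as dissimilar and never reach the full model.
    """
    kept = []
    for f1, f2 in combinations(frames, 2):
        score = jaccard(f1["title"], f2["title"]) * jaccard(f1["category"], f2["category"])
        if score >= threshold:
            kept.append((f1["id"], f2["id"]))
    return kept


frames = [
    {"id": "a", "title": ["use", "keyboard"], "category": ["computers"]},
    {"id": "b", "title": ["use", "mac", "keyboard"], "category": ["computers"]},
    {"id": "c", "title": ["play", "piano"], "category": ["music"]},
]
print(surviving_pairs(frames, threshold=0.2))  # [('a', 'b')]
```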

SLIDE 28

Coarse-grained clustering

[Figure: 1.2 million task frames (e.g., use keyboard, use mac keyboard, press keystrokes) → efficient hash-based lexical grouping → 375K groups]
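
Hash-based grouping is cheap because it needs one pass over the frames and no pairwise comparisons. A Python sketch, assuming a grouping key chosen purely for illustration (first and last content word of the title, roughly verb and head noun); the key HowToKB actually uses is not given on the slide, and `STOP_WORDS`, `lexical_key`, and `hash_group` are hypothetical.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "your", "my"}  # hypothetical stop-word list


def lexical_key(title):
    """Hypothetical hash key: (first content word, last content word), roughly (verb, head noun)."""
    words = [w for w in title.lower().split() if w not in STOP_WORDS]
    if not words:
        return (title.lower(),)
    return (words[0], words[-1])


def hash_group(titles):
    """One pass over all titles; frames sharing a key land in the same bucket."""
    groups = defaultdict(list)
    for title in titles:
        groups[lexical_key(title)].append(title)
    return dict(groups)


groups = hash_group(["use keyboard", "use mac keyboard", "Use the keyboard", "press keystrokes"])
for key, members in groups.items():
    print(key, members)
```

Lexical keys alone cannot merge use keyboard with press keystrokes; that is left to the distributional step.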

SLIDE 29

Coarse-grained clustering

Message: Pruning helps to efficiently reduce the search space.

[Figure: the 375K lexical groups → distributional grouping (e.g., merging use keyboard with press keystrokes) → 200K groups]

Fewer pairs, efficient top-k similarity
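
Distributional grouping can then merge lexical groups whose titles mean similar things without sharing words. A Python sketch under stated assumptions: the three-dimensional vectors are made up (a real system would use learned embeddings), each group is compared only with its top-k neighbours, and groups are merged with union-find when cosine similarity clears a hypothetical threshold. For clarity the neighbour search here is exhaustive; an efficient version would use an index for top-k retrieval.

```python
import heapq
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_groups(vectors, k=1, threshold=0.8):
    """Union each group with its top-k most similar groups above the threshold."""
    keys = list(vectors)
    parent = {key: key for key in keys}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for key in keys:
        scored = [(cosine(vectors[key], vectors[other]), other) for other in keys if other != key]
        for score, other in heapq.nlargest(k, scored):
            if score >= threshold:
                parent[find(key)] = find(other)

    merged = {}
    for key in keys:
        merged.setdefault(find(key), []).append(key)
    return list(merged.values())


vectors = {  # made-up embeddings of lexical-group titles
    "use keyboard": [0.9, 0.1, 0.0],
    "press keystrokes": [0.85, 0.2, 0.05],
    "play piano": [0.1, 0.2, 0.95],
}
print(merge_groups(vectors))  # [['use keyboard', 'press keystrokes'], ['play piano']]
```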

SLIDE 30

Fine-grained clustering

[Figure: 1.2 million task frames → lexical grouping (375K groups) → distributional grouping (200K groups) → final clusters]

Allows fast, parallel hierarchical clustering
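
Because the coarse groups are independent, hierarchical clustering can run on each one separately and in parallel. A Python sketch, assuming average-linkage agglomerative clustering over title-word Jaccard similarity with a made-up stopping threshold; the slides name neither the linkage, the similarity, nor the threshold. A thread pool keeps the example simple; CPU-bound workloads would normally use processes or a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def average_linkage(titles, min_sim=0.3):
    """Merge the closest pair of clusters until no pair averages min_sim or more."""
    clusters = [[t] for t in titles]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [jaccard(a, b) for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, pair = avg, (i, j)
        if best < min_sim:
            break
        i, j = pair  # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def fine_grained(coarse_groups, min_sim=0.3):
    """Cluster every coarse group independently; pool.map preserves input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda group: average_linkage(group, min_sim), coarse_groups))


coarse = [
    ["use mac keyboard", "use windows keyboard", "play piano keyboard"],
    ["press keystrokes"],
]
print(fine_grained(coarse))
```

Here the computer-keyboard titles merge while play piano keyboard stays apart, echoing the disambiguation example from Slide 21.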

SLIDE 31

Recap of system architecture

[Figure: WikiHow → Frame construction → Frame organization → HowToKB]

SLIDE 32

Resulting HowToKB

§ 0.5 million grouped task frames
§ Avg. per frame: 12 attribute values, 2 images
§ Precision > 85%, reported with Wilson confidence intervals
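
The Wilson score interval gives a confidence interval for a proportion such as precision, and behaves better than the normal approximation for small samples or proportions near 0 or 1. A minimal Python sketch of the standard formula follows; the counts in the usage line are illustrative, not HowToKB's evaluation data.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion; z = 1.96 gives about 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half_width, centre + half_width


low, high = wilson_interval(50, 100)
print(round(low, 4), round(high, 4))  # 0.4038 0.5962
```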

SLIDE 33

Resulting HowToKB

Message: HowToKB maintains high precision at large scale.

§ As ground truth, turkers fill in “very likely” attribute values for 150 frames.
§ Example: In some context such as decorate the house, the most likely location when we paint a wall is ____.

SLIDE 34

Use case: finding YouTube videos for a HowTo task

[Figure: task query: make caramel corn; YouTube video: Gourmet Caramel Popcorn “Thanks Monique”]

SLIDE 35

Use case: finding YouTube videos for a HowTo task

[Figure: task query: make caramel corn; expansion using frames (attributes, edges): brown sugar …, popcorn …, syrup, teaspoon …, bake soda …, vanilla …; YouTube video: Gourmet Caramel Popcorn “Thanks Monique”]
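
Query expansion with task frames can be sketched as appending attribute values from the frame that matches the query. The Python below is an illustration only: the frame is a plain dictionary holding terms shown on the slide, while the choice of attributes, the `max_terms` cut-off, and the function `expand_query` are assumptions; how HowToKB ranks or weights expansion terms is not described here.

```python
def expand_query(query, frame, attributes=("participating_objects", "sub_tasks"), max_terms=5):
    """Append up to max_terms attribute values from a matching task frame to the query."""
    extra = []
    for attribute in attributes:
        for value in frame.get(attribute, []):
            # skip duplicates and values already contained in the query text
            if value not in extra and value not in query:
                extra.append(value)
    return " ".join([query] + extra[:max_terms])


frame = {"participating_objects": ["brown sugar", "popcorn", "syrup", "teaspoon", "bake soda", "vanilla"]}
print(expand_query("make caramel corn", frame))
# make caramel corn brown sugar popcorn syrup teaspoon bake soda
```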

SLIDE 36

Use case: finding YouTube videos for a HowTo task

Message: HowToKB provides rich context, beyond relatedness.

§ HowToKB-based expansion beats the strong baselines (Word2Vec, WordNet).
§ 50% of HowToKB’s context is unique, going beyond distributional context.
§ For hard, ambiguous queries, HowToKB reaches 10% precision; the baselines achieve < 1%.

SLIDE 37

Conclusion

§ WikiHow provides a very rich starting point for extracting task frames.
§ Knowledge organization is performed by our fast clustering method.
§ The resulting HowToKB is the first KB on HowTo tasks and is publicly available: http://www.mpi-inf.mpg.de/yago-naga/webchild/HowToKB

Message: HowToKB, with its rich structure, fills an important knowledge gap.

[Figure: example task frame
  Task: paint a wall
  Category: painting and other finishes, ceilings, interior walls
  Participating objects: wall, ceiling, stair, paint, latex paint for base coat, masking tape, cotton rags
  Parent task: touch up paint, paint ceiling
  Sub-task: move furniture, choose color, sand the wall, fill any hole]