SLIDE 1
HowToKB: Mining HowTo Knowledge from Online Communities
Cuong Chu, Niket Tandon, Gerhard Weikum
MPI Saarbruecken, Allen Institute for AI, MPI Saarbruecken
Task frame: How to paint a wall
SLIDE 2
SLIDE 3
Related work on HowTo knowledge acquisition
Input Representation
SLIDE 4
Related work on HowTo knowledge acquisition
Input Representation
Reduced expressivity
Semantic expressivity
Generic Domain specific
ConceptNet, OpenIE, PropBank, Yang et al. SIGIR’15, Syntactic structures
- Tasks are not semantic frames
SLIDE 5
Related work on HowTo knowledge acquisition
Input Representation
Reduced expressivity
Semantic expressivity
Generic Domain specific
Knowlywood, ConceptNet, OpenIE, FrameNet, PropBank, Yang et al. SIGIR’15, Syntactic structures, VerbNet
SLIDE 6
Related work on HowTo knowledge acquisition
Input Representation
Reduced expressivity
Semantic expressivity
Generic Domain specific
HowToKB, Knowlywood, ConceptNet, OpenIE, FrameNet, PropBank, Yang et al. SIGIR’15, Syntactic structures, Schank’75, Fillmore’76, Minsky’74, VerbNet
SLIDE 7
Related work on HowTo knowledge acquisition
Message: HowToKB’s knowledge representation is different.
Input Representation
Reduced expressivity
Semantic expressivity
Generic Domain specific
HowToKB, Knowlywood, ConceptNet, OpenIE, FrameNet, PropBank, Yang et al. SIGIR’15, Syntactic structures, Schank’75, Fillmore’76, Minsky’74, VerbNet
- No phrases/tasks
- manually populated
SLIDE 8
Related work on HowTo knowledge acquisition
Task Model
SLIDE 9
Related work on HowTo knowledge acquisition
Task Model
Supervised Unsupervised Schema based Schema free extraction
Semantic Role Labeling, Syntactic structures, OpenIE, Semantic Frame parsing
SLIDE 10
Related work on HowTo knowledge acquisition
Message: our task is different.
Task Model
Supervised Unsupervised Schema based Schema free extraction
HowToKB, Knowlywood, Semantic Role Labeling, Syntactic structures, OpenIE, Semantic Frame parsing
- mapped to WordNet, closed sense repository
SLIDE 11
WikiHow: our input dataset
SLIDE 12
WikiHow: our input dataset
Task, Sub task, Sub task, …, Previous task, Next task, Participating objects
SLIDE 13
WikiHow: our input dataset
Message: WikiHow data is very rich, and can be exploited.
Task, Sub task, Sub task, …, Previous task, Next task, Participating objects
Images, videos
SLIDE 14
System overview
WikiHow -> Frame construction -> Frame organization -> HowToKB
SLIDE 15
System overview
WikiHow -> Frame construction -> Frame organization -> HowToKB
Stage 1a: convert unstructured articles to structured task frames
Stage 1b: sequence the task frames
Novel knowledge representation
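To make the target of Stage 1a and 1b concrete, here is a minimal sketch of a task frame as a data structure. The attribute names follow the ones listed on the WikiHow slides (sub task, previous task, next task, participating objects); the class name `TaskFrame` and the helper `link_sequence` are hypothetical and are not taken from the HowToKB code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskFrame:
    """Illustrative task frame; field names mirror the slide attributes."""
    title: str
    category: List[str] = field(default_factory=list)
    location: List[str] = field(default_factory=list)
    time: List[str] = field(default_factory=list)
    participating_agents: List[str] = field(default_factory=list)
    participating_objects: List[str] = field(default_factory=list)
    sub_tasks: List[str] = field(default_factory=list)
    previous_task: Optional[str] = None
    next_task: Optional[str] = None


def link_sequence(frames: List[TaskFrame]) -> List[TaskFrame]:
    """Stage 1b sketch: chain frames in article order via previous/next."""
    for prev, nxt in zip(frames, frames[1:]):
        prev.next_task = nxt.title
        nxt.previous_task = prev.title
    return frames


# Two steps from the "paint wall" example, linked in order.
sub_steps = link_sequence([
    TaskFrame("clean the surface"),
    TaskFrame("dip the roller"),
])

# The parent frame refers to its steps as sub-tasks.
wall = TaskFrame(
    "paint a wall",
    location=["house", "wall"],
    participating_objects=["brush", "paint"],
    sub_tasks=[step.title for step in sub_steps],
)
```

Keeping previous/next links as plain titles is only one possible choice; references to frame objects or identifiers would work equally well.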
SLIDE 16
System overview
WikiHow -> Frame construction -> Frame organization -> HowToKB
Stage 2: organize the sequenced task frames.
Novel hierarchical organization with distributional senses
SLIDE 17
KB construction: extraction
§ OpenIE naturally suits task frame construction; easy mapping to task attributes

attribute              OpenIE mapping
location               location
time                   time
Participating agent    subject
Participating object   subject/object
SLIDE 18
KB construction: extraction
Message: Type checking helps to postprocess OpenIE results.
§ OpenIE naturally suits task frame construction; easy mapping to task attributes
§ Attribute type-checking increases precision from 75% to 97%

attribute              OpenIE mapping    type-checking (head WordNet)
location               location          WN-noun
time                   time              WN-time
Participating agent    subject           WN-living
Participating object   subject/object    WN-nonliving
SLIDE 19
KB construction: extraction
§ OpenIE naturally suits task frame construction; easy mapping to task attributes
§ Attribute type-checking increases precision from 75% to 97%

attribute              OpenIE mapping    type-checking (WN: WordNet)
location               location          Noun phrase
time                   time              WN-time
Participating agent    subject           WN-living
Participating object   subject/object    WN-nonliving

1.2 million task frames
Message: 1.2M task frames are isolated from each other.
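The attribute mapping and type check in the table can be sketched as follows. This is a minimal illustration, not the authors' extractor: the dictionary `TOY_TYPES` is a hypothetical stand-in for WordNet lookups, the head-noun heuristic (last token) is an assumption, and the input is assumed to be an OpenIE-style extraction already split into slots.

```python
# Hypothetical stand-in for WordNet: head noun -> coarse types it belongs to.
TOY_TYPES = {
    "wall": {"noun", "nonliving"},
    "brush": {"noun", "nonliving"},
    "kitchen": {"noun", "nonliving"},
    "morning": {"noun", "time"},
    "painter": {"noun", "living"},
}

# attribute -> (OpenIE slots it may come from, required type), as in the table.
ATTRIBUTE_RULES = {
    "location": (("location",), "noun"),
    "time": (("time",), "time"),
    "participating_agent": (("subject",), "living"),
    "participating_object": (("subject", "object"), "nonliving"),
}


def head_noun(phrase):
    """Crude head finder: take the last token (an assumption)."""
    return phrase.lower().split()[-1]


def build_frame(extraction):
    """Keep a slot value only if its head passes the attribute's type check."""
    frame = {}
    for attribute, (slots, required) in ATTRIBUTE_RULES.items():
        for slot in slots:
            phrase = extraction.get(slot)
            if phrase and required in TOY_TYPES.get(head_noun(phrase), set()):
                frame.setdefault(attribute, []).append(phrase)
    return frame


extraction = {
    "subject": "the painter",
    "relation": "paint",
    "object": "the wall",
    "location": "the kitchen",
    "time": "quickly",  # not a time expression, so the check rejects it
}
frame = build_frame(extraction)
```

In this toy run the spurious time value is filtered out, which is the kind of precision gain the type check is meant to deliver.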
SLIDE 20
Why KB organization?
Message: KB organization is essential for:
a) better redundancy: aggregated frames: paint wall, paint ceiling

task: paint wall
participating object: brush, paint, ..
sub-task: clean the surface, dip the roller..

task: paint ceiling
participating object: paint, roller, ..
sub-task: clean the surface, dip the roller..
SLIDE 21
Why KB organization?
Task: use keyboard
Category: Iphone, Android, Mac, Windows, Music listening, Music appreciation, Visuals
Message: KB organization is essential for:
a) better redundancy: aggregated frames: paint wall, paint ceiling
b) disambiguation of tasks: does "use keyboard" refer to a piano or a computer?
SLIDE 22
Approach to KB organization
§ For the 1.2 million frames, the number of clusters is unknown.
§ Hierarchical clustering is natural, but expensive
Diagram labels: use keyboard; press keystrokes
SLIDE 23
Approach to KB organization
§ For the 1.2 million frames, the number of clusters is unknown.
§ Hierarchical clustering is natural, but expensive
Diagram labels: use keyboard; press keystrokes; use keyboard
Expected organization: {use keyboard, press keystrokes}; {use keyboard, press keystrokes}
SLIDE 24
Approach to KB organization
§ For the 1.2 million frames, the number of clusters is unknown.
§ Hierarchical clustering is natural, but expensive
Diagram labels: use keyboard; press keystrokes; use keyboard; {use keyboard, press keystrokes}; {use keyboard, press keystrokes}
We propose a two-stage clustering:
Stage 1: coarse-grained clustering
Stage 2: fine-grained clustering
SLIDE 25
Preparing for clustering: Multi-dimensional similarity model
Attribute     Task frame f1       Task frame f2
Task title    paint a wall        color the room ceiling
Location      house, wall, …      bedroom, ceiling, …
…             …                   ...
Category      home & garden       house decoration

$\mathrm{sim}(f1_{\mathrm{title}}, f2_{\mathrm{title}}) * \mathrm{sim}(f1_{\mathrm{cat}}, f2_{\mathrm{cat}})$
SLIDE 26
Preparing for clustering: Multi-dimensional similarity model
Attribute     Task frame f1       Task frame f2
Task title    paint a wall        color the room ceiling
Location      house, wall, …      bedroom, ceiling, …
…             …                   ...
Category      home & garden       house decoration

$\mathrm{sim}(f1_{\mathrm{title}}, f2_{\mathrm{title}}) * \mathrm{sim}(f1_{\mathrm{cat}}, f2_{\mathrm{cat}})$

Finally, logistic regression over the attributes:
$\mathrm{sim}(f1, f2) = \mathrm{sigmoid}\big(w_0 + \sum_{k=1}^{n} w_k \, \mathrm{sim}_k(f1, f2)\big)$
SLIDE 27
Preparing for clustering: Multi-dimensional similarity model
Attribute     Task frame f1       Task frame f2
Task title    paint a wall        color the room ceiling
Location      house, wall, …      bedroom, ceiling, …
…             …                   ...
Category      home & garden       house decoration

$\mathrm{sim}(f1_{\mathrm{title}}, f2_{\mathrm{title}}) * \mathrm{sim}(f1_{\mathrm{cat}}, f2_{\mathrm{cat}})$

Finally, logistic regression over the attributes:
$\mathrm{sim}(f1, f2) = \mathrm{sigmoid}\big(w_0 + \sum_{k=1}^{n} w_k \, \mathrm{sim}_k(f1, f2)\big)$

Message: Our task frame pairs are dissimilar with an empirical confidence of 99.9% if a combination of their categorical and lexical similarity is less than a threshold.
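The two ideas on this slide, a logistic combination of per-attribute similarities and a cheap pruning test on lexical and categorical similarity, can be sketched as below. Everything concrete here is an assumption made for illustration: Jaccard overlap as the per-attribute similarity, the weights, the bias, the threshold, and the example frames and their categories. The talk does not specify these values.

```python
import math


def jaccard(a, b):
    """Set overlap in [0, 1]; stands in for a per-attribute similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def tokens(text):
    return text.lower().split()


def frame_similarity(f1, f2, attributes, weights, bias):
    """Sigmoid of a weighted sum of per-attribute similarities."""
    sims = [jaccard(f1.get(k, []), f2.get(k, [])) for k in attributes]
    z = bias + sum(w * s for w, s in zip(weights, sims))
    return 1.0 / (1.0 + math.exp(-z))


def can_prune(f1, f2, threshold):
    """Treat a pair as dissimilar when lexical * categorical is too small."""
    lexical = jaccard(tokens(f1["title"]), tokens(f2["title"]))
    categorical = jaccard(f1["category"], f2["category"])
    return lexical * categorical < threshold


# Hypothetical frames and parameters, chosen only to exercise the functions.
wall = {"title": "paint a wall", "location": ["house", "wall"],
        "category": ["painting", "interior walls"]}
ceiling = {"title": "paint a ceiling", "location": ["house", "ceiling"],
           "category": ["painting", "ceilings"]}
keyboard = {"title": "use keyboard", "location": ["studio"],
            "category": ["music"]}

ATTRIBUTES = ["location", "category"]
WEIGHTS, BIAS, THRESHOLD = [2.0, 2.0], -1.0, 0.05

close = frame_similarity(wall, ceiling, ATTRIBUTES, WEIGHTS, BIAS)
far = frame_similarity(wall, keyboard, ATTRIBUTES, WEIGHTS, BIAS)
```

In practice the weights would be learned from labeled frame pairs; the pruning test is what lets most pairs skip the full similarity computation.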
SLIDE 28
Coarse-grained clustering
1.2 million task frames (e.g., use keyboard, press keystrokes, use mac keyboard)
Lexical grouping (efficient hash-based grouping): 375K groups
SLIDE 29
Coarse-grained clustering
1.2 million task frames (e.g., use keyboard, press keystrokes, use mac keyboard)
Lexical grouping (efficient hash-based grouping): 375K groups
Distributional grouping (fewer pairs, efficient top-k similarity): 200K groups, e.g., use keyboard, press keystrokes
Message: Pruning helps to efficiently reduce the search space.
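Hash-based lexical grouping avoids comparing all frame pairs: each title is mapped to a key in one pass, and frames with the same key land in the same group. The sketch below assumes a particular key (first and last content word of the title) and a tiny stopword list; both are hypothetical choices, since the slides do not say which lexical key is hashed.

```python
from collections import defaultdict

# Hypothetical stopword list for the illustration.
STOPWORDS = {"a", "an", "the", "your"}


def lexical_key(title):
    """Assumed grouping key: (first content word, last content word)."""
    words = [w for w in title.lower().split() if w not in STOPWORDS]
    return (words[0], words[-1]) if words else ("", "")


def lexical_groups(titles):
    """Single pass over the titles; cost is linear in the number of frames."""
    groups = defaultdict(list)
    for title in titles:
        groups[lexical_key(title)].append(title)
    return dict(groups)


groups = lexical_groups([
    "use keyboard",
    "press keystrokes",
    "use mac keyboard",
    "use the keyboard",
])
```

With this key, "use keyboard" and "use mac keyboard" share a group while "press keystrokes" stays separate, leaving it to the later distributional step to bring semantically related groups together.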
SLIDE 30
Fine-grained clustering
1.2 million task frames (e.g., use keyboard, press keystrokes, use mac keyboard, …)
Lexical grouping: 375K groups
Distributional grouping: 200K groups, e.g., use keyboard, press keystrokes
Final clusters
Allows fast, parallel hierarchical clustering
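Once the coarse groups are small, hierarchical clustering can run inside each group independently, which is what makes it parallelizable. Below is a minimal average-link agglomerative sketch. The linkage criterion, the stopping threshold, and the pairwise scores in `SCORES` are all assumptions for illustration; the item labels with "#1" and "#2" are hypothetical markers for the two senses of "use keyboard".

```python
def average_link(c1, c2, sim):
    """Mean pairwise similarity between two clusters."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(items, sim, threshold):
    """Merge the closest pair of clusters until none reaches the threshold."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = average_link(clusters[i], clusters[j], sim)
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_groups(groups, sim, threshold):
    """Groups do not interact, so each call could run on a separate worker."""
    return [c for group in groups for c in agglomerate(group, sim, threshold)]


# Hypothetical pairwise scores; unlisted pairs count as 0.0.
SCORES = {
    frozenset(("use keyboard #1", "press keystrokes")): 0.8,
    frozenset(("use keyboard #2", "play piano")): 0.7,
    frozenset(("use keyboard #1", "use keyboard #2")): 0.2,
}


def toy_sim(a, b):
    return SCORES.get(frozenset((a, b)), 0.0)


group = ["use keyboard #1", "press keystrokes", "use keyboard #2", "play piano"]
clusters = cluster_groups([group], toy_sim, threshold=0.5)
```

The quadratic pair search is acceptable only because each group is small after the coarse-grained stage; running it on all 1.2 million frames at once is the expensive case the two-stage design avoids.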
SLIDE 31
Recap of system architecture
WikiHow -> Frame construction -> Frame organization -> HowToKB
SLIDE 32
Resulting HowToKB
Wilson confidence intervals
§ 0.5 million grouped task frames
§ Avg. per frame: 12 attribute values, 2 images
§ Precision > 85%
SLIDE 33
Resulting HowToKB
Message: HowToKB maintains high precision at large scale.
§ As ground truth, turkers fill in “very likely” attribute values for 150 frames
§ Example: In some context such as decorate the house, the most likely location when we paint a wall is ____
Wilson confidence intervals
§ 0.5 million grouped task frames
§ Avg. per frame: 12 attribute values, 2 images
§ Precision > 85%
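The slide reports precision with Wilson confidence intervals. A minimal sketch of the standard Wilson score interval for a binomial proportion follows. The count of 130 correct judgments is hypothetical and is not a result from the talk; only the sample size of 150 frames comes from the slide.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives about 95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical evaluation: 130 of 150 sampled values judged correct.
low, high = wilson_interval(130, 150)
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when precision is close to 1, which suits small evaluation samples like this one.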
SLIDE 34
Use case: finding YouTube videos for a HowTo task
Task query: make caramel corn
YouTube video: Gourmet Caramel Popcorn “Thanks Monique”
SLIDE 35
Use case: finding YouTube videos for a HowTo task
Task query: make caramel corn
Expansion using frames (attributes, edges): brown sugar ... popcorn ... syrup teaspoon.. ... bake soda ... vanilla ..
YouTube video: Gourmet Caramel Popcorn “Thanks Monique”
SLIDE 36
Use case: finding YouTube videos for a HowTo task
§ HowToKB-based expansion beats the strong baselines (Word2Vec, WordNet)
§ 50% of HowToKB’s context is unique, going beyond distributional context.
§ For hard ambiguous queries, HowToKB has 10% precision; baselines achieve < 1%
Message: HowToKB provides rich context, beyond relatedness.
Task query: make caramel corn
Expansion using frames (attributes, edges): brown sugar ... popcorn ... syrup teaspoon.. ... bake soda ... vanilla ..
YouTube video: Gourmet Caramel Popcorn “Thanks Monique”
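Query expansion with a task frame can be sketched as appending attribute values from the matching frame to the original query. The function below is a hypothetical illustration: which attributes are used, how many terms are added, and the assignment of the slide's expansion terms to the "participating_objects" attribute are all assumptions.

```python
def expand_query(task_query, frame, max_terms=5,
                 attributes=("participating_objects", "sub_tasks")):
    """Append up to max_terms new words drawn from the frame's attributes."""
    seen = set(task_query.lower().split())
    expansion = []
    for attribute in attributes:
        for value in frame.get(attribute, []):
            for term in value.lower().split():
                if term not in seen:
                    seen.add(term)
                    expansion.append(term)
    if not expansion:
        return task_query
    return task_query + " " + " ".join(expansion[:max_terms])


# Frame content assumed from the expansion terms shown on the slide.
frame = {
    "title": "make caramel corn",
    "participating_objects": ["brown sugar", "popcorn", "syrup", "vanilla"],
}
expanded = expand_query("make caramel corn", frame, max_terms=4)
```

The expanded string would then be sent to the video search in place of the bare task query.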
SLIDE 37
Conclusion
§ WikiHow provides a very rich starting point for extraction of task frames
§ Knowledge organization is performed by our fast clustering method
§ Resulting HowToKB is the first KB on HowTo tasks, and is publicly available: http://www.mpi-inf.mpg.de/yago-naga/webchild/HowToKB
Message: HowToKB, with its rich structure, fills an important knowledge gap.
Task: paint a wall
Category: painting and other finishes, ceilings, interior walls
Participating object