Is Fidel Castro Really an American President?
On Set Expansion
An Example
Google Sets
Notion
Seeds: Barack Obama, Bill Clinton, George Bush
elements of the target set
Seeds Web pages
Fetcher
Extractor
Mentions Wrapper
Ranker
Suggestions Graph
<ul> <li>Obama</li> <li>Bush</li> <li>Kennedy</li> </ul>
To Retrieve Information Coded In Alphanumeric
Document: prtoBamAxxaEsdSlkprtKenNedyxSAprtCLinTOnxkhSlpUfgAobAMagzPHMAcolLgAcLIntOnpfblRoiusWgoprtcAstrOxkLiTfgAClInTongzUmXuSYfgAkEnneDygzil
Seeds: Obama 0, Clinton 1, Obama 2, Clinton 3, Clinton 4
Right Contexts Trie Tright
“” {0,1,2,3,4}
  “pfblRo...” {3}
  “x” {0,1}
    “xaEsd...” {0}
    “khSlp...” {1}
  “gz” {2,4}
    “PHMA...” {2}
    “UmXu...” {4}
Left Contexts Trie Tleft (left contexts read right to left)
“” {0,1,2,3,4}
  “trp” {0,1}
    “ASxy...” {1}
  “Ag” {2,3,4}
    “f” {2,4}
      “Upl...” {2}
      “TiL...” {4}
    “Lloc...” {3}
Algorithm
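The occurrence numbering and the two context tries can be reproduced with a short Python sketch. It assumes that the spaces in the document string are slide line wraps and that seeds are matched case-insensitively. The helper names (`find_occurrences`, `build_trie`, `ids_for`) and the `depth` cut-off are illustrative, not taken from the slides. Unlike the slide's tries, this one is not path-compressed: it stores one character per node.

```python
import re

# Example document from the slides; the five pieces are the wrapped slide
# lines, joined without the line-break spaces (an assumption).
DOCUMENT = (
    "prtoBamAxxaEsdSlkprtKenNed"
    "yxSAprtCLinTOnxkhSlpUfgAob"
    "AMagzPHMAcolLgAcLIntOnpfb"
    "lRoiusWgoprtcAstrOxkLiTfgAClI"
    "nTongzUmXuSYfgAkEnneDygzil"
)
SEEDS = ["Obama", "Clinton"]


def find_occurrences(document, seeds):
    """Return (start, end, seed) for each case-insensitive match, in document order."""
    hits = []
    for seed in seeds:
        for match in re.finditer(re.escape(seed), document, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), seed))
    return sorted(hits)


def build_trie(contexts, depth=8):
    """Character trie over the first `depth` characters of each context.

    Every node records the ids of the occurrences whose context passes through it.
    """
    root = {"ids": set(), "children": {}}
    for occurrence_id, text in enumerate(contexts):
        node = root
        node["ids"].add(occurrence_id)
        for char in text[:depth]:
            node = node["children"].setdefault(char, {"ids": set(), "children": {}})
            node["ids"].add(occurrence_id)
    return root


def ids_for(trie, prefix):
    """Occurrence ids stored at the node reached by `prefix` (empty set if absent)."""
    node = trie
    for char in prefix:
        node = node["children"].get(char)
        if node is None:
            return set()
    return node["ids"]


occurrences = find_occurrences(DOCUMENT, SEEDS)
# Right contexts are read left to right, left contexts right to left.
t_right = build_trie([DOCUMENT[end:] for _, end, _ in occurrences])
t_left = build_trie([DOCUMENT[:start][::-1] for start, _, _ in occurrences])

print([seed for _, _, seed in occurrences])
print(sorted(ids_for(t_right, "x")), sorted(ids_for(t_right, "gz")))
print(sorted(ids_for(t_left, "trp")), sorted(ids_for(t_left, "Ag")))
```

Querying a prefix returns the same occurrence sets that label the trie nodes on the slide, for example “x” and “gz” in Tright or “trp” and “Ag” in Tleft.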
Wrapper: prt[...]x Content: oBamA, KenNedy, CLinTOn, cAstrO
Wrapper: fgA[...]gz Content: obAMa, ClInTon, kEnneDy
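As a rough illustration of how the two wrappers could be derived and applied, the sketch below compares every pair of occurrences of different seeds and keeps the longest left and right context they share. This pairwise shortcut is an assumption that only suffices for the two-seed example; it is not the trie-based algorithm from the slides. The function names are hypothetical.

```python
import re
from itertools import combinations

# Example document from the slides, line-wrap spaces removed (an assumption).
DOCUMENT = (
    "prtoBamAxxaEsdSlkprtKenNed"
    "yxSAprtCLinTOnxkhSlpUfgAob"
    "AMagzPHMAcolLgAcLIntOnpfb"
    "lRoiusWgoprtcAstrOxkLiTfgAClI"
    "nTongzUmXuSYfgAkEnneDygzil"
)
SEEDS = ["Obama", "Clinton"]


def common_prefix(a, b):
    """Longest string that both a and b start with."""
    length = 0
    while length < min(len(a), len(b)) and a[length] == b[length]:
        length += 1
    return a[:length]


def derive_wrappers(document, seeds):
    """Return (left, right) context pairs shared by occurrences of different seeds."""
    occurrences = [
        (match.start(), match.end(), seed)
        for seed in seeds
        for match in re.finditer(re.escape(seed), document, flags=re.IGNORECASE)
    ]
    wrappers = set()
    for (start_a, end_a, seed_a), (start_b, end_b, seed_b) in combinations(occurrences, 2):
        if seed_a == seed_b:
            continue  # a wrapper has to bracket both seeds
        # Compare left contexts from the seed outwards, then restore reading order.
        left = common_prefix(document[:start_a][::-1], document[:start_b][::-1])[::-1]
        right = common_prefix(document[end_a:], document[end_b:])
        if left and right:
            wrappers.add((left, right))
    return wrappers


def extract(document, wrapper):
    """Return every shortest non-empty string bracketed by the wrapper."""
    left, right = wrapper
    pattern = re.escape(left) + "(.+?)" + re.escape(right)
    return re.findall(pattern, document)


for wrapper in sorted(derive_wrappers(DOCUMENT, SEEDS)):
    print(wrapper, extract(DOCUMENT, wrapper))
```

Besides the seeds, the wrappers also pick up Kennedy and Castro, which is where the non-seed mentions in the ranking step come from.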
<ul> <li>Obama</li> <li>Bush</li> <li>Kennedy</li> </ul>
seeds and mentions
Nodes: doc, seeds, prt..x, fgA..gz, Castro, JFK
Edges: doc -find-> seeds, doc -derive-> prt..x, doc -derive-> fgA..gz, prt..x -extract-> Castro, prt..x -extract-> JFK, fgA..gz -extract-> JFK
Graph-walk (Page rank)
P(find|doc) = 0.5
P(derive|doc) = 0.5
P(seeds|doc,find) = 1
P(prt..x|doc,derive) = 0.5
P(fgA..gz|doc,derive) = 0.5
Resulting edge weights out of doc: find 1/2, derive 1/4, derive 1/4
Transitions in both ways, e.g. out of prt..x: back to doc 1/2, extract 1/4, extract 1/4
Transition Matrix (x,y) = P(x→y), column x = source node, row y = target node
s = seeds, d = doc, w1 = prt..x, w2 = fgA..gz, m1 = JFK, m2 = Castro

      s    d    w1   w2   m1   m2
s     0    ½    0    0    0    0
d     1    0    ½    ½    0    0
w1    0    ¼    0    0    ½    1
w2    0    ¼    0    0    ½    0
m1    0    0    ¼    ½    0    0
m2    0    0    ¼    0    0    0

with laziness factor λ=0.01
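The matrix entries follow from the edge list if the walker first picks one of the relation types available at a node uniformly, then picks one target of that relation uniformly. The sketch below builds the matrix that way with exact fractions. Treating the laziness factor as "stay put with probability λ, otherwise move" is an assumption about how λ is applied, and the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

NODES = ["s", "d", "w1", "w2", "m1", "m2"]
# (source, relation, target), using the slide's abbreviations.
EDGES = [
    ("d", "find", "s"),
    ("d", "derive", "w1"),
    ("d", "derive", "w2"),
    ("w1", "extract", "m1"),
    ("w1", "extract", "m2"),
    ("w2", "extract", "m1"),
]


def transition_matrix(nodes, edges):
    """Column-stochastic matrix: entry [row y][column x] holds P(x -> y)."""
    # targets[x][relation] lists the nodes reachable from x via that relation.
    targets = defaultdict(lambda: defaultdict(list))
    for source, relation, target in edges:
        targets[source][relation].append(target)
        targets[target][relation + "^-1"].append(source)  # transitions in both ways
    index = {node: position for position, node in enumerate(nodes)}
    matrix = [[Fraction(0)] * len(nodes) for _ in nodes]
    for source, by_relation in targets.items():
        for relation_targets in by_relation.values():
            # Uniform over relation types, then uniform over that relation's targets.
            weight = Fraction(1, len(by_relation) * len(relation_targets))
            for target in relation_targets:
                matrix[index[target]][index[source]] += weight
    return matrix


def make_lazy(matrix, laziness):
    """Assumed lazy walk: stay with probability `laziness`, otherwise follow `matrix`."""
    size = len(matrix)
    return [
        [(laziness if row == col else 0) + (1 - laziness) * matrix[row][col]
         for col in range(size)]
        for row in range(size)
    ]


matrix = transition_matrix(NODES, EDGES)
lazy = make_lazy(matrix, Fraction(1, 100))
for name, row in zip(NODES, matrix):
    print(name, [str(value) for value in row])
```

Each column sums to 1, so multiplying the matrix by a probability vector yields another probability vector.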
State Vector
s = 1, d = 0, w1 = 0, w2 = 0, m1 = 0, m2 = 0
Transition Matrix and State Vector
Transition Matrix · State Vector = new State Vector
Iterated Multiplication
repeated 1000x
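A minimal sketch of the iterated multiplication, in plain Python. The matrix is the six-node example as read from the slides (column = source, row = target), the walk starts with all weight on the seeds node, and the laziness factor is assumed to mean "stay with probability λ, otherwise move". The function names are illustrative.

```python
NODES = ["s", "d", "w1", "w2", "m1", "m2"]
# Transition matrix as read from the slides: MATRIX[row y][column x] = P(x -> y).
MATRIX = [
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.25, 0.0, 0.0, 0.5, 1.0],
    [0.0, 0.25, 0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.25, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.0, 0.0, 0.0],
]
LAZINESS = 0.01


def step(matrix, state, laziness):
    """One lazy step: keep `laziness` of the weight in place, move the rest."""
    size = len(state)
    moved = [sum(matrix[row][col] * state[col] for col in range(size))
             for row in range(size)]
    return [laziness * state[row] + (1 - laziness) * moved[row] for row in range(size)]


def walk(matrix, state, laziness, iterations=1000):
    """Apply `step` repeatedly and return the final state vector."""
    for _ in range(iterations):
        state = step(matrix, state, laziness)
    return state


start = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # all weight on the seeds node
final = walk(MATRIX, start, LAZINESS)
scores = dict(zip(NODES, final))
ranking = sorted(["m1", "m2"], key=scores.get, reverse=True)
print(ranking, round(scores["m1"], 4), round(scores["m2"], 4))
```

In this toy graph the walk ends with more weight on m1 than on m2, because m1 is extracted by both wrappers and m2 by only one. Without the laziness term the walk would keep alternating between two groups of nodes, since every edge joins {s, w1, w2} to {d, m1, m2}.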
presidents)
Chinese dynasties)
stats = {}, used = input, rslt = {}
for i = 1 to M do
    m = min(3, |used|)
    seeds = select_m(used) ∪ top(rslt)
    stats = expand(seeds)
    rslt = rank(stats)
    used = used ∪ seeds
rof
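The loop can be sketched in Python as follows. `expand` and `rank` are passed in as callables because the slides do not spell them out. Reading select_m as a random sample of size m, and top(rslt) as the best-ranked item not used yet, are both assumptions. The toy lists and the two toy functions are made-up stand-ins so that the example runs.

```python
import random
from collections import Counter


def iterative_expansion(initial_seeds, expand, rank, iterations, rng=None):
    """Repeat expand/rank, feeding the top-ranked new item back in as a seed."""
    rng = rng or random.Random(0)
    used = set(initial_seeds)
    result = []
    for _ in range(iterations):
        m = min(3, len(used))
        # select_m(used): assumed to be a random sample of m used seeds.
        seeds = set(rng.sample(sorted(used), m))
        # top(rslt): assumed to be the best-ranked item that is not used yet.
        unused = [item for item in result if item not in used]
        if unused:
            seeds.add(unused[0])
        stats = expand(seeds)
        result = rank(stats)
        used |= seeds
    return result


# Hypothetical toy data standing in for lists found on web pages.
TOY_LISTS = [
    ["Obama", "Clinton", "Bush", "Kennedy"],
    ["Obama", "Bush", "Kennedy", "Carter"],
    ["Clinton", "Castro"],
]


def toy_expand(seeds):
    """Count items in every toy list that contains at least one seed."""
    counts = Counter()
    for items in TOY_LISTS:
        if seeds & set(items):
            counts.update(items)
    return counts


def toy_rank(stats):
    """Order items by descending count, ties broken alphabetically."""
    return sorted(stats, key=lambda item: (-stats[item], item))


result = iterative_expansion(["Obama", "Clinton", "Bush"], toy_expand, toy_rank, iterations=2)
print(result)
```

With these toy lists Castro enters the candidate set only because it shares a list with a seed, which is the kind of false positive the talk's title alludes to.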
sadhSAbcGsadutsobAMawrKuSAjkLsdFautsmeRKelwrKgErmAnyjkLsdfkuxeSAmcvDkBSs
Not sure if that fits...