DETERMINANTAL POINT PROCESSES
FOR
NATURAL LANGUAGE PROCESSING
Jennifer Gillenwater Joint work with Alex Kulesza and Ben Taskar
OUTLINE
Motivation & background on DPPs
Large-scale …
Quality: relevance to the topic
Diversity: coverage of core ideas
feature space:

    quality    = sqrt(B_i^T B_i) = ||B_i||_2
    similarity = B_i^T B_j

Area of the parallelogram spanned by B_i and B_j:

    area = sqrt( ||B_i||_2^2 ||B_j||_2^2 - (B_i^T B_j)^2 )

length = ||B_i||_2,  volume = base × height:

    vol(B) = ||B_1||_2 · vol( proj_{⊥B_1}(B_{2:N}) )     (height × base)

The area is a determinant:

    area = det( [ ||B_i||_2^2   B_i^T B_j
                  B_i^T B_j    ||B_j||_2^2 ] )^(1/2)
         = det( [B_i B_j]^T [B_i B_j] )^(1/2)
         = vol(B_{i,j})

In general, for B = [B_1 . . . B_N]:

    vol(B)^2 = det(B^T B) = det(L)
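The identity vol(B)^2 = det(B^T B) = det(L) is easy to sanity-check numerically. A minimal numpy sketch (random B, all names local to the example) that computes the parallelepiped volume by Gram-Schmidt (via QR) and compares it to the determinant:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 5, 3
B = rng.standard_normal((D, N))   # columns B_1..B_N are item feature vectors

# Gram-Schmidt volume: product of successive heights = |prod(diag(R))| in B = QR
_, R = np.linalg.qr(B)
vol = abs(np.prod(np.diag(R)))

L = B.T @ B                       # the DPP kernel L = B^T B
print(np.isclose(vol**2, np.linalg.det(L)))  # -> True
```

Since B^T B = R^T R for B = QR, det(L) is exactly the squared product of the diagonal of R.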
Y = {1, . . . , N}
A set's probability is the determinant of the corresponding submatrix of the N × N kernel L:

    P({2, 3, 5}) ∝ det( [ L22  L23  L25
                          L32  L33  L35
                          L52  L53  L55 ] )

Normalizing:

    P({2, 3, 5}) = det(L_{2,3,5}) / det(L + I)
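As a sanity check that det(L + I) really is the normalizer (Σ_Y det(L_Y) = det(L + I)), here is a toy brute-force sketch with a random PSD kernel (names hypothetical):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
L = A @ A.T                               # any PSD matrix works as an L-ensemble

def prob(Y):
    idx = list(Y)
    num = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
    return num / np.linalg.det(L + np.eye(N))

# probabilities of all 2^N subsets sum to 1
total = sum(prob(Y) for r in range(N + 1)
            for Y in itertools.combinations(range(N), r))
print(round(total, 6))  # -> 1.0
```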
All of these can be computed in O(N^3) time:

    Normalizing:    P_L(Y = Y)
    Marginalizing:  P(Y ⊆ Y)
    Conditioning:   P_L(Y = B | A ⊆ Y),  P_L(Y = B | A ∩ Y = ∅)
    Sampling:       Y ∼ P_L
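The sampling step can be sketched with the standard spectral algorithm of Hough et al. (a generic minimal numpy implementation, not code from the talk):

```python
import numpy as np

def sample_dpp(L, rng):
    """Draw Y ~ P_L exactly: eigendecompose L, keep eigenvector n with
    probability lambda_n / (lambda_n + 1), then pick items one at a time
    from the subspace spanned by the kept eigenvectors."""
    lam, V = np.linalg.eigh(L)
    V = V[:, rng.random(len(lam)) < lam / (lam + 1.0)]
    Y = []
    while V.shape[1] > 0:
        p = np.sum(V**2, axis=1)
        p /= p.sum()                        # P(i) proportional to sum_j V[i, j]^2
        i = rng.choice(len(p), p=p)
        Y.append(i)
        # restrict V to the subspace orthogonal to e_i, then re-orthonormalize
        j = np.argmax(np.abs(V[i, :]))
        V = V - np.outer(V[:, j] / V[i, j], V[i, :])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(Y)
```

The O(N^3) cost is the initial eigendecomposition; the selection loop is lower order.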
KULESZA AND TASKAR (NIPS 2010)

The kernel L = B^T B, with B = [B_1 B_2 B_3 . . . B_N], is N × N; the dual kernel C = B B^T is only D × D.

Their eigendecompositions are related:

    L = V Λ V^T,    C = V̂ Λ V̂^T,    V = B^T V̂ Λ^(-1/2)

Working with C:

    Normalizing (Σ_Y det(L_Y)):     O(D^3)
    Marginalizing & conditioning:   O(D^3 + D^2 k^2)
    Sampling (Y ∼ P_L):             O(N D^2 k)
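A small numpy check of the dual relation (toy data, not from the slides): eigendecomposing only the D × D matrix C recovers orthonormal eigenvectors of the N × N kernel L via V = B^T V̂ Λ^(-1/2).

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 4, 50                      # few features, many items
B = rng.standard_normal((D, N))
L = B.T @ B                       # N x N primal kernel
C = B @ B.T                       # D x D dual kernel

lam, Vhat = np.linalg.eigh(C)     # eigendecompose only the small matrix
V = B.T @ Vhat @ np.diag(lam ** -0.5)   # V = B^T Vhat Lambda^(-1/2)

# V holds orthonormal eigenvectors of L with the same eigenvalues as C:
print(np.allclose(L @ V, V @ np.diag(lam)),
      np.allclose(V.T @ V, np.eye(D)))   # -> True True
```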
But in structured problems N is exponential:

    N = O({sentence length}^{sentence length})   — we want to select a diverse set of parses
    N = O({node degree}^{path length})
KULESZA AND TASKAR (NIPS 2010)

A structure factors into parts: i = {i_α}_{α∈F}, with factors of degree c (here c = 2).

Quality and similarity decompose over the parts:

    B_i = q(i) φ(i)
    q(i) = ∏_{α∈F} q(i_α)      (quality: product over parts)
    φ(i) = ∑_{α∈F} φ(i_α)      (similarity: sum over parts)

Sampling Y ∼ P_L drops from O(N D^2 k) to O(D^2 k^3 + D k^2 M^c R), where R = # of factors and M = # of values per part.

Example: M^c R = 4^2 · 12 = 192 ≪ N = 4^12 = 16,777,216.
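A brute-force toy illustrating the factorization B_i = q(i)φ(i), with q a product and φ a sum over parts (part qualities/features here are hypothetical random values, and real SDPPs never enumerate structures; this is only feasible at toy size). The dual kernel stays D × D no matter how large N = M^R grows:

```python
import itertools
import numpy as np

R, M, D = 3, 2, 4                          # parts, values per part, feature dim
rng = np.random.default_rng(1)
q_part = rng.random((R, M)) + 0.5          # hypothetical per-part quality factors
phi_part = rng.standard_normal((R, M, D))  # hypothetical per-part features

# Enumerate all N = M^R structures:
# B_i = q(i) * phi(i), with q(i) a product and phi(i) a sum over parts.
cols = []
for i in itertools.product(range(M), repeat=R):
    q = np.prod([q_part[a, v] for a, v in enumerate(i)])
    phi = np.sum([phi_part[a, v] for a, v in enumerate(i)], axis=0)
    cols.append(q * phi)
B = np.stack(cols, axis=1)                 # D x N with N = M^R = 8

C = B @ B.T                                # dual kernel: D x D, independent of N
L = B.T @ B                                # primal kernel: N x N
```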
N = # of items,  D = # of features

    N            D       approach
    Large        Small   dual
    Exponential  Small   dual + structure
    Exponential  Large   ?
GILLENWATER, KULESZA, AND TASKAR (EMNLP 2012)

When D is large, the D-dimensional feature computations (the M^c R factor terms) become the bottleneck. Idea: replace the D × N feature matrix Φ with GΦ, where G is a random d × D matrix and d ≪ D.
JOHNSON AND LINDENSTRAUSS (1984): random projection to d = O(log N) dimensions approximately preserves pairwise distances.

MAGEN AND ZOUZIAS (2008): projections of O(log N)-type dimension can also approximately preserve volumes.
GILLENWATER, KULESZA, AND TASKAR (EMNLP 2012)

DPP probabilities are squared volumes (vol^2 = det) for subsets of any size k = 1, 2, 3, . . . , so a volume-preserving projection approximately preserves DPP probabilities.
GILLENWATER, KULESZA, AND TASKAR (EMNLP 2012)

    d = O( max{ k/ε, (log(1/δ) + log N)/ε^2 } + k )

where N = total # of items and k = subset size. Then with probability 1 − δ:

    ||P^k − P̃^k||_1 ≤ e^(6kε) − 1

[Figure: L1 variational distance and memory use (bytes) vs. projection dimension.]
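The projection itself is a single matrix multiply. A toy sketch (random features; dimensions are illustrative, not the paper's) checking that squared volumes, and hence unnormalized DPP probabilities det(L_Y), are roughly preserved:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, d = 200, 1000, 50
Phi = rng.standard_normal((D, N))             # toy D-dimensional item features

G = rng.standard_normal((d, D)) / np.sqrt(d)  # Gaussian projection, E||Gx||^2 = ||x||^2
Phi_t = G @ Phi                               # projected features: d x N

Y = [3, 17, 42]                               # an arbitrary size-k subset, k = 3
vol2  = np.linalg.det(Phi[:, Y].T @ Phi[:, Y])      # squared volume, original
vol2t = np.linalg.det(Phi_t[:, Y].T @ Phi_t[:, Y])  # squared volume, projected
print(vol2t / vol2)                           # near 1: volume roughly preserved
```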
Example thread:

    March 28: Health officials confirm Ebola outbreak in Guinea's capital
    August 8: World Health Organization declares Ebola epidemic an international health emergency
    September 2: GlaxoSmithKline begins Ebola vaccine drug trial

The number of possible threads is astronomical (≈ 10^360), built from M ≈ 35,000 news articles.
Features φ(i) have dimension D = 36,356; projected features Gφ(i) have dimension d = 50.
[Figure: sampled threads over Jan 08–Jun 17; top words per thread include "pope vatican church parkinson", "israel palestinian iraqi israeli gaza abbas baghdad", "social tax security democrats rove accounts", and "iraq iraqi killed baghdad arab marines deaths forces".]
Example thread:

    Feb 24: Parkinson's Disease Increases Risks to Pope
    Feb 26: Pope's Health Raises Questions About His Ability to Lead
    Mar 13: Pope Returns Home After 18 Days at Hospital
    Apr 01: Pope's Condition Worsens as World Prepares for End of Papacy
    Apr 02: Pope, Though Gravely Ill, Utters Thanks for Prayers
    Apr 18: Europeans Fast Falling Away from Church
    Apr 20: In Developing World, Choice [of Pope] Met with Skepticism
    May 18: Pope Sends Message with Choice of Name
    System    ROUGE-1F   R-SU4F   Coherence   Runtime (s)
    k-means   16.5       3.76     2.73        626
    DTM       14.7       3.44     3.2         19,434
    DPP       17.2       3.98     3.3         252
A simple model gives basic scores for all possible parse trees; a richer model with more features provides more refined scores, but only for a small set of candidates. Standard reranking takes the k best parses under the simple model, then scores these k with the more complex model and outputs the best. Problem: the ranker does not get to consider significantly different parses.

    N = O({sentence length}^{sentence length})
    We want to select a diverse set of parses.

    Quality: standard parser scores
    Diversity: edge lengths, POS pairs, etc.
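One way to realize "select a diverse set" over a candidate list is greedy MAP inference for a DPP (a generic sketch, not the talk's algorithm; exact DPP MAP is NP-hard, and the quality/feature inputs here are hypothetical stand-ins for parser scores and parse features):

```python
import numpy as np

def greedy_dpp_map(B, k):
    """Greedily build Y by adding the item that maximizes det(L_Y),
    where L = B^T B and B is D x N with columns q(i) * phi(i)."""
    N = B.shape[1]
    Y = []
    for _ in range(k):
        best_i, best_logdet = None, -np.inf
        for i in range(N):
            if i in Y:
                continue
            S = B[:, Y + [i]]
            sign, logdet = np.linalg.slogdet(S.T @ S)
            if sign > 0 and logdet > best_logdet:
                best_i, best_logdet = i, logdet
        if best_i is None:              # no item increases the volume
            break
        Y.append(best_i)
    return Y

# Hypothetical setup: quality = parser score, phi = parse features.
rng = np.random.default_rng(3)
q = rng.random(20) + 0.5
phi = rng.standard_normal((8, 20))
phi /= np.linalg.norm(phi, axis=0)      # unit-norm similarity features
B = q * phi                             # column i is q(i) * phi(i)
print(greedy_dpp_map(B, 5))
```

With unit-norm features, the first pick is simply the highest-quality parse; later picks trade quality against overlap with what is already selected.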
Some words are ambiguous (e.g. river bank vs. bank deposit). In clustering-based word sense induction, cluster centers represent word senses. We cast choosing cluster centers as the problem of finding a high-quality, diverse set:

    Quality: centrality (density)
    Diversity: same as standard WSI features