A Contextual Query Expansion Approach by Term Clustering for Robust Text Summarization
Massih Amini and Nicolas Usunier
April the 26th 2007
Université Pierre et Marie Curie (Paris 6)
Laboratoire d’Informatique de Paris 6
LIP6 summarizer
[Figure: overview of the summarizer. Labeled components: Documents (D), Question (Q), Title (T), Preprocessing, Vocabulary, Term clustering (clusters G1, Gi, Gn, Gk, Gl), Alignment, Sentence features, Combination, Postprocessing.]
[Figure: system overview diagram, repeated. Components: Documents, Preprocessing, Term clustering, Alignment, Sentence features, Combination, Postprocessing.]
Words occurring in the same context with the same
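Each term is generated by a mixture density:
$$p(w \mid \Theta) = \sum_{k=1}^{K} \pi_k \, p(w \mid y = k, \theta_k)$$
Each term of the vocabulary V belongs to one and only one cluster:
$$\forall i, \; y_i = k \iff t_{ki} = 1 \ \text{and} \ \forall h \neq k, \; t_{hi} = 0$$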
E-step: Estimate the posterior class probability that each term wj belongs to each cluster Gk
C-step: Assign each term to the cluster with maximal posterior probability
M-step: Estimate the new mixture parameters which maximize the likelihood
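The E-, C- and M-steps can be sketched in a few lines of Python. This is only an illustrative sketch: the slides do not state which component density is used, so the code assumes a multinomial mixture over term-by-document count vectors. The function name `cem_cluster`, the smoothing constant `alpha`, and the optional `init_labels` argument are hypothetical.

```python
import numpy as np

def cem_cluster(X, n_clusters, n_iter=50, alpha=1e-2, init_labels=None, seed=0):
    """Hard clustering of terms with a classification EM loop.

    X is an (n_terms, n_docs) matrix of non-negative counts.
    Assumption: each cluster is a multinomial over documents.
    """
    X = np.asarray(X, dtype=float)
    n_terms, n_docs = X.shape
    if init_labels is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(n_clusters, size=n_terms)
    else:
        labels = np.asarray(init_labels)

    for _ in range(n_iter):
        # M-step: re-estimate mixing proportions and cluster parameters
        # from the current hard partition (with additive smoothing).
        pi = np.empty(n_clusters)
        theta = np.empty((n_clusters, n_docs))
        for k in range(n_clusters):
            members = X[labels == k]
            pi[k] = (len(members) + alpha) / (n_terms + alpha * n_clusters)
            counts = members.sum(axis=0) + alpha
            theta[k] = counts / counts.sum()

        # E-step: unnormalized log posterior of each cluster for each term.
        log_post = np.log(pi) + X @ np.log(theta).T

        # C-step: assign each term to its most probable cluster.
        new_labels = log_post.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # the partition no longer changes
        labels = new_labels
    return labels
```

Because the C-step makes hard assignments, every term ends up in exactly one cluster, and the loop stops as soon as the partition is stable.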
D0714: Term cluster containing Napster
digital trade act format drives allowed illegally net napster search stored alleged released musical electronic internet signed intended idea billions distribution exchange mp3 music songs tool
D0728: Term cluster containing Interferon
depression interferon antiviral protein drug ribavirin combination people hepatitis liver disease treatment called doctors cancer epidemic flu fever schering plough corp
D0705: Term cluster containing basque and separatism
basque people separatist armed region spain separatism eta independence police france batasuna nationalists herri bilbao killed
[Figure: system overview diagram, repeated. Components: Documents, Preprocessing, Term clustering, Alignment, Sentence features, Combination, Postprocessing.]
$$\mathrm{Sim}(S, Q) = \frac{\sum_{w \in S \cap Q} c(w, S)\, c(w, Q)}{\sqrt{\sum_{w \in S} c(w, S)^2 \; \sum_{w \in Q} c(w, Q)^2}}$$
$$c(w, Z) = tf(w, Z) \times \log df(w)$$
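As an illustration, a sentence-to-query similarity of this kind could be computed as below. The sketch assumes a cosine-style normalization and term weights of the form c(w, Z) = tf(w, Z) × log df(w); both are readings of the slide rather than a confirmed specification. The names `term_weights` and `sim`, and the convention that an unseen term has df = 1, are hypothetical.

```python
import math
from collections import Counter

def term_weights(tokens, df):
    """Weight c(w, Z) = tf(w, Z) * log df(w) for every term of Z.

    Assumption: terms missing from `df` get df = 1, hence weight 0.
    """
    tf = Counter(tokens)
    return {w: tf[w] * math.log(df.get(w, 1)) for w in tf}

def sim(sentence, query, df):
    """Cosine-style similarity between a sentence and a query."""
    cs = term_weights(sentence, df)
    cq = term_weights(query, df)
    # Numerator: sum over the terms shared by the sentence and the query.
    num = sum(cs[w] * cq[w] for w in cs.keys() & cq.keys())
    # Denominator: product of the two weight-vector norms.
    den = math.sqrt(sum(v * v for v in cs.values()) *
                    sum(v * v for v in cq.values()))
    return num / den if den > 0 else 0.0
```

With this normalization, a sentence compared against itself scores 1 and two texts with no common term score 0.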
Question D0708: What countries are having chronic potable water shortages and why?
Before
Document: XIE19970212.0042
Tadesse said 18 water supply projects are underway at various stages, adding that one of such projects involved the sinking of 25 wells at Akaki, about 20 kilometers from Addis Ababa, which will supply 75,000 cubic meters of water daily to the capital city. Currently, the authority supplies only 60 percent of the city's potable water demand. According to a report here today, the announcement was made by Tadesse Kebede, general manager of the authority. The Addis Ababa Regional Water and Sewerage Authority announced that the shortage of potable water in the capital city of Ethiopia will be solved in the last quarter of this year.
After
The Addis Ababa Regional Water and Sewerage Authority announced that the shortage of potable water in the capital city of Ethiopia will be solved in the last quarter of this year. Tadesse said 18 water supply projects are underway at various stages, adding that one of such projects involved the sinking of 25 wells at Akaki, about 20 kilometers from Addis Ababa, which will supply 75,000 cubic meters of water daily to the capital city.
[Figure: system overview diagram, repeated. Components: Documents, Preprocessing, Term clustering, Alignment, Sentence features, Combination, Postprocessing.]
q1 = question keywords, q2 = question keywords expanded with their word clusters, q3 = title keywords expanded with their word clusters,
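The expansion used for q2 and q3 can be pictured as a set union: each keyword brings in every term of the cluster it belongs to. The sketch below is a minimal illustration under that reading; the function `expand_with_clusters` and the two dictionaries are hypothetical names, and the toy clusters are made up from a few of the example terms.

```python
def expand_with_clusters(keywords, term_to_cluster, clusters):
    """Return the keywords plus all terms of the clusters they belong to.

    term_to_cluster maps a term to its cluster id (one cluster per term);
    clusters maps a cluster id to the set of terms it contains.
    """
    expanded = set(keywords)
    for w in keywords:
        k = term_to_cluster.get(w)
        if k is not None:  # keywords outside the vocabulary stay as they are
            expanded |= clusters[k]
    return expanded

# Toy data, for illustration only.
clusters = {0: {"napster", "mp3", "music"}, 1: {"interferon", "hepatitis"}}
term_to_cluster = {w: k for k, terms in clusters.items() for w in terms}

q1 = ["napster", "lawsuit"]                                # plain keywords
q2 = expand_with_clusters(q1, term_to_cluster, clusters)   # expanded keywords
```

Here q2 contains "napster", "lawsuit", "mp3" and "music": the keyword found in a cluster pulls in its cluster mates, while the unknown keyword is kept unchanged.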
Object:     1    2    ...  n
Rank Sys1:  r1   r2   ...  rn
Rank Sys2:  s1   s2   ...  sn

$$\mathrm{CorrSpearman}(Sys_1, Sys_2) = \frac{\mathrm{Cov}(r, s)}{\sigma_r \, \sigma_s} = 1 - \frac{6 \sum_{i=1}^{n} (r_i - s_i)^2}{n (n^2 - 1)}$$
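For two rankings of the same n objects, the Spearman correlation can be computed directly from the rank differences. The sketch below uses the closed form 1 - 6 Σ (r_i - s_i)² / (n (n² - 1)), which is exact only when neither ranking contains ties; the function name `spearman` is a hypothetical choice.

```python
def spearman(r, s):
    """Spearman rank correlation between two rankings without ties.

    r[i] and s[i] are the ranks given to object i by the two systems.
    """
    if len(r) != len(s) or len(r) < 2:
        raise ValueError("need two rankings of the same length (at least 2)")
    n = len(r)
    d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Two identical rankings give 1, and two exactly reversed rankings give -1.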