NTCIR-4
POSTECH at NTCIR-4: CJKE Monolingual and Korean-related Cross-Language Retrieval Experiments
Jun. 2, 2004
In-Su Kang*, Seung-Hoon Na, Jong-Hyeok Lee
Knowledge and Language Engineering Laboratory
Dept. of Computer Science & Engineering
Coupling Stage   Coupling Unit    # of Indexes
Index creation   Index term       One
Term weighting   Term weight      Two
Ranked list      Document score   Two

Coupling operations: TF: Sum; DF: Sum, or Union; Interpolation; Sum
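The table lists the ranked list as one coupling stage, with the document score as its unit, and names Sum among the coupling operations. A minimal sketch of that case follows, assuming the sum is taken over per-document scores from a word-index run and an n-gram-index run whose scores are already on comparable scales. The function name `couple_ranked_lists` and the toy scores are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def couple_ranked_lists(word_run, ngram_run):
    """Sum per-document scores from two runs and re-rank (hypothetical helper).

    Each run maps a document id to a retrieval score. A document missing
    from one run simply contributes nothing from that run.
    """
    combined = defaultdict(float)
    for run in (word_run, ngram_run):
        for doc_id, score in run.items():
            combined[doc_id] += score
    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Toy scores, invented for illustration only.
word_run = {"d1": 0.9, "d2": 0.4}
ngram_run = {"d2": 0.7, "d3": 0.5}
ranking = couple_ranked_lists(word_run, ngram_run)
```

If the two runs come from different retrieval models, their raw scores may not be comparable, so some normalization would probably be needed before summing. The slides do not say how this was handled.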
Figure (retrieval pipeline): Query -> 1st Retrieval -> Expansion Term Selection -> 2nd Retrieval -> Fusion.
Indexes: Word Indexes, Ngram Indexes.
Models at 1st Retrieval, Expansion Term Selection, and 2nd Retrieval: Probabilistic Model, Language Model.
Input to Fusion: 16 ranked lists.
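One reading of the "16 ranked lists" is 2 index types x 2 models for the 1st retrieval x 2 models for expansion term selection x 2 models for the 2nd retrieval. The sketch below enumerates the runs under that assumption. The four-character codes follow the pattern of labels such as wPLP and nLLL used in the result tables, but the interpretation of each character position is inferred rather than stated on the slides.

```python
from itertools import product

# Assumed meaning of each character in a run code such as "wPLP".
INDEXES = {"w": "word index", "n": "n-gram index"}
MODELS = {"P": "probabilistic model", "L": "language model"}


def run_codes():
    """Enumerate index x 1st retrieval x expansion x 2nd retrieval choices."""
    return [
        index + first + expansion + second
        for index, first, expansion, second in product(
            INDEXES, MODELS, MODELS, MODELS
        )
    ]


codes = run_codes()
```

Under this reading, a code such as wP-- would denote a 1st-retrieval-only run on the word index with the probabilistic model.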
Run   nP--    nL--    wP--    nPPP    nLLL     wPLP    wPLP+nPPP+nLLL
T     0.2297  0.2050  0.1603  0.2532  0.2699*  0.1853  0.2584 (-4.3%)
D     0.2069  0.1823  0.1533  0.2398  0.2686*  0.2016  0.2535 (-5.6%)

NTCIR-4 MAX: 0.3799 0.3103 0.3880

                      C               DN              TDNC
2nd Retrieval  nPPP   0.2681          0.2983          0.3060
               nLLL   0.2856*         0.3019*         0.3046
               wPLP   0.2049          0.2503          0.2693
Fusion                0.2703 (-5.4%)  0.2968 (-1.7%)  0.3103* (+1.4%)

1st Retrieval (C, DN, TDNC), values in slide order: 0.2281 0.2708 0.2855 0.2358 0.1789 0.2809 0.2365 0.2911 0.2562
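The percentages in parentheses beside the Fusion scores are consistent with the relative change of the fused run against the best single 2nd-retrieval run for the same run type. The sketch below checks that reading for C, DN, and TDNC. The assignment of scores to run types is inferred from the slide text, and `relative_change` is a hypothetical helper name.

```python
def relative_change(score, baseline):
    """Relative change of score over baseline, in percent, to one decimal."""
    return round((score / baseline - 1.0) * 100.0, 1)


# Best single 2nd-retrieval score per run type, as read from the slide.
second_retrieval_best = {"C": 0.2856, "DN": 0.3019, "TDNC": 0.3060}
# Fusion score per run type, as read from the slide.
fusion = {"C": 0.2703, "DN": 0.2968, "TDNC": 0.3103}

changes = {
    run: relative_change(fusion[run], second_retrieval_best[run])
    for run in fusion
}
# changes == {"C": -5.4, "DN": -1.7, "TDNC": 1.4}
```

On this collection, fusion helps only for the TDNC run type and falls below the best single run for C and DN.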
Run   nP--    nL--    wP--    nPPP    nLLL     wPLP     wPLP+nPPP+nLLL
T     0.3650  0.3260  0.3647  0.3844  0.4056   0.4226*  0.4211 (-0.4%)
D     0.3424  0.3101  0.3715  0.3842  0.4282*  0.4103   0.4119 (-3.8%)

NTCIR-4 MAX: 0.4864 0.4963 0.4838

                      C               DN              TDNC
2nd Retrieval  nPPP   0.3926          0.4539          0.4856
               nLLL   0.4207*         0.4924*         0.5024*
               wPLP   0.3806          0.4715          0.4875
Fusion                0.4105 (-2.4%)  0.4741 (-3.7%)  0.4963 (-1.2%)

1st Retrieval (C, DN, TDNC), values in slide order: 0.4439 0.4274 0.4346 0.4561 0.3426 0.4435 0.3141 0.4570 0.3496
Run   nP--    nL--    wP--    nPPP    nLLL    wPLP    wPLP+nPPP+nLLL
T     0.4515  0.4091  0.4285  0.4660  0.4967  0.4900  0.5226* (+5.2%)
D     0.4198  0.3674  0.4184  0.4347  0.4623  0.4771  0.4885* (+2.4%)

NTCIR-4 MAX: 0.5361 0.6212 0.5097

                      C                DN               TDNC
2nd Retrieval  nPPP   0.4499           0.5610           0.6040
               nLLL   0.4496           0.5592           0.5873
               wPLP   0.4611           0.5806           0.5859
Fusion                0.4846* (+5.1%)  0.5932* (+2.2%)  0.6212* (+2.8%)

1st Retrieval (C, DN, TDNC), values in slide order: 0.5111 0.4896 0.5249 0.5383 0.4370 0.5318 0.4081 0.5598 0.4450
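Figure (cross-language retrieval architecture):
Query translation: Source Language Query -> Query Translation (Statistical WSD), using Source-Target Bilingual Dic. -> Target Language Query -> Target Language (Word & N-gram) -> Document Lists.
Document translation: Pseudo Document Translation, using Source-Target Bilingual Dic. -> Source Language (Word & N-gram), searched with the Source Language Query -> Document Lists.
Both Document Lists -> Fusion.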
KC
Run    QT(wP--)  DT(nP--)        QT(wP--)+DT(nP--)  QT(wPLP)+DT(nPLP)
T      0.1436    0.1551 (8.0%)   0.1687 (8.8%)      0.1892 (12.2%)
D      0.1456    0.1448 (-0.5%)  0.1731 (18.9%)     0.1869 (7.9%)
C      0.1584    0.1567 (-1.1%)  0.1763 (11.4%)     0.2028 (15.0%)
DN     0.1665    0.1937 (16.3%)  0.1992 (2.8%)      0.2378 (19.4%)
TDNC   0.1778    0.2057 (15.7%)  0.2089 (1.6%)      0.2469 (18.2%)

KJ
Run    QT(wP--)  DT(nP--)        QT(wP--)+DT(nP--)  QT(wPLP)+DT(nPLP)
T      0.2861    0.3165 (10.6%)  0.3234 (2.2%)      0.3602 (11.4%)
D      0.3039    0.3207 (5.5%)   0.3362 (4.8%)      0.3601 (7.1%)
C      0.3000    0.3140 (4.7%)   0.3241 (3.2%)      0.3713 (14.6%)
DN     0.3763    0.3909 (3.9%)   0.4098 (4.8%)      0.4471 (9.1%)
TDNC   0.3905    0.4039 (3.4%)   0.4229 (4.7%)      0.4473 (5.8%)
Method                 KC      vs. previous  vs. QT   KJ      vs. previous  vs. QT
QT                     0.1584                         0.3314
DT                     0.1712  8.09%         8.09%    0.3492  5.38%         5.38%
QT + DT (no feedback)  0.1852  8.20%         16.96%   0.3633  4.03%         9.63%
QT + DT (feedback)     0.2127  14.83%        34.31%   0.3972  9.34%         19.87%
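The KC summary scores appear to be means over the five run types (T, D, C, DN, TDNC) of the per-run KC results on the preceding slide. The sketch below checks that reading. It assumes that "QT + DT (no feedback)" corresponds to QT(wP--)+DT(nP--) and "QT + DT (feedback)" to QT(wPLP)+DT(nPLP). That mapping is inferred from the numbers, not stated on the slides.

```python
from statistics import mean

# Per-run KC scores in the order T, D, C, DN, TDNC, as read from the slides.
kc_scores = {
    "QT": [0.1436, 0.1456, 0.1584, 0.1665, 0.1778],
    "DT": [0.1551, 0.1448, 0.1567, 0.1937, 0.2057],
    "QT + DT (no feedback)": [0.1687, 0.1731, 0.1763, 0.1992, 0.2089],
    "QT + DT (feedback)": [0.1892, 0.1869, 0.2028, 0.2378, 0.2469],
}

# Mean over the five run types, rounded to four decimals like the slide.
kc_means = {method: round(mean(values), 4) for method, values in kc_scores.items()}
# kc_means == {"QT": 0.1584, "DT": 0.1712,
#              "QT + DT (no feedback)": 0.1852, "QT + DT (feedback)": 0.2127}
```

The percentage gains on the slide were presumably computed from unrounded means, so recomputing them from these four-decimal values can differ in the last digit.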
CK
Run    QT(wP--)  QT(nP--)        QT(wP--)+QT(nP--)  QT(wPLP)+QT(nPLP)
T      0.3466    0.3572 (3.1%)   0.3663 (2.5%)      0.4343 (18.6%)
D      0.3193    0.3342 (4.7%)   0.3463 (3.6%)      0.4314 (24.6%)
C      0.3364    0.3466 (3.0%)   0.3557 (2.6%)      0.4083 (14.8%)
DN     0.4004    0.4099 (2.4%)   0.4259 (3.9%)      0.5060 (18.8%)
TDNC   0.4299    0.4355 (1.3%)   0.4538 (4.2%)      0.5138 (13.2%)

JK
Run    QT(wP--)  QT(nP--)        QT(wP--)+QT(nP--)  QT(wPLP)+QT(nPLP)
T      0.3559    0.3490 (-1.9%)  0.3634 (2.1%)      0.4559 (25.5%)
D      0.3431    0.3501 (2.0%)   0.3666 (4.7%)      0.4306 (17.5%)
C      0.3451    0.3587 (3.9%)   0.3833 (6.9%)      0.4593 (19.8%)
DN     0.4243    0.4536 (6.9%)   0.4632 (2.1%)      0.5383 (16.2%)
TDNC   0.4450    0.4607 (3.5%)   0.4773 (3.6%)      0.5446 (14.1%)
Method               CK      vs. previous  vs. QT(wP--)  JK      vs. previous  vs. QT(wP--)
QT(wP--)             0.3665                              0.3827
QT(nP--)             0.3767  2.77%         2.77%         0.3944  3.07%         3.07%
QT(wP--)+QT(nP--)    0.3896  3.43%         6.30%         0.4108  4.14%         7.34%
QT(wPLP)+QT(nPLP)    0.4588  17.75%        25.17%        0.4857  18.25%        26.93%