Evaluation
Experimental protocols, datasets, metrics
Web Search
What makes a good search engine?
Efficiency: It replies to user queries without noticeable delays. 1 sec is the limit for users feeling that they are freely ...
Metric name | Description
Elapsed indexing time | Measures the amount of time necessary to build a document index on a particular system.
Indexing processor time | Measures the CPU seconds used in building a document [...] time waiting for I/O or speed gains from parallelism.
Query throughput | Number of queries processed per second.
Query latency | The amount of time a user must wait after issuing a query before receiving a response, measured in milliseconds. This can be measured using the mean, but is often more instructive when used with the median or a percentile bound.
Indexing temporary space | Amount of temporary disk space used while creating an index.
Index size | Amount of storage necessary to store the index files.
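The remark that query latency is more instructive when reported as a median or a percentile bound than as a mean can be illustrated with a minimal sketch. The latency values are invented for illustration, and the nearest-rank helper `percentile_nearest_rank` is a hypothetical name using one of several common percentile definitions; the slides do not prescribe any of this.

```python
import math
import statistics


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank definition."""
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers pct percent of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical query latencies in milliseconds; one query is a slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 900]

mean_ms = statistics.mean(latencies_ms)      # pulled up by the outlier
median_ms = statistics.median(latencies_ms)  # typical user experience
p90_ms = percentile_nearest_rank(latencies_ms, 90)
p99_ms = percentile_nearest_rank(latencies_ms, 99)

print(f"mean={mean_ms} median={median_ms} p90={p90_ms} p99={p99_ms}")
```

A single slow query dominates the mean here, while the median and the 90th percentile still describe what most users see; the 99th percentile exposes the outlier.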
conversational transactions. Proc. AFIPS Fall Joint Computer Conference Vol. 33, 267-277.
[Figure: evaluation setup with the elements Data, Queries, System, Ranked results, Groundtruth, and Evaluation metrics]
Method \ Ground-truth | True | False
True | True positive | False positive (Type I error)
False | False negative (Type II error) | True negative
People, Nepal, Mother, Baby, Colorful dress, Fence, Sunset, Horizon, Clouds, Orange, Desert, Flowers, Yellow, Nature, Beach, Sea, Palm tree, White-sand, Clear sky
$$\kappa = \frac{p(A) - p(E)}{1 - p(E)}$$
p(E): probability of agreeing by chance
p(A): proportion of times humans agreed
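As a minimal sketch of the kappa formula for two annotators: the slide does not say how the chance agreement p(E) is estimated, so this example assumes Cohen's approach of deriving it from each annotator's label frequencies. The function name `cohen_kappa` and the label lists are illustrative, not taken from the slides.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(labels_a)
    # p(A): proportion of items on which the two annotators agree.
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p(E): chance agreement, assumed here to come from the product of
    # the two annotators' label frequencies (Cohen's estimate).
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Undefined when p_chance == 1 (both annotators always use one label).
    return (p_agree - p_chance) / (1 - p_chance)


# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
annotator_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In this example the annotators agree on 8 of 10 items (p(A) = 0.8) and chance agreement is 0.5, which gives a kappa of about 0.6.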
Rankings: 1 4 2 3 and 1 2 3 4
$$r = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
$$r = 1 - \frac{6\left((1-1)^2 + (2-3)^2 + (3-4)^2 + (4-2)^2\right)}{4(4^2 - 1)}$$
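The worked example can be checked with a short sketch of the rank correlation formula. It assumes that neither ranking contains ties, which is the case this closed form covers; the function name `spearman_r` is illustrative.

```python
def spearman_r(ranks_x, ranks_y):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(ranks_x)
    # d_i is the difference between the two ranks assigned to item i.
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Rank pairs from the worked example: (1,1), (2,3), (3,4), (4,2).
r = spearman_r([1, 2, 3, 4], [1, 3, 4, 2])
print(round(r, 3))
```

The squared differences sum to 6, so r = 1 - 36/60 = 0.4.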
In Portuguese: exatidão (accuracy), precisão (precision), and abrangência (recall).

Method \ Ground-truth | True | False
True | True positive | False positive
False | False negative | True negative

$$Accuracy = \frac{truePos + trueNeg}{truePos + falsePos + trueNeg + falseNeg}$$
$$Precision = \frac{truePos}{truePos + falsePos}$$
$$Recall = \frac{truePos}{truePos + falseNeg}$$
$$F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}}$$
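A minimal sketch of these four measures, computed from the cells of the confusion table. The counts are invented for illustration and the function name `classification_metrics` is hypothetical; the sketch also assumes non-zero denominators.

```python
def classification_metrics(true_pos, false_pos, true_neg, false_neg):
    """Return (accuracy, precision, recall, F1) from confusion-table counts."""
    total = true_pos + false_pos + true_neg + false_neg
    accuracy = (true_pos + true_neg) / total
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 / (1 / precision + 1 / recall)
    return accuracy, precision, recall, f1


# Hypothetical counts for one method evaluated against the ground truth.
accuracy, precision, recall, f1 = classification_metrics(
    true_pos=30, false_pos=10, true_neg=50, false_neg=10
)
print(accuracy, precision, recall, round(f1, 3))
```

With these counts accuracy is 0.8, while precision, recall, and therefore F1 are all 0.75.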
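[Figure: plot with axes Recall and Precision comparing System A, System B, and System C, with regions labeled Improved recall, Improved precision, and Improved F-measure]
[Figure: ranked result lists of systems S1, S2, and S3, with documents A, B, C, D appearing at different rank positions]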
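[Figure: ranked list with positions 1 to 8]
$$AP = \frac{1}{\#relevant} \cdot \sum_{k \in \{\text{set of positions of the relevant docs}\}} p@k$$
$$AP = \frac{1}{4} \cdot \left(\frac{1}{2} + \frac{2}{4} + \frac{3}{6}\right) = 0.375$$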
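The worked example can be reproduced with a small sketch of average precision. The ranking below is an assumption consistent with the fractions 1/2, 2/4, and 3/6: relevant documents at positions 2, 4, and 6 of 8 results, with 4 relevant documents in total (one is never retrieved). The function name `average_precision` is illustrative.

```python
def average_precision(relevance, num_relevant):
    """Average of p@k over the rank positions k that hold a relevant doc.

    relevance: 0/1 flags for the ranked results, best rank first.
    num_relevant: total number of relevant docs for the query, including
    those that were not retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / k  # p@k at a relevant position
    return precision_sum / num_relevant


# Assumed ranking: relevant docs at positions 2, 4, and 6.
ranking = [0, 1, 0, 1, 0, 1, 0, 0]
ap = average_precision(ranking, num_relevant=4)
print(ap)
```

Dividing by the total number of relevant documents, rather than by the number retrieved, is what penalizes the missing fourth document.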
[Figure: ranked result lists for queries q1, q2, and q3, with average precision values AP(q1), AP(q2), and AP(q3)]
$$MAP = \frac{AP(q_1) + AP(q_2) + AP(q_3) + \dots + AP(q_n)}{n}$$
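A minimal sketch of MAP over a small set of queries. The three rankings are invented for illustration, and `average_precision` and `mean_average_precision` are illustrative helper names.

```python
def average_precision(relevance, num_relevant):
    """AP for one query: mean of p@k over positions holding a relevant doc."""
    hits = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / num_relevant


def mean_average_precision(runs):
    """MAP: arithmetic mean of the per-query AP values."""
    return sum(average_precision(rel, n_rel) for rel, n_rel in runs) / len(runs)


# Hypothetical results: (0/1 relevance of the ranked list, total relevant docs).
runs = [
    ([1, 1, 0, 0], 2),  # q1: AP = (1/1 + 2/2) / 2 = 1.0
    ([0, 1, 0, 1], 2),  # q2: AP = (1/2 + 2/4) / 2 = 0.5
    ([0, 0, 1, 0], 1),  # q3: AP = (1/3) / 1
]

map_score = mean_average_precision(runs)
print(round(map_score, 4))
```

Every query carries the same weight in the mean, regardless of how many relevant documents it has.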
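[Figure: ranked list with documents A, B, and C at different rank positions]
$$DCG_m = \sum_{i=1}^{m} \frac{2^{rel_i} - 1}{\log_2(1 + i)}, \qquad rel_i \in \{0, 1, 2, 3, \dots\}$$
$$nDCG_m = \frac{DCG_m}{bestDCG_m}$$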
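A short sketch of DCG and nDCG with graded relevance. It assumes that bestDCG is obtained by sorting the grades of the retrieved list itself in decreasing order; in practice the ideal ranking is often built from all judged documents for the query. The grades and the function names `dcg` and `ndcg` are illustrative.

```python
import math


def dcg(relevances, m):
    """Discounted cumulative gain over the top m results."""
    return sum(
        (2 ** rel - 1) / math.log2(1 + i)
        for i, rel in enumerate(relevances[:m], start=1)
    )


def ndcg(relevances, m):
    """DCG normalized by the DCG of the best possible ordering.

    Assumption: the best ordering is the same grades sorted descending.
    """
    best = dcg(sorted(relevances, reverse=True), m)
    return dcg(relevances, m) / best if best > 0 else 0.0


# Hypothetical graded judgments (0 = not relevant ... 3 = highly relevant).
graded = [3, 2, 3, 0, 1, 2]
ndcg_value = ndcg(graded, m=6)
print(round(ndcg_value, 4))
```

The exponential gain makes a grade-3 document worth 7 units against 3 for grade 2, and the logarithmic discount reduces the value of documents further down the list.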
$$BPREF = \frac{1}{R} \sum_{d_r} \left(1 - \frac{N_{d_r}}{R}\right)$$
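The slide gives the BPREF formula without defining its symbols. The sketch below assumes the usual reading: R is the number of judged relevant documents, d_r ranges over the retrieved relevant documents, and N_{d_r} is the number of judged non-relevant documents ranked above d_r, counting at most the first R of them. Unjudged documents are skipped. The function name `bpref` and the judgment list are illustrative.

```python
def bpref(judgments, num_relevant):
    """BPREF for one ranked list.

    judgments: per rank, True (judged relevant), False (judged
    non-relevant), or None (unjudged).
    num_relevant: R, the number of judged relevant docs for the query.
    """
    nonrel_seen = 0
    total = 0.0
    for judgment in judgments:
        if judgment is None:
            continue  # unjudged documents do not affect the score
        if judgment:
            # Assumed cap: only the first R non-relevant docs are counted.
            total += 1 - min(nonrel_seen, num_relevant) / num_relevant
        else:
            nonrel_seen += 1
    return total / num_relevant


# Hypothetical ranking with one unjudged document at rank 3.
judgments = [True, False, None, True, False, True]
score = bpref(judgments, num_relevant=3)
print(round(score, 4))
```

The three relevant documents contribute 1, 2/3, and 1/3, so the score is 2/3; replacing the unjudged document with any other unjudged one leaves the score unchanged.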
𝛽 = 0.5
Chapter 8 Chapter 8