Improving the Accuracy of System Performance Estimation by Using Shards - PowerPoint PPT Presentation



SLIDE 1

Improving the Accuracy of System Performance Estimation by Using Shards

Nicola Ferro & Mark Sanderson

SLIDE 2

IR evaluation is noisy

[Figure: two panels of per-topic system scores, axis 0.00–1.00]

SLIDE 3

ANOVA

Data = Model + Error
Model: linear mixture of factors

SLIDE 4

First go

Tague-Sutcliffe and Blustein, 1995

Factors: Topics, Systems
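A minimal sketch of this two-factor decomposition, in the spirit of the Tague-Sutcliffe and Blustein model: with one score per (topic, system) cell, the total variation splits into a topic effect, a system effect, and a residual. The scores below are invented for illustration, not data from the paper.

```python
def two_way_anova(scores):
    """scores[t][s] = effectiveness of system s on topic t (one score per cell).
    Returns sums of squares for topics, systems, and residual error."""
    T = len(scores)            # number of topics
    S = len(scores[0])         # number of systems
    grand = sum(sum(row) for row in scores) / (T * S)
    topic_means = [sum(row) / S for row in scores]
    sys_means = [sum(scores[t][s] for t in range(T)) / T for s in range(S)]

    ss_topic = S * sum((m - grand) ** 2 for m in topic_means)
    ss_system = T * sum((m - grand) ** 2 for m in sys_means)
    ss_total = sum((scores[t][s] - grand) ** 2
                   for t in range(T) for s in range(S))
    # with one observation per cell, what is left over is interaction + noise
    ss_error = ss_total - ss_topic - ss_system
    return ss_topic, ss_system, ss_error

scores = [[0.20, 0.40, 0.30],   # topic 1
          [0.60, 0.80, 0.70],   # topic 2
          [0.10, 0.50, 0.30]]   # topic 3
ss_t, ss_s, ss_e = two_way_anova(scores)
```

The design choice this slide points at: without replication, the Topic*System interaction cannot be separated from `ss_error`, which motivates the rest of the talk.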

SLIDE 5

Question

Can we do better?
Add a Topic*System factor?

SLIDE 6

New system

[Figure: three panels of per-topic system scores, axis 0.00–1.00]

SLIDE 7

Partition collections

Shards

SLIDE 8

Replicates

[Figure: two panels of per-topic system scores, axis 0.00–1.00]

SLIDE 9

Replicates

[Figure: two panels of per-topic system scores, axis 0.00–1.00]

SLIDE 10

Replicates

[Figure: two panels of per-topic system scores, axis 0.00–1.00]

  • E. M. Voorhees, D. Samarov, and I. Soboroff. Using Replicates in Information Retrieval Evaluation. ACM Transactions on Information Systems (TOIS), 36(2): 12:1–12:21, September 2017.

SLIDE 11

Past ANOVA Factors

Topics
Systems
Topic*System interactions

SLIDE 12

Our ANOVA Factors

Topics
Systems
Shards
Topic*System
System*Shard
Topic*Shard
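One way to read these factors: with shards supplying replicates, each score decomposes into a grand mean, three main effects, and pairwise interactions. A hedged pure-Python sketch of the effect estimates (toy numbers; this is not the paper's MD6 code, just the standard mean-based estimates for a full topic x system x shard grid):

```python
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def effects(y):
    """y[t][s][h] = score of system s on topic t, restricted to shard h.
    Returns grand mean, main effects, and the Topic*Shard interaction."""
    T, S, H = len(y), len(y[0]), len(y[0][0])
    mu = mean(y[t][s][h] for t in range(T) for s in range(S) for h in range(H))
    topic = [mean(y[t][s][h] for s in range(S) for h in range(H)) - mu
             for t in range(T)]
    system = [mean(y[t][s][h] for t in range(T) for h in range(H)) - mu
              for s in range(S)]
    shard = [mean(y[t][s][h] for t in range(T) for s in range(S)) - mu
             for h in range(H)]
    # Topic*Shard interaction: cell mean minus grand mean and main effects
    ts_inter = [[mean(y[t][s][h] for s in range(S)) - mu - topic[t] - shard[h]
                 for h in range(H)] for t in range(T)]
    return mu, topic, system, shard, ts_inter

y = [
    [[0.1, 0.3], [0.2, 0.4]],   # topic 0: systems x shards
    [[0.5, 0.7], [0.6, 0.8]],   # topic 1
]
mu, topic, system, shard, ts_inter = effects(y)
```

On this purely additive toy grid the Topic*Shard interaction terms come out zero; the talk's point is that on real collections they do not.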

SLIDE 13

Models

SLIDE 14

IR evaluation is noisy

[Figure: two panels of per-topic system scores, axis 0.00–1.00]

SLIDE 15

Hard vs Easy Topics?

[Figure: per-topic score distributions (axis 0.00–1.00) for runs k8alx and INQ604]

SLIDE 16

Few vs Many QRELs

SLIDE 17

x

This paper

SLIDE 18

Proof in the paper

Include topic*shard factor?
Value of x is not important; we choose x = 0
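A small demonstration of the slide's claim (the formal proof is in the paper): when a topic has no relevant documents in a shard, every system's score for that (topic, shard) cell is filled with the same constant x. Because the filled cells are identical across systems, changing x shifts every system mean by the same amount, so the differences between estimated system effects are invariant to x. Cell layout and scores below are invented.

```python
def system_effects(cells, missing, x):
    """cells[s] maps (topic, shard) -> observed score for system s;
    'missing' is the set of (topic, shard) cells filled with constant x
    (same cells for every system). Returns per-system effect estimates."""
    keys = sorted(set(cells[0]) | missing)
    means = []
    for sys_scores in cells:
        vals = [x if k in missing else sys_scores[k] for k in keys]
        means.append(sum(vals) / len(vals))
    grand = sum(means) / len(means)
    return [m - grand for m in means]

cells = [
    {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.6},   # system A scores
    {(0, 0): 0.3, (0, 1): 0.5, (1, 0): 0.7},   # system B scores
]
missing = {(1, 1)}   # topic 1 has no relevant documents in shard 1

eff_zero = system_effects(cells, missing, x=0.0)
eff_other = system_effects(cells, missing, x=0.7)
# eff_zero and eff_other are identical, so x = 0 is as good as any value
```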

SLIDE 19

Few vs Many QRELs?

  • G. V. Cormack and T. R. Lynam. Statistical Precision of Information Retrieval Evaluation. In E. N. Efthimiadis, S. Dumais, D. Hawking, and K. Järvelin, editors, Proc. 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), pages 533–540. ACM Press, New York, USA, 2006.
  • S. Robertson. On document populations and measures of IR effectiveness. In Proceedings of the 1st International Conference on the Theory of Information Retrieval (ICTIR'07), Foundation for Information Society, pages 9–22, 2007.

Should topics be 'treated' equally?

SLIDE 20

Compare MD6 factors

[Figure: two panels of per-topic system scores, axis 0.00–1.00]

SLIDE 21

Experiments

TREC-8, Adhoc, 129 runs
TREC-9, Web, 104 runs
TREC-27, Common Core, 72 runs
Original run rankings (τ)
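The τ here is Kendall's tau between two orderings of the same runs. A minimal sketch for rankings without ties (run names are invented, not actual TREC run tags):

```python
def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items (no ties):
    (concordant pairs - discordant pairs) / total pairs."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # is this pair ordered the same way in both rankings?
            if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

official = ["runA", "runB", "runC", "runD"]
sharded  = ["runA", "runC", "runB", "runD"]
tau = kendall_tau(official, sharded)   # one swapped pair out of six
```

Identical rankings give τ = 1, a full reversal gives τ = -1; the experiments compare rankings produced under the shard-based models against the original run rankings.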

SLIDE 22

SLIDE 23

MD6

SLIDE 24

Other parts of the paper

Confidence intervals calculated with Tukey HSD
Details of the proof on the zero value for shards & MD6
Code: https://bitbucket.org/frrncl/sigir2019-fs-code/
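A minimal sketch of what a Tukey HSD interval computes: a common half-width q·sqrt(MSE/n) applied to every pairwise comparison of group means. This is not the paper's code; the studentized-range critical value `q_crit` is a parameter that must come from tables or a stats package, and the MSE/n numbers below are illustrative.

```python
import math

def hsd_halfwidth(ms_error, n_per_group, q_crit):
    """Half-width of a Tukey HSD confidence interval: q * sqrt(MSE / n).
    q_crit: studentized-range critical value for (k groups, error df),
    from tables or software; ms_error: mean squared error of the ANOVA;
    n_per_group: observations per group mean."""
    return q_crit * math.sqrt(ms_error / n_per_group)

# Illustrative numbers: MSE = 0.04, n = 25 cells per system, q = 3.5
half = hsd_halfwidth(0.04, 25, 3.5)   # ≈ 0.14
# Two system means closer than 2 * half-width... actually closer than
# the half-width itself are not separated at the chosen confidence level.
```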

SLIDE 25

Conclusions

Can we do better than past ANOVA? Yes: MD6
Topic*Shard interaction is strong; its impact has not been observed when measuring performance
Test collections are expensive to build; we can get substantially more signal out of three collections

SLIDE 26

Future work

UQV100: query test collection
Compare to Voorhees, Samarov, and Soboroff, 2017
Metric: not significant differences but predictive power
Create new collections with fewer judgments/topics