  1. EVALUATING RECOMMENDER SYSTEMS: ACCURACY AND BEYOND. GITHUB.COM/HCORONA/AICS-2016. Humberto Corona (@totopampin), 24-10-2016

  2. ABOUT ME

  3. REFERENCES
      [1] Humberto Jesús Corona Pampín, Houssem Jerbi, and Michael P. O'Mahony. "Evaluating the Relative Performance of Neighbourhood-Based Recommender Systems." Spanish Conference on Information Retrieval, 2014.
      [2] Humberto Jesús Corona Pampín, Houssem Jerbi, and Michael P. O'Mahony. "Evaluating the Relative Performance of Collaborative Filtering Recommender Systems." Journal of Universal Computer Science 21.13 (2015): 1849-1868.

  4. ZALANDO: https://www.zalando.co.uk/women-street-style/ and https://www.zalando.co.uk/men-street-style/

  5. RECOMMENDER SYSTEMS: Enable content discovery by learning user preferences and exploiting the wisdom of the crowd.

  6. EVALUATION

  7. EVALUATION METRICS: RMSE, Precision, Recall, F-1, Diversity, Popularity, Per-User Item Coverage, Catalog Coverage, Uniqueness

  8. EVALUATION METRICS, ACCURACY: RMSE, Precision, Recall, F-1
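These accuracy metrics have standard definitions. As an illustration only (not code from the talk; the function names and arguments are illustrative), a minimal Python sketch of RMSE over rating predictions and of precision/recall/F-1 over a single top-N list:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F-1 for one user's top-N recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```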

  9. EVALUATION METRICS, BEYOND ACCURACY: Diversity, Popularity, Per-User Item Coverage, Catalog Coverage, Uniqueness

  10. EVALUATION METRICS: DIVERSITY

  11. EVALUATION METRICS: POPULARITY

  12. EVALUATION METRICS: COVERAGE. Per-user item coverage: the proportion of items, across the catalog, which are candidates for recommendations. Catalog coverage: the proportion of items which ever get recommended.
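A minimal sketch of how these two coverage notions could be computed (illustrative helper names, not from the slides):

```python
def per_user_item_coverage(candidate_items, catalog):
    """Proportion of the catalog that is a recommendation candidate for one user."""
    return len(set(candidate_items) & set(catalog)) / len(catalog)

def catalog_coverage(recommendation_lists, catalog):
    """Proportion of the catalog that ever appears in any recommendation list."""
    recommended = {item for recs in recommendation_lists for item in recs}
    return len(recommended & set(catalog)) / len(catalog)
```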

  13. EVALUATION METRICS: UNIQUENESS

  14. EVALUATION METRICS: RMSE, Precision, Recall, F-1, Diversity, Popularity, Per-User Item Coverage, Catalog Coverage, Uniqueness

  15. EVALUATION METRICS: RMSE, Precision, Recall, F-1, Diversity, Popularity, Per-User Item Coverage, Catalog Coverage, Uniqueness

  16. ARE UKNN AND IKNN REALLY THAT DIFFERENT? A COMPARATIVE ANALYSIS

  17. EXPERIMENT DESIGN
      The models: UKNN and IKNN; UKNN neighbourhood size varied in [20, 200], IKNN fixed
      The data: MovieLens-100K and MovieLens-1M; fixed training/testing split with a 10-item test set
      Evaluation: accuracy and beyond-accuracy metrics
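The slides only state a fixed training/testing split with a 10-item test set per user. A minimal sketch of one plausible way to build such a split (the exact protocol in [1] may differ):

```python
import random
from collections import defaultdict

def leave_n_out_split(ratings, n=10, seed=42):
    """Hold out n rated items per user for testing; keep the rest for training.
    `ratings` is an iterable of (user, item, rating) tuples."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((user, item, rating))
    train, test = [], []
    for rows in by_user.values():
        rng.shuffle(rows)
        test.extend(rows[:n])   # held-out items for evaluation
        train.extend(rows[n:])  # remaining items for model fitting
    return train, test
```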

  18. THE ALGORITHMS
      User-based collaborative filtering (UKNN): find similar users; the word-of-mouth / neighbours paradigm; scales with the number of users.
      Item-based collaborative filtering (IKNN): find similar items; scalable; widely used.
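A minimal NumPy sketch of the two scoring schemes on a dense user-item rating matrix R (illustrative only, not the implementation evaluated in the talk; it omits rating normalisation and similarity shrinkage):

```python
import numpy as np

def cosine_sim(matrix):
    """Pairwise cosine similarity between the rows of a matrix."""
    norms = np.maximum(np.linalg.norm(matrix, axis=1, keepdims=True), 1e-12)
    unit = matrix / norms
    return unit @ unit.T

def uknn_scores(R, user, k=20):
    """User-based kNN: score items from the ratings of the k most similar users."""
    sims = cosine_sim(R)[user].copy()
    sims[user] = 0.0                        # exclude the user themselves
    neighbours = np.argsort(sims)[-k:]      # k nearest users
    return sims[neighbours] @ R[neighbours]

def iknn_scores(R, user, k=20):
    """Item-based kNN: score each item from its k most similar items the user rated."""
    item_sims = cosine_sim(R.T)             # item-item similarities
    np.fill_diagonal(item_sims, 0.0)
    top_k = np.argsort(item_sims, axis=1)[:, -k:]
    mask = np.zeros_like(item_sims)
    np.put_along_axis(mask, top_k, 1.0, axis=1)
    return (item_sims * mask) @ R[user]
```

The item-item similarities in the IKNN case can be precomputed offline, which is one reason it is usually described as the more scalable of the two, while the user-user similarities in UKNN grow with the number of users.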

  19. RESULTS (figure)

  20. RESULTS (figure)

  21. RESULTS (figure)

  22. SUMMARY

  23. LESSONS LEARNED
      • "One size fits all" is never true, ever!
      • Use many metrics, even if you don't optimise for them: they help you understand what the model is doing.
      • Use various datasets (especially if you want to publish a paper): do the results generalise?
      • Understand which proxy or dataset best matches your evaluation goal.

  24. CONCLUSIONS
      • User-based (UKNN) and item-based (IKNN) collaborative filtering show a strong inverse correlation between popularity and diversity.
      • Smaller neighbourhood sizes (for UKNN) lead to more unique, less popular, and more diverse recommendations.
      • At large neighbourhood sizes, both algorithms recommend a common set of items.
      • The matrix factorisation approach (WMF) leads to more accurate and more diverse recommendations, while being less biased towards popularity.
      • Item-based collaborative filtering (IKNN) has significantly better catalog coverage.

  25. EVALUATING RECOMMENDER SYSTEMS: ACCURACY AND BEYOND. GITHUB.COM/HCORONA/AICS-2016. Humberto Corona (@totopampin), 24-10-2016

  26. EXPERIMENT II

  27. A BIAS ANALYSIS

  28. EXPERIMENT DESIGN
      The models: UKNN, IKNN, WMF
      The data: Facebook dataset, MovieLens-HetRec, LastFM-HetRec; training/testing data via 10-fold cross-validation
      Evaluation: accuracy, beyond accuracy, significance, accuracy optimisation
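A minimal sketch of the 10-fold cross-validation loop mentioned above; the `build_model` and `evaluate` callables are placeholders, not code from the talk:

```python
from sklearn.model_selection import KFold

def cross_validate(ratings, build_model, evaluate, n_splits=10, seed=42):
    """Average an evaluation metric over n_splits folds of the rating tuples."""
    ratings = list(ratings)
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kfold.split(ratings):
        train = [ratings[i] for i in train_idx]
        test = [ratings[i] for i in test_idx]
        model = build_model(train)             # e.g. fit UKNN, IKNN or WMF
        scores.append(evaluate(model, test))   # e.g. F-1, diversity, coverage
    return sum(scores) / len(scores)
```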

  29. THE DATASETS
      Facebook dataset: music / bands
      LastFM-HetRec: music / bands
      MovieLens-HetRec: movies

  30. THE ALGORITHMS
      User-based collaborative filtering (UKNN): find similar users; the word-of-mouth / neighbours paradigm; scales with the number of users.
      Item-based collaborative filtering (IKNN): find similar items; scalable; widely used.
      Matrix factorisation, weighted (WMF): latent factors; really good accuracy; scalable; parallel computing; very accurate.
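WMF here is a weighted matrix factorisation model with latent factors. As a rough sketch of the idea only, one alternating-least-squares half-sweep in the implicit-feedback style of Hu, Koren and Volinsky, which may differ from the exact model evaluated in the talk; `alpha` and `reg` are hypothetical hyperparameter names:

```python
import numpy as np

def wmf_update_user_factors(Y, R, alpha=40.0, reg=0.1):
    """Recompute all user factors X for fixed item factors Y, given an implicit
    feedback matrix R (users x items); one half-sweep of weighted ALS."""
    n_users, n_factors = R.shape[0], Y.shape[1]
    X = np.zeros((n_users, n_factors))
    YtY = Y.T @ Y
    for u in range(n_users):
        c_u = 1.0 + alpha * R[u]              # confidence weights
        p_u = (R[u] > 0).astype(float)        # binary preference
        A = YtY + Y.T @ np.diag(c_u - 1.0) @ Y + reg * np.eye(n_factors)
        b = Y.T @ (c_u * p_u)
        X[u] = np.linalg.solve(A, b)
    return X
```

Predicted preference for user u and item i is then the dot product X[u] @ Y[i], which is what makes scoring every catalog item cheap and parallelisable.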

  31. EVALUATION METRICS
      • Precision: out of the items recommended, how many are good recommendations?
      • Recall: how many of the items the user likes are being recommended?
      • F-1: mixes the properties of precision and recall into a single metric.
      • Diversity: how different are the items in the list of recommendations?
      • Popularity: how popular are the recommended items?
      • (Per-user) item coverage: the proportion of items that are candidates for recommendations.
      • Catalog coverage: the proportion of items of the catalog that ever get recommended.
      • Uniqueness: how many items in two recommendation lists are different from each other?
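Precision, recall and F-1 are sketched after slide 8, and the coverage metrics after slide 12. For the remaining beyond-accuracy metrics, a minimal sketch following the informal definitions above (the exact formulas used in [1] and [2] may differ; `distance` and `item_popularity` are assumed inputs):

```python
from itertools import combinations

def diversity(recommended, distance):
    """Average pairwise distance between the items in one recommendation list."""
    pairs = list(combinations(recommended, 2))
    return sum(distance(i, j) for i, j in pairs) / len(pairs) if pairs else 0.0

def popularity(recommended, item_popularity):
    """Average popularity (e.g. number of ratings) of the recommended items."""
    return sum(item_popularity[i] for i in recommended) / len(recommended)

def uniqueness(list_a, list_b):
    """Fraction of items in one recommendation list that do not appear in the other."""
    return len(set(list_a) - set(list_b)) / len(list_a)
```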

  32. RESULTS (figure)

  33. RESULTS: POPULARITY BIAS (figure)

  34. RESULTS: OTHER PROPERTIES
      • Accuracy: WMF performs best in terms of F-1 on the Facebook and MovieLens datasets, while the accuracy of the UKNN and IKNN algorithms is similar.
      • Per-user item coverage: the WMF algorithm considers almost every item as a candidate (UICov > 98%), whereas for the UKNN algorithm, by definition, only items in the user's neighbourhood can be recommendation candidates. IKNN was seen to outperform UKNN on all datasets in terms of per-user item coverage.
      • Catalog coverage: the IKNN algorithm performs significantly better than the other algorithms, covering up to 30% of the item catalog, up to 6 times more items than the UKNN and WMF algorithms.
      • Diversity: the WMF algorithm performs better, around 9% higher on average than the best neighbourhood-based approach.

  35. RESULTS: CONSISTENCY
      • It is important to evaluate on different datasets.
      • On the MovieLens dataset (about 3 times denser than the Facebook and LastFM datasets), the catalog coverage of the IKNN algorithm is ∼10 times smaller than on the LastFM and Facebook datasets.
