Estimating Peer Similarity using Distance of Shared Files

Yuval Shavitt, Ela Weinsberg, Udi Weinsberg
Tel-Aviv University

Problem Setting
• Peer-to-peer (P2P) networks are used by millions for sharing content
• Increasingly difficult to find useful content
  o Noise in user-generated content (meta-data)
  o Extreme dimensions
  o Sparseness

Udi Weinsberg, IPTPS, April 2010

Work Goal
• Suggest a new metric for peer similarity
  o Overcome the sparseness problem
• Improve the ability to find content
  o Search algorithms
    - Similar peers are likely to hold relevant content
  o Collaborative filtering
    - Find "like-minded" peers

Key Concept
• Build a file similarity graph
  o Use data about all shared files
  o Edge weights = distance between files
• Peer similarity is calculated using the distance between their shared files
  o No need for overlapping content between peers

Dataset
• Active crawl of Gnutella in 2007
• Crawled 1.2 million peers
• Only 35% of songs contain meta-data
• 530k distinct songs
  o Identified using "title|artist"
  o Accounting for spelling mistakes with edit distance
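
The song-identification step above can be sketched as follows. This is only an illustration: the lowercasing, the `max_dist` threshold and the helper names `song_key` and `canonical_key` are assumptions, since the slides do not say how the edit-distance threshold was chosen.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def song_key(title: str, artist: str) -> str:
    """Build the "title|artist" identifier (case folding is an assumption)."""
    return f"{title.strip().lower()}|{artist.strip().lower()}"


def canonical_key(key: str, known_keys: list, max_dist: int = 2) -> str:
    """Map a key to an already seen key within max_dist edits, else register it.

    max_dist is a hypothetical threshold, not a value from the talk.
    """
    for known in known_keys:
        if edit_distance(key, known) <= max_dist:
            return known
    known_keys.append(key)
    return key
```

A linear scan over all known keys is quadratic overall, so a crawl of this size would need some form of blocking or indexing; the sketch only shows the matching rule.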

Dataset Statistics
• Using a sample of 100k peers (<10%)
• Over 511k songs remain (96%)
• Figure annotations: power-law popularity distribution; 98% of the peers share fewer than 50 songs

Sparseness Problem
• Figure annotations: peers with very few popular songs; median maximal overlap is 20%

File Similarity Graph
• Files are vertices
• Link weight is the number of peers sharing both files
• Normalize similarity with popularity: [equation not preserved in the transcript]
• Figure annotation: power-law distribution, filter
• Filter causes distortion
  o Keep only top 40%
  o And no fewer than 10
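
A minimal sketch of building the file similarity graph from peer file lists. The raw link weight (number of peers sharing both files) follows the slide. The normalization formula did not survive in the transcript, so dividing by the geometric mean of the two popularities is an assumption made here for illustration, and the function names are hypothetical. The top-40% filter is not included.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(peers):
    """peers: dict mapping peer id -> iterable of file ids.

    Returns (popularity, shared): how many peers hold each file, and how
    many peers hold both files of each unordered pair (a, b) with a < b.
    """
    popularity = Counter()
    shared = Counter()
    for files in peers.values():
        unique = sorted(set(files))
        popularity.update(unique)
        shared.update(combinations(unique, 2))
    return popularity, shared


def normalized_similarity(popularity, shared):
    """ASSUMPTION: cosine-style normalization, shared / sqrt(pop_a * pop_b)."""
    return {
        (a, b): count / sqrt(popularity[a] * popularity[b])
        for (a, b), count in shared.items()
    }
```

With this choice a pair of files held by exactly the same set of peers gets similarity 1, and a popular file that co-occurs with everything is penalized.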

Peer Similarity Estimation (1)
• Create a bipartite graph connecting the files of every two peers
• Connect files on the two sides with links:
  o If exact same file: weight is 1
  o Otherwise: use the normalized similarity along the shortest path between the files
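
One way to read the link-weight rule above, sketched in code. The slides do not state how similarities combine along a path, so this sketch assumes the path similarity is the product of the edge similarities; maximizing that product is the same as running Dijkstra on edge lengths of -log(similarity). Unreachable files get weight 0. The names `path_similarity` and `bipartite_weights` are hypothetical.

```python
import heapq
from math import exp, log


def path_similarity(graph, source, target):
    """graph: dict node -> {neighbor: similarity in (0, 1]}.

    ASSUMPTION: similarity along a path is the product of edge similarities.
    Returns 1.0 for the same file and 0.0 when no path exists.
    """
    if source == target:
        return 1.0
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return exp(-dist)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, sim in graph.get(node, {}).items():
            cand = dist - log(sim)  # -log turns products into sums
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                heapq.heappush(heap, (cand, nbr))
    return 0.0


def bipartite_weights(graph, files_i, files_j):
    """Weight for every pair (file of peer i, file of peer j)."""
    return {(a, b): path_similarity(graph, a, b)
            for a in files_i for b in files_j}
```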

Distance Estimation
[Figure: example link weights 0.2, 0.5, 0.8, 0.9, 1]

Peer Similarity Estimation (2)
• Run maximum weighted matching on the bipartite graph
  o Find the "best" matching links between files
  o The matching M is the sum of the link weights
• Peer similarity: [equation not preserved in the transcript]
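
A small sketch of the matching step. It finds the matching weight M by brute force over all one-to-one assignments, which is only practical for a handful of files; a real implementation would use a polynomial-time assignment algorithm. The final peer-similarity formula is missing from the transcript, so dividing M by the smaller file count in `peer_similarity` is purely an assumption that keeps the score in [0, 1].

```python
from itertools import permutations


def matching_weight(weights, files_i, files_j):
    """Total weight M of the best one-to-one matching (brute force).

    weights: dict (file_of_i, file_of_j) -> link weight; missing pairs count 0.
    """
    def w(a, b):
        return weights.get((a, b), 0.0)

    if len(files_i) <= len(files_j):
        # Assign each file of i to a distinct file of j.
        return max(sum(w(a, b) for a, b in zip(files_i, perm))
                   for perm in permutations(files_j, len(files_i)))
    # Otherwise assign each file of j to a distinct file of i.
    return max(sum(w(a, b) for a, b in zip(perm, files_j))
               for perm in permutations(files_i, len(files_j)))


def peer_similarity(weights, files_i, files_j):
    """ASSUMPTION: normalize M by the size of the smaller file set."""
    smaller = min(len(files_i), len(files_j))
    if smaller == 0:
        return 0.0
    return matching_weight(weights, files_i, files_j) / smaller
```

In the test data used to check this sketch, greedily taking the heaviest link first gives a total of 1.1, while the best matching reaches 1.3, which is why a matching algorithm is needed rather than a greedy pick.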

Maximum Weighted Matching
[Figure: matching example with link weights 0.2 and 0.5]

Distance Estimation Issues
• File similarity graph can have multiple connected components
  o Some distances are infinite
• All-pairs shortest paths can be costly
  o Reduce the size of the similarity graph
  o Limit the search depth

Reducing Similarity Graph Size
• For each file, take only the top N nearest neighboring files
• Distributions almost overlap for N ≥ 10
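
The top-N pruning above can be sketched as below, assuming the graph is stored as an adjacency dictionary of similarities where a higher value means a nearer neighbor. The function name `keep_top_n` is hypothetical.

```python
import heapq


def keep_top_n(graph, n):
    """Keep, for every file, only its n most similar neighbors.

    graph: dict node -> {neighbor: similarity}. Returns a new dict and
    leaves the input untouched.
    """
    return {
        node: dict(heapq.nlargest(n, nbrs.items(), key=lambda kv: kv[1]))
        for node, nbrs in graph.items()
    }
```

Pruning is done per file, so a link may survive in one direction and be dropped in the other; whether to symmetrize the result afterwards is a design choice the slides do not address.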

Limit Search Depth
• Stop searching files once the search reaches K times the distance of the first finding
  o Distance between files becomes asymmetric
  o Depends on the peer we start from
• For K ≥ 1.5, removed links are unlikely to be selected in the maximum matching
  o Asymmetric links are mostly low-similarity links
  o Hence they will not be selected in the matching
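
A sketch of the depth-limited search, under one reading of the rule: starting from a file of one peer, run a shortest-path search over the other peer's files, and once the first of them is found at distance d, give up on anything farther than K times d. Edge values here are positive distances (how similarity is turned into distance is not stated in the slides), and `bounded_search` is a hypothetical name.

```python
import heapq


def bounded_search(graph, source, targets, k=1.5):
    """graph: dict node -> {neighbor: positive distance}.

    Returns {target: distance} for every target found within k times the
    distance of the first target found. Other targets are simply omitted.
    """
    targets = set(targets)
    found = {}
    limit = float("inf")
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > limit:
            break  # everything left in the heap is even farther away
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        if node in targets and node not in found:
            found[node] = dist
            if len(found) == 1:
                limit = k * dist  # cutoff fixed by the first finding
        for nbr, w in graph.get(node, {}).items():
            cand = dist + w
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                heapq.heappush(heap, (cand, nbr))
    return found
```

Because the cutoff depends on the nearest target seen from the starting file, searching from the other side can keep a link that this side dropped, which is the asymmetry the slide mentions.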

Meta-data and Similarity
• Similarity between peers i and j using artists
• Normalized similarity matches meta-data

Geography and Similarity
• Comparing geographic distance with similarity
• No direct correlation!

Conclusions
• A metric for similarity between peers
• Evaluation using song files shared in Gnutella
  o Metric reflects the similarity of peer preferences in music
• Geography is not necessarily a good indication of peer similarity!

Thank You!
Udi Weinsberg
udiw@eng.tau.ac.il
