clustering by contrast

Clustering by contrast Cyril CHHUN Télécom Paris Advisor: Jean-Louis - PowerPoint PPT Presentation



  1. Clustering by contrast Cyril CHHUN Télécom Paris Advisor: Jean-Louis DESSALLES June 20, 2019

  2. Outline 1 Introduction 2 The algorithm 3 Test results 4 Conclusion

  3. Introduction: What are the end goals of contrast learning?
     Design a clustering algorithm able to:
     • understand the meaning of “small bacteria” and “small galaxy” without going through the set of “small” objects
     • produce relevant descriptions: “it’s a singer who has ten million views on Youtube”
     • produce negations and explanations: “she is not a writer”
     • detect anomalies: a talking cat
     • learn from a single example: a “Siamese cat”

  4. Impossibility theorem: Which properties would we expect of a clustering algorithm?
     • Scale-invariance, richness, consistency
     • Kleinberg (2002) [1]: it is impossible to design a distance function-based clustering algorithm that satisfies all three properties.
     • Solution: forsake one of those properties or use non-metric functions.
     [1] Jon Kleinberg. An impossibility theorem for clustering. Advances in Neural Information Processing Systems 15, 2002.

  5. Vocabulary
     • Object: observed instance
     • Prototype: mental representation of a group as a basic object
     • Contrast: “difference” between two objects
     • Weight: number of times a prototype has been recalled
     • Deviation: acceptable range of an object’s properties compared to its prototype
     • Order: real-life observations are first-order objects, contrasts are second-order objects, etc.

  6. Design: Prototypes
     How to represent prototypes? With a mean and a deviation for each of the m dimensions, plus a weight w:
     $$\begin{pmatrix} \mu_1 & \sigma_1 \\ \vdots & \vdots \\ \mu_m & \sigma_m \end{pmatrix}, \qquad w$$
     (first column: mean; second column: deviation; w: weight)

  7. Design: Finding the clusters
     Given object b, how to find the best prototype a of deviation a′?
     • The prametric function
       $$d : (a, b) \mapsto \sum_{j=1}^{m} \mathbb{1}\!\left( \frac{|a_j - b_j|}{a'_j} > \theta \right)$$
       satisfies scale-invariance along any axis.
     • It is not a distance, as none of the three properties is satisfied!
     • Dimension-agnostic, scale-invariant, not density-based.
     • Problem: many prototypes can attain the smallest distance.
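
The function d on this slide counts the dimensions along which the object falls outside the prototype's accepted range. A minimal Python sketch, assuming strictly positive deviations; the name `deviation_count`, the default threshold `theta=1.0` and the sample values are illustrative assumptions, not taken from the presentation:

```python
def deviation_count(mean, deviation, obj, theta=1.0):
    """Count the dimensions j where |a_j - b_j| / a'_j exceeds theta.

    mean, deviation: per-dimension mean a_j and deviation a'_j of a prototype
    (deviations are assumed to be strictly positive).
    obj: the observed object b.
    theta: threshold; the default of 1.0 is a hypothetical choice.
    """
    return sum(
        1
        for a_j, dev_j, b_j in zip(mean, deviation, obj)
        if abs(a_j - b_j) / dev_j > theta
    )


mean, deviation = [10.0, 2.0], [1.0, 0.5]
obj = [10.5, 4.0]
score = deviation_count(mean, deviation, obj)  # only the second axis is out of range

# Changing the unit of the first axis rescales mean, deviation and object
# together, so the ratio on that axis, and hence the count, is unchanged.
k = 1000.0
rescaled = deviation_count([10.0 * k, 2.0], [1.0 * k, 0.5], [10.5 * k, 4.0])
```

Because the deviation belongs to the prototype only, swapping the roles of a and b generally changes the result, which is one reason d is not a distance.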

  8. Design: Comparing the clusters
     • We simply take the best prototypes two by two and choose the one whose mean is closer to the object along the most dimensions. The other cluster is eliminated.
     • Using this rule, we make a tournament and pick the winner.
     • Deviations are not used in this step so as to avoid hubs.
     Figure: The smaller cluster seems more reasonable: how to avoid the hub?
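
The pairwise rule can be sketched as follows. The slide does not say how the pairs are scheduled or how ties are broken, so the sequential elimination and the tie-break below are assumptions, as are the function names `duel` and `tournament`:

```python
def duel(obj, mean_a, mean_b):
    """Return True if mean_a is closer to obj than mean_b on at least as many
    dimensions as the reverse. Only means are compared, never deviations."""
    wins_a = sum(1 for o, a, b in zip(obj, mean_a, mean_b) if abs(a - o) < abs(b - o))
    wins_b = sum(1 for o, a, b in zip(obj, mean_a, mean_b) if abs(b - o) < abs(a - o))
    return wins_a >= wins_b  # ties favour the current holder (an assumption)


def tournament(obj, candidate_means):
    """Run duels one challenger at a time; return the index of the survivor."""
    winner = 0
    for challenger in range(1, len(candidate_means)):
        if not duel(obj, candidate_means[winner], candidate_means[challenger]):
            winner = challenger  # the previous holder is eliminated
    return winner


obj = [0, 0, 0]
candidates = [[5, 5, 5], [1, 2, 9], [2, 1, 1]]
best = tournament(obj, candidates)
```

A per-dimension majority rule of this kind is not guaranteed to be transitive, so with a different schedule of duels the survivor could differ.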

  9. Design: Updating the memory
     How to store the new information in the memory?
     • The object is added as a prototype no matter what, with a deviation equal to ε times itself and a weight of 1.
     • The winning cluster is updated as follows:
       $$\text{mean} = \frac{\text{weight} \ast \text{prototype} + \text{object}}{\text{weight} + 1}$$
       $$\text{deviation} = \frac{\text{weight} \ast \text{deviation} + |\text{prototype} - \text{object}|}{\text{weight} + 1}$$
       $$\text{weight} = \text{weight} + 1$$
     • We enforce a limited memory to cope with initial errors and improve efficiency. Unused prototypes are forgotten first.
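
The update rules translate almost directly into code. This sketch makes two assumptions that the slide leaves open: the deviation update uses the prototype's mean from before the update, and the initial deviation is ε times the absolute value of each coordinate so that it stays non-negative. The `Prototype` class, the function names and the default `epsilon` are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Prototype:
    mean: list
    deviation: list
    weight: int = 1


def new_prototype(obj, epsilon=0.1):
    """Store the object itself as a prototype of weight 1.

    Assumption: deviation is epsilon * |value| on each dimension.
    """
    return Prototype(mean=list(obj), deviation=[epsilon * abs(x) for x in obj], weight=1)


def update_winner(proto, obj):
    """Apply the running-average update to the winning prototype, in place."""
    w = proto.weight
    # Assumption: |prototype - object| is measured against the old mean,
    # so the deviation is updated before the mean is moved.
    proto.deviation = [
        (w * d + abs(m - x)) / (w + 1)
        for d, m, x in zip(proto.deviation, proto.mean, obj)
    ]
    proto.mean = [(w * m + x) / (w + 1) for m, x in zip(proto.mean, obj)]
    proto.weight = w + 1
    return proto


p = Prototype(mean=[2.0], deviation=[1.0], weight=3)
update_winner(p, [6.0])
fresh = new_prototype([10.0, -4.0], epsilon=0.5)
```

With a weight of 3, the new object counts for a quarter of the updated mean and deviation, so heavily recalled prototypes move more slowly.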

  10. Design: Skeleton

          def feed_data_online(data):
              for obj in data:
                  closest_clusters = find_closest_clusters(obj)
                  winner = cluster_battles(obj, closest_clusters)
                  update_memory(obj, winner)

      • Clustering: simple loop with complexity O(mem_size × n) → Online learning

  11. Understanding results
      • Softer clustering than k-means; different ways to classify when seeing a new object:
        – Find the closest prototype to the object (by tournament for example)
        – Assign object b to prototype a if d(a, b) = 0
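
The second rule is what makes the clustering soft: an object can match several prototypes at once, or none. A small sketch under assumed data, where the prototype names, the two features (weight in kg, height in cm), the threshold and the helper names are all invented for illustration:

```python
def d(mean, deviation, obj, theta=1.0):
    """Number of dimensions where obj is more than theta deviations from the mean."""
    return sum(1 for a, dev, b in zip(mean, deviation, obj) if abs(a - b) / dev > theta)


def matching_prototypes(prototypes, obj, theta=1.0):
    """Names of all prototypes a with d(a, obj) == 0, in insertion order."""
    return [
        name
        for name, (mean, deviation) in prototypes.items()
        if d(mean, deviation, obj, theta) == 0
    ]


# Hypothetical memory: name -> (mean, deviation) over (weight_kg, height_cm).
prototypes = {
    "cat": ([4.0, 30.0], [1.0, 5.0]),
    "pet": ([10.0, 40.0], [8.0, 20.0]),
    "horse": ([500.0, 160.0], [100.0, 20.0]),
}
labels = matching_prototypes(prototypes, [4.5, 28.0])
unknown = matching_prototypes(prototypes, [1000.0, 1000.0])
```

Here the object matches both the narrow "cat" prototype and the broader "pet" prototype, while an object far from everything matches nothing.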

  12. Live demonstration

  13. What about contrasts? How to extract relevant contrasts?
      • The contrast features should be meaningful, i.e. low-dimensional and applicable between similar objects.
      • Given an object b and its closest prototype a, we extract the contrast c such that
        $$c_j = (a_j - b_j) \cdot \mathbb{1}\!\left( \frac{|a_j - b_j|}{a'_j} > \theta \right)$$
      • Example: seeing a black tomato would give a “red-to-black” contrast.
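
The contrast keeps the signed difference only on the dimensions where the object leaves the prototype's accepted range, which is what makes it low-dimensional. A sketch of the tomato example, where the feature encoding (redness on a 0 to 10 scale, diameter in cm, weight in g), the numbers, the threshold and the function name are illustrative assumptions:

```python
def extract_contrast(mean, deviation, obj, theta=1.0):
    """Return c with c_j = a_j - b_j where |a_j - b_j| / a'_j > theta, else 0."""
    return [
        (a - b) if abs(a - b) / dev > theta else 0.0
        for a, dev, b in zip(mean, deviation, obj)
    ]


# Hypothetical "tomato" prototype over (redness, diameter_cm, weight_g).
tomato_mean = [8.0, 7.0, 120.0]
tomato_dev = [1.0, 2.0, 40.0]
black_tomato = [1.0, 7.5, 110.0]

contrast = extract_contrast(tomato_mean, tomato_dev, black_tomato)
```

Only the redness dimension survives: the diameter and weight differences fall within the prototype's deviation and are zeroed out.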

  14. What about contrasts?
      How to store the contrasts in memory?
      • We can use the same principle! Contrast-prototypes with mean, deviation and weight.
      Then, how to refine the contrasts?
      • We can use the same procedure!

  15. Second demonstration

  16. Feedback on the checklist
      ✓ understand the meaning of “small bacteria” and “small galaxy” without going through the set of “small” objects
      ✗ produce relevant descriptions: “it’s a singer who has ten million views on Youtube”
      ✗ produce negations and explanations: “she is not a writer”
      ✓ detect anomalies: a talking cat
      ✓ learn from a single example: a “Siamese cat”

  17. Conclusion
      • The algorithm is dimension-agnostic and satisfies scale-invariance
      • It learns on the fly and has a reasonable complexity (linear on average)
      • Designed to be used on relatively high-level datasets
      • Contrasts still need testing: some inconsistent results can appear
