


  1. Announcements:
     - Thank you for your course feedback!
     - Watch out for homework 2 feedback poll
     - Course project: TAs will reach out with feedback
     - Regrade requests for HW1: deadline Thu next week at 23:59
     - Today: HW2 due / HW3 release
     4/27/20, Tim Althoff, UW CS547: Machine Learning for Big Data, http://www.cs.washington.edu/cse547

  2. [Figure: example graph annotated with PageRank scores: A 3.3, B 38.4, C 34.3, D 3.9, E 8.1, F 3.9, and five unlabeled nodes with 1.6 each.]

  3. [Figure: three-node graph on y, a, m with edges labeled by transition probabilities such as 0.8·½ + 0.2·⅓.]

     $$A = 0.8 \underbrace{\begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{bmatrix}}_{M} + 0.2 \underbrace{\begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix}}_{[1/N]_{N \times N}} = \begin{bmatrix} 7/15 & 7/15 & 1/15 \\ 7/15 & 1/15 & 1/15 \\ 1/15 & 7/15 & 13/15 \end{bmatrix}$$

     Iterating r = A · r:

          iteration:   0      1      2      3     ...   stable
          y            1/3    0.33   0.24   0.26  ...   7/33
          a            1/3    0.20   0.20   0.18  ...   5/33
          m            1/3    0.46   0.52   0.56  ...   21/33
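The matrix example on this slide can be checked numerically. Below is a minimal sketch in plain Python, assuming the column-stochastic M shown on the slide (columns ordered y, a, m) and β = 0.8; the helper names `google_matrix` and `power_iterate` are illustrative and do not come from the lecture.

```python
def google_matrix(M, beta):
    """Build A = beta * M + (1 - beta) * [1/N]_{NxN} for a column-stochastic M."""
    n = len(M)
    return [[beta * M[i][j] + (1.0 - beta) / n for j in range(n)]
            for i in range(n)]


def power_iterate(A, tol=1e-12, max_iter=1000):
    """Repeat r <- A r from the uniform vector until the L1 change is below tol."""
    n = len(A)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        if sum(abs(x - y) for x, y in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Column j holds the out-link probabilities of node j (order: y, a, m).
M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.5, 1.0]]

A = google_matrix(M, beta=0.8)
r = power_iterate(A)
# r is approximately [7/33, 5/33, 21/33], the stable column of the table.
```

The dense matrix is only practical for toy graphs; it is used here because it mirrors the slide term by term.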

  4. - Input: Graph G and parameter β
       - Directed graph G (can have spider traps and dead ends)
       - Parameter β
     - Output: PageRank vector r
       - Set: $r_j^{(0)} = 1/N$, $t = 1$
       - Do:
         $$\forall j:\quad r'^{(t)}_j = \sum_{i \to j} \beta \, \frac{r_i^{(t-1)}}{d_i}, \qquad r'^{(t)}_j = 0 \ \text{if the in-degree of } j \text{ is } 0$$
       - Now re-insert the leaked PageRank:
         $$\forall j:\quad r_j^{(t)} = r'^{(t)}_j + \frac{1 - S}{N}, \qquad \text{where } S = \sum_j r'^{(t)}_j$$
       - Stop once $\sum_j \left| r_j^{(t)} - r_j^{(t-1)} \right| < \varepsilon$; otherwise set $t = t + 1$ and repeat
     - Note: If the graph has no dead ends, then the amount of leaked PageRank is 1 − β. But since we have dead ends, the amount of leaked PageRank may be larger. We have to explicitly account for it by computing S.
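The algorithm on this slide can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions, not the course's reference code: the graph is given as a dict of out-link lists (a hypothetical representation), every node appears as a key, and `pagerank` is a name chosen here.

```python
def pagerank(out_links, beta=0.8, eps=1e-12, max_iter=1000):
    """PageRank with explicit re-insertion of leaked rank.

    out_links maps each node to the list of nodes it links to.
    A node with an empty list is a dead end.
    """
    nodes = list(out_links)
    n = len(nodes)
    r = {j: 1.0 / n for j in nodes}          # r^(0)_j = 1/N
    for _ in range(max_iter):
        # r'_j = sum over i -> j of beta * r_i / d_i (0 if in-degree is 0)
        r_prime = {j: 0.0 for j in nodes}
        for i, dests in out_links.items():
            if not dests:
                continue                      # dead end: all of its rank leaks
            share = beta * r[i] / len(dests)
            for j in dests:
                r_prime[j] += share
        # Re-insert the leaked PageRank evenly: (1 - S) / N
        S = sum(r_prime.values())
        r_new = {j: r_prime[j] + (1.0 - S) / n for j in nodes}
        diff = sum(abs(r_new[j] - r[j]) for j in nodes)
        r = r_new
        if diff < eps:
            break
    return r


# The y/a/m graph (no dead ends): leaked rank is exactly 1 - beta.
ranks = pagerank({"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]})

# A two-node graph where b is a dead end: more than 1 - beta leaks,
# yet the scores still sum to 1 after re-insertion.
ranks_dead = pagerank({"a": ["b"], "b": []})
```

Only the out-link lists are stored, so each pass touches every edge once and never materializes the dense matrix A.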

  5. - Measures generic popularity of a page
       - Will ignore/miss topic-specific authorities
       - Solution: Topic-Specific PageRank (next)
     - Uses a single measure of importance
       - Other models of importance
       - Solution: Hubs-and-Authorities
     - Susceptible to link spam
       - Artificial link topologies created in order to boost PageRank
       - Solution: TrustRank


  7. - Instead of generic popularity, can we measure popularity within a topic?
     - Goal: Evaluate Web pages not just according to their popularity, but also by how close they are to a particular topic, e.g. "sports" or "history"
     - Allows search queries to be answered based on interests of the user
       - Example: Query "Trojan" wants different pages depending on whether you are interested in sports, history, or computer security

  8. - Random walker has a small probability of teleporting at any step
     - Teleport can go to:
       - Standard PageRank: Any page with equal probability
         - To avoid dead-end and spider-trap problems
       - Topic-Specific PageRank: A topic-specific set of "relevant" pages (teleport set)
     - Idea: Bias the random walk
       - When the walker teleports, she picks a page from a set S
       - S contains only pages that are relevant to the topic
         - E.g., Open Directory (DMOZ) pages for a given topic/query
       - For each teleport set S, we get a different vector r_S

  9. - To make this work, all we need is to update the teleportation part of the PageRank formulation:
       $$A_{ij} = \begin{cases} \beta M_{ij} + (1 - \beta)/|S| & \text{if } i \in S \\ \beta M_{ij} + 0 & \text{otherwise} \end{cases}$$
       - A is a stochastic matrix!
     - We weighted all pages in the teleport set S equally
       - Could also assign different weights to pages!
     - Compute as for regular PageRank:
       - Multiply by M, then add a vector
       - Maintains sparseness

  10. Suppose S = {1}, β = 0.8
      [Figure: four-node example graph (nodes 1 to 4) with edges labeled by transition probabilities.]

      Node   Iteration 0   1      2      ...   stable
      1      0.25          0.4    0.28   ...   0.294
      2      0.25          0.1    0.16   ...   0.118
      3      0.25          0.3    0.32   ...   0.327
      4      0.25          0.2    0.24   ...   0.261

      Varying the teleport set S:

      S           β     r1     r2     r3     r4
      {1,2,3,4}   0.8   0.13   0.10   0.39   0.36
      {1,2,3}     0.8   0.17   0.13   0.38   0.30
      {1,2}       0.8   0.26   0.20   0.29   0.23
      {1}         0.8   0.29   0.12   0.33   0.26

      Varying β:

      S           β     r1     r2     r3     r4
      {1}         0.9   0.17   0.07   0.40   0.36
      {1}         0.8   0.29   0.12   0.33   0.26
      {1}         0.7   0.39   0.14   0.27   0.19
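The numbers in this example can be reproduced with a short sketch of Topic-Specific PageRank. The graph itself is not recoverable from the transcript, so the edge list below (1→2, 1→3, 2→1, 3→4, 4→3) is an assumption; it was chosen because it reproduces the iteration values quoted on the slide. The function name and the dict-of-lists representation are likewise illustrative, and the sketch assumes a graph without dead ends.

```python
def topic_specific_pagerank(out_links, teleport_set, beta=0.8,
                            eps=1e-12, max_iter=1000):
    """Multiply by M, then add the teleport vector (mass only on teleport_set)."""
    nodes = list(out_links)
    r = {j: 1.0 / len(nodes) for j in nodes}
    jump = (1.0 - beta) / len(teleport_set)
    for _ in range(max_iter):
        # Teleport part: (1 - beta)/|S| for nodes in S, 0 otherwise.
        r_new = {j: (jump if j in teleport_set else 0.0) for j in nodes}
        # Link-following part: beta * M * r, using only existing edges.
        for i, dests in out_links.items():
            for j in dests:
                r_new[j] += beta * r[i] / len(dests)
        diff = sum(abs(r_new[j] - r[j]) for j in nodes)
        r = r_new
        if diff < eps:
            break
    return r


# Assumed edges for the four-node example (not stated in the transcript).
graph = {1: [2, 3], 2: [1], 3: [4], 4: [3]}

r_single = topic_specific_pagerank(graph, teleport_set={1}, beta=0.8)
r_all = topic_specific_pagerank(graph, teleport_set={1, 2, 3, 4}, beta=0.8)
```

With S = {1} the result matches the "stable" column (about 0.294, 0.118, 0.327, 0.261), and with S = {1,2,3,4} it matches the first table row. Because the teleport term is added as a vector, the update never builds the dense matrix A.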

  11. - Create different PageRanks for different topics
        - The 16 DMOZ top-level categories:
          - Arts, Business, Sports, …
      - Which topic ranking to use?
        - User can pick from a menu
        - Classify query into a topic
        - Can use the context of the query
          - E.g., query is launched from a web page talking about a known topic
          - History of queries, e.g., "basketball" followed by "Jordan"
        - User context, e.g., user's bookmarks, …

  12. Random Walk with Restarts: set S is a single node

  13. [Tong-Faloutsos, '06]
      [Figure: example graph with nodes labeled A, B, D, E, F, G, H, I, J and edge weights of 1.]
      a.k.a.: Relevance, Closeness, 'Similarity', …

  14. - Shortest path is not good:
        - No effect of degree-1 nodes (E, F, G)!
        - Multi-faceted relationships

  15. - Network flow is not good:
        - Does not punish long paths

  16. - Need a method that considers:
        - Multiple connections
        - Multiple paths
        - Direct and indirect connections
        - Degree of the node

  17. - SimRank: Random walks from a fixed node on k-partite graphs
        [Figure: k-partite graph with node types Conferences, Tags, Authors.]
      - Setting: k-partite graph with k types of nodes
        - E.g.: Authors, Conferences, Tags
      - Topic-Specific PageRank from node u: teleport set S = {u}
      - Resulting scores measure similarity/proximity to node u
      - Problem:
        - Must be done once for each node u
        - Only suitable for sub-Web-scale applications

  18. - Q: What is the most related conference to ICDM?
      - A: Topic-Specific PageRank with teleport set S = {ICDM}
      [Figure: bipartite Conference/Author graph; conferences shown include IJCAI, KDD, ICDM, SDM, AAAI, NIPS; authors shown include Philip S. Yu, Ning Zhong, R. Ramakrishnan, M. Jordan.]
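The question on this slide can be illustrated with a small random walk with restarts on a bipartite author/conference graph. The sketch below is a toy: the author IDs (`a1` to `a4`) and every edge are hypothetical, made up purely for illustration, and only the conference names are borrowed from the slide. The function name is also an assumption.

```python
from collections import defaultdict


def random_walk_with_restart(edges, source, beta=0.8, eps=1e-12, max_iter=1000):
    """Topic-Specific PageRank on an undirected graph with teleport set {source}."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    nodes = list(neighbors)
    r = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        r_new = {n: 0.0 for n in nodes}
        r_new[source] = 1.0 - beta            # all teleports return to the source
        for u in nodes:
            share = beta * r[u] / len(neighbors[u])
            for v in neighbors[u]:
                r_new[v] += share
        diff = sum(abs(r_new[n] - r[n]) for n in nodes)
        r = r_new
        if diff < eps:
            break
    return r


# Hypothetical author-conference edges (illustration only, not real data).
edges = [("a1", "ICDM"), ("a1", "KDD"),
         ("a2", "ICDM"), ("a2", "KDD"),
         ("a3", "ICDM"), ("a3", "SDM"),
         ("a4", "AAAI"), ("a4", "KDD")]
conferences = {"ICDM", "KDD", "SDM", "AAAI"}

scores = random_walk_with_restart(edges, source="ICDM")
ranking = sorted((c for c in conferences if c != "ICDM"),
                 key=lambda c: -scores[c])
```

In this toy graph KDD shares two authors with ICDM, so it ends up first in `ranking`. Note that the whole computation has to be rerun for every source node, which is the scalability problem the previous slide points out.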

  20. Pin, Board

  21. - Pins belong to Boards

  22. Input:

  23. Input: Recommendations:

  24. Input:

  25. Input: Recommendations:

  29. - Idea:
        - Every node has some importance
        - Importance gets evenly split among all edges and pushed to the neighbors
      - Given a set of QUERY NODES Q, simulate a random walk:
        [Figure: graph with query node Q.]

  30. - Proximity to query node(s) Q:

  31. - Proximity to query node(s) Q:
        [Figure: pin/board graph around query node Q, with each node annotated by its random-walk visit count (values from 1 to 16); boards shown: "Yummm", "Strawberries", "Smoothies", "Smoothie Madness!".]

  32. - Pixie:
        - Outputs top 1k pins with highest visit count
      Extensions:
      - Weighted edges:
        - The walk prefers to traverse certain edges:
          - Edges to pins in your local language
      - Early stopping:
        - Don't need to walk a fixed big number of steps
        - Walk until 1k-th pin has at least 20 visits
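A visit-counting walk with early stopping, as described on this slide, might look like the sketch below. It is an illustration under assumptions rather than Pinterest's implementation: the restart probability, the tiny pin/board graph, and all names are hypothetical, and `top_k=2` stands in for the "1k-th pin" threshold so that the toy example terminates quickly.

```python
import random
from collections import Counter


def pixie_random_walk(pin_to_boards, board_to_pins, query_pin,
                      max_steps=10_000, restart_prob=0.5,
                      top_k=2, min_visits=20, seed=0):
    """Count pin visits on a pin -> board -> pin walk that restarts at query_pin."""
    rng = random.Random(seed)                 # seeded for reproducibility
    visits = Counter()
    pin = query_pin
    for _ in range(max_steps):
        board = rng.choice(pin_to_boards[pin])    # pin -> random board
        pin = rng.choice(board_to_pins[board])    # board -> random pin
        visits[pin] += 1
        # Early stopping: the top_k-th most visited pin has enough visits.
        # (Recomputing most_common every step is fine for a toy graph only.)
        top = visits.most_common(top_k)
        if len(top) == top_k and top[-1][1] >= min_visits:
            break
        if rng.random() < restart_prob:           # assumed restart rule
            pin = query_pin
    return visits


# Hypothetical pin/board graph.
pin_to_boards = {"q": ["b1", "b2"], "p1": ["b1"],
                 "p2": ["b1", "b2"], "p3": ["b2"]}
board_to_pins = {"b1": ["q", "p1", "p2"], "b2": ["q", "p2", "p3"]}

visits = pixie_random_walk(pin_to_boards, board_to_pins, query_pin="q")
recommended = [pin for pin, _ in visits.most_common(2)]
```

Weighted edges could be added by replacing `rng.choice` with `rng.choices(..., weights=...)`, for example to favor pins in the user's local language.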
