Link Analysis and Web Search: How to Organize the Web (PowerPoint PPT presentation)



SLIDE 1

Online Social Networks and Media

Link Analysis and Web Search

SLIDE 2

How to Organize the Web

  • First try: human-curated Web directories
    – Yahoo, DMOZ, LookSmart

SLIDE 3

How to organize the web

  • Second try: Web Search
    – Information Retrieval investigates:
      • Finding relevant docs in a small and trusted set, e.g., newspaper articles, patents, etc. ("needle-in-a-haystack")
      • Limitations of keywords (synonyms, polysemy, etc.)
  • But: the Web is huge, full of untrusted documents, random things, web spam, etc.
    – Everyone can create a web page of high production value
    – Rich diversity of people issuing queries
    – Dynamic and constantly-changing nature of web content
SLIDE 4

Size of the Search Index

http://www.worldwidewebsize.com/

SLIDE 5

How to organize the web

  • Third try (the Google era): using the web graph
    – Shift from relevance to authoritativeness
    – It is not only important that a page is relevant, but also that it is important on the web
  • For example, what kind of results would we like to get for the query "greek newspapers"?

SLIDE 6

Link Analysis

  • Not all web pages are equal on the web
  • The links act as endorsements:
    – When page p links to q, it endorses the content of q

What is the simplest way to measure the importance of a page on the web?

SLIDE 7

Rank by Popularity

  • Rank pages according to the number of incoming edges (in-degree, degree centrality)

  1. Red Page
  2. Yellow Page
  3. Blue Page
  4. Purple Page
  5. Green Page

(graph figure: nodes w1–w5)
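As a minimal sketch, ranking by in-degree takes only a few lines of Python. The edge list below is an assumption, reconstructed from the five-node example graph whose equations appear on a later slide.

```python
# Sketch: rank pages by in-degree (degree centrality).
# The edge list is an assumption, reconstructed from the example
# equations used on a later slide (nodes w1..w5).
from collections import Counter

edges = [("w1", "w2"), ("w1", "w3"), ("w2", "w5"), ("w3", "w2"),
         ("w4", "w1"), ("w4", "w2"), ("w4", "w3"),
         ("w5", "w1"), ("w5", "w4")]

indegree = Counter(dst for _, dst in edges)                 # in-links per page
ranking = sorted(indegree, key=indegree.get, reverse=True)  # most popular first
```

With this edge list, w2 collects the most in-links and tops the ranking.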

SLIDE 8

Popularity

  • It is important not only how many pages link to you, but also how important the pages that link to you are.
  • Good authorities are pointed to by good authorities
    – Recursive definition of importance

SLIDE 9

THE PAGERANK ALGORITHM

SLIDE 10

PageRank

  • Good authorities should be pointed to by good authorities
    – The value of a node is the value of the nodes that point to it.
  • How do we implement that?
    – Assume that we have a unit of authority to distribute to all nodes.
      • Initially each node gets 1/n amount of authority
    – Each node distributes its authority value to its neighbors
    – The authority value of each node is the sum of the authority fractions it collects from its neighbors:

x_w = Σ_{v→w} (1/d_out(v)) x_v

x_w: the PageRank value of node w (a recursive definition)

SLIDE 11

A simple example

  • Solving the system of equations we get the authority values for the nodes
    – w1 = ½, w2 = ¼, w3 = ¼

w1 + w2 + w3 = 1
w1 = w2 + w3
w2 = ½ w1
w3 = ½ w1

SLIDE 12

A more complex example

w1 = 1/3 w4 + 1/2 w5
w2 = 1/2 w1 + w3 + 1/3 w4
w3 = 1/2 w1 + 1/3 w4
w4 = 1/2 w5
w5 = w2

General rule: x_w = Σ_{v→w} (1/d_out(v)) x_v

(graph figure: nodes w1–w5)

SLIDE 13

Computing PageRank weights

  • A simple way to compute the weights is by iteratively updating them
  • PageRank Algorithm:

Initialize all PageRank weights to 1/n
Repeat: x_w = Σ_{v→w} (1/d_out(v)) x_v
Until the weights do not change

  • This process converges
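The iteration above can be sketched in Python. The edge list is an assumption, taken from the five-node example used on the previous slide.

```python
# Sketch of the basic iterative PageRank update: start from the uniform
# vector and repeat x_w = sum over edges v->w of x_v / d_out(v).
def pagerank_basic(edges, tol=1e-12):
    nodes = sorted({u for edge in edges for u in edge})
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    x = {u: 1.0 / len(nodes) for u in nodes}     # each node starts with 1/n
    while True:
        new = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += x[u] / len(out[u])     # u passes an equal share to each neighbor
        if max(abs(new[u] - x[u]) for u in nodes) < tol:
            return new
        x = new

# Example graph (an assumption, matching the equations of the example slide)
edges = [("w1", "w2"), ("w1", "w3"), ("w2", "w5"), ("w3", "w2"),
         ("w4", "w1"), ("w4", "w2"), ("w4", "w3"),
         ("w5", "w1"), ("w5", "w4")]
pr = pagerank_basic(edges)
```

Because this graph is strongly connected and has no sinks, the total weight stays 1 and the iteration settles to the unique fixed point of the equations.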

SLIDE 14

PageRank

  • Initially, all nodes have PageRank 1/8
  • PageRank acts as a kind of "fluid" that circulates through the network
  • The total PageRank in the network remains constant (no need to normalize)

SLIDE 15

PageRank: equilibrium

  • A simple way to check whether an assignment of numbers forms an equilibrium set of PageRank values: check that they sum to 1, and that when we apply the Basic PageRank Update Rule, we get the same values back.
  • If the network is strongly connected, then there is a unique set of equilibrium values.

SLIDE 16

Random Walks on Graphs

  • The algorithm defines a random walk on the graph
  • Random walk:
    – Start from a node chosen uniformly at random (with probability 1/n).
    – Pick one of the outgoing edges uniformly at random
    – Move to the destination of the edge
    – Repeat.
  • The Random Surfer model
    – Users wander on the web, following links.

SLIDE 17

Example

  • Step 0

(graph figure: nodes w1–w5)


SLIDE 19

Example

  • Step 1

(graph figure: nodes w1–w5)


SLIDE 21

Example

  • Step 2

(graph figure: nodes w1–w5)


SLIDE 23

Example

  • Step 3

(graph figure: nodes w1–w5)


SLIDE 25

Example

  • Step 4 …

(graph figure: nodes w1–w5)

SLIDE 26

Random walk

  • Question: what is the probability q_j(t) of being at node j after t steps?

(graph figure: nodes w1–w5)

q_1(0) = q_2(0) = q_3(0) = q_4(0) = q_5(0) = 1/5

q_1(t) = 1/3 q_4(t−1) + 1/2 q_5(t−1)
q_2(t) = 1/2 q_1(t−1) + q_3(t−1) + 1/3 q_4(t−1)
q_3(t) = 1/2 q_1(t−1) + 1/3 q_4(t−1)
q_4(t) = 1/2 q_5(t−1)
q_5(t) = q_2(t−1)

SLIDE 27

Markov chains

  • A Markov chain describes a discrete-time stochastic process over a set of states S = {s1, s2, …, sn} according to a transition probability matrix P = {P_ij}
    – P_ij = probability of moving to state j when at state i
  • Matrix P has the property that the entries of each row sum to 1:

Σ_j P(i, j) = 1

A matrix with this property is called stochastic
  • State probability distribution: the vector q(t) = (q_1(t), q_2(t), …, q_n(t)) that stores the probability of being at state s_i after t steps
  • Memorylessness property: the next state of the chain depends only on the current state and not on the past of the process (first-order MC)
    – Higher-order MCs are also possible
  • Markov Chain Theory: as the number of steps goes to infinity, the state probability vector converges to a unique distribution if the chain is irreducible (it is possible to get from any state to any other state) and aperiodic

SLIDE 28

Random walks

  • Random walks on graphs correspond to Markov Chains
    – The set of states S is the set of nodes of the graph G
    – The transition probability matrix holds the probability that we follow an edge from one node to another: P(i, j) = 1/d_out(i) if there is an edge i→j, and 0 otherwise

SLIDE 29

An example

Transition matrix P for the example graph (rows/columns ordered w1, …, w5):

w1: (  0   1/2  1/2   0    0  )
w2: (  0    0    0    0    1  )
w3: (  0    1    0    0    0  )
w4: ( 1/3  1/3  1/3   0    0  )
w5: ( 1/2   0    0   1/2   0  )

A: the adjacency matrix, with a 1 for each of the 9 edges

(graph figure: nodes w1–w5)

SLIDE 30

Node Probability vector

  • The vector q(t) = (q_1(t), q_2(t), …, q_n(t)) that stores the probability of being at node w_i at step t
  • q_i(0) = the probability of starting from state i, (usually) set to uniform
  • We can compute the vector q(t) at step t using a vector-matrix multiplication:

q(t) = q(t−1) P
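The update q(t) = q(t−1)P can be sketched directly. The matrix below is an assumption: the row-stochastic transition matrix of the five-node example, reconstructed from the slide's equations.

```python
# Sketch: repeated node-probability updates q_t = q_{t-1} P.
# P is the row-stochastic transition matrix of the five-node example
# (rows/columns ordered w1..w5; an assumption matching the slide's equations).
P = [
    [0.0, 1/2, 1/2, 0.0, 0.0],   # w1 -> w2, w3
    [0.0, 0.0, 0.0, 0.0, 1.0],   # w2 -> w5
    [0.0, 1.0, 0.0, 0.0, 0.0],   # w3 -> w2
    [1/3, 1/3, 1/3, 0.0, 0.0],   # w4 -> w1, w2, w3
    [1/2, 0.0, 0.0, 1/2, 0.0],   # w5 -> w1, w4
]

def step(q, P):
    """One vector-matrix multiplication: q_t[j] = sum_i q_{t-1}[i] * P[i][j]."""
    n = len(q)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

q = [1 / 5] * 5            # uniform starting distribution
for _ in range(200):
    q = step(q, P)         # approaches the stationary distribution
```

Since the chain is irreducible and aperiodic, q approaches the stationary distribution regardless of the starting vector.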

SLIDE 31

An example

Transition matrix P (rows/columns ordered w1, …, w5):

w1: (  0   1/2  1/2   0    0  )
w2: (  0    0    0    0    1  )
w3: (  0    1    0    0    0  )
w4: ( 1/3  1/3  1/3   0    0  )
w5: ( 1/2   0    0   1/2   0  )

(graph figure: nodes w1–w5)

q_1(t) = 1/3 q_4(t−1) + 1/2 q_5(t−1)
q_2(t) = 1/2 q_1(t−1) + q_3(t−1) + 1/3 q_4(t−1)
q_3(t) = 1/2 q_1(t−1) + 1/3 q_4(t−1)
q_4(t) = 1/2 q_5(t−1)
q_5(t) = q_2(t−1)

SLIDE 32

Stationary distribution

  • The stationary distribution of a random walk with transition matrix P is a probability distribution π such that π = πP
  • The stationary distribution is an eigenvector of matrix P
    – the principal left eigenvector of P
    – stochastic matrices have maximum eigenvalue 1
  • The probability π_i is the fraction of time we spend at state i as t → ∞
  • Markov Chain Theory: the random walk converges to a unique stationary distribution, independent of the initial vector, if the graph is strongly connected and not bipartite.

SLIDE 33

Computing the stationary distribution

  • The Power Method:

Initialize q(0) to some distribution
Repeat: q(t) = q(t−1) P
Until convergence

  • After many iterations q(t) → π regardless of the initial vector q(0)
  • It is called the power method because it computes q(t) = q(0) P^t
  • Rate of convergence
    – determined by the ratio of the second eigenvalue to the first: |λ2| / |λ1|

SLIDE 34

The stationary distribution

  • What is the meaning of the stationary distribution π of a random walk?
  • π(i): the probability of being at node i after a very large (infinite) number of steps
  • π = q(0) P^∞, where P is the transition matrix and q(0) the initial vector
    – P(i, j): probability of going from i to j in one step
    – P²(i, j): probability of going from i to j in two steps (summed over all paths of length 2)
    – P^∞(i, j) = π(j): probability of going from i to j in infinitely many steps – the starting point does not matter.

SLIDE 35

The PageRank random walk

  • Vanilla random walk
    – make the adjacency matrix stochastic and run a random walk

(matrix P: the row-normalized adjacency matrix of the example graph)

SLIDE 36

The PageRank random walk

  • What about sink nodes?
    – what happens when the random walk moves to a node without any outgoing links?

(matrix P for the example graph, with one all-zero row: the sink has no outgoing links)

SLIDE 37

The PageRank random walk

  • Replace the rows of sink nodes with a vector v
    – typically, the uniform vector (1/n in every entry)

P' = P + dv^T, where d_i = 1 if i is a sink and 0 otherwise

(matrix P' for the example: the sink's row becomes (1/5, 1/5, 1/5, 1/5, 1/5))

SLIDE 38

The PageRank random walk

  • What about loops?

– Spider traps

SLIDE 39

The PageRank random walk

  • Add a random jump to vector v with probability 1−α
    – typically, to a uniform vector
  • The walk restarts after 1/(1−α) steps in expectation
    – Guarantees irreducibility and convergence

P'' = αP' + (1−α)uv^T, where u is the vector of all 1s

This is a random walk with restarts.

(matrix P'' for the example: every entry gets an added (1−α)/5 jump term)

SLIDE 40

PageRank algorithm [BP98]

  • The Random Surfer model
    – pick a page at random
    – with probability 1−α jump to a random page
    – with probability α follow a random outgoing link
  • Rank according to the stationary distribution

PR(p) = α Σ_{q→p} PR(q)/Out(q) + (1−α) · 1/n

α = 0.85 in most cases

  1. Red Page
  2. Purple Page
  3. Yellow Page
  4. Blue Page
  5. Green Page

SLIDE 41

PageRank: Example

SLIDE 42

Stationary distribution with random jump

  • If v is the jump vector:

q(0) = v
q(1) = αq(0)P' + (1−α)v = αvP' + (1−α)v
q(2) = αq(1)P' + (1−α)v = α²vP'² + (1−α)vαP' + (1−α)v
⋮
q(∞) = (1−α)v + (1−α)vαP' + (1−α)vα²P'² + ⋯ = (1−α)v(I − αP')^(−1)

  • With the random jump, shorter paths become more important, since the weight decreases exponentially with path length
    – this makes sense when thought of as a restart
  • If v is not uniform, we can bias the random walk towards the nodes favored by v
    – Personalized and Topic-Specific PageRank.

SLIDE 43

Effects of random jump

  • Guarantees convergence to a unique distribution
  • Motivated by the concept of the random surfer
  • Offers additional flexibility
    – personalization
    – anti-spam
  • Controls the rate of convergence
    – the second eigenvalue of matrix P'' is α

SLIDE 44

Random walks on undirected graphs

  • For undirected graphs, the stationary distribution of a random walk is proportional to the degrees of the nodes
    – Thus, in this case a random walk is the same as degree popularity
  • This is no longer true if we do random jumps
    – Now the short paths play a greater role, and the previous distribution does not hold.

SLIDE 45

PageRank implementation

  • Store the graph as an adjacency list, or as a list of edges
  • Keep the current PageRank values and the new PageRank values
  • Go through the edges and update the values of the destination nodes.
  • Repeat until the difference between the PageRank vectors (L1 or L∞ difference) is below some small value ε.

SLIDE 46

A (Matlab-friendly) PageRank algorithm

  • Performing the vanilla power method is now too expensive – the matrix is not sparse

q(0) = v
t = 0
repeat
  t = t + 1
  q(t) = (P'')^T q(t−1)
  δ = ‖q(t) − q(t−1)‖
until δ < ε

Efficient computation of y = (P'')^T x:

y = αP^T x
β = ‖x‖₁ − ‖y‖₁
y = y + βv

P = normalized adjacency matrix
P' = P + dv^T, where d_i is 1 if i is a sink and 0 otherwise
P'' = αP' + (1−α)uv^T, where u is the vector of all 1s
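A sketch of the same idea in Python, without ever materializing the dense matrix: multiply by the sparse αP^T, then redistribute the missing mass (sink leakage plus the (1−α) jump) uniformly. The function name and the toy graph are assumptions for illustration.

```python
# Sketch: PageRank power iteration computing y = (P'')^T x implicitly.
# Only the sparse edge structure is touched; the dense jump/sink terms
# are folded into a single uniform correction beta/n.
def pagerank(nodes, edges, alpha=0.85, eps=1e-10):
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    x = {u: 1.0 / n for u in nodes}
    while True:
        y = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in out[u]:
                y[v] += alpha * x[u] / len(out[u])   # y = alpha * P^T x
        beta = 1.0 - sum(y.values())                 # mass lost to sinks and the jump
        y = {u: y[u] + beta / n for u in nodes}      # y = y + beta * v (uniform v)
        if sum(abs(y[u] - x[u]) for u in nodes) < eps:   # L1 difference
            return y
        x = y

# Toy chain a -> b -> c, where c is a sink (an assumption for illustration)
ranks = pagerank(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

Because x always sums to 1, the uniform correction β/n exactly accounts for both the sink rows of P' and the (1−α) restart, so the result sums to 1 as well.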

SLIDE 47

PageRank history

  • Huge advantage for Google in the early days
    – It gave a way to get an idea of the value of a page, which was useful in many different ways
    – It put an order to the web.
    – After a while it became clear that the anchor text was probably more important for ranking
    – Also, link spam became a new (dark) art
  • Flood of research
    – Numerical analysis got rejuvenated
    – Huge number of variations
    – Efficiency became a great issue
    – Huge number of applications in different fields
  • The random walk with restarts is often referred to simply as PageRank.
SLIDE 48

THE HITS ALGORITHM

SLIDE 49

The HITS algorithm

  • Another algorithm, proposed around the same time as PageRank, for using the hyperlinks to rank pages
    – Kleinberg: then an intern at IBM Almaden
    – IBM never made anything out of it

SLIDE 50

Query dependent input

Root Set

Root set obtained from a text-only search engine

SLIDE 51

Query dependent input

Root Set IN OUT

SLIDE 52

Query dependent input

Root Set IN OUT

SLIDE 53

Query dependent input

Root Set IN OUT

Base Set

SLIDE 54

Hubs and Authorities [K98]

  • Authority is not necessarily transferred directly between authorities
  • Pages have a double identity
    – hub identity
    – authority identity
  • Good hubs point to good authorities
  • Good authorities are pointed to by good hubs

(figure: bipartite view of hubs and authorities)

SLIDE 55

Hubs and Authorities

  • Two kinds of weights:
    – Hub weight
    – Authority weight
  • The hub weight is the sum of the authority weights of the authorities pointed to by the hub
  • The authority weight is the sum of the hub weights that point to this authority.

SLIDE 56

HITS Algorithm

  • Initialize all weights to 1.
  • Repeat until convergence
    – O operation: hubs collect the weight of the authorities they point to
    – I operation: authorities collect the weight of the hubs that point to them
    – Normalize the weights under some norm

O operation: h_i = Σ_{j : i→j} a_j
I operation: a_i = Σ_{j : j→i} h_j
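The O/I loop with max-norm normalization can be sketched as follows; the small edge list at the end is an assumption for illustration, not the slide's figure.

```python
# Sketch of the HITS O/I iterations with max-norm normalization.
def hits(nodes, edges, iters=50):
    a = {u: 1.0 for u in nodes}   # authority weights
    h = {u: 1.0 for u in nodes}   # hub weights
    for _ in range(iters):
        h = {u: sum(a[v] for s, v in edges if s == u) for u in nodes}  # O operation
        a = {u: sum(h[s] for s, v in edges if v == u) for u in nodes}  # I operation
        hmax, amax = max(h.values()), max(a.values())
        h = {u: h[u] / hmax for u in nodes}   # normalize (max norm)
        a = {u: a[u] / amax for u in nodes}
    return h, a

# Tiny illustrative graph: hubs x and y, authorities c and d
h, a = hits(["x", "y", "c", "d"], [("x", "c"), ("y", "c"), ("y", "d")])
```

Here "c", pointed to by both hubs, ends up as the top authority, and "y", which points to both authorities, as the top hub.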

SLIDE 57

HITS and eigenvectors

  • The HITS algorithm is a power-method eigenvector computation
  • In vector terms
    – a(t) = A^T h(t−1) and h(t) = A a(t−1)
    – a(t) = A^T A a(t−2) and h(t) = A A^T h(t−2)
    – Repeated iterations will converge to the eigenvectors
  • The authority weight vector a is the principal eigenvector of A^T A, and the hub weight vector h is the principal eigenvector of A A^T
  • The vectors a and h are called the singular vectors of the matrix A

SLIDE 58

Singular Value Decomposition

  • r: rank of matrix A
  • σ1 ≥ σ2 ≥ … ≥ σr: singular values (square roots of the eigenvalues of AA^T and A^T A)
  • u1, u2, …, ur: left singular vectors (eigenvectors of AA^T)
  • v1, v2, …, vr: right singular vectors (eigenvectors of A^T A)

A = U Σ V^T = [u1 u2 ⋯ ur] · diag(σ1, σ2, …, σr) · [v1 v2 ⋯ vr]^T
     [n×r]        [r×r]           [r×n]

A = σ1 u1 v1^T + σ2 u2 v2^T + ⋯ + σr ur vr^T

SLIDE 59

Why does the Power Method work?

  • If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: (λ1, x1), (λ2, x2), …, (λr, xr)
    – r is the rank of the matrix
    – |λ1| ≥ |λ2| ≥ ⋯ ≥ |λr|
  • The eigenvectors x1, x2, …, xr of R define a basis of the vector space
    – Any vector y can be written as y = c1 x1 + c2 x2 + ⋯ + cr xr
  • After t multiplications we have:
    – R^t y = λ1^t c1 x1 + λ2^t c2 x2 + ⋯ + λr^t cr xr
  • Normalizing (dividing by λ1^t) leaves only the term c1 x1, since the other terms decay as (λi/λ1)^t.

SLIDE 60

Example

Initialize: authorities = (1, 1, 1, 1, 1); hubs = (1, 1, 1, 1, 1)

(figure: bipartite graph of hubs and authorities)

SLIDE 61

Example

Step 1, O operation: authorities = (1, 1, 1, 1, 1); hubs = (1, 2, 3, 2, 1)

SLIDE 62

Example

Step 1, I operation: authorities = (6, 5, 5, 2, 1); hubs = (1, 2, 3, 2, 1)

SLIDE 63

Example

Step 1, normalization (max norm): authorities = (1, 5/6, 5/6, 2/6, 1/6); hubs = (1/3, 2/3, 1, 2/3, 1/3)

SLIDE 64

Example

Step 2, O operation: authorities = (1, 5/6, 5/6, 2/6, 1/6); hubs = (1, 11/6, 16/6, 7/6, 1/6)

SLIDE 65

Example

Step 2, I operation: authorities = (33/6, 27/6, 23/6, 7/6, 1/6); hubs = (1, 11/6, 16/6, 7/6, 1/6)

SLIDE 66

Example

Step 2, normalization: authorities = (1, 27/33, 23/33, 7/33, 1/33); hubs = (6/16, 11/16, 1, 7/16, 1/16)

SLIDE 67

Example

Convergence: authorities ≈ (1, 0.8, 0.6, 0.14, …); hubs ≈ (0.4, 0.75, 1, 0.3, …)

SLIDE 68

The SALSA algorithm

  • Perform a random walk on the bipartite graph of hubs and authorities, alternating between the two

(figure: bipartite graph of hubs and authorities)

SLIDE 69

The SALSA algorithm

  • Start from an authority chosen uniformly at random
    – e.g. the red authority

(figure: bipartite graph of hubs and authorities)

SLIDE 70
The SALSA algorithm

  • Start from an authority chosen uniformly at random
    – e.g. the red authority
  • Choose one of the incoming links uniformly at random and move to a hub
    – e.g. move to the yellow hub with probability 1/3

(figure: bipartite graph of hubs and authorities)

SLIDE 71
The SALSA algorithm

  • Start from an authority chosen uniformly at random
    – e.g. the red authority
  • Choose one of the incoming links uniformly at random and move to a hub
    – e.g. move to the yellow hub with probability 1/3
  • Choose one of the outgoing links uniformly at random and move to an authority
    – e.g. move to the blue authority with probability 1/2

(figure: bipartite graph of hubs and authorities)

SLIDE 72

The SALSA algorithm

  • Formally we have the probabilities:
    – a_i: probability of being at authority i
    – h_j: probability of being at hub j
  • The probability of being at authority i is computed as:

a_i = Σ_{j ∈ In(i)} (1/d_out(j)) h_j

  • The probability of being at hub j is computed as:

h_j = Σ_{i ∈ Out(j)} (1/d_in(i)) a_i

  • Repeated computation converges

  • Repeated computation converges
SLIDE 73

The SALSA algorithm [LM00]

  • In matrix terms
    – A_c = the matrix A with columns normalized to sum to 1
    – A_r = the matrix A with rows normalized to sum to 1
  • The hub computation
    – h = A_c a
  • The authority computation
    – a = A_r^T h = A_r^T A_c a
  • In Markov chain terms, the transition matrix is
    – P = A_r A_c^T

(figure: bipartite graph of hubs and authorities, with example updates a_2 = h_2 + 2/3 h_3 + 2/4 h_4 and h_3 = 2/4 a_2 + 2/3 a_3)

SLIDE 74

ABSORBING RANDOM WALKS LABEL PROPAGATION OPINION FORMATION ON SOCIAL NETWORKS

SLIDE 75

Random walk with absorbing nodes

  • What happens if we do a random walk on this graph? What is the stationary distribution?
  • All the probability mass ends up on the red sink node:
    – The red node is an absorbing node

SLIDE 76

Random walk with absorbing nodes

  • What happens if we do a random walk on this graph? What is the stationary distribution?
  • There are two absorbing nodes: the red and the blue.
  • The probability mass will be divided between the two
SLIDE 77

Absorption probability

  • If there is more than one absorbing node in the graph, a random walk that starts from a non-absorbing node will be absorbed in one of them with some probability
    – The probability of absorption gives an estimate of how close the node is to red or blue

SLIDE 78

Absorption probability

  • Computing the probability of being absorbed:
    – The absorbing nodes have probability 1 of being absorbed in themselves and zero of being absorbed in another node.
    – For the non-absorbing nodes, take the (weighted) average of the absorption probabilities of your neighbors
      • if one of the neighbors is the absorbing node, it contributes probability 1
    – Repeat until convergence (= very small change in the probabilities)

P(Red | Pink) = 2/3 P(Red | Yellow) + 1/3 P(Red | Green)
P(Red | Green) = 1/4 P(Red | Yellow) + 1/4 …
P(Red | Yellow) = 2/3 …

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1)

SLIDE 79

Absorption probability

  • Computing the probability of being absorbed:
    – The absorbing nodes have probability 1 of being absorbed in themselves and zero of being absorbed in another node.
    – For the non-absorbing nodes, take the (weighted) average of the absorption probabilities of your neighbors
      • if one of the neighbors is the absorbing node, it contributes probability 1
    – Repeat until convergence (= very small change in the probabilities)

P(Blue | Pink) = 2/3 P(Blue | Yellow) + 1/3 P(Blue | Green)
P(Blue | Green) = 1/4 P(Blue | Yellow) + 1/2 …
P(Blue | Yellow) = 1/3 …

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1)

SLIDE 80

Why do we care?

  • Why do we care to compute the absorption probability to sink nodes?
  • Given a graph (directed or undirected) we can choose to make some nodes absorbing.
    – Simply direct all edges incident on the chosen nodes towards them.
  • The absorbing random walk provides a measure of proximity of the non-absorbing nodes to the chosen nodes.
    – Useful for understanding proximity in graphs
    – Useful for propagation in the graph
  • E.g., on a social network where some nodes have high income and some have low income, to which income class is a non-absorbing node closer?

SLIDE 81

Example

  • In this undirected graph we want to learn the proximity of nodes to the red and blue nodes

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1)

SLIDE 82

Example

  • Make the nodes absorbing

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1, with the edges into red and blue made directed)

SLIDE 83

Absorption probability

  • Compute the absorption probabilities for red and blue

P(Red | Pink) = 2/3 P(Red | Yellow) + 1/3 P(Red | Green)
P(Red | Green) = 1/5 P(Red | Yellow) + 1/5 P(Red | Pink) + 1/5
P(Red | Yellow) = 1/6 P(Red | Green) + 1/3 P(Red | Pink) + 1/3

P(Blue | Pink) = 1 − P(Red | Pink)
P(Blue | Green) = 1 − P(Red | Green)
P(Blue | Yellow) = 1 − P(Red | Yellow)

Resulting (Red, Blue) probabilities: Pink (0.52, 0.48), Green (0.42, 0.58), Yellow (0.57, 0.43)

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1)
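The repeated-averaging computation can be sketched directly. The edge weights below are an assumption, reconstructed so that they reproduce the slide's three equations (Red and Blue are the absorbing nodes).

```python
# Sketch: iterate the absorption probabilities P(Red | node) on the
# weighted example graph by repeated (weighted) averaging.
undirected = {("Pink", "Yellow"): 2, ("Pink", "Green"): 1,
              ("Green", "Yellow"): 1, ("Green", "Red"): 1,
              ("Green", "Blue"): 2, ("Yellow", "Red"): 2,
              ("Yellow", "Blue"): 1}
w = dict(undirected)
for (u, v), wt in undirected.items():
    w[(v, u)] = wt                       # symmetrize the undirected weights

free = ["Pink", "Green", "Yellow"]       # non-absorbing nodes
deg = {u: sum(wt for (s, _), wt in w.items() if s == u) for u in free}

boundary = {"Red": 1.0, "Blue": 0.0}     # absorbing nodes keep fixed values
p_red = {u: 0.0 for u in free}           # P(Red | u)
for _ in range(200):                     # repeat until (numerically) converged
    for u in free:
        p_red[u] = sum(wt * boundary.get(v, p_red.get(v, 0.0))
                       for (s, v), wt in w.items() if s == u) / deg[u]
```

The iteration settles at P(Red | Pink) = 10/19 ≈ 0.53, P(Red | Green) = 8/19 ≈ 0.42, P(Red | Yellow) = 11/19 ≈ 0.58, matching the values shown on the slide up to rounding.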

SLIDE 84

Penalizing long paths

  • The orange node has the same probability of reaching red and blue as the yellow one
    – P(Red | Orange) = P(Red | Yellow) and P(Blue | Orange) = P(Blue | Yellow), i.e. (0.57, 0.43)
  • Intuitively, though, it is further away

(figure: the previous graph with an extra orange node attached to yellow)

SLIDE 85

Penalizing long paths

  • Add a universal absorbing node to which each node gets absorbed with probability α.

With probability α the random walk dies.
With probability (1−α) the random walk continues as before.

For example: P(Red | Green) = (1−α) (1/5 P(Red | Yellow) + 1/5 P(Red | Pink) + 1/5)

The longer the path from a node to an absorbing node, the more likely the random walk dies along the way, and the lower the absorption probability.

SLIDE 86

Propagating values

  • Assume that Red has a positive value and Blue a negative value
    – Positive/Negative class, Positive/Negative opinion
  • We can compute a value for all the other nodes in the same way
    – This is the expected value for the node

V(Pink) = 2/3 V(Yellow) + 1/3 V(Green)
V(Green) = 1/5 V(Yellow) + 1/5 V(Pink) + 1/5 − 2/5
V(Yellow) = 1/6 V(Green) + 1/3 V(Pink) + 1/3 − 1/6

With V(Red) = +1 and V(Blue) = −1, this gives V(Pink) = 0.05, V(Green) = −0.16, V(Yellow) = 0.16

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1)

SLIDE 87

Electrical networks and random walks

  • Our graph corresponds to an electrical network
  • There is a positive voltage of +1 at the Red node, and a negative voltage of −1 at the Blue node
  • There are resistances on the edges inversely proportional to the weights (or conductances proportional to the weights)
  • The computed values are the voltages at the nodes

V(Pink) = 2/3 V(Yellow) + 1/3 V(Green)
V(Green) = 1/5 V(Yellow) + 1/5 V(Pink) + 1/5 − 2/5
V(Yellow) = 1/6 V(Green) + 1/3 V(Pink) + 1/3 − 1/6

Voltages: Red +1, Blue −1, Pink 0.05, Green −0.16, Yellow 0.16

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1)

SLIDE 88

Opinion formation

  • The value propagation can be used as a model of opinion formation.
  • Model:
    – Opinions are values in [−1, 1]
    – Every user u has an internal opinion s_u and an expressed opinion z_u.
    – The expressed opinion minimizes the personal cost of user u:

c(z_u) = (s_u − z_u)² + Σ_{v : v is a friend of u} w_uv (z_u − z_v)²

  • Minimize the deviation from your beliefs and the conflicts with society
  • If every user independently (selfishly) tries to minimize their personal cost, then the best thing to do is to set z_u to the weighted average of all opinions:

z_u = (s_u + Σ_{v : v is a friend of u} w_uv z_v) / (1 + Σ_{v : v is a friend of u} w_uv)

  • This is the same as the value propagation we described before!
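The repeated-averaging dynamics can be sketched in a few lines: each user's expressed opinion z_u moves to the weighted average of their internal opinion s_u and their friends' current expressed opinions. The function name, the two-user graph, and the opinion values are illustrative assumptions, not the slide's figure.

```python
# Sketch of repeated averaging for opinion formation:
# z_u = (s_u + sum_v w_uv * z_v) / (1 + sum_v w_uv)
def expressed_opinions(weights, s, iters=200):
    z = dict(s)                       # start from the internal opinions
    for _ in range(iters):
        for u in s:
            num = s[u] + sum(wt * z[v] for (a, v), wt in weights.items() if a == u)
            den = 1 + sum(wt for (a, v), wt in weights.items() if a == u)
            z[u] = num / den
    return z

# Two friends with opposing internal opinions, edge weight 1 (both directions)
w = {("a", "b"): 1, ("b", "a"): 1}
z = expressed_opinions(w, {"a": 1.0, "b": -1.0})
```

In this two-user case the fixed point is z_a = 1/3 and z_b = −1/3: each user moderates their internal opinion toward the friend's, without fully abandoning it.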
SLIDE 89

Example

  • Social network with internal opinions

(figure: graph with edge weights 2, 2, 1, 1, 1, 2, 1 and internal opinions s = +0.5, −0.3, −0.1, +0.2, +0.8)

SLIDE 90

Example

The expressed opinion for each node is computed using the value propagation we described before
  • Repeated averaging

Intuitive model: my opinion is a combination of what I believe and what my social network believes.
  – One absorbing node per user, with value the internal opinion of the user
  – One non-absorbing node per user, which links to the corresponding absorbing node

(figure: internal opinions s = +0.5, −0.3, −0.1, −0.5, +0.8 and expressed opinions z = +0.22, +0.17, −0.03, +0.04, −0.01)

SLIDE 91

Transductive learning

  • If we have a graph of relationships and labels on some nodes, we can propagate them to the remaining nodes
    – Make the labeled nodes absorbing and compute the probabilities for the rest of the graph
    – E.g., a social network where some people are tagged as spammers
    – E.g., the movie-actor graph where some movies are tagged as action or comedy.
  • This is a form of semi-supervised learning
    – We make use of the unlabeled data, and of the relationships
  • It is also called transductive learning because it does not produce a model, but just labels the unlabeled data at hand.
    – Contrast with inductive learning, which learns a model and can label any new example

SLIDE 92

Implementation details

  • The implementation is in many ways similar to the PageRank implementation
    – For an edge (u, v), instead of updating the value of the destination v we update the value of u.
      • The value of a node is the average of its neighbors
    – We need to check whether a node u is absorbing, in which case its value is not updated.
    – Repeat the updates until the change in values is very small.