DCS/CSCI 2350: Social & Economic Networks. WWW: Information Networks



SLIDE 1

4/25/18 1

DCS/CSCI 2350: Social & Economic Networks

WWW: Information Networks Chapters 13, 14

Mohammad T . Irfan

Questions

  • 1. What does the web look like? [Ch 13]
  • 2. How does Google search it? [Ch 14]
SLIDE 2

Information network

  • Common elements of social and economic networks
    • Graphs, paths, giant components
    • Connections to matching markets and auctions

Web

  • Application for sharing info over the Internet
  • Created by Tim Berners-Lee (1989-91)
  • 2 perspectives
    • Web pages: Make documents easily available to anyone on the Internet
    • Browser: Retrieve and display documents
  • Web organizes information in a unique fashion
    • Different from library system
    • Different from folders in a computer
    • Different from indexing
  • Hypertext

SLIDE 3

Hypertext

  • Replaces linear structure of text by pointers
  • Concept dates back to 1950s

Precursor to hypertext

  • Citation network

SLIDE 4

Precursor to hypertext

  • Semantic network

Precursor to hypertext

  • Vannevar Bush (1945)
    • Associative memory in “Memex”
    • Cited by Tim Berners-Lee

SLIDE 5

Evolution of the web

  • Navigational functions (1990s)
    • Static web pages
  • Transactional functions
    • Dynamic, real-time operations
  • Web 2.0
    • New attitude to technology, not new technology
      1. Collective creation and maintenance of shared content (Wikipedia)
      2. Move personal data to corporate servers (Gmail)
      3. Network among individuals, not just web pages (Facebook)

Web as a directed graph

  • Nodes: Web pages
  • Directed edges: Links
  • bowdoin.edu → Restaurants and Lodgings → Brunswick Downtown Association → Visit → Things to do → Museum of Art → bowdoin.edu
    • A directed cycle
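
The directed-graph view can be sketched in a few lines of Python. The dictionary below is only an illustration: the page names are taken from the click path on the slide, and each page is assumed to have just the one out-link shown there (real pages have many more).

```python
# Adjacency list: page -> pages it links to.
# Assumption: one out-link per page, copied from the slide's click path.
links = {
    "bowdoin.edu": ["Restaurants and Lodgings"],
    "Restaurants and Lodgings": ["Brunswick Downtown Association"],
    "Brunswick Downtown Association": ["Visit"],
    "Visit": ["Things to do"],
    "Things to do": ["Museum of Art"],
    "Museum of Art": ["bowdoin.edu"],
}


def follow_links(graph, start):
    """Follow the first out-link of each page until some page repeats."""
    path = [start]
    seen = {start}
    current = start
    while graph.get(current):
        current = graph[current][0]
        path.append(current)
        if current in seen:
            break  # we came back to a page already visited
        seen.add(current)
    return path


path = follow_links(links, "bowdoin.edu")
# The walk is a directed cycle if it ends where it started.
is_cycle = len(path) > 1 and path[0] == path[-1]
print(" -> ".join(path))
print("directed cycle:", is_cycle)
```

Direction matters here: reversing a single link would break the cycle, even though the undirected picture would look the same.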

SLIDE 6

Example

Strongly Connected Component (SCC)
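
One way to make the SCC definition concrete: the SCC containing a page is the set of pages that the page can reach and that can also reach it. The sketch below uses that definition directly with two searches. The graph `toy_web` and the helper names are made up for illustration, and this is not the fastest way to find all SCCs of a large graph.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return a copy of the graph with every edge direction flipped."""
    rev = {node: [] for node in graph}
    for node, targets in graph.items():
        for target in targets:
            rev.setdefault(target, []).append(node)
    return rev


def scc_of(graph, node):
    """Nodes that node can reach AND that can reach node."""
    return reachable(graph, node) & reachable(reverse(graph), node)


# Hypothetical five-page web: A, B, C form a cycle; D and E hang off it.
toy_web = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
    "E": ["A"],
}

print(sorted(scc_of(toy_web, "A")))
print(sorted(scc_of(toy_web, "D")))
```

In this toy graph, D is reachable from the cycle but cannot get back, and E can reach the cycle but cannot be reached from it, so neither belongs to the SCC of A.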

SLIDE 7

Bow-tie structure of the web
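
A rough sketch of how the bow-tie pieces could be computed once a page in the giant SCC is known. IN is everything that can reach the core but is not in it, and OUT is everything the core can reach but that cannot get back. The graph `toy_web`, the function names, and the catch-all label `OTHER` (tendrils, tubes, disconnected pieces) are assumptions made for this example.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start (iterative depth-first search)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(graph, core_node):
    """Split the nodes into SCC, IN, OUT, and OTHER relative to core_node's SCC."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    # Reverse graph: used to find pages that can reach the core.
    rev = {n: [] for n in nodes}
    for node, targets in graph.items():
        for target in targets:
            rev[target].append(node)
    forward = reachable(graph, core_node)   # core can reach these
    backward = reachable(rev, core_node)    # these can reach the core
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "OTHER": nodes - forward - backward,
    }


# Hypothetical six-page web: A, B, C form the core.
toy_web = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
    "E": ["A"],
    "F": [],
}

parts = bow_tie(toy_web, "A")
for name, members in parts.items():
    print(name, sorted(members))
```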

Link analysis and web search

Chapter 14

SLIDE 8

Web search

  • Google “Bowdoin”
    • What do you see?
    • Why is Bowdoin College ranked first? (Why not James Bowdoin?)
  • Google’s source of information is the web itself
    • No expert intervention
  • There must be enough information intrinsic to the web!

Information retrieval

  • 1960s: Search repositories of newspapers, patents, etc. by keywords
    • Done by specialized people
  • Challenges in web search
    • Synonymy: scallion vs. onion
    • Polysemy: jaguar (you mean the animal or the car or the football team?)
    • Search results must be dynamic
    • Abundance of information (opposite of needle-in-haystack)

SLIDE 9

Ranking algorithms

  • Voting by in-links
  • Hubs and authorities
  • PageRank

Voting by in-links

  • Highest in-degree node is ranked first, and so on…
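
Voting by in-links amounts to counting incoming edges and sorting. A minimal sketch, assuming the web is given as an adjacency list; the graph `toy_web` and the alphabetical tie-break are choices made for this example only.

```python
from collections import Counter


def rank_by_in_links(graph):
    """Rank pages by in-degree, highest first (ties broken alphabetically)."""
    in_degree = Counter({page: 0 for page in graph})
    for page, targets in graph.items():
        for target in targets:
            in_degree[target] += 1  # each link is one vote for its target
    ranking = sorted(in_degree, key=lambda p: (-in_degree[p], p))
    return ranking, in_degree


# Hypothetical four-page web.
toy_web = {
    "A": ["C"],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["C"],
}

ranking, votes = rank_by_in_links(toy_web)
for page in ranking:
    print(page, votes[page])
```

Every vote counts the same here, no matter who casts it. The next two algorithms change exactly that.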
SLIDE 10

Hubs and authorities algorithm (1998)

Image source: http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture4/lecture4.html

Hubs and authorities example

Initially, all hub and auth scores are 1.
hub score = sum of auth scores of the pages it points to
auth score = sum of hub scores of the pages that point to it

  • 1. Update auth score of every node
  • 2. Update hub score of every node

SLIDE 11

Hubs and authorities example

  • 3. Update auth score of every node

Hubs and authorities example

Normalization
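
The whole loop (update auth, update hub, normalize) might look like the sketch below. It assumes the usual rules: a page's auth score is the sum of the hub scores of the pages pointing to it, and its hub score is the sum of the auth scores of the pages it points to. Normalizing by dividing by the total, the number of rounds, and the graph `toy_web` are choices made for this example.

```python
def hubs_and_authorities(graph, steps=20):
    """Run repeated auth/hub updates with normalization after each round."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}   # all scores start at 1
    auth = {n: 1.0 for n in nodes}
    for _ in range(steps):
        # Auth update: add up hub scores of the pages that link to each page.
        new_auth = {n: 0.0 for n in nodes}
        for page, targets in graph.items():
            for target in targets:
                new_auth[target] += hub[page]
        # Hub update: add up the (new) auth scores of the pages each page links to.
        new_hub = {n: sum(new_auth[t] for t in graph.get(n, [])) for n in nodes}
        # Normalization: divide by the total so the scores sum to 1.
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        auth = {n: v / auth_total for n, v in new_auth.items()}
        hub = {n: v / hub_total for n, v in new_hub.items()}
    return hub, auth


# Hypothetical web: h1 and h2 are list-like pages, a1 and a2 are the pages they cite.
toy_web = {
    "h1": ["a1", "a2"],
    "h2": ["a1"],
    "a1": [],
    "a2": [],
}

hub, auth = hubs_and_authorities(toy_web)
print("hub: ", {n: round(s, 3) for n, s in sorted(hub.items())})
print("auth:", {n: round(s, 3) for n, s in sorted(auth.items())})
```

Without normalization the raw scores keep growing every round; only their relative sizes matter for ranking.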

SLIDE 12

PageRank

Modern web search

  • Google, Bing, (Yahoo!, Ask)
  • PageRank is a central ingredient of Google
    • There are more ingredients
  • In 2004, Google incorporated the “Hilltop” method (2001); Ask incorporated the hubs and authorities algorithm
  • Exact search method: secret!

SLIDE 13

Modern web search

  • Combination of links, text, and clicks
    • Anchor text: “I’m a student of Bowdoin College.”
  • Moving target
    • Google’s changes in algorithm cause millions of dollars of damage to many companies
    • Companies seek help from SEOs to climb up the ranking
    • “white hat” vs. “black hat” optimization (later)

PageRank (PR) (1998)

  • Intuition
  • Update rule
  • Demo
    • NetLogo
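
The slides do not spell out the update rule, so the sketch below assumes the basic (unscaled) version as it is commonly stated: every page starts with an equal share of PageRank, splits its current PageRank equally among its out-links, and its new PageRank is the sum of the shares it receives. A page with no out-links is assumed to keep its PageRank. The graph `toy_web` and the number of rounds are made up for illustration; this is a stand-in for the NetLogo demo, not a copy of it.

```python
def basic_pagerank(graph, steps=50):
    """Apply the basic PageRank update rule for a fixed number of rounds."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    rank = {n: 1.0 / len(nodes) for n in nodes}  # equal shares to start
    for _ in range(steps):
        new_rank = {n: 0.0 for n in nodes}
        for page in nodes:
            targets = graph.get(page, [])
            if not targets:
                # Assumption: a page with no out-links passes its rank to itself.
                new_rank[page] += rank[page]
            else:
                share = rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

rank = basic_pagerank(toy_web)
for page, score in sorted(rank.items()):
    print(page, round(score, 3))
```

PageRank is never created or destroyed by this rule, only moved around, so the scores always add up to 1. The scaled variant, which spreads a fraction of the total evenly over all pages in each round, is left out of this sketch.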

SLIDE 14

JCPenney scandal (2011)

How JCPenney did it

  • Hired SearchDex
  • Black hat optimization

Image source: http://blogs.cornell.edu/info2040/2011/11/03/j-c-penney%E2%80%99s-pagerank/

SLIDE 15

How they got caught

  • NY Times + Blue Fountain Media
  • Punishment (Feb 9, 2011)
    • 7 pm: J. C. Penney was still the No. 1 result for “Samsonite carry on luggage.”
    • 9 pm: It was at No. 71.
    • Similar with other keywords
  • Precedent: BMW in Germany (2006)

Google’s spam cop Matt Cutts

(Image source: NY Times)

Modern web search

SLIDE 16

Link analysis beyond web search

  • Citation analysis
    • Journal’s impact scores

Link analysis of U.S. Supreme Court citations

  • Fowler & Jeon’s study (2008)
  • Hubs and authorities algorithm applied to data spanning 2 centuries!
  • Important precedents have very high authority scores
  • Public recognition comes later (authority score can predict future popularity)

SLIDE 17

Link analysis of U.S. Supreme Court citations

  • Rise and fall of authority scores