4/25/18
DCS/CSCI 2350: Social & Economic Networks
WWW: Information Networks Chapters 13, 14
Mohammad T. Irfan
Questions
- 1. What does the web look like? [Ch 13]
- 2. How does Google search it? [Ch 14]
Information network
- Common elements of social and economic networks
  - Graphs, paths, giant components
  - Connections to matching markets and auctions
- Application for sharing info over the Internet
- Created by Tim Berners-Lee (1989-91)
- 2 perspectives
  - Web pages: Make documents easily available to anyone on the Internet
  - Browser: Retrieve and display documents
- Web organizes information in a unique way
  - Different from library system
  - Different from folders in a computer
  - Different from indexing
- Hypertext
  - Replaces linear structure of text by pointers
  - Concept dates back to 1950s
- Citation network
- Semantic network
- Vannevar Bush (1945)
  - Associative memory in “Memex”
  - Cited by Tim Berners-Lee
- Navigational functions (1990s)
  - Static web pages
- Transactional functions
  - Dynamic, real-time operations
- Web 2.0
  - New attitude to technology, not new technology
    1. Collective creation and maintenance of shared content (Wikipedia)
    2. Move personal data to corporate servers (Gmail)
    3. Network among individuals, not just web pages (Facebook)
- Nodes: Web pages
- Directed edges: Links
- bowdoin.edu → Restaurants and Lodgings →
  - A directed cycle
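To make the graph view concrete, here is a minimal Python sketch of the web as a directed graph, with a check for whether a page lies on a directed cycle. The page names and links in `toy_web` are hypothetical placeholders, not the actual links on bowdoin.edu, and `on_directed_cycle` is an illustrative helper name.

```python
from collections import deque

# Hypothetical toy web: each key is a page, each value lists the pages it links to.
toy_web = {
    "home": ["dining", "about"],
    "dining": ["lodging"],
    "lodging": ["home"],  # this link back to "home" closes a directed cycle
    "about": [],
}

def on_directed_cycle(graph, start):
    """Return True if following links from `start` can lead back to `start`."""
    queue = deque(graph.get(start, []))
    seen = set()
    while queue:
        page = queue.popleft()
        if page == start:
            return True
        if page in seen:
            continue
        seen.add(page)
        queue.extend(graph.get(page, []))
    return False

print(on_directed_cycle(toy_web, "home"))   # True
print(on_directed_cycle(toy_web, "about"))  # False
```

Because edges are directed, "about" is reachable from "home" but is not on any cycle: it has no outgoing links to follow back.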
- Google “Bowdoin”
  - What do you see?
  - Why is Bowdoin College ranked first? (Why not James Bowdoin?)
- Google’s source of information is the web
  - No expert intervention
  - There must be enough information intrinsic to the web
- 1960s: Search repositories of newspapers,
  - Done by specialized people
- Challenges in web search
  - Synonymy: scallion vs. onion
  - Polysemy: jaguar (you mean the animal or the car
  - Search results must be dynamic
  - Abundance of information (opposite of needle-in-haystack)
- Voting by in-links
- Hubs and authorities
- PageRank
- Highest in-degree node is ranked first, and so on
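Voting by in-links can be sketched in a few lines of Python: every link counts as one vote for the page it points to, and pages are ranked by the number of votes received. The graph `votes_web` and the function name `rank_by_in_degree` are made up for illustration.

```python
from collections import Counter

def rank_by_in_degree(graph):
    """Rank pages by in-degree (number of distinct pages linking to them)."""
    in_degree = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in set(targets):  # at most one vote per linking page
            if target != source:     # ignore self-links
                in_degree[target] += 1
    # Highest in-degree first; ties broken alphabetically for a stable order.
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy web: page -> pages it links to.
votes_web = {"a": ["c"], "b": ["c", "a"], "c": ["a"], "d": ["c"]}
ranking = rank_by_in_degree(votes_web)
print(ranking)  # [('c', 3), ('a', 2), ('b', 0), ('d', 0)]
```

This treats every vote as equally valuable, which is the simplification that hubs and authorities and PageRank both refine.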
Image source: http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture4/lecture4.html
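- Initially, all hub and auth scores are 1
- Hub update: $\text{hub}(p) = \sum_{q:\, p \to q} \text{auth}(q)$
- Authority update: $\text{auth}(p) = \sum_{q:\, q \to p} \text{hub}(q)$
- Normalization: divide the hub score of every node by the sum of all hub scores, and the auth score of every node by the sum of all auth scores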
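The hub and authority updates can be sketched as a short Python loop. This is a minimal illustration under a few assumptions: the number of rounds (`steps`) is arbitrary, the authority update runs before the hub update in each round, and the graph `hits_web` is invented for the example.

```python
def hits(graph, steps=20):
    """Repeated hub/authority updates with normalization after each round."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}   # initially, all scores are 1
    auth = {n: 1.0 for n in nodes}
    for _ in range(steps):
        # Authority update: sum of hub scores of pages linking to the node.
        new_auth = {n: 0.0 for n in nodes}
        for source, targets in graph.items():
            for target in targets:
                new_auth[target] += hub[source]
        # Hub update: sum of (new) authority scores of pages the node links to.
        new_hub = {n: sum(new_auth[t] for t in graph.get(n, [])) for n in nodes}
        # Normalization: divide every score by the sum over all nodes.
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        auth = {n: v / auth_total for n, v in new_auth.items()}
        hub = {n: v / hub_total for n, v in new_hub.items()}
    return hub, auth

# Hypothetical toy web: h1 and h2 act as hubs, a1 and a2 as authorities.
hits_web = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub_scores, auth_scores = hits(hits_web)
print(sorted(auth_scores, key=auth_scores.get, reverse=True)[0])  # a1
```

Here a1 ends up with the highest authority score because both hubs point to it, and h1 ends up as the better hub because it points to both authorities.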
- Google, Bing, (Yahoo!, Ask)
- PageRank is a central ingredient of Google
  - There are more ingredients
  - In 2004, Google incorporated the “Hilltop”
- Exact search method: secret!
  - Combination of links, text, and clicks
  - Anchor text: “I’m a student of Bowdoin College.”
- Moving target
  - Google’s algorithm changes cause millions of dollars of damage to many companies
  - Companies seek help from SEOs to climb up the ranking
  - “white hat” vs. “black hat” optimization (later)
- Intuition
- Update rule
- Demo
  - NetLogo
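As a complement to the NetLogo demo, the PageRank update rule can be sketched in Python. This follows the standard textbook formulation rather than anything stated on the slide: every page starts with rank 1/n, each page splits its current rank equally among its out-links, and a page with no out-links keeps its own rank. The scaling factor `s`, the step count, and the graph `pr_web` are assumptions for illustration; `s=1.0` gives the basic (unscaled) rule.

```python
def pagerank(graph, steps=50, s=1.0):
    """Repeated PageRank updates; s < 1 gives the scaled variant."""
    nodes = sorted(set(graph) | {t for ts in graph.values() for t in ts})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}  # start with equal rank everywhere
    for _ in range(steps):
        new_rank = {p: 0.0 for p in nodes}
        for page in nodes:
            targets = graph.get(page, [])
            if targets:
                share = rank[page] / len(targets)  # split rank equally
                for target in targets:
                    new_rank[target] += share
            else:
                new_rank[page] += rank[page]  # no out-links: keep own rank
        # Scaled update: shrink by s, then spread the remaining 1 - s evenly.
        rank = {p: s * new_rank[p] + (1.0 - s) / n for p in nodes}
    return rank

# Hypothetical toy web: page -> pages it links to.
pr_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(pr_web, steps=50, s=1.0)
print({page: round(value, 3) for page, value in ranks.items()})
# approaches a: 0.4, b: 0.2, c: 0.4
```

The total rank stays at 1 after every update, so the values can be read as each page's share of the overall "importance" flowing through the links.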
- Hired SearchDex
- Black hat optimization
Image source: http://blogs.cornell.edu/info2040/2011/11/03/j-c-penney%E2%80%99s-pagerank/
- NY Times + Blue Fountain
- Punishment (Feb 9, 2011)
  - 7 pm: J. C. Penney was still the
  - 9 pm: It was at No. 71.
  - Similar with other keywords
- Precedent: BMW in Germany
Google’s spam cop Matt Cutts
(Image source: NY Times)
- Citation analysis
  - Journals’ impact scores
- Fowler & Jeon’s study (2008)
  - Hubs and authorities algorithm applied to
  - Important precedent has very high
  - Public recognition comes later (authority
  - Rise and fall of authority scores