
12/10/2009 1

The Web’s Missing Links:
The Search Engine & Portal Industry

Thomas Haigh
www.tomandmaria.com/tom

The Haigh Group & University of Wisconsin, Milwaukee
thaigh@computer.org
ETH Informatik Lunch, 18 June 2007

About Me

• Dual training
  • B.Sc. & M.Eng in Computer Science (Manchester)
  • Ph.D. in History & Sociology of Science (Pennsylvania)
• Main interest in history of IT use in US business
• Published papers on the history of:
  • Management Information Systems concept
  • Early data processing
  • DBMS concept
  • Word processing
  • Packaged software industry
  • Sources for ACM history
• Chair, SHOT SIG on Computers, Information, Society
• Involved with IEEE, SIAM & ACM projects

Background of Project

• Two chapters in MIT Press edited book, “The Internet & American Business,” Aspray & Ceruzzi
  • Software infrastructure chapter – web, email, protocols
  • Search and portals (“Web navigation business”)
• Contemporary history, somewhat journalistic
  • Recounting of basic events from secondary sources
  • Focus on interplay between technology and business models

Aims

1. Situate the web with respect to other electronic publishing technologies
   • And the earlier Internet story
2. Tie together:
   • Web publishing economics
   • Web navigation economics
   • Technical choices built into web design
3. Write analytical history from journalistic sources

Social Construction of Technology

Two key concepts established since the 1980s:
• 1: Mutual shaping of technologies and society
  • Influence of social factors on technological design choices
• 2: Power of technological SYSTEMS
  • Combine users, firms, standards, technologies
  • Lock-in effects of dominant systems as “Technological Momentum”

Reconstruction of Technology

• Commercialization of Internet infrastructure
• What happens when an already “shaped” technology gets:
  • New uses
  • New “relevant social groups”
  • New cultural meanings
• Thoughts at the back of my mind
  • VHS vs. Beta, QWERTY vs. Dvorak – which is the net?


2: Narrative Overview

[Figure: Web Hosts Growth]

Timeline of Developments

1991: Web introduced at CERN
1993: Mosaic popularizes the Web
  • 130 servers to 10,000 in 18 months
1993: First web crawlers
1994: Yahoo directory service founded
1995: AltaVista, Lycos, Excite, Infoseek & OpenText index the web
1995: Netscape IPO
1996: Yahoo, Excite, Lycos & Infoseek IPOs
1998: Google, Inc. founded
1999: Search firms converge on portal model
2000: Dot-com crash signals end of easy money
2000: Google starts selling AdWords
2004: Google IPO
Today: Google dominates search, Yahoo is primary U.S. portal

Web Directories

• The Web as its own catalog
• Link directories are special-purpose websites
  • Yahoo is the most successful
• Humans visit lots of websites
  • Find the best ones on a topic
  • Add them with a topic code to a simple database
  • Directory listings are batch generated
• Basically the yellow pages of the Internet
  • Businesses pay for prominent position
  • Firms advertise to reach searchers
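The directory workflow above (editors file sites under topic codes in a simple database, then listings are generated in batch) can be sketched in a few lines. This is a hypothetical illustration, not Yahoo’s actual system: the `DirectoryEntry` record, the topic codes, the example.org URLs and the `generate_listings` function are all invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    # Hypothetical record: one human-reviewed site filed under a topic code.
    title: str
    url: str
    topic: str  # illustrative topic code, e.g. "Science/Astronomy"


def generate_listings(entries):
    """Batch step: group entries by topic and render one static
    plain-text listing per topic, sorted by title (case-insensitive)."""
    by_topic = defaultdict(list)
    for entry in entries:
        by_topic[entry.topic].append(entry)
    pages = {}
    for topic, items in by_topic.items():
        lines = [topic]
        for item in sorted(items, key=lambda e: e.title.lower()):
            lines.append(f"  {item.title} - {item.url}")
        pages[topic] = "\n".join(lines)
    return pages


db = [
    DirectoryEntry("Star Atlas", "http://atlas.example.org/", "Science/Astronomy"),
    DirectoryEntry("Particle News", "http://particles.example.org/", "Science/Physics"),
    DirectoryEntry("Comet Watch", "http://comets.example.org/", "Science/Astronomy"),
]
for page in generate_listings(db).values():
    print(page)
```

Because the listings are regenerated from the database in one pass, serving them is just a matter of handing out static pages; the human effort goes into choosing and classifying sites.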

[Figure: Yahoo, 1996]

Search Engine Model

• Crawlers index the web
  • Technology already developed for FTP sites, Gopher headings
• Keywords entered by users are looked up in the index
  • Index & search developed for online services, full-text databases like the OED
  • Hard to do well!
• How to make money?
  • Subscription model fails for Infoseek
    • Standard for online databases like LEXIS
  • Advertising supported
    • Popular keywords sold at a premium from 1995
  • Also sell tech or services to other websites
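The two halves of the search engine model, a crawler that follows links and an index in which user keywords are looked up, can be illustrated on a toy in-memory web. This is a minimal sketch under stated assumptions: the `WEB` dictionary stands in for real HTTP fetches, and `crawl`, `build_index` and `search` are hypothetical names. Engines such as AltaVista or Lycos were far more elaborate, and nothing here reproduces their code.

```python
import re
from collections import deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
# "gone.html" is linked to but does not exist, i.e. a dead link.
WEB = {
    "a.html": ("Search engines index the web", ["b.html", "c.html", "gone.html"]),
    "b.html": ("Portals add news and free email", ["a.html"]),
    "c.html": ("Directories are the yellow pages of the web", ["d.html"]),
    "d.html": ("A page nobody links back to", []),
}


def crawl(seed):
    """Breadth-first crawl from a seed URL, visiting each page once.
    Links to URLs missing from WEB are skipped."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        if url not in WEB:
            continue  # dead link: nothing to fetch
        order.append(url)
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls):
    """Inverted index: lowercase word -> set of URLs containing it."""
    index = {}
    for url in urls:
        for word in re.findall(r"[a-z]+", WEB[url][0].lower()):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return URLs containing every query keyword (Boolean AND)."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(crawl("a.html"))
print(search(index, "web"))         # ['a.html', 'c.html']
print(search(index, "free email"))  # ['b.html']
```

The lookup itself is cheap once the index exists; the hard parts the slide alludes to are crawling at web scale and ordering the many pages that match a popular keyword.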


[Figure: AltaVista, 1996]

Portals

• Internet navigation firms add content
  • Both Yahoo (directory)
  • And Excite, Lycos & other search firms
• Theory: add “stickiness” – be more like AOL
  • Good search sends users away quickly
  • Keep them around instead
    • News, Weather & Horrorscopes
    • Free email
    • Shopping “malls”
  • They watch more banner advertisements
• But, unlike AOL, portals aren’t online services

[Figure: AltaVista, 2000]

[Figure: Yahoo, 2000]

Influence of .com Boom

• Portals copy AOL with “strategic partnerships” with doomed startups
  • E.g. “Exclusive CD retailer on Yahoo”
  • Excite@Home pays $780 million for online greeting card company
• Companies valued on number of visitors
  • Institutional isomorphism – companies copying each other
  • Need rising numbers to justify valuation
• YHOO stock rises 100 times in 4 years from IPO
• Lycos (#3 portal) sold for $12.5 billion in 2000

Portals Largely Wiped Out

• Had deemphasized search
  • Full of advertising & paid results
  • Swamped by search engine spam
  • Little investment in improvements
• Crippled when easy money dries up in 2001
• By 2003 Yahoo is the only significant non-ISP portal
  • AOL and MSN retain online service portals


3: Special Features of the Web

Why Was the Web Special?

• The Web is the first functional:
  • Very large scale
  • Highly distributed (no index or catalog)
  • Hypertext
  • Electronic publishing system
• So, how was it different from other electronic publishing systems?
• And how did this influence the web navigation industry?

Web Navigation Business

• Unlike earlier electronic publishing, the web has no search or index built in
  • Makes publishing very easy, retrieving very hard
  • Hypertext seen as alternative to searching and indexing
• Unlike earlier electronic publishing systems, navigating and indexing content is a separate business from publishing content
• Creates a huge business opportunity, with 2 models:
  • Web Directory (Yahoo, Magellan)
  • Web Search (Excite, Lycos, AltaVista)

The Early Web

• Leverages existing Internet technologies
  • TCP/IP, FTP, news, Gopher, SGML, SMTP, etc.
  • New elements: HTML, HTTP, URL
• Simple design elegantly tackles immediate needs
• Fundamental problems ignored
  • Searching
  • Hyperlink issues
• Follows the cultural traditions of the Internet

Layering of Protocols

FTP client | Mail client | Web browser | Many others…
FTP (file transfer) | SMTP (mail transfer) | HTTP (Web) | Video, chat, news, P2P, instant messaging
Socket API
TCP/IP (also DNS, shared by applications)
Ethernet | SLIP/PPP | Satellite | Fiber optic, etc.

Construction of Internet Technologies (1970s-80s)

• Closed, homogeneous, small academic population
  • Result: rely on social mechanisms for security and for eliminating troublemakers
• Practical, working network
  • Rather have it next week than perfect
• Non-commercial
  • No mechanisms to bill for use of resources
• Support for many machine types
  • Compatibility through standards, not code


Construction of Internet Technologies II

• Decentralized and international
  • Easy to connect new machines, sub-domains
• Many different communication mechanisms
  • TCP/IP works over many media
• Connects computers to each other
  • Peer to peer – any machine can be client or server
• Created for experimentation and research, not one specific task
  • Separation of application protocols from network mechanisms

Berners-Lee’s Limited Resources

• Computer specialist at CERN
  • Supporting the real science…
• Web justified as useful tool for CERN
• By 1994, CERN gave 20 man-years of effort over 5 years
  • Mostly from interns and postdocs
• Initial appeal of web as integrator of existing content
  • FTP, news, Gopher, telnet
• Contrast with major electronic publishing projects – Xanadu, Time Warner, etc.
• No hypertext, information retrieval or database specialists involved
• No grants awarded
• No top management approval

Difficult Problems Ignored

1. From Hypertext Research
   • Maintaining links in a distributed system
   • State of the art: two-way, versioned, typed links
2. From Information Retrieval & Databases
   • Standards for metadata (date, author, keywords)
   • Searching distributed databases

Difficult Problems Ignored

3. From Online Services (& Xanadu)
   • Charging for microtransactions
   • Reimbursing content providers

As A Result of Problems Ignored

• Web server is very simple
  • HTTP just delivers requested file
• Web has no catalog (central or federated)
• Links decay rapidly
• There is no clear way to make money from web publishing
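To make “HTTP just delivers requested file” concrete, here is a toy request handler in the spirit of an early HTTP/1.0 server. It is a simplified sketch, not a complete or conforming implementation: the `DOCROOT` dictionary and the `handle_request` function are hypothetical, and a real server reads files from disk and handles many more cases. Note what is missing: the server keeps no catalog, knows nothing about other sites, and simply answers 404 when a linked file has gone away.

```python
# Hypothetical document root: request path -> file contents.
# index.html links to /paper.html, which has since been removed.
DOCROOT = {
    "/index.html": '<html><a href="/paper.html">My paper</a></html>',
}


def handle_request(request: str) -> str:
    """Parse a request line such as 'GET /index.html HTTP/1.0' and
    return a minimal HTTP/1.0 response as a string."""
    request_line = request.split("\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET":
        return "HTTP/1.0 400 Bad Request\r\n\r\n"
    body = DOCROOT.get(parts[1])
    if body is None:
        # No catalog and no forwarding: a removed file is just a dead link.
        return "HTTP/1.0 404 Not Found\r\n\r\n"
    return (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "\r\n" + body
    )


print(handle_request("GET /index.html HTTP/1.0\r\n\r\n").splitlines()[0])
# HTTP/1.0 200 OK
print(handle_request("GET /paper.html HTTP/1.0\r\n\r\n").splitlines()[0])
# HTTP/1.0 404 Not Found
```

The simplicity is what made servers easy to set up, and it is also why finding content and keeping links alive were left to someone else.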

The Need for Web Navigation

• Web servers are very easy to set up, so people do
  • No license, fees, or permissions needed
  • No need for specialist cataloging skills
  • Add one small service to an existing computer
• Information is very hard to find
  • Easier publishing = harder searching
• Search firms need:
  • Great algorithms
  • Big computers
  • Ph.D. specialists
  • Venture capital


4: The Triumph of Google

Google

• Seizes a neglected search market
  • Highest quality search results
  • Lowest profile advertising (from 2000)
  • Simplest user interface
• Two big innovations:
  • PageRank algorithm: priority for pages widely cited by widely cited pages
  • Pay-per-click advertising with price set by auction algorithm on keyword
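The one-line summary of PageRank (priority for pages widely cited by widely cited pages) is a recursive definition, commonly computed by power iteration: each page’s score is PR(p) = (1 − d)/N + d · Σ PR(q)/L(q), summed over pages q linking to p, where L(q) is q’s number of out-links. The sketch below uses this common normalized textbook form with the often-quoted damping factor d = 0.85; the four-page link graph and the `pagerank` function are hypothetical, and Google’s production ranking combined PageRank with many other signals.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power iteration for PageRank.

    links: dict mapping each page to the list of pages it links to.
    Assumes every link target is itself a key in `links`.
    Pages with no out-links spread their score evenly over all pages,
    so the scores always sum to 1."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for page, outs in links.items():
            targets = outs if outs else pages  # dangling page: spread evenly
            share = d * rank[page] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return rank


# Hypothetical graph: "hub" is cited by every other page,
# "a" is cited by both "hub" and "c", "b" and "c" only by "hub".
links = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page "a" outranks "b" and "c" not because it has many citations, but because one of them comes from the highly ranked "hub": being cited by widely cited pages is what counts.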

Internet Publishing Models

• No support for payment for content
  • Micropayment hyped but flops
• Web publishing model shifts fundamentally from the AOL era
  • Users resist subscription services
  • Economic foundation for web publishing comes from advertising, not readers
• Economies of scale favor big firms
  • Key argument for portals

Pay Per Click Ad Model

• First used by Overture (originally GoTo.com); Google copies
  • Traditional: $X per thousand page views
  • New: $Y per person who clicks on an ad
• Easy to add Google ads to a website
  • Revenues split with website operator
  • Selection algorithm includes several factors:
    • Site content
    • Amount bid & frequency of clicks
• Changes economics of web publishing
  • Smaller sites can cover costs, make money
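The contrast between the traditional and pay-per-click models, and the idea that ad selection weighs the amount bid against how often an ad is clicked, can be made concrete with a little arithmetic. This is an assumption-laden simplification: ranking by bid × click-through rate is a commonly described approximation of early AdWords ordering, not Google’s exact formula; the `Ad` type, function names, advertisers, prices and rates are all invented; and site-content matching and the auction’s pricing rule are left out.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str            # hypothetical advertiser name
    bid_per_click: float       # $Y the advertiser pays per click
    click_through_rate: float  # fraction of impressions that get clicked


def cpm_revenue(impressions: int, price_per_thousand: float) -> float:
    """Traditional model: $X per thousand page views; clicks don't matter."""
    return impressions / 1000 * price_per_thousand


def ppc_revenue(impressions: int, ad: Ad) -> float:
    """Pay-per-click model: money changes hands only when someone clicks."""
    return impressions * ad.click_through_rate * ad.bid_per_click


def rank_ads(ads):
    """Order ads by expected revenue per impression (bid x CTR), so a
    lower bid with a higher click rate can outrank a higher bid."""
    return sorted(ads, key=lambda a: a.bid_per_click * a.click_through_rate,
                  reverse=True)


ads = [
    Ad("BigBid Books", bid_per_click=1.00, click_through_rate=0.01),
    Ad("Relevant Records", bid_per_click=0.50, click_through_rate=0.04),
]
print([a.advertiser for a in rank_ads(ads)])
# ['Relevant Records', 'BigBid Books']
print(cpm_revenue(10_000, price_per_thousand=5.0))  # 50.0
print(ppc_revenue(10_000, ads[1]))
```

With these made-up figures the ad with half the bid but four times the click rate earns more per impression, so weighting by clicks rewards ads that searchers actually find relevant.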

Current Situation

• Google booms
  • Adds new services
    • Keeps things simple
    • Offers APIs for maps, etc.
  • Broadens ad-syndication business
• Yahoo stumbles
  • Realizes the importance of search, launches its own engine
  • So far unable to match Google’s effective ad targeting
    • Despite hyped “Panama” project

Open Questions

• How would one ideally tackle the topic?
• Is it too soon to write this history?
• Where are the users?
• Is this a new industry or a continuation of yellow pages, etc.?
• What to do with the academic side of the story?
  • Lycos: CMU
  • Yahoo, Google, Excite: Stanford
  • Open Text: Waterloo
• Relationship of Web search to enterprise document management
  • Similarities, differences?


Contact

• thaigh@computer.org
• www.tomandmaria.com/tom
• Copies of my chapters available on request
• Book appears late 2007/early 2008, MIT Press