Introduction to Information Retrieval
ΠΛΕ70: Information Retrieval
Instructor: Evaggelia Pitoura (Ευαγγελία Πιτουρά)
Lecture 10: Basic Topics in Web Search
Ch. 19
Ch. 19.4
Ch. 19.4.1
Need [Brod02, RL04]
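Informational: users want to learn about something (~40% / 65%), typically gathering information from several web pages
Navigational: users want to go to a specific web page (~25% / 15%)
(they are not generally interested in pages that merely contain the terms United Airlines)
Example queries: Low hemoglobin (informational); United Airlines (navigational)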
Transactional: users want to do something (web-mediated) (~35% / 20%)
Example queries: Seattle weather; Mars surface images; Canon S410; Car rental Brasil
And by country
Queries also follow a power-law distribution
Affects (among other things)
For navigational queries a single result may suffice; for the others (especially informational ones) we care about comprehensiveness/recall
(Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)
do this, Google claims it doesn’t)
geographic entities
Example: East Palo Alto CA → Latitude: 37.47 N, Longitude: 122.14 W
Important NLP problem
based on personal context
Contextualization / personalization is an area of search with a lot
queries
small
Ch. 19.3
Algorithmic results. Paid Search Ads
billions of additional revenue for the search engine.
time do users click on it? CTR is a measure of relevance.
willing to pay against (ii) how relevant the ad is
Second price auction: the advertiser pays the minimum amount necessary to maintain their position in the auction (plus 1 cent).
price1 × CTR1 = bid2 × CTR2 (this will result in rank1 = rank2)
price1 = bid2 × CTR2 / CTR1
p1 = bid2 × CTR2 / CTR1 = 3.00 × 0.03 / 0.06 = 1.50
p2 = bid3 × CTR3 / CTR2 = 1.00 × 0.08 / 0.03 = 2.67
p3 = bid4 × CTR4 / CTR3 = 4.00 × 0.01 / 0.08 = 0.50
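The second-price arithmetic above can be reproduced with a short Python sketch. It ranks ads by bid × CTR and charges each advertiser next_bid × next_CTR / own_CTR. The function names are hypothetical, the "plus 1 cent" increment is left out to match the worked numbers, and the top advertiser's bid of 2.00 is an assumption (the slide text does not state it; it only affects the ranking, not the prices).

```python
def rank_ads(ads):
    """Sort (bid, ctr) pairs by ad rank = bid * ctr, highest first."""
    return sorted(ads, key=lambda ad: ad[0] * ad[1], reverse=True)


def second_price(ranked_ads):
    """Price per click for every ad that has a competitor ranked below it.

    price_i = bid_{i+1} * ctr_{i+1} / ctr_i
    (the 'plus 1 cent' increment is omitted in this sketch).
    """
    prices = []
    for (_, ctr), (next_bid, next_ctr) in zip(ranked_ads, ranked_ads[1:]):
        prices.append(next_bid * next_ctr / ctr)
    return prices


# (bid, CTR) pairs; the 2.00 bid is an assumed value, the rest come from the slide.
ads = [(4.00, 0.01), (3.00, 0.03), (2.00, 0.06), (1.00, 0.08)]
ranked = rank_ads(ads)
prices = [round(p, 2) for p in second_price(ranked)]
print(ranked)
print(prices)
```

Note that the highest bidder (4.00) ends up last because of its low CTR, which is the point of weighting bids by click-through rate.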
According to http://www.cwire.org/highest-paying-search-terms/
$69.1 mesothelioma treatment options
$65.9 personal injury lawyer michigan
$62.6 student loans consolidation
$61.4 car accident attorney los angeles
$59.4 online car insurance quotes
$59.4 arizona dui lawyer
$46.4 asbestos cancer
$40.1 home equity line of credit
$39.8 life insurance quotes
$39.2 refinancing
$38.7 equity line of credit
$38.0 lasik eye surgery new york city
$37.0 2nd mortgage
$35.9 free car insurance quote
somebody clicks on an ad.
clicking on an ad.
more than you are paying Google.
bought by competitors.
trademark if the user can’t buy the product on the site.
(SEARCH ENGINE OPTIMIZATION)
Ch. 19.2.2
The top-ranked pages for the query maui resort were those containing the most occurrences of maui and resort
web page
Plain term density is not a reliable IR signal
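The point about term density can be illustrated with a small Python sketch. It scores two made-up pages for the query maui resort by raw term count and by term density, and shows that a keyword-stuffed page beats a genuine one on both. The scoring functions and both page texts are hypothetical illustrations, not how any actual search engine scored pages.

```python
from collections import Counter


def tf_score(query, doc):
    """Total number of occurrences of the query terms in the document."""
    counts = Counter(doc.lower().split())
    return sum(counts[term] for term in query.lower().split())


def density_score(query, doc):
    """Fraction of the document's tokens that are query terms."""
    tokens = doc.lower().split()
    return tf_score(query, doc) / len(tokens) if tokens else 0.0


query = "maui resort"
# Two invented pages: a genuine description and a keyword-stuffed spam page.
pages = {
    "honest": "Our family resort on Maui offers ocean views and a quiet beach",
    "stuffed": "maui resort " * 20 + "cheap pills",
}

# Rank pages by term density, highest first.
ranking = sorted(pages, key=lambda name: density_score(query, pages[name]),
                 reverse=True)
print(ranking)
```

The stuffed page repeats the query terms 40 times in 42 tokens, so any ranker that relies on counts or density alone puts it first.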
style sheet tricks, etc.
Meta-Tags = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …”
Cloaking: Is this a Search Engine spider? Yes → SPAM; No → Real Doc
results page, redirected with a fast Meta refresh command to another page.
name, designed to attract surfers who will then click on ads
page
signals)
linkage (or text)
spammers (guilt by association)
spam
classification techniques, etc.
detectors, source text analysis, etc.
Check out: Webmaster Tools (Google)
mirroring (~30%)
Ch. 19.5
defined.
indexing anchortext.
words, only relevant words, etc.)
...
A ∩ B = (1/2) × Size A
A ∩ B = (1/6) × Size B
(1/2) × Size A = (1/6) × Size B
∴ Size A / Size B = (1/6) / (1/2) = 1/3
1. Sample URLs randomly from A
2. Check if contained in B and vice versa
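The overlap-based estimate above can be sketched in Python. The sketch samples URLs from each index, measures the fraction found in the other, and divides the two fractions to get Size A / Size B. The two toy "indexes" are invented sets of URLs chosen so that half of A and one sixth of B lie in the overlap; the function names, the example.com URLs, and the assumption that uniform random sampling from an index is possible are all illustrative (sampling uniformly from an actual engine is the hard part).

```python
import random


def overlap_fraction(sample_from, other, k, rng):
    """Sample up to k URLs from one index; return the fraction found in the other."""
    pool = sorted(sample_from)
    sample = rng.sample(pool, min(k, len(pool)))
    return sum(url in other for url in sample) / len(sample)


def relative_size(index_a, index_b, k, rng):
    """Estimate Size A / Size B from the two overlap fractions."""
    frac_a_in_b = overlap_fraction(index_a, index_b, k, rng)  # ~ |A ∩ B| / |A|
    frac_b_in_a = overlap_fraction(index_b, index_a, k, rng)  # ~ |A ∩ B| / |B|
    if frac_a_in_b == 0:
        raise ValueError("no overlap observed in the sample from A")
    return frac_b_in_a / frac_a_in_b


# Toy indexes: |A| = 300, |B| = 900, |A ∩ B| = 150.
index_a = {f"http://example.com/{i}" for i in range(0, 300)}
index_b = {f"http://example.com/{i}" for i in range(150, 1050)}

# Sampling every URL gives the exact ratio; a small sample gives a noisy estimate.
exact = relative_size(index_a, index_b, k=10_000, rng=random.Random(0))
estimate = relative_size(index_a, index_b, k=100, rng=random.Random(0))
print(exact, estimate)
```

With the full indexes sampled, the fractions are 1/2 and 1/6, so the ratio is 1/3 as in the derivation; the 100-URL estimate only approximates it.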
web (as opposed to just relative sizes of indexes)
Not an English dictionary
top 100)
with 8 words conjunctive query
index modification.
advantage for originating search engine)
results, ratio average not statistically sound)
match
topographic
importing
theory
crawlable web servers (16 million total) from observing 2500 servers.
core metadata in 0.3%
requests
few pages. (under-sampling)
understood
server to avoid IP block)
distribution)
under certain metrics.
Check out http://www.worldwidewebsize.com/
The Indexed Web contains at least 3.57 billion pages (Tuesday, 20 May, 2014).
Some material was used from:
Pandu Nayak and Prabhakar Raghavan, CS276: Information Retrieval and Web Search (Stanford)
Hinrich Schütze and Christina Lioma, Stuttgart IIR class