Information Storage and Processing for Web Search Nicola Tonellotto - PowerPoint PPT Presentation

Query Processing Breakdown � Pre-process the query (e.g., tokenisation, stemming) � Lookup the statistics for each term in the lexicon � Process the postings for each query term, computing scores for documents to identify the final retrieved set � Output the retrieved set with metadata (e.g., URLs) Università La Sapienza – 18 October 2016

Document-at-a-Time (DAAT) Università La Sapienza – 18 October 2016

Dynamic Pruning •What takes time? - Number of query terms ‣ Longer queries have more terms with posting lists to process - Length of posting lists ‣ More postings takes longer times •Aim: avoid (unnecessary) scoring of posting 26 Università La Sapienza – 18 October 2016

Safeness • Safe pruning : the output ordering of the strategy is identical to the output ordering of the full processing • Safe up to rank K : the first K documents are identical to the first K documents of the full processing • Approximate : no guarantees on final ordering of document w.r.t. full processing 27 Università La Sapienza – 18 October 2016

DAAT Pruning • MaxScore (Turtle & Flood, IPM 31(6), 1995) - Early termination : does not compute scores for documents that won’t be retrieved - By comparing upper bounds with threshold - Suitable for TAAT as well • WAND (Broder et al., CIKM 2003) - Approximate evaluation : does not consider documents with approximate scores (sum of upper bounds) lower than threshold - Exploit skipping • BlockMaxWAND (Ding & Suel, SIGIR 2011) - Two levels: initially on blocks (128 postings), then on postings - Approximate evaluation : does not consider documents with approximate scores (sum of upper bounds) lower than threshold - Exploit skipping • All three use docids sorted posting lists 28 Università La Sapienza – 18 October 2016

Some (Unpublished) Results (50 M docs) Num Terms Ranking Algorithm 1 2 3 4 5 6 7 43.59 38.08 32.68 25.23 29.26 17.57 15.78 Ranked And 43.05 261.59 536.06 807.05 1,107.93 1,402.26 1,756.52 Ranked Or MaxScore 45.01 48.21 51.06 57.28 75.66 95.06 117.61 Wand 62.62 44.98 48.24 55.27 69.39 98.47 120.70 BlockMaxWand 0.71 13.11 40.69 64.62 99.95 149.94 192.65 Average Response Times (msec) Num Terms Ranking Algorithm 1 2 3 4 5 6 7 Ranked And 265.25 182.84 136.33 101.75 94.99 66.70 61.26 Ranked Or 260.74 838.04 1,296.98 1,759.49 2,209.58 2,663.02 3,130.37 245.03 189.39 174.92 175.44 215.57 253.29 313.25 MaxScore Wand 387.55 210.37 184.10 182.05 210.05 289.27 337.07 1.64 43.05 140.12 201.39 280.05 394.48 500.16 BlockMaxWand 95% Response Time (msec) 29 Università La Sapienza – 18 October 2016

Web Search Engine Learned Feature Lookup Query Processing Ranking and Calculation Function Vertical Vertical Core Document Learning to Rank Index Vertical Index Inverted Vertical Feature Technique Index Vertical Index Index Repository Index Training Metadata Feature Indexer Data Processor Processor Text Processor Document Document Collection Collection Crawling 30 Università La Sapienza – 18 October 2016

Learning to Rank is not classification relevant query, Black Box or   document not relevant

Learning to Rank is: query,   ranked list   Black Box d 1 , d 2 , d 3 , … d 17 , d 13 , d 666 , … The goal is to learn the ranking, not the label !

Learning to Rank is: Machine Learning (a.k.a. Black Magic) query,   ranked list   Black Box d 1 , d 2 , d 3 , … d 17 , d 13 , d 666 , … The goal is to learn the ranking, not the label !

Learning to Rank is: large training set of queries and ideal document ranking   q a , d 1 , d 2 , d 3 , d 5 , d 8 , d 13 , d 21 , …   q b , d 99 , d 98 , d 97 , d 96 , d 95 , d 94 , … Machine Learning (a.k.a. Black Magic) query,   ranked list   Black Box d 1 , d 2 , d 3 , … d 17 , d 13 , d 666 , … The goal is to learn the ranking, not the label !

Normalized Discounted Cumulative Gain   NDCG@K • Consider only the top-K ranked documents,   and sum up (cumulate) their contribution • The contribution (gain) of a result depends on its relevance label • Contribution is diminished (discounted) if the result is in the “bottom” positions • Normalize between 0 and 1 • rel i is the relevance label of the i-th result (e.g., 1..5) • IDCG@k is the score of the ideal ranking

Learning to Rank Approaches • Pointwise • Each query-document pair is associated with a score • The objective is to predict such score • Can be considered a regression problem • Does not consider the position of a document into the result list • Pairwise • We are given pairwise preferences, d1 is better than d2 for query q • The objective is to predict a score that preserves such preferences • Can be considered a classification problem • It partially considers the position of a document into the result list • Listwise • We are given the ideal ranking of results for each query • NB: It might not be trivial to produce such training set • Objective maximize the quality of the resulting ranked list • We need some improved approach…

Decision Tree • Tree-like structure similar to a flow chart. • Every internal node denotes a test over an attribute/feature • Outgoing edges correspond to the test possible outcomes • Every leaf node is associated to a class label (if it is a classification task) or class distribution or predicted value (if it is a regression task) • It is used to label a new data instance on the basis of its attributes • Runs tests on the data instance attributes and traverses the tree according to the tests results • Starting form the root, the data instances follows a path to a leaf • The label associated to the leaf is the prediction of the decision tree.

(Basic) Boosted Decision Trees

(Basic) Boosted Decision Trees • We want to learn a predictor incrementally: • Input: a learning sample { (x i ,y i ): i = 1, …, N }

(Basic) Boosted Decision Trees • We want to learn a predictor incrementally: • Input: a learning sample { (x i ,y i ): i = 1, …, N } • Initialize • Baseline tree predicts the average label value ŷ 0 (x) = 1/N ∑ i y i r i = y i , i = 1, …, N

(Basic) Boosted Decision Trees • We want to learn a predictor incrementally: • Input: a learning sample { (x i ,y i ): i = 1, …, N } • Initialize • Baseline tree predicts the average label value ŷ 0 (x) = 1/N ∑ i y i r i = y i , i = 1, …, N • For t = 1 to M: • Regression tree predicts the residual error • For i = 1 to N, compute the residuals r i ← r i-1 - ŷ t-1 (x i ) • Build a regression tree from the learning sample { (x i ,y i ): i = 1, …, N } • The prediction of the new regression tree is denoted with ŷ t

(Basic) Boosted Decision Trees • We want to learn a predictor incrementally: • Input: a learning sample { (x i ,y i ): i = 1, …, N } • Initialize • Baseline tree predicts the average label value ŷ 0 (x) = 1/N ∑ i y i r i = y i , i = 1, …, N • For t = 1 to M: • Regression tree predicts the residual error • For i = 1 to N, compute the residuals r i ← r i-1 - ŷ t-1 (x i ) • Build a regression tree from the learning sample { (x i ,y i ): i = 1, …, N } • The prediction of the new regression tree is denoted with ŷ t • Return the model ŷ (x)= ŷ 0 (x) + ŷ 1 (x) + … + ŷ M (x)

(Basic) Boosted Decision Trees • We want to learn a predictor incrementally: • Input: a learning sample { (x i ,y i ): i = 1, …, N } • Initialize • Baseline tree predicts the average label value ŷ 0 (x) = 1/N ∑ i y i r i = y i , i = 1, …, N • For t = 1 to M: • Regression tree predicts the residual error • For i = 1 to N, compute the residuals r i ← r i-1 - ŷ t-1 (x i ) • Build a regression tree from the learning sample { (x i ,y i ): i = 1, …, N } • The prediction of the new regression tree is denoted with ŷ t • Return the model ŷ (x)= ŷ 0 (x) + ŷ 1 (x) + … + ŷ M (x) • Function f m should be easy to be learnt: • Decision stump: trees with one node and two leaves

Documents F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Trees 0.4 -1.4 7.1 2.0 0.5 -3.1 1.5 3.2 Documents F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Trees 50.1:F 4 0.4 -1.4 7.1 2.0 0.5 -3.1 1.5 3.2 Documents F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Struct+ Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Struct+ Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Struct+ Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 2.0 0.2:F 2 3.0:F 8 Exit leaf 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Struct+ Trees 50.1:F 4 -3.0:F 3 Need to store the structure 10.1:F 1 of the tree 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 2.0 0.2:F 2 3.0:F 8 Exit leaf 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Struct+ Trees 50.1:F 4 -3.0:F 3 Need to store the structure 10.1:F 1 of the tree 0.4 -1.4 -1.0:F 3 0.1:F 6 High branch misprediction rate 7.1 2.0 2.0 0.2:F 2 3.0:F 8 Exit leaf 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Struct+ Trees 50.1:F 4 -3.0:F 3 Need to store the structure 10.1:F 1 of the tree 0.4 -1.4 -1.0:F 3 0.1:F 6 High branch misprediction rate 7.1 2.0 2.0 0.2:F 2 3.0:F 8 Exit leaf Low cache hit ratio 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

If-then-else Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

If-then-else Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 if (x[4] <= 50.1) { // recurses on the left subtree Documents # docs = >100K … F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

If-then-else Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 if (x[4] <= 50.1) { // recurses on the left subtree Documents # docs = >100K … F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 } else { # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 // recurses on the right subtree # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

If-then-else Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 if (x[4] <= 50.1) { // recurses on the left subtree Documents # docs = >100K … F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 } else { # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 // recurses on the right subtree # features = 100–1000 if(x[3] <= -3.0) 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 result = 0.4; # leaves = 4–64 else 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 result = -1.4; … }

If-then-else Trees 50.1:F 4 -3.0:F 3 Need to store the structure 10.1:F 1 of the tree 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 if (x[4] <= 50.1) { // recurses on the left subtree Documents # docs = >100K … F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 } else { # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 // recurses on the right subtree # features = 100–1000 if(x[3] <= -3.0) 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 result = 0.4; # leaves = 4–64 else 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 result = -1.4; … }

If-then-else Trees 50.1:F 4 -3.0:F 3 Need to store the structure 10.1:F 1 of the tree 0.4 -1.4 -1.0:F 3 0.1:F 6 High branch misprediction rate 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 if (x[4] <= 50.1) { // recurses on the left subtree Documents # docs = >100K … F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 } else { # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 // recurses on the right subtree # features = 100–1000 if(x[3] <= -3.0) 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 result = 0.4; # leaves = 4–64 else 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 result = -1.4; … }

If-then-else Trees 50.1:F 4 -3.0:F 3 Need to store the structure 10.1:F 1 of the tree 0.4 -1.4 -1.0:F 3 0.1:F 6 High branch misprediction rate 7.1 2.0 0.2:F 2 3.0:F 8 Low cache hit ratio 0.5 -3.1 1.5 3.2 if (x[4] <= 50.1) { // recurses on the left subtree Documents # docs = >100K … F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 } else { # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 // recurses on the right subtree # features = 100–1000 if(x[3] <= -3.0) 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 result = 0.4; # leaves = 4–64 else 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 result = -1.4; … }

Vpred Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Vpred Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents double depth4(float* x, Node* nodes) { # docs = >100K int nodeId = 0; F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; # features = 100–1000 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; # leaves = 4–64 return scores[nodeId]; 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 } …

Vpred Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents double depth4(float* x, Node* nodes) { # docs = >100K int nodeId = 0; F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; 16 docs # features = 100–1000 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 nodeId = nodes->children[x[nodes[nodeId].fid] > nodes[nodeId].theta]; # leaves = 4–64 return scores[nodeId]; 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 } …

An alternative traversing algorithm Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 7.1 2.0 0.2:F 2 3.0:F 8 0.5 -3.1 1.5 3.2 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm ? Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 ? -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 Result 1 1 1 1 1 1 1 1 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 Result 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 Result 0 0 0 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 Result 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

An alternative traversing algorithm Trees 50.1:F 4 Result 0 0 0 0 0 0 1 1 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 -3.0:F 3 10.1:F 1 0.4 -1.4 -1.0:F 3 0.1:F 6 6 7 7.1 2.0 0.2:F 2 3.0:F 8 0 3 0.5 -3.1 1.5 3.2 1 2 4 5 Documents # docs = >100K F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 # trees = 1K–20K 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 # features = 100–1000 10.9 0.08 -1.1 42.9 15 -0.3 6.74 1.65 # leaves = 4–64 11.2 0.6 -0.2 54.1 13 -0.5 7.97 3 …

Information Storage and Processing for Web Search Nicola Tonellotto - PowerPoint PPT Presentation

Information Storage and Processing for Web Search Nicola Tonellotto ISTI-CNR nicola.tonellotto@isti.cnr.it Universit La Sapienza 18 October 2016 1 Outline Some info about ISTI-CNR Introduction to Information Retrieval and Web

Web Services Web Services Towards Web Services Towards Web Services Towards Web Services A

Web CS490W: Web I nformation Search & Management Web opened the door for many important

Search Engines Issues Avi Rappoport Search Tools Consulting Search Issues Enterprise Search

Web Data Representation Web Graph, Text, Images, Metadata, Search spaces Web Search 1 The Web

> SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

Foundations of Artificial Intelligence 9. State-Space Search: Tree Search and Graph Search Malte

EE 6882 Visual Search Engine Lec. 1: Introduction tinyeye, photo copy search Web image search

Tabu Search Search Tabu Page 1 Part I Part I Tabu Search Principles Search Principles Tabu

Uninformed Search 2 Informed Search Rest of blind search An informed search strategyone

Informed search algorithms Outline Best-first search Greedy best-first search A *

Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web

Lecture 1: Semantic Web and RDF Aidan Hogan aidhog@gmail.com THE WEB The Web is now 26 years

Web Mining Web Mining Web Mining Web Mining Web mining is the use of data mining techniques

Web Mining Web Mining to automatically discover and extract information from Web

Information Retrieval Data Processing and Storage Ilya Markov i.markov@uva.nl University of

FOOD PROCESSING FOOD PROCESSING GREEN BEAN PROCESSING GREEN BEAN PROCESSING GREEN BEAN

Plane Sweep Algorithms and Segment Intersection Carola Wenk 9/1/10 CS 6463: AT Computational

STRIPS Planning Set of operators , where each operator has Set of parameters Set

This webinar is brought to you by Community Law School (Sarnia Lambton) Inc. , a nonprofit,

Welcome Back! Professor Shahrokh Valaee Associate Professor Associate Chair, Undergraduate Studies

BEPS Action 4 : Interest Deductions and Financial Payments Jacques Malherbe Prof. Em. of the

2008 2008 2008 2008 Investor Community Conference Call Risk Review Tom Flynn Executive

Subcontracting Readings: OOSCS2 Chapters 14 16 EECS3311 A & E: Software Design Fall 2020

Opinion Mining Opinion Mining Feiyu Xu DFKI, LT-Lab Xu, LT1, 2013 Outline Outline

Information Storage and Processing for Web Search Nicola Tonellotto - PowerPoint PPT Presentation

Information Storage and Processing for Web Search Nicola Tonellotto ISTI-CNR nicola.tonellotto@isti.cnr.it Universit La Sapienza 18 October 2016 1 Outline Some info about ISTI-CNR Introduction to Information Retrieval and Web

Web Services Web Services Towards Web Services Towards Web Services Towards Web Services A

Web CS490W: Web I nformation Search &amp; Management Web opened the door for many important

Search Engines Issues Avi Rappoport Search Tools Consulting Search Issues Enterprise Search

Web Data Representation Web Graph, Text, Images, Metadata, Search spaces Web Search 1 The Web

&gt; SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

Foundations of Artificial Intelligence 9. State-Space Search: Tree Search and Graph Search Malte

EE 6882 Visual Search Engine Lec. 1: Introduction tinyeye, photo copy search Web image search

Tabu Search Search Tabu Page 1 Part I Part I Tabu Search Principles Search Principles Tabu

Uninformed Search 2 Informed Search Rest of blind search An informed search strategyone

Informed search algorithms Outline Best-first search Greedy best-first search A *

Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web

Lecture 1: Semantic Web and RDF Aidan Hogan aidhog@gmail.com THE WEB The Web is now 26 years

Web Mining Web Mining Web Mining Web Mining Web mining is the use of data mining techniques

Web Mining Web Mining to automatically discover and extract information from Web

Information Retrieval Data Processing and Storage Ilya Markov i.markov@uva.nl University of

FOOD PROCESSING FOOD PROCESSING GREEN BEAN PROCESSING GREEN BEAN PROCESSING GREEN BEAN

Plane Sweep Algorithms and Segment Intersection Carola Wenk 9/1/10 CS 6463: AT Computational

STRIPS Planning Set of operators , where each operator has Set of parameters Set

This webinar is brought to you by Community Law School (Sarnia Lambton) Inc. , a nonprofit,

Welcome Back! Professor Shahrokh Valaee Associate Professor Associate Chair, Undergraduate Studies

BEPS Action 4 : Interest Deductions and Financial Payments Jacques Malherbe Prof. Em. of the

2008 2008 2008 2008 Investor Community Conference Call Risk Review Tom Flynn Executive

Subcontracting Readings: OOSCS2 Chapters 14 16 EECS3311 A &amp; E: Software Design Fall 2020

Opinion Mining Opinion Mining Feiyu Xu DFKI, LT-Lab Xu, LT1, 2013 Outline Outline

Web CS490W: Web I nformation Search & Management Web opened the door for many important

> SUN STORAGE 7000 UNIFIED STORAGE SYSTEMS ITS TIME TO CHANGE YOUR STORAGE

Subcontracting Readings: OOSCS2 Chapters 14 16 EECS3311 A & E: Software Design Fall 2020