Trees Indexes (Part II) Lecture # 08 Database Systems Andy Pavlo - - PowerPoint PPT Presentation

trees indexes
SMART_READER_LITE
LIVE PREVIEW

Trees Indexes (Part II) Lecture # 08 Database Systems Andy Pavlo - - PowerPoint PPT Presentation

Trees Indexes (Part II) Lecture # 08 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 ADM IN ISTRIVIA Project #1 is due Wednesday Sept 26 th @ 11:59pm Homework #2 is due Friday Sept 28 th


slide-1
SLIDE 1

Database Systems 15-445/15-645 Fall 2018 Andy Pavlo Computer Science Carnegie Mellon Univ.

AP AP

Lecture # 08

Trees Indexes

(Part II)

slide-2
SLIDE 2

CMU 15-445/645 (Fall 2018)

ADM IN ISTRIVIA

Project #1 is due Wednesday Sept 26th @ 11:59pm Homework #2 is due Friday Sept 28th @ 11:59pm Project #2 will be released on Wednesday Sept

  • 26th. First checkpoint is due Monday Oct 8th.

2

slide-3
SLIDE 3

CMU 15-445/645 (Fall 2018)

TO DAY'S AGEN DA

Additional Index Usage Skip Lists Radix Trees Inverted Indexes

3

slide-4
SLIDE 4

CMU 15-445/645 (Fall 2018)

IM PLICIT IN DEXES

Most DBMSs automatically create an index to enforce integrity constraints.

→ Primary Keys → Unique Constraints → Foreign Keys (?)

4

CREATE TABLE foo ( id SERIAL PRIMARY KEY, val1 INT NOT NULL, val2 VARCHAR(32) UNIQUE );

CREATE UNIQUE INDEX foo_pkey ON foo (id); CREATE UNIQUE INDEX foo_val2_key ON foo (val2);

slide-5
SLIDE 5

CMU 15-445/645 (Fall 2018)

IM PLICIT IN DEXES

Most DBMSs automatically create an index to enforce integrity constraints.

→ Primary Keys → Unique Constraints → Foreign Keys (?)

4

CREATE TABLE foo ( id SERIAL PRIMARY KEY, val1 INT NOT NULL, val2 VARCHAR(32) UNIQUE ); CREATE TABLE bar ( id INT REFERENCES foo (val1), val VARCHAR(32) );

CREATE INDEX foo_val1_key ON foo (val1);

slide-6
SLIDE 6

CMU 15-445/645 (Fall 2018)

IM PLICIT IN DEXES

Most DBMSs automatically create an index to enforce integrity constraints.

→ Primary Keys → Unique Constraints → Foreign Keys (?)

4

CREATE TABLE foo ( id SERIAL PRIMARY KEY, val1 INT NOT NULL, val2 VARCHAR(32) UNIQUE ); CREATE TABLE bar ( id INT REFERENCES foo (val1), val VARCHAR(32) );

CREATE INDEX foo_val1_key ON foo (val1);

X

slide-7
SLIDE 7

CMU 15-445/645 (Fall 2018)

IM PLICIT IN DEXES

Most DBMSs automatically create an index to enforce integrity constraints.

→ Primary Keys → Unique Constraints → Foreign Keys (?)

4

CREATE TABLE foo ( id SERIAL PRIMARY KEY, val1 INT NOT NULL UNIQUE, val2 VARCHAR(32) UNIQUE ); CREATE TABLE bar ( id INT REFERENCES foo (val1), val VARCHAR(32) );

slide-8
SLIDE 8

CMU 15-445/645 (Fall 2018)

PARTIAL IN DEXES

Create an index on a subset of the entire table. This potentially reduces its size and the amount of overhead to maintain it. One common use case is to partition indexes by date ranges.

→ Create a separate index per month, year.

5

CREATE INDEX idx_foo ON foo (a, b) WHERE c = 'WuTang';

slide-9
SLIDE 9

CMU 15-445/645 (Fall 2018)

PARTIAL IN DEXES

Create an index on a subset of the entire table. This potentially reduces its size and the amount of overhead to maintain it. One common use case is to partition indexes by date ranges.

→ Create a separate index per month, year.

5

CREATE INDEX idx_foo ON foo (a, b) WHERE c = 'WuTang'; SELECT b FROM foo WHERE a = 123 AND c = 'WuTang';

slide-10
SLIDE 10

CMU 15-445/645 (Fall 2018)

COVERIN G IN DEXES

If all of the fields needed to process the query are available in an index, then the DBMS does not need to retrieve the tuple. This reduces contention on the DBMS's buffer pool resources.

6

SELECT b FROM foo WHERE a = 123; CREATE INDEX idx_foo ON foo (a, b);

slide-11
SLIDE 11

CMU 15-445/645 (Fall 2018)

IN DEX IN CLUDE CO LUM N S

Embed additional columns in indexes to support index-only queries. Not part of the search key.

7

CREATE INDEX idx_foo ON foo (a, b) INCLUDE (c);

slide-12
SLIDE 12

CMU 15-445/645 (Fall 2018)

IN DEX IN CLUDE CO LUM N S

Embed additional columns in indexes to support index-only queries. Not part of the search key.

7

SELECT b FROM foo WHERE a = 123 AND c = 'WuTang'; CREATE INDEX idx_foo ON foo (a, b) INCLUDE (c);

slide-13
SLIDE 13

CMU 15-445/645 (Fall 2018)

IN DEX IN CLUDE CO LUM N S

Embed additional columns in indexes to support index-only queries. Not part of the search key.

7

SELECT b FROM foo WHERE a = 123 AND c = 'WuTang'; CREATE INDEX idx_foo ON foo (a, b) INCLUDE (c);

slide-14
SLIDE 14

CMU 15-445/645 (Fall 2018)

FUN CTIO N AL/ EXPRESSIO N IN DEXES

The index does not need to store keys in the same way that they appear in their base table.

8

SELECT * FROM users WHERE EXTRACT(dow ⮱FROM login) = 2;

slide-15
SLIDE 15

CMU 15-445/645 (Fall 2018)

FUN CTIO N AL/ EXPRESSIO N IN DEXES

The index does not need to store keys in the same way that they appear in their base table.

8

SELECT * FROM users WHERE EXTRACT(dow ⮱FROM login) = 2; CREATE INDEX idx_user_login ON users (login);

slide-16
SLIDE 16

CMU 15-445/645 (Fall 2018)

FUN CTIO N AL/ EXPRESSIO N IN DEXES

The index does not need to store keys in the same way that they appear in their base table. You can use expressions when declaring an index.

8

SELECT * FROM users WHERE EXTRACT(dow ⮱FROM login) = 2; CREATE INDEX idx_user_login ON users (login);

X

slide-17
SLIDE 17

CMU 15-445/645 (Fall 2018)

FUN CTIO N AL/ EXPRESSIO N IN DEXES

The index does not need to store keys in the same way that they appear in their base table. You can use expressions when declaring an index.

8

SELECT * FROM users WHERE EXTRACT(dow ⮱FROM login) = 2; CREATE INDEX idx_user_login ON users (login); CREATE INDEX idx_user_login ON users (EXTRACT(dow FROM login));

X

slide-18
SLIDE 18

CMU 15-445/645 (Fall 2018)

FUN CTIO N AL/ EXPRESSIO N IN DEXES

The index does not need to store keys in the same way that they appear in their base table. You can use expressions when declaring an index.

8

SELECT * FROM users WHERE EXTRACT(dow ⮱FROM login) = 2; CREATE INDEX idx_user_login ON users (login); CREATE INDEX idx_user_login ON users (EXTRACT(dow FROM login));

X

slide-19
SLIDE 19

CMU 15-445/645 (Fall 2018)

FUN CTIO N AL/ EXPRESSIO N IN DEXES

The index does not need to store keys in the same way that they appear in their base table. You can use expressions when declaring an index.

8

SELECT * FROM users WHERE EXTRACT(dow ⮱FROM login) = 2; CREATE INDEX idx_user_login ON users (login); CREATE INDEX idx_user_login ON users (EXTRACT(dow FROM login));

X

CREATE INDEX idx_user_login ON foo (login) WHERE EXTRACT(dow FROM login) = 2;

slide-20
SLIDE 20

CMU 15-445/645 (Fall 2018)

O BSERVATIO N

The easiest way to implement a dynamic order- preserving index is to use a sorted linked list. All operations have to linear search.

→ Average Cost: O(N)

9

slide-21
SLIDE 21

CMU 15-445/645 (Fall 2018)

K1 K2 K3 K4 K6 K5 K7

O BSERVATIO N

The easiest way to implement a dynamic order- preserving index is to use a sorted linked list. All operations have to linear search.

→ Average Cost: O(N)

9

slide-22
SLIDE 22

CMU 15-445/645 (Fall 2018)

K1 K2 K3 K4 K6 K5 K7

O BSERVATIO N

The easiest way to implement a dynamic order- preserving index is to use a sorted linked list. All operations have to linear search.

→ Average Cost: O(N)

9

slide-23
SLIDE 23

CMU 15-445/645 (Fall 2018)

SKIP LISTS

Multiple levels of linked lists with extra pointers that skip over intermediate nodes. Maintains keys in sorted order without requiring global rebalancing.

10

slide-24
SLIDE 24

CMU 15-445/645 (Fall 2018)

SKIP LISTS

A collection of lists at different levels

→ Lowest level is a sorted, singly linked list of all keys → 2nd level links every other key → 3rd level links every fourth key → In general, a level has half the keys of one below it

To insert a new key, flip a coin to decide how many levels to add the new key into. Provides approximate O(log n) search times.

11

slide-25
SLIDE 25

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: EXAM PLE

12

∞ ∞ ∞

P=N P=N/2 P=N/4

slide-26
SLIDE 26

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: EXAM PLE

12

∞ ∞ ∞

P=N P=N/2 P=N/4

slide-27
SLIDE 27

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: EXAM PLE

12

∞ ∞ ∞

P=N P=N/2 P=N/4

slide-28
SLIDE 28

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: EXAM PLE

12

∞ ∞ ∞

P=N P=N/2 P=N/4

slide-29
SLIDE 29

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: EXAM PLE

12

∞ ∞ ∞

P=N P=N/2 P=N/4

slide-30
SLIDE 30

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: EXAM PLE

12

∞ ∞ ∞

P=N P=N/2 P=N/4

slide-31
SLIDE 31

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: IN SERT

13

∞ ∞ ∞

P=N P=N/2 P=N/4

Insert K5

slide-32
SLIDE 32

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: IN SERT

13

∞ ∞ ∞

P=N P=N/2 P=N/4

Insert K5

K5 K5 K5 V5

slide-33
SLIDE 33

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: IN SERT

13

∞ ∞ ∞

P=N P=N/2 P=N/4

Insert K5

K5 K5 K5 V5

slide-34
SLIDE 34

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: IN SERT

13

∞ ∞ ∞

P=N P=N/2 P=N/4

Insert K5

K5 K5 K5 V5

slide-35
SLIDE 35

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: IN SERT

13

∞ ∞ ∞

P=N P=N/2 P=N/4

Insert K5

K5 K5 K5 V5

slide-36
SLIDE 36

CMU 15-445/645 (Fall 2018)

End Levels

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

SKIP LISTS: IN SERT

13

∞ ∞ ∞

P=N P=N/2 P=N/4

Insert K5

K5 K5 K5 V5

slide-37
SLIDE 37

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: SEARCH

14

∞ ∞ ∞

P=N P=N/2 P=N/4

Find K3

slide-38
SLIDE 38

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: SEARCH

14

∞ ∞ ∞

P=N P=N/2 P=N/4

Find K3

K3<K5

slide-39
SLIDE 39

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: SEARCH

14

∞ ∞ ∞

P=N P=N/2 P=N/4

Find K3

K3<K5 K3>K2

slide-40
SLIDE 40

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: SEARCH

14

∞ ∞ ∞

P=N P=N/2 P=N/4

Find K3

K3<K5 K3>K2 K3<K4

slide-41
SLIDE 41

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: SEARCH

14

∞ ∞ ∞

P=N P=N/2 P=N/4

Find K3

K3<K5 K3>K2 K3<K4

slide-42
SLIDE 42

CMU 15-445/645 (Fall 2018)

SKIP LISTS: DELETE

First logically remove a key from the index by setting a flag to tell threads to ignore. Then physically remove the key once we know that no other thread is holding the reference.

15

slide-43
SLIDE 43

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: DELETE

16

∞ ∞ ∞

P=N P=N/2 P=N/4

Delete K5

Del

false

Del

false

Del

false

Del

false

Del

false

Del

false

slide-44
SLIDE 44

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: DELETE

16

∞ ∞ ∞

P=N P=N/2 P=N/4

Delete K5

Del

false

Del

false

Del

false

Del

false

Del

false

Del

false

Del

true

slide-45
SLIDE 45

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: DELETE

16

∞ ∞ ∞

P=N P=N/2 P=N/4

Delete K5

Del

false

Del

false

Del

false

Del

false

Del

false

Del

false

Del

true

slide-46
SLIDE 46

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: DELETE

16

∞ ∞ ∞

P=N P=N/2 P=N/4

Delete K5

Del

false

Del

false

Del

false

Del

false

Del

false

Del

false

Del

true

slide-47
SLIDE 47

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K5 V5 K5 K5 K6 V6

Levels

SKIP LISTS: DELETE

16

∞ ∞ ∞

P=N P=N/2 P=N/4

Delete K5

Del

false

Del

false

Del

false

Del

false

Del

false

Del

false

Del

true

slide-48
SLIDE 48

CMU 15-445/645 (Fall 2018)

End

K1 V1 K2 K2 V2 K3 V3 K4 V4 K4 K6 V6

Levels

SKIP LISTS: DELETE

16

∞ ∞ ∞

P=N P=N/2 P=N/4

Delete K5

Del

false

Del

false

Del

false

Del

false

Del

false

slide-49
SLIDE 49

CMU 15-445/645 (Fall 2018)

SKIP LISTS

Advantages:

→ Uses less memory than a typical B+Tree if you don’t include reverse pointers. → Insertions and deletions do not require rebalancing.

Disadvantages:

→ Not disk/cache friendly because they do not optimize locality of references. → Reverse search is non-trivial.

17

slide-50
SLIDE 50

CMU 15-445/645 (Fall 2018)

RADIX TREE

Represent keys as individual digits. This allows threads to examine prefixes one-by-one instead of comparing entire key.

→ The height of the tree depends on the length of keys. → Does not require rebalancing → The path to a leaf node represents the key of the leaf → Keys are stored implicitly and can be reconstructed from paths.

18

slide-51
SLIDE 51

CMU 15-445/645 (Fall 2018)

TRIE VS. RADIX TREE

19

Keys: HELLO, HAT, HAVE

Trie

E H L

¤

L O A

¤ T ¤

V E

slide-52
SLIDE 52

CMU 15-445/645 (Fall 2018)

TRIE VS. RADIX TREE

19

Keys: HELLO, HAT, HAVE

Trie

E H L

¤

L O A

¤ T ¤

V E

slide-53
SLIDE 53

CMU 15-445/645 (Fall 2018)

TRIE VS. RADIX TREE

19

Keys: HELLO, HAT, HAVE

Trie

E H L

¤

L O A

¤ T ¤

V E

slide-54
SLIDE 54

CMU 15-445/645 (Fall 2018)

TRIE VS. RADIX TREE

19

Keys: HELLO, HAT, HAVE

Trie

E H L

¤

L O A

¤ T ¤

V E

Radix Tree

ELLO H

¤

A

¤ T ¤

VE

slide-55
SLIDE 55

CMU 15-445/645 (Fall 2018)

TRIE VS. RADIX TREE

19

Keys: HELLO, HAT, HAVE

Trie

E H L

¤

L O A

¤ T ¤

V E

Radix Tree

ELLO H

¤

A

¤ T ¤

VE

slide-56
SLIDE 56

CMU 15-445/645 (Fall 2018)

RADIX TREE: M O DIFICATIO N S

20

¤

ELLO

¤ ¤

T VE H A

slide-57
SLIDE 57

CMU 15-445/645 (Fall 2018)

RADIX TREE: M O DIFICATIO N S

20

¤

ELLO

¤ ¤

T VE H A

Insert HAIR

slide-58
SLIDE 58

CMU 15-445/645 (Fall 2018)

RADIX TREE: M O DIFICATIO N S

20

¤

ELLO

¤ ¤

T VE H A

¤

IR

Insert HAIR

slide-59
SLIDE 59

CMU 15-445/645 (Fall 2018)

RADIX TREE: M O DIFICATIO N S

20

¤

ELLO

¤ ¤

T VE H A

¤

IR

Insert HAIR Delete HAT, HAVE

slide-60
SLIDE 60

CMU 15-445/645 (Fall 2018)

RADIX TREE: M O DIFICATIO N S

20

¤

ELLO H A

¤

IR

Insert HAIR Delete HAT, HAVE

slide-61
SLIDE 61

CMU 15-445/645 (Fall 2018)

RADIX TREE: M O DIFICATIO N S

20

¤

ELLO H A

¤

IR

Insert HAIR Delete HAT, HAVE

slide-62
SLIDE 62

CMU 15-445/645 (Fall 2018)

RADIX TREE: M O DIFICATIO N S

20

¤

ELLO H A

Insert HAIR Delete HAT, HAVE

AIR

¤

slide-63
SLIDE 63

CMU 15-445/645 (Fall 2018)

RADIX TREE: BIN ARY CO M PARABLE KEYS

Not all attribute types can be decomposed into binary comparable digits for a radix tree.

→ Unsigned Integers: Byte order must be flipped for little endian machines. → Signed Integers: Flip two’s-complement so that negative numbers are smaller than positive. → Floats: Classify into group (neg vs. pos, normalized vs. denormalized), then store as unsigned integer. → Compound: Transform each attribute separately.

21

slide-64
SLIDE 64

CMU 15-445/645 (Fall 2018)

RADIX TREE: BIN ARY CO M PARABLE KEYS

22

Hex Key: 0A 0B 0C 0D Int Key: 168496141

0A 0B 0C 0D

Big Endian

0D 0C 0B 0A

Little Endian

slide-65
SLIDE 65

CMU 15-445/645 (Fall 2018)

RADIX TREE: BIN ARY CO M PARABLE KEYS

22

Hex Key: 0A 0B 0C 0D Int Key: 168496141

0A 0B 0C 0D

Big Endian

0D 0C 0B 0A

Little Endian

0F0F0F 0A

¤

0B

¤

0B0F

¤

OF0F

¤ ¤ ¤

0C 0F 0D

slide-66
SLIDE 66

CMU 15-445/645 (Fall 2018)

RADIX TREE: BIN ARY CO M PARABLE KEYS

22

Hex Key: 0A 0B 0C 0D Int Key: 168496141

0A 0B 0C 0D

Big Endian

0D 0C 0B 0A

Little Endian

0F0F0F 0A

¤

0B

¤

0B0F

¤

OF0F

¤ ¤ ¤

0C 0F 0D

slide-67
SLIDE 67

CMU 15-445/645 (Fall 2018)

IN- M EM O RY TABLE IN DEXES

23

9.94 1 5.5 1 3.3 8.09 29 25.1 2.51 2.78 1 .51 44.9 51 .5 42.9 20 40 60 Insert-Only Read-Only Read/Update Operations/sec (M)

Open Bw-Tree B+Tree Skip List Radix

Processor: 1 socket, 1 0 cores w/ 2×HT Workload: 50m Random Integer Keys (64-bit)

Source: Ziqi Wang

slide-68
SLIDE 68

CMU 15-445/645 (Fall 2018)

O BSERVATIO N

The tree indexes that we've discussed so far are useful for "point" and "range" queries:

→ Find all customers in the 15217 zip code. → Find all orders between June 2018 and September 2018.

They are not good at keyword searches:

→ Find all Wikipedia articles that contain the word "Pavlo"

24

slide-69
SLIDE 69

CMU 15-445/645 (Fall 2018)

WIKIPEDIA EXAM PLE

25

CREATE TABLE revisions ( revID INT PRIMARY KEY, userID INT REFERENCES useracct (userID), pageID INT REFERENCES pages (pageID), content TEXT, updated DATETIME ); CREATE TABLE pages ( pageID INT PRIMARY KEY, title VARCHAR UNIQUE, latest INT ⮱REFERENCES revisions (revID), ); CREATE TABLE useracct ( userID INT PRIMARY KEY, userName VARCHAR UNIQUE, ⋮ );

slide-70
SLIDE 70

CMU 15-445/645 (Fall 2018)

WIKIPEDIA EXAM PLE

If we create an index on the content attribute, what does that actually do? This doesn't help our query. Our SQL is also not correct...

26

CREATE INDEX idx_rev_cntnt ON revisions (content); SELECT pageID FROM revisions WHERE content LIKE '%Pavlo%';

slide-71
SLIDE 71

CMU 15-445/645 (Fall 2018)

IN VERTED IN DEX

An inverted index stores a mapping of words to records that contain those words in the target attribute.

→ Sometimes called a full-text search index. → Also called a concordance in old (like really old) times.

The major DBMSs support these natively. There are also specialized DBMSs.

27

slide-72
SLIDE 72

CMU 15-445/645 (Fall 2018)

Q UERY TYPES

Phrase Searches

→ Find records that contain a list of words in the given

  • rder.

Proximity Searches

→ Find records where two words occur within n words of each other.

Wildcard Searches

→ Find records that contain words that match some pattern (e.g., regular expression).

28

slide-73
SLIDE 73

CMU 15-445/645 (Fall 2018)

DESIGN DECISIO N S

Decision #1: What To Store

→ The index needs to store at least the words contained in each record (separated by punctuation characters). → Can also store frequency, position, and other meta-data.

Decision #2: When To Update

→ Maintain auxiliary data structures to "stage" updates and then update the index in batches.

29

slide-74
SLIDE 74

CMU 15-445/645 (Fall 2018)

CO N CLUSIO N

B+Trees are still the way to go for tree indexes. Inverted indexes are covered in CMU 11-442. We did not discuss geo-spatial tree indexes:

→ Examples: R-Tree, Quad-Tree, KD-Tree → This is covered in CMU 15-826.

31

slide-75
SLIDE 75

CMU 15-445/645 (Fall 2018)

N EXT CLASS

How to make indexes thread-safe!

32