Median Finding 1. Testing iroot 2. Analyze backboneSimilar 3. - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Median Finding

  • 1. Testing iroot
  • 2. Analyze backboneSimilar
  • 3. Median finding
slide-2
SLIDE 2

Testing iroot

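  • On the interval 1, 2, 3, 4, 5, suppose the function values for some procedure f are 7, -2, -8, 5, -3.
  • What should ??? be in checkExpect(iroot(1, 4, f), ???)? No single value is right: f changes sign between 1 and 2, between 3 and 4, and between 4 and 5.
  • Test the property the result must satisfy instead:

let k = iroot(1, 4, f); checkExpect(f(k)*f(k+1) <= 0, true);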
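The slides do not show how iroot is defined, so the following Python sketch assumes a contract for it: iroot(lo, hi, f) returns an integer k with lo <= k <= hi and f(k)*f(k+1) <= 0, provided f(lo) and f(hi+1) do not have the same sign. The bisection strategy and the VALUES table are assumptions made for illustration only.

```python
# Function values from the slide, stored as a lookup table (illustrative).
VALUES = {1: 7, 2: -2, 3: -8, 4: 5, 5: -3}

def f(x):
    return VALUES[x]

def iroot(lo, hi, f):
    """Assumed contract: return k in lo..hi with f(k) * f(k + 1) <= 0."""
    left, right = lo, hi + 1
    if f(left) * f(right) > 0:
        raise ValueError("f(lo) and f(hi + 1) must not have the same sign")
    # Invariant: f(left) * f(right) <= 0, so a sign change lies between them.
    while right - left > 1:
        mid = (left + right) // 2
        if f(left) * f(mid) <= 0:
            right = mid
        else:
            left = mid
    return left

k = iroot(1, 4, f)
# Property test: check what must be true of k, not one particular value.
print(f(k) * f(k + 1) <= 0)
```

The test checks only the sign-change property, so it stays valid whichever of the qualifying positions a given implementation happens to return.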

slide-3
SLIDE 3

Analyze backboneSimilar

  • Let B(n) be the max number of operations involved in evaluating backboneSimilar(t1, t2), where n is the number of nodes/leaves in the larger of t1 and t2.

slide-4
SLIDE 4

Analyze backboneSimilar

  • We're going to apply backboneSimilar to the left and right subtrees. If the left subtree has k items, the right has n-k-1 (the -1 for the item at the current node!)

  • But we don't know what k is
  • Could be any number from 1 to n-1
slide-5
SLIDE 5

Analyze backboneSimilar

How much work are we really doing? How often do we "visit" each node of t1? At most once, right? And all we do is test whether it has children or not! So the total work seems to be at most $cn$, at least as long as $a \le c$; if not, we can increase c to make it at least as big as a.
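The slides do not include the code for backboneSimilar. As a rough Python sketch, assume it checks whether two binary trees have the same shape while ignoring the stored values, with a tree represented as None (empty) or a tuple (left, value, right). Both the representation and the behavior are assumptions for illustration.

```python
def backbone_similar(t1, t2):
    """Return True when t1 and t2 have the same shape (values are ignored).

    Assumed representation: a tree is None or a tuple (left, value, right).
    """
    if t1 is None or t2 is None:
        # Constant work per visit: only test whether each side is empty.
        return t1 is None and t2 is None
    left1, _, right1 = t1
    left2, _, right2 = t2
    # One recursive call per subtree, so each node is visited at most once.
    return backbone_similar(left1, left2) and backbone_similar(right1, right2)

same_a = ((None, 1, None), 2, None)
same_b = ((None, 9, None), 7, None)
mirror = (None, 2, (None, 1, None))
print(backbone_similar(same_a, same_b), backbone_similar(same_a, mirror))
```

Under these assumptions the call does a constant amount of work at the current node and then recurs on the two subtrees, which is the pattern the recurrence for B(n) describes.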

slide-6
SLIDE 6

Usual well-ordering proof

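  • Suppose that $B(1) = a$, $B(n) \le c + \max_k \big(B(k) + B(n - k - 1)\big)$, and $a \le c$. Then I claim that (*) $B(n) \le cn$ for $n = 1, 2, \dots$
  • Let $S$ be the set of all natural numbers for which (*) is false. Observe that 1 is not in $S$.
  • Suppose $S$ is nonempty, and we'll arrive at a contradiction. Let $h$ be the least element of $S$ (well-ordering). Then (*) holds for $n = 1, \dots, h-1$.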

slide-7
SLIDE 7

Usual well-ordering proof

  • Suppose that $B(1) = a$, $B(n) \le c + \max_k \big(B(k) + B(n - k - 1)\big)$, and $a \le c$. Claim (*): $B(n) \le cn$ for $n = 1, 2, \dots$
  • Let $S$ be the set of all natural numbers for which (*) is false. Let $h$ be the least element of $S$ (well-ordering). Then (*) holds for $n = 1, \dots, h-1$. What's $B(h)$? Well,

$B(h) \le c + \max_k \big(B(k) + B(h - k - 1)\big)$

$\le c + \max_k \big(ck + c(h - k - 1)\big)$

$= c + \max_k \big(ck + c(h - 1) - ck\big)$

$= c + \max_k \big(c(h - 1)\big) = c + c(h - 1) = ch.$

Contradiction!

slide-8
SLIDE 8

Median-finding

slide-9
SLIDE 9

Warmup: ceilings

For , we have , hence for . For , we have

  • and
  • Reason: apply previous result to
  • and
  • .

For , we have

  • Reason: apply previous result to
  • to get

=

  • .

Then apply part 2 to to get

  • .
slide-10
SLIDE 10
  • For

, we have

  • For

, we have

  • For

, we have

  • For

, we have

slide-11
SLIDE 11

Last facts about ceilings

  • For

, we have In particular

  • For

, we have

slide-12
SLIDE 12

A problem

  • Find the (upper) median of a list of n items.
  • Upper median means "if the list has an even number of items, pick the one that's $n/2 + 1$ from the bottom, rather than $n/2$ from the bottom"

  • Obvious solution: sort, then pick the middle item.
  • Seems like more work than is needed.
  • Generalize ('strengthen the recursion'): SELECT($k$, $S$): find the $k$th smallest in a set $S$ of $n$ items.
  • Illustrate with sets of numbers, ordered smallest to largest
  • MEDIAN($S$) is now just SELECT($\lfloor n/2 \rfloor + 1$, $S$).
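The obvious sort-based solution can be sketched in Python as a baseline. The language and the name upper_median are choices made for this sketch and are not specified by the slides.

```python
def upper_median(items):
    """Return the upper median: the item at 0-based index n // 2 after sorting."""
    if not items:
        raise ValueError("upper_median needs at least one item")
    ordered = sorted(items)  # O(n log n): the extra work SELECT tries to avoid
    return ordered[len(ordered) // 2]

print(upper_median([5, 7, 13, 21]))  # even length: third of the four items
print(upper_median([4, 1, 9]))       # odd length: the middle item
```

For an even number of items, index n // 2 (counting from zero) is position n/2 + 1 counting from one, which is the upper rather than the lower median.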
slide-13
SLIDE 13

A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1973)

  • Input: a nonempty set $S$ of $n$ numbers, and an index $k$, $1 \le k \le n$
  • Output: The $k$th smallest of the numbers.

1. If $n = 1$ (one item set), return that item.
2. Divide input into $\lfloor n/5 \rfloor$ groups of five, and at most one group of the remaining $n \bmod 5$ items.
3. Find the (upper) medians of each of these $\lceil n/5 \rceil$ groups.
4. Find the median $x$ of these $\lceil n/5 \rceil$ medians (recursively).
5. Partition the input around this median. Let $j$ be the number of elements on the low side.
  • Low side: all items less than or equal to $x$. High side: items greater than $x$.
6. If $k \le j$, find the $k$th smallest item on the low side; otherwise find the $(k - j)$th smallest item in the high side (recursively).

slide-14
SLIDE 14

Group into 5s; median of medians; partition; recur on appropriate piece

Input: 1 5 2 9 8 3 7 4 11 22 27 14 6 21 31 13 12; find 14th-smallest item.

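Groups of five (one per line) with their (upper) medians:

1 5 2 9 8 (median 5)
3 7 4 11 22 (median 7)
27 14 6 21 31 (median 21)
13 12 (median 13)

Medians, sorted: 5 7 13 21. Median of medians: 13.

Partitioned around 13: 1 5 2 9 8 3 7 4 11 6 12 13 | 22 27 14 21 31

12 items less than or equal to median of medians; want 14th item. So SELECT(2, upper group), recursively.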
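The group, median-of-medians, partition, and recur steps can be sketched in Python as follows. This is an illustration rather than the presenter's code, and it departs from the slide's wording in two labeled ways: inputs of five or fewer items are sorted directly, and the pivot is kept out of both sides so every recursive call is on a strictly smaller list. The names select, upper_median, and pivot are chosen for this sketch.

```python
def upper_median(items):
    """Upper median of a short list, found by sorting it."""
    ordered = sorted(items)
    return ordered[len(ordered) // 2]

def select(k, items):
    """Return the k-th smallest item (k counts from 1)."""
    items = list(items)
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    if n <= 5:
        # Assumption of this sketch: tiny inputs are handled by sorting.
        return sorted(items)[k - 1]
    # Groups of five, plus at most one shorter group of leftovers.
    groups = [items[i:i + 5] for i in range(0, n, 5)]
    medians = [upper_median(group) for group in groups]
    # Upper median of the medians, found recursively.
    pivot = select(len(medians) // 2 + 1, medians)
    low = [v for v in items if v < pivot]
    high = [v for v in items if v > pivot]
    at_most_pivot = n - len(high)  # items <= pivot: the slide's "low side"
    if k <= len(low):
        return select(k, low)
    if k <= at_most_pivot:
        return pivot  # the k-th smallest is the pivot (or a duplicate of it)
    return select(k - at_most_pivot, high)

DATA = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
print(select(14, DATA))
```

On the slide's input the pivot is 13, twelve items are at most 13, and the call continues as select(2, ...) on the five larger items, matching the worked example.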

slide-15
SLIDE 15

A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan)

  • Input: a set $S$ of $n$ numbers, and an index $k$, $1 \le k \le n$
  • Output: The $k$th smallest of the numbers.

1. Group into 5s; find medians of each (at a cost of $a$ for each); find median of medians
2. Partition around median of medians. Recur.

  • Fictitious, experimental analysis.

Suppose that each "part" was no larger than ¾ of input. Then we'd have Replace

  • with similar a
  • and combine with previous term (

):

slide-16
SLIDE 16
  • "

; I then claim this looks consistent with for all For ignoring the "ceilings" for a moment, we'd then get
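As a sketch of how this fictitious analysis could go (a reconstruction under stated assumptions, not necessarily the slide's own algebra): write $T(n)$ for the worst-case cost, fold all the linear work into one term $cn$, ignore ceilings, and assume the piece we recur on has at most $3n/4$ items. Guessing $T(m) \le sm$ for all smaller $m$ gives

```latex
\begin{align*}
T(n) &\le cn + T(n/5) + T(3n/4) \\
     &\le cn + \tfrac{1}{5}sn + \tfrac{3}{4}sn \\
     &= cn + \tfrac{19}{20}sn \\
     &\le sn \quad \text{whenever } s \ge 20c .
\end{align*}
```

The fractions $1/5$ and $3/4$ sum to $19/20$, so the guess survives with only $1/20$ of slack to absorb the linear term.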

slide-17
SLIDE 17

In practice, it's a little messier than this.

  • Warning: Some of the following steps look like magic.
  • Carefully crafted to make the algebra as simple as possible.
  • Recall from warmup: For

, we have

  • Critical step: show that in the recursive call, the partition piece we recur on is not too big.
  • ¾ was almost too large
  • We'll show it's more like 70%, but with a slight adjustment.
slide-18
SLIDE 18

Claim: after partitioning, each "pile" has at least $3n/10$ numbers in it (almost)

MEDIANS

slide-19
SLIDE 19

Claim: after partitioning, each "pile" has at least $3n/10$ numbers in it (almost)

MEDIANS, SORTED Median of medians

slide-20
SLIDE 20

Claim: after partitioning, each "pile" has at least $3n/10$ numbers in it (almost)

Figure labels: MEDIANS, SORTED; median of medians, $x$; values less than $x$; values greater than $x$

  • "Greater than" pile is no larger than the other
  • Contains at least half of the $\lceil n/5 \rceil$ medians
  • All but two columns (first and last) have 3 elts greater than $x$
  • At least $3(\tfrac{1}{2}\lceil n/5 \rceil - 2) \ge \tfrac{3n}{10} - 6$ elts in "greater than" pile
  • At most $\tfrac{7n}{10} + 6$ elts in "≤" pile (or greater-than pile)
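The pile-size bound can be spot-checked numerically. The Python sketch below is an illustration with made-up random inputs (the helper names and the seed are arbitrary): it groups a list into fives, takes the upper median of the group medians as the pivot, and compares both pile sizes with 7n/10 + 6.

```python
import random

def upper_median(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

def pile_sizes(items):
    """Return (low, high): counts of items <= pivot and items > pivot."""
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    # Sorting the medians is fine here: only the pivot's value matters.
    pivot = upper_median([upper_median(group) for group in groups])
    low = sum(1 for v in items if v <= pivot)
    return low, len(items) - low

rng = random.Random(17)  # fixed seed so the run is repeatable
for n in (17, 50, 161, 1000):
    items = rng.sample(range(10 * n), n)  # n distinct numbers
    low, high = pile_sizes(items)
    print(n, low, high, max(low, high) <= 7 * n / 10 + 6)
```

A passing run is only evidence, not a proof; the counting argument in the bullets above is what establishes the bound for every input.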
slide-21
SLIDE 21

Recurrence

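  • Let $T(n)$ be the max number of operations involved in "Select" on any input of size $n$.
  • Group into fives: $cn$
  • Find medians of each group: $a\lceil n/5 \rceil$
  • Median of medians: $T(\lceil n/5 \rceil)$
  • Partition around median element: $bn$ (combine with the grouping cost: replace $c$ by $c + b$)
  • Recur on appropriate piece:
  • At most $\tfrac{7n}{10} + 6$ elts in pile
  • Operation count: $\le T(\lceil \tfrac{7n}{10} + 6 \rceil)$
  • Total: $T(n) \le cn + a\lceil n/5 \rceil + T(\lceil n/5 \rceil) + T(\lceil \tfrac{7n}{10} + 6 \rceil)$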

slide-22
SLIDE 22

Algebra

  • +

+

  • +

+

  • +

+

  • +

+

  • (Note:
  • )
  • Since n is at least 1, we can write
  • +

+

  • +

+

  • (Note:
  • )
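One way the constant-absorbing step can go, assuming the recurrence has a linear term $cn$ plus a per-group cost $a\lceil n/5 \rceil$ (the slide's own intermediate lines may differ): since $\lceil y \rceil \le y + 1$ and $n \ge 1$,

```latex
\begin{align*}
a\lceil n/5 \rceil &\le a\left(\tfrac{n}{5} + 1\right)
                    \le a\left(\tfrac{n}{5} + n\right)
                    = \tfrac{6a}{5}\,n, \\
cn + a\lceil n/5 \rceil &\le \left(c + \tfrac{6a}{5}\right) n .
\end{align*}
```

Renaming $c + 6a/5$ as $c$ then leaves a recurrence with a single linear term plus the two recursive terms.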
slide-23
SLIDE 23

Summary

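(replacing the combined constant with $c$)

  • For inputs large enough to recurse, $T(n) \le cn + T(\lceil n/5 \rceil) + T(\lceil \tfrac{7n}{10} + 6 \rceil)$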

slide-24
SLIDE 24

Algebraic Cleverness

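For inputs large enough to recurse, $T(n) \le cn + T(\lceil n/5 \rceil) + T(\lceil \tfrac{7n}{10} + 6 \rceil)$

  • 1. For $n \le 160$, compute $T(n)$ explicitly, and pick a number $s$ with $T(n) \le sn$ for $n$ in this range.
  • Let $s \ge \max(T(1)/1, \dots, T(160)/160)$, for instance!
  • 2. Pick $s \ge 20c$ as well. (!)
  • Because $s \ge 20c$ we have $c \le s/20$. Also: raising $s$ keeps item 1 true.
  • 3. I claim that for all $n$, $T(n) \le sn$.
  • 4. For $n \le 160$ we have $T(n) \le sn$. (Item 1)
  • 5. Still need to handle the case $n > 160$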
  • 6. Why 160? Because it's large enough to make the argument work!
slide-25
SLIDE 25

Claim: $T(n) \le sn$ for all $n$

  • Suppose it's false for some minimum value $k$, but true for all smaller $n$.
  • Then $k > 160$ (because we already showed it true for 160 and less). Hence $k/20 > 8$ (used later).
  • $T(k) \le ck + T(\lceil k/5 \rceil) + T(\lceil 7k/10 + 6 \rceil)$
  • $\le ck + s\lceil k/5 \rceil + s\lceil 7k/10 + 6 \rceil$
  • $\le ck + s(k/5 + 1) + s(7k/10 + 6 + 1)$
  • $= ck + sk/5 + s + 7sk/10 + 7s$
  • $= ck + 9sk/10 + 8s$
  • $\le (s/20)k + 9sk/10 + 8s$
  • $= 19sk/20 + 8s$
  • $= sk + 8s - sk/20$
  • $= sk + s(8 - k/20)$
  • [By note above, $k/20 > 8$, so $0 > 8 - k/20$]
  • $\le sk$

Contradiction! Hence claim is true for all $n$.
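The conclusion can also be checked numerically. The Python sketch below uses hypothetical constants: C is the per-item cost of the linear work, S = 20 * C is the constant in the bound, and sizes up to 160 are given an assumed base-case cost of C * n purely for illustration. It evaluates the worst case the recurrence allows and tests it against S * n.

```python
from functools import lru_cache

C = 1        # hypothetical cost per item for grouping and partitioning
BASE = 160   # sizes up to here are treated as the explicit base case
S = 20 * C   # candidate constant; it also covers the assumed base case

@lru_cache(maxsize=None)
def worst_case(n):
    """Largest cost the recurrence allows for an input of size n."""
    if n <= BASE:
        return C * n  # assumed base-case cost, chosen only for this sketch
    fifth = (n + 4) // 5          # ceil(n / 5)
    pile = (7 * n + 9) // 10 + 6  # ceil(7n / 10 + 6)
    return C * n + worst_case(fifth) + worst_case(pile)

def bound_holds(limit):
    """Check worst_case(n) <= S * n for every n from 1 to limit."""
    return all(worst_case(n) <= S * n for n in range(1, limit + 1))

print(bound_holds(5000))
```

Changing the base-case costs would change the smallest workable S, but the proof's structure stays the same: S must cover the base cases and be at least 20 * C.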

slide-26
SLIDE 26

Why piles of five?

  • If you try more or fewer, the sum of $1/5$ and $7/10$ ends up changing to something…a bit larger than 1 instead of a bit less than 1

  • So 5 is a "sweet spot" for this algorithm!
slide-27
SLIDE 27

Surprising simpler algorithm

  • RandSelect(k, S)
  • Pick a random item $x$ in your set, S
  • Partition into set $L$ of numbers less than $x$, and set $H$ of those greater than $x$
  • If $L$ has at least k items: RandSelect(k, $L$)
  • If $L$ has k-1 items: return $x$
  • Otherwise, RandSelect($k - |L| - 1$, $H$)

  • Works in "expected linear time" because on average, the size of the larger partition is ¾ size of the set. Work is (roughly) $n + \tfrac{3}{4}n + (\tfrac{3}{4})^2 n + \dots \le 4n$.
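A minimal Python sketch of the randomized version follows. The names rand_select, lower, and higher, and the optional rng parameter, are choices made for this sketch. The slides assume a set; the duplicate handling here is an added assumption so that lists with repeated values also work.

```python
import random

def rand_select(k, items, rng=random):
    """Return the k-th smallest item (k counts from 1) using a random pivot."""
    items = list(items)
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    pivot = rng.choice(items)
    lower = [v for v in items if v < pivot]
    higher = [v for v in items if v > pivot]
    if len(lower) >= k:
        return rand_select(k, lower, rng)
    # Copies of the pivot fill positions len(lower) + 1 .. len(items) - len(higher).
    at_most_pivot = len(items) - len(higher)
    if k <= at_most_pivot:
        return pivot
    return rand_select(k - at_most_pivot, higher, rng)

DATA = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
print(rand_select(14, DATA, random.Random(0)))
```

Passing a seeded random.Random makes a run repeatable. The returned value does not depend on the random choices; only the running time does.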
slide-28
SLIDE 28

Big idea! (More in CS18)

  • Randomized algorithms are often simpler than deterministic ones
  • Deep philosophical question: why does adding a stream of randomness make tasks easier?