
Median Finding
1. Testing iroot
2. Analyze backboneSimilar
3. Median finding

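Testing iroot
• On the interval 1, 2, 3, 4, 5, suppose the function values for some procedure f are
• 7, -2, -8, 5, -3
• checkExpect(iroot(1, 4, f), ???)? More than one neighbouring pair changes sign (for example f(1)·f(2) < 0 and f(3)·f(4) < 0), so more than one k is a legitimate answer, and the test should not depend on which one a particular implementation returns.
• Instead, test the property that every correct answer must satisfy:
  let k = iroot(1, 4, f);
  checkExpect(f(k)*f(k+1) <= 0, true);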
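To see why the property test is the safer one, here is a small Python sketch. The real contract of iroot is not given on the slide; the two variants below are hypothetical stand-ins that assume `iroot(lo, hi, f)` returns some k in [lo, hi) with f(k)·f(k+1) ≤ 0. They legitimately return different answers, and the property test accepts both, whereas an exact-value test such as `checkExpect(iroot(1, 4, f), 1)` would reject one of them.

```python
# Function values from the slide: f(1..5) = 7, -2, -8, 5, -3.
VALUES = {1: 7, 2: -2, 3: -8, 4: 5, 5: -3}


def f(x: int) -> int:
    return VALUES[x]


def iroot_leftmost(lo: int, hi: int, g) -> int:
    """Hypothetical iroot: first k in [lo, hi) with g(k) * g(k + 1) <= 0."""
    for k in range(lo, hi):
        if g(k) * g(k + 1) <= 0:
            return k
    raise ValueError("no sign change in the interval")


def iroot_rightmost(lo: int, hi: int, g) -> int:
    """Another hypothetical iroot: same contract, but scans from the right."""
    for k in range(hi - 1, lo - 1, -1):
        if g(k) * g(k + 1) <= 0:
            return k
    raise ValueError("no sign change in the interval")


def property_test(iroot) -> bool:
    """The slide's test: whatever k comes back, f(k) and f(k+1) must not share a sign."""
    k = iroot(1, 4, f)
    return f(k) * f(k + 1) <= 0


for impl in (iroot_leftmost, iroot_rightmost):
    print(impl.__name__, impl(1, 4, f), property_test(impl))
```

The leftmost variant returns 1 and the rightmost returns 3; both pass the property test.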

Analyze backboneSimilar
• Let B(n) be the max number of operations involved in evaluating backboneSimilar(t1, t2), where n is the number of nodes/leaves in the larger of t1 and t2.

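Analyze backboneSimilar
• We're going to apply backboneSimilar to the left and right subtrees. If the left subtree has $k$ items, the right has $n - k - 1$ (the $-1$ is for the item at the current node!)
• But we don't know what $k$ is
• It could be any number from 1 to $n - 1$
• So $B(n) \le c + \max_{1 \le k \le n-1} \big( B(k) + B(n-k-1) \big)$, where $c$ covers the constant work done at the current node.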

Analyze backboneSimilar
• $B(n) \le c + \max_{1 \le k \le n-1} \big( B(k) + B(n-k-1) \big)$, with $B(1) = a$
• How much work are we really doing? How often do we "visit" each node of t1? At most once, right? And all we do is test whether it has children or not!
• So it seems that the total work should be at most $cn$, at least as long as $a \le c$; if not, we can increase $c$ to make it at least as big as $a$.

Usual well-ordering proof
• Suppose that $B(1) = a$, $B(n) \le c + \max_{1 \le k \le n-1} \big( B(k) + B(n-k-1) \big)$, and $a \le c$.
• Claim (*): $B(n) \le cn$ for $n = 1, 2, \ldots$
• Let $S$ be the set of all natural numbers for which (*) is false. Observe that 1 is not in $S$, because $B(1) = a \le c = c \cdot 1$.
• Suppose $S$ is nonempty; we'll arrive at a contradiction. Let $h$ be the least element of $S$ (well-ordering). Then (*) holds for $n = 1, \ldots, h-1$.
• What's $B(h)$? Well,
$$
\begin{aligned}
B(h) &\le c + \max_{1 \le k \le h-1} \big( B(k) + B(h-k-1) \big) \\
&\le c + \max_{1 \le k \le h-1} \big( ck + c(h-k-1) \big) \\
&= c + \max_{1 \le k \le h-1} \big( ck + c(h-1) - ck \big) \\
&= c + \max_{1 \le k \le h-1} c(h-1) \\
&= c + c(h-1) = ch.
\end{aligned}
$$
• But $h \in S$ means $B(h) > ch$. Contradiction! So $S$ is empty, and (*) holds for every $n$.
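As a sanity check on the proof, a short Python sketch can evaluate the recurrence with equality (its worst case) for hypothetical constants a = 2 and c = 3 and confirm that B(n) ≤ cn. The slides never define B(0); the sketch assumes B(0) = 0 (no work for an empty subtree) so that the k = n − 1 term makes sense. `recurrence_table` is an illustrative helper, not part of the course code.

```python
def recurrence_table(a: int, c: int, n_max: int) -> list:
    """B[n] for n = 0..n_max, taking the slide's recurrence with equality:
    B(1) = a, B(n) = c + max over 1 <= k <= n-1 of B(k) + B(n - k - 1).
    B(0) = 0 is an assumption (empty subtree); the slides don't define it."""
    B = [0, a]
    for n in range(2, n_max + 1):
        B.append(c + max(B[k] + B[n - k - 1] for k in range(1, n)))
    return B


a, c = 2, 3  # hypothetical constants with a <= c
B = recurrence_table(a, c, 200)
print(all(B[n] <= c * n for n in range(1, 201)))  # the claim B(n) <= cn
```

With a ≤ c the table works out to B(n) = c(n − 1) + a, comfortably below cn. With a = 5 and c = 3 the claim already fails at n = 1, which is why the slide insists on a ≤ c (or on raising c).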

Median-finding

Warmup: ceilings
• For …, we have …, hence … for ….
• For …, we have … and ….
  Reason: apply the previous result to … and ….
• For …, we have ….
  Reason: apply the previous result to … to get … = …. Then apply part 2 to … to get ….
• For …, we have …
• For …, we have …
• For …, we have …
• For …, we have …

Last facts about ceilings
• For …, we have …. In particular, …
• For …, we have …

A problem
• Find the (upper) median of a list of $n$ items.
• Upper median means "if the list has an even number of items, pick the $(n/2 + 1)$-th item from the bottom, rather than the $(n/2)$-th".
• Obvious solution: sort, then pick the middle item.
• Seems like more work than is needed.
• Generalize ("strengthen the recursion"): SELECT($k$, $S$): find the $k$th smallest in a set $S$ of $n$ items.
• Illustrate with sets of numbers, ordered smallest to largest.
• MEDIAN($S$) is now just SELECT($\lfloor n/2 \rfloor + 1$, $S$).
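A baseline Python sketch of the "obvious solution" and of MEDIAN as a special case of SELECT; the function names are illustrative, not from the course. Sorting costs O(n log n) comparisons, which is the "more work than is needed" that the rest of the lecture removes.

```python
def select_by_sorting(k: int, items: list):
    """k-th smallest (1-indexed) via the 'obvious solution': sort, then index."""
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    return sorted(items)[k - 1]


def upper_median(items: list):
    """MEDIAN(S) = SELECT(floor(n/2) + 1, S): for even n, the (n/2 + 1)-th item."""
    return select_by_sorting(len(items) // 2 + 1, items)


print(upper_median([4, 1, 3, 2]))
```

For [4, 1, 3, 2] the sorted order is 1, 2, 3, 4, so the upper median is 3 rather than 2.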

A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1973)
• Input: a nonempty set $S$ of $n$ numbers, and an index $k$, $1 \le k \le n$
• Output: the $k$th smallest of the numbers.
1. If $n = 1$ (one-item set), return that item.
2. Divide the input into $\lfloor n/5 \rfloor$ groups of five, and at most one group of the remaining $n \bmod 5$ items.
3. Find the (upper) median of each of these $\lceil n/5 \rceil$ groups.
4. Find the median $x$ of these medians (recursively).
5. Partition the input around $x$. Let $m$ be the number of elements on the low side.
   • Low side: all items less than or equal to $x$. High side: items greater than $x$.
6. If $k \le m$, find the $k$th smallest item on the low side; otherwise find the $(k - m)$th smallest item on the high side (recursively).
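The six steps can be sketched in Python as follows. This is an illustration, not the authors' code, and it makes one deliberate change: it partitions three ways (< x, == x, > x) instead of into the slide's "≤ x" and "> x" piles. The slide's two-way split is fine for large sets of distinct numbers, but the 7n/10 + 6 bound only drops below n once n > 20; on a two-item input, for example, the upper median is the larger item, the "≤ x" pile is the whole input, and the recursion would never shrink.

```python
def select(k: int, items: list):
    """Return the k-th smallest (1-indexed) element of items.

    Sketch of the groups-of-five SELECT, with a three-way partition
    (< x, == x, > x) so that every recursive call is on a strictly smaller
    list, even for tiny inputs or repeated values.
    """
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    # Step 1: a one-item list is its own k-th smallest.
    if n == 1:
        return items[0]
    # Step 2: groups of five; the last group may be shorter.
    groups = [items[i:i + 5] for i in range(0, n, 5)]
    # Step 3: upper median of each group (sorting at most 5 items is O(1) work).
    medians = [sorted(group)[len(group) // 2] for group in groups]
    # Step 4: upper median of the medians, found recursively.
    x = select(len(medians) // 2 + 1, medians)
    # Step 5: partition around x.
    low = [v for v in items if v < x]
    high = [v for v in items if v > x]
    num_equal = n - len(low) - len(high)
    # Step 6: recur on the piece that must contain the answer.
    if k <= len(low):
        return select(k, low)
    if k <= len(low) + num_equal:
        return x
    return select(k - len(low) - num_equal, high)


data = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
print(select(14, data))
```

On the lecture's example this prints 21. Because x is itself one of the items, neither the "< x" nor the "> x" list can be the whole input, so termination does not depend on n being large.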

Group into 5s; median of medians; partition; recur on appropriate piece
• Input: 1 5 2 9 8 3 7 4 11 22 27 14 6 21 31 13 12; find the 14th-smallest item.
• Groups of five (plus one short group): {1, 5, 2, 9, 8}, {3, 7, 4, 11, 22}, {27, 14, 6, 21, 31}, {13, 12}
• Group medians: 5, 7, 21, 13; sorted: 5, 7, 13, 21; median of medians (upper): 13
• Partition: ≤ 13: 1 2 3 4 5 6 7 8 9 11 12 13; > 13: 14 21 22 27 31
• 12 items are less than or equal to the median of medians; we want the 14th item. So SELECT(2, upper group), recursively, which returns 21.
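The first round of this example can be replayed in a few lines of Python. `first_round` is a hypothetical helper, and it uses `sorted` in place of the recursive median-of-medians call, which is harmless here because there are only four medians; it follows the slide's two-way partition ("≤ x" and "> x").

```python
def first_round(data: list, k: int) -> dict:
    """One round of the slide's SELECT: group medians, pivot x, piles, next call."""
    groups = [data[i:i + 5] for i in range(0, len(data), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]  # upper median of each group
    # Upper median of the medians; sorted() stands in for the recursive call.
    x = sorted(medians)[len(medians) // 2]
    low = [v for v in data if v <= x]   # the "<= x" pile
    if k <= len(low):
        next_call = ("low", k)
    else:
        next_call = ("high", k - len(low))
    return {"medians": medians, "x": x, "low_size": len(low), "next_call": next_call}


data = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
print(first_round(data, 14))
```

The result records medians [5, 7, 21, 13], pivot 13, a low pile of 12 items, and a next call of SELECT(2) on the high pile, matching the slide.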

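A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan)
• Input: a set of $n$ numbers, and an index $k$, $1 \le k \le n$
• Output: the $k$th smallest of the numbers.
1. Group into 5s; find the median of each group (at a constant cost for each); find the median of medians.
2. Partition around the median of medians. Recur.
• Fictitious, experimental analysis. Suppose that each "part" was no larger than ¾ of the input. Then we'd have $T(n) \le cn + T(n/5) + T(3n/4)$, where the single term $cn$ combines all the linear work (grouping, finding the group medians, partitioning).
• Since $1/5 + 3/4 = 19/20 < 1$, I then claim this looks consistent with $T(n) \le sn$ for all $n$. Ignoring the ceilings for a moment, and assuming $T(m) \le sm$ for all $m < n$, we'd then get $T(n) \le cn + sn/5 + 3sn/4 = cn + \tfrac{19}{20}sn$, which is at most $sn$ whenever $s \ge 20c$.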
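A quick numeric check of the fictitious recurrence, as a Python sketch. It assumes a hypothetical constant c = 1, T(0) = 0, and uses floor division to stand in for "ignoring the ceilings"; under those assumptions the induction above gives T(n) ≤ 20cn exactly, and the table confirms it.

```python
def fictitious_table(c: int, n_max: int) -> list:
    """T[n] for T(n) = cn + T(n // 5) + T(3n // 4), with T(0) = 0.
    Models the slide's fictitious assumption that each part is at most 3/4
    of the input; floor division stands in for ignoring the ceilings."""
    T = [0]
    for n in range(1, n_max + 1):
        T.append(c * n + T[n // 5] + T[3 * n // 4])
    return T


c = 1  # hypothetical constant
T = fictitious_table(c, 100_000)
print(max(T[n] / n for n in range(1, len(T))))  # never exceeds s = 20c
```

Both n // 5 and 3n // 4 are strictly less than n for n ≥ 1, so the table can be filled bottom-up without recursion.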

In practice, it's a little messier than this.
• Warning: some of the following steps look like magic.
• They are carefully crafted to make the algebra as simple as possible.
• Recall the ceiling facts from the warmup.
• Critical step: show that in the recursive call, the partition piece we recur on is not too big.
• ¾ was almost too large: $1/5 + 3/4 = 19/20$ is only just below 1.
• We'll show it's more like 70% ($7n/10 + 6$ items), but with a slight adjustment.



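Claim: after partitioning, each "pile" has at least $3n/10$ numbers in it (almost)
[Figure: the groups drawn as columns, ordered by their medians, with the median of medians $x$, the values less than $x$, and the values greater than $x$ marked.]
• Among the columns whose median is at least $x$, all but two (the first, which is $x$'s own column, and the last, which may be a short group) have 3 elements greater than $x$.
• There are at least $\lceil \tfrac{1}{2} \lceil n/5 \rceil \rceil$ such columns, so at least $3\left(\lceil \tfrac{1}{2} \lceil n/5 \rceil \rceil - 2\right) \ge 3n/10 - 6$ elements are in the "greater than" pile.
• The "≤" pile contains at least half of the $\lceil n/5 \rceil$ medians, so the same counting argument shows it also has at least $3n/10 - 6$ elements.
• Hence at most $n - (3n/10 - 6) = 7n/10 + 6$ elements are in the "≤" pile (or in the greater-than pile).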
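The counting argument can be spot-checked empirically with a Python sketch. Assumptions: values are distinct (the slides work with sets), upper medians are used throughout as on the earlier slides, and `sorted` stands in for the recursive median-of-medians call, which returns the same x. `pivot_piles` and `check_bounds` are hypothetical helpers, and random trials are a sanity check, not a substitute for the proof.

```python
import random


def pivot_piles(items: list) -> tuple:
    """Median of medians x (upper medians throughout) and the sizes of the
    "<= x" and "> x" piles. sorted() stands in for the recursive call."""
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    x = sorted(medians)[len(medians) // 2]
    low = sum(1 for v in items if v <= x)
    return x, low, len(items) - low


def check_bounds(trials: int = 2000, seed: int = 1) -> bool:
    """Check both piles lie between 3n/10 - 6 and 7n/10 + 6 on random sets."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 500)
        items = rng.sample(range(10 * n), n)  # n distinct values, as for a set
        _, low, high = pivot_piles(items)
        for pile in (low, high):
            if not 3 * n / 10 - 6 <= pile <= 7 * n / 10 + 6:
                return False
    return True


data = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
print(pivot_piles(data), check_bounds())
```

On the lecture's 17-item example the piles have 12 and 5 elements, both within 7 · 17/10 + 6 = 17.9.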

Recurrence
• Let $T(n)$ be the max number of operations involved in "Select" on any input of size $n$.
• Group into fives: $cn$
• Find medians of each group: $a\lceil n/5 \rceil$
• Median of medians: $T(\lceil n/5 \rceil)$
• Partition around the median element: $bn$ (combine: $c' = c + b$)
• Recur on the appropriate piece: at most $7n/10 + 6$ elements in that pile, so the operation count is $\le T(\lceil 7n/10 + 6 \rceil)$
• Total: $T(n) \le c'n + a\lceil n/5 \rceil + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)$

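Algebra
$$
\begin{aligned}
T(n) &\le c'n + a\lceil n/5 \rceil + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil) \\
&\le c'n + a(n/5 + 1) + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil) \qquad (\text{Note: } \lceil n/5 \rceil \le n/5 + 1) \\
&= (c' + a/5)\,n + a + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)
\end{aligned}
$$
• Since $n$ is at least 1, $a \le an$, so we can write $T(n) \le c''n + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)$ (Note: $c'' = c' + 6a/5$)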

Summary (replacing $c''$ with $c$)
• For $n > 1$: $T(n) \le cn + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)$

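Algebraic Cleverness
• For $n > 1$: $T(n) \le cn + T(\lceil n/5 \rceil) + T(\lceil 7n/10 + 6 \rceil)$
1. For $1 \le n \le 160$, compute $T(n)$ explicitly, and pick a number $s$ with $T(n) \le sn$ for $n$ in this range.
   • Let $s = \max\big(T(1)/1, \ldots, T(160)/160\big)$, for instance!
2. Pick $s \ge 20c$ (increase $s$ if necessary). (!)
   • Because $s \ge 20c$, we have $c \le s/20$.
3. I claim that for all $n \ge 1$, $T(n) \le sn$.
4. For $1 \le n \le 160$ we have $T(n) \le sn$. (Item 2 only ever makes $s$ larger, so item 1 still holds.)
5. Still need to handle the case $n > 160$.
6. Why 160? Because it's large enough to make the argument work! (The proof needs $n/20 > 8$, that is, $n > 160$.)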

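Claim: $T(n) \le sn$ for all $n \ge 1$
• Suppose it's false for some minimum value $k$, but true for all smaller $n$.
• Then $k > 160$ (because we already showed it true for 160 and less). Hence $k/20 > 8$ (used later).
• Also $\lceil k/5 \rceil < k$ and $\lceil 7k/10 + 6 \rceil \le 7k/10 + 7 < k$ when $k > 160$, so the claim holds for both recursive arguments.
$$
\begin{aligned}
T(k) &\le ck + T(\lceil k/5 \rceil) + T(\lceil 7k/10 + 6 \rceil) \\
&\le ck + s\lceil k/5 \rceil + s\lceil 7k/10 + 6 \rceil \\
&\le ck + s(k/5 + 1) + s(7k/10 + 6 + 1) \\
&= ck + sk/5 + s + 7sk/10 + 7s \\
&= ck + 9sk/10 + 8s \\
&\le sk/20 + 9sk/10 + 8s \qquad (\text{since } c \le s/20) \\
&= 19sk/20 + 8s \\
&= sk + 8s - sk/20 \\
&= sk + s(8 - k/20) \qquad (\text{by the note above, } k/20 > 8, \text{ so } 0 > 8 - k/20) \\
&\le sk
\end{aligned}
$$
• Contradiction! Hence the claim is true for all $n$.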
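The whole "algebraic cleverness" recipe can be exercised numerically. In this Python sketch the explicit values for n ≤ 160 are hypothetical (T(n) = n², chosen only to have concrete numbers), c = 1 is likewise a made-up constant, and for n > 160 the recurrence is taken with equality. Steps 1 and 2 pick s, and the table then confirms step 3, T(n) ≤ sn.

```python
def ceil_div(p: int, q: int) -> int:
    """Ceiling of p / q for positive integers."""
    return -(-p // q)


def select_bound_table(c: int, base, n_max: int) -> list:
    """T[n] for n = 1..n_max (index 0 unused).

    For n <= 160, T(n) = base(n): hypothetical explicit values (step 1).
    For n > 160, T(n) = cn + T(ceil(n/5)) + T(ceil(7n/10 + 6)), the
    recurrence with equality; both arguments are < n once n > 160.
    """
    T = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n <= 160:
            T[n] = base(n)
        else:
            # ceil(7n/10 + 6) = ceil(7n/10) + 6 because 6 is an integer.
            T[n] = c * n + T[ceil_div(n, 5)] + T[ceil_div(7 * n, 10) + 6]
    return T


c = 1
T = select_bound_table(c, lambda n: n * n, 50_000)     # hypothetical base values
s = max(max(T[n] / n for n in range(1, 161)), 20 * c)  # steps 1 and 2
print(s, all(T[n] <= s * n for n in range(1, len(T))))  # step 3: T(n) <= sn
```

Here step 1 gives s = 160, which already exceeds 20c = 20, so step 2 leaves it unchanged.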

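Why piles of five?
• With groups of five, the two recursive calls have sizes about $n/5$ and $7n/10$, and $1/5 + 7/10 = 9/10$ is a bit less than 1.
• If you try groups of three, the corresponding sum becomes $1/3 + 2/3 = 1$, no longer below 1, and the argument breaks down.
• Larger odd groups (7, 9, …) keep the sum below 1, but each group's median costs more to find.
• So 5 is a "sweet spot" for this algorithm!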
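The comparison across group sizes can be tabulated with exact fractions. This Python sketch assumes an odd group size g and ignores the additive constants: the median-of-medians call sees about n/g items, at least half of the groups contribute (g + 1)/2 items on one side of x, so the larger pile has at most about (3g − 1)n/(4g) items. For g = 5 this reproduces the 7/10 from the slides.

```python
from fractions import Fraction


def recursion_fraction_sum(g: int) -> Fraction:
    """Sum of the two recursive-call fractions for odd group size g,
    ignoring additive constants: 1/g + (3g - 1)/(4g)."""
    if g < 3 or g % 2 == 0:
        raise ValueError("sketch assumes an odd group size g >= 3")
    return Fraction(1, g) + Fraction(3 * g - 1, 4 * g)


for g in (3, 5, 7, 9):
    print(g, recursion_fraction_sum(g))
```

Groups of 3 give exactly 1, groups of 5 give 9/10, and groups of 7 and 9 give 6/7 and 5/6.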

Surprising simpler algorithm
• RandSelect($k$, $S$)
• Pick a random item $x$ in your set $S$
• Partition into the set $L$ of numbers less than $x$, and the set $H$ of those greater than $x$
• If $L$ has at least $k$ items: RandSelect($k$, $L$)
• If $L$ has $k - 1$ items: return $x$
• Otherwise, RandSelect($k - |L| - 1$, $H$)
• Works in "expected linear time" because, on average, the larger partition is about ¾ the size of the set. Work is (roughly) $cn + \tfrac{3}{4}cn + \left(\tfrac{3}{4}\right)^2 cn + \cdots = 4cn$.
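RandSelect translates almost line for line into Python. This sketch assumes distinct items (the slide works with a set) and replaces the slide's tail recursion with a loop, which avoids Python's recursion limit on unlucky pivot sequences; the optional `rng` parameter is an addition for reproducible runs.

```python
import random


def rand_select(k: int, items, rng=random):
    """Return the k-th smallest (1-indexed) of items, assumed distinct.
    Expected linear time; the loop replaces the slide's tail recursion."""
    items = list(items)
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    while True:
        x = rng.choice(items)               # random pivot
        low = [v for v in items if v < x]   # L
        if len(low) >= k:                   # answer is in L
            items = low
        elif len(low) == k - 1:             # x itself is the k-th smallest
            return x
        else:                               # answer is in H; skip L and x
            k -= len(low) + 1
            items = [v for v in items if v > x]


data = [1, 5, 2, 9, 8, 3, 7, 4, 11, 22, 27, 14, 6, 21, 31, 13, 12]
print(rand_select(14, data, random.Random(42)))
```

Whatever pivots are drawn, the answer is the same (21 here); only the running time depends on the random choices.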

Big idea! (More in CS18)
• Randomized algorithms are often simpler than deterministic ones
• Deep philosophical question: why does adding a stream of randomness make tasks easier?
