Median Finding Test Cases What's Next



slide-1
SLIDE 1

Median Finding Test Cases What's Next

  • 1. Median finding, part 2
  • 2. Why we write test cases
  • 3. What's next?
slide-2
SLIDE 2

A problem

  • Find the (upper) median of a list of n items.
  • Upper median means "if the list has an even number of items, pick the one that's n/2 + 1 from the bottom, rather than n/2 from the bottom"
  • Obvious solution: sort, then pick the middle item.
  • Seems like more work than is needed.
  • Generalize ('strengthen the recursion'): SELECT(k, S): find the kth smallest in a set S of n items.
  • Illustrate with sets of numbers, ordered smallest to largest
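
The "obvious solution" from this slide can be sketched in a few lines. This is a minimal illustration in Python (chosen here for brevity; the slides themselves show no code), and the name `upper_median` is hypothetical.

```python
def upper_median(items):
    """Return the upper median of a non-empty list by sorting it first."""
    if not items:
        raise ValueError("upper_median needs at least one item")
    ordered = sorted(items)
    # With 0-based indexing, position n // 2 is the middle item when n is odd
    # and the (n/2 + 1)-th smallest item when n is even: the upper median.
    return ordered[len(ordered) // 2]
```

Sorting costs O(n log n), which is the "more work than is needed" that the slide refers to.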
slide-3
SLIDE 3

A SELECT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1973)

  • median-of-medians, analysis from hell
  • complicated, hard to believe it's worth implementing
slide-4
SLIDE 4

Surprising simpler algorithm

  • RandSelect(k, S)
  • Pick a random item x in your set, S
  • Partition into L, the set of numbers less than x, and G, the set of those greater than x
  • If L has at least k items: RandSelect(k, L)
  • If L has k-1 items: return x
  • Otherwise, RandSelect(k - |L| - 1, G)
  • Works in "expected linear time" because on average, the size of the larger partition is ¾ the size of the set. Work is (roughly) n + (¾)n + (¾)²n + … ≤ 4n.
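
The steps on this slide can be sketched as follows. This is a minimal sketch in Python, not the course's own code: the names `rand_select`, `pivot`, `smaller`, and `larger` are hypothetical, `k` is assumed to be 1-based, and the items are assumed to be distinct (a set), since duplicates of the pivot would be dropped by the partition.

```python
import random

def rand_select(k, items, rng=random):
    """Return the k-th smallest (1-based) of a collection of distinct items."""
    items = list(items)
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and the number of items")
    pivot = rng.choice(items)                  # pick a random item
    smaller = [x for x in items if x < pivot]  # items less than the pivot
    larger = [x for x in items if x > pivot]   # items greater than the pivot
    if len(smaller) >= k:
        # The answer lies among the smaller items, at the same rank.
        return rand_select(k, smaller, rng)
    if len(smaller) == k - 1:
        # Exactly k - 1 items are below the pivot, so the pivot is the answer.
        return pivot
    # Skip the smaller items and the pivot; adjust the rank accordingly.
    return rand_select(k - len(smaller) - 1, larger, rng)
```

Under this convention, the upper median of a list `xs` of distinct items would be `rand_select(len(xs) // 2 + 1, xs)`.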
slide-5
SLIDE 5

Challenges

  • Where do you get a random number in a functional programming language?
  • Once you have it, how do you test a procedure that depends on randomness?
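
One common way to approach both challenges (not necessarily the approach taken in lecture) is to pass the source of randomness in as an explicit argument, and to test a property that must hold no matter which random choices are made. The sketch below assumes Python; `rand_select` and `check_against_sorting` are hypothetical names, and the items are assumed to be distinct.

```python
import random

def rand_select(k, items, rng):
    """k-th smallest (1-based) of distinct items; rng supplies the randomness."""
    pivot = rng.choice(items)
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    if len(smaller) >= k:
        return rand_select(k, smaller, rng)
    if len(smaller) == k - 1:
        return pivot
    return rand_select(k - len(smaller) - 1, larger, rng)

def check_against_sorting(trials=200, seed=17):
    """Compare rand_select with the sort-based answer on random inputs."""
    rng = random.Random(seed)  # fixed seed, so any failure can be replayed
    for _ in range(trials):
        n = rng.randint(1, 30)
        data = rng.sample(range(1000), n)  # n distinct integers
        k = rng.randint(1, n)
        # The pivot choices vary, but the returned value must not.
        assert rand_select(k, data, rng) == sorted(data)[k - 1]
    return True
```

The design choice here is that the random choices affect only the running time, never the result, so the slow sort-based solution can serve as the reference answer.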

slide-6
SLIDE 6

Big idea! (More in CS18)

  • Randomized algorithms are often simpler than deterministic ones
  • Deep philosophical question: why does adding a stream of randomness make tasks easier?

slide-7
SLIDE 7

http://www.eatingwell.com/recipe/267339/citrus-sherbet/

slide-8
SLIDE 8

Why test-cases?

  • Jack Wrenn, PhD thesis proposal
slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24
  • If a student's misconception is consistently reflected both in their example and in their implementation…
  • …then it will not be detected by adapting those examples as test cases.
  • That point motivates Jack's dissertation work (Executable examples), but…

slide-25
SLIDE 25
  • If a student's misconception is reflected in their implementation…
  • …and the examples are created post-implementation, then
  • …the examples are certain to enforce the misconception.
slide-26
SLIDE 26

What's next?

slide-27
SLIDE 27

Where we are now

slide-28
SLIDE 28

Programming

  • Basic constructs like "procedure"
  • if-then-else and cond
  • let-expressions ("local values")
  • Recursion in multiple forms
  • Functions as first-class entities (lambda)
  • Higher-order procedures
  • Modules as a way to gather types/data/procedures together
slide-29
SLIDE 29

Data structures

  • Lists
  • Recursive definition leads to recursive code structure
  • Recursive definition leads to recurrence relations in analysis
  • Tuples
  • Trees
  • Recursive definition leads to recurrence relations in analysis, often with a factor of 2
  • Balanced or ordered trees can help speed things up
slide-30
SLIDE 30

Analysis

  • Go from code to recurrence
  • "Solve" a few classes of recurrence relations
  • Use plug-n-chug to guess solution
  • Use big-O to represent 'fairly equivalent' program performance
slide-31
SLIDE 31

Algorithms

  • Fast-reverse
  • Insertion, Selection, and Merge Sort
  • Exhaustion (subsets!)
  • Minimax
  • Tree-search
  • General trees
  • BSTs
  • Tree traversal
slide-32
SLIDE 32

Problem approaches

  • Design Recipe
  • Recursion
  • Recursion Diagrams
  • Recursion on any kind of structured data
  • Natural numbers
  • Lists
  • Trees
  • Divide-and-Conquer
  • Recursion is a special case
  • Data-hiding (via modules)
  • Decomposition
  • Using helper procedures to achieve a larger goal
slide-33
SLIDE 33

What's next?

slide-34
SLIDE 34

What's next? Programming

  • Problems get bigger
  • Hundreds or thousands of lines of code in a program
  • Programs themselves become complex objects worthy of study!
  • Software engineering
  • Programming techniques that support large and complex software
  • Object-oriented programming (CS18)
  • Event-driven programming (most web stuff)
slide-35
SLIDE 35

What's next? Data structures

  • Lists, trees are very simple
  • Amenable to recursion approaches
  • Build on these: heaps, priority queues, …
  • Generalize:
  • Directed acyclic graphs
  • Prerequisite structure in course requirements makes a good example
  • Directed graphs
  • Streets in a city (some of them one-way) for example
  • Edges often "labelled" with data like "how long to traverse this one-block stretch?"
  • Problems like "find shortest path" (i.e., quickest route from here to there)
  • … [CS1570]
slide-36
SLIDE 36

What's next? Analysis

  • Analysis of probabilistic programs like RandSelect
  • Analysis of performance of more complicated data structures
  • Analysis of algorithms like shortest-path
  • Study of "effective" solutions to (some instances of) provably hard problems

slide-37
SLIDE 37

What's next? Algorithms

  • How does Google work?
  • How does Facebook choose which ads to show you?
  • How do we recognize unusual behaviors?
  • Securities fraud
  • Crime
  • How do you make a drone deliver a package?
  • How does Disney/Pixar make Frozen II?
slide-38
SLIDE 38

A shift in style

  • In CS17, we've been very concrete: let's sort this list of numbers, let's find an integer in a tree with int-values at nodes, etc.
  • ADTs moved away from this a little: we have a Dictionary, but we don't know the details, only the runtime-performance
  • In general CS work, the gap between the real world and the code is much greater

slide-39
SLIDE 39

A conceptual gap

  • The internet consists of a bunch of computers tied together by network connections from computers to routers (specialized computers that can pass data from one machine to another)
  • The routers are interconnected as well
  • The connections come and go; some are permanent, some are very temporary
  • How do we get data from my computer to yours?
  • We'll work out an algorithm in which we somehow represent what a router is or can do, but in discussing the algorithm, we'll just draw pictures, etc.
  • Leave implementation for later
slide-40
SLIDE 40

Example problem and algorithm

  • We have a bunch of data:
  • We'd like to "classify" it into clusters (red dots could be cluster centers)

slide-41
SLIDE 41

Idea

  • First, decide how many clusters (by hand?)
  • really annoying assumption, relieved by fancier methods
  • For our example, pick k = 2.
  • grab ANY TWO points in the dataset as "centers"
slide-42
SLIDE 42
slide-43
SLIDE 43

Divide the data into those closer to each point

slide-44
SLIDE 44
slide-45
SLIDE 45

For each group, find the "mean"

slide-46
SLIDE 46
slide-47
SLIDE 47

Using these new means, reclassify!

slide-48
SLIDE 48
slide-49
SLIDE 49

Repeat until stabilized
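
The procedure walked through on the preceding slides is commonly known as k-means clustering. A minimal sketch follows, under several assumptions the slides leave open: points are stored in a list of equal-length tuples of numbers, distance is Euclidean, and a group that ends up empty keeps its old center. All function names, and the `max_rounds` safety limit, are hypothetical.

```python
import math
import random

def distance(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centers):
    """Divide the points into one group per center, by nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: distance(p, centers[i]))
        groups[nearest].append(p)
    return groups

def k_means(points, k, rng=random, max_rounds=100):
    """Return (centers, groups) once the centers stop changing."""
    centers = rng.sample(points, k)  # grab any k points as initial centers
    for _ in range(max_rounds):
        groups = assign(points, centers)
        # Assumption: an empty group keeps its previous center.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:   # stabilized
            break
        centers = new_centers
    return centers, assign(points, centers)
```

Because `distance` and `mean` work coordinate by coordinate, the same sketch runs unchanged on 2D, 3D, or higher-dimensional tuples.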

slide-50
SLIDE 50
slide-51
SLIDE 51

What didn't I mention?

  • How to find distances
  • Are data points stored in a list? An array? A tree?
  • What are the piles we created?
  • Are data points lists of ints? of floats? Are they tuples?
  • Are they all 2-dimensional? Could this work in 3D? in 10D?
slide-52
SLIDE 52

Skills

  • Whatever math is needed
  • Whatever else is needed
  • For graphics: physics,
  • An ability to guess some representation of the problem that might work
  • The ability to translate a pictorial record of a discussion into an actual algorithm ("pseudocode") and then a real program ("code")
  • Analysis (during and after the fact)