CSE 326: Data Structures, Lecture #16: Sorting Things Out (Bart Niswonger)



SLIDE 1

CSE 326: Data Structures
Lecture #16: Sorting Things Out

Bart Niswonger
Summer Quarter 2001

Unix Tutorial!!

  • Tuesday, July 31st
    – 10:50am, Sieg 322
  • Printing worksheet
  • Shell
    – different shell quotes: ' `
    – scripting, #!
    – alias
    – variables/environment
    – redirection, piping
  • Useful tools
    – grep, egrep/grep -e
    – sort
    – cut
    – file
    – tr
    – find, xargs
    – diff, patch
    – which, locate, whereis
  • Finding info
    – Techniques
    – Resources (ACM webpage, web, internal docs)
  • Process management
  • File management/permissions
  • File system layout

SLIDE 2

Today’s Outline

  • Project
    – Rules of competition
  • Sorting by comparison
    – Simple: SelectionSort; BubbleSort; InsertionSort
    – Quick: QuickSort
    – Good Worst Case: MergeSort; HeapSort

Sorting: The Problem Space

General problem: Given a set of N orderable items, put them in order.

Without (significant) loss of generality, assume:
– Items are integers
– Ordering is ≤

Most sorting problems map to the above in linear time.

SLIDE 3

SelectionSort

  • 1. Find the smallest element, put it first
  • 2. Find the next smallest element, put it second
  • 3. Find the next smallest, put it next
  … etc.

SelectionSort

procedure SelectionSort(Array[1..N])
  For i = 1 to N-1
    Find the smallest entry in Array[i..N]
    Let j be the index of that entry
    Swap(Array[i], Array[j])
  End For

  While other people are coding QuickSort/MergeSort
    Twiddle thumbs
  End While
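The pseudocode above translates almost line for line into Python. This is a minimal sketch, not the course's reference code: the function name `selection_sort` and the 0-based indexing are choices made here, and the thumb-twiddling loop is left out.

```python
def selection_sort(items):
    """Sort a list in place by repeatedly selecting the smallest remaining entry."""
    n = len(items)
    for i in range(n - 1):
        # Find the index j of the smallest entry in items[i..n-1].
        j = min(range(i, n), key=items.__getitem__)
        # Swap it into position i; items[0..i] is now sorted.
        items[i], items[j] = items[j], items[i]
    return items


print(selection_sort([5, 1, 3, 4, 3, 2]))
```

The inner search scans the whole unsorted suffix every time, so the number of comparisons is about N²/2 regardless of the input order.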

SLIDE 4

HeapSort

  • Use a Priority Queue (Heap)

[Figure: keys 756 27 18 801 35 13 23 44 87 going into the heap; keys 8 13 18 23 27 coming out in sorted order]

Shove everything into a queue, take them out smallest to largest.
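A short sketch of that idea, using Python's standard `heapq` module as the priority queue. The function name `heap_sort` is an assumption made for this example, and it returns a new list rather than sorting in place as a textbook HeapSort would.

```python
import heapq


def heap_sort(items):
    """Return the items in ascending order by pushing them through a binary heap."""
    heap = list(items)
    heapq.heapify(heap)  # build the heap in O(N)
    # Each pop removes the current minimum in O(log N).
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort([756, 27, 18, 801, 35, 13, 23, 44, 87]))
```

N pops at O(log N) each give the O(N log N) worst case that puts HeapSort in the "good worst case" group.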

QuickSort

[Figure: example values 28 15 47]

  • 1. Basic idea: Pick a pivot.
  • 2. Partition into less-than & greater-than pivot.
  • 3. Sort each side recursively.
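The three steps can be sketched directly, if not efficiently, with list comprehensions. Taking the first element as the pivot and building new lists are simplifications assumed here; a practical QuickSort partitions in place.

```python
def quick_sort(items):
    """Return a sorted copy using the pick / partition / recurse outline."""
    if len(items) <= 1:
        return list(items)
    pivot = items[0]  # step 1: pick a pivot (first element, by assumption)
    rest = items[1:]
    less = [x for x in rest if x < pivot]  # step 2: partition
    greater_or_equal = [x for x in rest if x >= pivot]
    # Step 3: sort each side recursively and glue the pieces together.
    return quick_sort(less) + [pivot] + quick_sort(greater_or_equal)


print(quick_sort([28, 15, 47]))
```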

SLIDE 5

QuickSort Partition

Pick pivot:                         6 9 5 3 8 2 7
Partition with cursors (< and >):   6 9 5 3 8 2 7
2 goes to less-than:                6 9 5 3 8 2 7
6, 8 swap less/greater-than:        8 9 5 3 6 2 7
3, 5 less-than; 9 greater-than:     8 9 5 3 6 2 7
Partition done:                     8 9 5 3 6 2 7

Recursively sort each side.
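A sketch of a two-cursor, in-place partition in the spirit of the walkthrough above. The slide does not say which element is the pivot or which way the cursors travel, so the choices here (pivot is the first element, one cursor moves right, the other moves left) are assumptions, as are the names `partition` and `quick_sort_in_place`.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[lo]; return the pivot's final index."""
    pivot = a[lo]
    left, right = lo + 1, hi
    while True:
        # Advance the left cursor past elements that belong on the less-than side.
        while left <= right and a[left] < pivot:
            left += 1
        # Retreat the right cursor past elements that belong on the greater-than side.
        while left <= right and a[right] > pivot:
            right -= 1
        if left >= right:
            break
        # Both cursors are stuck on misplaced elements: swap them.
        a[left], a[right] = a[right], a[left]
        left += 1
        right -= 1
    # Put the pivot between the two sides.
    a[lo], a[right] = a[right], a[lo]
    return right


def quick_sort_in_place(a, lo=0, hi=None):
    """Sort a[lo..hi] in place by partitioning and recursing on each side."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort_in_place(a, lo, p - 1)
        quick_sort_in_place(a, p + 1, hi)
    return a


print(quick_sort_in_place([6, 9, 5, 3, 8, 2, 7]))
```

Each element is examined a constant number of times per partition, which is where the linear partitioning cost in the analysis comes from.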

Analyzing QuickSort

  • Picking pivot: constant time
  • Partitioning: linear time
  • Recursion: time for sorting left partition (say of size i) + time for right (size N-i-1)

T(1) = b
T(N) = T(i) + T(N-i-1) + cN

where i is the number of elements smaller than the pivot

SLIDE 6

QuickSort: Worst Case

  • What is the worst case?
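One way to work out the answer from the recurrence T(N) = T(i) + T(N-i-1) + cN: the partition is as lopsided as possible when the pivot is always the smallest (or largest) element, so i = 0 at every level. Treating T(0) as a constant absorbed into the cN term, the recurrence unrolls as follows.

$$T(N) = T(N-1) + cN = T(N-2) + c(N-1) + cN = \dots = T(1) + c\sum_{k=2}^{N} k = O(N^2)$$

With a first-element pivot this happens on input that is already sorted.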

Optimizing QuickSort

  • Choosing the Pivot
    – Randomly choose pivot
      • Good theoretically and practically, but call to random number generator can be expensive
    – Pick pivot cleverly
      • “Median-of-3” rule takes the element at Median(first value, middle value, last value). Works well in practice.
  • Cutoff
    – Use simpler sorting technique below a certain problem size
      • Weiss suggests using insertion sort, with a cutoff limit of 5-20
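Both optimizations can be combined in one routine. The sketch below is illustrative only: `CUTOFF = 10` is an arbitrary value picked from the 5-20 range, the median of three is taken over the first, middle, and last elements of the subarray, and the partition is a simple single-scan variant chosen for brevity. All function names are made up for this example.

```python
CUTOFF = 10  # assumed value; any cutoff in the 5-20 range fits the suggestion


def insertion_sort(a, lo, hi):
    """Sort the small subarray a[lo..hi] in place."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def median_of_three(a, lo, hi):
    """Return the index holding the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    return sorted((lo, mid, hi), key=a.__getitem__)[1]


def hybrid_quick_sort(a, lo=0, hi=None):
    """QuickSort with a median-of-3 pivot and an insertion-sort cutoff."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)
        return a
    # Move the chosen pivot to the front of the subarray.
    p = median_of_three(a, lo, hi)
    a[lo], a[p] = a[p], a[lo]
    pivot = a[lo]
    boundary = lo  # a[lo+1..boundary] holds the elements smaller than the pivot
    for k in range(lo + 1, hi + 1):
        if a[k] < pivot:
            boundary += 1
            a[boundary], a[k] = a[k], a[boundary]
    a[lo], a[boundary] = a[boundary], a[lo]
    hybrid_quick_sort(a, lo, boundary - 1)
    hybrid_quick_sort(a, boundary + 1, hi)
    return a


print(hybrid_quick_sort([28, 15, 47, 3, 99, 1, 64, 8, 31, 72, 5, 50]))
```

Median-of-3 avoids the quadratic behavior on already sorted input that a first-element pivot suffers from, and the cutoff avoids recursion overhead on tiny subarrays.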

SLIDE 7

QuickSort: Best Case

T(N) = T(i) + T(N-i-1) + cN
T(N) = 2T(N/2 - 1) + cN
     < 2T(N/2) + cN
     < 4T(N/4) + c(2(N/2) + N)
     < 8T(N/8) + cN(1 + 1 + 1)
     < kT(N/k) + cN log k = O(N log N)

QuickSort: Average Case

  • Assume all size partitions equally likely, with probability 1/N

$$\text{average value of } T(i) \text{ or } T(N-i-1) \text{ is } \frac{1}{N}\sum_{j=0}^{N-1} T(j)$$

$$T(N) = T(i) + T(N-i-1) + cN$$

$$T(N) = \frac{2}{N}\sum_{j=0}^{N-1} T(j) + cN$$

$$T(N) = O(N \log N)$$

  • details: Weiss pg 278-279

SLIDE 8

MergeSort

[Figure: Merging Cars by key [Aggressiveness of driver]. Most aggressive goes first.]

MergeSort (Collection[1..n])
  1. Split Collection in half
  2. Recursively sort each half
  3. merge two sorted halves together

merge (C1[1..n], C2[1..n])
  i1 = 1, i2 = 1
  while i1 <= n and i2 <= n
    if C1[i1] < C2[i2]
      Next is C1[i1]
      i1++
    else
      Next is C2[i2]
      i2++
    endIf
  endwhile
  Copy whatever remains of C1 or C2 to the output
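A runnable sketch of the same split / recurse / merge structure. It returns new lists instead of writing into a shared output array, the two halves may have different lengths, and ties are broken with `<=` so that equal keys keep their input order; those details and the function names are choices made for this example.

```python
def merge(c1, c2):
    """Merge two sorted lists into one sorted list."""
    out = []
    i1 = i2 = 0
    while i1 < len(c1) and i2 < len(c2):
        if c1[i1] <= c2[i2]:  # <= keeps equal keys in input order (stable)
            out.append(c1[i1])
            i1 += 1
        else:
            out.append(c2[i2])
            i2 += 1
    # One list is exhausted; copy whatever remains of the other.
    out.extend(c1[i1:])
    out.extend(c2[i2:])
    return out


def merge_sort(items):
    """Return a sorted copy: split in half, sort each half, merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([5, 1, 3, 4, 3, 2, 1]))
```

The extra list built by `merge` is the usual non-running-time cost of MergeSort: it needs O(N) additional memory.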

MergeSort Analysis

  • Running Time
    – Worst case?
    – Best case?
    – Average case?
  • Other considerations besides running time?

SLIDE 9

Is This The Best We Can Do?

  • Sorting by Comparison
    – Only information available to us is the set of N items to be sorted
    – Only operation available to us is pairwise comparison between 2 items

What is the best running time we can possibly achieve?

Decision Tree Analysis

[Figure: decision tree for sorting three items A, B, C]

– Internal node, with facts known so far
– Leaf node, with ordering of A, B, C
– Edge, with result of one comparison

SLIDE 10

How deep is Decision Tree?

  • How many permutations are there of N numbers?
  • How many leaves does the tree have?
  • What’s the shallowest tree with a given number of leaves?
  • What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm?
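Following the questions through: N distinct numbers have N! permutations, so the tree needs at least N! leaves, and a binary tree with that many leaves has depth at least ⌈log₂(N!)⌉. The small sketch below just evaluates that bound; the function name `min_comparisons` is made up for this example.

```python
import math


def min_comparisons(n):
    """Worst-case comparisons that any comparison sort needs for n distinct items."""
    # A binary decision tree with n! leaves has depth at least ceil(log2(n!)).
    return math.ceil(math.log2(math.factorial(n)))


for n in (3, 4, 5, 10):
    print(n, min_comparisons(n))
```

For three items the bound is 3 comparisons, which matches the depth of a decision tree over A, B, C.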

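Lower Bound for log(n!)

  • Stirling’s approximation:

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}$$

$$\log(n!) \approx \log\left(\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\right) = \log\sqrt{2\pi n} + n\log\frac{n}{e} = \Omega(n \log n)$$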

SLIDE 11

Is This The Best We Can Do?

  • Sorting by Comparison
    – Only information available to us is the set of N items to be sorted
    – Only operation available to us is pairwise comparison between 2 items

What happens if we relax these constraints?

BinSort (a.k.a. BucketSort)

Requires:
– Knowing the keys to be in {1, …, K}
– Having an array of size K

Works by:
Putting items into correct bin (cell) of array, based on key

SLIDE 12

BinSort example

K = 5, list = (5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5)

Bins in array:
key = 1: 1, 1, 1
key = 2: 2
key = 3: 3, 3
key = 4: 4, 4
key = 5: 5, 5, 5

Sorted list: 1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5

BinSort Pseudocode

procedure BinSort (List L, K)
  LinkedList bins[1..K]
  // Each element of array bins is linked list.
  // Could also BinSort with array of arrays.
  For Each number x in L
    bins[x].Append(x)
  End For
  For i = 1..K
    For Each number x in bins[i]
      Print x
    End For
  End For
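The same procedure as a Python sketch, using a list of lists for the bins (the "array of arrays" variant mentioned in the comment) and returning the result instead of printing it. The name `bin_sort` and the unused bin 0 are conveniences assumed here so that keys 1..K index the bins directly.

```python
def bin_sort(items, k):
    """Sort integers known to lie in 1..k by dropping each into its own bin."""
    bins = [[] for _ in range(k + 1)]  # bins[0] is unused; keys run from 1 to k
    for x in items:
        bins[x].append(x)
    # Read the bins back in key order.
    return [x for b in bins[1:] for x in b]


print(bin_sort([5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5], 5))
```

No two items are ever compared with each other, which is how BinSort sidesteps the comparison-sort lower bound.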

SLIDE 13

BinSort Running Time

  • K is a constant
    – BinSort is linear time
  • K is variable
    – Not simply linear time: the cost is O(N + K), since all K bins are visited
  • K is large (e.g. 2^32)
    – Impractical

BinSort is “stable”

Definition: Stable Sorting Algorithm
Items in input with the same key end up in the same order as when they began.

  • BinSort is stable
    – Important if keys have associated values
    – Critical for RadixSort
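Stability is easiest to see when each key carries a value. In this sketch the records are hypothetical (key, label) pairs and `bin_sort_by_key` is a name made up for the example; because each bin is appended to in input order and read back in the same order, records with equal keys never change places.

```python
def bin_sort_by_key(records, k, key):
    """Stable BinSort of arbitrary records whose key(record) lies in 1..k."""
    bins = [[] for _ in range(k + 1)]  # bins[0] is unused
    for record in records:
        bins[key(record)].append(record)  # appending preserves arrival order
    return [record for b in bins[1:] for record in b]


records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
result = bin_sort_by_key(records, 3, key=lambda record: record[0])
print(result)
```

Among the records with key 1, "b" still precedes "d", and among those with key 3, "a" still precedes "c".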

SLIDE 14

Mr. Radix

Herman Hollerith invented and developed a punch-card tabulation machine system that revolutionized statistical computation. Born in Buffalo, New York, the son of German immigrants, Hollerith enrolled in the City College of New York at age 15 and graduated from the Columbia School of Mines with distinction at the age of 19. His first job was with the U.S. Census effort of 1880. Hollerith successively taught mechanical engineering at the Massachusetts Institute of Technology and worked for the U.S. Patent Office. Hollerith began working on the tabulating system during his days at MIT, filing for the first patent in 1884. He developed a hand-fed 'press' that sensed the holes in punched cards; a wire would pass through the holes into a cup of mercury beneath the card, closing the electrical circuit. This process triggered mechanical counters and sorter bins and tabulated the appropriate data.

Hollerith's system, including punch, tabulator, and sorter, allowed the official 1890 population count to be tallied in six months, and in another two years all the census data was completed and defined; the cost was $5 million below the forecasts and saved more than two years' time. His later machines mechanized the card-feeding process, added numbers, and sorted cards, in addition to merely counting data.

In 1896 Hollerith founded the Tabulating Machine Company, forerunner of Computer Tabulating Recording Company (CTR). He served as a consulting engineer with CTR until retiring in 1921. In 1924 CTR changed its name to IBM, the International Business Machines Corporation.

Herman Hollerith
Born February 29, 1860 - Died November 17, 1929

Art of Compiling Statistics; Apparatus for Compiling Statistics

Source: National Institute of Standards and Technology (NIST) Virtual Museum - http://museum.nist.gov/panels/conveyor/hollerithbio.htm

RadixSort

  • Radix = “The base of a number system” (Webster’s dictionary)
    – alternate terminology: radix is number of bits needed to represent 0 to base-1; can say “base 8” or “radix 3”
  • Idea: BinSort on each digit, bottom up.

SLIDE 15

RadixSort: magic! It works.

  • Input list:
    126, 328, 636, 341, 416, 131, 328
  • BinSort on lower digit:
    341, 131, 126, 636, 416, 328, 328
  • BinSort result on next-higher digit:
    416, 126, 328, 328, 131, 636, 341
  • BinSort that result on highest digit:
    126, 131, 328, 328, 341, 416, 636
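The passes above can be reproduced with a short sketch. It assumes non-negative integers, uses one stable BinSort pass per digit starting from the least significant, and records each intermediate list in an optional `trace` argument; the function name and that argument are inventions of this example.

```python
def radix_sort(numbers, base=10, trace=None):
    """Sort non-negative integers with one stable BinSort pass per digit."""
    result = list(numbers)
    if not result:
        return result
    largest = max(result)
    place = 1  # 1 for the lowest digit, then base, base**2, ...
    while place <= largest:
        bins = [[] for _ in range(base)]
        for x in result:
            bins[(x // place) % base].append(x)  # stable: keeps arrival order
        result = [x for b in bins for x in b]
        if trace is not None:
            trace.append(list(result))
        place *= base
    return result


passes = []
print(radix_sort([126, 328, 636, 341, 416, 131, 328], trace=passes))
for p in passes:
    print(p)
```

Each pass costs O(N + B) for base B, so K-digit keys take O(K(N + B)) in total.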

Not magic. It provably works.

  • Keys
    – K-digit numbers
    – base B
  • Claim: after ith BinSort, least significant i digits are sorted.
    – e.g. B = 10, i = 3, keys are 1776 and 8234. 8234 comes before 1776 for last 3 digits.

SLIDE 16

RadixSort Proof by Induction

  • Base case:
    – i = 0. 0 digits are sorted (that wasn’t hard!)
  • Induction step
    – assume for i, prove for i+1.
    – consider two numbers: X, Y. Say X_i is the ith digit of X (from the right)
      • X_{i+1} < Y_{i+1}: then the (i+1)th BinSort will put them in order
      • X_{i+1} > Y_{i+1}: same thing
      • X_{i+1} = Y_{i+1}: order depends on last i digits. Induction hypothesis says already sorted for these digits. (Careful about ensuring that your BinSort preserves order, aka “stable”…)

What types can you RadixSort?

  • Any type T that can be BinSorted
  • Any type T that can be broken into parts A and B, such that:
    – You can reconstruct T from A and B
    – A can be RadixSorted
    – B can be RadixSorted
    – A is always more significant than B, in ordering

SLIDE 17

Example:

  • 1-digit numbers can be BinSorted
  • 2 to 5-digit numbers can be BinSorted without using too much memory
  • 6-digit numbers, broken up into A = first 3 digits, B = last 3 digits.
    – A and B can reconstruct original 6-digits
    – A and B each RadixSortable as above
    – A more significant than B

Radix Sorting Strings

  • 1 Character can be BinSorted
  • Break strings into characters
  • Need to know length of biggest string (or calculate this on the fly).
  • Null-pad shorter strings
  • Running time:
    – N is number of strings
    – L is length of longest string
    – RadixSort takes O(N*L)
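A sketch of those steps for ASCII strings. The 128-bin alphabet is an assumption (characters with codes of 128 or more would need a larger bin array), the null padding is simulated by treating a missing character as code 0, and the function name is made up for this example.

```python
def radix_sort_strings(strings):
    """Sort ASCII strings with one stable BinSort pass per character position."""
    result = list(strings)
    if not result:
        return result
    longest = max(len(s) for s in result)
    # Work from the last character position back to the first.
    for pos in range(longest - 1, -1, -1):
        bins = [[] for _ in range(128)]  # one bin per ASCII code (assumption)
        for s in result:
            # Strings shorter than pos + 1 behave as if null-padded: code 0.
            code = ord(s[pos]) if pos < len(s) else 0
            bins[code].append(s)
        result = [s for b in bins for s in b]
    return result


print(radix_sort_strings(["bob", "al", "alice", "bo", ""]))
```

There are L passes and each touches all N strings plus a fixed number of bins, which gives the O(N*L) bound when the alphabet size is treated as a constant.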

SLIDE 18

To Do

  • Finish Project III (due Wednesday!)
  • Finish reading chapter 7

Coming Up

  • More Algorithms!
  • Sorting
  • Project III due (Wednesday)
  • Unix Tutorial (Tuesday, tomorrow!)