

SLIDE 1

CS 2412 Data Structures

Chapter 10 Sorting and Searching

SLIDE 2

Some concepts

  • Sorting is one of the most common data-processing applications.
  • Sorting algorithms are classed as either internal or external.
  • Sorting order can be either ascending or descending.
  • Sort stability is an attribute of a sort, indicating that data with equal keys maintain their relative input order in the output.
  • Sort efficiency is usually measured by the comparisons and moves required for the sorting. The best possible general sorting algorithms are O(n log n).
  • During the sorting process, each traversal of the data is referred to as a sort pass.

Data Structure 2016, R. Wei

SLIDE 3

Selection sorts

  • Heap sort: already discussed. First build a heap; then repeatedly remove the root, move the last element to the root, and reheap down.
  • Straight selection sort: in each pass, the smallest element is selected from the unsorted sublist and exchanged with the element at the beginning of the unsorted sublist.

SLIDE 5

Algorithm selectionSort (list, last)
set current to 0
loop (until last element sorted)
    set smallest to current
    set walker to current + 1
    loop (walker <= last)
        if (walker key < smallest key)
            set smallest to walker
        end if
        increment walker
    end loop
    exchange (current, smallest)
    increment current
end loop
end selectionSort
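As a sketch, the pseudocode above maps to the following C function (assuming, as in the text's other C examples, that last is the inclusive index of the final element):

```c
/* Straight selection sort: sort list[0 .. last] ascending. */
void selectionSort(int list[], int last)
{
    for (int current = 0; current < last; current++) {
        int smallest = current;
        /* find the smallest element in the unsorted sublist */
        for (int walker = current + 1; walker <= last; walker++)
            if (list[walker] < list[smallest])
                smallest = walker;
        /* exchange it with the first element of the unsorted sublist */
        int temp = list[current];
        list[current] = list[smallest];
        list[smallest] = temp;
    }
}
```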

SLIDE 6

The efficiency of selection sort

  • Straight selection sort: O(n²). The algorithm has two levels of loops, and each loop executes about n times.
  • Heap sort: O(n log n). Building the heap takes about n log n loop iterations, and sorting from the heap takes another n log n; in big-O notation, the complexity is O(n log n).

SLIDE 7

Insertion sorts

  • Straight insertion sort: the list is divided into sorted and unsorted sublists. In each pass, the first element of the unsorted sublist is inserted into the sorted sublist at the correct position.
  • Shell sort: the list is divided into K segments, and each segment is sorted (the segments are dispersed through the list). After each pass, the number of segments is reduced according to an increment sequence. When the number of segments reaches 1, the list is sorted.

SLIDE 9

Algorithm insertionSort (list, last)
set current to 1
loop (until last element sorted)
    move current element to hold
    set walker to current - 1
    loop (walker >= 0 AND hold key < walker key)
        move walker element right one element
        decrement walker
    end loop
    move hold to walker + 1 element
    increment current
end loop
end insertionSort
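A C sketch of this pseudocode (again assuming last is the inclusive index of the final element):

```c
/* Straight insertion sort: sort list[0 .. last] ascending. */
void insertionSort(int list[], int last)
{
    for (int current = 1; current <= last; current++) {
        int hold = list[current];       /* element to insert */
        int walker = current - 1;
        /* shift larger sorted elements right one position */
        while (walker >= 0 && hold < list[walker]) {
            list[walker + 1] = list[walker];
            walker--;
        }
        list[walker + 1] = hold;        /* drop into its place */
    }
}
```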

SLIDE 10

The main idea of Shell sort is to divide the list into segments and use insertion sort on each segment. The positions of the elements of a segment are a distance of increment apart. In the following example, the list is of size 10. The 5 segments for increment K = 5 are as follows:

Segment 1: A[0], A[5]
Segment 2: A[1], A[6]
Segment 3: A[2], A[7]
Segment 4: A[3], A[8]
Segment 5: A[4], A[9]

Then for increment K = 2:

Segment 1: A[0], A[2], A[4], A[6], A[8]
Segment 2: A[1], A[3], A[5], A[7], A[9]

SLIDE 13

Algorithm shellSort (list, last)
set incre to last / 2
loop (incre not 0)
    set current to incre
    loop (until last element sorted)
        move current element to hold
        set walker to current - incre
        loop (walker >= 0 AND hold key < walker key)
            move walker element one increment right
            set walker to walker - incre
        end loop
        move hold to walker + incre element
        increment current
    end loop
    set incre to incre / 2
end loop
end shellSort

SLIDE 14

void shellSort (int list[], int last)
{
    int hold;
    int incre;
    int walker;

    incre = last / 2;
    while (incre != 0) {
        for (int curr = incre; curr <= last; curr++) {
            hold = list[curr];
            walker = curr - incre;
            while (walker >= 0 && hold < list[walker]) {
                list[walker + incre] = list[walker];
                walker = walker - incre;

SLIDE 15

            }  // while
            list[walker + incre] = hold;
        }  // for curr
        incre = incre / 2;
    }  // while
    return;
}  // shellSort

Note: In the above code, the increment starts from n/2, and each pass halves it. This is not the most efficient choice, but it is simple. Ideally, the increments should be chosen so that no two elements appear in the same segment more than once, but this is not easy to achieve in general.

SLIDE 16

Insertion sort efficiency:

  • Straight insertion sort: O(n²). The algorithm has two nested loops; it executes about n(n + 1)/2 iterations.
  • Shell sort: the complexity is difficult to analyze. Empirical studies show that the average sort complexity is about O(n^1.25).

SLIDE 17

Exchange sorts

  • Bubble sort: the list is divided into two sublists: sorted and unsorted. In each pass, the smallest element is bubbled from the unsorted sublist to the end of the sorted sublist.
  • Quick sort: in each pass, a pivot is selected. The elements less than the pivot and the elements greater than or equal to the pivot are separated into two sublists, and the pivot is placed at its ultimately correct location in the list.

SLIDE 18

Example: 23 78 45 8 56 32

8 ∥ 23 78 45 32 56
8 23 ∥ 32 78 45 56
8 23 32 ∥ 45 78 56
8 23 32 45 ∥ 56 78

SLIDE 19

Algorithm bubbleSort (list, last)
set current to 0
set sorted to false
loop (current <= last AND sorted false)
    set walker to last
    set sorted to true
    loop (walker > current)
        if (walker data < walker - 1 data)
            set sorted to false
            exchange (list, walker, walker - 1)
        end if
        decrement walker
    end loop
    increment current
end loop
end bubbleSort

SLIDE 21

Note for quick sort

  • There are different methods for selecting the pivot:
    – Select the first element.
    – Select the middle element.
    – Select the median of three elements: the left, the right, and the element in the middle of the list. This text uses this method.
  • When a partition becomes small, a straight insertion sort can be used, which may be more efficient.

SLIDE 22

Example for one pass of a quick sort:

SLIDE 23

Algorithm medianLeft (sortData, left, right)
set mid to (left + right) / 2
if (left key > mid key)
    exchange (sortData, left, mid)
end if
if (left key > right key)
    exchange (sortData, left, right)
end if
if (mid key > right key)
    exchange (sortData, mid, right)
end if
exchange (sortData, left, mid)    // put pivot in left
end medianLeft
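Combining median-of-three pivot selection with a partition pass gives a compact C quick sort. This is a sketch only: the Hoare-style partition loop below is a common variant, not necessarily the exact loop structure the text uses, and the small-partition insertion sort mentioned above is omitted for brevity.

```c
/* Swap two elements of the list. */
static void exchange(int list[], int i, int j)
{
    int temp = list[i];
    list[i] = list[j];
    list[j] = temp;
}

/* Median-of-three: order left, mid, right, then move the median
   (the pivot) to the left end, as in the pseudocode above. */
static void medianLeft(int list[], int left, int right)
{
    int mid = (left + right) / 2;
    if (list[left] > list[mid])   exchange(list, left, mid);
    if (list[left] > list[right]) exchange(list, left, right);
    if (list[mid]  > list[right]) exchange(list, mid, right);
    exchange(list, left, mid);          /* put pivot in left */
}

/* Quick sort list[left .. right] (inclusive bounds). */
void quickSort(int list[], int left, int right)
{
    if (left >= right)
        return;
    medianLeft(list, left, right);
    int pivot = list[left];
    int i = left, j = right + 1;
    /* Hoare-style partition: scan inward from both ends. */
    while (1) {
        do { i++; } while (i <= right && list[i] < pivot);
        do { j--; } while (list[j] > pivot);
        if (i >= j)
            break;
        exchange(list, i, j);
    }
    exchange(list, left, j);            /* pivot to its final place */
    quickSort(list, left, j - 1);
    quickSort(list, j + 1, right);
}
```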

SLIDE 26

The list in Figure 12-15 is sorted as follows:

SLIDE 27

The exchange sort efficiency:

  • Bubble sort: O(n²). There are two loops in the algorithm; the number of comparisons is about n(n + 1)/2.
  • Quick sort: O(n log n). The algorithm has five loops; however, each partition is in general about half the size of the previous one, so roughly speaking there are log2 n passes in total.

SLIDE 28

void bubbleSort (int list[], int last)
{
    int temp;
    int sorted = 0;

    for (int current = 0; current <= last && !sorted; current++) {
        sorted = 1;
        for (int walker = last; walker > current; walker--) {
            if (list[walker] < list[walker - 1]) {
                sorted = 0;
                temp = list[walker];
                list[walker] = list[walker - 1];
                list[walker - 1] = temp;
            }  // if
        }  // for walker
    }  // for current
    return;
}  // bubbleSort

SLIDE 29

External sorts

In external sorting, portions of the data may be stored in secondary memory during the sorting process. One important method for external sorting is to merge (sorted) files into one sorted file.

SLIDE 30

Merge sorts

A simple merge combines two sorted files into one file. For example, given two sorted lists:

  • 1, 3, 5, 7, 9
  • 2, 4, 6, 8, 10

After merging these two lists, we obtain: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

SLIDE 31

The following algorithm merges two sorted files, file1 and file2. The combined data are written into file3.

Algorithm mergeFiles
open files
read (file1 into record1)
read (file2 into record2)
loop (not end file1 OR not end file2)
    if (record1.key <= record2.key)
        write (record1 to file3)
        read (file1 into record1)
        if (end of file1)
            set record1.key to infinity
        end if
    else
        write (record2 to file3)

SLIDE 32

        read (file2 into record2)
        if (end of file2)
            set record2.key to infinity
        end if
    end if
end loop
close files
end mergeFiles
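The same logic, recast for in-memory arrays rather than files, might look as follows. This is a sketch: INT_MAX stands in for the "infinity" key of the pseudocode, so it assumes the data never contain INT_MAX.

```c
#include <limits.h>

/* Merge sorted a[0 .. na-1] and b[0 .. nb-1] into out[0 .. na+nb-1].
   When one input is exhausted, its key becomes INT_MAX ("infinity"),
   so it always loses the comparison and the other input rolls out. */
void mergeLists(const int a[], int na, const int b[], int nb, int out[])
{
    int i = 0, j = 0, k = 0;
    while (i < na || j < nb) {
        int keyA = (i < na) ? a[i] : INT_MAX;
        int keyB = (j < nb) ? b[j] : INT_MAX;
        if (keyA <= keyB)
            out[k++] = a[i++];
        else
            out[k++] = b[j++];
    }
}
```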

SLIDE 33

Merge unsorted files:

  • Form merge runs for the files. Each run is ordered.
  • The end of each run is identified by a stepdown (a key smaller than its predecessor).
  • Merge each run of the two files.
  • When one run reaches its stepdown, the other run is rolled out (copied to the merged file).

SLIDE 35

The sorting process:

  • Sort phase: divide the file into merge runs according to the size of memory. For example, if we have 2300 records but memory can handle only 500 records, we first read in 500 records and sort them as the first merge run, then read and sort records 501-1000 as the first run of the second merge file, and so on.
  • Merge phase: merge the sorted runs.

SLIDE 37

There are different merge concepts. We discuss three of them as examples:

  • Natural merge: after a merge, all data are written to one file, and a distribution phase is needed to redistribute the data to two files.
  • Balanced merge: uses a constant number of input merge files and the same number of output merge files.
  • Polyphase merge: a constant number of input merge files are merged into one output merge file; each input merge file is reused immediately when its input has been completely merged.

SLIDE 40

Searching

  • Binary search: for sorted lists.
  • Sequential search:
    – Straight sequential search: each time, check whether the key equals the target AND whether it is the last key.
    – Sentinel sequential search: add the target at the end of the list so that each time we need only check whether the key equals the target.
    – Probability search: when a target is found, move the element containing it up one location; in this way, the most frequently sought targets become easier to find.
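A minimal C sketch of the sentinel variant (the function name is illustrative; the caller must reserve one extra slot past last for the sentinel):

```c
/* Sentinel sequential search over list[0 .. last].
   The target is planted just past the end of the data, so the scan
   loop needs only one test per element instead of two.
   Returns the index of the target, or -1 if it is not present.
   Note: list must have room for the extra slot at last + 1. */
int sentinelSearch(int list[], int last, int target)
{
    list[last + 1] = target;          /* sentinel */
    int walker = 0;
    while (list[walker] != target)
        walker++;
    return (walker <= last) ? walker : -1;
}
```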

SLIDE 41

Hashed list searches

  • Hashing is a method that uses key-to-address mapping to find data quickly.
  • The basic idea is to use a hash function to map a key (drawn from a large range) to an index (in a small range) into the data.
  • Some keys may map to the same index (such keys are called synonyms); we then need some method to resolve the collision.
  • The main part of hashing is finding good hashing methods.

SLIDE 43

Hashing methods:

  • Direct method: the range of keys and the range of indexes are the same.
  • Subtraction method: subtract a fixed number from the key. This also requires that both ranges have the same size.
  • Modulo-division method: index = key modulo listSize.
  • Digit-extraction method: select the digits at certain positions of the key as the index.
  • Midsquare method: the key is squared and the middle digits are used as the index.
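Two of these methods are easy to sketch in C. The digit positions chosen for the midsquare example are illustrative assumptions, not prescribed by the text:

```c
/* Modulo-division: index = key % listSize.
   A prime list size tends to spread synonyms more evenly. */
int hashModulo(long key, int listSize)
{
    return (int)(key % listSize);
}

/* Midsquare: square the key and extract middle digits.
   Here we (arbitrarily) drop the last two digits and keep four. */
int hashMidsquare(int key)
{
    long square = (long)key * key;
    return (int)((square / 100) % 10000);
}
```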

SLIDE 44
  • Folding method: fold shift (the key is divided into parts whose size matches the size of the index; the left and right parts are shifted and added to the middle part); fold boundary (the left and right numbers are folded on a fixed boundary between them and the center number; the two outside values are reversed).

SLIDE 45
  • Rotation method: rotate the last character of the key to the front. Usually used in combination with other methods.
  • Pseudorandom method: the key is used as the seed of a pseudorandom number generator; the resulting random number is then scaled into the possible index range.

SLIDE 46

Some concepts used in collision resolution methods:

  • Load factor: the number of elements in the list (k) divided by the number of positions physically allocated for the list (n), expressed as a percentage (preferably less than 75%): α = (k / n) × 100.
  • Clustering: as data are added to a list and collisions are resolved, some hashing algorithms tend to cause data to group within the list.

SLIDE 48

Open addressing to resolve collisions (disadvantage: each collision resolution increases the probability of future collisions).

  • Linear probe: when data cannot be stored in the home address, we resolve the collision by adding 1 to the current address.
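A sketch of open addressing with a linear probe. The table size, the EMPTY marker, and the modulo-division hash are illustrative assumptions:

```c
#define TABLE_SIZE 11   /* illustrative; a prime size is typical */
#define EMPTY (-1)      /* marker for an unused slot */

/* Insert key using modulo-division hashing with a linear probe:
   on a collision, add 1 to the address (wrapping around) until an
   empty slot is found. Returns the address used, or -1 if the
   table is full. */
int linearProbeInsert(int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int probe = 0; probe < TABLE_SIZE; probe++) {
        int addr = (home + probe) % TABLE_SIZE;
        if (table[addr] == EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;   /* table full */
}
```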

SLIDE 49
  • Quadratic probe: the increment is the square of the collision probe number.

SLIDE 50
  • Pseudorandom collision resolution (double hashing): use a pseudorandom number to resolve the collision, with the collision address used as the input to the pseudorandom generator.

SLIDE 51
  • Key offset (double hashing): calculate the new address as a function of the old address and the key. For example:

    offSet = key / listSize
    address = (offSet + old address) modulo listSize
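A direct C rendering of the two formulas above (a sketch; the function and parameter names are illustrative):

```c
/* Key-offset double hashing: the next probe address depends on both
   the key and the colliding (old) address.
   offSet  = key / listSize
   address = (offSet + oldAddress) % listSize */
int keyOffsetNext(long key, int oldAddress, int listSize)
{
    int offSet = (int)(key / listSize);
    return (offSet + oldAddress) % listSize;
}
```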

SLIDE 52

Linked list collision resolution: use a separate area to store collisions and chain all synonyms together in a linked list (usually in LIFO sequence). Two storage areas are used: the prime area and the overflow area.

SLIDE 53

Bucket hashing: keys are hashed to buckets, nodes that accommodate multiple data occurrences. (Disadvantages: more empty space is used, and when a bucket is full, collisions still occur.)

SLIDE 54

Combination approaches may be used: bucket hashing first, then a linear probe if the bucket is full.
