Sorting

2022-11-18 约 2654 字预计阅读 13 分钟

Data Structure

Sorting

QuickSort

Idea

$Process\ after\ dividing \rightarrow$(like MergeSort in thought)

Divide the sequence into two subsequences $$ S=S_L+S_R $$ Downsizing $$ max(|S_L|,|S_R|) < n $$ Be independent of each other $$ max(S_L) \leq min(S_R) $$ After the subsequences are sorted recursively, the original sequence is ordered naturally $$ Sorted(S)=Sorted(S_L)+Sorted(S_R) $$ A trivial solution: With a single element, is itself a solution$\rightarrow Recursive\ base\newline$

Problem

How to delimit subsequence?

Pivot

None of the left/right elements are bigger/smaller than it

$\rightarrow$ With the pivot as the boundary, the partition of the sequence will be realized naturally $$ [l0,hi)=[lo,mi)+[mi]+(mi,hi) $$ Key point: Construct Pivot!

1
2
3
4
5
6
7


template <typename T> 
void Vector<T>::quickSort ( Rank lo, Rank hi ) { //0 <= lo < hi <= size
   if ( hi - lo < 2 ) return; //Recursive base
   Rank mi = partition ( lo, hi ); //Key point
   quickSort ( lo, mi ); 
   quickSort ( mi + 1, hi ); 
}

Partition

Difficulty: pivot may not exist in the original sequence

Good news: The pivot must be in place

In an ordered sequence, all elements are pivots and vice versa

Construct Pivot

Analysis

Unstable $$ \begin{align} &Example:\newline &Before:[…5_a…5_b…]\newline &After:[…5_b…5_a…]\newline \end{align} $$

In-place

$O(1)$ nice!

In-time

Best case

The pivot is always (near) in the center

$$ T(n)=2T(\frac{n-1}{2})+O(n)\newline T(n)=O(n\log n)\newline $$

Worst case

Each division is extremely uneven $$ T(n)=T(n-1)+T(0)+O(n)\newline T(n)=O(n^2)\newline $$

Reduce the worst case probability

Random selection

Middle of three(Randomly select three elements in the sequence and take the middle value)

But it can only reduce the probability, not eliminate

Average

$O(n\log n)\newline$

Take a uniformly independent distributed sequence as an example $$ \begin{align} &order\ the\ rank\ of\ the\ pivot\ is\ k\newline &\rightarrow probability:\frac{1}{n}\newline\newline &then\ T(n)=(n+1)+(\frac{1}{n}) * \sum_{k=0}^{n-1}[T(k)+T(n-k-1)]\newline &It’s\ easy\ to\ see\ that\rightarrow T(k)=T(n-k-1)\newline &T(n)=(n+1)+(\frac{2}{n}) * \sum_{k=0}^{n-1}T(k)\newline &n*T(n)=n * (n+1)+2 * \sum_{k=0}^{n-1}T(k)\ (1)\newline\newline &make\ n = n - 1\newline &(n-1) * T(n-1)=(n-1) * n+2 * \sum_{k=0}^{n-2}T(k)\ (2)\newline\newline &Then\ (1)-(2)\newline &n * T(n) - (n - 1) * T(n - 1) = 2 * n + 2 * T(n-1)\newline &n * T(n) - (n + 1) * T(n - 1)=2 * n\newline &\frac {T(n)}{n + 1} = \frac {T(n - 1)}{n}+\frac {2}{n+1}\newline &=\frac {2}{n+1} + \frac {2}{n} + \frac {2}{n-1} + … + \frac {2}{2} + \frac{T(0)}{1}\newline &=(2\ln2) * \log n \approx 1.39 * \log n\newline \end{align} $$

Algorithm of optimization(LGU)

$$ S=[l0]+L(lo,mi]+G(mi,k)+U[l,hi]\newline L < pivot\leq G\newline $$

code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30


//pivot construction algorithm;
//by adjusting the element position to construct the pivot of the interval [lo, hi] and return its rank
template <typename T>
Rank Vector<T>::partition ( Rank lo, Rank hi ) { //LGU
   swap ( _elem[lo], _elem[ lo + rand() % ( hi - lo ) ] ); //Swap any element with the first element
   T pivot = _elem[lo]; //Take the first element as the candidate pivot 
    					//-- after the above exchange, equivalent to random selection
   int mi = lo;
   //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   //   ---- L < [lo] ----- ] ----- [lo] <= G --- ] [ ----- Unknown -------
   // X x . . . . . . . . . x . . . . . . . . . . . x . . . . . . . . . . x X
   // |                     |                       |                       |
   // lo (pivot candidate)  mi                      k                       hi
   //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
   for ( Rank k = lo + 1; k < hi; k++ ) //Scan from left to right
      if ( _elem[k] < pivot ) //If the current element _elem[k] is less than pivot
         swap ( _elem[++mi], _elem[k] ); //After swapping _elem[k] to the original mi, 
    									 //the L subsequence expands to the right
    
   //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   //   --------- L < [lo] ---------- ] ------------- [lo] <= G ----------]
   // X x . . . . . . . . . . . . . . x . . . . . . . . . . . . . . . . . x X
   // |                               |                                     |
   // lo (pivot candidate)            mi                                    hi
   //++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
   swap ( _elem[lo], _elem[mi] ); //Candidate pivot homing
   return mi; //return its rank
}

In-place

$O(1)\newline$

In-time

$O(n)\newline$

Selection

The time cost of sorting is too high and you need to find other methods

K-selection

In any set of comparably sized elements, how do you go from small to large to find the elements of order k?

Example

Excel: large(range, rank)

Median

The length of the sorted sequence S is n$(range\ from\ 0\ to\ n - 1)$, the element whose order is $\lfloor \frac {n}{2} \rfloor$ is called median

Example

Excel: median(range)

Mode

In an unordered sequence, if more than half of the elements are m, m is called the mode

Example $$ \begin{align} &[3,5,2,3,3]\rightarrow mode\ is\ 3\newline &but\ in\ [3,5,2,3,3,0]\rightarrow no\ mode\newline \end{align} $$

mode algorithm

Trivial algorithm: sort + traversal

but if we want that $\rightarrow T(n) \leq O(n)\ and\ S(n)\leq O(1)?\newline$

Necessary condition: If mode exists, then the mode must also be the median

The mode, if it exists, must be frequent

$Process\ after\ reducing\newline$

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


template <typename T> 
T majEleCandidate ( Vector<T> A ) { 
   T maj;
   for ( int c = 0, i = 0; i < A.size(); i++ )
      if ( 0 == c ) { 
         maj = A[i]; 
          c = 1; 
      } else 
         maj == A[i] ? c++ : c--; 
   return maj;
}

General purpose algorithm

quickSelect

$Process\ after\ reducing\newline$

Partition

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


template <typename T> 
void quickSelect ( Vector<T> & A, Rank k ) {
   for ( Rank lo = 0, hi = A.size() - 1; lo < hi; ) {
      Rank i = lo, j = hi; 
      T pivot = A[lo];
      while ( i < j ) { //O(hi - lo + 1) = O(n)
         while ( ( i < j ) && ( pivot <= A[j] ) ) j--; A[i] = A[j];
         while ( ( i < j ) && ( A[i] <= pivot ) ) i++; A[j] = A[i];
      } //assert: quit with i == j
      A[i] = pivot; // A[0,i) <= A[i] <= A(i, n)
      if ( k <= i ) hi = i - 1;
      if ( i <= k ) lo = i + 1;
   } //A[k] is now a pivot
}

Worst case

$T(n)=O(n)+T(\frac {n}{Q})+T(\frac{3n}{4})\newline$

$if\ we\ want\ that\ T(n)=O(n)\newline$

$we\ should\ make\ \frac{n}{Q}+\frac{3n}{4}<n \newline$

$so\ \frac{1}{Q}+\frac{3}{4}<1\newline$

$make\ Q=5\newline$

$T(n)=cn+T(\frac{n}{5})+T(\frac{3n}{4})\newline$

$then\ T(n)=O(20cn)=O(n)\newline$

linearSelect

Let Q be a small constant

if ( n = |A| < Q ) return trivialSelect( A, k )
else divide A evenly into n/Q subsequences (each of size Q)
Sort each subsequence and determine n/Q medians //e.g. by insertionsort
Call linearSelect() to find M, median of the medians //by recursion
Let L/E/G = { x </=/> M | x $\in$ A }

1
2
3


	if ( k <= |L| ) return linearSelect( L, k ) 
  	if ( k <= |L| + |E| ) return M 
  	return linearSelect( G, k - |L| - |E| )

In-time

$T(n)=O(n)\newline$

ShellSort

Donald L.Shell,1959

Idea

Think of the whole sequence as a matrix, with each column sorted individually$\rightarrow(w-sorting)\newline$

step sequence

An inverted sequence of the width of each matrix

$set(step)={w_1=1,w_2,w_3,…w_k,…}\newline$

diminishing increment

Algorithm

$Call-by-rank\newline$

$The\ width\ of\ the\ matrix\ is\ w,then\ the\ elements\ in\ the\ i^{th}\ column\ is\ a[i+kw],0\leq k <\frac {n}{w}\newline$

Internal sort $\rightarrow$Input sensitive algorithm

$\rightarrow insertionsort(The\ running\ time\ depends\ on\ the\ total\ number\ of\ inversions)\newline$

The biggest factor

$\rightarrow set(step)={w_1=1,w_2,w_3,…w_k,…}\newline$

Example

$set(step)_{shell}={1,2,4,8,…,2^k,…}\newline$

worst-case: $O(n\log n)\newline$

best-case: $O(n)\newline$

code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


template <typename T>
void Vector<T>::shellSort ( Rank lo, Rank hi ) { //0 <= lo < hi <= size <= 2^30
   /*DSA*/ printf ( "\tSHELLsort [%3d, %3d)\n", lo, hi );
   for ( Rank d = 0x3FFFFFFF; 0 < d; d >>= 1 ) //PS Sequence: { 1, 3, 7, 15, ..., 1073741823 }
      for ( Rank j = lo + d; j < hi; j++ ) { //for each j in [lo+d, hi)
         T x = _elem[j]; Rank i = j - d;
         while ( lo <= i && _elem[i] > x )
            { _elem[i + d] = _elem[i]; i -= d; }
         _elem[i + d] = x; //insert [j] into its subsequence
      }
}

Code by Python(use numpy)

1

import numpy as np

SelectionSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


def SelectionSort(x: np.ndarray):
    length = x.size
    CountOfCompare, CountOfMove = 0, 0

    for i in range(length):
        minPosition = i

        for j in range(i + 1, length):
            minPosition = j if x[j] < x[minPosition] else minPosition
            CountOfCompare = CountOfCompare + 1

        if minPosition != i:
            CountOfMove = CountOfMove + 1
            temp = x[i]
            x[i] = x[minPosition]
            x[minPosition] = temp

    return CountOfCompare, CountOfMove

BubbleSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


def BubbleSort(x: np.ndarray):
    length = x.size
    CountOfCompare, CountOfMove = 0, 0

    for i in range(length - 1, 0, -1):
        flag = True

        for j in range(i):
            CountOfCompare = CountOfCompare + 1
            if x[j] > x[j + 1]:
                temp = x[j]
                x[j] = x[j + 1]
                x[j + 1] = temp
                CountOfMove = CountOfMove + 1
                flag = False

        if flag:
            break

    return CountOfCompare, CountOfMove

MergeSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35


def MergeSort(x: np.ndarray):
    CountOfCompare, CountOfMove = 0, 0

    def mergeSort(arr: np.ndarray, lo: int, hi: int):
        if hi - lo < 2: return
        mi = (lo + hi) >> 1
        mergeSort(arr, lo, mi)
        mergeSort(arr, mi, hi)
        merge(arr, lo, mi, hi)

    def merge(Arr: np.ndarray, low: int, middle: int, high: int):
        nonlocal CountOfCompare, CountOfMove
        i, j, k = 0, 0, 0
        B = Arr[low:middle].copy()
        lb, lc = middle - low, high - middle

        while j < lb and k < lc:
            CountOfCompare = CountOfCompare + 1
            CountOfMove = CountOfMove + 1
            if B[j] <= Arr[middle + k]:
                Arr[low + i] = B[j]
                j = j + 1
            else:
                Arr[low + i] = Arr[middle + k]
                k = k + 1
            i = i + 1

        while j < lb:
            CountOfMove = CountOfMove + 1
            Arr[low + i] = B[j]
            i = i + 1
            j = j + 1

    mergeSort(x, 0, x.size)
    return CountOfCompare, CountOfMove

BinaryInsertSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29


def BinaryInsertSort(x: np.ndarray):
    CountOfCompare, CountOfMove = 0, 0

    def Search(arr: np.ndarray, Val, lo: int, hi: int):
        nonlocal CountOfCompare

        while lo < hi:
            mi = (lo + hi) >> 1
            CountOfCompare = CountOfCompare + 1
            if Val < arr[mi]:
                hi = mi
            else:
                lo = mi + 1

        return lo - 1

    def Insert(Arr: np.ndarray, val, Rank: int, Length: int):
        nonlocal CountOfMove
        for _ in range(Length, Rank, -1):
            CountOfMove = CountOfMove + 1
            Arr[_] = Arr[_ - 1]
        Arr[Rank + 1] = val
        CountOfMove = CountOfMove + 1

    for i in range(x.size):
        Local = Search(x, x[i], 0, i)
        Insert(x, x[i], Local, i)

    return CountOfCompare, CountOfMove

InsertSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21


def InsertSort(x: np.ndarray):
    CountOfCompare, CountOfMove = 0, 0

    for i in range(x.size):
        key = x[i]

        j = i - 1
        while j >= 0:
            CountOfCompare = CountOfCompare + 1
            if x[j] > key:
                j = j - 1
            else:
                break
        for _ in range(i, j, -1):
            CountOfMove = CountOfMove + 1
            x[_] = x[_ - 1]

        x[j + 1] = key
        CountOfMove = CountOfMove + 1

    return CountOfCompare, CountOfMove

HeapSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42


def HeapSort(x: np.ndarray):
    CountOfCompare, CountOfMove = 0, 0
    length = x.size

    def BuildHeap(Arr: np.ndarray):
        nonlocal length

        for i in range(length // 2 - 1, -1, -1):
            heapify(Arr, i, length)

    def heapify(arr: np.ndarray, i: int, n: int):
        nonlocal CountOfCompare, CountOfMove

        largest = i
        leftChild = i * 2 + 1
        rightChild = i * 2 + 2

        CountOfCompare = CountOfCompare + 2
        largest = leftChild if leftChild < n and arr[largest] < arr[leftChild] else largest
        largest = rightChild if rightChild < n and arr[largest] < arr[rightChild] else largest

        CountOfCompare = CountOfCompare + 1
        if largest != i:
            CountOfMove = CountOfMove + 1
            temp = arr[i]
            arr[i] = arr[largest]
            arr[largest] = temp
            heapify(arr, largest, n)

    def heap_sort(Array: np.ndarray):
        nonlocal length, CountOfMove
        BuildHeap(Array)

        for i in range(length - 1, 0, -1):
            CountOfMove = CountOfMove + 1
            temp = Array[0]
            Array[0] = Array[i]
            Array[i] = temp
            heapify(Array, 0, i)

    heap_sort(x)
    return CountOfCompare, CountOfMove

ShellSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


def ShellSort(x: np.ndarray):
    CountOfCompare, CountOfMove = 0, 0

    n = x.size
    inc = n // 2

    while inc > 0:
        for i in range(inc, n):
            key = x[i]
            j = i
            while j >= inc and key < x[j - inc]:
                CountOfCompare = CountOfCompare + 1
                CountOfMove = CountOfMove + 1
                x[j] = x[j - inc]
                j = j - inc
            CountOfMove = CountOfMove + 1
            x[j] = key
        inc = inc // 2
    return CountOfCompare, CountOfMove

QuickSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33


def QuickSort(x: np.ndarray):
    CountOfCompare, CountOfMove = 0, 0

    def qsort(arr: np.ndarray, lo: int, hi: int):
        if lo < hi:
            pi = partition(arr, lo, hi)

            qsort(arr, lo, pi - 1)
            qsort(arr, pi + 1, hi)

    def partition(Arr: np.ndarray, low: int, high: int):
        nonlocal CountOfCompare, CountOfMove
        i = low - 1
        pivot = Arr[high]

        for j in range(low, high):
            CountOfCompare = CountOfCompare + 1
            if Arr[j] <= pivot:
                i = i + 1
                temp = Arr[i]
                Arr[i] = Arr[j]
                Arr[j] = temp
                CountOfMove = CountOfMove + 1

        CountOfMove = CountOfMove + 1
        temp = Arr[i + 1]
        Arr[i + 1] = Arr[high]
        Arr[high] = temp

        return i + 1

    qsort(x, 0, x.size - 1)
    return CountOfCompare, CountOfMove

TwoWay_InsertionSort

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39


def TwoWay_InsertionSort(x: np.ndarray):
    CountOfCompare, CountOfMove = 0, 0
    n = x.size
    if n == 0:
        return CountOfCompare, CountOfMove
    first, final = 0, 0
    temp = np.array(np.zeros(n))
    temp[0] = x[0]

    for i in range(1, n):
        if x[i] < temp[first]:
            CountOfCompare = CountOfCompare + 1
            CountOfMove = CountOfMove + 1
            first = (first - 1 + n) % n
            temp[first] = x[i]
        elif x[i] > temp[final]:
            CountOfCompare = CountOfCompare + 2
            CountOfMove = CountOfMove + 1
            final = (final + 1 + n) % n
            temp[final] = x[i]
        else:
            CountOfCompare = CountOfCompare + 3
            k = (final + 1 + n) % n

            while temp[(k - 1 + n) % n] > x[i]:
                CountOfCompare = CountOfCompare + 1
                CountOfMove = CountOfMove + 1
                temp[(k + n) % n] = temp[(k - 1 + n) % n]
                k = (k - 1 + n) % n

            CountOfMove = CountOfMove + 1
            temp[(k + n) % n] = x[i]
            final = (final + 1 + n) % n

    for k in range(n):
        CountOfMove = CountOfMove + 1
        x[k] = temp[(first + k) % n]

    return CountOfCompare, CountOfMove

目录

Sorting

Data Structure

Sorting

QuickSort

Idea

Problem

Pivot

Partition

Construct Pivot

Analysis

In-place

In-time

Best case

Worst case

Average

Algorithm of optimization(LGU)

code

In-place

In-time

Selection

Median

Mode

mode algorithm

General purpose algorithm

quickSelect

Worst case

linearSelect

In-time

ShellSort

Idea

step sequence

Algorithm

Internal sort $\rightarrow$Input sensitive algorithm

The biggest factor

Example

code

Code by Python(use numpy)

SelectionSort

BubbleSort

MergeSort

BinaryInsertSort

InsertSort

HeapSort

ShellSort

QuickSort

TwoWay_InsertionSort