Data Structure
Sorting
QuickSort
Idea
$Process\ after\ dividing \rightarrow$(like MergeSort in thought)
Divide the sequence into two subsequences
$$
S=S_L+S_R
$$
Downsizing
$$
max(|S_L|,|S_R|) < n
$$
Be independent of each other
$$
max(S_L) \leq min(S_R)
$$
After the subsequences are sorted recursively, the original sequence is ordered naturally
$$
Sorted(S)=Sorted(S_L)+Sorted(S_R)
$$
A trivial solution: With a single element, is itself a solution$\rightarrow Recursive\ base\newline$
Problem
How to delimit subsequence?
Pivot
None of the left/right elements are bigger/smaller than it
$\rightarrow$ With the pivot as the boundary, the partition of the sequence will be realized naturally
$$
[l0,hi)=[lo,mi)+[mi]+(mi,hi)
$$
Key point: Construct Pivot!
1
2
3
4
5
6
7
|
template <typename T>
void Vector<T>::quickSort ( Rank lo, Rank hi ) { //0 <= lo < hi <= size
if ( hi - lo < 2 ) return; //Recursive base
Rank mi = partition ( lo, hi ); //Key point
quickSort ( lo, mi );
quickSort ( mi + 1, hi );
}
|
Partition
Difficulty: pivot may not exist in the original sequence
Good news: The pivot must be in place
In an ordered sequence, all elements are pivots and vice versa
Construct Pivot
Analysis
Unstable
$$
\begin{align}
&Example:\newline
&Before:[…5_a…5_b…]\newline
&After:[…5_b…5_a…]\newline
\end{align}
$$
In-place
$O(1)$ nice!
In-time
Best case
The pivot is always (near) in the center
$$
T(n)=2T(\frac{n-1}{2})+O(n)\newline
T(n)=O(n\log n)\newline
$$
Worst case
Each division is extremely uneven
$$
T(n)=T(n-1)+T(0)+O(n)\newline
T(n)=O(n^2)\newline
$$
Reduce the worst case probability
Random selection
Middle of three(Randomly select three elements in the sequence and take the middle value)
But it can only reduce the probability, not eliminate
Average
$O(n\log n)\newline$
Take a uniformly independent distributed sequence as an example
$$
\begin{align}
&order\ the\ rank\ of\ the\ pivot\ is\ k\newline
&\rightarrow probability:\frac{1}{n}\newline\newline
&then\ T(n)=(n+1)+(\frac{1}{n}) * \sum_{k=0}^{n-1}[T(k)+T(n-k-1)]\newline
&It’s\ easy\ to\ see\ that\rightarrow T(k)=T(n-k-1)\newline
&T(n)=(n+1)+(\frac{2}{n}) * \sum_{k=0}^{n-1}T(k)\newline
&n*T(n)=n * (n+1)+2 * \sum_{k=0}^{n-1}T(k)\ (1)\newline\newline
&make\ n = n - 1\newline
&(n-1) * T(n-1)=(n-1) * n+2 * \sum_{k=0}^{n-2}T(k)\ (2)\newline\newline
&Then\ (1)-(2)\newline
&n * T(n) - (n - 1) * T(n - 1) = 2 * n + 2 * T(n-1)\newline
&n * T(n) - (n + 1) * T(n - 1)=2 * n\newline
&\frac {T(n)}{n + 1} = \frac {T(n - 1)}{n}+\frac {2}{n+1}\newline
&=\frac {2}{n+1} + \frac {2}{n} + \frac {2}{n-1} + … + \frac {2}{2} + \frac{T(0)}{1}\newline
&=(2\ln2) * \log n \approx 1.39 * \log n\newline
\end{align}
$$
Algorithm of optimization(LGU)
$$
S=[l0]+L(lo,mi]+G(mi,k)+U[l,hi]\newline
L < pivot\leq G\newline
$$
code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
|
//pivot construction algorithm;
//by adjusting the element position to construct the pivot of the interval [lo, hi] and return its rank
template <typename T>
Rank Vector<T>::partition ( Rank lo, Rank hi ) { //LGU
swap ( _elem[lo], _elem[ lo + rand() % ( hi - lo ) ] ); //Swap any element with the first element
T pivot = _elem[lo]; //Take the first element as the candidate pivot
//-- after the above exchange, equivalent to random selection
int mi = lo;
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// ---- L < [lo] ----- ] ----- [lo] <= G --- ] [ ----- Unknown -------
// X x . . . . . . . . . x . . . . . . . . . . . x . . . . . . . . . . x X
// | | | |
// lo (pivot candidate) mi k hi
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
for ( Rank k = lo + 1; k < hi; k++ ) //Scan from left to right
if ( _elem[k] < pivot ) //If the current element _elem[k] is less than pivot
swap ( _elem[++mi], _elem[k] ); //After swapping _elem[k] to the original mi,
//the L subsequence expands to the right
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// --------- L < [lo] ---------- ] ------------- [lo] <= G ----------]
// X x . . . . . . . . . . . . . . x . . . . . . . . . . . . . . . . . x X
// | | |
// lo (pivot candidate) mi hi
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
swap ( _elem[lo], _elem[mi] ); //Candidate pivot homing
return mi; //return its rank
}
|
In-place
$O(1)\newline$
In-time
$O(n)\newline$
Selection
The time cost of sorting is too high and you need to find other methods
K-selection
In any set of comparably sized elements, how do you go from small to large to find the elements of order k?
Example
Excel: large(range, rank)
The length of the sorted sequence S is n$(range\ from\ 0\ to\ n - 1)$, the element whose order is $\lfloor \frac {n}{2} \rfloor$ is called median
Example
Excel: median(range)
Mode
In an unordered sequence, if more than half of the elements are m, m is called the mode
Example
$$
\begin{align}
&[3,5,2,3,3]\rightarrow mode\ is\ 3\newline
&but\ in\ [3,5,2,3,3,0]\rightarrow no\ mode\newline
\end{align}
$$
mode algorithm
Trivial algorithm: sort + traversal
but if we want that $\rightarrow T(n) \leq O(n)\ and\ S(n)\leq O(1)?\newline$
Necessary condition: If mode exists, then the mode must also be the median
The mode, if it exists, must be frequent
$Process\ after\ reducing\newline$
1
2
3
4
5
6
7
8
9
10
11
|
template <typename T>
T majEleCandidate ( Vector<T> A ) {
T maj;
for ( int c = 0, i = 0; i < A.size(); i++ )
if ( 0 == c ) {
maj = A[i];
c = 1;
} else
maj == A[i] ? c++ : c--;
return maj;
}
|
General purpose algorithm
quickSelect
$Process\ after\ reducing\newline$
Partition
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
template <typename T>
void quickSelect ( Vector<T> & A, Rank k ) {
for ( Rank lo = 0, hi = A.size() - 1; lo < hi; ) {
Rank i = lo, j = hi;
T pivot = A[lo];
while ( i < j ) { //O(hi - lo + 1) = O(n)
while ( ( i < j ) && ( pivot <= A[j] ) ) j--; A[i] = A[j];
while ( ( i < j ) && ( A[i] <= pivot ) ) i++; A[j] = A[i];
} //assert: quit with i == j
A[i] = pivot; // A[0,i) <= A[i] <= A(i, n)
if ( k <= i ) hi = i - 1;
if ( i <= k ) lo = i + 1;
} //A[k] is now a pivot
}
|
Worst case
$T(n)=O(n)+T(\frac {n}{Q})+T(\frac{3n}{4})\newline$
$if\ we\ want\ that\ T(n)=O(n)\newline$
$we\ should\ make\ \frac{n}{Q}+\frac{3n}{4}<n \newline$
$so\ \frac{1}{Q}+\frac{3}{4}<1\newline$
$make\ Q=5\newline$
$T(n)=cn+T(\frac{n}{5})+T(\frac{3n}{4})\newline$
$then\ T(n)=O(20cn)=O(n)\newline$
linearSelect
Let Q be a small constant
-
if ( n = |A| < Q ) return trivialSelect( A, k )
-
else divide A evenly into n/Q subsequences (each of size Q)
-
Sort each subsequence and determine n/Q medians //e.g. by insertionsort
-
Call linearSelect() to find M, median of the medians //by recursion
-
Let L/E/G = { x </=/> M | x $\in$ A }
1
2
3
|
if ( k <= |L| ) return linearSelect( L, k )
if ( k <= |L| + |E| ) return M
return linearSelect( G, k - |L| - |E| )
|
In-time
$T(n)=O(n)\newline$
ShellSort
Donald L.Shell,1959
Idea
Think of the whole sequence as a matrix, with each column sorted individually$\rightarrow(w-sorting)\newline$
step sequence
An inverted sequence of the width of each matrix
$set(step)={w_1=1,w_2,w_3,…w_k,…}\newline$
diminishing increment
Algorithm
$Call-by-rank\newline$
$The\ width\ of\ the\ matrix\ is\ w,then\ the\ elements\ in\ the\ i^{th}\ column\ is\ a[i+kw],0\leq k <\frac {n}{w}\newline$
$\rightarrow insertionsort(The\ running\ time\ depends\ on\ the\ total\ number\ of\ inversions)\newline$
The biggest factor
$\rightarrow set(step)={w_1=1,w_2,w_3,…w_k,…}\newline$
Example
$set(step)_{shell}={1,2,4,8,…,2^k,…}\newline$
worst-case: $O(n\log n)\newline$
best-case: $O(n)\newline$
code
1
2
3
4
5
6
7
8
9
10
11
|
template <typename T>
void Vector<T>::shellSort ( Rank lo, Rank hi ) { //0 <= lo < hi <= size <= 2^30
/*DSA*/ printf ( "\tSHELLsort [%3d, %3d)\n", lo, hi );
for ( Rank d = 0x3FFFFFFF; 0 < d; d >>= 1 ) //PS Sequence: { 1, 3, 7, 15, ..., 1073741823 }
for ( Rank j = lo + d; j < hi; j++ ) { //for each j in [lo+d, hi)
T x = _elem[j]; Rank i = j - d;
while ( lo <= i && _elem[i] > x )
{ _elem[i + d] = _elem[i]; i -= d; }
_elem[i + d] = x; //insert [j] into its subsequence
}
}
|
Code by Python(use numpy)
SelectionSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
def SelectionSort(x: np.ndarray):
length = x.size
CountOfCompare, CountOfMove = 0, 0
for i in range(length):
minPosition = i
for j in range(i + 1, length):
minPosition = j if x[j] < x[minPosition] else minPosition
CountOfCompare = CountOfCompare + 1
if minPosition != i:
CountOfMove = CountOfMove + 1
temp = x[i]
x[i] = x[minPosition]
x[minPosition] = temp
return CountOfCompare, CountOfMove
|
BubbleSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
def BubbleSort(x: np.ndarray):
length = x.size
CountOfCompare, CountOfMove = 0, 0
for i in range(length - 1, 0, -1):
flag = True
for j in range(i):
CountOfCompare = CountOfCompare + 1
if x[j] > x[j + 1]:
temp = x[j]
x[j] = x[j + 1]
x[j + 1] = temp
CountOfMove = CountOfMove + 1
flag = False
if flag:
break
return CountOfCompare, CountOfMove
|
MergeSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
|
def MergeSort(x: np.ndarray):
CountOfCompare, CountOfMove = 0, 0
def mergeSort(arr: np.ndarray, lo: int, hi: int):
if hi - lo < 2: return
mi = (lo + hi) >> 1
mergeSort(arr, lo, mi)
mergeSort(arr, mi, hi)
merge(arr, lo, mi, hi)
def merge(Arr: np.ndarray, low: int, middle: int, high: int):
nonlocal CountOfCompare, CountOfMove
i, j, k = 0, 0, 0
B = Arr[low:middle].copy()
lb, lc = middle - low, high - middle
while j < lb and k < lc:
CountOfCompare = CountOfCompare + 1
CountOfMove = CountOfMove + 1
if B[j] <= Arr[middle + k]:
Arr[low + i] = B[j]
j = j + 1
else:
Arr[low + i] = Arr[middle + k]
k = k + 1
i = i + 1
while j < lb:
CountOfMove = CountOfMove + 1
Arr[low + i] = B[j]
i = i + 1
j = j + 1
mergeSort(x, 0, x.size)
return CountOfCompare, CountOfMove
|
BinaryInsertSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
|
def BinaryInsertSort(x: np.ndarray):
CountOfCompare, CountOfMove = 0, 0
def Search(arr: np.ndarray, Val, lo: int, hi: int):
nonlocal CountOfCompare
while lo < hi:
mi = (lo + hi) >> 1
CountOfCompare = CountOfCompare + 1
if Val < arr[mi]:
hi = mi
else:
lo = mi + 1
return lo - 1
def Insert(Arr: np.ndarray, val, Rank: int, Length: int):
nonlocal CountOfMove
for _ in range(Length, Rank, -1):
CountOfMove = CountOfMove + 1
Arr[_] = Arr[_ - 1]
Arr[Rank + 1] = val
CountOfMove = CountOfMove + 1
for i in range(x.size):
Local = Search(x, x[i], 0, i)
Insert(x, x[i], Local, i)
return CountOfCompare, CountOfMove
|
InsertSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
def InsertSort(x: np.ndarray):
CountOfCompare, CountOfMove = 0, 0
for i in range(x.size):
key = x[i]
j = i - 1
while j >= 0:
CountOfCompare = CountOfCompare + 1
if x[j] > key:
j = j - 1
else:
break
for _ in range(i, j, -1):
CountOfMove = CountOfMove + 1
x[_] = x[_ - 1]
x[j + 1] = key
CountOfMove = CountOfMove + 1
return CountOfCompare, CountOfMove
|
HeapSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
|
def HeapSort(x: np.ndarray):
CountOfCompare, CountOfMove = 0, 0
length = x.size
def BuildHeap(Arr: np.ndarray):
nonlocal length
for i in range(length // 2 - 1, -1, -1):
heapify(Arr, i, length)
def heapify(arr: np.ndarray, i: int, n: int):
nonlocal CountOfCompare, CountOfMove
largest = i
leftChild = i * 2 + 1
rightChild = i * 2 + 2
CountOfCompare = CountOfCompare + 2
largest = leftChild if leftChild < n and arr[largest] < arr[leftChild] else largest
largest = rightChild if rightChild < n and arr[largest] < arr[rightChild] else largest
CountOfCompare = CountOfCompare + 1
if largest != i:
CountOfMove = CountOfMove + 1
temp = arr[i]
arr[i] = arr[largest]
arr[largest] = temp
heapify(arr, largest, n)
def heap_sort(Array: np.ndarray):
nonlocal length, CountOfMove
BuildHeap(Array)
for i in range(length - 1, 0, -1):
CountOfMove = CountOfMove + 1
temp = Array[0]
Array[0] = Array[i]
Array[i] = temp
heapify(Array, 0, i)
heap_sort(x)
return CountOfCompare, CountOfMove
|
ShellSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
|
def ShellSort(x: np.ndarray):
CountOfCompare, CountOfMove = 0, 0
n = x.size
inc = n // 2
while inc > 0:
for i in range(inc, n):
key = x[i]
j = i
while j >= inc and key < x[j - inc]:
CountOfCompare = CountOfCompare + 1
CountOfMove = CountOfMove + 1
x[j] = x[j - inc]
j = j - inc
CountOfMove = CountOfMove + 1
x[j] = key
inc = inc // 2
return CountOfCompare, CountOfMove
|
QuickSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
|
def QuickSort(x: np.ndarray):
CountOfCompare, CountOfMove = 0, 0
def qsort(arr: np.ndarray, lo: int, hi: int):
if lo < hi:
pi = partition(arr, lo, hi)
qsort(arr, lo, pi - 1)
qsort(arr, pi + 1, hi)
def partition(Arr: np.ndarray, low: int, high: int):
nonlocal CountOfCompare, CountOfMove
i = low - 1
pivot = Arr[high]
for j in range(low, high):
CountOfCompare = CountOfCompare + 1
if Arr[j] <= pivot:
i = i + 1
temp = Arr[i]
Arr[i] = Arr[j]
Arr[j] = temp
CountOfMove = CountOfMove + 1
CountOfMove = CountOfMove + 1
temp = Arr[i + 1]
Arr[i + 1] = Arr[high]
Arr[high] = temp
return i + 1
qsort(x, 0, x.size - 1)
return CountOfCompare, CountOfMove
|
TwoWay_InsertionSort
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
|
def TwoWay_InsertionSort(x: np.ndarray):
CountOfCompare, CountOfMove = 0, 0
n = x.size
if n == 0:
return CountOfCompare, CountOfMove
first, final = 0, 0
temp = np.array(np.zeros(n))
temp[0] = x[0]
for i in range(1, n):
if x[i] < temp[first]:
CountOfCompare = CountOfCompare + 1
CountOfMove = CountOfMove + 1
first = (first - 1 + n) % n
temp[first] = x[i]
elif x[i] > temp[final]:
CountOfCompare = CountOfCompare + 2
CountOfMove = CountOfMove + 1
final = (final + 1 + n) % n
temp[final] = x[i]
else:
CountOfCompare = CountOfCompare + 3
k = (final + 1 + n) % n
while temp[(k - 1 + n) % n] > x[i]:
CountOfCompare = CountOfCompare + 1
CountOfMove = CountOfMove + 1
temp[(k + n) % n] = temp[(k - 1 + n) % n]
k = (k - 1 + n) % n
CountOfMove = CountOfMove + 1
temp[(k + n) % n] = x[i]
final = (final + 1 + n) % n
for k in range(n):
CountOfMove = CountOfMove + 1
x[k] = temp[(first + k) % n]
return CountOfCompare, CountOfMove
|