The kth quantiles of an n-element set are the k-1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O(nlgk)-time algorithm to list the kth quantiles of a set.
I did a lot of jobs to figure out what the hell the question is actually talking about. In fact, there would have been k groups fragments in an n-size set/list which are divided by k-1 order statistics named quantiles after evaluating one's algorithm. At first, the set/list is unsorted and it can be hardly unsorted after execution.
So hereafter I make up 2 functions - k_quantiles_sorted_n
(which costs Θ(n)) & k_quantiles_sorted_lgk
(which takes Θ(lgk)) - to solve the already-sorted-array problems. In these cases, I just work out an array of indices which are on behave of the positions of dividers.
In the end, I take out the original problem with function k_quantiles_unsorted_nlgk
and make it by recursion which consumes Θ(nlgk). As you'll see below, the gross problem is separated into 2 subroutines, each of n(1-1/k)/2 to n/2 scale.
This works homogeneous with the same size of group when the number of elements (aka n) is ak+k?1 for a positive integer a. Once it were a different number, the interval among groups would be heterogeneous. some additive care with rounding needs to be taken in order to avoid creating two segments that differ by more than 1.
The inherent consumption takes place in the select
function which takes up to Θ(n) time. The base case is that the subproblem is divided into a scale of T(n/k) with a item of Θ(1) so that you could immediately return. The recursion here is
T(n) <= 2T(n/2) + Θ(n) <= 4T(n/4) + 2Θ(n) <= 8T(n/8) + 3Θ(n) ...
<= kT(n/k) + lgkΘ(n) = Θ(nlgk)
It would be nice to illustrate by a binary recursion tree with a depth of lgk and leaves of T(n/k). As we could see, the last subproblem scale determines the log part in final Θ(nlgk), which reflected by the depth of the recursion tree. Furthermore, we could find that if the last problem scale is Θ(1), that is n/n, then the consumption goes up to Θ(nlgn), which means the groups count will be the last divisor. The less groups we divide at last, the less log part we spend.
i.e. Note that the slice of array in python is returning a new array(aka slice for copy) which doesn't correspond to Swift. I've spent a lot of time getting rid of the startIndex & endIndex parameters using slice of array. But it all went in vain as a result since we cannot regard the slice as a reference of original array partially. Probably it'd take effect with memoryview
or something which is quite annoying.
As we used to say, unit tests run first.
Test Driven Code:
import unittest
class TestCase(unittest.TestCase):
def setUp(self):
self.maxDiff = None
def test_k_quantiles_sorted(self):
for j in range(0, 100):
n = randint(10, 20)
k = randint(0, n+1)
# l = k_quantiles_sorted_n(n, k, 0)
l = k_quantiles_sorted_lgk(n, k)
count = len(l)
self.assertGreaterEqual(count, 0)
if count == 0:
continue
sentinel_1 = l[0]
sentinel_2 = -1
self.assertLess(sentinel_1, n)
for i in range(1, count+1):
scope = 0
if i == count:
scope = n - l[i-1] - 1
else:
self.assertLess(l[i], n)
scope = l[i] - l[i-1] - 1
difference = abs(sentinel_1 - scope)
self.assertLessEqual(difference, 1)
if difference == 0:
continue
else:
if sentinel_2 < 0:
sentinel_2 = difference
else:
self.assertEqual(sentinel_2, difference)
def test_k_quantiles_unsorted_nlgk(self):
for j in range(0, 100):
n = randint(50, 100)
k = randint(0, n+1)
l = list(range(0, n))
quantiles = k_quantiles_unsorted_nlgk(l, 0, len(l)-1, k)
count = len(quantiles)
self.assertGreaterEqual(count, 0)
if count == 0:
continue
dividor = k - 1
remain = n - dividor
base = remain // k
additive = base + 1
additiveGroup = remain % k
baseGroup = k - additiveGroup
current = -1
quantiles.append(n)
for i in range(0, len(quantiles)):
scope = quantiles[i]-current-1
current = quantiles[i]
self.assertTrue(scope==base or scope==additive)
if scope == base:
baseGroup -= 1
elif scope == additive:
additiveGroup -= 1
self.assertTrue(baseGroup==0 and additiveGroup==0)
if __name__ == '__main__':
unittest.main()
Then it goes with source code:
def k_quantiles_sorted_n(n:int, k:int, base:int) -> []:
if k <= 1 or k > n+1:
return[]
remainder = k % 2
if remainder:
midst = n // k
remains = n-midst-2
n_left = remains // 2
n_right = remains - n_left
k_child = (k-1) // 2
left = base + n_left
right = base + n - n_right - 1
return k_quantiles_sorted_n(n_left, k_child, base) + \
[left, right] + k_quantiles_sorted_n(n_right, k_child, right+1)
else:
k_child = k // 2
n_left = n // 2
n_right = n - n_left - 1
halver = base + n_left
return k_quantiles_sorted_n(n_left, k_child, base) + \
[halver] + k_quantiles_sorted_n(n_right, k_child, halver+1)
def k_quantiles_sorted_lgk(n:int, k:int) -> []:
if k < 2 or k > n+1:
return[]
dividor = k - 1
remain = n - dividor
base = remain // k
additiveGroup = remain % k
additiveDividor = additiveGroup - 1
baseGroup = k - additiveGroup
baseDividor = baseGroup - 1
if additiveGroup > 0:
baseDividor += 1
quantiles = []
current = -1
for i in range(0, baseDividor):
current += base+1
quantiles.append(current)
for j in range(0, additiveDividor):
current += base+2
quantiles.append(current)
return quantiles
def k_quantiles_unsorted_nlgk(array:[], startIndex:int, endIndex:int, k:int) -> []:
n = endIndex - startIndex + 1
if k < 2 or k > n+1:
return[]
k_child = k // 2
remainder = k % 2
if remainder:
midstCount = n // k
remainCount = n - midstCount - 2
leftCount = remainCount // 2
rightCount = remainCount - leftCount
leftIndex = startIndex + leftCount
rightIndex = leftIndex + midstCount + 1
localLeftIndex = leftCount
localRightIndex = midstCount
left = select(array, startIndex, endIndex, localLeftIndex)
right = select(array, leftIndex+1, endIndex, localRightIndex)
return k_quantiles_unsorted_nlgk(array, startIndex, leftIndex-1, k_child) + \
[left, right] + k_quantiles_unsorted_nlgk(array, rightIndex+1, endIndex, k_child)
else:
leftCount = n // 2
halverIndex = startIndex + leftCount
localHalverIndex = leftCount
halver = select(array, startIndex, endIndex, localHalverIndex)
return k_quantiles_unsorted_nlgk(array, startIndex, halverIndex-1, k_child) + \
[halver] + k_quantiles_unsorted_nlgk(array, halverIndex+1, endIndex, k_child)