Majority Element

Problem

Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊n/2⌋ times.

You may assume that the array is non-empty and that the majority element always exists in the array.

Approach #1 Brute Force

Intuition
We can exhaust the search space in quadratic time by checking whether each element is the majority element.

Algorithm
The brute force algorithm iterates over the array and, for each number, iterates over the array again to count its occurrences. As soon as a number is found to appear more than ⌊n/2⌋ times (more often than any other element can possibly appear), return it.

#include <iostream>
#include <vector>

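// Returns the element that appears more than n/2 times by counting
// every candidate with a full scan of the array.
int majorityElement(const std::vector<int>& nums)
{
    int size = (int)nums.size();
    int halfCount = size / 2;
    
    for (auto num : nums)
    {
        // Count how many times the current candidate occurs.
        int count = 0;
        
        for (auto elem : nums)
        {
            if (elem == num)
            {
                ++count;
            }
        }
        
        // A majority element occurs more than floor(n/2) times.
        if (count > halfCount)
        {
            return num;
        }
    }
    
    // Unreachable when the input is guaranteed to contain a majority element.
    return -1;
}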

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

Time complexity : O(n^2)
The brute force algorithm contains two nested for loops that each run for n iterations, adding up to quadratic time complexity.
Space complexity : O(1)
The brute force solution does not allocate additional space proportional to the input size.

Approach #2 HashMap

Intuition
We know that the majority element occurs more than ⌊n/2⌋ times, and a HashMap allows us to count element occurrences efficiently.

Algorithm
We can use a HashMap that maps elements to counts in order to count occurrences in linear time by looping over nums. Then, we return the element whose count exceeds ⌊n/2⌋, which is also the key with the maximum count.

#include <iostream>
#include <vector>
#include <unordered_map>

int majorityElement(std::vector<int>& nums)
{
    // Count occurrences; operator[] inserts a missing key with count 0 first.
    std::unordered_map<int, int> counts;
    for (auto num : nums)
    {
        ++counts[num];
    }
    
    // Return the element whose count exceeds floor(n/2).
    int size = (int)nums.size();
    int halfCount = size / 2;
    
    for (auto elem : nums)
    {
        if (counts[elem] > halfCount)
        {
            return elem;
        }
    }
    
    return -1;
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

Time complexity : O(n)
We iterate over nums once, making an expected constant-time HashMap update on each iteration, and then scan nums once more to find the element whose count exceeds ⌊n/2⌋. Therefore, the algorithm runs in O(n) expected time.
Space complexity : O(n)
At most, the HashMap can contain n - ⌊n/2⌋ associations, so it occupies O(n) space. An arbitrary array of length n can contain n distinct values, but nums is guaranteed to contain a majority element, which occupies at least ⌊n/2⌋ + 1 array indices. Therefore, at most n - (⌊n/2⌋ + 1) indices can be occupied by distinct, non-majority elements; adding 1 for the majority element itself leaves at most n - ⌊n/2⌋ distinct elements.

Approach #3 Sorting

Intuition
If the elements are sorted in monotonically increasing (or decreasing) order, the majority element can be found at (0-based) index ⌊n/2⌋ (and, incidentally, also at index n/2 - 1 if n is even).

Algorithm
For this algorithm, we simply do exactly what is described: sort nums, and return the element in question. To see why this will always return the majority element (given that the array has one), consider the figure below (the top example is for an odd-length array and the bottom is for an even-length array):

For each example, the line below the array denotes the range of indices that are covered by a majority element that happens to be the array minimum. As you might expect, the line above the array is similar, but for the case where the majority element is the array maximum. In all other cases, this line will lie somewhere between these two, but notice that even in these two most extreme cases, the lines overlap at index ⌊n/2⌋ for both even- and odd-length arrays. Therefore, no matter what value the majority element has in relation to the rest of the array, returning the value at index ⌊n/2⌋ will never be wrong.

#include <iostream>
#include <vector>
#include <algorithm>

int majorityElement(std::vector<int>& nums)
{
    // Sort in place; the majority element must occupy the middle (0-based) index.
    std::sort(nums.begin(), nums.end());
    return nums[nums.size() / 2];
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

Time complexity : O(n lg n)
Sorting the array with std::sort costs O(n lg n) time (the C++11 standard requires O(n log n) comparisons), so it dominates the overall runtime.
Space complexity : O(1) or O(n)
We sorted nums in place here; if that is not allowed, then we must spend linear additional space on a copy of nums and sort the copy instead.

Approach #4 Randomization

Intuition
Because more than ⌊n/2⌋ array indices are occupied by the majority element, a random array index is likely to contain the majority element.

Algorithm
Because a given index is likely to have the majority element, we can just select a random index, check whether its value is the majority element, return if it is, and repeat if it is not. The algorithm is verifiably correct because we ensure that the randomly chosen value is the majority element before ever returning.

Complexity Analysis

Time complexity : O(∞)
It is technically possible for this algorithm to run indefinitely (if we never manage to randomly select the majority element), so the worst possible runtime is unbounded. However, the expected runtime is far better - linear, in fact. For ease of analysis, convince yourself that because the majority element is guaranteed to occupy more than half of the array, the expected number of iterations will be less than it would be if the element we sought occupied exactly half of the array. Therefore, we can calculate the expected number of iterations for this modified version of the problem and assert that our version is easier.

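In the modified problem, each iteration independently selects the majority element with probability 1/2, so the probability that the first success happens on iteration i is (1/2)^i, and the expected number of iterations is

E = Σ_{i=1}^{∞} i · (1/2)^i = 2

Because the series converges, the expected number of iterations for the modified problem is constant. Each iteration performs linear work (counting the candidate's occurrences), so the expected runtime is linear for the modified problem. Therefore, the expected runtime for our problem is also linear, as the runtime of the modified problem serves as an upper bound for it.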

Space complexity : O(1)
Much like the brute force solution, the randomized approach runs with constant additional space.

Approach #5 Divide and Conquer

Intuition
If we know the majority element in the left and right halves of an array, we can determine which is the global majority element in linear time.

Algorithm
Here, we apply a classical divide & conquer approach that recurses on the left and right halves of an array until an answer can be trivially achieved for a length-1 array. Because passing copies of subarrays would cost time and space, we instead pass lo and hi indices that describe the relevant half-open slice [lo, hi) of the overall array. The majority element of a length-1 slice is trivially its only element, so the recursion stops there. If the current slice is longer than 1, we must combine the answers for the slice's left and right halves. If they agree on the majority element, then the majority element for the overall slice is obviously the same. If they disagree, only one of them can be "right", so we count the occurrences of the left and right majority elements within the current slice to determine which subslice's answer is correct for that slice. The overall answer for the array is thus the majority element between indices 0 and n.

#include <iostream>
#include <vector>
#include <algorithm>

int countInRange(std::vector<int>& nums, int num, int lo, int hi)
{
    int count = 0;
    for (int i = lo; i < hi; ++i)
    {
        if (nums[i] == num)
        {
            ++count;
        }
    }
    
    return count;
}

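// Returns the majority-element candidate of the half-open slice [lo, hi).
int majorityElementRec(std::vector<int>& nums, int lo, int hi)
{
    // Base case: a length-1 slice's only element is its majority element.
    if (lo == hi - 1)
    {
        return nums[lo];
    }
    
    // Recurse on the left [lo, mid) and right [mid, hi) halves.
    int mid = lo + (hi - lo) / 2;
    int left = majorityElementRec(nums, lo, mid);
    int right = majorityElementRec(nums, mid, hi);
    
    // If the halves agree, that element is the slice's answer.
    if (left == right)
    {
        return left;
    }
    
    // Otherwise, count both candidates over the whole slice and keep the more frequent one.
    int leftCount = countInRange(nums, left, lo, hi);
    int rightCount = countInRange(nums, right, lo, hi);
    
    return leftCount > rightCount ? left : right;
}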

int majorityElement(std::vector<int>& nums)
{
    return majorityElementRec(nums, 0, (int)nums.size());
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

Time complexity : O(n lg n)
Each recursive call to majorityElementRec performs two recursive calls on subslices of size n/2 and two linear scans of length n. Therefore, the time complexity of the divide & conquer approach can be represented by the following recurrence relation:
T(n) = 2T(n/2) + 2n

By the master theorem, the recurrence satisfies case 2: here a = 2, b = 2, and f(n) = 2n = Θ(n^(log_2 a)) = Θ(n), so the complexity is
T(n) = Θ(n^(log_2 a) · lg n) = Θ(n lg n)

Space complexity : O(lg n)
Although the divide & conquer approach does not explicitly allocate any additional memory, it uses a non-constant amount of additional memory in stack frames due to recursion. Because the algorithm "cuts" the array in half at each level of recursion, there can only be O(lg n) "cuts" before the base case of length 1 is reached. It follows that the resulting recursion tree is balanced, and therefore all paths from the root to a leaf have length O(lg n).

Because the recursion tree is traversed in a depth-first manner, the space complexity is equivalent to the length of the longest path, which is, of course, O(lg n).

Approach #6 Boyer-Moore Voting Algorithm

Intuition
If we had some way of counting instances of the majority element as +1 and instances of any other element as -1, summing them would make it obvious that the majority element is indeed the majority element.

Algorithm
Essentially, what Boyer-Moore does is look for a suffix suf of nums where suf[0] is the majority element in that suffix. To do this, we maintain a count, which is incremented whenever we see an instance of our current candidate for majority element and decremented whenever we see anything else.

Whenever count equals 0, we effectively forget about everything in nums up to the current index and consider the current number as the candidate for majority element. It is not immediately obvious why we can get away with forgetting prefixes of nums - consider the following examples (pipes are inserted to separate runs of nonzero count).
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 7, 7, 7, 7]

Here, the 7 at index 0 is selected to be the first candidate for majority element. count will eventually reach 0 after index 5 is processed, so the 5 at index 6 will be the next candidate. In this case, 7 is the true majority element, so by disregarding this prefix, we are ignoring an equal number of majority and minority elements - therefore, 7 will still be the majority element in the suffix formed by throwing away the first prefix.
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 5, 5, 5, 5]

Now, the majority element is 5 (we changed the last run of the array from 7s to 5s), but our first candidate is still 7. In this case, our candidate is not the true majority element, but we still cannot discard more majority elements than minority elements (this would imply that count could reach -1 before we reassign candidate, which is obviously false).

Therefore, given that it is impossible (in both cases) to discard more majority elements than minority elements, we are safe in discarding the prefix and attempting to recursively solve the majority element problem for the suffix. Eventually, a suffix will be found for which count does not hit 0, and the majority element of that suffix will necessarily be the same as the majority element of the overall array.

#include <iostream>
#include <vector>
#include <algorithm>

int majorityElement(std::vector<int>& nums)
{
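    int count = 0;      // net votes for the current candidate
    int candidate = 0;
    
    for (auto num : nums)
    {
        // count == 0: discard the prefix so far and start a new candidate.
        if (0 == count)
        {
            candidate = num;
        }
        
        // +1 for the candidate, -1 for anything else.
        count += (candidate == num) ? 1 : -1;
    }
    
    // Correct only because a majority element is guaranteed to exist.
    return candidate;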
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

Time complexity : O(n)
Boyer-Moore performs constant work exactly n times, so the algorithm runs in linear time.
Space complexity : O(1)
Boyer-Moore allocates only constant additional memory.