Chapter 4: Divide and Conquer
Activity 8: Vaccinations
In this activity, we will store and lookup the vaccination status of students to determine if they need to be vaccinated. These steps should help you complete the activity:
- Begin by including the following headers:
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <algorithm>
#include <numeric>
- Define the Student class as follows:
class Student
{
private:
    std::pair<int, int> name;
    bool vaccinated;
public:
    // Constructor
    Student(std::pair<int, int> n, bool v) :
        name(n), vaccinated(v)
    {}
    // Getters
    auto get_name() { return name; }
    auto is_vaccinated() { return vaccinated; }
    // Two people are same if they have the same name
    bool operator ==(const Student& p) const
    {
        return this->name == p.name;
    }
    // The ordering of a set of people is defined by their name
    bool operator< (const Student& p) const
    {
        return this->name < p.name;
    }
    bool operator> (const Student& p) const
    {
        return this->name > p.name;
    }
};
- The following function lets us generate a student from random data:
auto generate_random_Student(int max)
{
    std::random_device rd;
    std::mt19937 rand(rd());
    // the IDs of Student should be in range [1, max]
    std::uniform_int_distribution<std::mt19937::result_type> uniform_dist(1, max);
    // Generate random credentials
    auto random_name = std::make_pair(uniform_dist(rand), uniform_dist(rand));
    bool is_vaccinated = uniform_dist(rand) % 2 ? true : false;
    return Student(random_name, is_vaccinated);
}
- The following code is used to run and test the output of our implementation:
void search_test(int size, Student p)
{
    std::vector<Student> people;
    // Create a list of random people
    for (auto i = 0; i < size; i++)
        people.push_back(generate_random_Student(size));
    std::sort(people.begin(), people.end());
    // To measure the time taken, start the clock
    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
    bool search_result = needs_vaccination(p, people);
    // Stop the clock
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    std::cout << "Time taken to search = " <<
        std::chrono::duration_cast<std::chrono::microseconds>
        (end - begin).count() << " microseconds" << std::endl;
    if (search_result)
        std::cout << "Student (" << p.get_name().first
<< " " << p.get_name().second << ") "
            << "needs vaccination." << std::endl;
    else
        std::cout << "Student (" << p.get_name().first
<< " " << p.get_name().second << ") "
            << "does not need vaccination." << std::endl;
}
- The following function implements our logic for whether a vaccination is needed:
bool needs_vaccination(Student P, std::vector<Student>& people)
{
    auto first = people.begin();
    auto last = people.end();
    while (true)
    {
        auto range_length = std::distance(first, last);
        auto mid_element_index = std::floor(range_length / 2);
        auto mid_element = *(first + mid_element_index);
        // Return true if the Student is found in the sequence and
// he/she's not vaccinated
        if (mid_element == P && mid_element.is_vaccinated() == false)
            return true;
        else if (mid_element == P && mid_element.is_vaccinated() == true)
            return false;
        else if (mid_element > P)
            std::advance(last, -mid_element_index);
        if (mid_element < P)
            std::advance(first, mid_element_index);
        // Student not found in the sequence and therefore should be vaccinated
        if (range_length == 1)
            return true;
    }
}
- Finally, the driver code is implemented as follows:
int main()
{
    // Generate a Student to search
    auto p = generate_random_Student(1000);
    search_test(1000, p);
    search_test(10000, p);
    search_test(100000, p);
    return 0;
}
Note
Since we are randomizing values in step 3, your output may vary from the expected output shown for this activity.
Activity 9: Partial Sorting
The partial quicksort is only a slight modification of the original quicksort algorithm that was demonstrated in Exercise 20, Quicksort. Compared to that exercise, only step 4 is different. The following is a reference implementation:
- Add the following header files:
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <algorithm>
- Next, we shall implement the partition operation, as follows:
template <typename T>
auto partition(typename std::vector<T>::iterator begin,
    typename std::vector<T>::iterator end)
{
    auto pivot_val = *begin;
    auto left_iter = begin + 1;
    auto right_iter = end;
    while (true)
    {
        // Starting from the first element of vector,
        // find an element that is greater than pivot.
        while (*left_iter <= pivot_val && std::distance(left_iter, right_iter) > 0)
            left_iter++;
        // Starting from the end of vector moving to the beginning,
        // find an element that is lesser than the pivot.
        while (*right_iter > pivot_val && std::distance(left_iter, right_iter) > 0)
            right_iter--;
        // If left and right iterators meet, there are no elements left to swap.
        // Else, swap the elements pointed to by the left and right iterators
        if (left_iter == right_iter)
            break;
        else
            std::iter_swap(left_iter, right_iter);
    }
    if (pivot_val > *right_iter)
        std::iter_swap(begin, right_iter);
    return right_iter;
}
- Since the desired output also needs an implementation of the quicksort algorithm, we'll implement one as follows:
template <typename T>
void quick_sort(typename std::vector<T>::iterator begin,
    typename std::vector<T>::iterator last)
{
    // If there are more than 1 elements in the vector
    if (std::distance(begin, last) >= 1)
    {
        // Apply the partition operation
        auto partition_iter = partition<T>(begin, last);
        // Recursively sort the vectors created by the partition operation
        quick_sort<T>(begin, partition_iter-1);
        quick_sort<T>(partition_iter, last);
    }
}
- Implement the partial quicksort function as follows:
template <typename T>
void partial_quick_sort(typename std::vector<T>::iterator begin,
    typename std::vector<T>::iterator last,
    size_t k)
{
    // If there are more than 1 elements in the vector
    if (std::distance(begin, last) >= 1)
    {
        // Apply the partition operation
        auto partition_iter = partition<T>(begin, last);
        // Recursively sort the vectors created by the partition operation
        partial_quick_sort<T>(begin, partition_iter-1, k);
       Â
        // Sort the right subvector only if the final position of pivot < k
        if(std::distance(begin, partition_iter) < k)
            partial_quick_sort<T>(partition_iter, last, k);
    }
}
- The following helper functions can be then used to print the contents of a vector and to generate a random vector:
template <typename T>
void print_vector(std::vector<T> arr)
{
    for (auto i : arr)
        std::cout << i << " ";
    std::cout << std::endl;
}
// Generates random vector of a given size with integers [1, size]
template <typename T>
auto generate_random_vector(T size)
{
    std::vector<T> V;
    V.reserve(size);
    std::random_device rd;
    std::mt19937 rand(rd());
    // the IDs of Student should be in range [1, max]
    std::uniform_int_distribution<std::mt19937::result_type> uniform_dist(1, size);
    for (T i = 0; i < size; i++)
        V.push_back(uniform_dist(rand));
    return std::move(V);
}
- The following function implements the testing logic for our sorting functions:
// Sort the first K elements of a random vector of a given 'size'
template <typename T>
void test_partial_quicksort(size_t size, size_t k)
{
        // Create two copies of a random vector to use for the two algorithms
        auto random_vec = generate_random_vector<T>(size);
        auto random_vec_copy(random_vec);
        std::cout << "Original vector: "<<std::endl;
        print_vector<T>(random_vec);
           Â
        // Measure the time taken by partial quick sort
        std::chrono::steady_clock::time_point
begin_qsort = std::chrono::steady_clock::now();
        partial_quick_sort<T>(random_vec.begin(), random_vec.end()-1, k);
        std::chrono::steady_clock::time_point
end_qsort = std::chrono::steady_clock::now();
   Â
        std::cout << std::endl << "Time taken by partial quick sort = "
            << 'std::chrono::duration_cast<std::chrono::microseconds>
            (end_qsort - begin_qsort).count()
            << " microseconds" << std::endl;
   Â
        std::cout << "Partially sorted vector (only first "<< k <<" elements):";
        print_vector<T>(random_vec);
        // Measure the time taken by partial quick sort
        begin_qsort = std::chrono::steady_clock::now();
        quick_sort<T>(random_vec_copy.begin(), random_vec_copy.end()-1);
        end_qsort = std::chrono::steady_clock::now();
        std::cout << std::endl <<"Time taken by full quick sort = "
            << std::chrono::duration_cast<std::chrono::microseconds>
            (end_qsort - begin_qsort).count()
            << " microseconds" << std::endl;
   Â
        std::cout << "Fully sorted vector: ";
        print_vector<T>(random_vec_copy);
}
- Finally, add the driver code, as follows:
int main()
{
    test_partial_quicksort<unsigned>(100, 10);
    return 0;
}
Activity 10: Implementing WordCount in MapReduce
In this activity, we will implement the MapReduce model to solve the WordCount problem. The following is the solution to this activity:
- Implement the map task as follows:
struct map_task : public mapreduce::map_task<
    std::string,                             // MapKey (filename)
    std::pair<char const*, std::uintmax_t>>  // MapValue (memory mapped file contents)
{
    template<typename Runtime>
    void operator()(Runtime& runtime, key_type const& key, value_type& value) const
    {
        bool in_word = false;
        char const* ptr = value.first;
        char const* end = ptr + value.second;
        char const* word = ptr;
        // Iterate over the contents of the file, extract words and emit a <word,1> pair.
        for (; ptr != end; ++ptr)
        {
            // Convert the character to upper case.
            char const ch = std::toupper(*ptr, std::locale::classic());
            if (in_word)
            {
                if ((ch < 'A' || ch > 'Z') && ch != '\'')
                {
runtime.emit_intermediate(std::pair<char const*,
              std::uintmax_t> (word, ptr - word), 1);
                    in_word = false;
                }
            }
            else if (ch >= 'A' && ch <= 'Z')
            {
                word = ptr;
                in_word = true;
            }
        }
        // Handle the last word.
        if (in_word)
        {
            assert(ptr > word);
            runtime.emit_intermediate(std::pair<char const*,
                          std::uintmax_t>(word, ptr - word), 1);
        }
    }
};
The preceding map function is applied separately to each file in the input directory. The contents of the input file are accepted as the * character in value. The inner loop then iterates over the contents of the file, extracting different words and emitting < key, value > pairs, where key is a word and value is set to 1.
- Implement the reduce task as follows:
template<typename KeyType>
struct reduce_task : public mapreduce::reduce_task<KeyType, unsigned>
{
    using typename mapreduce::reduce_task<KeyType, unsigned>::key_type;
    template<typename Runtime, typename It>
    void operator()(Runtime& runtime, key_type const& key, It it, It const ite) const
    {
        runtime.emit(key, std::accumulate(it, ite, 0));   Â
}
};
The reduce operation can then be applied to all < key, value > pairs that are emitted by the map function. Since the value was set to 1 in the previous step, we can now use std::accumulate() to get the total number of times a key appears among the input pairs of the reduce operation.