Chapter 4: Divide and Conquer
Activity 8: Vaccinations
In this activity, we will store and lookup the vaccination status of students to determine if they need to be vaccinated. These steps should help you complete the activity:
- Begin by including the following headers:
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <algorithm>
#include <numeric>
- Define the Student class as follows:
class Student
{
private:
std::pair<int, int> name;
bool vaccinated;
public:
// Constructor
Student(std::pair<int, int> n, bool v) :
name(n), vaccinated(v)
{}
// Getters
auto get_name() { return name; }
auto is_vaccinated() { return vaccinated; }
// Two people are same if they have the same name
bool operator ==(const Student& p) const
{
return this->name == p.name;
}
// The ordering of a set of people is defined by their name
bool operator< (const Student& p) const
{
return this->name < p.name;
}
bool operator> (const Student& p) const
{
return this->name > p.name;
}
};
- The following function lets us generate a student from random data:
auto generate_random_Student(int max)
{
std::random_device rd;
std::mt19937 rand(rd());
// the IDs of Student should be in range [1, max]
std::uniform_int_distribution<std::mt19937::result_type> uniform_dist(1, max);
// Generate random credentials
auto random_name = std::make_pair(uniform_dist(rand), uniform_dist(rand));
bool is_vaccinated = uniform_dist(rand) % 2 ? true : false;
return Student(random_name, is_vaccinated);
}
- The following code is used to run and test the output of our implementation:
void search_test(int size, Student p)
{
std::vector<Student> people;
// Create a list of random people
for (auto i = 0; i < size; i++)
people.push_back(generate_random_Student(size));
std::sort(people.begin(), people.end());
// To measure the time taken, start the clock
std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
bool search_result = needs_vaccination(p, people);
// Stop the clock
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
std::cout << "Time taken to search = " <<
std::chrono::duration_cast<std::chrono::microseconds>
(end - begin).count() << " microseconds" << std::endl;
if (search_result)
std::cout << "Student (" << p.get_name().first
<< " " << p.get_name().second << ") "
<< "needs vaccination." << std::endl;
else
std::cout << "Student (" << p.get_name().first
<< " " << p.get_name().second << ") "
<< "does not need vaccination." << std::endl;
}
- The following function implements our logic for whether a vaccination is needed:
bool needs_vaccination(Student P, std::vector<Student>& people)
{
auto first = people.begin();
auto last = people.end();
while (true)
{
auto range_length = std::distance(first, last);
auto mid_element_index = std::floor(range_length / 2);
auto mid_element = *(first + mid_element_index);
// Return true if the Student is found in the sequence and
// he/she's not vaccinated
if (mid_element == P && mid_element.is_vaccinated() == false)
return true;
else if (mid_element == P && mid_element.is_vaccinated() == true)
return false;
else if (mid_element > P)
std::advance(last, -mid_element_index);
if (mid_element < P)
std::advance(first, mid_element_index);
// Student not found in the sequence and therefore should be vaccinated
if (range_length == 1)
return true;
}
}
- Finally, the driver code is implemented as follows:
int main()
{
// Generate a Student to search
auto p = generate_random_Student(1000);
search_test(1000, p);
search_test(10000, p);
search_test(100000, p);
return 0;
}
Note
Since we are randomizing values in step 3, your output may vary from the expected output shown for this activity.
Activity 9: Partial Sorting
The partial quicksort is only a slight modification of the original quicksort algorithm that was demonstrated in Exercise 20, Quicksort. Compared to that exercise, only step 4 is different. The following is a reference implementation:
- Add the following header files:
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <algorithm>
- Next, we shall implement the partition operation, as follows:
template <typename T>
auto partition(typename std::vector<T>::iterator begin,
typename std::vector<T>::iterator end)
{
auto pivot_val = *begin;
auto left_iter = begin + 1;
auto right_iter = end;
while (true)
{
// Starting from the first element of vector,
// find an element that is greater than pivot.
while (*left_iter <= pivot_val && std::distance(left_iter, right_iter) > 0)
left_iter++;
// Starting from the end of vector moving to the beginning,
// find an element that is lesser than the pivot.
while (*right_iter > pivot_val && std::distance(left_iter, right_iter) > 0)
right_iter--;
// If left and right iterators meet, there are no elements left to swap.
// Else, swap the elements pointed to by the left and right iterators
if (left_iter == right_iter)
break;
else
std::iter_swap(left_iter, right_iter);
}
if (pivot_val > *right_iter)
std::iter_swap(begin, right_iter);
return right_iter;
}
- Since the desired output also needs an implementation of the quicksort algorithm, we'll implement one as follows:
template <typename T>
void quick_sort(typename std::vector<T>::iterator begin,
typename std::vector<T>::iterator last)
{
// If there are more than 1 elements in the vector
if (std::distance(begin, last) >= 1)
{
// Apply the partition operation
auto partition_iter = partition<T>(begin, last);
// Recursively sort the vectors created by the partition operation
quick_sort<T>(begin, partition_iter-1);
quick_sort<T>(partition_iter, last);
}
}
- Implement the partial quicksort function as follows:
template <typename T>
void partial_quick_sort(typename std::vector<T>::iterator begin,
typename std::vector<T>::iterator last,
size_t k)
{
// If there are more than 1 elements in the vector
if (std::distance(begin, last) >= 1)
{
// Apply the partition operation
auto partition_iter = partition<T>(begin, last);
// Recursively sort the vectors created by the partition operation
partial_quick_sort<T>(begin, partition_iter-1, k);
// Sort the right subvector only if the final position of pivot < k
if(std::distance(begin, partition_iter) < k)
partial_quick_sort<T>(partition_iter, last, k);
}
}
- The following helper functions can be then used to print the contents of a vector and to generate a random vector:
template <typename T>
void print_vector(std::vector<T> arr)
{
for (auto i : arr)
std::cout << i << " ";
std::cout << std::endl;
}
// Generates random vector of a given size with integers [1, size]
template <typename T>
auto generate_random_vector(T size)
{
std::vector<T> V;
V.reserve(size);
std::random_device rd;
std::mt19937 rand(rd());
// the IDs of Student should be in range [1, max]
std::uniform_int_distribution<std::mt19937::result_type> uniform_dist(1, size);
for (T i = 0; i < size; i++)
V.push_back(uniform_dist(rand));
return std::move(V);
}
- The following function implements the testing logic for our sorting functions:
// Sort the first K elements of a random vector of a given 'size'
template <typename T>
void test_partial_quicksort(size_t size, size_t k)
{
// Create two copies of a random vector to use for the two algorithms
auto random_vec = generate_random_vector<T>(size);
auto random_vec_copy(random_vec);
std::cout << "Original vector: "<<std::endl;
print_vector<T>(random_vec);
// Measure the time taken by partial quick sort
std::chrono::steady_clock::time_point
begin_qsort = std::chrono::steady_clock::now();
partial_quick_sort<T>(random_vec.begin(), random_vec.end()-1, k);
std::chrono::steady_clock::time_point
end_qsort = std::chrono::steady_clock::now();
std::cout << std::endl << "Time taken by partial quick sort = "
<< 'std::chrono::duration_cast<std::chrono::microseconds>
(end_qsort - begin_qsort).count()
<< " microseconds" << std::endl;
std::cout << "Partially sorted vector (only first "<< k <<" elements):";
print_vector<T>(random_vec);
// Measure the time taken by partial quick sort
begin_qsort = std::chrono::steady_clock::now();
quick_sort<T>(random_vec_copy.begin(), random_vec_copy.end()-1);
end_qsort = std::chrono::steady_clock::now();
std::cout << std::endl <<"Time taken by full quick sort = "
<< std::chrono::duration_cast<std::chrono::microseconds>
(end_qsort - begin_qsort).count()
<< " microseconds" << std::endl;
std::cout << "Fully sorted vector: ";
print_vector<T>(random_vec_copy);
}
- Finally, add the driver code, as follows:
int main()
{
test_partial_quicksort<unsigned>(100, 10);
return 0;
}
Activity 10: Implementing WordCount in MapReduce
In this activity, we will implement the MapReduce model to solve the WordCount problem. The following is the solution to this activity:
- Implement the map task as follows:
struct map_task : public mapreduce::map_task<
std::string, // MapKey (filename)
std::pair<char const*, std::uintmax_t>> // MapValue (memory mapped file contents)
{
template<typename Runtime>
void operator()(Runtime& runtime, key_type const& key, value_type& value) const
{
bool in_word = false;
char const* ptr = value.first;
char const* end = ptr + value.second;
char const* word = ptr;
// Iterate over the contents of the file, extract words and emit a <word,1> pair.
for (; ptr != end; ++ptr)
{
// Convert the character to upper case.
char const ch = std::toupper(*ptr, std::locale::classic());
if (in_word)
{
if ((ch < 'A' || ch > 'Z') && ch != '\'')
{
runtime.emit_intermediate(std::pair<char const*,
std::uintmax_t> (word, ptr - word), 1);
in_word = false;
}
}
else if (ch >= 'A' && ch <= 'Z')
{
word = ptr;
in_word = true;
}
}
// Handle the last word.
if (in_word)
{
assert(ptr > word);
runtime.emit_intermediate(std::pair<char const*,
std::uintmax_t>(word, ptr - word), 1);
}
}
};
The preceding map function is applied separately to each file in the input directory. The contents of the input file are accepted as the * character in value. The inner loop then iterates over the contents of the file, extracting different words and emitting < key, value > pairs, where key is a word and value is set to 1.
- Implement the reduce task as follows:
template<typename KeyType>
struct reduce_task : public mapreduce::reduce_task<KeyType, unsigned>
{
using typename mapreduce::reduce_task<KeyType, unsigned>::key_type;
template<typename Runtime, typename It>
void operator()(Runtime& runtime, key_type const& key, It it, It const ite) const
{
runtime.emit(key, std::accumulate(it, ite, 0));
}
};
The reduce operation can then be applied to all < key, value > pairs that are emitted by the map function. Since the value was set to 1 in the previous step, we can now use std::accumulate() to get the total number of times a key appears among the input pairs of the reduce operation.