"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
- Brian W. Kernighan
In an ideal world, we write perfect Spark codes and everything runs perfectly all the time, right? Just kidding; in practice, we know that working with large-scale datasets is hardly ever that easy, and there are inevitably some data points that will expose any corner cases with your code.
Considering the aforementioned challenges, therefore, in this chapter, we will see how difficult it can be to test an application if it is distributed; then, we will see some ways to tackle this. In a nutshell, the following topics will be cover throughout this chapter:
- Testing in a distributed environment
- Testing Spark application
- Debugging Spark application...