The skew problem
Distributed systems, much like teams of people working on a task, perform best when the work is evenly distributed among all the members of the team or the nodes of the cluster. Both suffer when the work is unevenly distributed, because the whole performs only as fast as its slowest component.
In the case of Spark, data is distributed across the cluster. You might have come across cases where a map job runs fairly quickly, but your joins or shuffles take an excessively long time. In most real-life datasets there are popular keys or null values, which means some tasks get far more work than others, resulting in skew. The usual remedy, borrowed from the database world, is to combine the original keys with random values (often called salting) so that the resulting keys are fairly unique and the data is spread more evenly across the cluster. Of course, this requires a multi-stage aggregation, but in most cases it is still faster.
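As a rough sketch, the salting approach might look like the following Spark (Scala) snippet, which performs a two-stage aggregation over a hypothetical sales DataFrame with a skewed customer_id column. The column names (customer_id, amount), the input path, and the salt range of 16 are assumptions for illustration, not part of any particular dataset.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SaltedAggregation {
  // Number of random buckets to spread a hot key over (illustrative value).
  val SALT_BUCKETS = 16

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("salted-agg").getOrCreate()
    import spark.implicits._

    // Hypothetical skewed input: a few customer_ids dominate the data.
    val sales = spark.read.parquet("/data/sales")

    // Stage 1: append a random salt to the key and pre-aggregate.
    // Each hot key is split across up to SALT_BUCKETS partial groups,
    // so no single task has to process an entire popular key on its own.
    val partial = sales
      .withColumn("salt", (rand() * SALT_BUCKETS).cast("int"))
      .groupBy($"customer_id", $"salt")
      .agg(sum($"amount").as("partial_total"))

    // Stage 2: drop the salt and combine the partial results.
    // This second shuffle only moves one row per (key, salt) pair,
    // which is small regardless of how skewed the raw data was.
    val totals = partial
      .groupBy($"customer_id")
      .agg(sum($"partial_total").as("total"))

    totals.show()
    spark.stop()
  }
}

The trade-off is an extra shuffle stage, but because the first stage evens out the work across tasks, the overall job usually finishes sooner than a single aggregation that leaves a handful of tasks processing the popular keys.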