Another way to get the best out of neural networks is to use ensemble models. The idea is quite simple: why use one network when you can use many? In other words, why not design several different neural networks, each sensitive to specific representations in the input data? We can then average their predictions, obtaining a result that is more robust and generalizes better than the prediction of any single network.
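As a minimal sketch, assuming each trained network exposes a `predict` method that returns per-class probabilities (as Keras models do), the averaging step might look like this; `models` and `x` are placeholder names:

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the class-probability predictions of several networks."""
    # Each prediction has shape (n_samples, n_classes); stacking gives
    # an array of shape (n_models, n_samples, n_classes).
    all_preds = np.stack([model.predict(x) for model in models])
    # The plain mean across models smooths out each network's individual errors.
    return all_preds.mean(axis=0)
```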
We can even assign a weight to each network, for example in proportion to the test accuracy it achieves on the task. Taking a weighted average of the networks' predictions (weighted by their relative accuracies) then yields a more comprehensive prediction altogether.
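Under the same assumptions, the weighted variant is a small change: normalize the accuracies into weights that sum to one, then use them in the average. The `accuracies` argument here is hypothetical, standing in for whatever scores the networks achieved:

```python
import numpy as np

def weighted_ensemble_predict(models, accuracies, x):
    """Weight each network's predictions by its relative accuracy."""
    # Normalize the accuracies so the weights sum to 1.
    weights = np.array(accuracies) / np.sum(accuracies)
    all_preds = np.stack([model.predict(x) for model in models])
    # More accurate networks contribute proportionally more to the result.
    return np.average(all_preds, axis=0, weights=weights)
```

For example, three networks scoring 0.90, 0.85, and 0.80 would receive weights of roughly 0.353, 0.333, and 0.314.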
Intuitively, we are simply looking at the data through different eyes: each network, by virtue of its design, may pay attention to different factors of variation in the input.