A model can learn simple features with only a small number of hidden layers. However, as the features become more complex or the relationship to be learned becomes more non-linear, the model requires more layers and more units per layer.
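As a small illustration of this point, the sketch below uses XOR, a classic non-linear problem. It assumes threshold (step) activations and hand-picked weights rather than trained ones; the function names and the weight grid are illustrative choices, not part of any library. A single unit with no hidden layer cannot reproduce XOR for any weights on the grid, while two hidden units are enough.

```python
from itertools import product

def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

# Truth table for XOR: the output is 1 only when the inputs differ.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def no_hidden_layer(x, w1, w2, b):
    """A single threshold unit applied directly to the two inputs."""
    return step(w1 * x[0] + w2 * x[1] + b)

def one_hidden_layer(x):
    """Two hidden units (OR and NAND) feeding an AND output unit.

    The weights are hand-picked for illustration, not learned.
    """
    h_or = step(x[0] + x[1] - 0.5)
    h_nand = step(-x[0] - x[1] + 1.5)
    return step(h_or + h_nand - 1.5)

# Try every weight combination on a coarse grid (-4.0, -3.5, ..., 4.0)
# and check whether any single unit matches XOR on all four inputs.
grid = [v / 2 for v in range(-8, 9)]
single_unit_solves_xor = any(
    all(no_hidden_layer(x, w1, w2, b) == y for x, y in XOR.items())
    for w1, w2, b in product(grid, repeat=3)
)

hidden_layer_solves_xor = all(one_hidden_layer(x) == y for x, y in XOR.items())

print("single unit solves XOR:", single_unit_solves_xor)
print("one hidden layer solves XOR:", hidden_layer_solves_xor)
```

The grid search only demonstrates the failure for the weights it tries, but the result holds in general: XOR is not linearly separable, so no single threshold unit can represent it.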
A network that is too small for a complex task performs poorly because it lacks the required learning capacity. Having slightly more units than the optimal number is not a problem; however, a much larger number leads to overfitting. This means that the model tends to memorize the training dataset: it performs well on the training data but fails to perform well on the test data. So, we can experiment with the number of hidden layers and units and compare the accuracy of each configuration on held-out validation data.
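One way to organize such an experiment is sketched below. It is a minimal, framework-agnostic outline: `count_parameters` and `pick_architecture` are hypothetical helper names, and `train_and_score` is a placeholder for whatever routine trains a network with the given hidden layers and returns its training and validation accuracy. The parameter count is only a rough proxy for capacity, and breaking ties in favor of the smaller network is an assumption of this sketch.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

def pick_architecture(candidates, train_and_score):
    """Evaluate each candidate and return (best, all_results).

    candidates: iterable of hidden-layer size sequences, e.g. [(16,), (64, 64)].
    train_and_score: hypothetical callable that trains a network with the
        given hidden layers and returns (train_accuracy, validation_accuracy).
    """
    results = []
    for hidden_layers in candidates:
        train_acc, val_acc = train_and_score(hidden_layers)
        results.append({
            "hidden_layers": tuple(hidden_layers),
            "train_acc": train_acc,
            "val_acc": val_acc,
            # A large gap between training and validation accuracy
            # is a common sign of overfitting.
            "gap": train_acc - val_acc,
        })
    # Highest validation accuracy wins; ties go to the network with
    # fewer layers, then fewer total units.
    best = max(
        results,
        key=lambda r: (r["val_acc"],
                       -len(r["hidden_layers"]),
                       -sum(r["hidden_layers"])),
    )
    return best, results
```

For example, `count_parameters(2, [2], 1)` gives 9 (six weights and three biases), and calling `pick_architecture` with a list of candidate layer sizes returns the configuration that generalizes best on the validation set along with the per-candidate accuracy gaps.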