The preceding parameter norm penalties work by penalizing the model parameters when they deviate from zero, a fixed value. Sometimes, however, we want to express prior knowledge about which parameter values would be better suited to the model. We may not know those values precisely, but domain knowledge and the architecture of the model may tell us that there are likely to be dependencies between the parameters.
A common dependency we might want to express is that certain parameters should be close to one another. Suppose we have two different models for the same classification task, detecting the same set of classes, but whose input distributions are not the same. Let's name the first model A, with parameters θ(A), and the second model B, with parameters θ(B). Both of these models map their respective inputs to predictions over the same set of classes.
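If we believe the two tasks are similar enough that corresponding parameters should stay close, one common way to encode this belief is a parameter-tying penalty of the form Ω(θ(A), θ(B)) = ‖θ(A) − θ(B)‖₂², added to the training objective. Unlike weight decay, this pulls each parameter of one model toward the corresponding parameter of the other rather than toward zero. A minimal NumPy sketch, assuming the parameters have been flattened into vectors and using a hypothetical weighting coefficient `alpha`:

```python
import numpy as np

def tying_penalty(theta_a: np.ndarray, theta_b: np.ndarray, alpha: float = 0.01) -> float:
    """L2 parameter-tying penalty: alpha * ||theta_a - theta_b||_2^2.

    Pulls corresponding parameters of the two models toward each other,
    rather than toward zero as a standard weight-decay penalty would.
    theta_a, theta_b, and alpha are illustrative placeholders.
    """
    diff = theta_a - theta_b
    return alpha * float(np.sum(diff ** 2))

# Hypothetical flattened parameter vectors for models A and B
theta_a = np.array([0.5, -1.2, 0.8])
theta_b = np.array([0.4, -1.0, 0.9])

# During training, this term would be added to the task loss,
# e.g. loss = task_loss + tying_penalty(theta_a, theta_b)
penalty = tying_penalty(theta_a, theta_b)
```

This is a sketch of the general idea, not a specific library API; in practice the penalty would be computed over the live parameter tensors of both models inside the training loop.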