Loosely speaking, machine learning is about mapping inputs (such as images or movie reviews) to targets (such as the label "cat" or "positive"). A model learns this mapping by looking at (that is, training on) many example pairs of inputs and targets.
Deep neural networks perform this input-to-target mapping through a long sequence of simple data transformations, called layers. The number of layers in the sequence is referred to as the depth of the network. The entire sequence, from input to target, is referred to as the model. The transformations themselves are learned by repeated exposure to examples. Let's look at how this learning happens.
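Before turning to learning itself, it may help to see what "a sequence of simple data transformations" could look like in code. The following is only a minimal sketch in NumPy, not how any particular framework implements it: the helper `dense`, the layer sizes, and the random, untrained weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Build one layer: the simple transformation relu(x @ W + b).

    Hypothetical helper; the weights are random here, not learned.
    """
    W = rng.normal(scale=0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    return lambda x: np.maximum(0.0, x @ W + b)

# An illustrative model: 4 input features mapped to 2 output scores.
layers = [dense(4, 8), dense(8, 8), dense(8, 2)]
depth = len(layers)  # the depth of the network is the number of layers

def model(x):
    """Apply each layer in order, from input to output."""
    for layer in layers:
        x = layer(x)
    return x

inputs = rng.normal(size=(5, 4))  # 5 examples with 4 features each
outputs = model(inputs)
print(depth, outputs.shape)  # prints: 3 (5, 2)
```

Because the weights are random, the outputs here are meaningless; training is what would adjust `W` and `b` in each layer so that the outputs approach the targets.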