An artificial neural network is loosely inspired by the way the human brain functions. Technically, it is an improvement over linear and logistic regression as neural networks introduce multiple non-linear measures in estimating the output. Additionally, neural networks provide a great flexibility in modifying the network architecture to solve the problems across multiple domains leveraging structured and unstructured data.
The more complex the function, the greater the chance that the network has to tune to the data that is given as input, hence the better the accuracy of the predictions.
The typical structure of a feed-forward neural network is as follows:
A layer is a collection of one or more nodes (computation units), where each node in a layer is connected to every other node in the next immediate layer. The input level/layer is constituted of the input variables that are required to predict the output values.
The number of nodes in the output layer depends on whether we are trying to predict a continuous variable or a categorical variable. If the output is a continuous variable, the output has one unit.
If the output is categorical with n possible classes, there will be n nodes in the output layer. The hidden level/layer is used to transform the input layer values into values in a higher-dimensional space, so that we can learn more features from the input. The hidden layer transforms the output as follows:
In the preceding diagram, x1,x2, ..., xn are the independent variables, and x0 is the bias term (similar to the way we have bias in linear/logistic regression).
Note that w1,w2, ..., wn are the weights given to each of the input variables. If a is one of the units in the hidden layer, it will be equal to the following:
The f function is the activation function that is used to apply non-linearity on top of the sum-product of the input and their corresponding weight values. Additionally, higher non-linearity can be achieved by having more than one hidden layer.
In sum, a neural network is a collection of weights assigned to nodes with layers connecting them. The collection is organized into three main parts: the input layer, the hidden layer, and the output layer. Note that you can have n hidden layers, with the term deep learning implying multiple hidden layers. Hidden layers are necessary when the neural network has to make sense of something really complicated, contextual, or not obvious, such as image recognition. The intermediate layers (layers that are not input or output) are known as hidden, since they are practically not visible (there's more on how to visualize the intermediate layers in Chapter 4, Building a Deep Convolutional Neural Network).