Deep learning offers a good opportunity to leverage large amounts of data and extract the best possible features for NER. Because the problem is posed as a sequence labeling task, deep learning approaches to NER generally use recurrent neural networks (RNNs). RNNs can process variable-length inputs; moreover, a variant called Long Short-Term Memory (LSTM) possesses long-term memory, which is useful for capturing non-trivial dependencies among the words of a sentence. A further variation, the bidirectional LSTM, captures not only long-term dependencies but also the relationships between words as seen from both directions of the sentence.
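To make the architecture concrete, here is a minimal sketch of a bidirectional LSTM for sequence labeling. It assumes Keras (the chapter's actual framework may differ), and the vocabulary size, tag count, and sentence length used here are hypothetical placeholders:

```python
# A minimal sketch of a bidirectional LSTM tagger, assuming Keras.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     TimeDistributed, Dense)

vocab_size = 10000   # hypothetical vocabulary size
num_tags = 9         # hypothetical number of NER tags (e.g., a BIO scheme)
max_len = 50         # hypothetical padded sentence length

model = Sequential([
    Input(shape=(max_len,)),
    # Map each word index to a dense vector
    Embedding(input_dim=vocab_size, output_dim=100),
    # Read the sentence in both directions; emit an output at every time step
    Bidirectional(LSTM(units=128, return_sequences=True)),
    # Predict a tag distribution for every token in the sentence
    TimeDistributed(Dense(num_tags, activation="softmax")),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The `Bidirectional` wrapper runs one LSTM over the sentence left to right and another right to left, so each token's representation reflects context from both sides before the per-token tag prediction.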
In this chapter, we will build an NER system using deep learning with LSTM. However, before we try to understand how to build such a system, we will...