Named Entity Recognition with RNNs
Now let’s look at our first task: using an RNN to identify named entities in a text corpus. This task is known as Named Entity Recognition (NER). We will be using a modified version of the well-known CoNLL 2003 dataset for NER (CoNLL stands for the Conference on Computational Natural Language Learning).
CoNLL 2003 is available for multiple languages; the English data was generated from a Reuters corpus of news stories published between August 1996 and August 1997. The dataset we’ll be using is called CoNLLPP and is found at https://github.com/ZihanWangKi/CrossWeigh. It is a more carefully curated version of the original CoNLL 2003 data, which contains labeling errors caused by misreading the context of a word. For example, in the phrase “Chicago won …” Chicago was labeled as a location, whereas in that context it is an organization. This exercise is available in ch06_rnns_for_named_entity_recognition...
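To make the labeling concrete, here is a minimal sketch of how data in the CoNLL 2003 column layout can be read into token and tag sequences. It assumes the usual layout of one token per line with four whitespace-separated columns (token, part-of-speech tag, chunk tag, NER tag), blank lines between sentences, and -DOCSTART- lines between documents. The sample text, the B-/I- tag prefixes, and the read_conll helper are illustrative assumptions, not taken from the CoNLLPP files or from the chapter's notebook.

```python
from typing import List, Tuple

# Hypothetical sample in the CoNLL 2003 column layout (not real dataset rows).
# Columns: token, POS tag, chunk tag, NER tag.
SAMPLE = """\
-DOCSTART- -X- -X- O

Chicago NNP B-NP B-ORG
won VBD B-VP O
the DT B-NP O
game NN I-NP O
. . O O

John NNP B-NP B-PER
Smith NNP I-NP I-PER
visited VBD B-VP O
Chicago NNP B-NP B-LOC
. . O O
"""


def read_conll(text: str) -> List[Tuple[List[str], List[str]]]:
    """Split CoNLL-style text into (tokens, ner_tags) pairs, one per sentence."""
    sentences = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        # A blank line or a document marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        fields = line.split()
        tokens.append(fields[0])   # first column is the token
        tags.append(fields[-1])    # last column is the NER tag
    # Flush the final sentence if the text does not end with a blank line.
    if tokens:
        sentences.append((tokens, tags))
    return sentences


sentences = read_conll(SAMPLE)
for sentence_tokens, sentence_tags in sentences:
    print(list(zip(sentence_tokens, sentence_tags)))
```

Note that the same word, Chicago, carries an organization tag in the first sentence and a location tag in the second. This is the kind of context dependence that leads to the labeling errors described above.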