Loading the data
Data can be downloaded from the University of Groningen website as follows:
# Alternative: download the file in a browser and put it in the same directory as this notebook
!wget https://gmb.let.rug.nl/releases/gmb-2.2.0.zip
!unzip gmb-2.2.0.zip
Please note that the data is quite large (over 800MB). If wget is not available on your system, you may use any other tool, such as curl or a browser, to download the data set. This step may take some time to complete. If you have trouble accessing the data set from the University server, you may download a copy from Kaggle: https://www.kaggle.com/bradbolliger/gmb-v220. Also note that since we are going to be working on large data sets, some of the following steps may take some time to execute. In the world of Natural Language Processing (NLP), more training data and training time are key to great results.
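If neither wget nor unzip is available, the two shell commands above can be replaced with a small sketch that uses only the Python standard library (`urllib.request` and `zipfile`). The helper name `download_and_extract` is hypothetical and not part of the original notebook. The sketch assumes the University server URL is reachable, and it skips the slow download when the archive is already on disk (for example, if you fetched it manually in a browser).

```python
import os
import posixpath
import urllib.parse
import urllib.request
import zipfile

# Server URL given in the text above.
GMB_URL = "https://gmb.let.rug.nl/releases/gmb-2.2.0.zip"


def download_and_extract(url, dest_dir="."):
    """Hypothetical helper: fetch a zip archive (if missing) and extract it.

    Returns the local path of the downloaded zip file.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Derive the local file name from the URL path, e.g. "gmb-2.2.0.zip".
    file_name = posixpath.basename(urllib.parse.urlparse(url).path)
    zip_path = os.path.join(dest_dir, file_name)

    # Equivalent of `wget`: only download when the archive is not already present,
    # since the GMB archive is large and slow to fetch.
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)

    # Equivalent of `unzip`: extract every member into dest_dir.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return zip_path
```

In the notebook, a call such as `download_and_extract(GMB_URL)` would stand in for the two `!` commands. The Kaggle mirror is not handled by this sketch; a copy downloaded from there can simply be unzipped into the notebook directory.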
All the code for this example can be found in the "NER with BiLSTM and CRF.ipynb" notebook...