As mentioned previously, the dataset used in this project comes from the popular Packt book Mastering PostgreSQL 10, written by Hans-Jürgen Schönig (https://www.cybertec-postgresql.com). We used the text from the first 100 pages of the book, excluding all figures, tables, and SQL code. The cleaned dataset is stored in a text file alongside the code. It contains almost 44,000 words, which is just enough to train the model. The following are a few lines from the dataset:
"PostgreSQL Overview
PostgreSQL is one of the world's most advanced open source database systems, and it has many features that are widely used by developers and system administrators alike. Starting with PostgreSQL 10, many new features have been added to PostgreSQL, which contribute greatly to the success..."
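The preprocessing code for this dataset is not shown here, so the following is only a minimal sketch of how the cleaned text file might be loaded and sized up before training. It assumes the file is UTF-8 encoded and that a "word" is a run of letters or digits, optionally joined by internal apostrophes (as in "world's"). The function names `load_corpus`, `tokenize`, and `corpus_stats` and the file name in the comment are hypothetical, not taken from the project.

```python
import re
from pathlib import Path


def load_corpus(path):
    """Read the cleaned dataset from a plain-text file (UTF-8 assumed)."""
    return Path(path).read_text(encoding="utf-8")


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Internal apostrophes (e.g. "world's") are kept; other punctuation is dropped.
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def corpus_stats(text):
    """Return (total word count, number of distinct words)."""
    words = tokenize(text)
    return len(words), len(set(words))


if __name__ == "__main__":
    # With the real dataset you would call, e.g. (hypothetical file name):
    #   text = load_corpus("mastering_postgresql_10.txt")
    text = (
        "PostgreSQL Overview\n"
        "PostgreSQL is one of the world's most advanced open source database "
        "systems, and it has many features that are widely used by developers "
        "and system administrators alike."
    )
    total, distinct = corpus_stats(text)
    print(f"{total} words, {distinct} distinct")
```

Running the script on the sample excerpt prints its word and vocabulary counts; pointing `load_corpus` at the full text file gives a quick sanity check of the size of the training corpus. Keep in mind that the exact total depends on the tokenization rule, so a different definition of "word" will not reproduce a figure like "almost 44,000" exactly.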