In information retrieval, an inverted index is a common data structure used to speed up text searches over a collection of documents. It stores every word that appears in the collection and, for each word, a list of the documents that contain it.
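As a rough illustration of that structure, the sketch below models an inverted index as a Python dictionary that maps each word to a list of document names. The document names, the sample words, and the helper functions `documents_containing` and `documents_containing_all` are hypothetical, chosen only for this example.

```python
# Hypothetical toy index: each word maps to the documents that contain it.
inverted_index = {
    "index": ["doc1.txt", "doc3.txt"],
    "search": ["doc1.txt", "doc2.txt"],
    "stemming": ["doc2.txt"],
}


def documents_containing(index, word):
    """Return the documents listed for `word`, or an empty list if it is not indexed."""
    return index.get(word, [])


def documents_containing_all(index, words):
    """Return, in sorted order, the documents that contain every word in `words`."""
    postings = [set(index.get(word, [])) for word in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

Answering a query is then a dictionary lookup per word instead of a scan of every document, which is where the speed-up comes from.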
To construct the index, we parse all the documents of the collection and build the index incrementally. For every document, we extract its significant words (removing the most common words, called stop words, and optionally applying a stemming algorithm) and then add those words to the index. If a word already exists in the index, we add the document to the list of documents associated with that word. If a word doesn't exist yet, we add it to the index's list of words and associate the document with it. You can...
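The construction steps described above can be sketched as follows. This is a minimal, single-threaded sketch under stated assumptions: the stop-word list is a tiny illustrative sample, stemming is left out, documents are passed in as a dictionary of identifier to text, and the names `STOP_WORDS`, `significant_words`, `add_document`, and `build_index` are hypothetical.

```python
import re

# Assumption: a tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def significant_words(text):
    """Lowercase the text, split it into words, and drop the stop words.

    Stemming is omitted from this sketch.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word in words if word not in STOP_WORDS}


def add_document(index, doc_id, text):
    """Add one document's significant words to the index."""
    for word in sorted(significant_words(text)):
        if word in index:
            # Known word: append this document to its list.
            index[word].append(doc_id)
        else:
            # New word: create its entry and associate the document with it.
            index[word] = [doc_id]


def build_index(documents):
    """Build the index incrementally, one document at a time."""
    index = {}
    for doc_id, text in documents.items():
        add_document(index, doc_id, text)
    return index
```

Because `significant_words` returns a set, each document is listed at most once per word, even when the word occurs several times in that document.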