Most of the time, we don't want every individual word fragment we have ever encountered to become a separate entry in our vocabulary. There are several reasons to normalize words instead. One is the need to recognize that U.N. (with the letters separated by periods) and UN (without periods) are the same term. We can also reduce words to their root (dictionary) form, a process known as lemmatization. For instance, am, are, and is can all be mapped to their root form, be. On another front, we can strip inflections from words to bring them down to a common form, which is what stemming aims to do: the words car, cars, and car's can all be reduced to car.
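To make these normalization steps concrete, here is a minimal sketch in plain Python. It does not reproduce any particular library. The lemma table `LEMMAS` and the function `normalize_token` are hypothetical names invented for this illustration. The suffix rules are deliberately naive; real lemmatizers and stemmers (for example, those in NLTK or spaCy) rely on full dictionaries and much richer rules.

```python
import re

# Hypothetical, hand-written lemma table for illustration only; real
# lemmatizers use a full dictionary plus part-of-speech information.
LEMMAS = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}


def normalize_token(token: str) -> str:
    """Map surface variants of a word onto one vocabulary entry (toy sketch)."""
    word = token
    # Drop the periods in acronyms such as "U.N." so it matches "UN".
    if re.fullmatch(r"(?:[A-Za-z]\.)+", word):
        word = word.replace(".", "")
    # Case-fold so that "UN" and "U.N." end up identical.
    word = word.lower()
    # Dictionary-style root form (lemmatization) for a few irregular verbs.
    if word in LEMMAS:
        return LEMMAS[word]
    # Strip the possessive suffix: "car's" -> "car".
    if word.endswith("'s"):
        return word[:-2]
    # Naive plural stripping: "cars" -> "car". This is a crude heuristic
    # that mangles words such as "this" and keeps "glass" only because of
    # the "ss" check.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


print([normalize_token(t) for t in ["U.N.", "UN", "am", "is", "cars", "car's"]])
# ['un', 'un', 'be', 'be', 'car', 'car']
```

The order of the checks matters here: the lemma lookup runs before suffix stripping, so "is" becomes "be" rather than being treated as a plural.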
Also, common words that occur very frequently and convey little meaning, such as the articles a, an, and the, can be removed. However, all of these choices depend heavily on the use case. Wh- words, such as when, why, where, and who, do not carry much information in most contexts and are removed as part of a technique called stopword removal, which we will see a little later...
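As a rough preview of stopword removal, the following sketch filters tokens against a small set. The set `STOPWORDS` and the function `remove_stopwords` are hypothetical and hand-picked for this example (the articles and wh- words mentioned above). Libraries such as NLTK and spaCy ship much larger, curated lists. The second call shows why the choice depends on the use case: a question-answering task might want to keep "who".

```python
# Hypothetical, hand-picked stopword set for illustration only.
STOPWORDS = {"a", "an", "the", "when", "why", "where", "who"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not in the stopword set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stopwords]


tokens = "Who parked the car near a bank".split()
print(remove_stopwords(tokens))
# ['parked', 'car', 'near', 'bank']

# Use case matters: keep "who" when the question word carries meaning.
print(remove_stopwords(tokens, STOPWORDS - {"who"}))
# ['Who', 'parked', 'car', 'near', 'bank']
```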