Structuring the lexer
As we know from the previous chapter, we need a Token class and a Lexer class. Additionally, a TokenKind enumeration is required to give each token class a unique number. Having an all-in-one header and implementation file does not scale, so let’s distribute the items across components. TokenKind can be used universally and is placed in the Basic component. The Token and Lexer classes belong to the Lexer component but are placed in different headers and implementation files.
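To make this split concrete, here is a minimal sketch of how the enumeration and the Token class might relate. It is an illustration under assumptions, not the chapter's actual code: the tok namespace, the member names (unknown, eoi, semi, kw_CONST), and the methods is() and getIdentifier() are all hypothetical, and both parts are shown in one block although they would live in separate headers.

```cpp
#include <cassert>
#include <string_view>

// Would live in the Basic component: usable by every part of the compiler.
// The member names below are assumed for illustration.
namespace tok {
enum TokenKind : unsigned short {
  unknown,  // not a valid token
  eoi,      // end of input
  ident,    // any identifier
  semi,     // the ";" delimiter
  kw_CONST, // the CONST keyword
  NUM_TOKENS
};
} // namespace tok

// Would live in the Lexer component, in its own header: a token is a kind
// plus the slice of source text it was read from.
class Token {
  tok::TokenKind Kind = tok::unknown;
  std::string_view Text;

public:
  Token() = default;
  Token(tok::TokenKind K, std::string_view T) : Kind(K), Text(T) {}

  tok::TokenKind getKind() const { return Kind; }
  bool is(tok::TokenKind K) const { return Kind == K; }

  // Only identifier tokens carry a meaningful name.
  std::string_view getIdentifier() const {
    assert(is(tok::ident) && "Cannot get identifier of non-identifier");
    return Text;
  }
};
```

Because Token depends only on TokenKind, other components can include the enumeration without pulling in the lexer itself.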
There are three different classes of tokens: keywords, punctuators, and tokens that represent sets of many values. Examples are the CONST keyword, the ; delimiter, and the ident token, respectively; the ident token represents all identifiers in the source. Each token needs a member name for the enumeration. Keywords and punctuators have natural display names that can be used for messages.
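One way to keep the member names and the display names in sync is to generate both from a single token list. The following sketch assumes a hypothetical TOKEN_LIST macro and hypothetical helpers getTokenName() and getSpelling(); the token set is cut down to the three examples above plus two bookkeeping kinds, and none of these names are taken from the source.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical single source of truth. Each entry supplies the member name;
// punctuators and keywords also supply their display name.
#define TOKEN_LIST(TOK, PUNCTUATOR, KEYWORD) \
  TOK(unknown)                               \
  TOK(eoi)                                   \
  TOK(ident)                                 \
  PUNCTUATOR(semi, ";")                      \
  KEYWORD(CONST)

namespace tok {

// Expansion 1: the enumeration itself (keywords get a kw_ prefix).
#define AS_TOK(Id) Id,
#define AS_PUNCT(Id, Sp) Id,
#define AS_KW(Id) kw_##Id,
enum TokenKind : unsigned short {
  TOKEN_LIST(AS_TOK, AS_PUNCT, AS_KW)
  NUM_TOKENS
};
#undef AS_TOK
#undef AS_PUNCT
#undef AS_KW

// Expansion 2: the member name of each kind, as text.
#define NAME_TOK(Id) #Id,
#define NAME_PUNCT(Id, Sp) #Id,
#define NAME_KW(Id) "kw_" #Id,
constexpr std::string_view Names[] = {
    TOKEN_LIST(NAME_TOK, NAME_PUNCT, NAME_KW)};
#undef NAME_TOK
#undef NAME_PUNCT
#undef NAME_KW

// Expansion 3: the display name; empty for kinds such as ident that stand
// for many different values and so have no fixed spelling.
#define SP_TOK(Id) "",
#define SP_PUNCT(Id, Sp) Sp,
#define SP_KW(Id) #Id,
constexpr std::string_view Spellings[] = {
    TOKEN_LIST(SP_TOK, SP_PUNCT, SP_KW)};
#undef SP_TOK
#undef SP_PUNCT
#undef SP_KW

inline std::string_view getTokenName(TokenKind Kind) {
  assert(Kind < NUM_TOKENS && "invalid token kind");
  return Names[Kind];
}

inline std::string_view getSpelling(TokenKind Kind) {
  assert(Kind < NUM_TOKENS && "invalid token kind");
  return Spellings[Kind];
}

} // namespace tok
```

With this layout, adding a keyword or punctuator means adding one line to the list; the enumeration and both name tables follow automatically.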
Like in many programming languages, the keywords are a subset of the identifiers. To classify a token as a keyword, we...