Structuring the lexer
As we know from the previous chapter, we need a Token class and a Lexer class. Additionally, a TokenKind enumeration is required to give each token class a unique number. Having an all-in-one header and implementation file does not scale, so let's restructure things. The TokenKind enumeration can be used universally and is placed in the Basic component. The Token and Lexer classes belong to the Lexer component but are placed in different header and implementation files.
There are three different classes of tokens: keywords, punctuators, and tokens that represent sets of many values. Examples are the CONST keyword, the ; delimiter, and the ident token, which represents the identifiers in the source. Each token needs a member name for the enumeration. Keywords and punctuators have natural display names that can be used for messages.
As in many programming languages, the keywords are a subset of the identifiers. To classify a token as a keyword...