Lexical analysis
As already seen in the example in the previous section, a programming language consists of many elements such as keywords, identifiers, numbers, operators, and so on. The task of the lexical analyzer is to take the textual input and create a sequence of tokens from it. The calc language consists of the keyword with, the tokens :, +, -, *, /, (, and ), and the regular expressions ([a-zA-Z])+ (an identifier) and ([0-9])+ (a number). We assign a unique number to each token to make the handling of tokens easier.
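To make the idea of numbered tokens concrete, the following is a minimal, self-contained sketch and not the Lexer developed in this chapter. The names TokenKind, SimpleToken, and tokenize are hypothetical and chosen only for illustration. The sketch uses only the C++ standard library and assumes that whitespace separates tokens and is otherwise skipped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; the enum gives each kind a unique number.
enum class TokenKind : unsigned short {
  Eoi,     // end of input
  Unknown, // a character that is not part of the language
  Ident,   // ([a-zA-Z])+
  Number,  // ([0-9])+
  Colon,
  Plus,
  Minus,
  Star,
  Slash,
  LParen,
  RParen,
  KwWith   // the keyword "with"
};

// Hypothetical token record: the kind plus the matched text.
struct SimpleToken {
  TokenKind Kind;
  std::string Text;
};

// ASCII-only character classes, matching the regular expressions above.
static bool isLetter(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z');
}
static bool isDigit(char C) { return C >= '0' && C <= '9'; }
static bool isSpace(char C) {
  return C == ' ' || C == '\t' || C == '\n' || C == '\r';
}

// Turns the input text into a sequence of tokens, ending with Eoi.
std::vector<SimpleToken> tokenize(const std::string &Input) {
  std::vector<SimpleToken> Tokens;
  std::size_t Pos = 0;
  while (Pos < Input.size()) {
    char C = Input[Pos];
    if (isSpace(C)) {
      ++Pos;
      continue;
    }
    std::size_t Start = Pos;
    if (isLetter(C)) {
      // Consume the longest run of letters, then check for the keyword.
      while (Pos < Input.size() && isLetter(Input[Pos]))
        ++Pos;
      std::string Text = Input.substr(Start, Pos - Start);
      Tokens.push_back(
          {Text == "with" ? TokenKind::KwWith : TokenKind::Ident, Text});
    } else if (isDigit(C)) {
      // Consume the longest run of digits.
      while (Pos < Input.size() && isDigit(Input[Pos]))
        ++Pos;
      Tokens.push_back(
          {TokenKind::Number, Input.substr(Start, Pos - Start)});
    } else {
      // Single-character tokens; anything else is reported as Unknown.
      TokenKind Kind = TokenKind::Unknown;
      switch (C) {
      case ':': Kind = TokenKind::Colon; break;
      case '+': Kind = TokenKind::Plus; break;
      case '-': Kind = TokenKind::Minus; break;
      case '*': Kind = TokenKind::Star; break;
      case '/': Kind = TokenKind::Slash; break;
      case '(': Kind = TokenKind::LParen; break;
      case ')': Kind = TokenKind::RParen; break;
      default: break;
      }
      ++Pos;
      Tokens.push_back({Kind, std::string(1, C)});
    }
  }
  Tokens.push_back({TokenKind::Eoi, ""});
  return Tokens;
}
```

Under these assumptions, calling tokenize("with a: a*3") yields seven tokens: the keyword, an identifier, a colon, an identifier, a star, a number, and the end-of-input marker. Comparing enumerators instead of strings is what makes the later handling of tokens simpler.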
A hand-written lexer
The implementation of a lexical analyzer is often called Lexer. Let’s create a header file called Lexer.h and get started with the definition of Token. It begins with the usual header guard and the inclusion of the required headers:
#ifndef LEXER_H
#define LEXER_H
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MemoryBuffer.h"
The llvm::MemoryBuffer class provides read-only access to a block of memory, filled with the...