Organizing a bytecode language implementation
To a large extent, the organization of this book follows the classic organization of a bytecode compiler and its corresponding virtual machine. These components are defined here, followed by a diagram to summarize them:
- A lexical analyzer reads in source code characters and figures out how they are grouped into a sequence of words or tokens.
- A syntax analyzer reads in a sequence of tokens and determines whether that sequence is legal, according to the grammar of the language. If the tokens are in a legal order, it produces a syntax tree.
- A semantic analyzer checks to ensure that all the names being used are legal for the operations in which they are being used. It checks their types to determine exactly what operations are being performed. All this checking makes the syntax tree heavy, laden with extra information about where variables are declared and what their types are.
- An intermediate code generator figures out memory locations for all the variables and all the places where a program may abruptly change execution flow, such as loops and function calls. It adds them to the syntax tree and then walks this even fatter tree, before building a list of machine-independent intermediate code instructions.
- A final code generator turns the list of intermediate code instructions into the actual bytecode, in a file format that will be efficient to load and execute.
In addition to the steps of this bytecode virtual machine compiler, a bytecode interpreter is written to load and execute programs. It is a giant loop with a switch statement in it. For very high-level programming languages, the compiler might be no big deal, and all the magic may be in the bytecode interpreter. The whole organization can be summarized by the following diagram:
Figure 1.1: Phases and dataflow in a simple programming language
It will take a lot of code to illustrate how to build a bytecode machine implementation of a programming language. How that code is presented is important and will tell you what you need to know going in, as well as what you may learn from going through this book.