From Supercharge Your Applications with GraalVM: Hands-on examples to optimize and extend your code using GraalVM's high performance and polyglot capabilities, by A B Vijay Kumar (Packt, 1st Edition, paperback, 360 pages, published August 2021, ISBN-13 9781800564909).

Understanding the JVM architecture

Over the years, the JVM has evolved into one of the most mature VM runtimes, with a highly structured and sophisticated implementation. This is one of the reasons why GraalVM is built to use the best features of the JVM and add the further optimizations required for the cloud-native world. To better appreciate the GraalVM architecture and the optimizations that it brings on top of the JVM, it's important to understand the JVM architecture.

This section walks you through the JVM architecture in detail. The following diagram shows the high-level architecture of the various subsystems of the JVM:

Figure 1.2 – High-level architecture of JVM

The rest of this section will walk you through each of these subsystems in detail.

Class loader subsystem

The class loader subsystem is responsible for locating all the relevant .class files and loading these classes into memory. It is also responsible for linking the classes, which includes verifying the semantics of each .class file, before the classes are initialized. The class loader subsystem has the following three key functionalities:

Loading
Linking
Initializing

The following diagram shows the various components of the class loader subsystem:

Figure 1.3 – Components of the class loader subsystem

Let's now look at what each of these components does.

Loading

In traditional compiled languages such as C/C++, the source code is compiled to object code, and then all the dependent object code is linked by a linker to build the final executable. All this is part of the build process. Once the final executable is built, it is loaded into memory by the loader. Java works differently.

Java source code (.java) is compiled by the Java compiler (javac) to bytecode (.class) files. The class loader is one of the key subsystems of the JVM; it is responsible for loading all the dependent classes that are required to run the application. This includes the classes written by the application developer, the libraries, and the Java Software Development Kit (SDK) classes.

There are three types of class loaders as part of this system:

Bootstrap: The bootstrap class loader is the first class loader to run. It loads rt.jar, which contains the core Java Standard Edition classes, such as those in java.lang, java.net, java.util, and java.io. The bootstrap class loader is responsible for loading all the classes that are required to run any Java application. It is a core part of the JVM and is implemented in native code.
Extensions: The extension class loader loads the JDK extensions found in the jre/lib/ext directory. It delegates to the bootstrap class loader as its parent and is implemented in Java (sun.misc.Launcher$ExtClassLoader.class).
Application: The application class loader (also referred to as the system class loader) is a child of the extension class loader in the delegation hierarchy. It is responsible for loading the application classes found on the application class path (set with the -classpath option or the CLASSPATH environment variable). It is also implemented in Java (sun.misc.Launcher$AppClassLoader.class).

Together, the bootstrap, extension, and application class loaders load all the classes that are required to run the application. If none of them can find a required class, ClassNotFoundException is thrown (or NoClassDefFoundError, when the missing class is referenced from code that has already been loaded). This describes Java 8 and earlier; since Java 9, the JDK classes are packaged as modules rather than in rt.jar, and the extension class loader has been replaced by the platform class loader.
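To see this hierarchy on a running JVM, the following minimal sketch walks the parent chain of a class's loader using getClassLoader() and getParent(). The class and method names (ClassLoaderHierarchy, loaderChain) are illustrative. Core classes such as String report a null loader, which is how the bootstrap class loader is represented in Java code; the names printed for the other loaders depend on the Java version (for example, sun.misc.Launcher$AppClassLoader on Java 8 versus jdk.internal.loader.ClassLoaders$AppClassLoader on Java 9 and later).

```java
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderHierarchy {

    // Describes the delegation chain of the loader that loaded `cls`,
    // walking getParent() until the bootstrap loader (represented as null).
    static List<String> loaderChain(Class<?> cls) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = cls.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap");   // the bootstrap loader has no Java object
        return chain;
    }

    public static void main(String[] args) {
        // Core classes such as java.lang.String are loaded by the bootstrap loader.
        System.out.println("String:     " + loaderChain(String.class));
        // This class comes from the application (system) class loader.
        System.out.println("This class: " + loaderChain(ClassLoaderHierarchy.class));
    }
}
```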

Class loaders implement the delegation hierarchy algorithm. The following diagram shows how the class loader implements the delegation hierarchy algorithm to load all the required classes:

Figure 1.4 – Class loader delegation hierarchy algorithm implementation flowchart

Let's understand how this algorithm works:

The JVM looks for the class in the method area (this will be discussed in detail later in this section). If it does not find the class, it asks the application class loader to load the class into memory.
The application class loader delegates the call to the extension class loader, which in turn delegates to the bootstrap class loader.
The bootstrap class loader looks for the class in the bootstrap class path. If it finds the class, it loads it into memory. If it does not find the class, control is delegated back to the extension class loader.
The extension class loader tries to find the class in the extension directories. If it finds the class, it loads it into memory. If it does not find the class, control is delegated back to the application class loader.
The application class loader tries to find the class on the CLASSPATH. If it finds the class, the class is loaded into the method area and the JVM starts using it; otherwise, ClassNotFoundException is thrown.
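The delegation steps above can be sketched as a custom class loader. ParentFirstClassLoader is a hypothetical class that spells out the same parent-first order used by the default java.lang.ClassLoader.loadClass: check whether the class is already loaded, delegate to the parent, and fall back to its own findClass only if the parent fails. It assumes a non-null parent and records each step in a list so that the order is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader that makes parent-first delegation explicit.
public class ParentFirstClassLoader extends ClassLoader {
    final List<String> trace = new ArrayList<>();

    public ParentFirstClassLoader(ClassLoader parent) {
        super(parent);   // assumed non-null in this sketch
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Step 1: reuse the class if this loader has already loaded it.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Step 2: always ask the parent first.
                    trace.add("delegate " + name + " to parent");
                    c = getParent().loadClass(name);
                } catch (ClassNotFoundException e) {
                    // Step 3: only if the parent fails, search this loader's own
                    // locations; the inherited findClass simply throws.
                    trace.add("parent failed, trying findClass for " + name);
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) {
        ParentFirstClassLoader loader =
                new ParentFirstClassLoader(ClassLoader.getSystemClassLoader());
        try {
            Class<?> string = loader.loadClass("java.lang.String");
            System.out.println("Same String class as the parent's: " + (string == String.class));
            loader.loadClass("com.example.DoesNotExist");   // hypothetical missing class
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
        loader.trace.forEach(System.out::println);
    }
}
```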

Linking

Once the classes are loaded into memory (into the method area, discussed further in the Memory subsystem section), the class loader subsystem performs linking. The linking process consists of the following steps:

Verification: The loaded classes are verified for their adherence to the semantics of the language and the class file format. The binary representation of each loaded class is parsed into the JVM's internal data structures and checked (for example, for the type safety of its bytecode) so that its methods can run without violating the JVM's constraints. This might require the class loader to recursively load the hierarchy of inherited classes all the way up to java.lang.Object.
Preparation: Once the classes are loaded and verified, the JVM allocates memory for class variables (static variables) and sets them to their default values (such as 0, false, or null). Static initializers (static blocks) are not run yet; they run in the initialization phase.
Resolution: The JVM then resolves the symbolic references in the symbol table (the constant pool) by locating the classes, interfaces, fields, and methods they refer to. The JVM might resolve all the symbols up front during linking (static or eager resolution) or resolve each symbol only when it is first used (lazy resolution).
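The default values assigned during preparation can be observed with a deliberately circular pair of static fields. In this sketch, the classes A and B are hypothetical: reading A.a triggers A's initialization, which reads B.b, which in turn reads A.a while A is still being initialized. At that point A.a still holds the default value 0 from preparation, so B.b becomes 1 and A.a becomes 2.

```java
// Shows that static fields hold their prepared default values until
// initialization assigns the declared values.
public class PreparationDemo {
    static class A {
        static int a = B.b + 1;   // reads B.b while A is still being initialized
    }

    static class B {
        static int b = A.a + 1;   // A.a is still the prepared default value 0 here
    }

    // Touching A first triggers A's initialization, which in turn initializes B.
    static int[] values() {
        return new int[] {A.a, B.b};
    }

    public static void main(String[] args) {
        int[] v = values();
        System.out.println("A.a = " + v[0] + ", B.b = " + v[1]);   // A.a = 2, B.b = 1
    }
}
```

Touching B first instead would reverse the roles (B.b = 2, A.a = 1), which is why this pattern is a classic source of subtle bugs.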

The class loader subsystem raises various exceptions, including the following:

ClassNotFoundException
NoClassDefFoundError
ClassCastException
UnsatisfiedLinkError
ClassCircularityError
ClassFormatError
ExceptionInInitializerError

You can refer to the Java specifications for more details: https://docs.oracle.com/en/java/javase.

Initializing

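Once the classes are loaded and linked, the initialization phase starts. During this phase, each class is initialized: its static variables are assigned their declared initial values and its static blocks are executed. A class is initialized on its first active use, for example, when an instance is created with new, when one of its static methods is invoked or one of its non-constant static fields is accessed, or when it is used reflectively (java.lang.reflect). Initializing a class might also result in loading and initializing other classes, such as its superclass.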
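The following minimal sketch illustrates that loading and initialization are separate steps. Class.forName(name, false, loader) loads a class without initializing it; the static block of the hypothetical Config class runs only on its first active use, which here is reading its static field.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyInitDemo {
    // Records events in the order they happen.
    static final List<String> events = new ArrayList<>();

    // Hypothetical class whose static block shows when initialization runs.
    static class Config {
        static int value = 42;   // not a compile-time constant, so reading it is an active use

        static {
            events.add("Config initialized");
        }
    }

    static List<String> run() {
        try {
            // Load Config without initializing it (initialize = false).
            Class<?> cls = Class.forName(LazyInitDemo.class.getName() + "$Config",
                    false, LazyInitDemo.class.getClassLoader());
            events.add("Config loaded: " + cls.getSimpleName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        // First active use: reading the static field triggers initialization.
        events.add("value = " + Config.value);
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Run on its own, this prints "Config loaded: Config", then "Config initialized", then "value = 42": the static block runs between the load and the field read, not when the class is loaded.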

Class loaders have to load classes into memory before the application code that uses them can run. Most of the time, the class loader has to load the full hierarchy of classes and dependent classes (even with lazy resolution) to validate their semantics. This is time-consuming and also increases the memory footprint. It's even slower if the application uses reflection and the reflected classes need to be loaded.

After learning about the class loader subsystem, let's now understand how the memory subsystem works.

Memory subsystem

The memory subsystem is one of the most critical subsystems of the JVM. As the name suggests, it is responsible for managing the memory allocated for the method area, heap, stacks, and registers. The following diagram shows the architecture of the memory subsystem:

Figure 1.5 – Memory subsystem architecture

The memory subsystem has two areas: JVM level and thread level. Let's discuss each in detail.

JVM level

JVM-level memory is shared by all the threads running in the JVM. It is not thread-safe, as multiple threads might be accessing the same objects at the same time. This is why programmers are advised to write thread-safe code (for example, using synchronization) when they update objects in this area. There are two areas of JVM-level memory:

Method: The method area is where all the class-level data is stored. This includes the class names, hierarchy, methods, variables, and static variables.
Heap: The heap is where all the objects and the instance variables are stored.
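Because the heap is shared, an object on it is visible to every thread that holds a reference to it. In the following sketch, SharedCounter is a hypothetical class: several threads update one counter object, and marking its methods synchronized makes the count++ read-modify-write safe, so no increments are lost.

```java
// One counter object lives on the heap and is shared by all worker threads.
public class SharedCounter {
    private int count = 0;

    synchronized void increment() {
        count++;   // without synchronized, concurrent increments could be lost
    }

    synchronized int get() {
        return count;
    }

    static int runThreads(int threads, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();   // single shared heap object
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();   // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 100_000));   // 400000
    }
}
```

Removing synchronized from increment() typically makes the printed total fall short of 400000, because two threads can read the same value of count before either writes it back.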

Thread level

Thread-level memory is where each thread's private data is stored. It is accessible/visible only to the thread that owns it, hence it is thread-safe. There are three areas of thread-level memory:

Stack: For each method call, a stack frame is created, which stores all the method-level data. The stack frame consists of the method's local variables (primitive values and references to objects, which themselves live on the heap), the operand stack (used to perform intermediate operations), the frame data (which stores all the symbols corresponding to the method), and exception catch block information.
Registers: The program counter (PC) register keeps track of instruction execution and points to the instruction currently being executed. Each executing thread has its own PC register.
Native Method Stack: The native method stack is a special type of stack that stores native method information, which is used when calling and executing native methods.
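Each method call pushes a frame onto the calling thread's stack. As a small illustration, assuming Java 9 or later for the StackWalker API, the following sketch lists the frames of the current thread from the innermost call outwards. The class and method names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StackFrames {
    // Returns the method names of the current thread's stack frames,
    // starting with the frame of this method.
    static List<String> currentFrames() {
        return StackWalker.getInstance()
                .walk(frames -> frames.map(StackWalker.StackFrame::getMethodName)
                        .collect(Collectors.toList()));
    }

    static List<String> outer() {
        return inner();    // pushes a frame for inner()
    }

    static List<String> inner() {
        return currentFrames();   // pushes a frame for currentFrames()
    }

    public static void main(String[] args) {
        // Starts with currentFrames, inner, outer, followed by the caller's frames.
        System.out.println(outer());
    }
}
```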

Now that we have seen how classes are loaded and where their data is stored in memory, let's look at how the JVM execution engine works.

JVM execution engine subsystem

The JVM execution engine is the core of the JVM, where all the execution happens. This is where the bytecodes are interpreted and executed. The JVM execution engine uses the memory subsystem to store and retrieve the objects. There are three key components of the JVM execution engine, as shown:

Figure 1.6 – JVM execution engine architecture

We will talk about each component in detail in the following sections.

Bytecode interpreter

As mentioned earlier in this chapter, bytecode (.class) is the input to the JVM. The JVM bytecode interpreter reads each instruction from the .class file and executes the corresponding machine code. The obvious disadvantage of an interpreter is that the code it runs is not optimized: the instructions are executed in sequence, and even if the same method is called several times, the interpreter goes through each instruction, interprets it, and then executes it every time.
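The fetch-decode-execute loop that an interpreter runs can be sketched with a toy stack machine. The opcodes below are hypothetical and only loosely modelled on JVM instructions such as iconst, iadd, and imul; a real JVM interpreter is far more elaborate. Every call to execute decodes every instruction again, which is exactly the overhead described above.

```java
// A toy bytecode interpreter for a hypothetical stack-based instruction set.
public class ToyInterpreter {
    static final int PUSH = 0;     // push the next value in the code array
    static final int ADD = 1;      // pop two values, push their sum
    static final int MUL = 2;      // pop two values, push their product
    static final int RETURN = 3;   // pop and return the top value

    static int execute(int[] code) {
        int[] stack = new int[16];   // operand stack
        int sp = 0;                  // operand stack pointer
        int pc = 0;                  // program counter
        while (true) {
            int op = code[pc++];     // fetch
            switch (op) {            // decode and execute
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case RETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException("Unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RETURN};
        System.out.println(execute(program));   // 20
    }
}
```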

JIT compiler

The JIT compiler saves the day: it profiles the code that is being executed by the interpreter, identifies areas where the code can be optimized, and compiles them to target machine code so that they can be executed faster. Combining interpreted bytecode with compiled code snippets provides the optimum way to execute the class files.

The following diagram illustrates the detailed workings of JVM, along with the various types of JIT compilers that the JVM uses to optimize the code:

Figure 1.7 – The detailed working of JVM with JIT compilers

Let's understand the workings shown in the previous diagram:

The JVM interpreter steps through each bytecode and interprets it as machine code, using a bytecode-to-machine-code mapping.
The JVM continuously profiles the code using counters that track how many times each piece of code is executed. When a counter reaches a threshold, the JVM uses the JIT compiler to compile that code with optimizations and stores the result in the code cache.
Before executing a compilation unit (block), the JVM checks whether it has already been compiled. If the JVM finds compiled code in the code cache, it uses the compiled code for faster execution.
The JVM uses two types of compilers, the C1 compiler and the C2 compiler, to compile the code.
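As a rough model of the counter-and-threshold mechanism described above, the following sketch counts invocations of a method, "compiles" it once it crosses a threshold, and serves later calls from a code cache. The threshold, the names, and the use of a lambda as "compiled code" are all hypothetical simplifications, not how HotSpot is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// A toy model of counter-based JIT compilation.
public class JitSketch {
    static final int COMPILE_THRESHOLD = 1_000;   // hypothetical threshold
    static final Map<String, Integer> invocationCounters = new HashMap<>();
    static final Map<String, IntUnaryOperator> codeCache = new HashMap<>();

    // Stand-in for interpreting square(x) step by step (deliberately slow).
    static int interpretSquare(int x) {
        int n = Math.abs(x);
        int result = 0;
        for (int i = 0; i < n; i++) {
            result += n;
        }
        return result;
    }

    static int invokeSquare(int x) {
        // Use compiled code from the code cache if it exists.
        IntUnaryOperator compiled = codeCache.get("square");
        if (compiled != null) {
            return compiled.applyAsInt(x);
        }
        // Otherwise count the invocation; once it is hot, "compile" and cache it.
        int count = invocationCounters.merge("square", 1, Integer::sum);
        if (count >= COMPILE_THRESHOLD) {
            codeCache.put("square", v -> v * v);
        }
        return interpretSquare(x);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_500; i++) {
            invokeSquare(i % 100);
        }
        System.out.println("compiled: " + codeCache.containsKey("square"));
    }
}
```

Both paths return the same result; only the cost per call changes once the method is served from the code cache, which mirrors the hybrid interpreted/compiled execution described in this section.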

As illustrated in Figure 1.7, the JIT compiler brings in optimizations by profiling the code that is running and, over a period of time, it identifies the code that can be compiled. The JVM runs the compiled snippets of code instead of interpreting the code. It is a hybrid method of running interpreted code and compiled code.

JVM introduced two types of compilers, C1 (client) and C2 (server), and the recent versions of JVM use the best of both for optimizing and compiling the code at runtime. Let's understand these types better:

C1 compiler: A performance counter was introduced, which counts the number of times a particular method/snippet of code is executed. Once a method/code snippet has been used a particular number of times (the threshold), that code snippet is compiled, optimized, and cached by the C1 compiler. The next time that code snippet is called, the JVM directly executes the compiled machine instructions from the cache rather than going through the interpreter. This brought in the first level of optimization.
C2 compiler: While the code is being executed, the JVM performs runtime code profiling and identifies the frequently executed code paths, known as hotspots. It then runs the C2 compiler to further optimize these hot code paths. This approach is where the HotSpot JVM gets its name.

C1 compiles quickly and is good for short-running applications, while C2 compiles more slowly and is heavier, but it produces more highly optimized code. This makes C2 ideal for long-running processes such as daemons and servers, where the code performs better over time.

In Java 6, there was a command-line option to use either C1 or C2 (with the command-line arguments -client (for C1) and -server (for C2)). Java 7 added a command-line option to use both (-XX:+TieredCompilation). Since Java 8, using both the C1 and C2 compilers (tiered compilation) is the default behavior.

There are five tiers/levels of compilation. Compilation logs can be generated to understand which Java method is compiled using which compiler tier/level. The following are the five tiers/levels of compilation:

Interpreted code (level 0)
Simple C1 compiled code (level 1)
Limited C1 compiled code (level 2)
Full C1 compiled code (level 3)
C2 compiled code (level 4)
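To watch these tiers on HotSpot, a program with a hot method can be run with the -XX:+PrintCompilation flag, which logs each method as it is compiled, including its compilation level. The program below is a minimal sketch for that purpose (HotLoop and sumOfSquares are illustrative names); adding -XX:TieredStopAtLevel=1 limits compilation to C1. The exact log format and thresholds vary between JVM versions.

```java
// Run, for example, with:
//   java -XX:+PrintCompilation HotLoop
//   java -XX:TieredStopAtLevel=1 -XX:+PrintCompilation HotLoop   (C1 only)
public class HotLoop {
    // A small method that becomes hot because it is called many times.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call the method often enough for its counters to cross the tier thresholds.
        for (int i = 0; i < 20_000; i++) {
            total += sumOfSquares(1_000);
        }
        System.out.println(total);
    }
}
```

In the log, sumOfSquares typically appears first at a C1 level (such as 3) and later at level 4 once C2 has compiled it.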

Let's now look at the various types of code optimizations that the JVM applies during compilation.

Code optimizations

The JIT compiler generates an internal representation of the code that is being compiled to understand its semantics and syntax. These internal representations are tree or graph data structures, on which the JIT then runs the code optimizations. The optimizations run on multiple compiler threads, whose number can be controlled from the command line (for example, with the -XcompilationThreads option in Eclipse OpenJ9 or -XX:CICompilerCount in HotSpot).

The following are some of the optimizations that the JIT compilers perform on the code:

Inlining: One of the most common practices in object-oriented programming is to access member variables through getter and setter methods. Inlining replaces calls to these getter/setter methods with direct access to the underlying variables. The JVM also profiles the code to identify other small, frequently called methods (known as hot methods) that can be inlined to reduce the number of method calls. The decision is based on the number of times the method is called and the size of the method. The size threshold that the JVM uses to decide on inlining hot methods can be modified using the -XX:MaxFreqInlineSize flag (by default, it is 325 bytes of bytecode).
Escape analysis: The JVM analyzes the scope in which variables and objects are used. If an object doesn't escape the local scope, the JVM can perform local optimizations. Lock elision is one such optimization, where the JVM decides whether a synchronization lock is really required for the object; synchronization locks are very expensive for the processor. The JVM can also avoid allocating the object on the heap, effectively keeping it on the stack (HotSpot does this by replacing the object with its individual fields, known as scalar replacement). This has a positive impact on memory usage and garbage collection, as such objects are discarded once the method finishes executing.
DeOptimization: DeOptimization is another critical optimization technique. The JVM continues to profile the code after optimizing it and may decide to deoptimize it. Deoptimization has a momentary impact on performance. The JIT compiler decides to deoptimize in two cases:
a. Not Entrant Code: This is very common with inherited classes or interface implementations. The JIT may have optimized the code assuming a particular class in the hierarchy; when it later learns otherwise, it deoptimizes the code and profiles it again for further optimization with the more specific class implementations.
b. Zombie Code: During not entrant code analysis, some objects get garbage collected, leading to code that may never be called again. This code is marked as zombie code and is removed from the code cache.
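The following sketch shows code shapes that these optimizations typically target: small getters (inlining), objects that never leave a method (escape analysis and lock elision), and a call site that sees a second implementation late (deoptimization). Whether the JIT actually applies any of them depends on runtime profiling; on HotSpot, -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining and -XX:+PrintCompilation can be used to inspect its decisions. The class names are illustrative.

```java
// Code shapes that JIT optimizations typically target.
public class JitCandidates {
    // Inlining: tiny getters are typical inlining candidates.
    static final class Point {
        private final int x;
        private final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int getX() { return x; }
        int getY() { return y; }
    }

    // Escape analysis: neither the Point nor the StringBuffer escapes this method,
    // so the JIT may avoid heap-allocating the Point and elide the StringBuffer's locks.
    static String describe(int x, int y) {
        Point p = new Point(x, y);
        StringBuffer sb = new StringBuffer();   // synchronized methods, but thread-confined here
        sb.append(p.getX()).append(',').append(p.getY());
        return sb.toString();
    }

    // Deoptimization: if only Square reaches s.area() for a long time, the JIT may
    // optimize for it; a Circle arriving later can make that code not entrant.
    interface Shape { double area(); }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            describe(i, i + 1);
        }
        Shape[] squares = {new Square(1), new Square(2)};
        for (int i = 0; i < 100_000; i++) {
            totalArea(squares);   // the call site sees only Square
        }
        Shape[] mixed = {new Square(1), new Circle(1)};
        System.out.println(totalArea(mixed));   // a Circle now reaches the call site
    }
}
```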

Apart from these, the JIT compiler performs other optimizations, such as control flow optimization, which rearranges code paths to improve efficiency, and native code generation to the target machine code for faster execution.

JIT compiler optimizations are performed over a period of time, which makes them a good fit for long-running processes. We will explain JIT compilation in detail in Chapter 2, JIT, HotSpot, and GraalJIT.

Java ahead-of-time compilation

The ahead-of-time (AOT) compilation option was introduced in Java 9 with jaotc, an experimental tool that compiles Java application code directly to final machine code. The code is compiled for a target architecture, so it is not portable. (jaotc was later removed in JDK 17.)

Java supports running Java bytecode and AOT-compiled code together on the x86-64 architecture. The following diagram illustrates how this works. This combination produces the most optimized code that Java can generate:

Figure 1.8 – The detailed workings of JVM JIT time compilers along with the ahead-of-time compiler

Bytecode goes through the interpreter and the JIT compilers (C1 and C2), as explained previously. jaotc compiles the most frequently used Java code (such as libraries) into machine code ahead of time, and this code is loaded directly into the code cache. When the bytecode runs, the JVM uses the AOT-compiled code from the code cache if it is available, which greatly reduces the amount of compilation the JVM has to do at runtime. Typically, the most frequently used libraries are AOT compiled for faster responses.

Garbage collector

One of Java's most sophisticated features is its built-in memory management. In languages such as C/C++, the programmer is expected to allocate and deallocate memory. In Java, the JVM takes care of cleaning up unreferenced objects and reclaiming their memory. The garbage collector runs as a background (daemon) thread and performs the cleanup automatically; the programmer can also request a collection with System.gc() or Runtime.getRuntime().gc(), although the JVM is not required to honor such a request.
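Reachability is what decides whether an object can be reclaimed. The following sketch uses java.lang.ref.WeakReference, which does not keep its referent alive, to observe this: an array with no strong references may be cleared after a GC request, while an array still referenced from a local variable is not. ReachabilityDemo and reclaimedAfterGc are illustrative names. Because System.gc() is only a request, the result for the unreferenced array can vary between JVMs and garbage collectors.

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {
    // Requests GC up to `attempts` times and reports whether the weakly
    // referenced object has been reclaimed.
    static boolean reclaimedAfterGc(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();   // only a request; the JVM may ignore it
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        byte[] kept = new byte[1024];   // strongly reachable through a local variable
        WeakReference<byte[]> keptRef = new WeakReference<>(kept);
        WeakReference<byte[]> droppedRef = new WeakReference<>(new byte[1024]);   // no strong reference

        System.out.println("unreferenced array reclaimed: " + reclaimedAfterGc(droppedRef, 5));
        System.out.println("referenced array reclaimed:   " + reclaimedAfterGc(keptRef, 1));
        System.out.println(kept.length);   // keeps `kept` reachable up to this point
    }
}
```

The referenced array is never reclaimed while `kept` is still in use, because the collector only frees objects that are no longer reachable from live references.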

Native subsystem

Java allows programmers to access native libraries. Native libraries are typically built using languages such as C/C++ for a specific target architecture. The Java Native Interface (JNI) provides an abstraction layer and interface specification for implementing the bridge to access native libraries. Each JVM implements JNI for its specific target system, and programmers use JNI to call native methods from Java code. The following diagram illustrates the components of the native subsystem: