Getting started with the Vulkan programming model
Let's discuss the Vulkan programming model in detail. Here, even a reader who is a total beginner will be able to understand the following concepts:
- The Vulkan programming model
- The rendering execution model, described in a step-by-step fashion
- How Vulkan works
The following diagram shows a top-down view of the Vulkan application programming model; we will examine this process in detail and also delve into the sublevel components and their functionalities:
Hardware initialization
When a Vulkan application starts, its very first job is the initialization of the hardware. Here, the application activates the Vulkan drivers by communicating with the loader. The following diagram represents a block diagram of a Loader with its subcomponents:
Loader: A loader is a piece of code used at application start-up to locate the Vulkan drivers in a system in a unified way across platforms. The following are the responsibilities of a loader:
- Locating drivers: As its primary job, a loader knows where to search for drivers in the given system. It finds the correct driver and loads it.
- Platform-independent: Initializing Vulkan is consistent across all platforms. This is unlike OpenGL, where creating a context requires working with a different window system API for each environment (EGL, GLX, and WGL). Platform differences in Vulkan are expressed as extensions.
- Injectable layers: A loader supports a layered architecture and provides the capability to inject various layers at runtime. The big improvement is that the driver need not do any of the work (or retain any of the states it would need to do the work) in determining whether the application's use of the API is valid. Therefore, it's advisable to turn on the selected injectable layers, as per application requirements, during the development stage and turn them off at the deployment stage. For example, injectable layers can offer the following:
- Tracing the Vulkan API commands
- Capturing rendered scenes and executing them later
- Error and validation for debugging purposes
The Vulkan application first performs a handshake with the loader library and initializes the Vulkan implementation driver. The loader library loads Vulkan APIs dynamically. The loader also offers a mechanism that allows the automatic loading of specific layers into all Vulkan applications; this is called an Implicit-Enabled layer.
Once the loader locates the drivers and successfully links with the APIs, the application is responsible for the following:
- Creating a Vulkan instance
- Querying the physical device for the available queues
- Querying extensions and storing them as function pointers, such as WSI or special feature APIs
- Enabling an injectable layer for error checking, debugging, or the validation process
Window presentation surfaces
Once the Vulkan implementation driver is located by the loader, we are good to draw something using the Vulkan APIs. For this, we need an image to perform the drawing task and put it on the presentation window to display it:
Building a presentation image and creating windows are very platform-specific jobs. In OpenGL, windowing is intimately linked to the API: the window system framebuffer is created along with the context/device. The big difference in Vulkan is that context/device creation needn't involve the window system at all; presentation is managed through Window System Integration (WSI).
WSI contains a set of cross-platform windowing management extensions:
- A unique cross-platform implementation for the majority of platforms, such as Windows, Linux, Android, and other OSes
- A consistent API standard to easily create surfaces and display them without getting into the details
WSI supports multiple windowing systems, such as Wayland, X, and Windows, and it also manages the ownership of images via a swapchain.
WSI provides a swapchain mechanism; this allows the use of multiple images in such a way that, while the window system is displaying one image, the application can prepare the next.
The following screenshot shows the double-buffering swap image process. It contains two images named First Image and Second Image. These images are swapped between Application and Display with the help of WSI:
WSI works as an interface between the Display and the Application. It makes sure that both images are acquired by the Display and the Application in a mutually exclusive way. Therefore, while the Application works on the First Image, WSI hands over the Second Image to the Display in order to render its contents. Once the Application finishes painting the First Image, it submits it to WSI and in return acquires the Second Image to work on, and vice versa.
At this point, perform the following tasks:
- Create a native window (such as with the CreateWindow method in the Windows OS)
- Create a WSI surface attached to the window
- Create the swapchain to present to the surface
- Request the drawing images from the created swapchain
Resource setup
Setting up resources means storing data in memory regions. It could be any type of data, for example, vertex attributes such as position and color, or image type/name. Naturally, the data must reside somewhere in memory for Vulkan to access it.
Unlike OpenGL, which manages the memory behind the scenes using hints, Vulkan provides full low-level access and control of the memory. Vulkan advertises the various types of available memory on the physical device, providing the application with a fine opportunity to manage these different types of memory explicitly.
Memory heaps can be categorized into two types, based upon their performance:
- Host local: This is a slower type of memory
- Device local: This is a type of memory with high bandwidth; it is faster
Memory heaps can be further divided based upon their memory type configurations:
- Device local: This type of memory is physically attached to the physical device:
- Visible to the device
- Not visible to the host
- Device local, host visible: This type of memory is also physically attached to the device:
- Visible to the device
- Visible to the host
- Host local, host visible: This refers to the local memory of the host; it is slower than device local memory:
- Visible to the device
- Visible to the host
In Vulkan, resources are explicitly taken care of by the application with exclusive control of memory management. The following is the process of resource management:
- Resource objects: For resource setup, an application is responsible for allocating memory for resources; these resources could be either images or buffer objects.
- Allocation and suballocations: When resource objects are created, only logical addresses are associated with them; there is no physical backing available. The application allocates physical memory and binds these logical addresses to it. As allocation is an expensive process, suballocation is an efficient way to manage the memory; it allocates a big chunk of physical memory at once and puts different resource objects into it. Suballocation is the responsibility of an application. The following diagram shows the suballocated object from the big allocated piece of physical memory:
- Sparse memory: For very large image objects, Vulkan fully supports sparse memory with all its features. Sparse memory is a special feature that allows you to keep in memory large image resources that are much larger than the actual memory capacity. This technique breaks the image into tiles and loads only those tiles that fit the application logic.
- Staging buffers: The population of the object and image buffers is done using staging, where two different memory regions are used for the physical allocation. The ideal memory placement for a resource may not be visible to the host. In this case, the application must first populate the resource in a staging buffer that is host-visible and then transfer it to the ideal location.
- Asynchronous transfer: The data is transferred asynchronously using asynchronous commands with any of the graphics or DMA/transfer queues.
Tip
Physical memory allocation is expensive; therefore, a good practice is to allocate a large physical memory and then suballocate objects.
In contrast, OpenGL resource management does not offer granular control over memory. There is no concept of host and device memory; the driver does all of the allocation behind the scenes. Also, these allocation and suballocation processes are not fully transparent and might change from one driver to another. This lack of consistency and hidden memory management causes unpredictable behavior. Vulkan, on the other hand, allocates the object right there in the chosen memory, making it highly predictable.
Therefore, during the resource setup stage, you need to perform the following tasks:
- Create a resource object.
- Query the appropriate memory instance and create a memory object, such as a buffer or an image.
- Get the memory requirements for the allocation.
- Allocate space and store data in it.
- Bind the memory with the resource object that we created.
Pipeline setup
A pipeline is a set of events that occur in a fixed sequence defined by the application logic. These events consist of the following: supplying the shaders, binding them to the resource, and managing the state:
Descriptor sets and descriptor pools
A descriptor set is an interface between resources and shaders. It is a simple structure that binds the shader to the resource information, such as images or buffers. It associates or binds a resource memory that the shader is going to use. The following are the characteristics associated with descriptor sets:
- Frequent change: By nature, a descriptor set changes frequently; generally, it contains attributes such as material, texture, and so on.
- Descriptor pool: Considering the nature of descriptor sets, they are allocated from a descriptor pool without introducing global synchronization.
- Multithread scalability: This allows multiple threads to update the descriptor set simultaneously.
Tip
Updating or changing a descriptor set is one of the most performance-critical paths in Vulkan rendering. Therefore, the design of a descriptor set is an important aspect in achieving maximum performance. Vulkan supports logical partitioning of multiple descriptor sets at the scene (low frequency updates), model (medium frequency updates), and draw level (high frequency updates). This ensures that a high frequency descriptor update does not affect low frequency descriptor resources.
Shaders with SPIR-V
The only way to specify shaders or compute kernels in Vulkan is through SPIR-V. The following are some characteristics associated with it:
- Multiple inputs: SPIR-V producing compilers exist for various source languages, including GLSL and HLSL. These can be used to convert a human-readable shader into a SPIR-V intermediate representation.
- Offline compilation: Shaders/kernels are compiled offline and injected upfront.
- glslangValidator: LunarG SDK provides the glslangValidator compiler, which can be used to create SPIR-V shaders from equivalent GLSL shaders.
- Multiple entry points: The shader object provides multiple entry points. This is very beneficial for reducing the shipment size (and the loaded size) of the SPIR-V shaders. Variants of a shader can be packaged into a single module.
Pipeline management
A physical device contains a range of hardware settings that determine how the submitted input data of a given geometry needs to be interpreted and drawn. These settings are collectively called pipeline states. These include the rasterizer state, blend state, and depth stencil state; they also include the primitive topology type (point/line/triangle) of the submitted geometry and the shaders that will be used for rendering. There are two types of states: dynamic and static. The pipeline states are used to create the pipeline object (graphics or compute), which is a performance-critical path. Therefore, we don't want to create them again and again; we want to create them once and reuse them.
Vulkan allows you to control states using pipeline objects in conjunction with Pipeline Cache Object (PCO) and the pipeline layout:
- Pipeline objects: Pipeline creation is expensive. It includes shader recompilation, resource binding, Render Pass, framebuffer management, and other related operations. Pipeline objects can number in the hundreds or thousands; therefore, each different state combination is stored as a separate pipeline object.
- PCO: The creation of pipelines is expensive; therefore, once created, a pipeline can be cached. When a new pipeline is requested, the driver can look for a close match and create the new pipeline using the base pipeline.
Pipeline caches are opaque, and the details of their use by the driver are unspecified. The application is responsible for persisting the cache if it wishes to reuse it across runs and for providing a suitable cache at the time of pipeline creation if it wishes to reap potential benefits.
- Pipeline layout: Pipeline layouts describe the descriptor sets that will be used with the pipeline, indicating what kind of resource is attached to each binding slot in the shader. Different pipeline objects can use the same pipeline layout.
In the pipeline management stage, this is what happens:
- The application compiles the shader into SPIR-V form and specifies it in the pipeline shader state.
- The descriptor helps us connect these resources to the shader itself. The application allocates the descriptor set from the descriptor pool and connects the incoming or outgoing resources to the binding slots in the shader.
- The application creates pipeline objects, which contain the static and dynamic state configuration to control the hardware settings. The pipeline should be created from a pipeline cache pool for better performance.
Recording commands
Recording commands is the process of command buffer formation. Command buffers are allocated from the command pool memory. Command pools can also be used for multiple allocations. A command buffer is recorded by supplying commands within a given start and end scope defined by the application. The following diagram illustrates the recording of a drawing command buffer; as you can see, it comprises many commands recorded in top-down order, responsible for object painting.
Note
Note that the commands in the command buffer may vary with the job requirement. This diagram is just an illustration that covers the most common steps performed while drawing primitives.
The major parts of drawing are covered here:
- Scope: The scope defines the start and end of the command buffer recording.
- Render Pass: This defines the execution process of a job that might affect the framebuffer cache. It may comprise attachments, subpasses, and dependencies between those subpasses. An attachment refers to an image on which the drawing is performed. A subpass works on these attachments; for example, an attachment can act as the resolve target for multisampling. Render Pass also controls how the framebuffer will be treated at the beginning of the pass: it will either retain the last information on it or clear it with a given color. Similarly, at the end of the Render Pass, the results are either discarded or stored.
- Pipeline: This contains the states' (static/dynamic) information represented by a pipeline object.
- Descriptor: This binds the resource information to the pipeline.
- Bind resource: This specifies the vertex buffer, image, or other geometry-related information.
- Viewport: This determines the portion of the drawing surface on which the rendering of the primitives will be performed.
- Scissor: This defines a rectangular space region beyond which nothing will be drawn.
- Drawing: The draw command specifies geometry buffer attributes, such as the start index, total count, and so on.
Tip
The creation of a command buffer is an expensive job; it is among the most performance-critical paths. A command buffer can be reused numerous times if the same work needs to happen on many frames; it can be resubmitted without needing to be re-recorded. Also, multiple command buffers can be produced simultaneously using multiple threads. Vulkan is specially designed to exploit multithreaded scalability; command pools ensure there is no lock contention when used in a multithreaded environment.
The following diagram shows a scalable command buffer creation model with a multicore and multithreading approach. This model provides true parallelism with multicore processors.
Here, each thread uses a separate command buffer pool, which allocates one or more command buffers, so there is no contention on resource locks.
Queue submission
Once command buffers are built, they can be submitted to a queue for processing. Vulkan exposes different types of queues to the application, such as graphics, DMA/transfer, and compute queues. Queue selection for submission depends very much upon the nature of the job. For example, graphics-related tasks must be submitted to the graphics queue; similarly, for compute operations, the compute queue is the best choice. The submitted jobs are executed asynchronously. Command buffers can be pushed into separate, compatible queues, allowing parallel execution. The application is responsible for any kind of synchronization within command buffers or between queues, and even between the host and the device.
Queue submission performs the following jobs:
- Acquiring the images from the swapchain on which the next frame will be drawn
- Deploying any synchronization mechanism required, such as a semaphore or a fence
- Gathering the command buffer and submitting it to the required device queue for processing
- Requesting the presentation of the completed painted images on the output device