Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA
Hands-On GPU-Accelerated Computer Vision with OpenCV and CUDA: Effective techniques for processing complex image data in real time using GPUs

Introducing CUDA and Getting Started with CUDA

This chapter gives you a brief introduction to CUDA architecture and how it has redefined the parallel processing capabilities of GPUs. The application of CUDA architecture in real-life scenarios will be demonstrated. This chapter will serve as a starting guide for software developers who want to accelerate their applications by using general-purpose GPUs and CUDA. The chapter describes development environments used for CUDA application development and how the CUDA toolkit can be installed on all operating systems. It covers how basic code can be developed using CUDA C and executed on Windows and Ubuntu operating systems.

The following topics will be covered in this chapter:

  • Introducing CUDA
  • Applications of CUDA
  • CUDA development environments
  • Installing CUDA toolkit on Windows, Linux, and macOS
  • Developing simple code, using CUDA C

Technical requirements

Introducing CUDA

Compute Unified Device Architecture (CUDA) is a very popular parallel computing platform and programming model developed by NVIDIA. It is only supported on NVIDIA GPUs. OpenCL is used to write parallel code for other types of GPUs such as AMD and Intel, but it is more complex than CUDA. CUDA allows creating massively parallel applications running on graphics processing units (GPUs) with simple programming APIs. Software developers using C and C++ can accelerate their software application and leverage the power of GPUs by using CUDA C or C++. Programs written in CUDA are similar to programs written in simple C or C++ with the addition of keywords needed to exploit parallelism of GPUs. CUDA allows a programmer to specify which part of CUDA code will execute on the CPU and which part will execute on the GPU.

The next section describes the need for parallel computing and how CUDA architecture can leverage the power of the GPU, in detail.

Parallel processing

In recent years, consumers have been demanding more and more functionalities on a single hand held device. So, there is a need for packaging more and more transistors on a small area that can work quickly and consume minimal power. We need a fast processor that can carry out multiple tasks with a high clock speed, a small area, and minimum power consumption. Over many decades, transistor sizing has seen a gradual decrease resulting in the possibility of more and more transistors being packed on a single chip. This has resulted in a constant rise of the clock speed. However, this situation has changed in the last few years with the clock speed being more or less constant. So, what is the reason for this? Have transistors stopped getting smaller? The answer is no. The main reason behind clock speed being constant is high power dissipation with high clock rate. Small transistors packed in a small area and working at high speed will dissipate large power, and hence it is very difficult to keep the processor cool. As clock speed is getting saturated in terms of development, we need a new computing paradigm to increase the performance of the processors. Let's understand this concept by taking a small real-life example.

Suppose you are told to dig a very big hole in a small amount of time. You will have the following three options to complete this work in time:

  • You can dig faster.
  • You can buy a better shovel.
  • You can hire more diggers, who can help you complete the work.

If we can draw a parallel between this example and a computing paradigm, then the first option is similar to having a faster clock. The second option is similar to having more transistors that can do more work per clock cycle. But, as we have discussed in the previous paragraph, power constraints have put limitations on these two steps. The third option is similar to having many smaller and simpler processors that can carry out tasks in parallel. A GPU follows this computing paradigm. Instead of having one big powerful processor that can perform complex tasks, it has many small and simple processors that can get work done in parallel. The details of GPU architecture are explained in the next section.

Introducing GPU architecture and CUDA

GeForce 256 was the first GPU developed by NVIDIA in 1999. Initially, GPUs were only used for rendering high-end graphics on monitors. They were only used for pixel computations. Later on, people realized that if GPUs can do pixel computations, then they would also be able to do other mathematical calculations. Nowadays, GPUs are used in many applications other than rendering graphics. These kinds of GPUs are called General-Purpose GPUs (GPGPUs).

The next question that may have come to your mind is the difference between the hardware architecture of a CPU and a GPU that allows it to carry out parallel computation. A CPU has a complex control hardware and less data computation hardware. Complex control hardware gives a CPU flexibility in performance and a simple programming interface, but it is expensive in terms of power. On the other hand, a GPU has simple control hardware and more hardware for data computation that gives it the ability for parallel computation. This structure makes it more power-efficient. The disadvantage is that it has a more restrictive programming model. In the early days of GPU computing, graphics APIs such as OpenGL and DirectX were the only way to interact with GPUs. This was a complex task for normal programmers, who were not familiar with OpenGL or DirectX. This led to the development of CUDA programming architecture, which provided an easy and efficient way of interacting with the GPUs. More details about CUDA architecture are given in the next section.

Normally, the performance of any hardware architecture is measured in terms of latency and throughput. Latency is the time taken to complete a given task, while throughput is the amount of the task completed in a given time. These are not contradictory concepts. More often than not, improving one improves the other. In a way, most hardware architectures are designed to improve either latency or throughput. For example, suppose you are standing in a queue at the post office. Your goal is to complete your work in a small amount of time, so you want to improve latency, while an employee sitting at a post office window wants to see more and more customers in a day. So, the employee's goal is to increase the throughput. Improving one will lead to an improvement in the other, in this case, but the way both sides look at this improvement is different.

In the same way, normal sequential CPUs are designed to optimize latency, while GPUs are designed to optimize throughput. CPUs are designed to execute all instructions in the minimum time, while GPUs are designed to execute more instructions in a given time. This design concept of GPUs makes them very useful in image processing and computer vision applications, which we are targeting in this book, because we don't mind a delay in the processing of a single pixel. What we want is that more pixels should be processed in a given time, which can be done on a GPU.
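
The trade-off can be made concrete with a little arithmetic. The following host-only sketch uses hypothetical timings (one fast unit that needs 2 ms per item versus 1,000 slower units that each need 10 ms per item); the numbers are assumptions chosen for illustration, not measurements of any real CPU or GPU.

```cuda
// latency_vs_throughput.cu: host-only arithmetic, no GPU required.
// All timings are hypothetical values chosen for illustration.
#include <stdio.h>

// Items completed per second by `units` workers that each need
// `latency_ms` milliseconds to finish one item.
double throughput_per_second(int units, double latency_ms) {
    return units * (1000.0 / latency_ms);
}

int main(void) {
    // Latency-optimized design: a single fast unit, 2 ms per item.
    double cpu_like = throughput_per_second(1, 2.0);
    // Throughput-optimized design: 1000 slower units, 10 ms per item each.
    double gpu_like = throughput_per_second(1000, 10.0);

    printf("one fast unit   : latency  2 ms, throughput %.0f items/s\n", cpu_like);
    printf("many slow units : latency 10 ms, throughput %.0f items/s\n", gpu_like);
    return 0;
}
```

With these assumed numbers, each individual item takes five times longer on the parallel design, yet 200 times more items are finished per second, which is the property that matters when processing every pixel of an image.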

So, to summarize, parallel computing is what we need if we want to increase computational performance at the same clock speed and power requirement. GPUs provide this capability by having lots of simple computational units working in parallel. Now, to interact with the GPU and to take advantage of its parallel computing capabilities, we need a simple parallel programming architecture, which is provided by CUDA.

CUDA architecture

This section covers basic hardware modifications done in GPU architecture and the general structure of software programs developed using CUDA. We will not discuss the syntax of the CUDA program just yet, but we will cover the steps to develop the code. The section will also cover some basic terminology that will be followed throughout this book.

CUDA architecture includes several new components specifically designed for general-purpose computations in GPUs, which were not present in earlier architectures. It includes the unified shader pipeline, which allows all arithmetic logic units (ALUs) present on a GPU chip to be marshaled by a single CUDA program. The ALUs are also designed to comply with IEEE floating-point single- and double-precision standards so that they can be used in general-purpose applications. The instruction set is also tailored to general-purpose computation rather than being specific to pixel computations. It also allows arbitrary read and write access to memory. These features make CUDA GPU architecture very useful in general-purpose applications.

All GPUs have many parallel processing units called cores. On the hardware side, these cores are grouped into streaming multiprocessors (SMs), each of which contains many streaming processors. The GPU has a grid of these streaming multiprocessors. On the software side, a CUDA program is executed as a series of multiple threads running in parallel. Each thread is executed on a different core. The GPU can be viewed as a combination of many blocks, and each block can execute many threads. Each block is assigned to one SM on the GPU, and an SM can host more than one block. How the mapping between a block and an SM is done is not known to a CUDA programmer; it is decided by a scheduler. The threads from the same block can communicate with one another. The GPU has a hierarchical memory structure that deals with communication between threads inside one block and across multiple blocks. This will be dealt with in detail in the upcoming chapters.

As a programmer, you will be curious to know what the programming model in CUDA is and how the code knows whether it should be executed on the CPU or the GPU. For this book, we will assume that we have a computing platform comprising a CPU and a GPU. We will call a CPU and its memory the host and a GPU and its memory a device. A CUDA program contains the code for both the host and the device. The host code is compiled for the CPU by a normal C or C++ compiler, and the device code is compiled for the GPU by a GPU compiler. The host code calls the device code through something called a kernel call. It will launch many threads in parallel on a device. The number of threads to be launched on a device is provided by the programmer.

Now, you might ask how this device code is different from a normal C code. The answer is that it is similar to a normal sequential C code. It is just that this code is executed on a greater number of cores in parallel. However, for this code to work, it needs data on the device's memory. So, before launching threads, the host copies data from the host memory to the device memory. The thread works on data from the device's memory and stores the result on the device's memory. Finally, this data is copied back to the host memory for further processing. To summarize, the steps to develop a CUDA C program are as follows:

  1. Allocate memory for data in the host and device memory.
  2. Copy data from the host memory to the device memory.
  3. Launch a kernel by specifying the degree of parallelism.
  4. After all the threads are finished, copy the data back from the device memory to the host memory.
  5. Free up all memory used on the host and the device.
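
The syntax is covered in later chapters, but as a preview, the five steps above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the kernel name add_vectors, the helper blocks_needed, the array size, and the choice of 256 threads per block are all assumptions made for this example. Only standard CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaFree) are used.

```cuda
// vector_add.cu: illustrative sketch of the five steps (names are hypothetical).
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Number of blocks needed so that blocks * threads_per_block >= n.
int blocks_needed(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Device code: each thread adds one pair of elements.
__global__ void add_vectors(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard threads that fall past the end of the data
        c[i] = a[i] + b[i];
    }
}

int main(void) {
    const int n = 1000;
    const size_t bytes = n * sizeof(float);

    // Step 1: allocate memory on the host and on the device.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    for (int i = 0; i < n; ++i) {
        h_a[i] = (float)i;
        h_b[i] = 2.0f * i;
    }

    // Step 2: copy the input data from host memory to device memory.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Step 3: launch the kernel, specifying the degree of parallelism.
    const int threads_per_block = 256;
    add_vectors<<<blocks_needed(n, threads_per_block), threads_per_block>>>(d_a, d_b, d_c, n);

    // Step 4: copy the result back; this copy waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    } else {
        printf("c[10] = %.1f\n", h_c[10]);  // 10 + 20, so 30.0 is expected
    }

    // Step 5: free all memory used on the device and the host.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    free(h_a);
    free(h_b);
    free(h_c);
    return err == cudaSuccess ? 0 : 1;
}
```

The bounds check inside the kernel is needed because the number of launched threads (4 blocks of 256) is rounded up past the 1,000 elements in the arrays.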

CUDA applications

CUDA has seen an unprecedented growth in the last decade. It is being used in a wide variety of applications in various domains. It has transformed research in multiple fields. In this section, we will look at some of these domains and how CUDA is accelerating growth in each domain:

  • Computer vision applications: Computer vision and image processing algorithms are computationally intensive. With more and more cameras capturing images at high definition, there is a need to process these large images in real time. With the CUDA acceleration of these algorithms, applications such as image segmentation, object detection, and classification can achieve a real-time frame rate performance of more than 30 frames per second. CUDA and the GPU allow the faster training of deep neural networks and other deep-learning algorithms; this has transformed research in computer vision. NVIDIA is developing several hardware platforms such as Jetson TX1, Jetson TX2, and Jetson TK1, which can accelerate computer vision applications. NVIDIA drive platform is also one of the platforms that is made for autonomous drive applications.
  • Medical imaging: The medical imaging field is seeing widespread use of GPUs and CUDA in reconstruction and the processing of MRI images and Computed tomography (CT) images. It has drastically reduced the processing time for these images. Nowadays, there are several devices that are shipped with GPUs, and several libraries are available to process these images with CUDA acceleration.
  • Financial computing: There is a need for better data analytics at a lower cost in all financial firms, and this will help in informed decision-making. It includes complex risk calculation and initial and lifetime margin calculation, which have to be done in real time. GPUs help financial firms to do these kinds of analytics in real time without adding too much overhead cost.
  • Life science, bioinformatics, and computational chemistry: Simulating DNA genes, sequencing, and protein docking are computationally intensive tasks that need high computation resources. GPUs help in this kind of analysis and simulation. GPUs can run common molecular dynamics, quantum chemistry, and protein docking applications more than five times faster than normal CPUs.
  • Weather research and forecasting: Several weather prediction applications, ocean modeling techniques, and tsunami prediction techniques utilize GPU and CUDA for faster computation and simulations, compared to CPUs.
  • Electronic Design Automation (EDA): Due to the increasing complexity in VLSI technology and the semiconductor fabrication process, the performance of EDA tools is lagging behind this technological progress. This leads to incomplete simulations and missed functional bugs. Therefore, the EDA industry has been seeking faster simulation solutions. GPU and CUDA acceleration are helping this industry to speed up computationally intensive EDA simulations, including functional simulation, placement and routing, signal integrity and electromagnetics, SPICE circuit simulation, and so on.
  • Government and defense: GPU and CUDA acceleration is also widely used by governments and militaries. Aerospace, defense, and intelligence industries are taking advantage of CUDA acceleration in converting large amounts of data into actionable information.

CUDA development environment

To start developing an application using CUDA, you will need to set up the development environment for it. There are some prerequisites for setting up a development environment for CUDA. These include the following:

  • A CUDA-supported GPU
  • An NVIDIA graphics card driver
  • A standard C compiler
  • A CUDA development kit

How to check for these prerequisites and install them is discussed in the following subsections.

CUDA-supported GPU

As discussed earlier, CUDA architecture is only supported on NVIDIA GPUs. It is not supported on other GPUs such as AMD and Intel. Almost all GPUs developed by NVIDIA in the last decade support CUDA architecture and can be used to develop and execute CUDA applications. A detailed list of CUDA-supported GPUs can be found on the NVIDIA website: https://developer.nvidia.com/cuda-gpus. If you can find your GPU in this list, you will be able to run CUDA applications on your PC.

If you don't know which GPU is on your PC, then you can find it by following these steps:

  • On Windows:
    1. In the Start menu, type device manager and press Enter.
    2. In Device Manager, expand Display adapters. There, you will find the name of your NVIDIA GPU.
  • On Linux:
    1. Open Terminal.
    2. Run sudo lshw -C video.

This will list information regarding your graphics card, usually including its make and model.

  • On macOS:
    1. Go to the Apple Menu | About this Mac | More info.
    2. Select Graphics/Displays under Contents list. There, you will find the name of your NVIDIA GPU.

If you have a CUDA-enabled GPU, then you are good to proceed to the next step.

NVIDIA graphics card driver

If you want to communicate with NVIDIA GPU hardware, then you will need a system software for it. NVIDIA provides a device driver to communicate with the GPU hardware. If the NVIDIA graphics card is properly installed, then these drivers are installed automatically with it on your PC. Still, it is good practice to check for driver updates periodically from the NVIDIA website: http://www.nvidia.in/Download/index.aspx?lang=en-in. You can select your graphics card and operating system for driver download from this link.

Standard C compiler

Whenever you build a CUDA application, you need two compilers: one for GPU code and one for CPU code. The compiler for the GPU code comes with the CUDA toolkit, which will be discussed in the next section. You also need to install a standard C compiler for compiling CPU code. Different C compilers are used on different operating systems:

  • On Windows: For all Microsoft Windows editions, it is recommended to use Microsoft Visual Studio C compiler. It comes with Microsoft Visual Studio and can be downloaded from its official website: https://www.visualstudio.com/downloads/.

The express edition for commercial applications needs to be purchased, but you can use community editions for free in non-commercial applications. For running the CUDA application, install Microsoft Visual Studio with a Microsoft Visual Studio C compiler selected. Different CUDA versions support different Visual Studio editions, so you can refer to the NVIDIA CUDA website for Visual Studio version support.

  • On Linux: Most Linux distributions come with the standard GNU C compiler (GCC), and hence it can be used to compile CPU code for CUDA applications.
  • On Mac: On macOS, you can install a C compiler by downloading and installing Xcode. It is freely available and can be downloaded from Apple's website: https://developer.apple.com/xcode/

CUDA development kit

CUDA needs a GPU compiler for compiling GPU code. This compiler comes with a CUDA development toolkit. If you have an NVIDIA GPU with the latest driver update and have installed a standard C compiler for your operating system, you are good to proceed to the final step of installing the CUDA development toolkit. A step-by-step guide for installing the CUDA toolkit is discussed in the next section.

Installing the CUDA toolkit on all operating systems

This section covers instructions on how to install CUDA on all supported platforms. It also describes steps to verify installation. While installing CUDA, you can choose between a network installer and an offline local installer. A network installer has a lower initial download size, but it needs an internet connection while installing. A local offline installer has a higher initial download size. The steps discussed in this book are for local installation. A CUDA toolkit can be downloaded for Windows, Linux, and macOS for both 32-bit and 64-bit architecture from the following link: https://developer.nvidia.com/cuda-downloads.

After downloading the installer, refer to the following steps for your particular operating system. CUDAx.x is used as notation in the steps, where x.x indicates the version of CUDA that you have downloaded.

Windows

This section covers the steps to install CUDA on Windows, which are as follows:

  1. Double-click on the installer. It will ask you to select the folder where temporary installation files will be extracted. Select the folder of your choice. It is recommended to keep this as the default.
  2. Then, the installer will check for system compatibility. If your system is compatible, you can follow the on-screen prompts to install CUDA. You can choose between an express installation (default) and a custom installation. A custom installation allows you to choose which features of CUDA to install. It is recommended to select the express default installation.
  3. The installer will also install CUDA sample programs and the CUDA Visual Studio integration.
Please make sure you have Visual Studio installed before running this installer.

To confirm that the installation is successful, the following aspects should be ensured:

  1. All the CUDA samples will be located at C:\ProgramData\NVIDIA Corporation\CUDA Samples\vx.x if you have chosen the default path for installation.
  2. To check installation, you can run any project.
  3. We are using the device query project located at C:\ProgramData\NVIDIA Corporation\CUDA Samples\vx.x\1_Utilities\deviceQuery.
  4. Double-click on the *.sln file of your Visual Studio edition. It will open this project in Visual Studio.
  5. Then you can click on the local Windows debugger in Visual Studio. If the build is successful and the device properties of your GPU are displayed, then the installation is complete.

Linux

This section covers the steps to install CUDA on Linux distributions. In this section, the installation of CUDA in Ubuntu, which is a popular Linux distribution, is discussed using distribution-specific packages or using the apt-get command (which is specific to Ubuntu).

The steps to install CUDA using the *.deb installer downloaded from the CUDA website are as follows:

  1. Open Terminal and run the dpkg command, which is used to install packages in Debian-based systems:
sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
  2. Install the CUDA public GPG key using the following command:
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
  3. Then, update the apt repository cache using the following command:
sudo apt-get update
  4. Then you can install CUDA using the following command:
sudo apt-get install cuda
  5. Include the CUDA installation path in the PATH environment variable using the following command. If you have not installed CUDA at the default location, change the path to point at your installation location:
export PATH=/usr/local/cuda-x.x/bin${PATH:+:${PATH}}
  6. Set the LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=/usr/local/cuda-x.x/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

You can also install the CUDA toolkit by using the apt-get package manager, available with Ubuntu OS. You can run the following command in Terminal:

sudo apt-get install nvidia-cuda-toolkit

To check whether the CUDA GPU compiler has been installed, you can run the nvcc -V command from Terminal. The nvcc compiler driver calls the GCC compiler for C code and the NVIDIA PTX compiler for the CUDA code.
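
The same check can also be made from inside a program. The following is a small sketch, assuming the toolkit and driver are installed, that asks the CUDA runtime for its own version and for the version supported by the installed driver. The helper names version_major and version_minor are made up for this example; cudaRuntimeGetVersion and cudaDriverGetVersion are standard runtime API calls.

```cuda
// cuda_versions.cu: print the CUDA runtime and driver versions.
#include <stdio.h>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (for example, 9020 is 9.2).
int version_major(int encoded) { return encoded / 1000; }
int version_minor(int encoded) { return (encoded % 1000) / 10; }

int main(void) {
    int runtime_version = 0;
    int driver_version = 0;  // stays 0 if no NVIDIA driver is installed

    cudaRuntimeGetVersion(&runtime_version);
    cudaDriverGetVersion(&driver_version);

    printf("CUDA runtime version: %d.%d\n",
           version_major(runtime_version), version_minor(runtime_version));
    printf("CUDA driver version : %d.%d\n",
           version_major(driver_version), version_minor(driver_version));
    return 0;
}
```

Assuming the file is saved as cuda_versions.cu, it can be built with nvcc cuda_versions.cu -o cuda_versions. A driver version lower than the runtime version usually means the graphics driver needs to be updated.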

You can install the NVIDIA Nsight Eclipse plugin, which will give the GUI Integrated Development Environment for executing CUDA programs, using the following command:

sudo apt install nvidia-nsight

After installation, you can run the deviceQuery project located at ~/NVIDIA_CUDA-x.x_Samples. If the CUDA toolkit is installed and configured correctly, deviceQuery should display your GPU's device properties.

Mac

This section covers steps to install CUDA on macOS. It needs the *.dmg installer downloaded from the CUDA website. The steps to install after downloading the installer are as follows:

  1. Launch the installer and follow the on-screen prompts to complete the installation. It will install all prerequisites, the CUDA toolkit, and CUDA samples.
  2. Then, set the environment variables to point at the CUDA installation using the following commands. If you have not installed CUDA at the default location, change the paths to point at your installation location:
export PATH=/Developer/NVIDIA/CUDA-x.x/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-x.x/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
  3. Run the script: cuda-install-samples-x.x.sh. It will install CUDA samples with write permissions.
  4. After it has completed, you can go to bin/x86_64/darwin/release and run the deviceQuery project. If the CUDA toolkit is installed and configured correctly, it will display your GPU's device properties.
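
On any of the platforms above, a much smaller version of the deviceQuery check can be written by hand. The sketch below is not the NVIDIA sample itself; it is a minimal stand-in, with the hypothetical function name print_devices, that lists each CUDA-capable GPU with a few of its properties using the standard cudaGetDeviceCount and cudaGetDeviceProperties calls.

```cuda
// list_gpus.cu: a minimal stand-in for the deviceQuery sample.
#include <stdio.h>
#include <cuda_runtime.h>

// Prints one line per CUDA-capable device.
// Returns the number of devices found, or -1 if the query itself failed.
int print_devices(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    if (count == 0) {
        printf("No CUDA-capable device found.\n");
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d, %d multiprocessors\n",
               dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
    }
    return count;
}

int main(void) {
    return print_devices() > 0 ? 0 : 1;
}
```

If this program reports no device, or fails with an error, the driver or toolkit installation is the first thing to re-check.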

A basic program in CUDA C

In this section, we will start learning CUDA programming by writing a very basic program using CUDA C. We will start by writing a Hello, CUDA! program in CUDA C and execute it. Before going into the details of the code, one thing that you should recall is that host code is compiled by the standard C compiler and that the device code is compiled by an NVIDIA GPU compiler. The NVIDIA compiler driver (nvcc) feeds the host code to a standard C compiler, such as the Visual Studio compiler on Windows or GCC on Ubuntu and macOS. It is also important to note that the GPU compiler can compile CUDA code that contains no device code. All CUDA code must be saved with a *.cu extension.

The following is the code for Hello, CUDA!:

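#include <stdio.h>

// Empty kernel: __global__ marks this function as device code.
__global__ void myfirstkernel(void) {
}

int main(void) {
    // Kernel call: launch one block containing one thread.
    myfirstkernel<<<1, 1>>>();
    printf("Hello, CUDA!\n");
    return 0;
}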

If you look closely at the code, it will look very similar to that of a simple Hello, CUDA! program written in C for CPU execution. The function of this code is also similar. It just prints Hello, CUDA! on Terminal or the command line. So, two questions that should come to your mind are: how is this code different, and where is the role of CUDA C in this code? The answer to these questions can be given by closely looking at the code. It has two main differences, compared to code written in simple C:

  • An empty function called myfirstkernel with the __global__ prefix
  • A call to the myfirstkernel function with <<<1,1>>>

__global__ is a qualifier added by CUDA C to standard C. It tells the compiler that the function definition that follows this qualifier should be compiled to run on a device, rather than a host. So, in the previous code, myfirstkernel will run on a device instead of a host, though, in this code, it is empty.

Now where will the main function run? The NVCC compiler will feed this function to the host C compiler, as it is not decorated with the __global__ keyword, and hence the main function will run on the host.

The second difference in the code is the call to the empty myfirstkernel function with angle brackets and numeric values. This is the CUDA C syntax for calling device code from host code. It is called a kernel call. The details of a kernel call will be explained in later chapters. The values inside the angle brackets are not arguments of the kernel function itself; they are launch parameters that the host passes at runtime. They indicate the number of blocks and the number of threads per block that will run in parallel on the device. So, in this code, <<<1,1>>> indicates that myfirstkernel will run on one block with one thread per block on the device. Though this is not an optimal use of device resources, it is a good starting point to understand the difference between code executed on the host and code executed on a device.

Again, to revisit and revise the Hello, CUDA! code, the myfirstkernel function will run on a device with one block and one thread per block. It will be launched from the host code inside the main function by a method called kernel launch.
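
To see what the two launch values do, it can help to try a launch with more than one block and thread. The following sketch is a variation on the Hello, CUDA! program and is not taken from the chapter's code: the kernel name whoami and the helper total_threads are hypothetical, and it assumes a GPU that supports printf inside device code (compute capability 2.0 or later).

```cuda
// launch_config.cu: each launched thread reports its block and thread index.
#include <stdio.h>
#include <cuda_runtime.h>

// Total number of threads created by a one-dimensional launch.
int total_threads(int blocks, int threads_per_block) {
    return blocks * threads_per_block;
}

// Device code: runs once per thread.
__global__ void whoami(void) {
    printf("block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    const int blocks = 2;
    const int threads_per_block = 4;

    printf("Launching %d threads\n", total_threads(blocks, threads_per_block));
    whoami<<<blocks, threads_per_block>>>();

    // Kernel launches are asynchronous: wait for the device to finish so
    // that the output of the device-side printf is flushed before exiting.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The kernel prints eight lines, one per thread. The order in which the blocks report is not guaranteed, which is a first hint that blocks are scheduled independently of one another.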

After writing code, how will you execute this code and see the output? The next section describes the steps to write and execute the Hello, CUDA! code on Windows and Ubuntu.

Steps for creating a CUDA C program on Windows

This section describes the steps to create and execute a basic CUDA C program on Windows using Visual Studio. The steps are as follows:

  1. Open Microsoft Visual Studio.
  2. Go to File | New | Project.
  3. Select NVIDIA | CUDA 9.0 | CUDA 9.0 Runtime.
  4. Give your desired name to the project and click on OK.
  5. It will create a project with a sample kernel.cu file. Now open this file by double-clicking on it.
  6. Delete the existing code from the file and enter the code given earlier.
  7. Build the project from the Build menu and press Ctrl + F5 to run the code. If everything works correctly, you will see Hello, CUDA! displayed on the command line.

Steps for creating a CUDA C program on Ubuntu

This section describes the steps to create and execute a basic CUDA C program on Ubuntu using the Nsight Eclipse plugin. The steps are as follows:

  1. Open Nsight by opening Terminal and typing nsight into it.
  2. Go to File | New | CUDA C/C++ Projects.
  3. Give your desired name to the project and click on OK.
  4. It will create a project with a sample file. Now open this file by double-clicking on it.
  5. Delete the existing code from the file and enter the code given earlier.
  6. Run the code by pressing the play button. If everything works correctly, you will see Hello, CUDA! displayed on Terminal.

Summary

To summarize, in this chapter, you were introduced to CUDA and briefed upon the importance of parallel computing. Applications of CUDA and GPUs in various domains were discussed at length. The chapter described the hardware and software setup required to execute CUDA applications on your PCs. It gave a step-by-step procedure to install CUDA on local PCs.

The last section gave a starting guide for application development in CUDA C by developing a simple program and executing it on Windows and Ubuntu.

In the next chapter, we will build on this knowledge of programming in CUDA C. You will be introduced to parallel computing using CUDA C by way of several practical examples to show how it is faster compared to normal programming. You will also be introduced to the concepts of threads and blocks and how synchronization is performed between multiple threads and blocks.

Questions

  1. Explain three methods to increase the performance of your computing hardware. Which method is used to develop GPUs?
  2. True or false: Improving latency will improve throughput.
  3. Fill in the blanks: CPUs are designed to improve ___ and GPUs are designed to improve __ .
  4. Take an example of traveling from one place to another that is 240 km away. You can take a car that can accommodate five people, with a speed of 60 kmph or a bus that can accommodate 40 people, with a speed of 40 kmph. Which option will provide better latency, and which option will provide better throughput?
  5. Explain the reasons that make GPU and CUDA particularly useful in computer vision applications.
  6. True or False: A CUDA compiler cannot compile code with no device code.
  7. In the Hello, CUDA! example discussed in this chapter, will the printf statement be executed by the host or the device?

Key benefits

  • Explore examples to leverage the GPU processing power with OpenCV and CUDA
  • Enhance the performance of algorithms on embedded hardware platforms
  • Discover C++ and Python libraries for GPU acceleration

Description

Computer vision has been revolutionizing a wide range of industries, and OpenCV is the most widely chosen tool for computer vision with its ability to work in multiple programming languages. Nowadays, in computer vision, there is a need to process large images in real time, which is difficult for OpenCV to handle on its own. This is where CUDA comes into the picture, allowing OpenCV to leverage powerful NVIDIA GPUs. This book provides a detailed overview of integrating OpenCV with CUDA for practical applications. To start with, you’ll understand GPU programming with CUDA, an essential aspect for computer vision developers who have never worked with GPUs. You’ll then move on to exploring OpenCV acceleration with GPUs and CUDA by walking through some practical examples. Once you have got to grips with the core concepts, you’ll familiarize yourself with deploying OpenCV applications on NVIDIA Jetson TX1, which is popular for computer vision and deep learning applications. The last chapters of the book explain PyCUDA, a Python library that leverages the power of CUDA and GPUs for acceleration and can be used by computer vision developers who use OpenCV with Python. By the end of this book, you’ll have used its hands-on approach to enhance your computer vision applications.

Who is this book for?

This book is a go-to guide for you if you are a developer working with OpenCV and want to learn how to process more complex image data by exploiting GPU processing. A thorough understanding of computer vision concepts and programming languages such as C++ or Python is expected.

What you will learn

  • Understand how to access GPU device properties and capabilities from CUDA programs
  • Learn how to accelerate searching and sorting algorithms
  • Detect shapes such as lines and circles in images
  • Explore object tracking and detection with algorithms
  • Process videos using different video analysis techniques on Jetson TX1
  • Access GPU device properties from a PyCUDA program
  • Understand how kernel execution works

Product Details

Publication date: Sep 26, 2018
Length: 380 pages
Edition: 1st
Language: English
ISBN-13: 9781789348293




Table of Contents

14 Chapters

  1. Introducing CUDA and Getting Started with CUDA
  2. Parallel Programming using CUDA C
  3. Threads, Synchronization, and Memory
  4. Advanced Concepts in CUDA
  5. Getting Started with OpenCV with CUDA Support
  6. Basic Computer Vision Operations Using OpenCV and CUDA
  7. Object Detection and Tracking Using OpenCV and CUDA
  8. Introduction to the Jetson TX1 Development Board and Installing OpenCV on Jetson TX1
  9. Deploying Computer Vision Applications on Jetson TX1
  10. Getting Started with PyCUDA
  11. Working with PyCUDA
  12. Basic Computer Vision Applications Using PyCUDA
  13. Assessments
  14. Other Books You May Enjoy

Customer reviews

Rating distribution

Average rating: 4.4 out of 5 (5 ratings)
5 star 60%
4 star 20%
3 star 20%
2 star 0%
1 star 0%

Eduardo Hiroshi Nakamura, Nov 05, 2023 (5 out of 5, Amazon verified review)
The content matches the title.

syu, Nov 13, 2020 (5 out of 5, Amazon verified review)
This book covers CUDA, OpenCV, Jetson, and PyCUDA, but I have not evaluated the Jetson material because I do not plan to use it. It explains environment setup, the basic mechanisms, and usage clearly. The CUDA information is relatively recent, so although knowledge of C/C++ is assumed, I think the book is well suited to acquiring the basics of CUDA. Using CUDA does not automatically make code faster, so the material on kernel parameter settings, threads, memory characteristics, and CUDA streams is especially useful for improving performance. For more recent or advanced information you need to consult the documentation that NVIDIA publishes for free, but having the basics saves time. For OpenCV, several functions are introduced; because their usage follows patterns, similar functions are easy to try. The main image-processing effects are explained; the rest require experimentation. When using CUDA with OpenCV, use the function OpenCV defines if one exists, and otherwise write your own in CUDA. Incidentally, I confirmed that the sample code runs with CUDA 10.1 + OpenCV 4.4 (built with VS2019) + GeForce RTX 2070. The printed sample code gets sloppier toward the later chapters, so take care. There are few PyCUDA samples, but the book shows that what was introduced in the first half can also be done in Python. (I do not recommend this book to people who do not use C/C++.) I tried installing it in the Python 3.7 environment I use for machine learning, but it did not work!! A downgrade may be needed, but I will look into it when I actually use it.

Robin T. Wernick, Feb 28, 2020 (5 out of 5, Amazon verified review)
I was looking for just this kind of concise introduction to image analysis on several target areas. The documentation for combining these two technologies is sparse, to say the least, and this book not only had a precise introduction but also several detailed examples. I now know how to proceed with the object recognition that I was looking to apply to silicon disk defect analysis, but I also know how to speed it up by several hundred percent. I have several department managers in mind that I would love to tantalize with this information. This book should make it easier to make better technology for computer vision applications, and I wish all the readers more success by reading it.

Force Commander, Oct 24, 2018 (4 out of 5, Amazon verified review)
The book delivers a detailed introduction to CUDA and OpenCV on the Jetson TX1 that is also applicable to other NVIDIA GPUs.

Amazon Customer, Dec 21, 2018 (3 out of 5, Amazon verified review)
The book is an introduction to CUDA programming. The most important concepts are explained, but not in detail. Performance optimization of CUDA programs is only briefly explained.
