Hands-On GPU Computing with Python

Explore the capabilities of GPUs for solving high performance computational problems

Product type Paperback
Published in May 2019
Publisher Packt
ISBN-13 9781789341072
Length 452 pages
Edition 1st Edition
Author (1):
Avimanyu Bandyopadhyay

Table of Contents (17 chapters)

Preface
1. Section 1: Computing with GPUs Introduction, Fundamental Concepts, and Hardware
2. Introducing GPU Computing
3. Designing a GPU Computing Strategy
4. Setting Up a GPU Computing Platform with NVIDIA and AMD
5. Section 2: Hands-On Development with GPU Programming
6. Fundamentals of GPU Programming
7. Setting Up Your Environment for GPU Programming
8. Working with CUDA and PyCUDA
9. Working with ROCm and PyOpenCL
10. Working with Anaconda, CuPy, and Numba for GPUs
11. Section 3: Containerization and Machine Learning with GPU-Powered Python
12. Containerization on GPU-Enabled Platforms
13. Accelerated Machine Learning on GPUs
14. GPU Acceleration for Scientific Applications Using DeepChem
15. Other Books You May Enjoy
Appendix A

Understanding how CUDA-C/C++ works via a simple example

By now, you should be familiar with the computational advantages of CUDA C/C++ from our earlier discussions. Combining C/C++ with CUDA lets you modify parts of your source code so that they run on the GPU and produce results faster. We will explore the primary steps for implementing CUDA code through a GPU program.

From this point onward, please type the code used in this book into your IDE manually. Copying and pasting directly from the PDF will break the indentation of the code and leave it unfit to run.

First, let's look at a conventional C++ program that multiplies the elements of two arrays in double precision. We'll run the computation on 500 million elements on the CPU. All the elements of the p and q arrays are set to 24 and 12, respectively.

The following is the C++ program we've just described ...
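
The book's listing is cut off in this excerpt. As a stand-in, here is a minimal sketch of what such a CPU-only program could look like; it is not the author's code. The function names multiply and max_error_after_multiply are hypothetical, and storing the product back into q is an assumption, since the text does not say where the result goes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise product on the CPU: q[i] = p[i] * q[i].
// Writing the result into q is an assumption made for this sketch.
void multiply(const std::vector<double>& p, std::vector<double>& q) {
    assert(p.size() == q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        q[i] = p[i] * q[i];
    }
}

// Fill p with 24 and q with 12, multiply them, and return the largest
// deviation of any result from the expected product 24 * 12 = 288.
double max_error_after_multiply(std::size_t n) {
    std::vector<double> p(n, 24.0);
    std::vector<double> q(n, 12.0);
    multiply(p, q);

    double max_error = 0.0;
    for (double value : q) {
        max_error = std::fmax(max_error, std::fabs(value - 288.0));
    }
    return max_error;
}
```

To match the element count described above, you would call max_error_after_multiply(500000000) from main and print the result. Note that two arrays of 500 million doubles need about 4 GB each, so try a smaller count first if memory is limited.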
