Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Hands-On GPU Programming with Python and CUDA

You're reading from   Hands-On GPU Programming with Python and CUDA Explore high-performance parallel computing with CUDA

Arrow left icon
Product type Paperback
Published in Nov 2018
Publisher Packt
ISBN-13 9781788993913
Length 310 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Dr. Brian Tuomanen Dr. Brian Tuomanen
Author Profile Icon Dr. Brian Tuomanen
Dr. Brian Tuomanen
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Why GPU Programming? 2. Setting Up Your GPU Programming Environment FREE CHAPTER 3. Getting Started with PyCUDA 4. Kernels, Threads, Blocks, and Grids 5. Streams, Events, Contexts, and Concurrency 6. Debugging and Profiling Your CUDA Code 7. Using the CUDA Libraries with Scikit-CUDA 8. The CUDA Device Function Libraries and Thrust 9. Implementation of a Deep Neural Network 10. Working with Compiled GPU Code 11. Performance Optimization in CUDA 12. Where to Go from Here 13. Assessment 14. Other Books You May Enjoy

Summary

The main advantage of using a GPU over a CPU is its increased throughput, which means that we can execute more parallel code simultaneously on GPU than on a CPU; a GPU cannot make recursive algorithms or nonparallelizable algorithms somewhat faster. We see that some tasks, such as the example of building a house, are only partially parallelizable—in this example, we couldn't speed up the process of designing the house (which is intrinsically serial in this case), but we could speed up the process of the construction, by hiring more laborers (which is parallelizable in this case).

We used this analogy to derive Amdahl's Law, which is a formula that can give us a rough estimate of potential speedup for a program if we know the percentage of execution time for code that is parallelizable, and how many processors we will have to run this code. We then applied Amdahl's Law to analyze a small program that generates the Mandelbrot set and dumps it to an image file, and we determined that this would be a good candidate for parallelization onto a GPU. Finally, we ended with a brief overview of profiling code with the cPython module; this allows us to see where the bottlenecks in a program are, without explicitly timing function calls.

Now that we have a few of the fundamental concepts in place, and have a motivator to learn GPU programming, we will spend the next chapter setting up a Linux- or Windows 10-based GPU programming environment. We will then immediately dive into the world of GPU programming in the following chapter, where we will actually write a GPU-based version of the Mandelbrot program that we saw in this chapter.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime