You're reading from Hands-On GPU Programming with Python and CUDA: Explore high-performance parallel computing with CUDA

Product type: Paperback

Published: Nov 2018

Publisher: Packt

ISBN-13: 9781788993913

Length: 310 pages

Edition: 1st Edition

Languages: Python

Tools: CUDA

Concepts: Graphics Programming

Author: Dr. Brian Tuomanen

Table of Contents (15 chapters)

Preface

1. Why GPU Programming?

2. Setting Up Your GPU Programming Environment

3. Getting Started with PyCUDA

4. Kernels, Threads, Blocks, and Grids

5. Streams, Events, Contexts, and Concurrency

6. Debugging and Profiling Your CUDA Code

7. Using the CUDA Libraries with Scikit-CUDA

8. The CUDA Device Function Libraries and Thrust

9. Implementation of a Deep Neural Network

10. Working with Compiled GPU Code

11. Performance Optimization in CUDA

12. Where to Go from Here

13. Assessment

14. Other Books You May Enjoy

Questions

1. There are three for statements in this chapter's Mandelbrot example; however, we can only parallelize over the first two. Why can't we parallelize over all of the for loops here?
2. What is something that Amdahl's Law doesn't account for when we apply it to offloading a serial CPU algorithm to a GPU?
3. Suppose that you gain exclusive access to three new top-secret GPUs that are the same in all respects except for core counts: the first has 131,072 cores, the second has 262,144 cores, and the third has 524,288 cores. If you parallelize and offload the Mandelbrot example (which generates a 512 x 512 pixel image) onto these GPUs, will there be a difference in computation time between the first and second GPU? How about between the second and third GPU?
4. Can you think of any problems with designating certain algorithms or blocks of code as parallelizable in the context of Amdahl's Law?
5. Why should we use profilers instead of just using Python's time function?
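
As a starting point for thinking about the Amdahl's Law and core-count questions, here is a minimal Python sketch. It is not taken from the book: the helper names `amdahl_speedup` and `passes_needed` are hypothetical, and the model assumes, as a deliberate simplification, that each core processes one pixel at a time and that every pixel takes the same amount of time. A real GPU schedules work differently, so treat the numbers as a rough guide only.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: theoretical speedup when a fraction p of the
    run time is parallelizable and is spread over n cores."""
    return 1.0 / ((1.0 - p) + p / n)


def passes_needed(num_tasks, num_cores):
    """Idealized number of sequential passes needed to cover num_tasks
    when num_cores tasks run at once (ceiling division).
    Assumption: one task per core per pass, all tasks equally long."""
    return -(-num_tasks // num_cores)


if __name__ == "__main__":
    pixels = 512 * 512  # 262,144 pixels in the Mandelbrot image

    for cores in (131072, 262144, 524288):
        print(cores, "cores ->", passes_needed(pixels, cores), "pass(es)")
    # 131072 cores -> 2 pass(es)
    # 262144 cores -> 1 pass(es)
    # 524288 cores -> 1 pass(es)

    # Speedup for a hypothetical program that is 90% parallelizable.
    for cores in (131072, 262144, 524288):
        print(cores, "cores -> speedup", amdahl_speedup(0.9, cores))
```

Under these assumptions, extra cores beyond the number of independent tasks sit idle, and the serial fraction `1 - p` caps the achievable speedup at `1 / (1 - p)` however many cores are added.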

Authors (1)

Dr. Brian Tuomanen has been working with CUDA and General-Purpose GPU Programming since 2014. He received his Bachelor of Science in Electrical Engineering from the University of Washington in Seattle, and briefly worked as a Software Engineer before switching to Mathematics for Graduate School. He completed his Ph.D. in Mathematics at the University of Missouri in Columbia, where he first encountered GPU programming as a means for studying scientific problems. Dr. Tuomanen has spoken at the US Army Research Lab about General-Purpose GPU programming, and has recently led GPU integration and development at a Maryland-based start-up company. He currently lives and works in the Seattle area.