Hands-On High Performance with Go: Boost and optimize the performance of your Golang applications at scale with resilience

Product type Paperback

Published in Mar 2020

Publisher Packt

ISBN-13 9781789805789

Length 406 pages

Edition 1st Edition

Author (1):

Bob Strecansky

CUDA – powering the program

After we have all of our CUDA dependencies installed and running, we can start out with a simple CUDA C++ program:

First, we'll include the necessary header files and define the number of elements we'd like to process. 1 << 20 is 1,048,576, which is more than enough elements for a meaningful GPU test. You can change the shift amount if you'd like to see how the processing time changes:

#include <cstdlib>
#include <iostream>

const int ELEMENTS = 1 << 20;

Our multiply function is marked with the __global__ specifier. This tells nvcc, the CUDA-specific C++ compiler, that the function is a kernel: it is launched from host (CPU) code and executes on the GPU. The function itself is relatively straightforward: it takes the a and b arrays, multiplies them together in parallel across GPU threads, and writes the result into the c array (a __global__ function must return void, so nothing is returned directly):

__global__ void multiply(int j, float...

Bob Strecansky is a senior site reliability engineer. He graduated with a computer engineering degree from Clemson University with a focus on networking. He has worked in both consulting and industry jobs since graduation. He has worked with large telecom companies and much of the Alexa top 500. He currently works at Mailchimp, working to improve web performance, security, and reliability for one of the world's largest email service providers. He has also written articles for web publications and currently maintains the OpenTelemetry PHP project. In his free time, Bob enjoys tennis, cooking, and restoring old cars. You can follow Bob on the internet to hear musings about performance analysis: Twitter: @bobstrecansky GitHub: @bobstrecansky