In the last chapter, we looked at the memory architecture in CUDA and saw how it can be used efficiently to accelerate applications. Up until now, however, we have not seen a way to measure the performance of CUDA programs. In this chapter, we will discuss how to do that using CUDA events. We will also cover the NVIDIA Visual Profiler, and how to resolve errors in CUDA programs both from within the CUDA code and by using debugging tools. We will then look at how the performance of CUDA programs can be improved. This chapter also describes how CUDA streams can be used for multitasking and how they can accelerate applications. You will also learn how array-sorting algorithms can be accelerated using CUDA. Image processing is an application where we need to process a large amount of data in a very small amount of time, so CUDA can be an ideal choice...