- The first two for loops iterate over every pixel, whose outputs are invariant to each other; we can thus parallelize over these two for loops. The third for loop calculates the final value of a particular pixel, which is intrinsically recursive.
- Amdahl's Law doesn't account for the time it takes to transfer memory between the GPU and the host.
- 512 x 512 amounts to 262,144 pixels. This means that the first GPU can only calculate the outputs of half of the pixels at once, while the second GPU can calculate all of the pixels at once; this means the second GPU will be about twice as fast as the first here. The third GPU has more than sufficient cores to calculate all pixels at once, but as we saw in problem 1, the extra cores will be of no use to us here. So the second and third GPUs will be equally fast for this problem.
- One issue with generically...
United States
Great Britain
India
Germany
France
Canada
Russia
Spain
Brazil
Australia
Singapore
Hungary
Philippines
Mexico
Thailand
Ukraine
Luxembourg
Estonia
Lithuania
Norway
Chile
South Korea
Ecuador
Colombia
Taiwan
Switzerland
Indonesia
Cyprus
Denmark
Finland
Poland
Malta
Czechia
New Zealand
Austria
Turkey
Sweden
Italy
Egypt
Belgium
Portugal
Slovenia
Ireland
Romania
Greece
Argentina
Malaysia
South Africa
Netherlands
Bulgaria
Latvia
Japan
Slovakia