Practical sessions
You can try out these ideas to get deeper insights into the process of code optimization:
- Search for more hotspots using a profiler and try to reduce the calculation time for every instance even more.
The optimized code from Chapter 15 needs about 0.02 milliseconds on a recent CPU to create the joint matrices or dual quaternions for each model. For 1,000 models drawn using GPU instancing, the matrix data update therefore takes about 20 milliseconds per frame (1,000 × 0.02 ms), which alone exceeds the roughly 16.7 milliseconds available per frame at 60 frames per second. Maybe you will find more places where a couple of CPU cycles can be saved.
- Advanced difficulty: Use multithreading for the update of the matrix data.
You could try to update more than one model at once by parallelizing the joint matrix update process. This may be done with a simple worker or producer/consumer model, where you add the update tasks to a list or vector and let the threads take the topmost entry to work on the matrices. But beware, synchronization between threads can be difficult, and...
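The following is a minimal sketch of the simple worker variant described above, not the book's implementation. It assumes all update tasks are queued before the workers start, so only access to the shared task list needs a mutex. `ModelInstance`, `updateJointMatrices`, `updateAllModels`, and `timeUpdateMs` are hypothetical names; the placeholder update only fills a buffer and stands in for the real animation blending and joint matrix or dual quaternion calculation. The timing helper uses `std::chrono::steady_clock`, so you can compare the per-frame cost against the single-threaded numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one animated model instance. The real code would
// hold the skeleton, animation clips, and the joint matrix buffer.
struct ModelInstance {
  std::vector<float> jointData = std::vector<float>(64 * 16, 0.0f);
  float animTime = 0.0f;
};

// Hypothetical per-model update (placeholder work only). In the real renderer
// this would blend the animation clips and compute the joint matrices.
void updateJointMatrices(ModelInstance& model, float deltaTime) {
  model.animTime += deltaTime;
  for (std::size_t i = 0; i < model.jointData.size(); ++i) {
    model.jointData[i] = model.animTime * static_cast<float>(i % 16);
  }
}

// Worker sketch: the calling thread fills a task list with model indices,
// then each worker repeatedly takes the topmost entry until the list is empty.
void updateAllModels(std::vector<ModelInstance>& models, float deltaTime,
                     unsigned numThreads) {
  assert(numThreads > 0);
  std::vector<std::size_t> tasks(models.size());
  for (std::size_t i = 0; i < tasks.size(); ++i) {
    tasks[i] = i;
  }
  std::mutex taskMutex;

  auto worker = [&]() {
    for (;;) {
      std::size_t index;
      {
        // Only the shared task list is protected by the lock.
        std::lock_guard<std::mutex> lock(taskMutex);
        if (tasks.empty()) {
          return;
        }
        index = tasks.back();
        tasks.pop_back();
      }
      // Each model is taken by exactly one thread, so no further locking.
      updateJointMatrices(models[index], deltaTime);
    }
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < numThreads; ++t) {
    threads.emplace_back(worker);
  }
  for (auto& thread : threads) {
    thread.join();  // wait until all tasks are done before returning
  }
}

// Measures one complete update of all models in milliseconds.
double timeUpdateMs(std::vector<ModelInstance>& models, float deltaTime,
                    unsigned numThreads) {
  auto start = std::chrono::steady_clock::now();
  updateAllModels(models, deltaTime, numThreads);
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```

To compare, you could call `timeUpdateMs(models, deltaTime, 1)` and then again with `std::thread::hardware_concurrency()` threads, clamped to at least 1 because that function may return 0. Creating and joining threads every frame has its own cost, so a real engine would more likely keep a persistent pool of worker threads and only hand them new tasks each frame.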