Optimizing texel fetches
Even on a gaming PC, rendering over 200 crowd characters takes more than 4 milliseconds. That is roughly a quarter of the frame budget, assuming a 16.6 ms frame time (60 frames per second). So, why is crowd rendering so expensive?
Every time the GetPose helper function is called, the shader performs 6 texel fetches. Since each vertex is skinned to four influences, that's 24 texel fetches per vertex! Even with a low poly model, that is a lot of texel fetches. Optimizing this shader will boil down to minimizing the number of texel fetches.
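To put the 24 fetches per vertex in perspective, the arithmetic can be sketched on the CPU side. This is only a rough back-of-the-envelope estimate: FetchesPerVertex and FetchesPerFrame are hypothetical helpers, not part of the crowd code, and the 1,000-vertex character is an assumed figure chosen for illustration. Only the 6 fetches per GetPose call, the four influences, and the 200 characters come from the text above.

```cpp
#include <cstdint>

// Hypothetical helper: texel fetches needed to skin a single vertex.
// Each influence calls GetPose once, and each GetPose call performs
// a fixed number of texel fetches.
constexpr std::uint64_t FetchesPerVertex(std::uint64_t fetchesPerPose,
                                         std::uint64_t influencesPerVertex) {
    return fetchesPerPose * influencesPerVertex;
}

// Hypothetical helper: texel fetches needed to skin a whole crowd
// for one frame.
constexpr std::uint64_t FetchesPerFrame(std::uint64_t fetchesPerVertex,
                                        std::uint64_t verticesPerCharacter,
                                        std::uint64_t characterCount) {
    return fetchesPerVertex * verticesPerCharacter * characterCount;
}

// 6 fetches per GetPose call, 4 influences per vertex (from the text).
constexpr std::uint64_t kFetchesPerVertex = FetchesPerVertex(6, 4);

// ASSUMPTION: a low poly character with 1,000 vertices. This number is
// illustrative only; substitute the vertex count of your own model.
constexpr std::uint64_t kAssumedVertexCount = 1000;

// 200 crowd characters (from the text).
constexpr std::uint64_t kFetchesPerFrame =
    FetchesPerFrame(kFetchesPerVertex, kAssumedVertexCount, 200);

static_assert(kFetchesPerVertex == 24, "6 fetches x 4 influences");
static_assert(kFetchesPerFrame == 4800000, "24 x 1,000 x 200");
```

Under these assumptions, a crowd of 200 characters costs 4.8 million texel fetches per frame. Because the total is a plain product, removing a single fetch from the per-vertex cost saves 200,000 fetches per frame in this scenario, which is why the optimizations focus on that factor.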
The following sections present different strategies you can use to minimize the number of texel fetches per vertex.
Limiting influences
A naive way to optimize texel fetches would be to add a branch to the shader code. After all, if the weight of the matrix is 0, why bother getting the pose? This optimization could be implemented as follows:
    mat4 pose0 = (weights.x < 0.0001)?      ...