Unfortunately, one drawback of this approach is the lack of support for projected textures. One way around this issue is to render light sources that use projected textures separately from the rest of the light sources. Depending on the light setup of the rendered scene, this limitation may not be significant.
In order to take full advantage of the GPU's vectorized math operations, all the light source values are going to be packed in groups of four. Here is a simple illustration of how four three-component variables can be packed into three four-component variables:
All variable packing should be done on the CPU. Because each GPU constant register holds four floats, this packing is more efficient than the single light version, where most of the values use only three floats and waste the fourth one.
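The packing described above can be sketched on the CPU side as a simple transpose. This is a minimal C++ sketch, not the book's code: `Float3`, `Float4`, `Packed3x4`, and `PackFour` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical CPU-side vector types; the names are illustrative only.
struct Float3 { float x, y, z; };
struct Float4 { float v[4]; };

// Three four-component values holding the X, Y, and Z lanes of four vectors.
struct Packed3x4 { Float4 x, y, z; };

// Transpose four three-component values into three four-component values,
// so that lane i of each output belongs to input vector i.
Packed3x4 PackFour(const std::array<Float3, 4>& in)
{
    Packed3x4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out.x.v[i] = in[i].x;
        out.y.v[i] = in[i].y;
        out.z.v[i] = in[i].z;
    }
    return out;
}
```

With this layout no register lane is wasted: twelve floats fill exactly three four-float registers.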
The X, Y, and Z components of the four light positions are packed into three four-component shader constants (LightPosX, LightPosY, and LightPosZ in the lighting code).
Light directions are separated into X, Y, and Z components as well (LightDirX, LightDirY, and LightDirZ in the lighting code). This group of constants is used for both spot and capsule light source directions. For point lights, make sure to set the respective value in each constant to 0.
Light color is separated into R, G, and B components (LightColorR, LightColorG, and LightColorB in the lighting code). For disabled light sources, just set the respective values to 0.
As before, you should combine the color and intensity of each light before passing the values to the GPU.
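All four light ranges are stored in a single four-component constant. Note that the lighting code multiplies the distance to the light by this constant (LightRangeRcp), so each component holds the reciprocal of the corresponding light's range.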
All four lights' capsule lengths are stored in a single four-component constant (CapsuleLen in the lighting code). For non-capsule lights, just set the respective value to 0.
The cosine of each spot light's outer cone angle is again stored in a four-component constant (SpotCosOuterCone in the lighting code). For non-spot light sources, set the respective value to -2.
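Unlike the single spot light, for the inner cone angle we are going to store one over the cosine of the spot light's inner cone angle (SpotCosInnerConeRcp in the lighting code). For non-spot light sources, set the respective value to 1. Combined with an outer cone value of -2, the cone attenuation term then always saturates to 1, so it has no effect on these lights.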
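The default values listed above can be gathered into one CPU-side packing routine. The following C++ sketch is only an illustration under stated assumptions: `LightType`, `LightDesc`, `FourLightConstants`, and `PackLights` are hypothetical names, the array fields merely mirror the shader constant names used in the lighting code, and the real constant buffer layout is not addressed. Storing a range reciprocal of 0 for disabled lights is a choice made for this sketch; their zero color already removes their contribution.

```cpp
#include <cstddef>

// Hypothetical CPU-side light description; every name here is illustrative.
enum class LightType { Disabled, Point, Spot, Capsule };

struct LightDesc {
    LightType type = LightType::Disabled;
    float pos[3] = {0.0f, 0.0f, 0.0f};
    float dir[3] = {0.0f, 0.0f, 0.0f};    // spot and capsule lights only
    float color[3] = {0.0f, 0.0f, 0.0f};
    float intensity = 0.0f;
    float range = 1.0f;                   // assumed > 0 for enabled lights
    float capsuleLen = 0.0f;              // capsule lights only
    float cosOuterCone = 0.0f;            // spot lights only
    float cosInnerConeRcp = 1.0f;         // spot lights only, reciprocal term from the text
};

// One value per light in each array, mirroring the packed shader constants.
struct FourLightConstants {
    float LightPosX[4], LightPosY[4], LightPosZ[4];
    float LightDirX[4], LightDirY[4], LightDirZ[4];
    float LightColorR[4], LightColorG[4], LightColorB[4];
    float LightRangeRcp[4];
    float CapsuleLen[4];
    float SpotCosOuterCone[4];
    float SpotCosInnerConeRcp[4];
};

FourLightConstants PackLights(const LightDesc (&lights)[4])
{
    FourLightConstants c{};
    for (std::size_t i = 0; i < 4; ++i) {
        const LightDesc& l = lights[i];
        const bool enabled = l.type != LightType::Disabled;
        const bool isSpot = l.type == LightType::Spot;
        const bool isCapsule = l.type == LightType::Capsule;
        const bool hasDir = isSpot || isCapsule;

        c.LightPosX[i] = l.pos[0];
        c.LightPosY[i] = l.pos[1];
        c.LightPosZ[i] = l.pos[2];

        // Point (and disabled) lights get a zero direction.
        c.LightDirX[i] = hasDir ? l.dir[0] : 0.0f;
        c.LightDirY[i] = hasDir ? l.dir[1] : 0.0f;
        c.LightDirZ[i] = hasDir ? l.dir[2] : 0.0f;

        // Color and intensity are combined here; disabled lights get 0.
        const float scale = enabled ? l.intensity : 0.0f;
        c.LightColorR[i] = l.color[0] * scale;
        c.LightColorG[i] = l.color[1] * scale;
        c.LightColorB[i] = l.color[2] * scale;

        // Sketch-only choice: 0 for disabled lights or invalid ranges.
        c.LightRangeRcp[i] = (enabled && l.range > 0.0f) ? 1.0f / l.range : 0.0f;

        // Defaults from the text: 0 length, -2 outer cone, 1 inner cone term.
        c.CapsuleLen[i] = isCapsule ? l.capsuleLen : 0.0f;
        c.SpotCosOuterCone[i] = isSpot ? l.cosOuterCone : -2.0f;
        c.SpotCosInnerConeRcp[i] = isSpot ? l.cosInnerConeRcp : 1.0f;
    }
    return c;
}
```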
We are going to use two new helper functions that calculate dot products on the packed four-component variables. The first one (dot4x4 in the lighting code) calculates the dot products between two groups of three four-component variables. The return value is a four-component variable holding the four dot product values.
The second helper function (dot4x1 in the lighting code) calculates the dot products of three four-component variables with a single three-component variable.
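To make the math of the two helpers concrete, here is a CPU-side C++ reference sketch that can be used to check packed values. It is not the shader code: `Float3` and `Float4` are hypothetical types, and only the argument order is taken from the calls in the lighting code. On the GPU, each loop would collapse into component-wise four-float multiplies and adds.

```cpp
#include <cstddef>

// Hypothetical CPU-side vector types; the names are illustrative only.
struct Float3 { float x, y, z; };
struct Float4 { float v[4]; };

// Four dot products at once: lane i of the result is the dot product of
// (aX[i], aY[i], aZ[i]) with (bX[i], bY[i], bZ[i]).
Float4 dot4x4(const Float4& aX, const Float4& aY, const Float4& aZ,
              const Float4& bX, const Float4& bY, const Float4& bZ)
{
    Float4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r.v[i] = aX.v[i] * bX.v[i] + aY.v[i] * bY.v[i] + aZ.v[i] * bZ.v[i];
    }
    return r;
}

// Four dot products against one shared vector: lane i of the result is the
// dot product of (aX[i], aY[i], aZ[i]) with b.
Float4 dot4x1(const Float4& aX, const Float4& aY, const Float4& aZ,
              const Float3& b)
{
    Float4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r.v[i] = aX.v[i] * b.x + aY.v[i] * b.y + aZ.v[i] * b.z;
    }
    return r;
}
```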
Finally, the code to calculate the lighting for the four light sources is as follows:
float3 ToEye = EyePosition.xyz - position;
// Find the shortest distance between the pixel and each capsule's segment
float4 ToCapsuleStartX = position.xxxx - LightPosX;
float4 ToCapsuleStartY = position.yyyy - LightPosY;
float4 ToCapsuleStartZ = position.zzzz - LightPosZ;
float4 DistOnLine = dot4x4(ToCapsuleStartX, ToCapsuleStartY, ToCapsuleStartZ, LightDirX, LightDirY, LightDirZ);
float4 CapsuleLenSafe = max(CapsuleLen, 1.e-6);
DistOnLine = CapsuleLen * saturate(DistOnLine / CapsuleLenSafe);
float4 PointOnLineX = LightPosX + LightDirX * DistOnLine;
float4 PointOnLineY = LightPosY + LightDirY * DistOnLine;
float4 PointOnLineZ = LightPosZ + LightDirZ * DistOnLine;
float4 ToLightX = PointOnLineX - position.xxxx;
float4 ToLightY = PointOnLineY - position.yyyy;
float4 ToLightZ = PointOnLineZ - position.zzzz;
float4 DistToLightSqr = dot4x4(ToLightX, ToLightY, ToLightZ, ToLightX, ToLightY, ToLightZ);
float4 DistToLight = sqrt(DistToLightSqr);
// Phong diffuse
ToLightX /= DistToLight; // Normalize
ToLightY /= DistToLight; // Normalize
ToLightZ /= DistToLight; // Normalize
float4 NDotL = saturate(dot4x1(ToLightX, ToLightY, ToLightZ, material.normal));
//float3 finalColor = float3(dot(LightColorR, NDotL), dot(LightColorG, NDotL), dot(LightColorB, NDotL));
// Blinn specular
ToEye = normalize(ToEye);
float4 HalfWayX = ToEye.xxxx + ToLightX;
float4 HalfWayY = ToEye.yyyy + ToLightY;
float4 HalfWayZ = ToEye.zzzz + ToLightZ;
float4 HalfWaySize = sqrt(dot4x4(HalfWayX, HalfWayY, HalfWayZ, HalfWayX, HalfWayY, HalfWayZ));
float4 NDotH = saturate(dot4x1(HalfWayX / HalfWaySize, HalfWayY / HalfWaySize, HalfWayZ / HalfWaySize, material.normal));
float4 SpecValue = pow(NDotH, material.specExp.xxxx) * material.specIntensity;
//finalColor += float3(dot(LightColorR, SpecValue), dot(LightColorG, SpecValue), dot(LightColorB, SpecValue));
// Cone attenuation
float4 cosAng = dot4x4(LightDirX, LightDirY, LightDirZ, ToLightX, ToLightY, ToLightZ);
float4 conAtt = saturate((cosAng - SpotCosOuterCone) * SpotCosInnerConeRcp);
conAtt *= conAtt;
// Attenuation
float4 DistToLightNorm = 1.0 - saturate(DistToLight * LightRangeRcp);
float4 Attn = DistToLightNorm * DistToLightNorm;
Attn *= conAtt; // Include the cone attenuation
// Calculate the final color value
float4 pixelIntensity = (NDotL + SpecValue) * Attn;
float3 finalColor = float3(dot(LightColorR, pixelIntensity), dot(LightColorG, pixelIntensity), dot(LightColorB, pixelIntensity));
finalColor *= material.diffuseColor;
return finalColor;
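To see why the default constant values let one code path serve all three light types, the capsule clamp and the cone attenuation can be evaluated for a single light on the CPU. This is a scalar C++ sketch of the same expressions used above. `Saturate`, `ClampToCapsule`, and `ConeAttenuation` are hypothetical helper names introduced for illustration.

```cpp
#include <algorithm>

// Clamp to [0, 1], matching the shader's saturate intrinsic.
float Saturate(float x)
{
    return std::min(std::max(x, 0.0f), 1.0f);
}

// Distance along the capsule segment, as in the DistOnLine computation.
// A capsule length of 0 always yields 0, so the closest point collapses to
// the light position and the light behaves like a point light.
float ClampToCapsule(float distOnLine, float capsuleLen)
{
    const float capsuleLenSafe = std::max(capsuleLen, 1.e-6f);
    return capsuleLen * Saturate(distOnLine / capsuleLenSafe);
}

// Cone attenuation, as in the conAtt computation. With an outer cone value
// of -2 and an inner cone term of 1, the result is 1 for any cosine in
// [-1, 1], so non-spot lights are not attenuated by the cone.
float ConeAttenuation(float cosAng, float spotCosOuterCone, float spotCosInnerConeRcp)
{
    const float conAtt = Saturate((cosAng - spotCosOuterCone) * spotCosInnerConeRcp);
    return conAtt * conAtt;
}
```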
Don't think that you are limited to four lights at a time just because GPU constants are four floats wide. You can rewrite CalcFourLights to take the light constants as input parameters, so that you can call the function more than once in a shader.
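On the CPU side, supporting more than four lights then comes down to splitting the light list into groups of four, one group per call. The following C++ sketch shows one possible way to do this, assuming a hypothetical `Light` type and a hypothetical `SplitIntoGroupsOfFour` function. Unused slots in the last group stay disabled, so they can be packed with zero color as described earlier.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical minimal light record; a real one would carry all light values.
struct Light {
    bool enabled = false;
    float color[3] = {0.0f, 0.0f, 0.0f};
};

// Split a light list into groups of four. Slots past the end of the list
// keep their default state (disabled, zero color).
std::vector<std::array<Light, 4>> SplitIntoGroupsOfFour(const std::vector<Light>& lights)
{
    std::vector<std::array<Light, 4>> groups((lights.size() + 3) / 4);
    for (std::size_t i = 0; i < lights.size(); ++i) {
        groups[i / 4][i % 4] = lights[i];
    }
    return groups;
}
```

Each group would then be packed into its own set of constants and passed to one call of the lighting function.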
Some scenes don't use all three light types. You can remove either the spot or capsule light support if those are not needed (point lights are at the base of the code, so those have to be supported). This will reduce the shader size and improve performance.
Another possible optimization is to combine the ambient, directional, and multiple lights code into a single shader. This will reduce the total number of draw calls needed and will improve performance.