How a single magic number let Quake III normalize a million vectors a second, and why it still teaches us something about IEEE-754.
Open q_math.c from the Quake III Arena source release and you find this:
float Q_rsqrt(float number) {
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
return y;
}
The comments are part of the legend. Ten lines of code, one cryptic constant, and an inverse square root computed in roughly four times less time than 1.0f / sqrtf(x) on the CPUs of 1999. Lighting calculations and vector normalization in a 3D engine are dominated by 1/sqrt, so the speed-up was load-bearing for the entire renderer.
It is also one of the few times in graphics where the math is short enough to derive on a napkin.
Every shaded surface in a Quake III scene needs a unit normal. To turn a vector v = (x, y, z) into a unit vector you divide by its length:
$$\hat v = \frac{v}{\sqrt{x^2 + y^2 + z^2}}$$
Equivalently, multiply by 1/sqrt(...). Lambert and Phong shading, reflection vectors, the dot products that fall out of BSP traversal — they all want a normalized direction. Doing this per vertex per frame, on a Pentium II, is exactly the kind of inner loop where a 4x speed-up changes what your engine is allowed to render.
IEEE-754 single precision packs a float into 32 bits: 1 sign bit, 8 exponent bits (biased by 127), 23 mantissa bits. For a positive number y = 2^e * (1 + m), the bit pattern interpreted as an integer is
$$I_y = (e + 127) \cdot 2^{23} + m \cdot 2^{23}$$
Which is, surprisingly, a piecewise-linear approximation of log2(y), scaled and shifted:
$$I_y \approx 2^{23} \cdot (\log_2 y + 127)$$
That is the punchline. Reinterpreting a float as an int gives you a free, very rough logarithm.
Now what we want is y = 1 / sqrt(x), i.e. log2(y) = -0.5 * log2(x). In bit-pattern land:
$$I_y \approx 2^{23}(-0.5 \log_2 x + 127) = -0.5 \cdot I_x + 1.5 \cdot 127 \cdot 2^{23}$$
The constant on the right works out to about 0x5F400000. The -0.5 * I_x is precisely i >> 1 subtracted from a constant. So:
i = MAGIC - (i >> 1);
is computing log2(1/sqrt(x)) directly on the bits.
The piecewise-linear log approximation is wrong by a small amount that depends on where you are in the mantissa. You can pick the magic constant to minimize the worst-case error of the approximation across the input range. 0x5F400000 is the naive answer; 0x5F3759DF is roughly the empirically tuned answer that minimizes max relative error before the Newton step.
There is no closed-form derivation in the original code — Greg Walsh and Cleve Moler are the names usually attached to it, with deeper roots in Kahan and SGI numerics. Chris Lomont later showed 0x5F375A86 is mathematically slightly better. The Quake constant remains within 0.001% of optimal, and the difference is invisible after the Newton iteration.
After the bit trick, y is within a few percent of 1/sqrt(x). One Newton-Raphson iteration on f(y) = 1/y^2 - x (whose root is the inverse square root) is:
$$y_{n+1} = y_n \cdot (1.5 - 0.5 \cdot x \cdot y_n^2)$$
That is the last line of the function. One iteration is enough for game-quality lighting; a second would buy more precision but the function is fast precisely because it stops here.
The reason this snippet has outlived the engine is that it crisply demonstrates three things:
float as opaque, you miss optimizations like this, fast abs via & 0x7FFFFFFF, fast sign tests, and the entire family of denormal handling tricks.No, and that is also part of the lesson. Modern x86 has RSQRTSS and VRSQRTPS, ARM NEON has VRSQRTE, and every GPU has a hardware reciprocal-sqrt unit that runs in a couple of cycles per lane. Compilers will not auto-vectorize the bit-trick version. If you write Q_rsqrt in a modern raytracer you will be slower than a one-liner using intrinsics.
But the technique — derive a fast approximation from the bit layout, refine with Newton — is exactly what you reach for when you build SDF evaluators, soft shadows with cone tracing, or any inner loop that does not map to existing hardware. The Quake function is the canonical worked example, and once you have understood it, the rest of fast-math programming feels less like sorcery.
code/game/q_math.c), GPL release 2005.