A floating-point number is represented by three quantities: the sign, the mantissa, and the exponent:

$$ x = \pm\, 0.d_1 d_2 \cdots d_t \cdot B^{e}, $$

with $d_i \in \{0, 1, \ldots, B-1\}$ and $e \in \mathbb{Z}$. $m = 0.d_1 d_2 \cdots d_t$ is called the mantissa, $B$ the basis, and $e$ the exponent, with $e_{\min} \le e \le e_{\max}$. $t$ is called the mantissa length. The condition $d_1 \neq 0$ makes the representation unique and saves, in the binary case ($B = 2$), one bit.
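As an illustration, Python's math.frexp decomposes a number into exactly this normalized form for $B = 2$; a minimal sketch, assuming Python's 64-bit float (frexp returns $m$ with $1/2 \le |m| < 1$):

```python
import math

# Decompose x = m * 2**e with 0.5 <= |m| < 1 (the normalization 1/B <= m < 1 for B = 2).
for x in [1.0, 0.1, -6.25]:
    m, e = math.frexp(x)
    print(f"x = {x:6}  mantissa m = {m:8}  exponent e = {e:3}")
    assert math.ldexp(m, e) == x   # ldexp(m, e) = m * 2**e recombines the parts
```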
Two floating-point zeros, $+0$ and $-0$, exist, both represented by the mantissa $m = 0$.
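A quick Python check of the two zeros: they compare equal, and the sign becomes visible only through operations such as math.copysign or printing.

```python
import math

print(-0.0 == 0.0)               # True: the two zeros compare equal
print(math.copysign(1.0, -0.0))  # -1.0: the sign bit of -0.0 is set
print(str(-0.0))                 # '-0.0': the sign is also kept when printing
```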
On a typical Intel processor, $B = 2$. To represent a number in the float type, 64 bits are used, namely, 1 bit for the sign, 52 bits for the mantissa (the leading bit $d_1 = 1$ is not stored, so effectively $t = 53$), and 11 bits for the exponent, with lower bound $e_{\min} = -1021$. The upper bound $e_{\max}$ for the exponent is consequently $e_{\max} = e_{\min} + 2^{11} - 3 = 1024$ (two of the $2^{11}$ exponent patterns are reserved for special values). With this data, the smallest positive representable number is $x_{\min} = \tfrac{1}{2} \cdot 2^{e_{\min}} = 2^{-1022} \approx 2.2 \cdot 10^{-308}$, and the largest $x_{\max} = (1 - 2^{-53}) \cdot 2^{e_{\max}} \approx 1.8 \cdot 10^{308}$.
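These parameters can also be read off at run time; in Python, for instance, sys.float_info exposes them (a short check, assuming the usual IEEE 754 double-precision float):

```python
import sys

info = sys.float_info
print(info.mant_dig)   # 53: mantissa length t (52 stored bits plus the implicit leading 1)
print(info.min_exp)    # -1021: e_min
print(info.max_exp)    # 1024:  e_max
print(info.min)        # 2.2250738585072014e-308 = 0.5 * 2**e_min, smallest normalized number
print(info.max)        # 1.7976931348623157e+308 ≈ (1 - 2**-53) * 2**e_max
assert info.min == 0.5 * 2.0 ** info.min_exp
```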
Note that floating-point numbers are not equally spaced in $\mathbb{R}$. There is, in particular, a gap at zero (see also [29]). The distance between $0$ and the first positive number is $x_{\min} = 2^{-1022}$, while the distance between the first and the second is smaller by a factor of $2^{-52}$. This effect, caused by the normalization $d_1 \neq 0$, is visualized in Figure 2.1:

Figure 2.1: The floating-point gap at zero.
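The two distances can be checked numerically; the sketch below uses math.nextafter (available from Python 3.9 on) to step from $x_{\min}$ to its successor:

```python
import math, sys

x_min = sys.float_info.min               # 2**-1022: first positive normalized number
nxt = math.nextafter(x_min, math.inf)    # its immediate successor

print(x_min)                   # 2.2250738585072014e-308: width of the gap between 0 and x_min
print(nxt - x_min)             # 5e-324 = 2**-1074: spacing just above x_min
print((nxt - x_min) / x_min)   # 2.220446049250313e-16 = 2**-52: the factor from the text
```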

This gap is filled equidistantly with subnormal floating-point numbers, to which a result falling into the gap is rounded. Subnormal floating-point numbers have the smallest possible exponent and do not follow the normalization convention $d_1 \neq 0$.
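A brief Python illustration of the subnormal range (math.ulp needs Python 3.9+; float.hex makes the dropped normalization visible):

```python
import math, sys

tiny = sys.float_info.min   # 2**-1022: smallest positive normalized number
sub = tiny / 2.0            # 2**-1023: lies in the gap and is stored as a subnormal

print(sub)              # 1.1125369292536007e-308: not flushed to zero
print(math.ulp(0.0))    # 5e-324 = 2**-1074: smallest subnormal, i.e. the spacing that fills the gap
print(sub.hex())        # '0x0.8000000000000p-1022': leading digit 0, exponent stuck at the minimum
```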