# Proving strict local maxima with interval arithmetic

At least until recently, floating point numbers on a computer were,
to me, approximations which I couldn’t imagine trusting in any
substantial way. They’re rounded! You have nonsense like
`1.0 + 1e-16 == 1.0`, or the harmonic series `1.0/1 + 1.0/2 + 1.0/3 +
...` converging to a finite sum! How could one obtain
mathematically meaningful results?

However, I learned recently that IEEE 754 floating point
numbers (the
standard-issue floating point numbers on most hardware, sometimes
called “floats”) were designed to support obtaining mathematically
meaningful results. But, how can that be, given the nonsense above?
These floating point numbers support doing what’s called
interval
arithmetic.^{[1]}

Basically, instead of representing a number as a single float you instead represent a closed interval which contains that number. The basic operations (addition, multiplication, etc.) can be defined on intervals so that the bounds are set appropriately to give a minimally pessimistic interval which is guaranteed to contain the result.

If we naively try to implement intervals, we’d get the same
nonsense: `[1.0, 1.0] + [1e-16, 1e-16] == [1.0 + 1e-16, 1.0 +
1e-16] == [1.0, 1.0]`, which is not an interval which contains the
actual sum. So, how is interval arithmetic possible on a computer?
What does IEEE 754 give us? The answer is rounding modes. There
are four rounding modes in IEEE 754: towards infinity, towards 0,
towards negative infinity, and even-biased “normal” rounding (the
nonsense-inducing-but-generally-what-you-want default). Floating
point hardware computes each basic operation as if to infinite
precision and then rounds the result to a representable value, and
the rounding mode determines which neighboring value is chosen. So,
the right thing to do with
intervals is for the lower bounds to be computed rounding towards
−∞ and the upper bounds to be computed rounding towards
∞. In the example above, the resulting interval would be
`[1.0, 1.0000000000000002]`, which is, in fact, an interval
which contains the result.
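To make this concrete, here is a minimal sketch in Python (my own illustration, not this post’s eventual code). Python has no portable way to switch the hardware rounding mode, so as a conservative stand-in the sketch widens each computed bound by one ulp with `math.nextafter`; this can be one ulp more pessimistic than true directed rounding, but it still yields a guaranteed enclosure.

```python
import math

def interval_add(a, b):
    """Add intervals a = (lo, hi) and b = (lo, hi), widening outward.

    In lieu of computing the lower bound rounding towards -inf and the
    upper bound rounding towards +inf, we round to nearest and then
    widen each bound by one ulp; the result is guaranteed to contain
    the exact sum.
    """
    lo = math.nextafter(a[0] + b[0], -math.inf)
    hi = math.nextafter(a[1] + b[1], math.inf)
    return (lo, hi)

lo, hi = interval_add((1.0, 1.0), (1e-16, 1e-16))
# The enclosure is [0.9999999999999999, 1.0000000000000002],
# which really does contain 1.0 + 1e-16.
```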

Sometimes the resulting intervals can be too pessimistic. One
example is `x*x` if `x` is `[-1.0, 1.0]`.
While we can tell that the minimal interval for `x*x` is
`[0.0, 1.0]`, intervals don’t know the correlation between
the `x` on either side of the multiplication sign, so the
resulting interval would be `[-1.0,
1.0]`.^{[2]} But, if we split
the interval for `x` into the two intervals `[-1.0,
0.0]` and `[0.0, 1.0]` and compute `x*x` on each, we
can obtain the correct interval `[0.0, 1.0]` as the union of
the images.
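The splitting trick can be sketched in the same style (again with one-ulp widening standing in for directed rounding, and with names of my own invention):

```python
import math

def interval_mul(a, b):
    """Multiply intervals (lo, hi), widening one ulp outward."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (math.nextafter(min(products), -math.inf),
            math.nextafter(max(products), math.inf))

def hull(a, b):
    """Smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

x = (-1.0, 1.0)
naive = interval_mul(x, x)  # roughly (-1.0, 1.0): too pessimistic
left, right = (-1.0, 0.0), (0.0, 1.0)
split = hull(interval_mul(left, left), interval_mul(right, right))
# split is (up to a one-ulp widening) the tight answer (0.0, 1.0)
```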

So, what can these intervals help us do? A first example is proving
that a function `f` has no roots on an interval `[a, b]`,
where `f` is a function composed of interval-friendly operations.
If we compute `f([a, b])` and see that `0.0` is not in the resulting
interval, then this is a *proof* that `f` has no root there.
Otherwise, while it might be that `0.0` appears in the interval,
because intervals are pessimistic, this does not constitute a proof
that `f` has a root (consider `x*x + 1` on `[-2.0, 2.0]`). But, we
might be able to subdivide that interval and show for each
subinterval that `0.0` is not present. Each of these individual
proofs would, together, constitute a proof that `f` has no root
anywhere on `[a, b]`.
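This subdivision procedure is short enough to sketch. As before this is my own illustration: the interval evaluation of `x*x + 1` is hand-coded, with one-ulp `math.nextafter` widening standing in for directed rounding.

```python
import math

def eval_f(x):
    """Interval evaluation of f(x) = x*x + 1 on x = (lo, hi)."""
    products = [x[0] * x[0], x[0] * x[1], x[1] * x[1]]
    lo = math.nextafter(min(products) + 1.0, -math.inf)
    hi = math.nextafter(max(products) + 1.0, math.inf)
    return (lo, hi)

def no_root(f, lo, hi, depth=40):
    """Try to prove f has no root on [lo, hi] by subdividing until
    every piece's image excludes 0.0.  True means proved; False means
    "couldn't prove it", not "there is a root"."""
    flo, fhi = f((lo, hi))
    if flo > 0.0 or fhi < 0.0:
        return True              # 0.0 excluded: proof for this piece
    if depth == 0:
        return False             # give up: no conclusion either way
    mid = (lo + hi) / 2.0
    if mid <= lo or mid >= hi:
        return False             # too small to split further
    return no_root(f, lo, mid, depth - 1) and no_root(f, mid, hi, depth - 1)

no_root(eval_f, -2.0, 2.0)       # True: x*x + 1 has no root on [-2, 2]
```

The naive image of `[-2.0, 2.0]` contains `0.0` (because of the `x*x` pessimism), but one split at `0.0` is enough for each half’s image to exclude it.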

Suppose we have a function `f`, its derivative `f` ′, and its second
derivative `f` ′ ′ (with each of these continuous). If we want to
prove that `f` has a strict local maximum in `[a, b]`, we can
try the following process:

- Show that `f` ′ has a root on that interval using the intermediate
  value theorem (possibly locating it with Newton’s method, using
  `f` ′ and `f` ′ ′).
- Show that `f` ′ ′ has no root there, using the process outlined
  above.
- Show that `f` ′ ′ is negative somewhere on that interval.

`f` must reach a local maximum on that interval, and, furthermore,
because the second derivative is continuous, never zero, and negative
somewhere, it is negative everywhere on the interval. Hence `f` ′ is
strictly decreasing (in particular injective), and so it has exactly
one root: the local maximum is strict.
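Putting the three steps together for the example below: a sketch with the derivatives of `f(x) = x*x*x - x` written out by hand (this post promises automatic differentiation for that part later), and one-ulp `math.nextafter` widening again standing in for directed rounding.

```python
import math

def widen(lo, hi):
    # One-ulp outward widening: a conservative stand-in for computing
    # lower bounds towards -inf and upper bounds towards +inf.
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

def fprime(x):
    # f(x) = x**3 - x, so f'(x) = 3*x**2 - 1, on an interval x = (lo, hi).
    sq = [x[0] * x[0], x[0] * x[1], x[1] * x[1]]
    return widen(3.0 * min(sq) - 1.0, 3.0 * max(sq) - 1.0)

def fsecond(x):
    # f''(x) = 6*x on an interval.
    return widen(6.0 * x[0], 6.0 * x[1])

a, b = -2.0, -0.1
# Step 1: f' is positive at a and negative at b, so by the
# intermediate value theorem it has a root in [a, b].
assert fprime((a, a))[0] > 0.0
assert fprime((b, b))[1] < 0.0
# Steps 2 and 3: the image of f'' on [a, b] lies entirely below 0.0,
# so f'' has no root there and is negative; f' is strictly
# decreasing, its root is unique, and the local maximum is strict.
assert fsecond((a, b))[1] < 0.0
```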

Later, I will provide code to compute this as well as show how to
use automatic differentiation techniques to compute *exact*
derivatives. With these techniques, I’m able to prove that `x`^{3} − `x`
has a strict local maximum between −2 and −0.1 as well as prove that

1 / (1 + `x`^{2})

has a strict local maximum between −0.1 and 0.1.
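As a preview of the automatic-differentiation idea, here is a sketch of forward-mode AD with dual numbers over plain floats (the version used with intervals would carry an interval in each component instead); the names here are my own illustration, not this post’s eventual code.

```python
class Dual:
    """Forward-mode automatic differentiation: a Dual carries a value
    and the exact derivative of that value with respect to the input."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.eps * o.val + self.val * o.eps)
    __rmul__ = __mul__
    def __truediv__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Quotient rule: (u/v)' = (u'v - uv') / v^2
        return Dual(self.val / o.val,
                    (self.eps * o.val - self.val * o.eps) / (o.val * o.val))
    def __rtruediv__(self, o):
        return Dual(o) / self

def f(x):
    return 1.0 / (1.0 + x * x)

y = f(Dual(0.0, 1.0))  # seed dx/dx = 1
# y.val == 1.0 is f(0); y.eps == 0.0 is the exact derivative f'(0)
```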

^{[1]}*Validated Numerics for Pedestrians* by Warwick Tucker is a short paper about the subject.

^{[2]}Affine arithmetic is a way to preserve and make use of correlations.