Proving strict local maxima with interval arithmetic

At least until recently, floating point numbers on a computer were to me approximations which I couldn’t imagine trusting in any substantial way. They’re rounded! You have nonsense like 1.0 + 1e-16 == 1.0, or like 1.0/1 + 1.0/2 + 1.0/3 + ... being a convergent series! How could one obtain mathematically meaningful results?

However, I learned recently that IEEE 754 floating point numbers (the standard-issue floating point numbers on most hardware, sometimes called “floats”) were designed to support obtaining mathematically meaningful results. But, how can that be, given the nonsense above? These floating point numbers support doing what’s called interval arithmetic.[1]

Basically, instead of representing a number as a single float you instead represent a closed interval which contains that number. The basic operations (addition, multiplication, etc.) can be defined on intervals so that the bounds are set appropriately to give a minimally pessimistic interval which is guaranteed to contain the result.

If we naively try to implement intervals, we’d get the same nonsense: [1.0, 1.0] + [1e-16, 1e-16] == [1.0 + 1e-16, 1.0 + 1e-16] == [1.0, 1.0], which is not an interval which contains the actual sum. So, how is interval arithmetic possible on a computer? What does IEEE 754 give us? The answer is rounding modes. There are four rounding modes in IEEE 754: towards infinity, towards 0, towards negative infinity, and even-biased “normal” rounding (the nonsense-inducing-but-generally-what-you-want default). Floating point hardware computes with more precision than is represented (but exactly as much as is required), and these rounding modes are for dealing with these extra bits. So, the right thing to do with intervals is for the lower bounds to be computed rounding towards −∞ and the upper bounds to be computed rounding towards ∞. In the example above, the resulting interval would be [1.0, 1.0000000000000002], which is, in fact, an interval which contains the result.

Sometimes the resulting intervals can be too pessimistic. One example is x*x if x is [-1.0, 1.0]. While we can tell that the minimal interval for x*x is [0.0, 1.0], intervals don’t know the correlation between the x on either side of the multiplication sign, so the resulting interval would be [-1.0, 1.0].[2] But, if we split the interval for x into the two intervals [-1.0, 0.0] and [0.0, 1.0] and compute x*x on each, we can obtain the correct interval [0.0, 1.0] as the union of the images.

So, what can these intervals help us do? A first example is proving that a function f has no roots on an interval [a, b], where f is a function composed of interval-friendly operations. If we compute f([a, b]) and see that 0.0 is not in the resulting interval, then this is a proof that f has no root there. Otherwise, while it might be that 0.0 appears in the interval, because intervals are pessimistic, this does not constitute a proof that f has a root (consider x*x + 1 on [-2.0, 2.0]). But, we might be able to subdivide that interval and show that for each subinterval that 0.0 is not present. Each of these individual proofs would, together, constitute a proof of f having no root anywhere on [a, b].

Suppose we have a function f, its derivative f ′, and its second derivative f ′ ′ (with each of these continuous). If we want to prove that f has a strict local maximum in [a, b], we can try the following process:

  1. Show that f ′ has a root on that interval using the intermediate value theorem (possibly using f ′ and f ′ ′ with Newton’s method).
  2. Show that f ′ ′ has no root, using the process outlined above.
  3. Show that f ′ ′ is negative somewhere on that interval.
With these together, we can conclude that f must reach a local maximum on that interval, and, furthermore, because the second derivative is never zero, that f ′ is injective on that interval, and so it has exactly one root.

Later, I will provide code to compute this as well as show how to use automatic differentiation techniques to compute exact derivatives. With these techniques, I’m able to prove that x3x has a strict local maximum between −2 and −0.1 as well as prove that

1 + x2

has a strict local maximum between −0.1 and 0.1.

[1] Validated Numerics for Pedestrians by Warwick Tucker is a short paper about the subject.

[2] Affine arithmetic is a way to preserve and make use of correlations.