\input papervorspann
\begin{document}
\begin{center}
\bf \Large Newton Iteration
\end{center}
\medskip
\tableofcontents
\section{Introduction}
Newton iteration is the basic method to determine zeros of univariate and
multivariate functions. I only discuss the univariate case. Let $f$ be a
real-valued function of a single variable; $f$ is assumed to be continuously
differentiable. The goal is to find a zero $x^*$ of
$f$. Consider an arbitrary value $x_0$. The tangent to $f$ at $(x_0,f(x_0))$
has the equation
\[ y = f(x_0) + (x - x_0) f'(x_0) \]
and intersects the $x$-axis in
\[ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \eqndot \]
If $x_0$ is close to $x^*$ and $f$ is reasonably smooth in the vicinity of
$x^*$, $x_1$ will be even closer to $x^*$. This suggests the iteration
\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} \quad \text{for $k \ge 0$} \eqndot \]
The iteration above is called \emph{Newton Iteration}.
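Before turning to the analysis, here is a minimal C++ sketch of the iteration; the function name, tolerance, and iteration cap are our own choices and carry no special significance:

```cpp
#include <cmath>
#include <functional>

// One-dimensional Newton iteration: repeat x <- x - f(x)/f'(x) until the
// step size drops below tol. No global convergence is guaranteed; the
// starting value x0 must be sufficiently close to the root.
double newton(const std::function<double(double)>& f,
              const std::function<double(double)>& df,
              double x0, double tol = 1e-12, int max_iter = 100) {
  double x = x0;
  for (int k = 0; k < max_iter; ++k) {
    double step = f(x) / df(x);   // assumes df(x) != 0 near the root
    x -= step;
    if (std::fabs(step) <= tol) break;
  }
  return x;
}
```

For instance, `newton` applied to $f(x) = x^2 - 2$ with $x_0 = 1$ approximates $\sqrt{2}$ after a handful of iterations.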
In Section~\ref{main theorem} we
show that Newton iteration converges quadratically to $x^*$ if the
initial approximation $x_0$ is sufficiently close to $x^*$. It is important to
remember that Newton's method is not globally convergent. It needs a good
initial approximation.
In Section~\ref{square roots} we will study the important case of computing
square roots.
In Sections~\ref{arithmetic} and~\ref{Adaptive Precision}
we will leave the ideal world of exact arithmetic
and take rounding error into account.
Section~\ref{Newton and Floats} remains to be written.
\section{A Convergence Theorem}\label{main theorem}
\begin{theorem}[Deuflhard/Hohmann, Theorem 4.10, adapted to the univariate case]
\label{th: main theorem}
Let $I= \range{a}{b}$ and let $f: I \to \R$ be a continuously
differentiable function with non-vanishing derivative for all $x \in
I$. Assume further that there is a constant\footnote{Newton's
method applied to $af(x)$ instead of $f(x)$ for an
arbitrary constant $a$ produces the same sequence of iterates. The requirement
on $\omega$ is invariant under multiplication by constants.}
\footnote{If $f$ is twice
differentiable, we can use $\omega = \frac{\max_{x \in I} \abs{f''(x)}}{\min_{x \in I}
\abs{f'(x)}}$ since $f'(y) - f'(x) = f''(\xi)(y - x)$ for some $\xi$ between $x$ and $y$ by
the mean value theorem.} $\omega$ such that
for all $x, y \in I$
\[ \abs{\frac{f'(y) - f'(x)}{f'(x)}} \le \omega \abs{y - x} \eqndot \]
Assume further that there is an $x^* \in I$ with $f(x^*) = 0$ and that our
initial approximation $x_0$ satisfies:
\[ x_0 \in B_\rho(x^*) \subset I \quad \text{where $\rho < 1/\omega$} \eqndot \]
Then $\sset{x_k}$ converges quadratically to $x^*$, all iterates are contained
in $B_\rho$, and $x^*$ is the only root of $f$ in $B_\rho$. More precisely,
\[ x_k \in B_\rho \quad\text{for all $k$}
\quad\text{and}\quad
\abs{x_{k+1} - x^*} \le \frac{\omega}{2} \abs{x_k - x^*}^2
\quad \text{and}\quad
\abs{x_k - x^*} \le \frac{\delta^{(2^k)}}{2^k \omega} \]
where $\delta = \rho\omega < 1$.
\end{theorem}
\begin{proof}
Assume that $x_k \in B_\rho$. Then
\begin{align*}
\abs{x_{k+1} - x^*} &= \abs{x_k - \frac{f(x_k)}{f'(x_k)} - x^*} \\
&= \abs{x_k - x^* - \frac{f(x_k) - f(x^*)}{f'(x_k)}}\\
&= \abs{x_k - x^* - \frac{f'(\xi)(x_k - x^*)}{f'(x_k)}}\\
\intertext{where $\xi$ is a point between $x_k$ and $x^*$; recall that
$(f(x_k) - f(x^*))/(x_k - x^*) = f'(\xi)$ for some $\xi$ between $x_k$ and $x^*$
by the mean value theorem.}
&= \abs{\frac{f'(x_k) - f'(\xi)}{f'(x_k)}}\abs{x_k - x^*} \\
& \le \omega \abs{(x_k - \xi)(x_k - x^*)} \le \omega \abs{x_k - x^*}^2 \eqndot
\end{align*}
This falls a factor $1/2$ short of what we claimed. A slightly more elaborate
argument gives us the improved bound. For any $s$ with $0 \le s \le 1$, let
$g(s) = f(x^* + s(x_k - x^*))$. Then
\begin{align*}
f(x_k) - f(x^*) &= g(1) - g(0) = \int_0^1 g'(s) \ ds \\
&= \int_0^1 f'(x^* + s(x_k - x^*)) (x_k - x^*) \ ds
\end{align*}
and hence
\begin{align*}
\abs{x_{k+1} - x^*} &= \abs{x_k - \frac{f(x_k)}{f'(x_k)} - x^*} \\
&= \abs{x_k - x^* - \frac{f(x_k) - f(x^*)}{f'(x_k)}}\\
&= \abs{\frac{f'(x_k)(x_k - x^*) - \int_0^1 f'(x^* + s(x_k -
x^*)) (x_k - x^*) \ ds }{f'(x_k)}}\\
&= \abs{(x_k - x^*) \cdot \int_0^1 \frac{f'(x_k) - f'(x^* +
s(x_k - x^*))}{f'(x_k)} \ ds }\\
&\le \abs{x_k - x^*} \cdot \int_0^1 \omega (1 - s) \abs{x_k - x^*} \
ds \\
&= \omega \int_0^1 (1 - s) \ ds \ \abs{x_k - x^*}^2
= \frac{\omega}{2} \abs{x_k - x^*}^2
\end{align*}
The remaining claims follow easily. We have
\[ \abs{x_0 - x^*} \le \rho =
\delta/\omega = \frac{\delta^{(2^0)}}{2^0\omega} \eqndot \]
Also
\begin{align*}
\abs{x_{k+1} - x^*} &\le \frac{\omega}{2} \abs{x_k - x^*}^2\\
&\le \frac{\omega}{2} \left(\frac{\delta^{(2^k)}}{2^k
\omega} \right)^2 \\
&= \frac{(\delta^{(2^k)})^2}{2^{2k+1} \omega}
\le \frac{\delta^{(2^k)\cdot 2}}{2^{k+1} \omega} =
\frac{\delta^{(2^{k+1})}}{2^{k+1} \omega}
\end{align*}
Uniqueness of $x^*$ also follows easily. Assume there is a second root
$x^{**}$ in $B_\rho(x^*)$. Then $x^{**}$ is a fixed point of the Newton
iteration, so the iteration started at $x^{**}$ stays at $x^{**}$ forever. On
the other hand, the contraction estimate above forces the iteration to converge
to $x^*$. Thus $x^{**} = x^*$, and $x^*$ is the only root of $f$ in $B_\rho$.
\end{proof}
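The contraction estimate is easy to observe numerically. The following C++ check (our own example, not part of the theorem) uses $f(x) = e^x - 2$ with root $x^* = \ln 2$; on $\range{0}{1}$ we have $f'' = f' = e^x$, so $\omega = e$ satisfies the Lipschitz condition, and $x_0 = 1$ lies within $1/\omega$ of $x^*$:

```cpp
#include <cmath>

// One Newton step for f(x) = exp(x) - 2, whose root is ln 2.
// On [0,1], f'' = f' = exp(x), so omega = max f''/min f' = e works,
// and |1 - ln 2| ~ 0.31 < 1/e ~ 0.37.
double newton_step_exp2(double x) {
  return x - (std::exp(x) - 2.0) / std::exp(x);
}
```

Iterating from $x_0 = 1$ and comparing $\abs{x_{k+1} - x^*}$ against $(\omega/2)\abs{x_k - x^*}^2$ confirms the quadratic contraction until round-off takes over.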
We also need an estimate on the decrease of $\abs{x_k - x_{k-1}}$, an
observable quantity.
\begin{lemma} Under the assumptions of Theorem~\ref{th: main theorem}
\[ \abs{x_{k+1} - x_{k}} \le 6 \omega \abs{x_k - x_{k-1}} ^2\eqndot \]
\end{lemma}
\begin{proof} We have $\abs{x_{k+1} - x^*} \le (\omega/2) \abs{x_k - x^*}^2 \le (1/2)
\abs{x_k - x^*}$ since $x_k \in B_\rho(x^*)$ and $\rho < 1/\omega$. Thus
$\abs{x_{k+1} - x_k} \le \abs{x_{k+1} - x^*} + \abs{x^* - x_k} \le (3/2) \abs{x^* - x_k}$ and
$\abs{x_{k+1} - x_k} \ge \abs{x_{k} - x^*} - \abs{x^* - x_{k+1}} \ge (1/2) \abs{x^* - x_k}$. Hence
\[ \abs{x_{k+1} - x_k} \le \frac{3}{2}\abs{x^* - x_k} \le \frac{3 \omega}{4}
\abs{x^* - x_{k-1}}^2 \le \frac{3 \omega}{4}
\left( 2 \abs{x_k - x_{k-1}} \right)^2 = 3 \omega \abs{x_k - x_{k-1}}^2
\le 6 \omega \abs{x_k - x_{k-1}}^2 \eqndot
\]
Observe that we use the contraction property for two subsequent iterations:
the iteration from $x_{k-1}$ to $x_k$ and the iteration from $x_k$ to
$x_{k+1}$.
\end{proof}
\section{The Computation of Square Roots}\label{square roots}
In this section, we will study the use of Newton Iteration to compute square
roots. The goal is to compute $\sqrt{a}$ for a positive real $a$. We write
$a = a' 2^{2m}$ where $1/4 < a' \le 1$ and $m \in \Z$. Then
$\sqrt{a} = \sqrt{a'}\, 2^m$. So from now on we assume $1/4 < a \le 1$.
The square root of $a$ is a zero of $f(x) = x^2 - a$. We have $f'(x) = 2x$
and hence use the iteration formula
\[ x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k} = \frac{1}{2} (x_k + \frac{a}{x_k})
\eqndot \]
We use $I = \range{1/4}{1}$. For $x,y \in I$, we have $\abs{(f'(y) -
f'(x))/f'(x)} = \abs{(y-x)/x} \le 4\abs{y-x}$. Thus we may use $\omega = 4$ and
hence need to start with an approximation $x_0$ with $\abs{x_0 - \sqrt{a}} < 1/4$.
We can use $x_0 = 1$ if $\sqrt{a} > 3/4$, i.e., $a > 9/16$, and $x_0 = 0.6$ if
$1/2 \le \sqrt{a} \le 3/4$.
We can obtain slightly stronger results if we study the iteration directly
without relying on Theorem~\ref{th: main theorem}.
\begin{theorem} Let $1/4 \le a \le 1$ and let $x_0 = 1$. Then $\sset{x_k}$ is
monotonically decreasing, $\sqrt{a} \le x_k \le 1$ for all $k$,
$x_{k+1} - \sqrt{a} \le (x_k - \sqrt{a})^2$,
and $x_k- \sqrt{a} \le 2^{-(2^k)}$.
\end{theorem}
\begin{proof} Assume $\sqrt{a} \le x_k \le 1$. Set $d_k = x_k - \sqrt{a}$. We have
\begin{align*}
d_{k+1} &= x_{k+1} - \sqrt{a} = \frac{1}{2}(x_k + \frac{a}{x_k}) - \sqrt{a}\\
&=\frac{1}{2x_k}(x_k^2 + a - 2 \sqrt{a} x_k) \\
&= \frac{1}{2x_k}(x_k - \sqrt{a})^2 = \frac{1}{2x_k}d_k^2 \eqndot
\end{align*}
Thus $d_k^2/2 \le d_{k+1} \le d_k^2$ since $1/2 \le 1/(2 x_k) \le 1$.
In particular, $d_{k+1} \ge 0$, and $d_{k+1} \le d_k^2 \le d_k/2$ since $d_k
\le d_0 \le 1/2$. Hence $\sset{x_k}$ decreases monotonically and converges to
$\sqrt{a}$ from above.
Also $d_0 \le 1/2 = 2^{-(2^0)}$ and hence
\[d_{k+1} \le d_k^2 \le \left(2^{-(2^k)}\right)^2 =
2^{-(2^{k+1})} \eqndot \]
\end{proof}
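The bound $d_k \le 2^{-(2^k)}$ is easy to verify in ordinary double precision; after five iterations it already drops below machine precision. The helper below is our own illustration:

```cpp
#include <cmath>

// One step of the square root iteration x <- (x + a/x)/2,
// i.e., Newton's method for f(x) = x^2 - a.
double sqrt_step(double x, double a) {
  return 0.5 * (x + a / x);
}
```

Starting from $x_0 = 1$ with, say, $a = 1/2$, the error $d_k = x_k - \sqrt{a}$ stays nonnegative and obeys $d_k \le 2^{-(2^k)}$ up to round-off.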
We can also improve our statement about the observable quantities.
\begin{lemma} Assume $\sqrt{a} \le x_j \le 1$ for all $j$. Then
\[ \abs{x_{k+1} - x_{k}} \le 4 \abs{x_k - x_{k-1}} ^2\eqndot \]
\end{lemma}
\begin{proof} As in the preceding proof, let $d_k = x_k - \sqrt{a}$. Then
$d_{k+1} \le d_k^2$ by the preceding Theorem and hence
\[ x_k - x_{k+1} \le x_k - \sqrt{a} = d_k \le d_{k-1}^2 \eqndot \]
It therefore suffices to show $d_{k-1}\le 2(d_{k-1} - d_k)$ or $d_k \le
d_{k-1}/2$. This follows from $d_k \le d_{k-1}^2 \le d_{k-1}/2$ since
$d_{k-1} \le 1/2$.
\end{proof}
\section{Square Roots with Floating Point Arithmetic}\label{arithmetic}
We assume that we use floating point arithmetic with relative precision $\epsilon$,
i.e., the computed result of any operation differs from the true result
by at most $\epsilon$ times the absolute value of the true result. In other
words, for any $\circ \in \sset{+,-,*,/ }$ we have
\[ \abs{ x \widetilde{\circ} y - x \circ y } \le \epsilon \cdot \abs{ x \circ y }
\eqndot \]
or
\[ x \widetilde{\circ} y = (x \circ y)(1 + \delta) \quad \text{for some
$\delta$ with $\abs{\delta} \le \epsilon$} \eqndot \]
In the following we use $\oplus$, $\ominus$, $\otimes$,
$\oslash$ to denote floating point operations. \medskip
Newton Iteration with floating point arithmetic produces a sequence
$(\widehat{x}_{k})_{k \in \N}$ of iterates, where $\widehat{x}_0 = 1$ and
\[ \widehat{x}_{k+1} = (\widehat{x}_k \oplus (a\oslash \widehat{x}_k))\oslash 2 = (\widehat{x}_k \oplus
(a\oslash \widehat{x}_k))/2 \eqndot \]
Observe that a division by two incurs no round-off error. It amounts to a
simple decrement of the exponent. We use $x_{k+1}$ to denote \emph{the exact value of
the next iterate}, i.e.,
\begin{equation}\label{def of $x_{k+1}$}
x_{k+1} = (\widehat{x}_k + (a / \widehat{x}_k))/2 = \sqrt{a} +
\frac{1}{2\widehat{x}_k} (\widehat{x}_k - \sqrt{a})^2 \eqndot
\end{equation}
We also define $x_0 = 1$. Note that the sequence $(x_k)_{k \in \N}$ is not the
sequence obtained by an exact Newton iteration, i.e., our notation differs from
the one used in the preceding section.
We need to discuss the following issues:
\begin{itemize}
\item What is the distance between $\widehat{x}_{k+1}$ and $x_{k+1}$?
\item What is an appropriate termination condition?
\item Will we always terminate?
\item How good an approximation to $\sqrt{a}$ is the last iterate?
\item Is there an a posteriori estimate for the quality of an approximation?
\item How does one choose the working precision as a function of the desired
quality of the result?
\end{itemize}
We discuss the items in turn. \medskip
We first show\footnote{Our estimates are crude and can be
improved. However, it took me several hours to write this section and I got tired
at the end.}
\[ \abs{\widehat{x}_{k} - x_k} \le (10/4) \epsilon\quad\text{and}\quad
\sqrt{a}/2 \le \widehat{x}_k \le 4/3 \]
for all $k$. Observe that the iterates of the exact Newton iteration lie
between $\sqrt{a}$ and $1$. For the iterates of the approximate Newton
iteration, we have only weaker inclusions.
The claims are trivially true for $k = 0$. So assume the claim holds for
some $k$. From equation (\ref{def of $x_{k+1}$}) we conclude
$\sqrt{a} \le x_{k+1} \le \sqrt{a} + (1/\sqrt{a})(\sqrt{a}/2)^2 = (5/4)\sqrt{a}
\le 5/4$. We next bound $\abs{\widehat{x}_{k+1} - x_{k+1}}$.
We have
\begin{align*}
\widehat{x}_{k+1} &= (\widehat{x}_k \oplus (a\oslash \widehat{x}_k))\oslash 2 \\
&= (\widehat{x}_k + (a/\widehat{x}_k)(1 + \delta_1))(1 + \delta_2)/2
\intertext{with $\abs{\delta_i} \le
\epsilon$ for $i = 1,2$. Observe that a division by two incurs no round-off error since it
amounts to a decrement of the exponent.}
&= (\widehat{x}_k + (a/\widehat{x}_k) + (a/\widehat{x}_k)\delta_1)(1 + \delta_2)/2\\
&= x_{k+1} + x_{k+1}\delta_2 + (a/\widehat{x}_k)\delta_1(1 + \delta_2)/2 \eqndot
%\end{align*}
\intertext{Thus}
%\begin{align*}
\abs{\widehat{x}_{k+1} - x_{k+1}} &\le x_{k+1}\epsilon +
(a/\widehat{x}_k)\epsilon(1 + \epsilon)/2 = (x_{k+1}+
(a/\widehat{x}_k)(1 + \epsilon)/2) \cdot \epsilon\\
&\le ((5/4) + 2\sqrt{a}(1 + \epsilon)/2)\epsilon \le (9/4 +
\epsilon)\epsilon \le (10/4) \epsilon
\intertext{where the last inequality assumes $\epsilon \le 1/4$ and hence}
\widehat{x}_{k+1} &\le x_{k+1} + (10/4) \epsilon \le 5/4 + (10/4) \epsilon
\le 4/3\\
\intertext{where the last inequality assumes $\epsilon \le 1/32$ and}
\widehat{x}_{k+1} &\ge x_{k+1} - (10/4) \epsilon \ge \sqrt{a} - (10/4) \epsilon
\ge \sqrt{a}/2 \eqndot
\end{align*}\medskip
With exact arithmetic, Newton's method converges quadratically and the distance
between subsequent iterates goes to zero quadratically. We now have an error
of at most $(10/4) \epsilon$ in the computation of the iterates. We should
therefore expect that the convergence breaks down once the distance between
adjacent iterates comes close to $(10/4) \epsilon$. We therefore terminate once
\[ \abs{\widehat{x}_{k+1} \ominus \widehat{x}_k } \le \beta \epsilon \]
for a constant $\beta$ still to be determined. For what choice of $\beta$
is termination guaranteed? We have
\begin{align*}
\abs{\widehat{x}_{k+1} - \sqrt{a}} &\le (10/4)\epsilon + \abs{x_{k+1} -
\sqrt{a}}\\
&\le (10/4)\epsilon + \abs{\widehat{x}_k - \sqrt{a}}^2\\
&\le (10/4)\epsilon + (5/6)\abs{\widehat{x}_k - \sqrt{a}}
\intertext{since $\abs{\widehat{x}_k - \sqrt{a}} \le 5/6$ (Observe: if
$\widehat{x}_k \le 1/2$ and hence $\widehat{x}_k \le \sqrt{a}$, we have
$\abs{\widehat{x}_k - \sqrt{a}} \le \sqrt{a}/2 \le 1/2$. If $\widehat{x}_k \ge
1/2$, we have $\widehat{x}_k,\sqrt{a} \in [1/2,4/3]$ and hence
$\abs{\widehat{x}_k - \sqrt{a}} \le 5/6$.)}
&\le (11/12)\abs{\widehat{x}_k - \sqrt{a}}
\intertext{provided that $(1/12)\abs{\widehat{x}_k - \sqrt{a}} \ge (10/4)\epsilon$
or $\abs{\widehat{x}_k - \sqrt{a}} \ge 30\epsilon$. The iteration is therefore
guaranteed to reach a $\widehat{x}_k$ with
$\abs{\widehat{x}_k - \sqrt{a}} < 30\epsilon$. The argument above also shows
that once $\abs{\widehat{x}_k - \sqrt{a}} < 30\epsilon$, this will stay true
forever. Then}
\abs{\widehat{x}_{k+1} \ominus \widehat{x}_k}
&\le \abs{\widehat{x}_{k+1} -\widehat{x}_k}(1 + \epsilon)\\
&\le (\abs{\widehat{x}_{k+1} - x_{k+1}} + \abs{x_{k+1} - \sqrt{a}} + \abs{\sqrt{a} -
\widehat{x}_k})\cdot(1 + \epsilon)\\
&\le [(10/4)\epsilon + \abs{\widehat{x}_{k} - \sqrt{a}}^2 + \abs{\widehat{x}_k
- \sqrt{a}}](1 + \epsilon)\\
&\le [(10/4)\epsilon + (30\epsilon)^2 + 30\epsilon](1 + \epsilon)\\
&< 33\epsilon \eqndot
\end{align*}%
Thus $\beta = 33$ guarantees termination.\medskip
We finally come to the question we are really interested in: How good an
approximation of $\sqrt{a}$ do we have at termination? We have
$\abs{\widehat{x}_{k+1} - \widehat{x}_k} \le 33\epsilon$ and hence
$\abs{x_{k+1} - \widehat{x}_k} \le \abs{x_{k+1} - \widehat{x}_{k+1}} +
\abs{\widehat{x}_{k+1} - \widehat{x}_k} \le 36\epsilon$. Thus
\begin{align*}
\abs{\sqrt{a} - \widehat{x}_k} &\le \abs{\sqrt{a} - x_{k+1}} + \abs{x_{k+1} -
\widehat{x}_k}\\
&\le \abs{\sqrt{a} - \widehat{x}_{k}}^2 + 36\epsilon
\end{align*}
Let $\delta = \abs{\sqrt{a} - \widehat{x}_k}$. Then $\delta \le \delta^2 + 36
\epsilon$ and hence $(\delta - 1/2)^2 \ge 1/4 - 36 \epsilon = (1/4)(1 - 144
\epsilon)$. Since $\delta \le 5/6$ lies below the larger root $1/2 +
(1/2)\sqrt{1 - 144\epsilon}$ of the quadratic, this yields $\delta \le 1/2 -
(1/2)\sqrt{1 - 144\epsilon} \le 1/2 - (1/2)(1 - 73 \epsilon) \le 37\epsilon$.
For the second inequality observe that
$(1 - 73\epsilon)^2 = 1 - 146 \epsilon + 73^2 \epsilon^2 \le 1 - 144 \epsilon$
provided that $\epsilon \le 2/73^2$.
\bigskip
We summarize the discussion:
\begin{itemize}
\item We use floating point arithmetic with precision $\epsilon$ with $\epsilon
\le 2/73^2$. This is not a restriction; already double precision floating
point arithmetic gives us $\epsilon = 2^{-53}$.
\item We set $x_0 = 1$ and $x_{k+1} = (x_k \oplus a \oslash x_k)/2$.
\item We terminate as soon as $\abs{x_{k+1} \ominus x_k}\le 33 \epsilon$.
\item At termination we have $\abs{x_k - \sqrt{a}} \le 37\epsilon$.
\item If we want to guarantee maximal error $2^{-m}$ we choose $\epsilon =
2^{-(m + 6)}$. Observe that $2^6 = 64 \ge 37$.
\end{itemize}
We next estimate the running time. Assume that we want to guarantee an
error of $2^{-m}$. Then we work with mantissa length $m + 6$ and hence the cost
of an iteration is $O(M(m))$, where $M(m)$ is the time to divide $m$ bit
integers. LEDA uses Karatsuba--Ofman multiplication with $M(m) = O(m^{\log_2
3})$. Newton\apostrophe s method converges quadratically and hence the total
running time is $O(M(m)\log m)$. \bigskip
Many of our estimates are fairly crude and the computed approximation may
actually be better than what we claim (not much better, though; we cannot
expect an error smaller than $\epsilon$, and hence our estimates are off by
at most the factor $37$). We next show that it is easy to give an a posteriori
estimate of the quality of the final iterate. We have
\[ \abs{x - \sqrt{a}} = \frac{\abs{(x - \sqrt{a})(x + \sqrt{a})}}{\abs{x + \sqrt{a}}} \le 2 \abs{x^2
- a} \eqndot \]
The last inequality holds since $x + \sqrt{a} \ge \sqrt{a} \ge 1/2$.
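In code, this a posteriori bound is a one-liner; the helper below is our own addition and assumes $1/4 < a \le 1$ as everywhere in this section:

```cpp
#include <cmath>

// A posteriori error bound for an approximation x > 0 of sqrt(a),
// valid for 1/4 < a <= 1: |x - sqrt(a)| = |x^2 - a| / (x + sqrt(a))
// and x + sqrt(a) >= sqrt(a) >= 1/2.
double sqrt_error_bound(double x, double a) {
  return 2.0 * std::fabs(x * x - a);
}
```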
We close this section with two test programs which illustrate Newton\apostrophe s
method.
\paragraph{Quality of Final Iterate: A Demo Program:}
We implement the method as described above and output the
claimed quality and the actual quality of the final iterate.
<>=
#include <LEDA/bigfloat.h>
int main(){
while (true)
{ int m = read_int("\n\n\nmaximal error less than 2^{-m}; m = ");
double a = read_real("\na double value between 1/4 and 1, a = ");
float T = used_time();
bigfloat::set_precision(m + 6);
bigfloat::set_output_precision(m + 6);
bigfloat xold;
bigfloat xnew = 1;
bigfloat A(a);
bigfloat epsilon = bigfloat(1,-m -6);
bigfloat Epsilon = 33*epsilon;
int iterations = 0;
do{ xold = xnew;
xnew = (xold + A/xold)/2;
iterations++;
} while ( abs( xold - xnew ) > Epsilon );
cout << "\n\nclaimed error is less than 2^L where L = " << -m ;
bigfloat::set_precision(4*m);
bigfloat est = 2.0 * abs(xnew*xnew - A);
cout << "\nactual error is less than 2^L where L = " << ilog2(est);
cout <<"\nnumber of iterations = " << iterations;
cout << "\nthis took " << used_time(T) << " seconds";
}
return 0;
}
@ The following is a sample run of the program. We see that the actual error is
slightly better than what we claim.
@c
maximal error less than 2^{-m}; m = 100000
a double value between 1/4 and 1, a = 0.56543254
claimed error is less than 2^L where L = -100000
actual error is less than 2^L where L = -100004
number of iterations = 17
this took 1.79 seconds
@
\paragraph{Quadratic Convergence:} We implemented the method described
above. In each iteration we report the quality of the iterate. This nicely
shows the quadratic convergence of the method.
<>=
#include <LEDA/bigfloat.h>
int main(){
while (true)
{ int m = read_int("\n\n\nmaximal error less than 2^{-m}; m = ");
double a = read_real("\na double value between 1/4 and 1, a = ");
float T = used_time();
bigfloat::set_precision(m + 6);
bigfloat::set_output_precision(m + 6);
bigfloat xold;
bigfloat xnew = 1;
bigfloat A(a);
bigfloat epsilon = bigfloat(1,-m -6);
bigfloat Epsilon = 33*epsilon;
int iterations = 0;
do{ bigfloat::set_rounding_mode(TO_INF);
bigfloat::set_precision(4*m);
bigfloat est = 2.0 * abs(xnew*xnew - A);
cout << "\n\niteration = " << iterations;
if ( est == 0 ) cout << " actual error is zero";
else cout << " actual error is less than 2^L where L = " << ilog2(est);
bigfloat::set_rounding_mode(TO_NEAREST);
bigfloat::set_precision(m + 6);
xold = xnew;
xnew = (xold + A/xold)/2;
iterations++;
} while ( abs( xold - xnew ) > Epsilon );
cout << "\n\nthis took " << used_time(T) << " seconds";
}
return 0;
}
@ The following is a sample run of the program.
@c
maximal error less than 2^{-m}; m = 100000
a double value between 1/4 and 1, a = 0.56543254
iteration = 0 actual error is less than 2^L where L = 0
iteration = 1 actual error is less than 2^L where L = -3
iteration = 2 actual error is less than 2^L where L = -9
iteration = 3 actual error is less than 2^L where L = -20
iteration = 4 actual error is less than 2^L where L = -42
iteration = 5 actual error is less than 2^L where L = -88
iteration = 6 actual error is less than 2^L where L = -178
iteration = 7 actual error is less than 2^L where L = -358
iteration = 8 actual error is less than 2^L where L = -719
iteration = 9 actual error is less than 2^L where L = -1441
iteration = 10 actual error is less than 2^L where L = -2885
iteration = 11 actual error is less than 2^L where L = -5773
iteration = 12 actual error is less than 2^L where L = -11549
iteration = 13 actual error is less than 2^L where L = -23101
iteration = 14 actual error is less than 2^L where L = -46205
iteration = 15 actual error is less than 2^L where L = -92412
iteration = 16 actual error is less than 2^L where L = -100004
this took 2.02 seconds
@
\section{Adaptive Precision}\label{Adaptive Precision}
Our implementation of Newton\apostrophe s method is very wasteful. In this
section we will learn how to improve it by the method of \emph{adaptive
precision}.
The idea is simple. We start with a working precision of, say, $2^{-16}$, and
start Newton\apostrophe s method. When our iterates stop improving, we double
the exponent of the working precision (if this brings us above $m + 6$ we take
$m + 6$) and continue. In this way, only the last iterations use the full
precision and hence we gain enormously.
We have only a constant number of iterations for each working precision and
hence the total running time is $O(\sum_{1 \le i \le \ceil{\log m}} M(2^i)) =
O(M(m))$. We thus save a $\log m$ factor.
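To spell the sum out: for Karatsuba--Ofman multiplication $M(2^i) =
\Theta((2^i)^{\log_2 3}) = \Theta(3^i)$, so the sum is geometric and is
dominated by its last term,
\[ \sum_{1 \le i \le \ceil{\log m}} M(2^i)
= \Theta\Bigl(\sum_{1 \le i \le \ceil{\log m}} 3^i\Bigr)
= \Theta\bigl(3^{\ceil{\log m}}\bigr)
= \Theta\bigl(m^{\log_2 3}\bigr) = O(M(m)) \eqndot \]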
<>=
#include <LEDA/bigfloat.h>
int main(){
while (true)
{ int m = read_int("\n\n\nmaximal error less than 2^{-m}; m = ");
double a = read_real("\na double value between 1/4 and 1, a = ");
float T = used_time();
int ma = 16; // actual precision
bigfloat::set_precision(ma);
bigfloat xold;
bigfloat xnew = 1;
bigfloat A(a);
int iterations = 0;
while ( true )
{ bigfloat epsilon = bigfloat(1,-ma - 6);
bigfloat Epsilon = 33*epsilon;
do{
xold = xnew;
xnew = (xold + A/xold)/2;
iterations++;
bigfloat::set_rounding_mode(TO_INF);
bigfloat::set_precision(4*ma);
bigfloat est = 2.0 * abs(xnew*xnew - A);
cout << "\n\niteration = " << iterations;
if ( est == 0 ) cout << " actual error is zero";
else cout << " actual error is less than 2^L where L = " << ilog2(est);
bigfloat::set_rounding_mode(TO_NEAREST);
bigfloat::set_precision(ma);
} while ( abs( xold - xnew ) > Epsilon );
if ( ma >= m + 8 ) break;
ma = leda_min(2*ma,m+8); bigfloat::set_precision(ma);
cout << "\nincrease of working precision to " << ma << " bits.";
}
cout << "\n\nthis took " << used_time(T) << " seconds";
}
return 0;
}
@ The following is a sample run of the program. Observe that we are more
ambitious now and ask for the computation of the first $10^6$ binary digits.
@c
maximal error less than 2^{-m}; m = 1000000
a double value between 1/4 and 1, a = 0.56543254
iteration = 1 actual error is less than 2^L where L = -3
iteration = 2 actual error is less than 2^L where L = -9
iteration = 3 actual error is less than 2^L where L = -18
iteration = 4 actual error is less than 2^L where L = -18
increase of working precision to 32 bits.
iteration = 5 actual error is less than 2^L where L = -33
iteration = 6 actual error is less than 2^L where L = -33
increase of working precision to 64 bits.
iteration = 7 actual error is less than 2^L where L = -67
iteration = 8 actual error is less than 2^L where L = -67
increase of working precision to 128 bits.
iteration = 9 actual error is less than 2^L where L = -131
iteration = 10 actual error is less than 2^L where L = -131
increase of working precision to 256 bits.
iteration = 11 actual error is less than 2^L where L = -255
iteration = 12 actual error is less than 2^L where L = -255
increase of working precision to 512 bits.
iteration = 13 actual error is less than 2^L where L = -512
iteration = 14 actual error is less than 2^L where L = -512
increase of working precision to 1024 bits.
iteration = 15 actual error is less than 2^L where L = -1025
iteration = 16 actual error is less than 2^L where L = -1025
increase of working precision to 2048 bits.
iteration = 17 actual error is less than 2^L where L = -2048
iteration = 18 actual error is less than 2^L where L = -2048
increase of working precision to 4096 bits.
iteration = 19 actual error is less than 2^L where L = -4095
iteration = 20 actual error is less than 2^L where L = -4095
increase of working precision to 8192 bits.
iteration = 21 actual error is less than 2^L where L = -8191
iteration = 22 actual error is less than 2^L where L = -8191
increase of working precision to 16384 bits.
iteration = 23 actual error is less than 2^L where L = -16383
iteration = 24 actual error is less than 2^L where L = -16383
increase of working precision to 32768 bits.
iteration = 25 actual error is less than 2^L where L = -32770
iteration = 26 actual error is less than 2^L where L = -32770
increase of working precision to 65536 bits.
iteration = 27 actual error is less than 2^L where L = -65535
iteration = 28 actual error is less than 2^L where L = -65535
increase of working precision to 131072 bits.
iteration = 29 actual error is less than 2^L where L = -131072
iteration = 30 actual error is less than 2^L where L = -131072
increase of working precision to 262144 bits.
iteration = 31 actual error is less than 2^L where L = -262145
iteration = 32 actual error is less than 2^L where L = -262145
increase of working precision to 524288 bits.
iteration = 33 actual error is less than 2^L where L = -524289
iteration = 34 actual error is less than 2^L where L = -524289
increase of working precision to 1000008 bits.
iteration = 35 actual error is less than 2^L where L = -1000007
iteration = 36 actual error is less than 2^L where L = -1000007
this took 39.42 seconds
@ We see that we do two iterations for each precision. Why two? The first
iteration brings the iterate close enough to $\sqrt{a}$ and in the second
iteration the old and the new iterate are both close enough to $\sqrt{a}$ to
trigger termination. The fact that we need one additional iteration in order to
detect termination is no problem in the non-adaptive version of
Newton\apostrophe s method. It is a problem for the adaptive version because it
brings about one additional iteration for each fixed precision. Thus it
effectively doubles the number of iterations.
The problem can be solved by a more careful analysis (which is actually simpler
than what we have done above) and a modified algorithm. Assume inductively that
we have an approximation $\widehat{x}_k$ with $d_k = \abs{\sqrt{a} -
\widehat{x}_k} \le 2^{-m_k}$. For $k = 0$, we set $\widehat{x}_0 = 1$ and have
$d_0 \le 1/2$. Thus we can use $m_0 = 1$.
For the computation of $\widehat{x}_{k+1}$ we use mantissa length $2m_k +
2$. Thus $\epsilon = 2^{-2m_k -2}$. Then
\begin{align*}
d_{k+1} &= \abs{\widehat{x}_{k+1} - \sqrt{a} }\\
&\le \abs{\widehat{x}_{k+1} - x_{k+1}} + \abs{x_{k+1} - \sqrt{a} }\\
&\le \abs{\widehat{x}_{k+1} - x_{k+1}} + \abs{x_{k} - \sqrt{a} }^2\\
&\le (10/4)\epsilon + d_k^2 \\
&\le 2^{-2m_k} + 2^{-2m_k} = 2^{-2m_k + 1}
\end{align*}
We may therefore set $m_{k+1} = 2m_k - 1$. Unfortunately, this does not quite
work for the first iteration: with $m_0 = 1$ we get $m_1 = 1 = m_0$ and hence
we are not going to improve. We need a starting value which guarantees
$m_0 = 2$. We start with $x = 1$ if $\sqrt{a} \ge 3/4$ and with $x = 3/4$
otherwise.
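As a quick sanity check of the precision schedule (in plain C++, independent of the LEDA program below): the recurrence $m_{k+1} = \min(2m_k - 1, m)$ with $m_0 = 2$ reaches $m = 10^6$ in exactly $20$ steps, matching the number of iterations in the sample run.

```cpp
#include <algorithm>

// Number of Newton steps performed by the precision schedule
// m_{k+1} = min(2*m_k - 1, m), starting from m_0 = 2.
int schedule_length(int m) {
  int mk = 2, iterations = 0;
  while (mk < m) {
    mk = std::min(2 * mk - 1, m);
    ++iterations;
  }
  return iterations;
}
```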
Our considerations lead to the
following program.
<>=
#include <LEDA/bigfloat.h>
int main(){
while (true)
{ int m = read_int("\n\n\nmaximal error less than 2^{-m}; m = ");
double a = read_real("\na double value between 1/4 and 1, a = ");
float T = used_time();
bigfloat xold;
bigfloat xnew;
if ( a >= 0.75 ) xnew = 1; else xnew = 0.75;
int k = 0;
int mk = 2; // abs(xnew - sqrt{a}) <= 2^{-mk}
bigfloat A(a);
do
{ mk = leda_min(2*mk - 1,m);
bigfloat::set_precision(mk + 3);
xold = xnew;
xnew = (xold + A/xold)/2;
k++;
bigfloat::set_rounding_mode(TO_INF);
bigfloat::set_precision(4*mk);
bigfloat est = 2.0 * abs(xnew*xnew - A);
cout << "\n\niteration = " << k << " claimed error is at ";
cout << "most 2^-mk where mk= " << mk;
if ( est == 0 ) cout << "\n actual error is zero";
else cout << "\n actual error is less than 2^L where L = " << ilog2(est);
bigfloat::set_rounding_mode(TO_NEAREST);
bigfloat::set_precision(mk+3);
} while ( mk < m );
cout << "\n\nthis took " << used_time(T) << " seconds";
}
return 0;
}
@ And again the output of a sample run.
@c
maximal error less than 2^{-m}; m = 1000000
a double value between 1/4 and 1, a = 0.56543254
iteration = 1 claimed error is at most 2^-mk where mk= 3
actual error is less than 2^L where L = -7
iteration = 2 claimed error is at most 2^-mk where mk= 5
actual error is less than 2^L where L = -7
iteration = 3 claimed error is at most 2^-mk where mk= 9
actual error is less than 2^L where L = -18
iteration = 4 claimed error is at most 2^-mk where mk= 17
actual error is less than 2^L where L = -18
iteration = 5 claimed error is at most 2^-mk where mk= 33
actual error is less than 2^L where L = -35
iteration = 6 claimed error is at most 2^-mk where mk= 65
actual error is less than 2^L where L = -67
iteration = 7 claimed error is at most 2^-mk where mk= 129
actual error is less than 2^L where L = -131
iteration = 8 claimed error is at most 2^-mk where mk= 257
actual error is less than 2^L where L = -260
iteration = 9 claimed error is at most 2^-mk where mk= 513
actual error is less than 2^L where L = -515
iteration = 10 claimed error is at most 2^-mk where mk= 1025
actual error is less than 2^L where L = -1027
iteration = 11 claimed error is at most 2^-mk where mk= 2049
actual error is less than 2^L where L = -2051
iteration = 12 claimed error is at most 2^-mk where mk= 4097
actual error is less than 2^L where L = -4099
iteration = 13 claimed error is at most 2^-mk where mk= 8193
actual error is less than 2^L where L = -8198
iteration = 14 claimed error is at most 2^-mk where mk= 16385
actual error is less than 2^L where L = -16388
iteration = 15 claimed error is at most 2^-mk where mk= 32769
actual error is less than 2^L where L = -32773
iteration = 16 claimed error is at most 2^-mk where mk= 65537
actual error is less than 2^L where L = -65538
iteration = 17 claimed error is at most 2^-mk where mk= 131073
actual error is less than 2^L where L = -131074
iteration = 18 claimed error is at most 2^-mk where mk= 262145
actual error is less than 2^L where L = -262147
iteration = 19 claimed error is at most 2^-mk where mk= 524289
actual error is less than 2^L where L = -524295
iteration = 20 claimed error is at most 2^-mk where mk= 1000000
actual error is less than 2^L where L = -1000003
this took 9.81 seconds
@ And finally the running time without the computation of the actual error.
<>=
#include <LEDA/bigfloat.h>
int main(){
while (true)
{ int m = read_int("\n\n\nmaximal error less than 2^{-m}; m = ");
double a = 0.56545254; //read_real("\na double value between 1/4 and 1, a = ");
cout << "\nwe compute root of " << a;
float T = used_time();
bigfloat xold;
bigfloat xnew;
if ( a >= 0.75 ) xnew = 1; else xnew = 0.75;
int k = 0;
int mk = 2; // abs(xnew - sqrt{a}) <= 2^{-mk}
bigfloat A(a);
do
{ mk = leda_min(2*mk - 1,m);
bigfloat::set_precision(mk + 3);
xold = xnew;
xnew = (xold + A/xold)/2;
} while ( mk < m );
cout << "\n\nthis took " << used_time(T) << " seconds";
}
return 0;
}
@ The output is
@c
maximal error less than 2^{-m}; m = 1000000
we compute root of 0.565453
this took 8.61 seconds
@
\section{Newton Iteration with Floating Point Arithmetic}\label{Newton and Floats}
I leave it to one (or more) of you to write this section. I would like to treat the
following topics.
\begin{enumerate}
\item Extend the preceding section to Newton iteration in general. Is it also
possible to extend the adaptive precision scheme? What do we need to know about
the first iterate?
\item Uspensky's algorithm for root isolation can be combined with Newton's
method to compute arbitrary precision approximations of real roots. We run
Uspensky's method until we have an interval $I$ containing a single root of $P$
and no root of $P'$ and $P''$. Then $P$ is a convex function and hence Newton's
method will converge from an arbitrary starting point. However, it will not
necessarily converge quadratically.
\begin{itemize}
\item Can we modify Uspensky's algorithm so that it delivers isolating
intervals which guarantee quadratic convergence of Newton's method?
\item Or should we run Newton's method right from the beginning? We would only
have a guarantee of linear convergence at the beginning. What does this mean
for the adaptive scheme?
\end{itemize}
\item In the separation bound approach to sign determination, we need to
approximate an expression with absolute error bounded by $B/2$ where
$B = u(E)/u(E)^{2^k}$. What precision is required in a floating point computation
to guarantee this error bound? What does this imply for the complexity of
computing the approximation?
\end{enumerate}
\end{document}