
Krylov Methods and Polynomials

Ron Morgan

Abstract

It is well-known that all Krylov methods have polynomials underlying them. Here these polynomials are used in new ways to develop iterative methods. There are several applications including systems of linear equations with multiple right-hand sides.

1 Polynomial Preconditioning

Polynomial preconditioning goes way back; see for instance [6, 11, 10, 2, 12]. However, it has never become standard. In [5, 8], polynomial preconditioning is made more practical through several improvements, including stability control with added polynomial roots.

With polynomial preconditioning, \(Ax = b\) becomes

\begin{equation} Ap(A)y = b, \ \ \ \ x = p(A)y. \label {Eq1} \end{equation}

Polynomial preconditioning works because it transforms the spectrum of \(A\) into a better spectrum: for diagonalizable \(A\), each eigenvalue \(\lambda_i\) of \(A\) becomes an eigenvalue \(\lambda_i\, p(\lambda_i)\) of \(Ap(A)\). It is also effective because there is more power per iteration with much less orthogonalization, and there is greater opportunity for parallelism. However, polynomial preconditioning may not be needed for fairly easy systems.
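
As a rough illustration of Equation (1), the sketch below applies right polynomial preconditioning with SciPy's GMRES. For simplicity it uses a truncated Neumann series for \(p(A)\) rather than the GMRES-derived polynomial of [5, 8], and the test matrix, degree, and tolerance are illustrative assumptions only.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = sp.diags(np.linspace(0.5, 1.5, n))   # hypothetical test matrix with eigenvalues in [0.5, 1.5]
b = np.random.rand(n)

def p_times(v, degree=10):
    # Apply p(A)v with p(z) = sum_{k=0}^{degree} (1 - z)^k, a truncated
    # Neumann series approximating A^{-1}; it stands in here for the
    # GMRES polynomial used in [5, 8].
    s = v.copy()
    w = v.copy()
    for _ in range(degree):
        w = w - A @ w                    # w <- (I - A) w
        s = s + w
    return s

# Solve A p(A) y = b with GMRES, then recover x = p(A) y as in Equation (1).
ApA = spla.LinearOperator((n, n), matvec=lambda v: A @ p_times(v))
y, info = spla.gmres(ApA, b, atol=1e-10)
x = p_times(y)
print(np.linalg.norm(A @ x - b))         # true residual of the original system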

Example: Polynomial preconditioning is used for a matrix from a biharmonic differential equation. First a degree 200 polynomial is applied on its own; then a degree 50 polynomial is used along with an incomplete factorization preconditioner. Table 1 first shows that polynomial preconditioning can be very effective for the difficult original system of equations: the time is reduced from hours to under a minute. The problem is easier when standard incomplete factorization preconditioning is applied, so the effect of the polynomial preconditioning is not as dramatic, but it is still helpful; the time goes from over 3 minutes to below 9 seconds. See [8] for more results.

Table 1: Polynomial preconditioning applied to a biharmonic matrix of size \(n=40{,}000\) from the differential equation \(- u_{xxxx} - u_{yyyy} + u_{xxx} = f\) on the unit square. IF denotes incomplete factorization with no fill-in, computed after shifting the matrix by \(0.5I\).

Method   GMRES(50)    PP(200)-GMRES(50)   IF-GMRES(50)   PP(50)-IF-GMRES(50)
Time     14.6 hours   55 seconds          3.5 minutes    8.5 seconds

2 Polynomial Approximation of \(A^{-1}\)

It is surprising that even for a large matrix, it is often possible to find a polynomial \(p\) such that \(p(A)\) is a good approximation to \(A^{-1}\) [4]. We use the polynomial \(p\) from Equation (1), generated by GMRES, and implement it using the roots of the GMRES residual polynomial.
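
This relation can be written explicitly (a standard fact, stated here in the setting of [4]): if \(\pi\) is the GMRES residual polynomial of degree \(d\), normalized so that \(\pi(0) = 1\), with roots \(\theta_1, \dots, \theta_d\) (the harmonic Ritz values), then

\begin{equation*} \pi (z) = \prod _{i=1}^{d} \Bigl ( 1 - \frac {z}{\theta _i} \Bigr ), \qquad p(z) = \frac {1 - \pi (z)}{z}, \end{equation*}

so \(A\, p(A)\, b = b - \pi (A)\, b\), and the error \(b - A\, p(A)\, b\) is exactly the GMRES residual \(\pi (A)\, b\).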

It can be proved in the symmetric case that the accuracy of the approximating polynomial follows the residual norm curve. Stability control with added roots is even more important here than for polynomial preconditioning, because a high-degree polynomial is needed.

Example: For a convection-diffusion matrix, Table 2 shows that the relative accuracy of the polynomial approximation to the inverse follows the GMRES residual norm. See [4] for more.

Table 2: A polynomial approximation is found to the inverse of a matrix of size \(n=2500\) from the convection-diffusion equation \(- u_{xx} - u_{yy} + 2 u_{x} = f\) on the unit square. The starting vector is a random vector normalized to unit length. The relative accuracy of the polynomial is compared to the GMRES residual norm.

Degree   GMRES residual norm     \(\frac {\|A^{-1} - p(A)\|} {\|A^{-1}\|}\)
50       \(8\times 10^{-3}\)     \(2\times 10^{-1}\)
100      \(4\times 10^{-5}\)     \(6\times 10^{-4}\)
150      \(1\times 10^{-8}\)     \(3\times 10^{-7}\)
200      \(8\times 10^{-12}\)    \(3\times 10^{-10}\)

Applications of polynomial approximation to \(A^{-1}\) include solving systems of linear equations with multiple right-hand sides, where the same polynomial can be applied for every right-hand side.

3 Twin CG for Multiple Right-Hand Side Systems (also Twin CR, BiCGStab, BiCG, GMRES, and QMR)

The first application in the previous section used the same polynomial for multiple right-hand sides. Here we again use the same polynomial, but applied in a simpler way: the same iteration coefficients are used for all right-hand sides. When this approach is used with the conjugate gradient method (CG), we call it “Twin CG for multiple right-hand sides.”

As an example, consider two systems, \(Ax^{(1)} = b^{(1)}\) and \(Ax^{(2)} = b^{(2)}\). One step in the CG iteration for the first system is \(x^{(1)} = x^{(1)} + \alpha ^{(1)} p^{(1)}\). Our approach is to also do \(x^{(2)} = x^{(2)} + \alpha ^{(1)} p^{(2)}\) with the same \(\alpha ^{(1)}\) as for the first system. Similarly, the other CG steps use the same constants for both right-hand sides.
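
The following is a minimal sketch of this twin idea for CG, assuming \(A\) is symmetric positive definite; the coefficients \(\alpha\) and \(\beta\) are computed from the first system only and reused for the second, and the tolerance, iteration limit, and test problem in the comments are illustrative assumptions.

import numpy as np

def twin_cg(A, b1, b2, tol=1e-10, maxit=1000):
    # Standard CG run on the first system; the second system reuses the
    # same alpha and beta at every step, so both residual polynomials agree.
    x1, x2 = np.zeros_like(b1), np.zeros_like(b2)
    r1, r2 = b1.copy(), b2.copy()
    p1, p2 = r1.copy(), r2.copy()
    rho = r1 @ r1
    for _ in range(maxit):
        Ap1, Ap2 = A @ p1, A @ p2
        alpha = rho / (p1 @ Ap1)                    # from system 1 only
        x1, x2 = x1 + alpha * p1, x2 + alpha * p2   # same alpha for both systems
        r1, r2 = r1 - alpha * Ap1, r2 - alpha * Ap2
        rho_new = r1 @ r1
        if np.sqrt(rho_new) <= tol * np.linalg.norm(b1):
            break
        beta = rho_new / rho                        # also from system 1 only
        p1, p2 = r1 + beta * p1, r2 + beta * p2
        rho = rho_new
    return x1, x2

# Hypothetical usage with a small random SPD matrix:
# n = 500
# Q = np.random.rand(n, n)
# A = Q @ Q.T + n * np.eye(n)
# x1, x2 = twin_cg(A, np.random.rand(n), np.random.rand(n))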

The residual polynomial helps explain why this is effective. For the second right-hand side, the residual polynomial is the same as for the first, so if \(r^{(1)} = \pi ^{(1)}(A)b^{(1)}\), then also \(r^{(2)} = \pi ^{(1)}(A)b^{(2)}\). The polynomial \(\pi ^{(1)}\) needs to be small at the eigenvalues of \(A\) in order for the first system to converge, and this makes the second system converge also. The first right-hand side, \(b^{(1)}\), must not be deficient in any significant eigencomponents. For the deficient case, creating a first system with a random right-hand side is recommended.

With this twin approach for CG, the multiple right-hand side systems generally converge together. However, there can be momentary instability due to the steep slope of the polynomial at an outstanding eigenvalue. This tends to be corrected quickly and automatically.

Remarkable things about Twin CG:

Finally, a few comments about nonsymmetric systems:

References