
Computing Accurate Eigenvalues of Symmetric Matrices
With a Mixed Precision Jacobi Algorithm

Nicholas J. Higham, Françoise Tisseur, Marcus Webb, Zhengbo Zhou

Abstract

Modern hardware increasingly supports not only single and double precisions, but also half and quadruple precisions. These precisions provide new opportunities to considerably accelerate linear algebra computations while maintaining numerical stability and accuracy. Efforts on developing mixed precision algorithms in the numerical linear algebra and high performance computing communities have mainly focussed on linear systems and least squares problems. Eigenvalue problems are considerably more challenging to solve: they have a larger solution space and, in general, cannot be solved in a finite number of steps [5].

There are two classes of algorithms for symmetric eigenproblems: (i) those that work directly on the matrix, such as the Jacobi algorithm and the QR-based Dynamically Weighted Halley (QDWH-eig) algorithm, and (ii) those that reduce the matrix to tridiagonal form in a finite number of steps and then employ an iterative scheme to compute all or just part of the eigenvalues and/or the eigenvectors, such as bisection and inverse iteration (BI), the QR algorithm, and the divide-and-conquer algorithm (DC). All these algorithms have pros and cons. DC and the method of multiple relatively robust representations (MR), which is a sophisticated variant of inverse iteration, are generally much faster than QR and BI on large matrices, with MR performing the fewest floating point operations but at a lower MFLOPS rate than DC. DC and QR are the most accurate algorithms, with observed accuracy \(O(\sqrt {n}u)\), where \(u\) is the working precision, \(n\) the size of the matrix, and accuracy is measured in terms of scaled residual norms and loss of orthogonality of the eigenvectors [1]. None of these eigensolvers exploits the low precisions available in modern hardware.

A key question is: how can we exploit access to arithmetic in multiple precisions to accelerate symmetric eigensolvers while maintaining numerical stability and accuracy?

In terms of arithmetic cost, solving a symmetric eigenvalue problem is about 27 times more expensive than solving a symmetric positive definite linear system. Unlike for linear systems, for which the \(O(n^3)\) part of the computation can be performed at low precision and the \(n\)-dimensional solution refined at working precision in \(O(n^2)\) operations, it can be shown that for the eigenvalue problem some of the \(O(n^3)\) operations need to be performed at the working precision if one hopes to maintain numerical stability and achieve accuracy. So to gain any speedup, these should be BLAS 3 operations, i.e., highly optimized matrix-matrix multiplies. Modern architectures execute matrix multiplies of large size \(n\) at least 18 times faster than symmetric eigensolvers on matrices of the same size. Low precision arithmetic can be used to preprocess or to precondition the eigenproblem to allow for a faster solution.
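To make the factor of 27 concrete, here is the back-of-the-envelope count behind it, based on the textbook flop counts of roughly \(9n^3\) for a full symmetric eigendecomposition (tridiagonalization, iteration, and accumulation of the transformations) and \(n^3/3\) for a Cholesky factorization:

\[ \frac {\text {cost of symmetric eigenproblem}}{\text {cost of positive definite solve}} \approx \frac {9n^3}{n^3/3} = 27. \]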

In this talk we concentrate on symmetric positive definite matrices \(A\in \mathbb {R}^{n\times n}\) and consider a mixed precision preconditioned Jacobi algorithm that uses three precisions \(u_h<u<u_\ell \). The preconditioner \(\widetilde {Q}\) is an approximate eigenvector matrix that is efficiently computed using a combination of low and working precisions. Zhang and Bai [7] and Zhou [8] suggested computing an eigenvector matrix at low precision and then orthogonalizing it to working precision so that

\begin{equation} \label {ft.eq.assumption1} \|\widetilde {Q}^T\widetilde {Q} - I\|_2 \le p_{1}u < 1, \end{equation}

where \(p_{1}\) is a low degree polynomial in \(n\). It is essential that the preconditioner \(\widetilde {Q}\) satisfies (1) to ensure that the eigenvectors returned by the mixed precision preconditioned Jacobi algorithm are orthogonal to working precision \(u\). We discuss several alternative efficient ways to construct such a preconditioner and prove that it reduces the off-diagonal entries of \(A\) to a level determined by the chosen low precision \(u_\ell \), so that the initial slow convergence phase of the Jacobi algorithm can be skipped.
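As an illustration, the following NumPy sketch implements one such construction in the spirit of [7, 8], with float32 and float64 standing in for the low precision \(u_\ell \) and the working precision \(u\); the precision choices and the test matrix are our illustrative assumptions, not prescriptions from the talk.

\begin{verbatim}
import numpy as np

def preconditioner(A):
    # Approximate eigenvector matrix of the SPD matrix A: eigenvectors
    # computed at low precision (float32 ~ u_ell), then reorthogonalized
    # at working precision (float64 ~ u) by Householder QR, so that
    # ||Q~^T Q~ - I||_2 = O(u), as required by (1).
    _, Q_low = np.linalg.eigh(A.astype(np.float32))  # O(n^3) work at low precision
    Q, _ = np.linalg.qr(Q_low.astype(np.float64))    # orthogonalize at working precision
    return Q

rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                      # SPD test matrix

Q = preconditioner(A)
print(np.linalg.norm(Q.T @ Q - np.eye(n), 2))    # ~1e-15, so (1) holds
A_pre = Q.T @ A @ Q                              # preconditioned matrix, nearly diagonal
off = np.sqrt(np.sum(A_pre**2) - np.sum(np.diag(A_pre)**2))
print(off / np.min(np.diag(A_pre)))              # small: slow Jacobi phase is skipped
\end{verbatim}

The last two lines anticipate the quantity \(\operatorname {off}(\widetilde {A})/\min _i \widetilde {a}_{ii}\) discussed below.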

Demmel and Veselić [2] showed that the eigenvalues computed by the Jacobi algorithm with stopping criterion \(|a_{ij}| \le u\sqrt {a_{ii}a_{jj}}\) for all \(i\neq j\) satisfy

\begin{equation} \label {eq.deve92-bound} \frac {|\lambda _{i}(A)-\widetilde {\lambda }_{i}(A)|} {|\lambda _{i}(A)|} \le p(n)\,u\,\kappa _{2}^{S}(A), \end{equation}

where \(\lambda _{i}(A)\) and \(\widetilde \lambda _{i}(A)\) denote the \(i\)th largest exact and computed eigenvalue of \(A\), \(p(n)\) is a low degree polynomial, and \(u\) is the working precision. Here \(\kappa _{2}^{S}(A)\) is the scaled condition number of \(A\) defined by

\[ \kappa _{2}^{S}(A) = \kappa _{2}(DAD), \qquad D = \operatorname {diag}(a_{ii}^{-1/2}), \]

where \(\kappa _2(B)= \lambda _1(B)/\lambda _n(B)\). For the QR and DC algorithms, the relative error is bounded by \(n^{1/2} p(n)\,u\,\kappa _2(A)\), so when \(\kappa _{2}(DAD)\ll \kappa _2(A)\) the Jacobi algorithm can produce much more accurate approximations to the smaller eigenvalues than the QR or DC algorithms.
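The gap between \(\kappa _2(DAD)\) and \(\kappa _2(A)\) is easy to observe numerically. In the following NumPy snippet the graded test matrix is our own example, chosen only to be badly scaled:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 100
C = rng.standard_normal((n, n))
H = C @ C.T + n * np.eye(n)            # well conditioned SPD core
S = np.diag(np.logspace(0, 8, n))      # grading over eight orders of magnitude
A = S @ H @ S                          # badly scaled SPD matrix

D = np.diag(1 / np.sqrt(np.diag(A)))
print(np.linalg.cond(A))               # kappa_2(A): of order 1e16 or more
print(np.linalg.cond(D @ A @ D))       # kappa_2(DAD): modest, so (2) is much stronger
\end{verbatim}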

Malyshev [6] and Drygalla [3, 4] suggest that preconditioning the matrix at a precision \(u_h\) higher than the working precision \(u\) improves the accuracy of the spectral decomposition computed by the preconditioned Jacobi algorithm. However, Malyshev only discusses the backward error, and Drygalla only claims the high accuracy property without proving it. Let us denote by \(\widetilde {A}\) and \(\widetilde {A}_{\mathrm {comp}}\) the product \(\widetilde {Q}^T A \widetilde {Q}\) computed in exact and floating point arithmetic, respectively. We prove under mild assumptions that the relative errors in the computed eigenvalues are proportional to \(u\kappa _{2}^{S}(\widetilde {A}_{\mathrm {comp}})\) and \(u\kappa _{2}^{S}(\widetilde {A})\) instead of the \(u\kappa _{2}^{S}(A)\) which appears in (2). Moreover, we prove that if \(\widetilde {A}\) is \(\theta \)-scaled diagonally dominant, that is, \(\theta = \|\widetilde {D}\widetilde {A}\widetilde {D} - I\|_2 < 1\) with \(\widetilde {D} = \operatorname {diag}(\widetilde {a}_{ii}^{-1/2})\), then the scaled condition numbers \(\kappa _{2}^{S}(\widetilde {A})\) and \(\kappa _{2}^{S}(\widetilde {A}_{\mathrm {comp}})\) are of order 1. Hence all the eigenvalues are computed to high relative accuracy. We remark that any preconditioner \(\widetilde {Q}\) such that \(\operatorname {off}(\widetilde {A})/\min _i \widetilde {a}_{ii} < 1\), where \(\operatorname {off}(\widetilde {A}) = \bigl (\sum _{i\neq j} \widetilde {a}_{ij}^{2}\bigr )^{1/2}\), yields an \(\widetilde {A}\) that is scaled diagonally dominant. For a preconditioned matrix \(\widetilde {A}\) that is not scaled diagonally dominant, we use a result of Demmel and Veselić [2, Prop. 6.2] to argue that if \(\operatorname {off}(\widetilde {A})\) is sufficiently small that we can treat the diagonal entries of \(\widetilde {A}\) as approximate eigenvalues, then the scaled condition numbers \(\kappa _{2}^{S}(\widetilde {A}_{\mathrm {comp}})\) and \(\kappa _{2}^{S}(\widetilde {A})\) are significantly smaller than \(\kappa _{2}^{S}(A)\).
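In the same spirit, here is a short sketch of the diagnostics described above; the helper names are our own, and A_pre plays the role of the computed \(\widetilde {Q}^T A \widetilde {Q}\).

\begin{verbatim}
import numpy as np

def theta(A_pre):
    # theta = ||D~ A~ D~ - I||_2 with D~ = diag(a~_ii^{-1/2}).
    # theta < 1 means A~ is scaled diagonally dominant, in which case
    # kappa_2^S(A~) is of order 1 and all eigenvalues are computed to
    # high relative accuracy.
    d = 1 / np.sqrt(np.diag(A_pre))
    return np.linalg.norm(d[:, None] * A_pre * d[None, :] - np.eye(len(d)), 2)

def off_ratio(A_pre):
    # off(A~)/min_i a~_ii: if this is < 1, then A~ is scaled
    # diagonally dominant.
    off = np.sqrt(np.sum(A_pre**2) - np.sum(np.diag(A_pre)**2))
    return off / np.min(np.diag(A_pre))
\end{verbatim}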

Finally, we present numerical results to support our theoretical analysis.

References