Estimation of Spectral Gaps for Sparse Symmetric Matrices
Michele Benzi, Michele Rinelli, Igor Simunec
Abstract
Identifying the gap between two consecutive eigenvalues of a real symmetric matrix \(A \in \R ^{n \times n}\) is an important task that is often encountered in applications, such as in electronic structure computations. For instance, in Kohn-Sham Density Functional Theory [5] one has to determine the spectral projector \(P_\mu \) associated with the Fermi level (or chemical potential) \(\mu \in \R \), which corresponds to the occupied states of a system described by a discrete Hamiltonian \(A\). In the case of insulators at zero electronic temperature, there is a gap separating the first \(k\) eigenvalues of \(A\) from the rest of the spectrum, and the Fermi level \(\mu \) lies inside the gap between the \(k\)-th and \((k+1)\)-th eigenvalue of \(A\), where \(k\) is the number of electrons in the system. In this setting, the spectral projector \(P_\mu = h_\mu (A)\) is often approximated using polynomials or rational functions that approximate the step function \(h_\mu (\lambda )\), which takes the value \(1\) for \(\lambda < \mu \) and \(0\) for \(\lambda > \mu \). In order to use this approach, one first needs to compute a value of \(\mu \) that lies in the gap between \(\lambda _k\) and \(\lambda _{k+1}\), where we denote the eigenvalues of \(A\) as \(\lambda _1, \dots , \lambda _n\) in nondecreasing order.
Instead of looking for the gap between \(\lambda _k\) and \(\lambda _{k+1}\) for a specific \(k\), in this talk we tackle this problem from a different perspective, and aim to find all gaps in the spectrum of \(A\) that are larger than a certain threshold. Since in practical applications the gap between \(\lambda _k\) and \(\lambda _{k+1}\) is often relatively large, we expect that this approach will provide useful results even when one needs to find a single, specific gap.
Let us denote by \(\eigcount (\mu )\) the number of eigenvalues of \(A\) that are strictly smaller than \(\mu \). Assuming that \(\mu \) does not coincide with an eigenvalue of \(A\), we have \(\eigcount (\mu ) = \rank (P_\mu ) = \trace (P_\mu )\). Therefore, \(\eigcount (\mu )\) can be estimated by estimating \(\trace (P_\mu )\) with Hutchinson’s stochastic trace estimator [4] combined with the Lanczos algorithm [7, Algorithm 6.15]. Given \(s\) random Gaussian vectors \(\{\vec x_i\}_{i = 1}^s\), Hutchinson’s stochastic trace estimator approximates \(\trace (P_\mu )\) with
\(\seteqnumber{0}{}{0}\)\begin{equation*} \traceH {s}(P_\mu ) := \frac {1}{s} \sum _{i = 1}^s \vec x_i^T P_\mu \vec x_i. \end{equation*}
Since \(P_\mu = h_\mu (A)\), each quadratic form can be approximated using the Lanczos algorithm. Let \(V_m^{(i)} \in \R ^{n \times m}\) be an orthonormal basis of the Krylov subspace
\(\seteqnumber{0}{}{0}\)\begin{equation*} \kryl _m(A, \vec x_i) = \vspan \{\vec x_i, A \vec x_i, \dots , \vec A^{m-1} \vec x_i\}, \end{equation*}
constructed with the Lanczos algorithm, and let the tridiagonal matrix \(T_m^{(i)}:= V_m^{(i)T} A V_m^{(i)}\) be the projection of \(A\) onto \(\kryl _m(A, \vec x_i)\). We can approximate \(\vec x_i^T P_\mu \vec x_i\) with
\(\seteqnumber{0}{}{0}\)\begin{equation*} \psi _m^{(i)}(\mu ) := \norm {\vec x}_2^2 \vec e_1^T h_\mu (T_m^{(i)}) \vec e_1, \qquad j = 1, \dots , N_f, \end{equation*}
so we obtain the trace approximations
\(\seteqnumber{0}{}{0}\)\begin{equation*} \trace (P_\mu ) \approx \frac {1}{s} \sum _{i = 1}^s \psi _m^{(i)}(\mu ). \end{equation*}
If we use the same vectors \(\{\vec x_i\}_{i = 1}^s\) for different \(\mu \), these trace approximations can be computed simultaneously for many different \(\mu \) by using the same Krylov subspaces \(\kryl _m(A, \vec x_i)\), with a cost that is only slightly higher than for a single value of \(\mu \). This approach has already been used in literature on related problems, such as the estimation of the number of eigenvalues of \(A\) in an interval or the estimation of spectral densities [3, 6]. In this talk we will focus on thoroughly analyzing this method for the detection of spectral gaps, with the goal of determining how to choose the parameters \(s\) and \(m\) in order to minimize the computational cost and ensure that all gaps with relative width above a given threshold \(\theta \in (0,1)\) are found (up to a failure probability \(\delta \)).
The error of Hutchinson’s estimator can be bounded, for instance, with [2, Theorem 1], which states that
\(\seteqnumber{0}{}{0}\)\begin{equation*} \prob \big ( \abs {\trace (P_\mu ) - \traceH {s}(P_\mu )} \ge \varepsilon \big ) \le \delta \qquad \text {if} \qquad s \ge \frac {4}{\varepsilon ^2} \big (\norm {P_\mu }_F^2 + \varepsilon \norm {P_\mu }_2\big ) \log (2/\delta ). \end{equation*}
However, we have \(\norm {P_\mu }_F^2 = \eigcount (\mu )\), and \(\eigcount (\mu ) = O(n)\) when \(\mu \) is near the middle of the spectrum, so to achieve any fixed absolute accuracy \(\varepsilon \) we would have to take \(s = O(n)\), which becomes unfeasible as \(n\) grows. This means that with this approach it is prohibitively expensive to try and find a value of \(\mu \) in the gap between \(\lambda _k\) and \(\lambda _{k+1}\) by requiring that \(\traceH {s}(P_\mu ) \approx \eigcount (\mu ) = k\). Instead, we use a different point of view.
If we consider \(\trace (P_\mu )\) as a function of \(\mu \), it is a nondecreasing and piecewise constant function, with a jump of height \(1\) whenever \(\mu \) coincides with an eigenvalue of \(A\). A similar property holds for \(\traceH {s}(P_\mu )\), with the difference that the jumps have random heights, each with expected value \(1\). In particular, \(\traceH {s}(P_\mu )\) is constant for all \(\mu \in [a, b]\) if the interval \([a, b]\) contains no eigenvalues of \(A\). This observation can be exploited to find gaps in the spectrum of \(A\), by looking for intervals in which the Lanczos approximation \(\frac {1}{s} \sum _{i = 1}^s \psi _m^{(i)}(\mu )\) is almost constant in \(\mu \) and has a small error. For instance, if for a constant \(C\) and a small \(\varepsilon > 0\) we can show that
\(\seteqnumber{0}{}{0}\)\begin{equation*} \traceH {s}(P_\mu ) \in [C - \varepsilon , C + \varepsilon ] \qquad \text {for all } \mu \in [a, b], \end{equation*}
then we can conclude that either \(\traceH {s}(P_\mu )\) is constant in the interval \([a, b]\) and hence \([a, b]\) is a gap in the spectrum of \(A\), or all jumps in \(\traceH {s}(P_\mu )\) associated with eigenvalues in \([a, b]\) have heights smaller than \(2 \varepsilon \). If \(\varepsilon \) is small enough, the latter event has a small chance of occurring.
In order to make the argument outlined above rigorous, we obtain a bound on the probability of having small jumps in \(\traceH {s}(P_\mu )\), as well as a posteriori error bounds and estimates for the Lanczos approximation of the quadratic forms \(\vec x_i^T P_\mu \vec x_i\). By combining these bounds, we will show that for a given budget of matrix-vector products with \(A\), the best way to allocate it is to set \(s = 1\), i.e., use a single random vector for Hutchinson’s estimator and concentrate all the computational effort on the Lanczos algorithm. We also obtain an a priori bound on the accuracy of the Lanczos approximation that depends on the relative gap width \(\theta \), which will allow us to predict how many Lanczos iterations are needed to ensure that all gaps larger than a given width are detected.
The theoretical analysis is complemented by a detailed computational discussion, leading to an algorithm that is able to detect gaps efficiently and reliably. The effectiveness of the proposed method will be showcased with several numerical examples. Further details can be found in the preprint [1].
References
-
[1] M. Benzi, M. Rinelli, and I. Simunec. Estimation of spectral gaps for sparse symmetric matrices, arXiv:2410.15349, 2024.
-
[2] A. Cortinovis and D. Kressner. On randomized trace estimates for indefinite matrices with an application to determinants. Found. Comput. Math., 22(3):875–903, 2022.
-
[3] E. Di Napoli, E. Polizzi, and Y. Saad. Efficient estimation of eigenvalue counts in an interval. Numer. Linear Algebra Appl., 23(4):674–692, 2016.
-
[4] M. F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Comm. Statist. Simulation Comput., 18(3):1059–1076, 1989.
-
[5] L. Lin, J. Lu, and L. Ying. Numerical methods for Kohn-Sham density functional theory. Acta Numer., 28:405–539, 2019.
-
[6] L. Lin, Y. Saad, and C. Yang. Approximating spectral densities of large matrices. SIAM Rev., 58(1):34–65, 2016.
-
[7] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia, PA, second edition, 2003.