Randomized Nyström approximation of non-negative self-adjoint operators
David Persson, Nicolas Boullé, Daniel Kressner
Abstract
A ubiquitous task in numerical linear algebra is to compute a low-rank approximation to a matrix \(\bm {A}\). Randomized techniques [8, 9, 10, 12] are becoming increasingly popular for computing cheap, yet accurate, low-rank approximations to matrices. Most notably, the randomized singular value decomposition (SVD) [9] has evolved into one of the primary choices, due to its simplicity, performance, and reliability. In its most basic form, the randomized SVD performs the approximation \(\bm {Q} \bm {Q}^* \bm {A}\approx \bm {A}\), where \(\bm {Q}\) is an orthonormal basis for the range of \(\bm {A} \bm {\Omega }\), with \(\bm {\Omega }\) being a tall and skinny random sketch matrix. In many applications of low-rank approximation, such as \(k\)-means clustering [13], PCA [14], and Gaussian process regression [7], it is known that \(\bm {A}\) is symmetric positive semi-definite. In this case, one usually prefers the so-called randomized Nyström approximation [8]
\begin{equation} \label{eq:nystrom} \widehat{\bm{A}} := \bm{A} \bm{\Omega}(\bm{\Omega}^* \bm{A} \bm{\Omega})^{\dagger} \bm{\Omega}^*\bm{A} \approx \bm{A}, \end{equation}
where \(\bm {\Omega }\) is, again, a random sketch matrix. This approximation has received significant attention in the literature [8, 11, 12] and, like the randomized SVD, it enjoys strong theoretical guarantees. With the same number of matrix-vector products, the randomized Nyström approximation is typically significantly more accurate than the randomized SVD when the matrix has rapidly decaying singular values. Additionally, the Nyström method requires only a single pass over the matrix, compared to two passes for the randomized SVD, enabling all matrix-vector products to be performed in parallel.
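To make (1) concrete in the finite-dimensional setting, the following is a minimal NumPy sketch of the randomized Nyström approximation (an illustration, not an implementation from the cited works): it draws a Gaussian sketch \(\bm{\Omega}\), forms \(\bm{Y} = \bm{A}\bm{\Omega}\) in a single pass over \(\bm{A}\), and returns a factor \(\bm{F}\) with \(\widehat{\bm{A}} = \bm{F}\bm{F}^*\). A numerically robust implementation would stabilize the core matrix \(\bm{\Omega}^*\bm{A}\bm{\Omega}\), e.g., with the small shift used in [12], before factorizing it.

```python
import numpy as np

def randomized_nystrom(A, k, seed=None):
    """Randomized Nystrom approximation (1) of a symmetric PSD matrix A.

    Returns a factor F with A_hat = F @ F.T = A Omega (Omega^* A Omega)^+ Omega^* A.
    Minimal illustrative sketch; a robust implementation would shift the core
    matrix before factorizing it, as in [12].
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Omega = rng.standard_normal((n, k))           # Gaussian sketch matrix
    Y = A @ Omega                                 # single pass over A (k mat-vecs)
    core = Omega.T @ Y                            # k-by-k core matrix Omega^* A Omega
    d, V = np.linalg.eigh((core + core.T) / 2)    # symmetrize against rounding
    keep = d > d.max() * k * np.finfo(float).eps  # pseudo-inverse truncation
    return Y @ (V[:, keep] / np.sqrt(d[keep]))    # F with A_hat = F F^T

# Toy example: PSD matrix with exponentially decaying eigenvalues.
rng = np.random.default_rng(0)
n, k = 500, 20
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = 2.0 ** -np.arange(n)
A = (U * eigs) @ U.T
F = randomized_nystrom(A, k, seed=1)
print(np.linalg.norm(A - F @ F.T, 2), eigs[k])    # Nystrom error vs. sigma_{k+1}
```

The example prints the spectral-norm error of the Nyström approximation next to \(\sigma_{k+1}(\bm{A})\), the error of the best rank-\(k\) approximation.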
Recently, Boullé and Townsend [4, 5] generalized the randomized SVD from matrices to Hilbert-Schmidt operators. Subsequent works [3, 6] employed this infinite-dimensional generalization of the randomized SVD to learn Green’s functions associated with elliptic or parabolic partial differential equations (PDEs) from a few solutions of the PDE. This approach uses hierarchical low-rank techniques and exploits the fact that Green’s functions are smooth away from the diagonal and therefore admit accurate off-diagonal low-rank approximations [1, 2]. Other applications, such as Gaussian process regression and support vector machines, involve integral operators with positive and globally smooth kernels. In this case, the operator is not only self-adjoint and positive but also admits a direct low-rank approximation, without the need to resort to hierarchical techniques. Given the existing results for matrices, it is natural to use an infinite-dimensional extension of the randomized Nyström approximation in such situations.
In this work, we present and analyze an infinite-dimensional extension of the randomized Nyström approximation for computing low-rank approximations to self-adjoint, positive, trace-class operators. A significant advantage of the proposed framework is that once a low-rank approximation of the operator has been computed, it can be used to obtain a low-rank approximation of any discretization of the operator.
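As a concrete illustration of this advantage (a conceptual sketch of our own, using a hypothetical Gaussian kernel on \([0,1]\) and simple trapezoidal quadrature, not the operator-theoretic construction analyzed in this work), the sketch functions \(\omega_j\) and their images under the integral operator are handled once on a quadrature grid; the resulting Nyström factor can then be evaluated on an arbitrary target grid, yielding a low-rank approximation of the corresponding kernel matrix.

```python
import numpy as np

# Conceptual sketch (not the construction analyzed here): Nystrom approximation
# of an integral operator with a hypothetical Gaussian kernel on [0, 1], using
# trapezoidal quadrature and Gaussian vectors at the quadrature nodes as
# stand-ins for random sketch functions.
def kernel(x, y):
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / 0.1)

rng = np.random.default_rng(0)
r, m = 15, 400                                # sketch size, quadrature nodes
s = np.linspace(0, 1, m)
w = np.full(m, 1 / (m - 1))
w[[0, -1]] /= 2                               # trapezoidal quadrature weights
Omega = rng.standard_normal((m, r))           # sketch "functions" at the nodes

def apply_operator(x):
    # (A omega_j)(x) = int_0^1 kernel(x, t) omega_j(t) dt, via quadrature
    return kernel(x, s) @ (w[:, None] * Omega)

Y = apply_operator(s)                         # A Omega on the quadrature grid
core = Omega.T @ (w[:, None] * Y)             # Omega^* A Omega via quadrature
d, V = np.linalg.eigh((core + core.T) / 2)
keep = d > d.max() * r * np.finfo(float).eps
to_factor = V[:, keep] / np.sqrt(d[keep])

# The same operator-level approximation serves any discretization: evaluate
# the factor on an arbitrary target grid x, unrelated to the quadrature grid.
x = np.linspace(0, 1, 1000)
F = apply_operator(x) @ to_factor             # kernel matrix kernel(x, x) ~ F @ F.T
print(np.linalg.norm(kernel(x, x) - F @ F.T, 2))
```

The target grid in the last step plays no role in forming the approximation, which reflects the stated advantage: a single sketch of the operator serves every discretization.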
References
[1] Mario Bebendorf. Hierarchical matrices. Springer, 2008.
[2] Mario Bebendorf and Wolfgang Hackbusch. Existence of \(\mathcal {H}\)-matrix approximants to the inverse FE-matrix of elliptic operators with \(L^\infty \)-coefficients. Numer. Math., 95:1–28, 2003.
[3] Nicolas Boullé, Seick Kim, Tianyi Shi, and Alex Townsend. Learning Green’s functions associated with time-dependent partial differential equations. J. Mach. Learn. Res., 23(1):9797–9830, 2022.
[4] Nicolas Boullé and Alex Townsend. A generalization of the randomized singular value decomposition. In International Conference on Learning Representations, 2022.
[5] Nicolas Boullé and Alex Townsend. Learning elliptic partial differential equations with randomized linear algebra. Found. Comput. Math., 23(2):709–739, 2023.
[6] Nicolas Boullé, Diana Halikias, and Alex Townsend. Elliptic PDE learning is provably data-efficient. Proc. Natl. Acad. Sci. U.S.A., 120(39):e2303904120, 2023.
[7] Jacob Gardner, Geoff Pleiss, Kilian Q. Weinberger, David Bindel, and Andrew G. Wilson. GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration. In Advances in Neural Information Processing Systems, volume 31, 2018.
[8] Alex Gittens and Michael W. Mahoney. Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res., 17:Paper No. 117, 65, 2016.
[9] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217–288, 2011.
[10] Yuji Nakatsukasa. Fast and stable randomized low-rank matrix approximation. arXiv preprint arXiv:2009.11392, 2020.
[11] David Persson and Daniel Kressner. Randomized low-rank approximation of monotone matrix functions. SIAM J. Matrix Anal. Appl., 44(2):894–918, 2023.
[12] Joel A. Tropp, Alp Yurtsever, Madeleine Udell, and Volkan Cevher. Fixed-rank approximation of a positive-semidefinite matrix from streaming data. In Advances in Neural Information Processing Systems, volume 30, 2017.
[13] Shusen Wang, Alex Gittens, and Michael W. Mahoney. Scalable kernel K-means clustering with Nyström approximation: relative-error bounds. J. Mach. Learn. Res., 20(1):431–479, 2019.
[14] Kai Zhang, Ivor W. Tsang, and James T. Kwok. Improved Nyström low-rank approximation and error analysis. In Proc. International Conference on Machine Learning, pages 1232–1239, 2008.