
Randomly Pivoted Cholesky: Near-Optimal Positive Semidefinite Low-Rank Approximation from a Small Number of Entry Evaluations

Ethan N. Epperly, Yifan Chen, Joel A. Tropp, Robert J. Webber

Abstract

This talk describes randomly pivoted Cholesky (RPCholesky), a randomized algorithm for computing a low-rank approximation to a Hermitian positive semidefinite (psd) matrix. To compute a rank-\(k\) approximation to an \(N\times N\) matrix, RPCholesky performs a \(k\)-step partial Cholesky decomposition in which each pivot is chosen randomly with probability proportional to the diagonal entries of the current residual matrix (i.e., the Schur complement). The algorithm requires \(\mathcal{O}(k^2N)\) operations and reads only \((k+1)N\) entries of the input matrix.
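To make the sampling rule concrete, here is a minimal Python sketch of the procedure just described; the function name and interface are illustrative, not taken from [2].

```python
import numpy as np

def rpcholesky(A, k, rng=None):
    """Minimal sketch: k steps of randomly pivoted Cholesky.

    Returns an N x k factor F with A ~= F @ F.conj().T, plus the
    selected pivots. Reads only diag(A) and k columns of A.
    """
    rng = np.random.default_rng(rng)
    N = A.shape[0]
    F = np.zeros((N, k), dtype=A.dtype)
    d = np.real(np.diag(A)).astype(np.float64)  # residual diagonal
    pivots = []
    for i in range(k):
        s = rng.choice(N, p=d / d.sum())  # pivot ~ residual diagonal
        pivots.append(s)
        # Column s of the current residual (Schur complement)
        col = A[:, s] - F[:, :i] @ F[s, :i].conj()
        F[:, i] = col / np.sqrt(d[s])
        d -= np.abs(F[:, i]) ** 2         # downdate residual diagonal
        d = np.clip(d, 0.0, None)         # guard against roundoff
    return F, np.array(pivots)
```

Step \(i\) of the loop costs \(\mathcal{O}(iN)\) operations, for \(\mathcal{O}(k^2N)\) in total, and the \((k+1)N\) entries read are precisely the diagonal plus the \(k\) pivot columns.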

The RPCholesky method has an interesting history. The existence of the method was briefly noted in a 2017 paper of Musco and Woodruff [9], and it is algebraically related to an earlier “randomly pivoted QR” algorithm of Deshpande, Rademacher, Vempala, and Wang (2006, [3]). Our paper [2], originally released in 2022, reintroduced the algorithm, described its connection to Cholesky decomposition, evaluated the method numerically, and provided new theoretical results.

Surprisingly, this simple algorithm is guaranteed to produce a near-optimal low-rank approximation. The output of RPCholesky, like that of any other partial Cholesky decomposition, is a low-rank approximation of the form

\begin{equation*} \widehat{\mathbf{A}} = \mathbf{A}(:,\mathsf{S})\, \mathbf{A}(\mathsf{S},\mathsf{S})^\dagger\, \mathbf{A}(\mathsf{S},:), \end{equation*}

where \(\mathsf{S}\) denotes the set of pivots selected by the algorithm and \({}^\dagger\) denotes the Moore–Penrose pseudoinverse. This type of low-rank approximation is known as a (column) Nyström approximation, and it is widely used to accelerate kernel machine learning methods. It is known [7] that \(k \ge r/\varepsilon\) columns are needed to produce a Nyström approximation \(\widehat{\mathbf{A}}\) within a \(1+\varepsilon\) multiplicative factor of the best rank-\(r\) approximation \([\![\mathbf{A}]\!]_r\), i.e.,

\begin{equation*} \| \mathbf{A} - \widehat{\mathbf{A}} \|_* \le (1+\varepsilon)\, \| \mathbf{A} - [\![\mathbf{A}]\!]_r \|_*. \end{equation*}

Here, \(\|\cdot\|_*\) denotes the trace norm. In [2], we showed that RPCholesky achieves the guarantee:

\begin{equation*} \mathbb{E}\left[ \| \mathbf{A} - \widehat{\mathbf{A}} \|_* \right] \le (1+\varepsilon)\, \| \mathbf{A} - [\![\mathbf{A}]\!]_r \|_* \quad \text{when } k \ge \frac{r}{\varepsilon} + r \log\left( \frac{1}{\varepsilon\eta} \right). \end{equation*}

Here, \(\widehat{\mathbf{A}}\) is the approximation produced by \(k\) steps of RPCholesky and \(\eta = \| \mathbf{A} - [\![\mathbf{A}]\!]_r \|_* / \| \mathbf{A} \|_*\) denotes the relative error of the best rank-\(r\) approximation. In expectation, RPCholesky achieves the optimal scaling \(k = r/\varepsilon\) up to an additive term that is logarithmic in the relative error \(\eta\).
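As a hedged numerical illustration of these guarantees (reusing the `rpcholesky` sketch above; the Gaussian kernel matrix, dimensions, and rank below are arbitrary choices for demonstration):

```python
import numpy as np
from scipy.linalg import pinvh
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 2))
A = np.exp(-cdist(x, x, "sqeuclidean") / 2)  # a psd RBF kernel matrix

k = 50
F, S = rpcholesky(A, k, rng=0)

# The factor reproduces the column Nystrom approximation at pivots S
A_nys = A[:, S] @ pinvh(A[np.ix_(S, S)]) @ A[S, :]
print(np.abs(F @ F.T - A_nys).max())   # should be near machine precision

# Since A - F @ F.T is psd, its trace norm equals its trace
eigs = np.linalg.eigvalsh(A)           # eigenvalues in ascending order
best = eigs[:-k].sum()                 # best rank-k trace-norm error
print(np.trace(A - F @ F.T) / best)    # expected: a modest factor >= 1
```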

RPCholesky has proven effective at accelerating kernel machine learning methods. Given a data set \(\mathbf{x}_1,\ldots,\mathbf{x}_N\), kernel methods perform machine learning tasks such as regression and clustering by manipulating a psd kernel matrix \(\mathbf{A} = (\kappa(\mathbf{x}_i,\mathbf{x}_j))_{1\le i,j\le N}\), where \(\kappa\) is a given positive definite kernel function. When implemented directly, kernel methods require \(\mathcal{O}(N^3)\) operations and \(\mathcal{O}(N^2)\) storage. By replacing \(\mathbf{A}\) with a low-rank approximation \(\widehat{\mathbf{A}}\) (say, of rank \(k = \mathcal{O}(1)\)), the storage and runtime costs of these methods are reduced to \(\mathcal{O}(N)\). This talk will present numerical experiments from [2], which show that an RPCholesky-accelerated clustering method can be \(9\times\) to \(14\times\) more accurate than clustering methods accelerated with other low-rank approximation techniques. Subsequent papers have applied RPCholesky to accelerate the learning of committor functions in biochemistry [1], to build preconditioners for the conjugate gradient method [4], to perform quadrature in reproducing kernel Hilbert spaces [5], and to compress data sets [8].
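To see where the \(\mathcal{O}(N)\) cost comes from, here is an illustrative sketch of the regularized linear solve that arises in, e.g., kernel ridge regression, assuming the kernel matrix has been replaced by the RPCholesky approximation \(\widehat{\mathbf{A}} = FF^*\); the helper name is hypothetical.

```python
import numpy as np

def solve_low_rank(F, b, lam):
    """Solve (F @ F.conj().T + lam*I) x = b in O(N k^2) time.

    Woodbury identity: (lam*I + F F*)^{-1} b
        = (b - F @ (lam*I_k + F* F)^{-1} F* b) / lam,
    so only a small k x k system is ever formed or solved.
    """
    k = F.shape[1]
    G = F.conj().T @ F + lam * np.eye(k)   # k x k Gram system
    return (b - F @ np.linalg.solve(G, F.conj().T @ b)) / lam
```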

While the standard version of RPCholesky is already fast, it is slower than it could be because it processes the columns of the input matrix one by one. A blocked version of the method is faster but can produce approximations of lower accuracy. This talk will conclude by discussing the recently introduced accelerated RPCholesky method [6], which simulates the original RPCholesky method using a combination of rejection sampling and block-wise computations. Accelerated RPCholesky can be up to \(40\times\) faster than the original method while producing the same random output (in exact arithmetic).
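As a heavily simplified sketch of the rejection-sampling idea (omitting the block-wise factor updates that make the full method in [6] fast):

```python
import numpy as np

def rejection_step(d_stale, d_current, rng):
    """Simplified rejection-sampling step (idea only; the full method
    in [6] also performs block-wise factor updates).

    Proposes pivots from a stale residual diagonal, valid at the start
    of a block, and accepts with probability d_current[s] / d_stale[s]
    (<= 1, since residual diagonals only decrease). Accepted pivots are
    then distributed exactly as in an original RPCholesky step.
    """
    p = d_stale / d_stale.sum()
    while True:
        s = rng.choice(len(d_stale), p=p)             # cheap proposal
        if rng.random() < d_current[s] / d_stale[s]:  # accept/reject
            return s
```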

References