
Collect, Commit, Expand: An Efficient Strategy for Column Subset Selection on Extremely Wide Matrices

Robin Armstrong, Anil Damle

Abstract

The column subset selection problem (CSSP) appears in a remarkably wide range of applications. For example, point selection problems that arise in model order reduction [5], computational chemistry [7], spectral clustering [8], low-rank approximation [6, 13], and Gaussian process regression [15] can all be treated as instances of CSSP. Given a matrix \(A \in \mathbb {R}^{m \times n}\) and a target rank \(k \leq \min \{ m,\, n \}\), CSSP seeks to find a set of \(k\) columns from \(A\) that are “highly linearly independent.” A more formal statement, using the framework of rank-revealing QR factorizations [4, 11, 12], is that algorithms for CSSP produce an index set \(S \in [n]^k\) satisfying

\begin{equation} \sigma _\mathrm {min}(A(:,S)) \geq \frac {\max _{J \in [n]^k} \sigma _\mathrm {min}(A(:,J))}{q(n,\, k)} \label {eqn:cssp_statement} \end{equation}
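To make the objective in (1) concrete: maximizing \(\sigma_\mathrm{min}\) over all \(\binom{n}{k}\) column subsets is combinatorial, which is why CSSP algorithms settle for the polynomial factor \(q(n,\,k)\). A brute-force reference for small instances (a hypothetical helper for illustration only, not part of any cited algorithm) can be written as:

```python
import itertools

import numpy as np

def best_subset_sigma_min(A, k):
    """Exhaustively find the k-column subset of A maximizing sigma_min.

    Feasible only for small n: the loop visits all C(n, k) subsets.
    """
    n = A.shape[1]
    best_val, best_S = -np.inf, None
    for S in itertools.combinations(range(n), k):
        # Smallest singular value of the candidate column submatrix
        val = np.linalg.svd(A[:, S], compute_uv=False)[-1]
        if val > best_val:
            best_val, best_S = val, S
    return best_S, best_val

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 10))
S, sigma = best_subset_sigma_min(A, 3)
```

Practical CSSP algorithms avoid this exponential search and instead aim for the guarantee (1) relative to the optimum this reference computes.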

for some low-degree bivariate polynomial \(q\). The Golub-Businger algorithm [3], which uses alternating column pivots and Householder reflections to compute a column-pivoted QR (CPQR) factorization \(A\Pi = QR\), is widely used for this problem. After running this algorithm, choosing \(S\) so that \(A(:,S) = A\Pi (:,1:k)\) yields an index set that does not provably satisfy (1), but that nearly always brings \(\sigma _\mathrm {min}(A(:,S))\) reasonably close to its maximum.
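As a point of reference, the Golub-Businger pivoting strategy is available through standard libraries; a minimal sketch using SciPy's `scipy.linalg.qr` (which wraps LAPACK's pivoted QR), on an illustrative random matrix:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 200))
k = 10

# Column-pivoted QR: A[:, piv] = Q @ R, with pivots ordered so that
# the leading columns are the "most linearly independent" ones.
Q, R, piv = qr(A, mode="economic", pivoting=True)

S = piv[:k]  # indices of the selected columns
sigma_sel = np.linalg.svd(A[:, S], compute_uv=False)[-1]
```

The pivoting guarantees that the diagonal of \(R\) is non-increasing in magnitude, which is the mechanism behind the near-optimal (though not provable) behavior described above.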

We seek to address a computational bottleneck in the Golub-Businger algorithm that results from sequential application of Householder reflections with level-2 BLAS. Most existing solutions to this problem involve reducing the number of rows manipulated with BLAS-2. For example, the CPQR factorization routine in LAPACK reflects only as many rows as are needed to determine a small block of pivot columns, deferring the full Householder reflection to a later application with BLAS-3 [16]. There also exists a large class of randomized algorithms that apply standard CPQR routines to sketched matrices with far fewer rows [6, 10, 14, 17]. We, however, are interested in problems where the difficulty arises not from the number of rows, but from the number of columns. For example, spectral clustering generates instances of CSSP where each row represents a cluster and each column represents a data point [8], meaning \(m\) may be several orders of magnitude smaller than \(n\). In these applications, reducing the number of rows being manipulated with BLAS-2 does not address the main bottleneck.
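For contrast, the row-sketching approach mentioned above can be sketched as follows (the sketch dimension `2 * k` and the decaying-spectrum test matrix are illustrative assumptions, not parameters taken from the cited works):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
m, n, k = 2000, 500, 20

# Illustrative test matrix with a decaying singular spectrum
A = rng.standard_normal((m, n)) @ np.diag(np.logspace(0, -6, n))

# Sketch: compress the rows with a small Gaussian map, then run CPQR
# on the (2k x n) sketch instead of the full (m x n) matrix.
G = rng.standard_normal((2 * k, m)) / np.sqrt(m)
_, _, piv = qr(G @ A, mode="economic", pivoting=True)
S = piv[:k]
```

Note that this pattern only shrinks \(m\); when \(n\) dominates, as in the spectral clustering example above, the CPQR call on the sketch still touches every column and the bottleneck remains.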

We will demonstrate a new CPQR-based column selection algorithm that effectively mitigates the BLAS-2 bottleneck for matrices with far more columns than rows. Our algorithm divides columns into a “tracked” set, whose residual column norms are recorded individually, and an “untracked” set, for which only aggregate norms are recorded. Pivot columns are selected in blocks, and each block is selected using a three-step strategy: collect, commit, and expand.

We call this algorithm CCEQR (“Collect-Commit-Expand QR”).
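The residual column norms referenced above are the quantities the Golub-Businger algorithm maintains via the classical norm-downdating formula; the following single pivot step illustrates that mechanism (our own simplified sketch, not the CCEQR implementation):

```python
import numpy as np

def one_pivot_step(A, norms2):
    """One Golub-Businger step: pivot on the largest residual norm,
    apply a Householder reflection, and downdate the squared norms."""
    j = int(np.argmax(norms2))          # column with largest residual norm
    A[:, [0, j]] = A[:, [j, 0]]         # swap it to the front
    norms2[[0, j]] = norms2[[j, 0]]
    # Householder reflection zeroing A[1:, 0]
    v = A[:, 0].copy()
    v[0] += np.copysign(np.linalg.norm(v), v[0])
    v /= np.linalg.norm(v)
    A -= 2.0 * np.outer(v, v @ A)       # BLAS-2: one outer-product update
    # Classical downdate: remove the squared first-row entry from each norm
    norms2[1:] -= A[0, 1:] ** 2
    return j                            # original position of the pivot column

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
norms2 = (A ** 2).sum(axis=0)
one_pivot_step(A, norms2)
```

The outer-product update is the level-2 BLAS operation whose per-column cost CCEQR avoids paying for the untracked set.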

CCEQR is fully deterministic, and unlike CPQR-based column selection algorithms that distribute the column load across several parallel processors [1, 2, 9], it provably selects the same basis columns as the Golub-Businger algorithm (assuming no ties between residual column norms). Using test problems from domains such as computational chemistry, model order reduction, and spectral clustering, we will demonstrate that CCEQR can run several times faster than the standard LAPACK routine (GEQP3) on matrices with an unbalanced column norm distribution. For example, Figure 1 shows that CCEQR can run as much as 10 times faster than GEQP3 on certain spectral clustering problems. We will also show that CCEQR and GEQP3 have essentially the same runtime on large unstructured problems, such as Gaussian random test matrices.

\(n\) GEQP3 Runtime (s) CCEQR Runtime (s)
\(10^2\) \(1.9 \times 10^{-5}\) \(8.1 \times 10^{-5}\)
\(10^3\) \(2.7 \times 10^{-4}\) \(3.7 \times 10^{-4}\)
\(10^4\) \(1.8 \times 10^{-3}\) \(7.5 \times 10^{-4}\)
\(10^5\) \(1.8 \times 10^{-2}\) \(3.7 \times 10^{-3}\)
\(10^6\) \(4.0 \times 10^{-1}\) \(4.5 \times 10^{-2}\)

Figure 1: Average runtimes of GEQP3 and CCEQR (over 20 trials) on matrices of size \(20 \times n\), for increasing \(n\). Test matrices were generated from a spectral clustering problem, and correspond to Laplacian embeddings of \(n\) data points drawn from a 20-component Gaussian mixture model.

References