
Riemannian optimization for matrix nearness problems

Vanni Noferini, Froilán Dopico, Miryam Gnazzo, Lauri Nyman, Federico Poloni

Abstract

Matrix nearness problems are central to numerical linear algebra and matrix theory. As highlighted in [6], these problems have wide-ranging applications. The proposed talk, outlined below, is based on several recent papers of mine, written with different coauthors [3, 4, 7].

Overview of matrix nearness problems

Let us start with a general description, inspired in part by Nick Higham’s PhD thesis [5].

A matrix nearness problem involves finding a matrix \(B \) that is closest to a given matrix \(A \), such that \(B \) satisfies a specific property \(\mathfrak {P}\) which \(A \) does not. Formally, if \(\mathcal {Q} \) represents the set of matrices that possess property \(\mathfrak {P} \), the goal is to solve the following optimization problem:

\begin{equation} \label {eq:1} \min \| A - X \| \quad \text {subject to} \quad X \in \mathcal {Q}. \end{equation}

The distance to be minimized in (1) is typically the Euclidean one, i.e., the distance induced by the Frobenius norm \(\| X \|_F^2 = \text {tr}(X^* X) \), though other options can also be considered. Matrix nearness problems can be generalized to matrix pencils or matrix polynomials. For example, given a matrix polynomial \(A(z) = \sum _{i=0}^d A_i z^i \), the objective becomes finding a polynomial \(B(z) = \sum _{i=0}^d B_i z^i \) that minimizes the squared distance \(\sum _{i=0}^d \| A_i - B_i \|^2 \) while ensuring that \(B(z) \) has the desired property \(\mathfrak {P} \), which \(A(z) \) lacks.
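For concreteness, this coefficient-wise objective is straightforward to evaluate; below is a minimal NumPy sketch (the function name is mine, for illustration only).

import numpy as np

def poly_squared_distance(As, Bs):
    # Squared distance between A(z) = sum_i As[i] z^i and B(z) = sum_i Bs[i] z^i,
    # measured coefficient-wise in the Frobenius norm.
    return sum(np.linalg.norm(Ai - Bi, "fro") ** 2 for Ai, Bi in zip(As, Bs))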

Some matrix nearness problems have well-understood, efficient solutions. For example, by the Eckart–Young–Mirsky theorem, the nearest singular matrix to a full-rank matrix \(A \in GL(n, \mathbb {C}) \) can be found via the singular value decomposition (SVD): writing \(A = \sum _{i=1}^{n} \sigma _i u_i v_i^* \) with \(\sigma _1 \ge \dots \ge \sigma _n > 0 \), the solution is \(B = \sum _{i=1}^{n-1} \sigma _i u_i v_i^* \), and the distance is the smallest singular value \(\sigma _n \).
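This classical case is a two-line computation; a minimal NumPy sketch (the helper name is mine):

import numpy as np

def nearest_singular(A):
    # Eckart-Young-Mirsky: drop the smallest singular value of A.
    U, s, Vh = np.linalg.svd(A)
    dist = s[-1]                 # the distance equals sigma_n
    s[-1] = 0.0
    return (U * s) @ Vh, dist    # nearest singular matrix B and ||A - B||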

However, many matrix nearness problems are harder, and have been either proven or conjectured to be NP-hard. The feasible set \(\mathcal {Q} \) is often non-convex, making it challenging to find anything beyond a local minimum. Moreover, the optimization algorithms that attempt the task are often so slow that, in practice, even a local minimizer becomes out of reach once the input grows beyond very small matrices.

A new approach

In this talk, I will propose a new method to solve matrix nearness problems. The new approach combines three key ingredients: supplementary “oracle” information about the minimizer, Riemannian optimization, and classical optimization tools such as regularization.

Our method is based on a key insight: many matrix nearness problems become more tractable when supplemented with additional information about the minimizer. This is akin to the concept of an “oracle” in theoretical computer science: an abstract machine that can solve specific problem instances in one step.

Let us denote the supplementary information about the minimizer by \(\theta \). The problem can then be tightened by restricting the feasible set to matrices that share this information. Specifically, if we decompose the set \(\mathcal {Q} \) as \(\mathcal {Q} = \bigcup _\theta \mathcal {Q}_\theta \), where \(\mathcal {Q}_\theta \) represents matrices with property \(\mathfrak {P} \) and the additional attribute \(\theta \), we can solve the restricted problem:

\[ f(\theta ) = \min \| A - X \| \quad \text {subject to} \quad X \in \mathcal {Q}_\theta . \]

The original problem can then be equivalently reformulated by optimizing over \(\theta \):

\[ \min _\theta f(\theta ). \]

This often reduces the problem to an optimization task over a Riemannian manifold. For example, if \(\theta \) is an eigenvector, the optimization is over the unit sphere. If \(\theta \) represents a set of \(d \) independent eigenvectors, the optimization is over the Grassmann manifold of \(d \)-dimensional subspaces.
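To make this concrete, consider again the nearest singular matrix problem, and suppose (as a worked instance, with notation of my choosing) that \(\theta \) is a unit vector \(v \) required to span the kernel of \(X \), i.e., \(\mathcal {Q}_v = \{ X : Xv = 0 \} \). The inner problem then has a closed-form solution, since the nearest matrix annihilating \(v \) is an orthogonal projection of \(A \):

\[ f(v) = \min _{Xv = 0} \| A - X \|_F = \| A v \|_2, \qquad X_{\min } = A - (Av)v^*, \]

so the outer problem \(\min _{\| v \|_2 = 1} \| A v \|_2 \) is an optimization over the unit sphere, whose global minimum recovers the distance \(\sigma _n \).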

Of course, the idea of using Riemannian optimization to solve nearness problems is not new. Many works have applied manifold optimization to specific matrix nearness problems, but typically in a more direct manner than the approach I will present in this talk. To give just a couple of the many examples, Vandereycken [8] addressed low-rank matrix completion via optimization on fixed-rank manifolds, and Borsdorf [1] used augmented Lagrangian techniques for a chemistry-related problem over the Stiefel manifold. Oracle-based strategies have also appeared before, such as the method by Byers [2] for computing the distance to Hurwitz instability.

However, our recent work advances these ideas further. Its distinctive feature is the simultaneous use of oracles and Riemannian optimization, together with other classical optimization tools such as regularization. To my knowledge, this is the first time that this strategy has been attempted systematically across a wide range of matrix nearness problems.

A two-level optimization framework

More in detail, our method introduces a two-level optimization framework. At the inner level, for a fixed value of \(\theta \) we solve the restricted problem defining \(f(\theta ) \); this is often possible in closed form or via a cheap linear algebra computation, playing the role of the oracle. At the outer level, we minimize \(f(\theta ) \) over the Riemannian manifold that parametrizes the admissible values of \(\theta \).

We refer to this strategy as the Riemann-Oracle approach.
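To fix ideas, here is a minimal sketch of the two-level loop on the distance-to-singularity example discussed above; this is an illustration under my own simplifying choices (real \(A \), closed-form inner oracle, fixed-step outer iteration), not the actual implementation from the papers.

import numpy as np

def riemann_oracle_singularity(A, steps=2000, seed=0):
    # Outer level: Riemannian gradient descent on the unit sphere over v.
    # Inner level (oracle): for fixed v, the nearest X with X v = 0 is
    # X = A - (A v) v^T, so f(v) = ||A v||_2 in closed form.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    G = A.T @ A
    alpha = 1.0 / (2.0 * np.linalg.norm(G, 2))  # safe fixed step; a line
                                                # search would be more robust
    for _ in range(steps):
        g = 2.0 * (G @ v)          # Euclidean gradient of f(v)^2 = v^T G v
        rg = g - (v @ g) * v       # project onto the tangent space of the sphere
        v = v - alpha * rg
        v /= np.linalg.norm(v)     # retraction: renormalize back onto the sphere
    X = A - np.outer(A @ v, v)     # inner oracle's minimizer for the final v
    return X, np.linalg.norm(A @ v)

For a random square \(A \), the returned value of \(f(v) \) should approach \(\sigma _n \); in the genuinely hard problems targeted by the talk, the inner oracle and the manifold are of course more elaborate.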

In the talk, I plan to describe how and why the Riemann-Oracle method can be applied to several matrix nearness problems of practical importance, including applications in numerical linear algebra, control theory, and computer algebra. Our experiments show that this approach consistently outperforms many specialized algorithms in both speed and accuracy.

I will explore in the talk both the theoretical details and the practical implementation aspects of this method. Furthermore, I will provide numerical experiments demonstrating concrete success stories in the application areas mentioned above.

References