The Fundamental Subspaces of Ensemble Kalman Inversion
Elizabeth Qian, Christopher Beattie
Abstract
Ensemble Kalman Inversion (EKI) methods are a family of iterative methods for solving weighted least-squares problems of the form
\begin{align} \label {eq: least squares problem} \min _{\bv \in \R ^d} (\by - \bH (\bv ))^\top \bSigma ^{-1} (\by - \bH (\bv )) = \min _{\bv \in \R ^d} \|\by - \bH (\bv )\|_{\bSigma ^{-1}}^2, \end{align} where \(\bSigma \in \R ^{n\times n}\) is symmetric positive definite, and \(\bH :\R ^d\to \R ^n\). Such problems arise in many settings, including in inverse problems in which \(\bv \in \R ^d\) represents an unknown parameter or state of a system of interest which must be inferred from observed data \(\by \in \R ^n\). Inverse problems arise in many disciplines across science, engineering, and medicine, including earth, atmospheric, and ocean modeling, medical imaging, robotics and autonomy, and more. In large-scale scientific and engineering applications, solving (1) using standard gradient-based optimization methods can be prohibitively expensive due to the high cost of evaluating derivatives or adjoints of the forward operator \(\bH \). In contrast, EKI methods can be implemented in an adjoint-/derivative-free way. This makes EKI an attractive alternative to gradient-based methods for solving (1) in large-scale inverse problems.
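For readers who wish to experiment, the weighted norm in (1) can be evaluated by whitening the residual with a Cholesky factor of \(\bSigma \); the following minimal numpy sketch shows one way to do so, with illustrative dimensions and variable names.
\begin{verbatim}
import numpy as np

def weighted_misfit(y, Hv, Sigma):
    """Evaluate ||y - H(v)||^2 in the Sigma^{-1}-weighted norm by whitening."""
    L = np.linalg.cholesky(Sigma)        # Sigma = L L^T
    r = np.linalg.solve(L, y - Hv)       # r = L^{-1}(y - H(v))
    return float(r @ r)                  # r^T r = (y - H(v))^T Sigma^{-1} (y - H(v))

# toy example with a linear forward operator
rng = np.random.default_rng(0)
n, d = 5, 3
H = rng.standard_normal((n, d))
Sigma = np.diag(rng.uniform(0.5, 2.0, n))   # SPD observation error covariance
v = rng.standard_normal(d)
y = H @ v + 0.1 * rng.standard_normal(n)
print(weighted_misfit(y, H @ v, Sigma))
\end{verbatim}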
We introduce a basic version of EKI from [4] in Algorithm 1, noting that other EKI methods can be viewed as variations on this theme. In Algorithm 1, we use \(\sfE \) and \(\sfCov \) to denote the empirical (sample) mean and covariance operators, respectively: given \(J\in \mathbb {N}\) samples \(\{\ba ^{(j)}\}_{j=1}^J\) and \(\{\bb ^{(j)}\}_{j=1}^J\), we define \(\sfE [\ba ^{(1:J)}] = \frac 1J\sum _{j=1}^J \ba ^{(j)}\), and
\begin{align*} \sfCov [\ba ^{(1:J)},\bb ^{(1:J)}] = \frac 1{J-1}\sum _{j=1}^J (\ba ^{(j)}-\sfE [\ba ^{(1:J)}])(\bb ^{(j)}-\sfE [\bb ^{(1:J)}])^\top , \end{align*} and \(\sfCov [\ba ^{(1:J)}] = \sfCov [\ba ^{(1:J)},\ba ^{(1:J)}]\). Algorithm 1 prescribes the evolution of an ensemble of \(J\) particles, \(\{\bv _i^{(1)},\ldots ,\bv _i^{(J)}\}\), initialized at \(i = 0\) in some way, e.g., by drawing from a suitable prior distribution, and subsequently updated for \(i = 1,2,3,\) etc. We emphasize that Algorithm 1 does not require the evaluation of adjoints or derivatives of \(\bH \). Those familiar with ensemble Kalman filtering methods will recognize many elements of Algorithm 1. Indeed, one way to obtain Algorithm 1 is to apply the ensemble Kalman filter to a system whose dynamics in the “forecast” step are given by the identity map. The connection to the ensemble Kalman filter also motivates the perturbation of the observations by random noise in Step 7; these perturbations ensure unbiased estimates of the filtering statistics in the linear Gaussian setting.
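In code, the empirical operators \(\sfE \) and \(\sfCov \) may be realized as follows; this is a minimal numpy sketch in which an ensemble of \(J\) members is stored as the columns of a matrix, matching the definitions above.
\begin{verbatim}
import numpy as np

def emp_mean(A):
    """sfE: empirical mean of an ensemble stored as the J columns of A."""
    return A.mean(axis=1)

def emp_cov(A, B=None):
    """sfCov: empirical (cross-)covariance of two ensembles with J columns each."""
    if B is None:
        B = A
    J = A.shape[1]
    Ac = A - emp_mean(A)[:, None]        # center each ensemble
    Bc = B - emp_mean(B)[:, None]
    return Ac @ Bc.T / (J - 1)           # (1/(J-1)) sum_j (a_j - abar)(b_j - bbar)^T
\end{verbatim}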
Algorithm 1: Basic Ensemble Kalman Inversion (EKI)
1: Input: forward operator \(\bH :\R ^d\to \R ^n\), initial ensemble \(\{\bv _0^{(1)},\ldots ,\bv _0^{(J)}\}\subset \R ^d\), observations \(\by \in \R ^n\), observation error covariance \(\bSigma \in \R ^{n\times n}\)
2: for \(i = 0,1,2,\ldots ,\) do
3: Compute observation-space ensemble: \(\bh _i^{(j)} = \bH \big (\bv _i^{(j)}\big ),\) \(j = 1,2,\ldots ,J\).
4: Compute empirical covariances: \(\sfCov [\bv _i^{(1:J)},\bh _i^{(1:J)}]\) and \(\sfCov [\bh _i^{(1:J)}]\)
5: Compute Kalman gain: \(\bK _i = \sfCov [\bv _i^{(1:J)}\!,\bh _i^{(1:J)}]\cdot \big (\sfCov [\bh _i^{(1:J)}] + \bSigma \big )^{-1}\)
6: Sample \(\beps _i^{(j)}\) i.i.d. from \(\cN (0,\bSigma )\) for \(j = 1,2,\ldots ,J\).
7: Perturb observations: set \(\by _i^{(j)}=\by + \beps _i^{(j)}\) for \(j = 1,2,\ldots ,J\).
8: Compute particle update: \(\bv _{i+1}^{(j)} = \bv _i^{(j)}+\bK _i\big (\by _i^{(j)}-\bh _i^{(j)}\big )\) for \(j = 1,2,\ldots ,J\).
9: if converged then
10: return current ensemble mean, \(\sfE [\bv _{i+1}^{(1:J)}]\)
11: end if
12: end for
This is Stochastic EKI. For Deterministic EKI, skip Step 6 and assign \(\by _i^{(j)} = \by \) in Step 7.
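For illustration, the following numpy sketch is one possible implementation of Algorithm 1 for a generic forward map; it is not an optimized implementation, and the function and variable names (eki, n_iters, etc.) are illustrative choices. Setting stochastic=False gives the deterministic variant.
\begin{verbatim}
import numpy as np

def eki(H, v0, y, Sigma, n_iters=100, stochastic=True, rng=None):
    """Basic EKI (Algorithm 1): v0 is d x J (one particle per column), H maps R^d -> R^n."""
    rng = np.random.default_rng() if rng is None else rng
    V = v0.copy()
    J = V.shape[1]
    for _ in range(n_iters):
        Hv = np.column_stack([H(V[:, j]) for j in range(J)])   # observation-space ensemble
        Vc = V - V.mean(axis=1, keepdims=True)                 # centered particles
        Hc = Hv - Hv.mean(axis=1, keepdims=True)               # centered observations
        C_vh = Vc @ Hc.T / (J - 1)                             # Cov[v, H(v)]
        C_hh = Hc @ Hc.T / (J - 1)                             # Cov[H(v)]
        K = C_vh @ np.linalg.inv(C_hh + Sigma)                 # Kalman gain
        if stochastic:
            eps = rng.multivariate_normal(np.zeros(len(y)), Sigma, size=J).T
            Y = y[:, None] + eps                               # perturbed observations
        else:
            Y = np.repeat(y[:, None], J, axis=1)               # deterministic EKI: no perturbation
        V = V + K @ (Y - Hv)                                   # particle update
    return V.mean(axis=1), V                                   # ensemble mean and final ensemble

# toy linear example
rng = np.random.default_rng(1)
n, d, J = 8, 4, 20
A = rng.standard_normal((n, d))
Sigma = 0.1 * np.eye(n)
y = A @ rng.standard_normal(d) + rng.multivariate_normal(np.zeros(n), Sigma)
v_mean, _ = eki(lambda v: A @ v, rng.standard_normal((d, J)), y, Sigma, n_iters=200)
\end{verbatim}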
There is a very rich literature developing both EKI methods and accompanying theory (see [3] for an extensive survey). Variants of the basic method include the incorporation of a Tikhonov regularization term into the least-squares objective function, the enforcement of constraints in the optimization, or hierarchical, multilevel, and parallel versions of the algorithm. Beyond the successful use of EKI for solving diverse inverse problems in the physical sciences, e.g., in geophysical and biological contexts, EKI has also been used as an optimizer for training machine learning models. In particular, the use of EKI for training neural networks has motivated the development of EKI variants based on ideas used for gradient-based training of neural networks, including dropout, data subsampling (also called ‘(mini-)batching’), adaptive step sizes, and convergence acceleration with Nesterov momentum.
Theoretical analyses of EKI convergence behavior have mostly considered linear observation operators \(\bH \in \R ^{n\times d}\), for which the standard norm-minimizing solution of (1) is given by
\begin{equation} \label {eq: ls solution} \bv ^* = (\bH ^\top \bSigma ^{-1}\bH )^{\dagger }\bH ^\top \bSigma ^{-1}\by \equiv \bH ^+\by , \end{equation}
where “\(\dagger \)” denotes the usual Moore-Penrose pseudoinverse and we have introduced the weighted pseudoinverse, \(\bH ^+=(\bH ^\top \bSigma ^{-1}\bH )^{\dagger }\bH ^\top \bSigma ^{-1}\). Previous analyses of linear EKI have largely considered mean-field limits (equivalent to an infinitely large ensemble) [3] or continuous-time limits of the EKI iteration, in which the deterministic iteration becomes a system of coupled ordinary differential equations (ODEs) [2, 6] and the stochastic iteration becomes a system of coupled stochastic differential equations (SDEs) [1]. These continuous-time analyses have shown that the EKI ensemble covariance collapses at a rate inversely proportional to time [6, 1, 2], meaning that the residual of the EKI iteration (with respect to the final solution) converges at a \(1/\sqrt {i}\) rate. These analyses have also characterized EKI solutions either by assuming \(\bH \) is one-to-one or by assuming the ensemble covariance is full rank. In particular, the works [6, 1] show that if \(\bH \) is one-to-one, then EKI converges to the pre-image of the data restricted to the span of the ensemble. On the other hand, the work [2] shows that if the ensemble covariance is full rank (and \(\bH \) may be low-rank), then EKI converges to the (non-standard) minimizer of (1) closest to the initial ensemble mean in the norm induced by the initial ensemble covariance. The characterization of EKI solutions in the general case where both \(\bH \) and the ensemble covariance may be low-rank, and the relationship between EKI solutions and the standard minimum-norm least-squares solution (2), are open questions addressed in this work.
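As a numerical aside, the weighted pseudoinverse solution (2) coincides with the minimum-norm least-squares solution of the whitened system \(\bSigma ^{-1/2}\bH \bv \approx \bSigma ^{-1/2}\by \), which numpy's lstsq returns directly; the following sketch (with an intentionally rank-deficient \(\bH \) and illustrative variable names) checks this equivalence.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 4
H = rng.standard_normal((n, 2)) @ rng.standard_normal((2, d))   # rank-deficient H
Sigma = np.diag(rng.uniform(0.5, 2.0, n))
Si = np.linalg.inv(Sigma)
y = rng.standard_normal(n)

# weighted pseudoinverse formula (2): v* = (H^T Sigma^{-1} H)^+ H^T Sigma^{-1} y
v_star = np.linalg.pinv(H.T @ Si @ H) @ H.T @ Si @ y

# minimum-norm solution of the whitened problem Sigma^{-1/2} H v ~ Sigma^{-1/2} y
W = np.diag(1.0 / np.sqrt(np.diag(Sigma)))                      # Sigma^{-1/2} (Sigma diagonal here)
v_lstsq = np.linalg.lstsq(W @ H, W @ y, rcond=None)[0]

print(np.allclose(v_star, v_lstsq))                             # True
\end{verbatim}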
In this work, we provide a new analysis of EKI for linear observation operators \(\bH \in \R ^{n\times d}\) which directly considers the discrete iteration for a finite ensemble, relying principally on linear algebra as an analysis tool. Our analysis yields new results relating EKI solutions to the standard minimum-norm least-squares solution (2), together with a new and natural interpretation of EKI convergence behavior in terms of ‘fundamental subspaces of EKI’, analogous to the four fundamental subspaces characterizing Strang’s ‘fundamental theorem of linear algebra’ [7], which we now review.
Strang’s four ‘fundamental subspaces of linear algebra’ arise from dividing observation space \(\R ^n\) and state space \(\R ^d\) into two subspaces each, one subspace associated with ‘observable’ directions and a complementary subspace associated with ‘unobservable’ directions [7]. That is, in observation space \(\R ^n\), the two fundamental subspaces are:
1. \(\Ran (\bH )\) (denoting the range of \(\bH \)), and
2. \(\Ker (\bH ^\top \bSigma ^{-1})\) (denoting the null space of \(\bH ^\top \bSigma ^{-1}\)), the \(\bSigma ^{-1}\)-orthogonal complement to \(\Ran (\bH )\).
In state space \(\R ^d\), the two fundamental subspaces are:
1. \(\Ran (\bH ^\top )\), and
2. \(\Ker (\bH )\), the orthogonal complement to \(\Ran (\bH ^\top )\) with respect to the Euclidean norm.
The standard minimum-norm solution (2) to the linear least-squares problem (1) can be understood in terms of these fundamental subspaces as follows (see [5, Figure 1]): in observation space \(\R ^n\), the closest that \(\bH \bv \) can come to \(\by \in \R ^n\) with respect to the \(\bSigma ^{-1}\)-norm is the \(\bSigma ^{-1}\)-orthogonal projection of \(\by \) onto the observable space \(\Ran (\bH )\), which then has a zero component in the (unobservable) subspace \(\Ker (\bH ^\top \bSigma ^{-1})\). In state space \(\R ^d\), directions in \(\Ker (\bH )\) are unobservable because they are mapped by \(\bH \) to zero and thus do not influence the minimand of (1). If \(\Ker (\bH )\) is non-trivial, multiple minimizers of (1) exist. The unique norm-minimizing solution (2) lies in the observable space \(\Ran (\bH ^\top )\) and has a zero component in the unobservable space \(\Ker (\bH )\).
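This geometry is easy to verify numerically: the residual \(\by - \bH \bv ^*\) lies in \(\Ker (\bH ^\top \bSigma ^{-1})\), and \(\bv ^*\) has no component in \(\Ker (\bH )\). The following numpy sketch (again with a rank-deficient \(\bH \) so that \(\Ker (\bH )\) is non-trivial) checks both properties.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n, d = 6, 4
H = rng.standard_normal((n, 2)) @ rng.standard_normal((2, d))   # Ker(H) is non-trivial
Sigma = np.diag(rng.uniform(0.5, 2.0, n))
Si = np.linalg.inv(Sigma)
y = rng.standard_normal(n)

v_star = np.linalg.pinv(H.T @ Si @ H) @ H.T @ Si @ y            # minimum-norm solution (2)
residual = y - H @ v_star

# the residual lies in Ker(H^T Sigma^{-1}), the Sigma^{-1}-orthogonal complement of Ran(H)
print(np.allclose(H.T @ Si @ residual, 0))                      # True

# v* lies in Ran(H^T), i.e., it is orthogonal to every vector in Ker(H)
_, _, Vt = np.linalg.svd(H)
kerH = Vt[np.linalg.matrix_rank(H):].T                          # orthonormal basis for Ker(H)
print(np.allclose(kerH.T @ v_star, 0))                          # True
\end{verbatim}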
Our analysis reveals that EKI solutions to the weighted least squares problem admit a similar interpretation in terms of fundamental subspaces of EKI. However, the EKI fundamental subspaces arise first from dividing the state and observation spaces into directions that are ‘populated’ by particles, lying in the range of the ensemble covariance \(\bGamma _i = \sfCov [\bv _i^{(1:J)}]\) (it is well-known [1, 2, 4, 6] that \(\Ran (\bGamma _i)\) is the same for all \(i\)), and ‘unpopulated’ directions lying in a complementary subspace. The populated subspace can then be further divided into two subspaces associated with observable and unobservable directions. There are therefore three subspaces in each of the observation and state spaces. In observation space \(\R ^n\), the three fundamental subspaces of EKI are associated with three complementary oblique projection operators, \(\bcP ,\bcQ ,\bcN \in \R ^{n\times n}\). These projections are defined via a spectral analysis of the iteration map which governs the evolution of the data misfit, \(\bH \bv _i^{(j)}-\by \), so that the range of each projector is an invariant subspace under the misfit iteration map. The three fundamental subspaces of observation space \(\R ^n\) are then
1. \(\Ran (\bcP )\equiv \Ran (\bH \bGamma _i)\), associated with observable populated directions,
2. \(\Ran (\bcQ )\equiv \bH \,\Ker (\bGamma _i\bH ^\top \bSigma ^{-1}\bH )\), associated with observable but unpopulated directions, and
3. \(\Ran (\bcN )\equiv \Ker (\bH ^\top \bSigma ^{-1})\), associated with unobservable directions.
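To make the decomposition concrete, the following numpy sketch builds orthonormal bases for the three subspaces listed above for a toy linear problem with a randomly drawn, rank-deficient ensemble covariance, and checks that together they span all of \(\R ^n\). The construction uses only the subspace definitions above, not the oblique projectors themselves, and the helper functions are illustrative.
\begin{verbatim}
import numpy as np

def colspace(A, tol=1e-10):
    """Orthonormal basis for the column space of A (via SVD)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, s > tol * s.max()] if s.size and s.max() > 0 else np.zeros((A.shape[0], 0))

def nullspace(A, tol=1e-10):
    """Orthonormal basis for the null space of A (via SVD)."""
    _, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol * s.max()))
    return Vt[r:].T

rng = np.random.default_rng(4)
n, d, J = 7, 5, 4                                  # J - 1 < d, so Gamma is rank deficient
H = rng.standard_normal((n, d))
Sigma = np.diag(rng.uniform(0.5, 2.0, n))
V = rng.standard_normal((d, J))
Vc = V - V.mean(axis=1, keepdims=True)
Gamma = Vc @ Vc.T / (J - 1)                        # ensemble covariance, rank <= J - 1
Fisher = H.T @ np.linalg.inv(Sigma) @ H            # H^T Sigma^{-1} H

P_basis = colspace(H @ Gamma)                      # observable, populated directions
Q_basis = colspace(H @ nullspace(Gamma @ Fisher))  # observable, unpopulated directions
N_basis = nullspace(H.T @ np.linalg.inv(Sigma))    # unobservable directions, Ker(H^T Sigma^{-1})

# the three subspaces are complementary: together their bases span R^n
B = np.column_stack([P_basis, Q_basis, N_basis])
print(B.shape[1] == n, np.linalg.matrix_rank(B) == n)   # True True
\end{verbatim}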
In state space \(\R ^d\), the three fundamental subspaces of EKI are also associated with three complementary oblique projection operators, \(\bbP ,\bbQ ,\bbN \in \R ^{d\times d}\). These projections are defined via a spectral analysis of the iteration map which governs the evolution of the least squares residual, \(\bv _i^{(j)}-\bv ^*\). The range of each projector is again an invariant subspace under the residual iteration map. The three fundamental subspaces of state space \(\R ^d\) are then
1. \(\Ran (\bbP )\subset \Ran (\bGamma _i)\), associated with observable populated directions (but generally not simply the intersection of \(\Ran (\bGamma _i)\) with \(\Ran (\bH ^\top )\)),
2. \(\Ran (\bbQ )\subset \Ran (\bH ^\top )\), associated with observable unpopulated directions, and
3. \(\Ran (\bbN )\), associated with unobservable directions.
The fundamental subspaces of EKI are depicted in [5, Figure 2], and an interactive three-dimensional visualization is available at https://elizqian.github.io/eki-fundamental-subspaces/.
We show that EKI misfits [residuals] converge to zero at a \(1/\sqrt {i}\) rate in the fundamental subspace associated with observable and populated directions, \(\Ran (\bcP )\) [\(\Ran (\bbP )\)], and remain constant in the fundamental subspaces associated with observable unpopulated directions, \(\Ran (\bcQ )\) [\(\Ran (\bbQ )\)]. The misfits [residuals] also remain constant in the unobservable directions, \(\Ran (\bcN )\) [\(\Ran (\bbN )\)]. Numerical experiments illustrating these results may be found in [5, Figure 3]. Our results verify, for the discrete iteration with a finite ensemble, the \(1/\sqrt {i}\) convergence rate previously shown in continuous-time or infinite-ensemble limits, and provide the first results describing the relationship between EKI solutions and the standard minimum-norm least-squares solution (2).
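As a simple illustration of the \(1/\sqrt {i}\) rate (a toy check, distinct from the experiments reported in [5, Figure 3]), the following numpy sketch runs deterministic EKI on a problem in which \(\bH \) is one-to-one and the initial ensemble covariance has full rank, so that every direction is populated and observable; the scaled residual \(\sqrt {i}\,\|\sfE [\bv _i^{(1:J)}]-\bv ^*\|\) should then roughly level off as \(i\) grows.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
n, d, J = 8, 3, 12                                 # J > d: full-rank initial ensemble covariance
H = rng.standard_normal((n, d))                    # one-to-one forward operator
Sigma = 0.1 * np.eye(n)
Si = np.linalg.inv(Sigma)
y = rng.standard_normal(n)
v_star = np.linalg.pinv(H.T @ Si @ H) @ H.T @ Si @ y

V = rng.standard_normal((d, J))                    # initial ensemble
for i in range(1, 10001):
    Vc = V - V.mean(axis=1, keepdims=True)
    Gamma = Vc @ Vc.T / (J - 1)
    K = Gamma @ H.T @ np.linalg.inv(H @ Gamma @ H.T + Sigma)   # deterministic EKI gain
    V = V + K @ (y[:, None] - H @ V)               # no observation perturbation
    if i in (10, 100, 1000, 10000):
        res = np.linalg.norm(V.mean(axis=1) - v_star)
        print(i, res, np.sqrt(i) * res)            # sqrt(i) * res should roughly level off
\end{verbatim}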
Our analysis sheds light on several directions of interest for future work connecting EKI with classical iterative methods. In particular, we have shown that the convergence behavior of deterministic EKI can be characterized in terms of an evolving spectral problem whose invariant subspaces are independent of the iteration index; this opens the door to straightforward EKI acceleration strategies analogous to overrelaxation schemes for classical stationary iterative methods. Other potential directions of interest could exploit the well-known connection between the extended Kalman filter and Gauss-Newton methods to establish further connections between EKI and classical methods.
References
[1] Dirk Blömker, Claudia Schillings, Philipp Wacker, and Simon Weissmann. Well posedness and convergence analysis of the ensemble Kalman inversion. Inverse Problems, 35(8):085007, 2019.
[2] Leon Bungert and Philipp Wacker. Complete deterministic dynamics and spectral decomposition of the linear ensemble Kalman inversion. SIAM/ASA Journal on Uncertainty Quantification, 11(1):320–357, 2023.
[3] Edoardo Calvello, Sebastian Reich, and Andrew M. Stuart. Ensemble Kalman methods: A mean field perspective. arXiv preprint arXiv:2209.11371, 2022.
[4] Marco A. Iglesias, Kody J. H. Law, and Andrew M. Stuart. Ensemble Kalman methods for inverse problems. Inverse Problems, 29:045001, 2013.
[5] Elizabeth Qian and Christopher Beattie. The fundamental subspaces of ensemble Kalman inversion. arXiv preprint arXiv:2409.08862, 2024.
[6] Claudia Schillings and Andrew M. Stuart. Analysis of the ensemble Kalman filter for inverse problems. SIAM Journal on Numerical Analysis, 55(3):1264–1290, 2017.
[7] Gilbert Strang. The fundamental theorem of linear algebra. The American Mathematical Monthly, 100(9):848–855, 1993.