\(\newcommand {\bb }[1]{\mathbb {#1}}\) \(\newcommand {\mbf }[1]{\mathbf {#1}}\) \(\newcommand {\mc }[1]{\mathcal {#1}}\) \(\newcommand {\bmat }[1]{\begin {bmatrix}#1\end {bmatrix}}\) \(\newcommand {\orth }{\textup {\texttt {orth}}}\)

Leveraging Numerical Linear Algebra for Robust Learning of Optimal \(\mathcal H_2\) Models from Time-Domain Data

Michael S. Ackermann, Serkan Gugercin

Abstract

We investigate the optimal \(\mathcal {H}_2\) approximation of a discrete-time, single-input single-output system

\begin{equation} \begin{aligned} \mbf x[k+1] &= \mbf A\mbf x[k] + \mbf bu[k]\\ y[k] &= \mbf c^{\top }\mbf x[k] \end {aligned} ~~~~~~\mbox {with~transfer~function}~~~~~~H(z) = \mbf c^{\top }(z\mbf I-\mbf A)^{-1}\mbf b, \label {eq:LinSysAndTransferFunction} \end{equation}

where \(\mbf x[k] \in \bb R^{n}\), \(u[k] \in \bb R\), and \(y[k] \in \bb R\) are, respectively, the state, input, and output at time \(k\); \(\mbf A \in \bb R^{n \times n},\mbf b \in \bb R^n,\) and \(\mbf c \in \bb R^n\). Even though we explicitly write the state-space matrices in (1), in this work we never assume access to the system matrices, the system state, or evaluations of the transfer function, but only to the time-domain input-output data

\begin{equation} \label {eq:TimeDomainData} \bb U = [u[0]\,\ldots \,u[T]]^{\top }\in \bb R^{T+1}\quad \text {and}\quad \bb Y = [y[0]\,\ldots \,y[T]]^{\top }\in \bb R^{T+1}. \end{equation}
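To make the setting concrete, the data (2) can be generated by simulating (1) directly. Below is a minimal NumPy sketch with a randomly generated stable system; the matrices `A`, `b`, `c` and all dimensions are purely illustrative, and the methods discussed here never see them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable discrete-time SISO system; illustrative only.
n, T = 5, 200
A = np.diag(rng.uniform(-0.9, 0.9, n))   # diagonal, spectral radius < 1
b = rng.standard_normal(n)
c = rng.standard_normal(n)

U = rng.standard_normal(T + 1)           # inputs u[0], ..., u[T]
Y = np.empty(T + 1)
x = np.zeros(n)                          # zero initial state, for simplicity
for k in range(T + 1):
    Y[k] = c @ x                         # y[k] = c^T x[k]
    x = A @ x + b * U[k]                 # x[k+1] = A x[k] + b u[k]
```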

Given the input/output data (2), we seek to construct a data-driven reduced-order model (DDROM)

\begin{equation} \begin{aligned} \mbf x_r[k+1] &= \mbf A_r\mbf x_r[k] + \mbf b_ru[k]\\ y_r[k] &= \mbf c_r^{\top }\mbf x_r[k] \end {aligned} ~~~~~~\mbox {with~transfer~function}~~~~~~H_r(z) = \mbf c_r^{\top }(z\mbf I_r-\mbf A_r)^{-1}\mbf b_r, \label {eq:ReducedSysAndTransferFunction} \end{equation}

where \(\mbf x_r[k] \in \bb R^r\) is the reduced state, \(y_r[k]\) is the reduced output, and \(\mbf A_r \in \bb R^{r \times r},\mbf b_r \in \bb R^r,\) and \(\mbf c_r \in \bb R^r\) with \(r \ll n\). Specifically, we would like the DDROM (3) to minimize the \(\mathcal {H}_2\) distance

\begin{equation} \label {eq:H2Error} \| H - H_r \|^2_{\mathcal {H}_2} = \frac {1}{2\pi }\int _{-\pi }^{\pi } |H(e^{\mbf i \omega }) - H_r(e^{\mbf i \omega })|^2 \, d\omega . \end{equation}
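When a realization is available (e.g., for validation), the discrete-time \(\mc H_2\) norm can be computed from the controllability Gramian. The sketch below, which assumes SciPy, computes \(\|H\|_{\mathcal H_2}\) for a single system; the error (4) follows by applying either routine to a realization of the difference \(H - H_r\):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def h2_norm(A, b, c):
    """||H||_{H2} via the controllability Gramian P, which solves the
    discrete-time Lyapunov equation A P A^T - P + b b^T = 0."""
    P = solve_discrete_lyapunov(A, np.outer(b, b))
    return np.sqrt(c @ P @ c)

def h2_norm_quad(A, b, c, N=4096):
    """Cross-check: approximate the integral in (4) by a midpoint rule."""
    w = np.linspace(-np.pi, np.pi, N, endpoint=False)
    I = np.eye(A.shape[0])
    H = np.array([c @ np.linalg.solve(np.exp(1j * wk) * I - A, b) for wk in w])
    return np.sqrt(np.mean(np.abs(H) ** 2))
```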

The optimal \(\mc H_2\) reduced-order modeling problem is of interest because the \(\mc H_2\) error (4) bounds the output error for finite-energy inputs [4]; more specifically,

\begin{equation} \label {eq:h2Errbound} \| y - y_r\|_{{\mc L}_{\infty }} \leq \|H - H_r \|_{\mathcal H_2} \| u\|_{{\mc L}_{2}}. \end{equation}

The Realization-independent Iterative Rational Krylov Algorithm (TF-IRKA) [5] constructs \(\mc H_2\) optimal DDROMs using only evaluations \(H(\sigma )\) and \(H'(\sigma )\) of the transfer function and its derivative, without explicit access to the underlying dynamics. However, TF-IRKA requires repeated evaluations of \(H(z)\) and \(H'(z)\) at a priori unknown points outside the unit disc, i.e., at \(|\sigma | > 1\). In some settings, one cannot actively re-sample \(H(z)\) and is only provided time-domain input-output data as in (2).

In recent work, Burohman et al. [8] presented a method to compute transfer function evaluations from time-domain data. The method takes the form of a linear system relating the transfer function value \(H(\sigma )\) to the time-domain data \((\bb U,\bb Y)\):

\begin{equation} \begin{bmatrix}\bb H_n(\bb U)&\mbf 0\\\bb H_n(\bb Y)&-\gamma _n(\sigma )\end {bmatrix} \begin{bmatrix}\xi \\H(\sigma )\end {bmatrix} = \begin{bmatrix}\gamma _n(\sigma )\\\mbf 0\end {bmatrix}, \label {eq:calcM0Orig} \end{equation}

where

\[\bb H_n(\bb U) = \begin {bmatrix} u[0]&\ldots &u[{T-n}]\\ \vdots &\ddots & \vdots \\ u[n] & \ldots & u[T] \end {bmatrix} \in \bb R^{(n+1) \times (T-n + 1)} \quad \text {and} \quad \gamma _n(\sigma ) = \bmat {1\\\sigma \\ \vdots \\ \sigma ^n} \in \bb C^{n+1}.\]

A similar linear system relates \(H'(\sigma )\) to the time-domain data \((\bb U,\bb Y)\). While in exact arithmetic (6) enables recovery of \(H(\sigma )\) from the time-domain data (2), the numerics of the problem are much more subtle. In particular, the stacked Hankel matrices are expected to be extremely ill-conditioned [3, 6, 7], and the presence of \(\sigma ^n\) in \(\gamma _n(\sigma )\) can lead to overflow for large \(n\) when \(|\sigma | > 1\). It is these numerical linear algebra considerations that we cover in this talk.
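As a concrete illustration of (6), the following sketch assembles the coefficient matrix with SciPy's `hankel`; the helper names are ours, not from [8]:

```python
import numpy as np
from scipy.linalg import hankel

def hankel_block(W, n):
    """H_n(W): the (n+1) x (T-n+1) Hankel matrix with first column
    w[0], ..., w[n] and last row w[n], ..., w[T]."""
    return hankel(W[: n + 1], W[n:])

def gamma(sigma, n):
    """gamma_n(sigma) = [1, sigma, ..., sigma^n]^T.  Note that sigma**n
    may overflow for |sigma| > 1 and large n -- the issue discussed above."""
    return sigma ** np.arange(n + 1)

def coefficient_matrix(U, Y, sigma, n):
    """The coefficient matrix of the linear system (6)."""
    zero = np.zeros((n + 1, 1))
    g = gamma(sigma, n).reshape(-1, 1)
    return np.block([[hankel_block(U, n), zero],
                     [hankel_block(Y, n), -g]])
```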

Consider the classical method of solving (6) via the singular value decomposition of its coefficient matrix

\[\widehat {\mbf U} \widehat {\Sigma } \widehat {\mbf V}^{\top } = \begin {bmatrix}\bb H_n(\bb U)&\mbf 0\\\bb H_n(\bb Y)&-\gamma _n(\sigma )\end {bmatrix}.\]

We may then solve (6) by computing

\begin{equation} \label {eq:SVDSolution} \begin{bmatrix}\xi \\H(\sigma )\end {bmatrix} = \widehat {\mbf V}\widehat {\Sigma }^{-1}\widehat {\mbf U}^{\top } \begin{bmatrix}\gamma _n(\sigma )\\\mbf 0\end {bmatrix}. \end{equation}

If the solution is computed as in (7), we expect the ill-conditioning of the coefficient matrix (reflected in the singular values \(\widehat {\Sigma }\)) to degrade the solution accuracy, especially when the data \((\bb U,\bb Y)\) are noisy.
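A sketch of the solve (7), reusing `coefficient_matrix` and `gamma` from above; note that every singular value is inverted, which is precisely where noisy data loses accuracy:

```python
import numpy as np

def solve_via_svd(U, Y, sigma, n):
    """Solve (6) as in (7); the last entry of the solution is H(sigma)."""
    M = coefficient_matrix(U, Y, sigma, n)
    rhs = np.concatenate([gamma(sigma, n), np.zeros(n + 1)])
    Uh, s, Vh = np.linalg.svd(M, full_matrices=False)
    sol = Vh.conj().T @ ((Uh.conj().T @ rhs) / s)   # divides by every s_i
    return sol[-1]
```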

Our first contribution [1] exploits the fact that we need not solve for the whole solution vector of the linear system (6): the information in \(\xi \) is never used, and only the last entry of the solution vector is required to recover \(H(\sigma )\). This allows us to replace all but the last column of the coefficient matrix in (6) with an orthonormal basis for their range and still recover the same last component of the solution vector, without inverting any singular values.

  • Theorem 1. Assume access to the data (2) and define

    \begin{equation} \mbf U = \orth \left (\bmat { \bb H_n(\bb U)\\ \bb H_n(\bb Y) }\right ). \end{equation}

    Then, the solution to the new linear system

    \begin{equation} \label {eq:calcM0Orth} \begin{bmatrix} \mbf U & \begin {matrix} \mbf 0 \\ -\gamma _n(\sigma ) \end {matrix} \end{bmatrix} \begin{bmatrix}\hat \xi \\H(\sigma )\end {bmatrix} = \begin{bmatrix}\gamma _n(\sigma )\\\mbf 0\end {bmatrix} \end{equation}

    has the same last component as the original linear system (6).

Therefore, the highly ill-conditioned stacked Hankel matrices may be replaced by an orthonormal basis for their range without changing the last component of the solution vector. Note that this differs from the solution formula (7), where the whole vector is constructed. While theoretically equivalent to solving (6) via (7), Theorem 1 does not require inverting any (potentially small) singular values. The effect of Theorem 1 is quite dramatic, in some examples reducing the condition number of the coefficient matrix from \(10^{16}\) to \(10^{1}\). Another advantage of Theorem 1 is that when one must recover \(H(\sigma _i)\) for many different \(\sigma _i\) (as \(\mc H_2\) optimality requires), the orthonormal basis \(\mbf U\) may be precomputed once and recycled across transfer function evaluations, reducing the online runtime from \(\mc O(n^3)\) to \(\mc O(n^2)\).
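A sketch of Theorem 1 in code, reusing `hankel_block` and `gamma` from above; `orth` is SciPy's orthonormal-basis routine, and solving the well-conditioned system (9) by least squares is our illustrative choice:

```python
import numpy as np
from scipy.linalg import orth

def solve_via_orth(U, Y, sigma, n, Uorth=None):
    """Recover H(sigma) from (9).  Uorth depends only on the data, so it
    can be precomputed once and recycled across many values of sigma."""
    if Uorth is None:
        Uorth = orth(np.vstack([hankel_block(U, n), hankel_block(Y, n)]))
    g = gamma(sigma, n)
    M = np.column_stack([Uorth, np.concatenate([np.zeros(n + 1), -g])])
    rhs = np.concatenate([g, np.zeros(n + 1)])
    return np.linalg.lstsq(M, rhs, rcond=None)[0][-1], Uorth
```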

While (9) substantially improves the conditioning of (6) when \(|\sigma | = 1\), the presence of \(\sigma ^n\) in \(\gamma _n(\sigma )\) causes the coefficient matrix in (9) to be badly scaled when \(|\sigma | > 1\). Since we seek to construct \(\mc H_2\) optimal reduced models, recovering \(H(\sigma )\) with \(|\sigma | > 1\) is required. Exploring this issue leads to the problem of finding the eigenpairs of a rank-one update to an orthogonal projection

\begin{equation} \label {eq:Rank1UpdateOrthProj} \mbf Q\mbf Q^H + \mbf z\mbf z^H \end{equation}

where \(\mbf Q \in \bb C^{m\times n}\) is subunitary and \(\mbf z \in \bb C^m\) is arbitrary. In our work [2], we give explicit formulas for the eigenvectors and eigenvalues of (10).

  • Theorem 2. Let \(\mbf Q \in \bb C^{m \times n}\) with \(m>n\) be subunitary and \(\mbf z \in \bb C^m\). Let \(\mbf u = \mbf Q \mbf Q^H\mbf z\) and \(\mbf v = (\mbf I - \mbf Q \mbf Q^H)\mbf z\). Assume \(\|\mbf v\| \neq 0\) and \(\|\mbf u\| \neq 0\). Then the extreme nonzero eigenvalues of \(\mbf Q\mbf Q^H + \mbf z\mbf z^H\) are

    \begin{equation} \label {eq:eigUUhxxh} \lambda = \frac {1}{2}\left (1+ \|\mbf z\|^2 \pm \sqrt {1+\|\mbf z\|^4+2\|\mbf z\|^2-4\|\mathbf v\|^2}\right ) \end{equation}

    with associated eigenvectors

    \begin{equation} \frac {1}{2\|\mathbf u\|^2}\left (1-\|\mathbf v\|^2+\|\mathbf u\|^2\pm \sqrt {(\|\mathbf v\|^2-\|\mathbf u\|^2-1)^2+4\|\mathbf u\|^2\|\mathbf v\|^2}\right )\mbf u + \mbf v. \end{equation}

We remark that our expression for the smallest nonzero eigenvalue of \(\mbf Q \mbf Q^H + \mbf z \mbf z^H\) resembles the lower bound for the smallest eigenvalue of a perturbed Hermitian matrix found in [9]. While the expressions are similar, ours is exact: under the additional assumption that the unperturbed matrix is an orthogonal projection, we obtain the extreme nonzero eigenvalues and the associated eigenvectors exactly.
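Theorem 2 is easy to sanity-check numerically. A short self-contained sketch (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
Q = np.linalg.qr(rng.standard_normal((m, n))
                 + 1j * rng.standard_normal((m, n)))[0]   # subunitary
z = rng.standard_normal(m) + 1j * rng.standard_normal(m)

P = Q @ Q.conj().T                       # orthogonal projection onto range(Q)
v = z - P @ z                            # v = (I - Q Q^H) z
nz2, nv2 = np.linalg.norm(z)**2, np.linalg.norm(v)**2

# Extreme nonzero eigenvalues from (11):
disc = np.sqrt(1 + nz2**2 + 2*nz2 - 4*nv2)
lam = 0.5 * (1 + nz2 + np.array([-1.0, 1.0]) * disc)

# Q Q^H + z z^H has m - n - 1 zero eigenvalues, so after an ascending sort
# the smallest and largest nonzero eigenvalues sit at indices m-n-1 and -1.
eigs = np.linalg.eigvalsh(P + np.outer(z, z.conj()))
assert np.allclose(lam, eigs[[m - n - 1, -1]])
```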

Since the nonzero singular values of \(\bmat {\mbf Q & \mbf z}\) are the square roots of the nonzero eigenvalues of \(\mbf Q\mbf Q^H + \mbf z\mbf z^H\), Theorem 2 also gives the condition number of \(\bmat {\mbf Q & \mbf z}\), and hence a formula for the condition number of the coefficient matrix in (9). Expressing this condition number in terms of \(\|\mbf z\|\) alone allows us to prove that normalizing \(\mbf z\) is the optimal scaling for (9). This optimal scaling is extremely effective at further reducing the condition number of (9) and opens the door for further analysis.
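Reusing `Q` and `z` from the previous sketch, a quick grid search illustrates the optimal-scaling claim: the condition number of \(\bmat {\mbf Q & \alpha \mbf z}\) is minimized when \(\|\alpha \mbf z\| = 1\) (agreement is limited by the grid resolution):

```python
# Condition number of [Q, alpha*z] as the scaling alpha varies; the
# minimizer should match alpha = 1/||z||, i.e., normalizing z.
alphas = np.logspace(-2, 2, 801)
kappas = [np.linalg.cond(np.column_stack([Q, a * z])) for a in alphas]
print(alphas[np.argmin(kappas)], 1 / np.linalg.norm(z))
```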

Results of this analysis include a connection between the underlying dynamical system (1) that produced the data (2) and the condition number of the coefficient matrix in (9). Specifically, we show that the conditioning is worse when the relative derivative \(|H'(\sigma )/H(\sigma )|\) is large, a link that is unsurprising given the definition of the relative condition number of a function. The analysis also leads to a method for preventing overflow in the initial computation of \(\gamma _n(\sigma )\).
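On the overflow point, one simple guard (an illustration, not necessarily the method from our work) exploits the fact that rescaling \(\gamma _n(\sigma )\) by a common constant in both its appearances in (9) leaves the recovered \(H(\sigma )\) unchanged, so only the direction of \(\gamma _n(\sigma )\) is needed:

```python
import numpy as np

def gamma_direction(sigma, n):
    """A unit vector parallel to gamma_n(sigma), computed without forming
    sigma**n.  For |sigma| > 1 we build gamma_n(sigma)/sigma^n instead,
    whose entries (1/sigma)^(n-k) have modulus at most 1 and cannot
    overflow.  The result differs from gamma_n(sigma)/||gamma_n(sigma)||
    only by a unimodular factor, which cancels in (9)."""
    if abs(sigma) <= 1:
        g = sigma ** np.arange(n + 1)
    else:
        g = (1 / sigma) ** np.arange(n, -1, -1)
    return g / np.linalg.norm(g)
```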

We will expand upon these contributions in the talk and showcase the efficacy of our final algorithm on benchmark examples. Comparisons with TF-IRKA [5] show that obtaining the values \(H(\sigma )\) from time-domain data \((\bb U,\bb Y)\) does not degrade the approximation quality. These examples also demonstrate that we can construct \(\mc H_2\) optimal DDROMs from time-domain data produced by a black-box PDE solver, an exciting indication that the data need not come from a discrete-time LTI system of the form (1).

References