Interpolation-Based Algorithms to Compute the \(\hinf \) Norm
of a Parametric System
Peter Benner, Tim Mitchell
Abstract
Consider the following continuous-time linear time-invariant parametric system:
\(\seteqnumber{1}{1}{0}\)\begin{align} \Ep \dot x &= \Ap x + \Bp u \\ y &= \Cp x + \Dp u, \end{align} where matrices \(\Ep ,\Ap \in \C ^{n \times n}\), \(\Bp \in \C ^{n \times m}\), \(\Cp \in \C ^{p \times n}\), and \(\Dp \in \C ^{p \times m}\) describe the dynamics and vary continuously with respect to the real-valued scalar parameter \(\p \in \P \subset \R \), while the vectors \(x \in \C ^n\), \(u \in \C ^m\), and \(y \in \C ^p\) respectively describe the state, input, and output. The \(\hinf \) norm of (1) is an important quantity in many domains. In engineering applications, it measures how robust the system remains in the presence of noise, while in model order reduction, it is used to measure how well a reduced-order model mimics the dynamical behavior of a large-scale system. For a fixed value of \(\p \in \P \), globally convergent methods for computing the \(\hinf \) norm go back to [BB90, BS90], but here we are interested in efficiently computing the worst (highest) value of the \(\hinf \) norm of (1) that occurs over the parameter domain \(\P \), which we denote \(h_\star \), along with the parameter(s) \(\p _\star \in \P \) where \(h_\star \) is attained and (1) is least robust to noise.
We begin with some preliminaries. We assume that the matrix pencil \(\lambda \Ep - \Ap \) is regular and of index at most one for all values of \(\p \in \P \), that all the matrices are differentiable with respect to \(\p \) (except possibly on a subset of \(\P \) of measure zero), and that the parameter domain consists of a finite number of intervals. The associated transfer function for (1) is
\(\seteqnumber{0}{}{1}\)\begin{equation} G(s; \p ) = \Cp \left (s \Ep - \Ap \right )^{-1}\Bp + \Dp , \end{equation}
where \(s \in \C \), and for a fixed value of \(\p \), its \(\hinf \) norm is defined as
\(\seteqnumber{0}{}{2}\)\begin{equation} \label {eq:hinf_norm} \|G (\cdot ; \p ) \|_\infty = \max _{s \in \C _+} \| \Cp \left (s \Ep - \Ap \right )^{-1}\Bp + \Dp \|_2 \eqqcolon \max _{s \in \C _+} g(s;\p ), \end{equation}
where \(\C _+\) is the closed right half of the complex plane. If the system is known to be asymptotically stable, then the \(\hinf \) norm coincides with the \(\linf \) norm, i.e., the maximization of the norm of the transfer function can be limited to the imaginary axis instead of all of \(\C _+\). For fixed \(\p \), let \(\lambda \) be such that \(\det (\lambda \Ep - \Ap )=0\) and let \(x\) and \(y\) be corresponding right and left eigenvectors. The eigenvalue \(\lambda \) is controllable if \(\Bp ^* y \neq 0\) and observable if \(\Cp x \neq 0\). Then \(\|G (\cdot ; \p )\|_\infty < \infty \) provided that all the eigenvalues of \(\lambda \Ep - \Ap \) that are both controllable and observable are finite and lie in the open left half-plane. In sum, our quantities of interest are given by
\(\seteqnumber{0}{}{3}\)\begin{equation} \hs = \max _{\p \in \P } \|G(\cdot ; \p )\|_\infty \qquad \text {and} \qquad \ps = \arg \max _{\p \in \P } \|G(\cdot ; \p )\|_\infty . \end{equation}
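To make (3) concrete, the following sketch evaluates \(g(\iunit \omega ; \p )\) as the largest singular value of \(G(\iunit \omega ; \p )\). The two-state matrices below are invented placeholders (with \(\Ep = I\) and \(\Dp = 0\)), chosen only for illustration and not taken from this work:

```python
import numpy as np

def g(omega, p):
    """g(i*omega; p): largest singular value of G(i*omega; p) for an
    invented two-state system with E = I and D = 0; the matrices are
    illustrative placeholders, not taken from this work."""
    A = np.array([[-1.0, p], [0.0, -2.0]])   # A(p); triangular, so stable for all p
    B = np.array([[1.0], [1.0]])             # B(p)
    C = np.array([[1.0, 0.0]])               # C(p)
    G = C @ np.linalg.solve(1j * omega * np.eye(2) - A, B)
    return np.linalg.svd(G, compute_uv=False)[0]

# crude estimate of ||G(.; p)||_inf at p = 0.5 by sampling the imaginary
# axis; for this system the peak happens to lie at omega = 0
omegas = np.linspace(-10.0, 10.0, 2001)
print(max(g(w, 0.5) for w in omegas))
```

Sampling \(\omega \) on a grid as above gives only a crude estimate of \(\|G(\cdot ; \p )\|_\infty \); the level-set methods discussed below compute it reliably.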
One direct way to estimate \(\hs \) would be to simply evaluate (3) using the standard level-set method [BB90, BS90] over a grid on the parameter domain \(\P \), but doing so provides no guarantee that \(\hs \) will be estimated to even moderate accuracy. A more refined yet still rather direct approach would be to globally approximate the value of \(\|G(\cdot ; \p )\|_\infty \) as it varies with \(\p \) over \(\P \), say, by using Chebfun [DHT14], and then simply extract \(\hs \) and \(\ps \) from the resulting interpolant\(^{1}\), but this is likely to be unnecessarily expensive. For each evaluation of \(\|G(\cdot ; \p )\|_\infty \), of which many will be needed, the standard level-set method generally needs several iterations of computing the \(\gamma \)-level set points of \(g(\iunit \omega ; \p )\), which involves computing all imaginary eigenvalues of the matrix pencil \(\Mp - \lambda \Np \), where
\(\seteqnumber{1}{5}{0}\)\begin{align} \Mp & \coloneqq \begin{bmatrix} \Ap - \Bp \Rp ^{-1}\Dp ^*\Cp & -\gamma \Bp \Rp ^{-1} \Bp ^* \\ \gamma \Cp ^* \Sp ^{-1} \Cp & -(\Ap - \Bp \Rp ^{-1} \Dp ^* \Cp )^* \end{bmatrix}, \\ \Np & \coloneqq \begin{bmatrix} \Ep & 0 \\ 0 & \Ep ^* \end{bmatrix}, \\ \Rp & \coloneqq \Dp ^* \Dp - \gamma ^2 I, \\ \Sp & \coloneqq \Dp \Dp ^* - \gamma ^2 I. \end{align} The cost can be significantly reduced by instead using the \(\hinf \)-norm method of [BM18], as it typically reduces the number of eigenvalue computations with \(\Mp - \lambda \Np \) to just one or two values of \(\gamma \). However, if the system (1) is unstable for some values of \(\p \in \P \), then \(\hs = \infty \), but many eigenvalue computations with \(\Mp - \lambda \Np \) may be incurred to ascertain that fact in the process of interpolating \(\|G(\cdot ; \p )\|_\infty \) over \(\P \). This suggests that, to be efficient, a new algorithm should first address the question of stability separately and then only proceed with further computation when \(\hs < \infty \).
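To illustrate the level-set computation, the sketch below forms \(\Mp - \lambda \Np \) from (5) for the same kind of invented toy system (\(\Ep = I\) and \(\Dp = 0\), so \(\Rp \) and \(\Sp \) reduce to \(-\gamma ^2 I\)) and extracts the frequencies of its purely imaginary eigenvalues, i.e., the \(\gamma \)-level-set points of \(g(\iunit \omega ; \p )\):

```python
import numpy as np
from scipy.linalg import eig

def level_set_crossings(gamma, p, tol=1e-8):
    """Frequencies omega of the purely imaginary eigenvalues i*omega of
    M(gamma) - lambda*N, i.e., points where g(i*omega; p) = gamma.
    Invented toy matrices (E = I, D = 0), not taken from this work."""
    A = np.array([[-1.0, p], [0.0, -2.0]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    E = np.eye(2)
    R = -gamma**2 * np.eye(1)   # R = D^*D - gamma^2 I, with D = 0
    S = -gamma**2 * np.eye(1)   # S = DD^* - gamma^2 I, with D = 0
    # with D = 0, the (1,1) and (2,2) blocks of M reduce to A and -A^*
    M = np.block([[A, -gamma * B @ np.linalg.solve(R, B.conj().T)],
                  [gamma * C.conj().T @ np.linalg.solve(S, C), -A.conj().T]])
    N = np.block([[E, np.zeros_like(E)], [np.zeros_like(E), E.conj().T]])
    lam = eig(M, N, right=False)
    return sorted(l.imag for l in lam if abs(l.real) < tol)

# for p = 0 this toy system has ||G(.; 0)||_inf = 1, so the level
# gamma = 0.5 is crossed (at omega = +/- sqrt(3)), while gamma = 2 is not
print(level_set_crossings(0.5, 0.0))
print(level_set_crossings(2.0, 0.0))
```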
For any \(\p \in \P \), define
\(\seteqnumber{1}{6}{0}\)\begin{align} \Lambda (\p ) &\coloneqq \{ \lambda \in \C : \det (\lambda \Ep - \Ap ) = 0, \text {$\lambda $ is both controllable and observable} \}, \\ \alpha (\p ) &\coloneqq \max \{ \Re \lambda : \lambda \in \Lambda (\p ) \}, \end{align} where we take \(\Re \lambda = +\infty \) for any non-finite \(\lambda \in \Lambda (\p )\). Then the system (1) is asymptotically stable if
\(\seteqnumber{0}{}{6}\)\begin{equation} \alpha _\star \coloneqq \max _{\p \in \P } \alpha (\p ) < 0, \end{equation}
and so we can determine whether \(\hs < \infty \) by approximating the function \(\alpha \) over \(\P \) using Chebfun, as has been done in [HMMS22] to check stability when constructing stable \(\mathcal {H}_2 \otimes \mathcal {L}_2\) reduced-order models for parametric systems via optimization. Although \(\alpha \) may be discontinuous, either because \(\P \) consists of more than one interval or because an eigenvalue becomes or ceases to be controllable or observable as \(\p \) varies, Chebfun can reliably approximate functions with jumps [PPT10].
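As a minimal sketch of this stability test, the code below computes \(\alpha (\p )\) for an invented toy system (\(\Ep = I\)), discarding eigenvalues that fail the controllability or observability tests, and then takes a coarse sample over \(\P = [-1,1]\) in place of the adaptive Chebfun approximation used in practice:

```python
import numpy as np
from scipy.linalg import eig

def alpha(p, tol=1e-10):
    """Spectral abscissa alpha(p) over the controllable-and-observable
    eigenvalues, for an invented toy system with E(p) = I; the matrices
    are illustrative placeholders, not taken from this work."""
    A = np.array([[-1.0, p], [0.0, -2.0]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    lam, vl, vr = eig(A, left=True, right=True)
    reals = [l.real for l, y, x in zip(lam, vl.T, vr.T)
             if np.linalg.norm(B.conj().T @ y) > tol    # controllable
             and np.linalg.norm(C @ x) > tol]           # observable
    return max(reals, default=-np.inf)

# coarse sample of alpha over P = [-1, 1]; negative => stable on the grid
grid = np.linspace(-1.0, 1.0, 101)
alpha_star = max(alpha(p) for p in grid)
print(alpha_star)
```

For this toy system, \(\alpha (\p ) = -1\) except at \(\p = -1\), where the eigenvalue \(-1\) loses controllability and \(\alpha \) jumps down to \(-2\), a small instance of the discontinuities mentioned above.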
Although we propose using global approximation of \(\alpha \) to ascertain whether \(\hs < \infty \), we do not suggest globally approximating \(\|G(\cdot ; \p )\|_\infty \) to compute \(\hs \) when it is finite. Instead, we propose an optimization-with-restarts method that directly computes local maximizers of the optimization problem in two real variables
\(\seteqnumber{0}{}{7}\)\begin{equation} \hs = \max _{\omega \in \R , \p \in \P } g(\iunit \omega ; \p ), \end{equation}
and then uses an interpolation-based globality certificate to either certify that the local maximizer is in fact a global maximizer where \(\hs \) is attained or provide new starting points on the \(\gamma \)-level set of \(g\) to restart the local optimization phase, where \(\gamma = g(\iunit \hat \omega ; \hat \p )\) for a computed local maximizer \((\hat \omega ,\hat \p )\). Interpolation-based globality certificates were first conceived in [Mit21] to develop faster and more reliable algorithms for computing Kreiss constants and the distance to uncontrollability and have since been extended to computing the quantity sep-lambda [Mit23].
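The local optimization phase can be sketched as follows; we use SciPy's quasi-Newton solver (the bound-constrained variant L-BFGS-B rather than plain BFGS, simply so that the bounds of \(\P = [-1,1]\) are respected), with finite-difference gradients and the same invented toy system as before:

```python
import numpy as np
from scipy.optimize import minimize

def g(z):
    """g(i*omega; p) for an invented two-state toy system (E = I, D = 0);
    the matrices are illustrative placeholders, not taken from this work."""
    omega, p = z
    A = np.array([[-1.0, p], [0.0, -2.0]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    G = C @ np.linalg.solve(1j * omega * np.eye(2) - A, B)
    return np.linalg.svd(G, compute_uv=False)[0]

# local optimization phase: maximize g by minimizing -g; gradients are
# approximated by finite differences here, though analytic gradients
# would normally be supplied
res = minimize(lambda z: -g(z), x0=np.array([0.1, 0.5]),
               method="L-BFGS-B", bounds=[(-10.0, 10.0), (-1.0, 1.0)])
omega_hat, p_hat = res.x
gamma = g(res.x)   # candidate norm value at the computed local maximizer
print(gamma, res.x)
```

For this toy system the run converges to \((\hat \omega ,\hat \p ) \approx (0,1)\) with \(\gamma \approx 1.5\); in general such a point is only a local maximizer, which is why a globality certificate is needed.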
Even though \(g\) may have points where it is nonsmooth, the subset of such points has measure zero, so obtaining local maximizers of \(g\) can be done with relative ease and efficiency using gradient-based methods such as BFGS [LO13] or gradient sampling [BCL\(^{+}\)20], particularly since there are only two optimization variables and often \(m,p \ll n\). Then, with a candidate local maximizer \((\hat \omega ,\hat \p )\) of \(g\) in hand and \(\gamma = g(\iunit \hat \omega ; \hat \p )\), we check whether it is a global maximizer by approximating the one-variable function
\(\seteqnumber{0}{}{8}\)\begin{equation} c_\gamma (\p ) \coloneqq \min \{ (\Re \lambda )^2 : \det (\Mp - \lambda \Np ) = 0, \Re \lambda \geq 0 \}, \end{equation}
which is continuous on each interval in \(\P \) and where the squaring acts to smooth out non-Lipschitz behavior when a double imaginary eigenvalue bifurcates into a pair of eigenvalues with imaginary-axis symmetry. The function \(c_\gamma \) is analogous to the eigenvalue-based functions that are globally approximated in the interpolation-based globality certificates used in [Mit21, Mit23], and in our setting here, it has the following key properties:
(i) \(c_\gamma (\p ) \geq 0\) for all \(\p \in \P \).
(ii) If \(\gamma > \hs \), then \(c_\gamma (\p ) > 0\) for all \(\p \in \P \).
(iii) If \(\gamma < \hs \), then \(c_\gamma (\p ) = 0\) holds on a subset of \(\P \) with positive measure.
By approximating \(c_\gamma \) globally on \(\P \), we can determine whether \(\gamma \) is above or below \(\hs \). When it is below, we need only find zeros of \(c_\gamma \), which by property (iii) are relatively easy to locate. Meanwhile, if \(\gamma > \hs \), which will be true if \((\hat \omega ,\hat \p )\) is a global maximizer and we perturb the value \(\gamma = g(\iunit \hat \omega ; \hat \p )\) slightly upward by a tolerance, then globally approximating \(c_\gamma \) determines that it is strictly positive on \(\P \), thus certifying that \((\hat \omega ,\hat \p )\) is indeed a global maximizer and \(\hs \) has been attained. A practical benefit of approximating \(c_\gamma \) is cost: evaluating \(c_\gamma (\p )\) always requires just a single eigenvalue computation with \(\Mp - \lambda \Np \) plus a negligible amount of additional work, while evaluating \(\|G(\cdot ;\p )\|_\infty \) may require several eigenvalue computations as well as other non-constant-time work on top of that.
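For the same invented toy system, \(c_\gamma \) can be evaluated as sketched below. Since the exact spectrum of \(\Mp - \lambda \Np \) is symmetric with respect to the imaginary axis, minimizing \((\Re \lambda )^2\) over all eigenvalues is equivalent to (9) and is more robust to rounding:

```python
import numpy as np
from scipy.linalg import eig

def c_gamma(p, gamma):
    """Certificate function c_gamma(p) for the invented toy system
    (E = I, D = 0, so R and S reduce to -gamma^2 I); the matrices are
    illustrative placeholders, not taken from this work."""
    A = np.array([[-1.0, p], [0.0, -2.0]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    E = np.eye(2)
    # with D = 0: -gamma*B R^{-1} B^* = (1/gamma) B B^* and
    #              gamma*C^* S^{-1} C = -(1/gamma) C^* C
    M = np.block([[A, (1.0 / gamma) * B @ B.conj().T],
                  [-(1.0 / gamma) * C.conj().T @ C, -A.conj().T]])
    N = np.block([[E, np.zeros_like(E)], [np.zeros_like(E), E.conj().T]])
    lam = eig(M, N, right=False)
    # exact spectrum is symmetric about the imaginary axis, so the minimum
    # of (Re lambda)^2 over all eigenvalues matches the Re lambda >= 0
    # definition while avoiding a sign test on tiny rounded real parts
    return min(l.real ** 2 for l in lam)
```

For this toy example \(\hs = 1.5\) over \(\P = [-1,1]\), so \(c_\gamma \) vanishes on part of \(\P \) for \(\gamma = 1.2 < \hs \) and is strictly positive on all of \(\P \) for \(\gamma = 2 > \hs \), in line with properties (ii) and (iii).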
In general, only a handful of restarts are needed by our method, and the overall work is almost entirely dominated by approximating the function \(c_\gamma \) for the final value of \(\gamma \approx \hs \), properties which we have also observed in our prior work with interpolation-based globality certificates [Mit21, Mit23]. In total, the algorithm requires \(\bigO (kn^3)\) work, where \(k\) is the total number of evaluations of \(c_\gamma \) over all values of \(\gamma \). Although \(k\) may be large, it is often not strongly correlated with the number of system states \(n\), and it corresponds to a task that is embarrassingly parallel, so its effect can be significantly diminished on multi-core machines. Consequently, our method tends to act like a cubically scaling method with a large constant term. We have also extended this approach to compute the worst-case \(\hinf \) norm of parametric discrete-time systems. In contrast, while it might be possible, at least in some cases, to extend 2D level-set tests [Gu00, GMO\(^{+}\)06, GO06, Mit20] to find a global maximizer of \(g\) or its discrete-time analogue, our past experience with that technique suggests that the resulting methods would likely be both much slower and less reliable due to rounding error.
1 Chebfun can do this extraction phase exceptionally fast as it produces a piecewise Chebyshev polynomial.
References
[BB90] S. Boyd and V. Balakrishnan. A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its \({L}_\infty \)-norm. Systems Control Lett., 15(1):1–7, 1990.
[BCL\(^{+}\)20] J. V. Burke, F. E. Curtis, A. S. Lewis, M. L. Overton, and L. E. A. Simões. Gradient sampling methods for nonsmooth optimization. In A. M. Bagirov, M. Gaudioso, N. Karmitsa, M. M. Mäkelä, and S. Taheri, editors, Numerical Nonsmooth Optimization: State of the Art Algorithms, pages 201–225, Cham, 2020. Springer International Publishing.
[BM18] P. Benner and T. Mitchell. Faster and more accurate computation of the \(\mathcal {H}_\infty \) norm via optimization. SIAM J. Sci. Comput., 40(5):A3609–A3635, October 2018.
[BS90] N. A. Bruinsma and M. Steinbuch. A fast algorithm to compute the \(\mathcal {H}_{\infty }\)-norm of a transfer function matrix. Systems Control Lett., 14(4):287–293, 1990.
[DHT14] T. A. Driscoll, N. Hale, and L. N. Trefethen. Chebfun Guide. Pafnuty Publications, Oxford, UK, 2014.
[GMO\(^{+}\)06] M. Gu, E. Mengi, M. L. Overton, J. Xia, and J. Zhu. Fast methods for estimating the distance to uncontrollability. SIAM J. Matrix Anal. Appl., 28(2):477–502, 2006.
[GO06] M. Gu and M. L. Overton. An algorithm to compute \(\mathrm {Sep}_\lambda \). SIAM J. Matrix Anal. Appl., 28(2):348–359, 2006.
[Gu00] M. Gu. New methods for estimating the distance to uncontrollability. SIAM J. Matrix Anal. Appl., 21(3):989–1003, 2000.
[HMMS22] M. Hund, T. Mitchell, P. Mlinarić, and J. Saak. Optimization-based parametric model order reduction via \(\mathcal {H}_2 \otimes \mathcal {L}_2\) first-order necessary conditions. SIAM J. Sci. Comput., 44(3):A1554–A1578, 2022.
[LO13] A. S. Lewis and M. L. Overton. Nonsmooth optimization via quasi-Newton methods. Math. Program., 141(1–2, Ser. A):135–163, 2013.
[Mit20] T. Mitchell. Computing the Kreiss constant of a matrix. SIAM J. Matrix Anal. Appl., 41(4):1944–1975, 2020.
[Mit21] T. Mitchell. Fast interpolation-based globality certificates for computing Kreiss constants and the distance to uncontrollability. SIAM J. Matrix Anal. Appl., 42(2):578–607, 2021.
[Mit23] T. Mitchell. Fast computation of sep\(_\lambda \) via interpolation-based globality certificates. Electron. Trans. Numer. Anal., 58:402–431, 2023.
[PPT10] R. Pachón, R. B. Platte, and L. N. Trefethen. Piecewise-smooth chebfuns. IMA J. Numer. Anal., 30(4):898–916, July 2010.