General convex optimization

Overview

A convex optimization problem asks us to minimize a convex function \(f\) over a convex set \(K\), where \(K\) is a subset of \(\mathbb{R}^n\). Here we examine the situation where the value of \(f(x)\) and the membership of \(x\) in the set \(K\) can each be efficiently computed classically; however, we neither assume nor exploit any additional structure that may be present in \(f\) or \(K\). This situation contrasts with that of solving conic programs, where \(f\) is linear and \(K\) is an intersection of convex cones and affine spaces, features that can be exploited to yield more efficient classical and quantum algorithms.

A so-called "zeroth-order" solution to this problem proceeds simply by adaptively evaluating \(f(x)\) and querying whether \(x \in K\) for different values of \(x\). For the zeroth-order approach, a quantum algorithm can obtain a quadratic speedup in the number of times these functions are evaluated, reducing it from \(\tilde{\mathcal{O}}(n^2)\) to \(\tilde{\mathcal{O}}(n)\), where the \(\tilde{\mathcal{O}}\) notation hides factors polylogarithmic in \(n\) and other parameters. This can lead to a practical speedup only if the cost to evaluate \(f(x)\) and \(x\in K\) is large, and only if a lack of structure rules out other, possibly faster, approaches to solving the problem.

Actual end-to-end problem(s) solved

Suppose we have classical algorithms \(\mathcal{A}_f\) for computing \(f(x)\) and \(\mathcal{A}_K\) for deciding \(x \in K\) (a "membership oracle"), which require \(C_f\) and \(C_K\) gates, respectively, to perform with a reversible classical circuit. Suppose further that we have an initial point \(x_0 \in K\) and two numbers \(r\) and \(R\) for which we know that \(B(x_0,r) \subset K \subset B(x_0,R)\), where \(B(y,t) = \{z \in \mathbb{R}^n: \lVert z-y \rVert \leq t\}\) denotes the ball of radius \(t\) centered at \(y\). Given \(\mathcal{A}_f\), \(\mathcal{A}_K\), \(x_0\), \(r\), \(R\), and an error parameter \(\epsilon > 0\) as input, the algorithm outputs a point \(\tilde{x}\) that is \(\epsilon\)-optimal, that is, one satisfying

\[\begin{equation} f(\tilde{x}) \leq \min_{x \in K}f(x) + \epsilon\,. \end{equation}\]
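
To make this input/output specification concrete, the Python sketch below fixes a toy instance. The objective, the set \(K\), and all names are illustrative placeholders standing in for the classical circuits \(\mathcal{A}_f\) and \(\mathcal{A}_K\); they are assumptions for illustration, not constructions from [1, 2].

```python
import numpy as np

# Toy instance: K is the Euclidean ball of radius R centered at x0,
# so B(x0, r) subset K subset B(x0, R) holds for any r <= R.
x0, r, R, eps = np.zeros(3), 0.5, 10.0, 1e-3

def A_f(x):
    """Stand-in for the classical circuit A_f evaluating the convex objective."""
    return float(np.dot(x, x) + np.max(np.abs(x)))

def A_K(x):
    """Stand-in for the membership oracle A_K: decides whether x is in K."""
    return bool(np.linalg.norm(x - x0) <= R)

def is_eps_optimal(x_tilde, f_min):
    """The promised output: a point with f(x_tilde) <= min_{x in K} f(x) + eps."""
    return A_f(x_tilde) <= f_min + eps
```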

Dominant resource cost/complexity

The works [1] and [2] independently establish that there is a quantum algorithm that solves this problem with gate complexity upper bounded by

\[\begin{equation} \left[(C_f + C_K)n +n^3\right]\cdot \text{polylog}(nR/(r\epsilon))\,, \end{equation}\]

where the polylogarithmic factors were left unspecified. The rough idea behind the algorithm is to leverage the quantum gradient estimation algorithm to implement a separation oracle—a routine that determines membership \(x\in K\) and, when \(x \not\in K\), outputs a hyperplane separating \(x\) from all points in \(K\)—using only \(\mathcal{O}\left( 1 \right)\) queries to the algorithms \(\mathcal{A}_f\) and \(\mathcal{A}_K\). It had been previously established that \(\tilde{\mathcal{O}}(n)\) queries to a separation oracle then suffice to perform the optimization [3].
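
To illustrate how a gradient yields a separating hyperplane, here is a minimal classical sketch for a level set of a convex function. The function names are hypothetical, and the actual constructions in [1, 2] build the separation oracle for \(K\) from the membership oracle via smoothing and quantum gradient estimation; this sketch only conveys the underlying convexity argument.

```python
import numpy as np

def separation_oracle(f, grad_f, x, level):
    """Classical sketch of a separation oracle for the level set
    S = {y : f(y) <= level} of a convex function f.

    If f(x) <= level, report membership. Otherwise, convexity gives
    f(y) >= f(x) + g.(y - x), so every y in S satisfies
    g.(y - x) <= level - f(x) < 0: the hyperplane with normal
    g = grad_f(x) separates x from S."""
    if f(x) <= level:
        return True, None
    return False, grad_f(x)

# Toy usage with a hypothetical quadratic objective.
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x
inside, g = separation_oracle(f, grad_f, np.array([2.0, 0.0]), level=1.0)
print(inside, g)  # False [4. 0.]
```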

Existing error corrected resource estimates

No resource estimates of this kind have been performed for this algorithm. It may not make sense to perform such an estimate without a more concrete scenario in mind, as the estimate would depend strongly on the complexity of the circuits for \(\mathcal{A}_f\) and \(\mathcal{A}_K\).

Caveats

One caveat is that the quantum algorithm must coherently perform reversible implementations of the classical functions that compute \(f(x)\) and \(x \in K\). Compared to a nonreversible classical implementation, this can incur additional ancilla qubits and gates. Another caveat relates to the scenario where \(f(x)\) and \(x\in K\) are determined by classical data stored in a classical database. Such a situation may appear to be an appealing place to look for applications of this algorithm because when \(f\) and \(K\) are determined empirically rather than analytically, it becomes easier to argue that there is no structure that can be exploited. However, in such a situation, implementing \(\mathcal{A}_f\) and \(\mathcal{A}_K\) would require large gate complexities \(C_f\) and \(C_K\), scaling with the size of the classical database. It would almost certainly be the case that a quantum random access memory (QRAM) admitting log-depth queries would be needed for the algorithm to remain competitive with classical implementations that have access to classical RAM, and the practical feasibility of building a large-scale log-depth QRAM comes with many additional caveats.

Another caveat is that there may not be many practical situations compatible with a quantum speedup by this algorithm. The source of the speedup in [1, 2] is a separation between the classical and quantum complexity of computing the gradient of \(f\) using calls to the function \(f\): classically, this requires a number of calls at least linear in \(n\), whereas quantumly it can be done with \(\mathcal{O}\left( 1 \right)\) calls using the quantum algorithm for gradient estimation. In both the classical and the quantum case, the gradient can subsequently be used to construct a separation oracle for the set \(K\), which is then used to solve the convex problem.
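
The classical linear-in-\(n\) call count is easy to see for finite differences, as in the hedged sketch below; the objective and step size are arbitrary choices for illustration.

```python
import numpy as np

class CountingOracle:
    """Wraps f so that every evaluation is counted."""
    def __init__(self, f):
        self.f, self.calls = f, 0
    def __call__(self, x):
        self.calls += 1
        return self.f(x)

n = 50
f = CountingOracle(lambda x: float(np.dot(x, x)))
x = np.random.randn(n)

# Classical finite differences: one evaluation per coordinate, plus one at x.
h, fx = 1e-6, f(x)
g = np.array([(f(x + h * np.eye(n)[i]) - fx) / h for i in range(n)])
print(f.calls)  # n + 1 = 51 calls; quantum gradient estimation uses O(1) [1, 2]
```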

Thus, a speedup is only possible if there is no better way to classically compute the gradient of \(f\) than to evaluate \(f\) at many points. This criterion is violated in many practical situations, which are often said to obey a "cheap gradient principle" [4, 5]: the gradient of \(f\) can be computed in time comparable to the time required to evaluate \(f\). For example, the fact that gradients are cheap is crucial for training modern machine learning models with a large number of parameters. When this is the case, the algorithms from [1, 2] do not offer a speedup. On the other hand, as observed in [2, Footnote 19], a nontrivial example of a problem where the cheap gradient principle may fail (enabling a possible advantage for these quantum algorithms) is the moment polytope problem, which has connections to quantum information [6].
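
A miniature illustration of the cheap gradient principle, using the log-sum-exp function as a stand-in example: one forward pass evaluates \(f\), and one backward pass of comparable cost yields the entire gradient, in contrast with the \(n+1\) evaluations needed by finite differences.

```python
import numpy as np

def f_and_grad(x):
    """For f(x) = log(sum(exp(x))), reverse-mode differentiation reuses
    the forward pass, so the full gradient (the softmax of x) costs only
    a small constant factor more than evaluating f itself."""
    e = np.exp(x)          # forward pass
    s = np.sum(e)
    value = np.log(s)
    grad = e / s           # backward pass: same O(n) order of work again
    return value, grad

value, grad = f_and_grad(np.random.randn(5))
```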

When both the function \(f\) and the gradient of \(f\) can be evaluated at unit cost, the problem becomes one of "first-order" optimization, which can be solved by gradient descent. However, gradient descent does not generally admit a quantum speedup: in this setting, quantum lower bounds match classical upper bounds [7], although a quantum speedup could still exist in specific cases.
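
For completeness, a minimal sketch of this first-order baseline; the step size and iteration count are illustrative, and practical solvers choose them from smoothness and strong-convexity parameters.

```python
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, steps=200):
    """Plain first-order method. With unit-cost gradient access, quantum
    lower bounds rule out a general speedup over this baseline [7]."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad_f(x)
    return x

# Minimizes f(x) = ||x||^2 (gradient 2x); iterates converge to the origin.
x_star = gradient_descent(lambda x: 2.0 * x, x0=[3.0, -4.0])
```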

Comparable classical complexity

The best classical algorithm [8] in the same setting has complexity

\[\begin{equation} \left[(C'_f + C'_K)n^2 +n^3\right]\cdot \text{polylog}(nR/(r\epsilon))\,, \end{equation}\]

where \(C'_f\) and \(C'_K\) denote the classical complexities of evaluating \(f\) and querying membership in \(K\), respectively, without the restriction that the circuits be reversible.

Speedup

The speedup is greatest when the costs \(C_f\) and \(C_K\) are large compared to \(n\) and roughly equal to \(C'_f\) and \(C'_K\). In this case, the quantum algorithm provides an \(\mathcal{O}\left( n \right)\) speedup, which is at best a polynomial speedup. The maximal power of the polynomial is obtained when \(C_f+C_K\approx C'_f+C'_K\) scales as \(n^2\), corresponding to a subquadratic speedup from \(\mathcal{O}\left( n^4 \right)\) to \(\mathcal{O}\left( n^3 \right)\).
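
Concretely, substituting \(C_f + C_K \approx C'_f + C'_K \approx n^2\) into the two complexity bounds above (and suppressing the polylogarithmic factors) gives

\[\begin{equation} \underbrace{n^2\cdot n + n^3}_{\text{quantum}} = \mathcal{O}\left( n^3 \right), \qquad \underbrace{n^2\cdot n^2 + n^3}_{\text{classical}} = \mathcal{O}\left( n^4 \right)\,. \end{equation}\]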

Outlook

The only analyses of this strategy are theoretical in nature, concerned more with the query complexity of solving this problem than with any specific applications it might have. As such, the analysis is not sufficiently fine-grained to determine the impact of constant factors or logarithmic factors. While a quadratic speedup in query complexity is possible, the maximal speedup in gate complexity is smaller than quadratic. Moreover, there is a lack of concrete problems that fit into the paradigm of "structureless" quantum convex optimization. Together, these factors make it unlikely that a practical quantum advantage can be found in this instance.

Bibliography

  1. Shouvanik Chakrabarti, Andrew M. Childs, Tongyang Li, and Xiaodi Wu. Quantum algorithms and lower bounds for convex optimization. Quantum, 4:221, 2020. arXiv: https://arxiv.org/abs/1809.01731. doi:10.22331/q-2020-01-13-221.

  2. Joran van Apeldoorn, András Gilyén, Sander Gribling, and Ronald de Wolf. Convex optimization using quantum oracles. Quantum, 4:220, 2020. arXiv: https://arxiv.org/abs/1809.00643. doi:10.22331/q-2020-01-13-220.

  3. Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In Proceedings of the 56th IEEE Symposium on Foundations of Computer Science (FOCS), 1049–1065. 2015. arXiv: https://arxiv.org/abs/1508.04874. doi:10.1109/FOCS.2015.68.

  4. Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, 2008. doi:10.1137/1.9780898717761.

  5. Jérôme Bolte, Ryan Boustany, Edouard Pauwels, and Béatrice Pesquet-Popescu. Nonsmooth automatic differentiation: a cheap gradient principle and other complexity results. arXiv: https://arxiv.org/abs/2206.01730, 2022.

  6. Peter Bürgisser, Cole Franks, Ankit Garg, Rafael Oliveira, Michael Walter, and Avi Wigderson. Efficient algorithms for tensor scaling, quantum marginals, and moment polytopes. In Proceedings of the 59th IEEE Symposium on Foundations of Computer Science (FOCS), 883–897. 2018. arXiv: https://arxiv.org/abs/1804.04739. URL: http://ieee-focs.org/FOCS-2018-Papers/pdfs/59f883.pdf, doi:10.1109/FOCS.2018.00088.

  7. Ankit Garg, Robin Kothari, Praneeth Netrapalli, and Suhail Sherif. No quantum speedup over gradient descent for non-smooth convex optimization. In James R. Lee, editor, Proceedings of the 12th Innovations in Theoretical Computer Science Conference (ITCS), volume 185 of Leibniz International Proceedings in Informatics (LIPIcs), 53:1–53:20. Dagstuhl, Germany, 2021. Schloss Dagstuhl–Leibniz-Zentrum für Informatik. arXiv: https://arxiv.org/abs/2010.01801. URL: https://drops.dagstuhl.de/opus/volltexte/2021/13592, doi:10.4230/LIPIcs.ITCS.2021.53.

  8. Yin Tat Lee, Aaron Sidford, and Santosh S. Vempala. Efficient convex optimization with membership oracles. In Proceedings of the 31st Conference On Learning Theory (COLT), 1292–1294. 2018. arXiv: https://arxiv.org/abs/1706.07357. URL: http://proceedings.mlr.press/v75/lee18a.html.