Quantum gradient estimation

Rough overview (in words)

Estimating the gradient of a high-dimensional function is a widely useful subroutine of classical and quantum algorithms. The function's gradient at a certain point can be classically estimated by querying the value of the function at many nearby points. However, the number of evaluations will scale with the number of dimensions in the function, which can be very large. By contrast, the quantum gradient estimation algorithm evaluates the function a constant number of times (in superposition over many nearby points) and uses interference effects to produce the estimate of the gradient. While there are caveats related to the precise access model and the classical complexity of gradient estimation in specific applications, this procedure can potentially lead to significant quantum speedups.

Rough overview (in math)

Let \(f: \mathbb{R}^d \rightarrow \mathbb{R}\) be a real function on \(d\)-dimensional inputs, and assume that is differentiable at a specific input of interest, taken to be the origin \(\mathbf{0} = (0,0,\ldots,0)\) for simplicity (the algorithm works equally well elsewhere). Let \(g = (g_1,\ldots,g_d)\) denote the gradient of \(f\) at \(\mathbf{0}\), i.e., \(g = \nabla f(\mathbf{0})\). We wish to produce a classical estimate \(\tilde{g}\) of \(g\) that satisfies \(\lvert g_j - \tilde{g}_j \rvert < \varepsilon\) for all \(j =1,\ldots, d\).

Ignoring higher-order terms, the function may be approximated near the origin as \(f(x) \approx f(\mathbf{0})+\langle g, x \rangle\), where \(\langle \cdot, \cdot \rangle\) denotes the normal inner product. The original gradient estimation algorithm by Jordan [1] then considers a \(d\)-dimensional grid of points near the origin denoted by \(G\). For simplicity, suppose on each of the \(d\) dimensions, the grid has \(N\) evenly spaced points on the interval \([-\ell/2,\ell/2]\), for a certain parameter \(\ell\) related to the precision requirements of the algorithm. The quantum algorithm prepares a superposition of the grid points \(x \in G\) and computes function \(f(x)\) (times a constant \(N/\ell\)) into the phase, producing the state

\[\begin{equation} \frac{1}{\sqrt{N^d}} \sum_{x \in G} e^{i2\pi N f(x)/\ell}\ket{x} \approx \frac{e^{i2\pi N f(\mathbf{0})/\ell}}{\sqrt{N^d}}\sum_{x \in G} e^{i2\pi N g \cdot x/\ell}\ket{x} \end{equation}\]

where \(\ket{x}\) denotes the product state \(\ket{x_1}\ket{x_2}\ldots\ket{x_d}\) with \(x_j\) the binary representation of the \(j\)th dimension of \(x\). With this in mind, the latter state is rewritten as the product state

\[\begin{equation} \frac{e^{i2\pi N f(\mathbf{0})/\ell}}{\sqrt{N^d}}\left(e^{-\pi i N g_1}\sum_{l_1 = 0}^{N-1} e^{2\pi i l_1 g_1}\ket{l_1}\right)\left(e^{-\pi i N g_2}\sum_{l_2 = 0}^{N-1} e^{2\pi i l_2 g_2}\ket{l_2}\right)\ldots \left(e^{-\pi i N g_d}\sum_{l_d = 0}^{N-1} e^{2\pi i l_d g_d}\ket{l_d}\right) \end{equation}\]

Due to the approximated linearity of \(f\), each of the product state constituents is observed to be close to a basis state in the Fourier basis. By performing an inverse quantum Fourier transform (QFT) in parallel for each of the \(d\) dimensions and measuring in the computational basis, a computational basis state

\[\begin{equation} \ket{\tilde{g}} = \ket{\tilde{g}_1}\ket{\tilde{g}_2}\ldots \ket{\tilde{g}_d} \end{equation}\]

is retrieved (up to an unimportant global phase), where with high probability \(\tilde{g}_j\) approximates \(g_j\) to \(\log_2(N)\) bits of precision. Taking \(N = \mathcal{O}(1/\varepsilon)\) suffices to solve the problem. In a full analysis, one must make sure not to choose \(\ell\) too large (else the linearity approximation breaks down).

In [1], the unitary \(U_f\) sending \(\ket{x} \mapsto e^{i2\pi N f(x)/\ell} \ket{x}\) was performed using a constant number of calls to the evaluation oracle that computes an approximation to \(f(x)\) to precision \(\smash{\mathcal{O}(\varepsilon^2/\sqrt{d})}\) into an ancilla register. In [2], \(U_f\) was implemented using \(\mathcal{O}(\sqrt{d}/\varepsilon)\) calls to a "probability oracle" that (assuming \(0 \leq f(x) \leq 1\)) performs the map \(\ket{x}\ket{0} \mapsto \sqrt{f(x)}\ket{x}\ket{1} + \sqrt{1-f(x)} \ket{x}\ket{0}\). Additionally, [2] improved the algorithm sketched above by explicitly using finite difference formulas in place of evenly spaced grids to put the gradient into the phase.

The gradient estimation algorithm can be viewed as a generalization of the Bernstein–Vazirani algorithm [3], which considers binary functions \(f:\{0,1\}^n \rightarrow \{0,1\}\), and, promised that \(f(x) = \langle g, x \rangle \bmod 2\) for some unknown vector \(g\), determines \(g\) with one query to \(f\).

Dominant resource cost (gates/qubits)

The superposition over grid points can be easily accomplished with Hadamard gates. Likewise, the inverse QFT operation is relatively cheap. The number of qubits is \(O(d \log(N))\), and the number of elementary operations for each of the \(d\) parallel QFTs is \(\text{polylog}(N)\). Additionally, an important component of the complexity comes from performing the unitary \(U_f\), which requires implementing either an evaluation oracle or a probability oracle for the function \(f\). If one has access to an evaluation oracle, the function must be evaluated to precision \(\smash{\mathcal{O}(\varepsilon^2/\sqrt{d})}\). Thus, if function evaluations can be made to precision \(\delta\) in time \(\mathrm{polylog}(d,1/\delta)\), the overall runtime of the quantum subroutine will be \(\mathrm{polylog}(d,1/\varepsilon)\), a potentially exponential speedup. In the case that one has access to a probability oracle, a number of calls scaling as \(\smash{\mathcal{O}(\sqrt{d}/\varepsilon)}\) must be made.

For some functions, it is possible to classically compute \(f(x)\) to precision \(\delta\) with complexity \(\text{poly}(d,\log(1/\delta))\). This can be turned into a quantum circuit \(U_f\) with a comparable gate complexity. For other functions, computing \(f(x)\) may be much harder. For example, if \(f(x)\) is defined as the output probability of a quantum circuit, then computing \(f(x)\) to precision \(\delta\) might be difficult for a classical computer, and even on a quantum computer, it generally requires \(\mathcal{O}\left( 1/\delta \right)\) complexity. However, in this case, implementing a probability oracle is simple, leading to the motivation for the work of [2].

Caveats

Jordan's formulation of the algorithm [1] appears to offer a large quantum speedup by accomplishing in a single quantum query what requires \(\mathcal{O}\left( d \right)\) classical queries. However, this requires a fairly strong access model where one has access to an oracle for computing the value of the function \(f\) to high precision. For an exponential speedup to be possible, precision \(\varepsilon\) must be achievable at cost \(\mathrm{polylog}(d,1/\varepsilon)\). Unfortunately, for actual functions \(f\) that show up in applications where this is possible, it is often the case that one can classically compute the gradient much more efficiently than simply querying the value of \(f\) at many nearby points. Indeed, the "cheap gradient principle" [4, 5] asserts that (in many practical situations) computing the gradient has roughly the same cost as computing the function itself. This principle limits the scope of application of the large speedup of Jordan's algorithm.

By contrast, [2] shows how the gradient can be computed using a probability oracle rather than an evaluation oracle, which makes the algorithm compatible with computing gradients in the setting of variational quantum algorithms. However, \(\mathcal{O}(\sqrt{d}/\varepsilon)\) calls to the oracle are required, which represents a (much less dramatic) quadratic speedup compared to the strategy of using the probability oracle to estimate \(f(x)\) at many nearby points and subsequently estimating the gradient classically.

Example use cases

Convex optimization: In convex optimization, local optima are also global optima, and thus a global optimum can be found by greedy methods such as gradient descent. When one can efficiently compute the function \(f\) much more cheaply than computing its gradient, the quantum gradient estimation algorithm can give rise to a speedup over classical optimization procedures [6, 7].
Pure state tomography: Given access to a unitary \(U\) that prepares the pure state \(\ket{\psi}\), [8] utilizes the gradient estimation algorithm to estimate the amplitudes of \(\ket{\psi}\) in the computational basis using an optimal number of queries to \(U\).
Estimating multiple expectation values: amplitude estimation can be used to estimate an expectation value to precision \(\epsilon\) at cost \(\mathcal{O}\left( 1/\epsilon \right)\). In [9, 8], it is shown how the gradient estimation algorithm further allows \(M\) expectation values to be simultaneously estimated at cost \(\smash{\tilde{\mathcal{O}}(\sqrt{M}/\epsilon)}\) calls to a state preparation unitary, considered the most expensive part of the circuit.
Computing molecular forces: while ground-state energies are the object most often studied in algorithms for quantum chemistry, other interesting quantities such as molecular forces can be related to gradients of molecular energies. Reference [10] studies how the gradient estimation algorithm can be leveraged into a quantum algorithm for computing such quantities.
Escaping saddle points: Although not the essential ingredient, the gradient estimation algorithm was used in the algorithm of [11] for escaping saddle points.
Variational quantum algorithms: Variational quantum algorithms involve optimizing the parameters of a quantum circuit under some cost function. The ability to estimate the gradient of the cost function with respect to the parameters might allow acceleration of this loop.
Financial market risk analysis: In [12], the quantum gradient estimation subroutine was utilized to compute "the greeks," parameters associated with financial market sensitivity.

Bibliography

Stephen P. Jordan. Fast quantum algorithm for numerical gradient estimation. Physical Review Letters, 95(5):050501, 2005. arXiv: https://arxiv.org/abs/quant-ph/0405146. doi:10.1103/PhysRevLett.95.050501.
András Gilyén, Srinivasan Arunachalam, and Nathan Wiebe. Optimizing quantum optimization algorithms via faster quantum gradient computation. In Proceedings of the 30th ACM-SIAM Symposium on Discrete Algorithms (SODA), 1425–1444. 2019. arXiv: https://arxiv.org/abs/1711.00465. doi:10.1137/1.9781611975482.87.
Ethan Bernstein and Umesh Vazirani. Quantum complexity theory. SIAM Journal on Computing, 26(5):1411–1473, 1997. Earlier version in STOC'93. doi:10.1137/S0097539796300921.
Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, 2008. doi:10.1137/1.9780898717761.
Jérôme Bolte, Ryan Boustany, Edouard Pauwels, and Béatrice Pesquet-Popescu. Nonsmooth automatic differentiation: a cheap gradient principle and other complexity results. arXiv: https://arxiv.org/abs/2206.01730, 2022.
Joran van Apeldoorn, András Gilyén, Sander Gribling, and Ronald de Wolf. Convex optimization using quantum oracles. Quantum, 4:220, 2020. arXiv: https://arxiv.org/abs/1809.00643. doi:10.22331/q-2020-01-13-220.
Shouvanik Chakrabarti, Andrew M. Childs, Tongyang Li, and Xiaodi Wu. Quantum algorithms and lower bounds for convex optimization. Quantum, 4:221, 2020. arXiv: https://arxiv.org/abs/1809.01731. doi:10.22331/q-2020-01-13-221.
Joran van Apeldoorn, Arjan Cornelissen, András Gilyén, and Giacomo Nannicini. Quantum tomography using state-preparation unitaries. In Proceedings of the 34th ACM-SIAM Symposium on Discrete Algorithms (SODA), 1265–1318. 2023. arXiv: https://arxiv.org/abs/2207.08800. doi:10.1137/1.9781611977554.ch47.
William J. Huggins, Kianna Wan, Jarrod McClean, Thomas E. O'Brien, Nathan Wiebe, and Ryan Babbush. Nearly optimal quantum algorithm for estimating multiple expectation values. Physical Review Letters, 129:240501, 12 2022. arXiv: https://arxiv.org/abs/2111.09283. URL: https://link.aps.org/doi/10.1103/PhysRevLett.129.240501, doi:10.1103/PhysRevLett.129.240501.
Thomas E. O'Brien, Michael Streif, Nicholas C. Rubin, Raffaele Santagati, Yuan Su, William J. Huggins, Joshua J. Goings, Nikolaj Moll, Elica Kyoseva, Matthias Degroote, Christofer S. Tautermann, Joonho Lee, Dominic W. Berry, Nathan Wiebe, and Ryan Babbush. Efficient quantum computation of molecular forces and other energy gradients. Physical Review Research, 4:043210, 12 2022. arXiv: https://arxiv.org/abs/2111.12437. URL: https://link.aps.org/doi/10.1103/PhysRevResearch.4.043210, doi:10.1103/PhysRevResearch.4.043210.
Chenyi Zhang, Jiaqi Leng, and Tongyang Li. Quantum algorithms for escaping from saddle points. Quantum, 5:529, 8 2021. arXiv: https://arxiv.org/abs/2007.10253. URL: https://doi.org/10.22331/q-2021-08-20-529, doi:10.22331/q-2021-08-20-529.
Nikitas Stamatopoulos, Guglielmo Mazzola, Stefan Woerner, and William J Zeng. Towards quantum advantage in financial market risk using quantum gradient algorithms. Quantum, 6:770, 2022. arXiv: https://arxiv.org/abs/2111.12509. doi:10.22331/q-2022-07-20-770.