Zero-sum games: Computing Nash equilibria
Overview
In a two-player zero-sum game, each player independently chooses a strategy and then receives a "payoff" (such that the sum of the two payoffs is always zero) that depends on which pair of strategies was chosen. A Nash equilibrium is a probabilistic way of choosing a strategy that maximizes a player's worst-case payoff. The problem of computing a Nash equilibrium is, in a certain sense, equivalent to solving a linear program (LP): computing a Nash equilibrium is a special case of LP, and conversely any LP can be reduced to computing a Nash equilibrium at the expense of introducing dependencies on a certain instance-specific "scale-invariant" precision parameter [1]. However, the quantum approach to solving LPs based on the multiplicative weights update method [1] is more efficient in the special case of computing Nash equilibria, and has fewer caveats. It gives a potentially quadratic speedup over its classical counterpart.
Actual end-to-end problem(s) solved
A two-player zero-sum game is defined by an \(n \times m\) matrix \(A\) called the "payoff matrix," which specifies how much player 1 wins from player 2 when player 1 plays (pure) strategy \(i \in [n]\) and player 2 plays (pure) strategy \(j \in [m]\). A pure strategy is one in which a player deterministically uses one fixed strategy in each game. By contrast, a mixed strategy is one in which a player randomly chooses a pure strategy according to some probability distribution. Assume the entries of \(A\) are between \(-1\) and \(1\). A Nash equilibrium is an optimal (generally mixed) strategy that maximizes a player's worst-case payoff regardless of the other player's choice. That is, a distribution \(y \in \Delta^m\), where \(\Delta^m\) denotes the \(m\)-dimensional probability simplex, is an optimal strategy for player 2 if it is the argument that optimizes the equation
\[
\lambda^* = \min_{y \in \Delta^m} \max_{i \in [n]} e_i^\intercal A y\,,
\]
where \([n]\) denotes the set of strategies available to player 1, and \(e_i\) denotes the basis vector associated with strategy \(i\). The quantity \(\lambda^*\) is the value of the game. This can be rewritten explicitly [1] as the following LP:
\[
\begin{array}{ll}
\text{minimize} & \lambda \\
\text{subject to} & A y \le \lambda \mathbf{1}, \\
& y \in \Delta^m,
\end{array}
\]
where \(\mathbf{1}\) is the all-ones vector. The dual of this LP then corresponds to computing the Nash equilibrium strategy for player 1.
The end-to-end problem solved is, given access to the entries of the matrix \(A\) and an error parameter \(\epsilon\), to compute a probability vector \(y\) such that
\[
\max_{i \in [n]} e_i^\intercal A y \le \lambda^* + \epsilon\,.
\]
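For concreteness, the LP above can be solved directly for small instances with an off-the-shelf solver. The following is a minimal sketch using SciPy's linprog routine; the \(3 \times 3\) rock-paper-scissors payoff matrix is an illustrative choice, not an instance taken from the references.

```python
# Minimal sketch: compute player 2's Nash equilibrium strategy by
# solving the LP above with SciPy. Illustrative instance, not from [1].
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 0, -1,  1],   # rock-paper-scissors payoff matrix:
              [ 1,  0, -1],   # rows = player 1's strategies,
              [-1,  1,  0]])  # columns = player 2's strategies
n, m = A.shape

# Decision variables x = (lambda, y_1, ..., y_m); minimize lambda.
c = np.concatenate(([1.0], np.zeros(m)))
# A y <= lambda * 1  <=>  -lambda * 1 + A y <= 0
A_ub = np.hstack([-np.ones((n, 1)), A])
b_ub = np.zeros(n)
# y must be a probability distribution: sum_j y_j = 1, y >= 0.
A_eq = np.concatenate(([0.0], np.ones(m))).reshape(1, -1)
b_eq = [1.0]
bounds = [(None, None)] + [(0, None)] * m  # lambda free, y >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
lam, y = res.x[0], res.x[1:]
print(f"value of the game: {lam:.3f}, optimal y: {np.round(y, 3)}")
# For rock-paper-scissors: value 0 and y = (1/3, 1/3, 1/3).
```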
Dominant resource cost/complexity
The quantum algorithm builds on a classical algorithm based on the multiplicative weights update method from [2]. With probability at least \(1-\delta\), the classical algorithm finds a solution \(y\) that approximates the Nash equilibrium to error \(\epsilon\) after \(\lceil 16 \ln(nm/\delta)/\epsilon^2 \rceil\) iterations, where each iteration can be accomplished using \(n+m\) queries to the entries of the matrix \(A\) and \(\mathcal{O}\left( n+m \right)\) total time. An important subroutine of each iteration is a Gibbs sampling step for a diagonal matrix (a special case of the general quantum Gibbs sampling problem, in which any Hermitian matrix is allowed). When the matrix \(A\) is sparse, the number of queries per iteration can be reduced to \(2s\), where \(s\) is the maximum number of nonzero entries in any row or column of \(A\), and the total time per iteration can be reduced to \(\mathcal{O}\left( s\ln(mn) \right)\).
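The following is a minimal sketch of this classical loop, in the spirit of [2] but not a faithful reproduction of it: each iteration Gibbs-samples a pure strategy for each player from exponential weights on cumulative payoffs, then updates those payoffs using one row and one column of \(A\). The step size, averaging scheme, and helper name are illustrative choices.

```python
# Minimal sketch of the classical multiplicative-weights loop for a
# zero-sum game. Illustrative constants; not a faithful reproduction
# of the algorithm of [2].
import numpy as np

def approx_nash(A, eps, rng=None):
    rng = rng or np.random.default_rng(0)
    n, m = A.shape
    T = int(np.ceil(16 * np.log(n * m) / eps**2))  # iteration count
    eta = eps / 4                                  # illustrative step size
    u = np.zeros(n)   # cumulative payoffs seen by player 1
    v = np.zeros(m)   # cumulative losses seen by player 2
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # Gibbs-sampling step: exponential weights on cumulative payoffs
        # (the diagonal Gibbs sampling subroutine mentioned above).
        p = np.exp(eta * (u - u.max())); p /= p.sum()
        q = np.exp(-eta * (v - v.min())); q /= q.sum()
        i = rng.choice(n, p=p)
        j = rng.choice(m, p=q)
        u += A[:, j]   # one column of A: n queries
        v += A[i, :]   # one row of A:    m queries
        x_avg[i] += 1.0 / T
        y_avg[j] += 1.0 / T
    return x_avg, y_avg  # empirical (approximately optimal) strategies

# Example: approximate equilibrium of rock-paper-scissors.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x, y = approx_nash(A, eps=0.1)
```

The sampling from \(p\) and \(q\) is the step that the quantum algorithm accelerates.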
The quantum algorithm assumes coherent access to the matrix entries of \(A\). Through amplitude amplification and the related subroutines of amplitude estimation and minimum finding, the quantum algorithm of [1] speeds up the Gibbs sampling task and reduces the maximum cost of an iteration to \(\tilde{\mathcal{O}}(\sqrt{n+m}/\epsilon)\) queries to the matrix elements of \(A\), along with a comparable time complexity, where the \(\tilde{\mathcal{O}}\) notation suppresses logarithmic factors. In the case that the matrix is sparse, the maximum cost of an iteration is reduced to \(\tilde{\mathcal{O}}(\sqrt{s}/\epsilon^{1.5})\). The work of [3] introduces a technique called dynamic Gibbs sampling, which exploits the fact that the distribution to be sampled changes slowly from iteration to iteration, and further reduces the iteration cost to \(\tilde{\mathcal{O}}(\sqrt{n+m}/\epsilon^{1/2} + 1/\epsilon)\) in the dense case. This gives a total query and time complexity roughly given by
\[
\tilde{\mathcal{O}}\!\left( \frac{\sqrt{n+m}}{\epsilon^{2.5}} + \frac{1}{\epsilon^{3}} \right).
\]
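Schematically, this total follows from multiplying the iteration count by the per-iteration cost, with all \(\ln(nm)\) and \(\ln(1/\delta)\) factors absorbed into the \(\tilde{\mathcal{O}}\) notation:
\[
\tilde{\mathcal{O}}\!\left(\frac{1}{\epsilon^{2}}\right) \times \tilde{\mathcal{O}}\!\left(\frac{\sqrt{n+m}}{\epsilon^{1/2}} + \frac{1}{\epsilon}\right) = \tilde{\mathcal{O}}\!\left(\frac{\sqrt{n+m}}{\epsilon^{2.5}} + \frac{1}{\epsilon^{3}}\right).
\]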
This complexity assumes access to a quantum random access memory (QRAM). Without a QRAM, the cost per iteration increases by a factor \(\tilde{\mathcal{O}}(1/\epsilon^2)\).
See also [4], which, independently of [1], gave a quantum algorithm that solves zero-sum games with a slightly worse \(\epsilon\) dependence, as well as [5], which gave quantum algorithms for generalizations of zero-sum games to other vector norms.
Existing error corrected resource estimates
There are no existing error corrected resource estimates for this algorithm.
Caveats
- Due to the poor dependence of the complexity on the error parameter \(\epsilon\), this algorithm is only likely to be useful in situations where the optimal strategy need not be learned to high precision. It is unclear how often such situations arise in practice.
- As mentioned above, if no QRAM is available, the runtime suffers a multiplicative \(\tilde{\mathcal{O}}(1/\epsilon^2)\) slowdown.
- A fully end-to-end analysis should also consider the exact way that the queries to the matrix entries of \(A\) are implemented. If they are given in a classical database, a large \(\mathcal{O}\left( nm \right)\)-size QRAM may also be required to implement the queries in \(\text{polylog}(mn)\) time. Note that this would be separate from the \(\tilde{\mathcal{O}}(1/\epsilon^2)\)-size QRAM the algorithm uses to reduce the time complexity. To avoid the QRAM requirement for implementing a query, it must be the case that the matrix entries are efficiently computable in some other way.
Comparable classical complexity and challenging instance sizes
The classical version of the quantum algorithm has time and query complexity given by [1, Section 2]
\[
\tilde{\mathcal{O}}\!\left( \frac{n+m}{\epsilon^{2}} \right).
\]
Alternatively, the problem could be solved using other approaches to the associated LP. Classical interior point methods for LPs can achieve \(\mathcal{O}\left( n^{\omega}\log(1/\epsilon) \right)\) runtime in the common case that \(m=\mathcal{O}\left( n \right)\) [6], where \(\omega \approx 2.37\) is the matrix multiplication exponent. This runtime exhibits better \(\epsilon\) dependence at the expense of worse \(n\) dependence, as illustrated by the sketch below. Note that quantum interior point methods have also been proposed for conic programs such as LPs, but whether they could yield a speedup over classical interior point methods depends on the scaling of certain instance-specific parameters.
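As a rough, constants-free illustration only (not an analysis from the references), the following snippet compares the leading-order cost expressions quoted above for \(m=n\), dropping all constant prefactors and most logarithmic factors; the crossovers it prints should not be read as runtime predictions.

```python
# Rough comparison of leading-order costs (m = n; constants and most
# logs dropped). Illustrative only; real performance depends on
# constant factors, QRAM overheads, and fault-tolerance overheads.
import numpy as np

def mwu_classical(n, eps):   # [2]: (n + m)/eps^2
    return 2 * n / eps**2

def mwu_quantum(n, eps):     # [1, 3]: sqrt(n + m)/eps^2.5 + 1/eps^3
    return np.sqrt(2 * n) / eps**2.5 + 1 / eps**3

def interior_point(n, eps):  # [6]: n^omega log(1/eps), omega ~ 2.37
    return n**2.37 * np.log(1 / eps)

for n in [10**4, 10**6, 10**8]:
    for eps in [1e-1, 1e-5]:
        costs = {"classical MWU": mwu_classical(n, eps),
                 "quantum MWU": mwu_quantum(n, eps),
                 "interior point": interior_point(n, eps)}
        best = min(costs, key=costs.get)
        print(f"n = {n:.0e}, eps = {eps:.0e}: cheapest is {best}")
```

In this crude model, the multiplicative weights methods win at moderate precision, while interior point methods take over when \(\epsilon\) is very small and \(n\) is not too large.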
Speedup
The quantum algorithm achieves a quadratic improvement with respect to the parameter \(n+m\), at the price of a polynomial slowdown with respect to the error parameter \(\epsilon\).
Outlook
It is difficult to assess whether a practical advantage could be obtained in the setting of zero-sum games without further investigation of how queries to matrix elements are accomplished, an assessment of the constant factors involved in the algorithm, and consideration of any additional overheads from fault-tolerant quantum computation. The available theoretical speedup is quadratic and may require a medium- or large-scale QRAM; it may not be sufficiently large to overcome these overheads in practice.
It is perhaps instructive to compare the outlook for zero-sum games to that of conic programming more generally. On the one hand, unlike the algorithms for general SDPs and LPs, the algorithm for zero-sum games has no complexity dependence on instance-specific parameters denoting the size of the primal and dual solutions. This makes it easier to evaluate the runtime of the algorithm and more likely that it can be an effective algorithm. On the other hand, a core subroutine of the quantum algorithm is to perform classical Gibbs sampling quadratically faster than a classical computer can, using techniques like amplitude amplification, and it is not clear how this speedup could be made greater than quadratic, even in special cases. A similar subroutine is required in the multiplicative weights approach to solving SDPs, but in that case the Gibbs state to be sampled is a truly quantum state (i.e., nondiagonal in the computational basis) rather than a classical one. Using more advanced methods for Gibbs sampling, it is possible that in some special cases there could be a superquadratic quantum speedup for SDPs that would not be available for the simpler case of LPs and zero-sum games.
Bibliography
- Joran van Apeldoorn and András Gilyén. Quantum algorithms for zero-sum games. arXiv: https://arxiv.org/abs/1904.03180, 2019.
- Michael D. Grigoriadis and Leonid G. Khachiyan. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53–58, 1995. doi:10.1016/0167-6377(95)00032-0.
- Adam Bouland, Yosheb Getachew, Yujia Jin, Aaron Sidford, and Kevin Tian. Quantum speedups for zero-sum games via improved dynamic Gibbs sampling. arXiv: https://arxiv.org/abs/2301.03763, 2023.
- Tongyang Li, Shouvanik Chakrabarti, and Xiaodi Wu. Sublinear quantum algorithms for training linear and kernel-based classifiers. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, 3815–3824. PMLR, June 2019. arXiv: https://arxiv.org/abs/1904.02276. URL: https://proceedings.mlr.press/v97/li19b.html.
- Tongyang Li, Chunhao Wang, Shouvanik Chakrabarti, and Xiaodi Wu. Sublinear classical and quantum algorithms for general matrix games. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, volume 35, 8465–8473. May 2021. arXiv: https://arxiv.org/abs/2012.06519. URL: https://ojs.aaai.org/index.php/AAAI/article/view/17028, doi:10.1609/aaai.v35i10.17028.
- Michael B. Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. Journal of the ACM, January 2021. arXiv: https://arxiv.org/abs/1810.07896. URL: https://doi.org/10.1145/3424305, doi:10.1145/3424305.