Topological data analysis

Overview

In topological data analysis (TDA), we aim to compute the dominant topological features (connected components and \(k\)-dimensional holes) of data points sampled from an underlying topological manifold, viewed at a given length scale, or of a graph. These features may be of independent interest (e.g., the number of connected components in the matter distribution in the universe) or can be used as generic features for comparing datasets. Quantum algorithms for this problem leverage the ability of a register of qubits to efficiently store a state representing the system, which leads, in some regimes, to quantum algorithms that are more efficient than known classical algorithms.

Actual end-to-end problem(s) solved

We compute, to additive error \(\Delta\), the Betti numbers \(\beta_k^i\) (the number of \(k\)-dimensional holes at a given length scale \(i\)) or the persistent Betti numbers \(\beta_k^{i,j}\) (the number of \(k\)-dimensional holes that survive from scale \(i\) to scale \(j\)) of a simplicial complex built from datapoints sampled from an underlying manifold. The simplicial complex is a higher-dimensional generalization of a graph, constructed by connecting datapoints within a given length scale of each other. A simplicial complex constructed in this way is known as a clique complex or a Vietoris-Rips complex.

The persistent Betti number \(\beta_k^{i,j}\) is the quantity of primary interest for point cloud data, as it is unclear a priori what the 'true' length scale of the manifold is, and noise present in the data may lead to a large number of short-lived holes. The longest-lived features are considered to be the dominant topological features. The births and deaths of features are typically plotted on a "persistence diagram." Different datasets can be compared by using stable distance measures between their diagrams, or by vectorizing the diagrams and using kernel methods or neural networks. For graphs, the length scale is not required, and so \(\beta_k^i\) can be of interest. For statements common to both \(\beta_k^{i,j}\) and \(\beta_k^i\), we will use the notation \(\beta_k^*\). Practical applications typically consider low values of \(k\), motivated both by computational cost and by interpretability.
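To make the construction concrete, the following is a minimal Python sketch (using only numpy and the standard library; all names are illustrative) that enumerates the simplices of a Vietoris-Rips complex at a fixed length scale. It is a brute-force enumeration whose cost grows exponentially with the dimension, intended purely to illustrate the definition.

```python
import numpy as np
from itertools import combinations

def rips_simplices(points, scale, max_dim):
    """Enumerate the simplices of the Vietoris-Rips complex at `scale`.

    A k-simplex is any set of k+1 datapoints that are pairwise within
    `scale` of each other, i.e., a (k+1)-clique of the proximity graph.
    """
    n = len(points)
    # Pairwise Euclidean distances between datapoints.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    close = dist <= scale
    simplices = {}
    for k in range(max_dim + 1):
        simplices[k] = [
            verts for verts in combinations(range(n), k + 1)
            if all(close[u, v] for u, v in combinations(verts, 2))
        ]
    return simplices
```

Building the complex at two scales \(i \leq j\) yields the nested pair of complexes whose comparison defines the persistent Betti numbers \(\beta_k^{i,j}\).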

Dominant resource cost/complexity

The quantum algorithms [1, 2, 3, 4, 5] for computing \(\beta_k^*\) actually return these quantities normalized by the number of \(k\)-simplices present in the complex at the given length scale, \(|S_k^i|\). For a complex with \(N\) datapoints, we can either use \(N\) qubits to store the simplicial complex, or \(\mathcal{O}\left( k\log(N) \right)\) when \(k \ll N\). Quantum algorithms have two subroutines:

  1. Finding \(k\)-simplices present in the complex at the given length scale (using either classical rejection sampling or Grover's algorithm), which in the best case scales as \(\sqrt{\binom{N}{k+1}/|S_k^i|}\).
  2. Projecting onto the eigenspace of an operator that encodes the topology (using either quantum phase estimation or quantum singular value transformation). This introduces a dependence on the gap(s) \(\Lambda\) of the operator(s) used to encode the topology.

The most efficient approaches use amplitude estimation to compute \(\sqrt{\beta_k^*/|S_k^i|}\) to additive error \(\delta\) with complexity \(\mathcal{O}\left( \delta^{-1} \right)\). The most expensive subroutines within the quantum algorithms are the membership oracles that determine if a given simplex is present in the complex, the cost of which we denote by \(m_k\). The overall cost of the most efficient known approaches to compute \(\beta_k^*\) to constant additive error \(\Delta\) is approximately

\[\begin{equation} \frac{m_k \sqrt{\beta_k^*}}{\Delta} \left( \sqrt{\binom{N}{k+1}} + \frac{\sqrt{|S_k^i|} \mathrm{poly}(N,k)}{\Lambda} \right) . \end{equation}\]

The quantum algorithm must be repeated at all pairs of length scales to compute the persistence diagram.
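For intuition, the dominant cost expression above can be evaluated numerically. The sketch below is a back-of-the-envelope evaluator, not a resource estimate: the membership-oracle cost \(m_k\), the gap \(\Lambda\), and the \(\mathrm{poly}(N,k)\) factor are problem-dependent inputs that the user must supply.

```python
from math import comb, sqrt

def quantum_tda_cost(N, k, S_k, beta_k, m_k, Delta, Lam, poly_Nk):
    """Order-of-magnitude evaluation of the dominant cost expression above.

    S_k     -- |S_k^i|, the number of k-simplices at the chosen length scale
    beta_k  -- the (persistent) Betti number being estimated
    m_k     -- cost of one membership-oracle call
    Delta   -- target (constant) additive error
    Lam     -- eigenvalue gap of the operator encoding the topology
    poly_Nk -- the poly(N, k) factor, supplied by the user
    """
    return (m_k * sqrt(beta_k) / Delta) * (
        sqrt(comb(N, k + 1)) + sqrt(S_k) * poly_Nk / Lam
    )
```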

Existing error corrected resource estimates

In [4], the gate depth (and non-Clifford gate depth) of all subroutines (including explicit implementations of the membership oracles) was established for computing \(\beta_k^{i,j}\) and \(\beta_k^i\). However, that reference did not consider a final compilation to \(T\)/Toffoli gates for concrete problems of interest.

In [5], the Toffoli complexity of estimating \(\beta_k^i\) to relative error (rather than constant error) was determined for a family of graphs with large \(\beta_k^i\), for \(k=4,8,16,32\) and \(N \leq 10^4\). The resulting Toffoli counts ranged from \(10^8\) (\(N=100\), \(k=4\)) to \(10^{17}\) (\(N=10^4\), \(k=32\)), using \(N\) logical qubits.

Caveats

Quantum algorithms are unable to achieve exponential speedups for estimating \(\beta_k^*\) to constant additive error. This is because they must efficiently find simplices in the complex (thus \(|S_k^i|\) must be large), but they return \(\beta_k^*/|S_k^i|\), which means the error must be rescaled by \(|S_k^i|\) to achieve constant error. More rigorously, [6] showed that determining whether the Betti number of a (clique-dense) clique complex is nonzero is QMA\(_1\)-hard. Thus, quantum algorithms should not be expected to provide exponential speedups for (persistent) Betti number estimation. In [7] it was shown that estimating normalized quasi-Betti numbers (a quantity that accounts for the miscounting of low-lying but nonzero singular values) of general cohomology groups is DQC1-hard\(^{1}\). The hardness of estimating normalized (persistent) Betti numbers of a clique complex, subject to a gap assumption of \(\Lambda = \Omega(1/\mathrm{poly}(N))\), which is the problem solved by existing quantum algorithms, has not been established (see [7, Sec. 1.1]).

Quantum algorithms also depend on the eigenvalue gap(s) \(\Lambda\) of the operator(s) that encode the topology. The scaling of these gaps has not been studied for typical applications.

Finally, typical applications consider dimension \(k \leq 3\). It is unclear whether this is because larger values of \(k\) are uninteresting, or because they are too expensive to compute classically.

Comparable classical complexity and challenging instance sizes

While classical algorithms are technically efficient for constant dimension \(k\), they are limited in practice. For a number of benchmark calculations on systems with up to \(\mathcal{O}\left( 10^9 \right)\) simplices we refer to [8].

The 'textbook' classical algorithm for \(\beta_k^*\) scales as \(\mathcal{O}\left( |S_{k+1}^j|^\omega \right)\), where \(\omega \approx 2.4\) is the matrix multiplication exponent [9]. In practice, the cost is considered closer to \(\mathcal{O}\left( |S_{k+1}^j| \right)\) due to sparsity in the complex [9] (well-studied classical heuristics that sparsify the complex can also be used to achieve this scaling [10]). The textbook algorithm only needs to be run once to compute the persistence diagram.
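For small instances, the rank computation underlying the textbook algorithm fits in a few lines of dense numpy. The sketch below computes \(\beta_k = |S_k| - \mathrm{rank}\,\partial_k - \mathrm{rank}\,\partial_{k+1}\) from the `simplices` dictionary produced by the earlier sketch; it illustrates the linear algebra only, not the optimized sparse implementations benchmarked in [8].

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the boundary map taking k-simplices to (k-1)-simplices."""
    index = {s: i for i, s in enumerate(km1_simplices)}
    D = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # drop the i-th vertex
            D[index[face], j] = (-1) ** i          # alternating sign
    return D

def betti(simplices, k):
    """beta_k = |S_k| - rank(boundary_k) - rank(boundary_{k+1})."""
    rank_k = 0 if k == 0 else np.linalg.matrix_rank(
        boundary_matrix(simplices[k], simplices[k - 1]))
    rank_k1 = 0 if not simplices.get(k + 1) else np.linalg.matrix_rank(
        boundary_matrix(simplices[k + 1], simplices[k]))
    return len(simplices[k]) - rank_k - rank_k1
```

For example, for points sampled from a noisy circle, `betti(rips_simplices(points, scale, max_dim=2), k=1)` returns 1 at suitable scales; note that the complex must be built to dimension at least \(k+1\).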

Classical algorithms based on the power method [11] scale approximately as

\[\begin{equation} \widetilde{\mathcal{O}}\left( \frac{|S_k^i| (k^2 \beta_k^i + k (\beta_k^i)^2)}{\Lambda} \log\left(\frac{1}{\Delta}\right) \right) \end{equation}\]

to compute \(\beta_k^i\) to additive error \(\Delta\). This is only quadratically worse than the quantum algorithm for \(|S_k^i| = \mathcal{O}\left( \binom{N}{k+1} \right)\). The power method has recently been extended to compute persistent Betti numbers, with a similar complexity [4]. The power method is more efficient than the rigorous textbook classical algorithm described above, but it must be repeated for each pair of length scales to compute the persistence diagram, which is a disadvantage in practice.
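The idea behind the power method can be sketched in the same dense-numpy setting: \(\beta_k\) is the dimension of the kernel of the combinatorial Laplacian \(\Delta_k = \partial_k^\dagger \partial_k + \partial_{k+1} \partial_{k+1}^\dagger\), and powers of \(M = I - \Delta_k/\lambda_{\max}\) suppress the nonkernel eigenvectors at a rate set by the gap \(\Lambda\), so that \(\mathrm{tr}(M^t) \to \beta_k\). The stochastic trace estimation below is a simplified stand-in for the algorithms of [11, 5], with untuned, illustrative parameters.

```python
import numpy as np

def betti_power_method(D_k, D_k1, t=200, samples=50, rng=None):
    """Stochastic power-method estimate of beta_k = dim ker(Delta_k).

    D_k  -- boundary matrix from k-simplices to (k-1)-simplices
    D_k1 -- boundary matrix from (k+1)-simplices to k-simplices
    t, samples -- number of power iterations and Hutchinson probe
                  vectors; illustrative defaults, not tuned values.
    """
    rng = np.random.default_rng() if rng is None else rng
    laplacian = D_k.T @ D_k + D_k1 @ D_k1.T        # combinatorial Laplacian
    lam_max = np.linalg.norm(laplacian, 2) or 1.0  # spectral norm (avoid 0/0)
    M = np.eye(laplacian.shape[0]) - laplacian / lam_max
    # Eigenvalues of M lie in [0, 1] and equal 1 exactly on ker(Delta_k),
    # so tr(M^t) -> beta_k as t grows. Estimate the trace with random
    # +/-1 probe vectors (the Hutchinson estimator).
    estimates = []
    for _ in range(samples):
        v = rng.choice([-1.0, 1.0], size=M.shape[0])
        w = v.copy()
        for _ in range(t):
            w = M @ w
        estimates.append(v @ w)
    return float(np.mean(estimates))
```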

Recently, randomized classical algorithms have been proposed for estimating \(\beta_k^i/|S_k^i|\) to additive error [5, 12]. The algorithm of [12] runs in polynomial time for clique complexes for constant gap \(\Lambda\) and error \(\Delta = 1/\mathrm{poly}(N)\) (or \(\Delta\) constant and \(\Lambda = \mathcal{O}\left( 1/\log(N) \right)\)).

Speedup

For the task of computing \(\beta_k^{i,j}\) to constant additive error, quantum algorithms can achieve an almost quintic speedup over the rigorous scaling of the textbook classical algorithm for large \(k\) (subject to the dependence of the gap parameters on \(N\)). For a dimension sufficiently low to be studied classically, \(k=3\), the speedup would be approximately cubic, subject to the gap dependence. However, when compared against the aforementioned observed scaling of the textbook classical algorithm of \(\mathcal{O}\left( |S_{k+1}^j| \right)\) (or against classical heuristics that achieve this scaling) the quantum speedup is reduced to (sub)-quadratic for all \(k\), even before considering the gap dependence. Moreover, the quantum algorithm has large constant factor overheads from the precision \(\Delta\) and the number of repetitions to compute the persistence diagram.

A more apples-to-apples comparison between the quantum algorithm and the power method shows that the quantum algorithm is only quadratically better than rigorous classical algorithms [11, 4].

For the task of computing \(\beta_k^i\) to relative error, graphs have been found for which the quantum algorithm provides superpolynomial [5] or quartic [5, 13] speedups over both the power method and the heuristic/observed scaling of the textbook approach. As noted above, this task can also be addressed with recent randomized classical algorithms [5, 12]. The algorithm of [12] runs in polynomial time for clique complexes with constant gap \(\Lambda\) and error \(\Delta = 1/\mathrm{poly}(N)\) (or \(\Delta\) constant and \(\Lambda = \mathcal{O}\left( 1/\log(N) \right)\)). These are more restrictive conditions than those of quantum algorithms (which can simultaneously have both \(\Lambda, \Delta = \mathcal{O}\left( 1/\mathrm{poly}(N) \right)\)).

NISQ implementations

In [14] a NISQ-friendly compilation of the quantum algorithm described above was proposed, trading deep quantum circuits for many repetitions of shallower circuits, which comes at the cost of worsening the asymptotic scaling of the algorithm (see the table in [4] for a quantitative comparison). A proof-of-principle experiment was performed [14]. In [7] it was shown that the TDA problem can be mapped to a fermionic Hamiltonian, and it was proposed to use the variational quantum eigensolver to find the ground states of this Hamiltonian (the degeneracy of which gives \(\beta_k^i\)). It is unclear what ansatz circuits one should use to make this approach advantageous compared to classical algorithms, as naive (e.g., random) trial states would have exponentially small overlap with the target states.

Outlook

Given the large overheads induced by error correction, it seems unlikely that the quantum algorithms for computing (persistent) Betti numbers to constant additive error will achieve practical advantage for current calculations of interest. This is because the quantum speedup over classical approaches is only quadratic for this task, and classical algorithms are efficient for the \(k \leq 3\) regime typically considered.

If more datasets can be identified where the high-dimensional (persistent) Betti numbers are large and practically interesting to compute to relative error, then quantum algorithms may be of practical relevance. We refer to [15] for a recent survey of applications of TDA.

Bibliography

  1. Seth Lloyd, Silvano Garnerone, and Paolo Zanardi. Quantum algorithms for topological and geometric analysis of data. Nature Communications, 7(1):1–7, 2016. arXiv: https://arxiv.org/abs/1408.3106. doi:10.1038/ncomms10138.

  2. Sam Gunn and Niels Kornerup. Review of a quantum algorithm for Betti numbers. arXiv: https://arxiv.org/abs/1906.07673, 2019.

  3. Ryu Hayakawa. Quantum algorithm for persistent Betti numbers and topological data analysis. Quantum, 6:873, 2022. arXiv: https://arxiv.org/abs/2111.00433. doi:10.22331/q-2022-12-07-873.

  4. Sam McArdle, András Gilyén, and Mario Berta. A streamlined quantum algorithm for topological data analysis with exponentially fewer qubits. arXiv: https://arxiv.org/abs/2209.12887, 2022.

  5. Dominic W Berry, Yuan Su, Casper Gyurik, Robbie King, Joao Basso, Alexander Del Toro Barba, Abhishek Rajput, Nathan Wiebe, Vedran Dunjko, and Ryan Babbush. Quantifying quantum advantage in topological data analysis. arXiv: https://arxiv.org/abs/2209.13581, 2022.

  6. Marcos Crichigno and Tamara Kohler. Clique homology is QMA1-hard. arXiv: https://arxiv.org/abs/2209.11793, 2022.

  7. Chris Cade and P Marcos Crichigno. Complexity of supersymmetric systems and the cohomology problem. arXiv: https://arxiv.org/abs/2107.00011, 2021.

  8. Nina Otter, Mason A Porter, Ulrike Tillmann, Peter Grindrod, and Heather A Harrington. A roadmap for the computation of persistent homology. EPJ Data Science, 6:1–38, 2017. arXiv: https://arxiv.org/abs/1506.08903. doi:10.1140/epjds/s13688-017-0109-5.

  9. Nikola Milosavljević, Dmitriy Morozov, and Primoz Skraba. Zigzag persistent homology in matrix multiplication time. In Proceedings of the 27th Annual Symposium on Computational Geometry, 216–225. 2011. doi:10.1145/1998196.1998229.

  10. Konstantin Mischaikow and Vidit Nanda. Morse theory for filtrations and efficient computation of persistent homology. Discrete & Computational Geometry, 50(2):330–353, 2013. doi:10.1007/s00454-013-9529-6.

  11. Joel Friedman. Computing Betti numbers via combinatorial Laplacians. Algorithmica, 21(4):331–346, 1998. doi:10.1007/PL00009218.

  12. Simon Apers, Sayantan Sen, and Dániel Szabó. A (simple) classical algorithm for estimating Betti numbers. arXiv: https://arxiv.org/abs/2211.09618, 2022.

  13. Alexander Schmidhuber and Seth Lloyd. Complexity-theoretic limitations on quantum algorithms for topological data analysis. arXiv: https://arxiv.org/abs/2209.14286, 2022.

  14. Ismail Yunus Akhalwaya, Shashanka Ubaru, Kenneth L Clarkson, Mark S Squillante, Vishnu Jejjala, Yang-Hui He, Kugendran Naidoo, Vasileios Kalantzis, and Lior Horesh. Exponential advantage on noisy quantum computers. arXiv: https://arxiv.org/abs/2209.09371, 2022.

  15. Felix Hensel, Michael Moor, and Bastian Rieck. A survey of topological machine learning methods. Frontiers in Artificial Intelligence, 4:681108, 2021. doi:10.3389/frai.2021.681108.

  16. E. Knill and R. Laflamme. Power of one bit of quantum information. Physical Review Letters, 81:5672–5675, 1998. arXiv: https://arxiv.org/abs/quant-ph/9802037. doi:10.1103/PhysRevLett.81.5672.


  1. DQC1 is a complexity class that is physically motivated by the "one clean qubit model" [16]. This model has a single pure-state qubit, which can be initialized, manipulated, and measured freely, as well as \(N-1\) maximally mixed qubits.