Preparing states from classical data

Rough overview (in words)

An important subroutine in many quantum algorithms is preparing a quantum state given a list of its amplitudes stored, for example, in a classical database.¹ The upshot is that \(N\) amplitudes, which require \(\mathcal{O}\left( N \right)\) classical bits to write down, can be encoded in a quantum state with only \(\log_2(N)\) qubits, an exponential compression in memory. However, there are caveats; for example, simple information-theoretic bounds [1] dictate that the quantum circuit that prepares the \(\log_2(N)\)-qubit state must still have at least \(\mathcal{O}\left( N \right)\) gates, so no exponential advantage in gate complexity is possible. Depending on which resource is being optimized, the best protocol for state preparation will look different and optimal state preparation methods are known for most choices of metric.

Rough overview (in math)

Let \(x = (x_0,\ldots,x_{N-1}) \in \mathbb{C}^N\) be a vector of \(N\) complex numbers, and let

\[\begin{equation} \label{eq:ket_psi_state_prep} \ket{\psi} = \frac{1}{\lVert x \rVert}\sum_{i=0}^{N-1} x_i \ket{i} \end{equation}\]

be the associated normalized quantum state, where \(\lVert x \rVert\) denotes the standard Euclidean vector norm. Let \(n=\log_2(N)\) denote the number of qubits of \(\ket{\psi}\). The goal is to prepare the state \(\ket{\psi}\) by applying a quantum circuit to the state \(\ket{0}^{\otimes n}\), a problem studied extensively in previous literature. A common approach, originating in [2], is to iterate through each of the \(n\) qubits and perform a single-qubit rotation, with the angle of rotation determined by the setting of the previous qubits. The rotation on the first qubit creates the 1-qubit state

\[\begin{equation} \left(\sqrt{\sum\nolimits_{i=0}^{N/2-1} |x_i|^2 }\right)\ket{0} + \left(\sqrt{\sum\nolimits_{i=N/2}^{N-1} |x_i|^2 }\right) \ket{1} \end{equation}\]

by performing a single-qubit rotation (about the \(Y\) axis) on the state \(\ket{0}\) by an appropriate angle. Next, a similar kind of single-qubit rotation is performed on the second qubit, where the angle of rotation is conditioned on whether the first qubit is \(\ket{0}\) or \(\ket{1}\). The \(m\)th rotation is by one of \(2^{m-1}\) angles, depending on the setting of the first \(m-1\) qubits. Thus, in total there are \(1+2+\ldots+2^{n-1} = N-1\) total angles that might be used for single-qubit rotations. This sequence of operations prepares the state \(\lVert x\rVert^{-1}\sum_{i=0}^{N-1}\lvert x_i \rvert \ket{i}\). To apply the phases, a single qubit rotation about the \(Z\)-axis by the appropriate angle is performed—the angle depends on the setting of all \(n\) qubits, corresponding to the \(N=2^n\) different phases that might be needed. Thus, the total number of angles that define the protocol is \(2N-1\), exactly corresponding to the number of real parameters needed to describe the general state in Eq. \(\eqref{eq:ket_psi_state_prep}\).

It remains to describe how the controlled single-qubit rotations are performed when there are many control bits and different angles for each setting of the control. Here, one has many choices and the exact method will depend on how one has access to the data in \(x\) and what resource is being optimized. The most straightforward way is to iterate through each possible setting of the control bits and perform a multiply controlled rotation by a fixed angle for each in sequence. This approach requires \(\mathcal{O}\left( N \right)\) gates spread over \(\mathcal{O}\left( N \right)\) depth, as depicted in Fig. 1. When ancilla qubits are available, one can design protocols that have shallower depth (but the same total number of gates). For example, to perform the controlled rotation, one might store the \(2N-1\) angles needed to create the state in a quantum random access memory data structure. In this case, to perform a rotation, one need only read in the value of the angle from the QRAM into an ancilla register, then perform a rotation by the angle stored in memory. This way, one can apply the correct angle in one shot, rather than iterating through all possible angles.

Assuming one can perform arbitrary single-qubit gates to exact precision, it is possible to prepare the state \(\ket{\psi}\) exactly. However, often one must design circuits from a discrete gate set, such as Clifford gates and \(T\) gates, for example, when compiling into a gate sequence that can be implemented fault tolerantly. When this is the case, single-qubit rotations must be performed approximately: to approximate a single-qubit rotation to error \(\epsilon\), a Clifford+\(T\) sequence of length \(\mathcal{O}\left( \log(1/\epsilon) \right)\) must be applied [4].

Dominant resource cost (gates/qubits)

In the table below, we collect several state preparation results in the model where any single-qubit gate can be performed exactly and the only multi-qubit gates allowed are CNOTs. Each result is state-of-the-art in some parameter regime. The circuit size (i.e., the total number of single-qubit and CNOT gates) and depth (i.e., the number of parallel-acting layers of gates), as well as the number of ancilla qubits (i.e. the number of qubits beyond the \(n\) qubits needed to hold the state \(\ket{\psi}\)) are listed.

Ref.	Circuit size	Circuit depth	Ancilla qubits
[3, 5]	\(\mathcal{O}\left( 2^n \right)\)	\(\mathcal{O}\left( \frac{2^n}{n} \right)\)	none
[3, 5]	\(\mathcal{O}\left( 2^n \right)\)	\(\mathcal{O}\left( \frac{2^n}{m+n} \right)\)	\(m \in [0,\mathcal{O}\left( 2^n/n \right)]\)
[3, 6, 7]	\(\mathcal{O}\left( 2^n \right)\)	\(\mathcal{O}\left( n \right)\)	\(\mathcal{O}\left( 2^n \right)\)

Note that the result of [5], which shows depth \(\mathcal{O}\left( 2^n/(m+n) \right)\) using \(m\) ancilla qubits for \(m \leq \mathcal{O}\left( 2^n/n \right)\), encompasses all other results in the table (and is superior to the third row as it uses \(\mathcal{O}\left( 2^n/n \right)\) ancilla qubits instead of \(\mathcal{O}\left( 2^n \right)\)). We include the other results for completeness, as they are distinct constructions and can have other potential upsides.

A lower bound of \(\Omega(2^n)\) is known for circuit size [1], so all of the results above are size optimal up to constant factors. Moreover, for any \(m\), a lower bound of \(\Omega(\max(n,2^n/(n+m)))\) is known for the circuit depth [3], so all of the results above are also optimal in circuit depth, up to constant factors.

For approximate state preparation in a discrete gate set such as \(\{H,S,T,\mathrm{CNOT}\}\), the state \(\ket{\psi}\) is prepared up to \(\epsilon\) error, measured by the \(\ell_2\)-norm, and the circuit size and depth will depend on \(\epsilon\). In this case, we have the following table of results.

Ref.	Circuit size	Circuit depth	Ancilla qubits
[3]	\(\mathcal{O}\left( 2^n \log(2^n/\epsilon) \right)\)	\(\mathcal{O}\left( \frac{2^n \log(2^n/\epsilon)}{n} \right)\)	none
[3]	\(\mathcal{O}\left( 2^n \log(2^n/\epsilon) \right)\)	\(\mathcal{O}\left( \frac{2^n\log(2^n/\epsilon)}{m+n} \right)\)	\(m \in [0,\mathcal{O}\left( \frac{2^n}{n \log(n)} \right)]\)
[7]	\(\mathcal{O}\left( 2^n\log(n/\epsilon) \right)\)	\(\mathcal{O}\left( n +\log(1/\epsilon) \right)\)	\(\mathcal{O}\left( 2^n \right)\)

If the state \(\ket{\psi}\) is sparse, meaning that only \(d\) of the \(N\) amplitudes are nonzero, then more efficient state preparation methods are known. In particular, [6] gave a circuit of depth \(\mathcal{O}\left( \log(nd) \right)\) that uses only \(\mathcal{O}\left( nd\log(d) \right)\) ancilla qubits, a great improvement over the general case when \(d \ll N\).

In some fault-tolerant implementation schemes, such as lattice surgery using surface codes, Clifford gates can be performed cheaply, while \(T\) gates require the expensive process of magic state distillation. While \(\Omega(2^n\log(1/\epsilon)/\log(n))\) total gates are necessary to approximately create \(\ket{\psi}\) [8, Eq. 4.85] (matching upper bounds from [7] up to \(\mathrm{polylog}(n)\) factors), [9] noted that it is possible for most of these to be Clifford gates. The number of \(T\) gates can be reduced to \(\sqrt{2^n}\log(2^n/\epsilon)\) using \(\sqrt{2^n}\log(1/\epsilon)\) ancillas (in fact, there is a smooth tradeoff between the \(T\) count and the number of ancillas). Furthermore, these ancillas can be dirty, meaning they can be initialized into any quantum state, so long as they are returned to this (potentially unknown) state at the end of the procedure.

All of the above constructions are "garbage-free" state preparation protocols, because they prepare the state \(\ket{\psi}\) exactly and all ancilla qubits are returned to their initial state. However, in some applications, it is allowable to leave the ancilla register entangled with the data as long as the coefficients are correct. That is, one prepares the state

\[\begin{equation} \frac{1}{\lVert x\rVert}\sum_{i=0}^{N-1} x_i \ket{i} \otimes \ket{\mathrm{garbage}_i}\,. \end{equation}\]

In this setting, [10, Sec. IIID], en route to giving better algorithms for the electronic structure problem, gave a construction that approximately prepares the state above using only \(\mathcal{O}\left( 2^n+\log(1/\epsilon) \right)\) \(T\) gates and \(\mathcal{O}\left( \log(N) \right)\) ancilla qubits, albeit still requiring \(\mathcal{O}\left( N\log(1/\epsilon) \right)\) Clifford gates and \(\mathcal{O}\left( \log(N/\epsilon) \right)\) ancillas. In [10] it is presented with \(\mathcal{O}\left( N \right)\) depth but could be improved to \(\mathcal{O}\left( \log(N) \right)\) depth at the expense of additional ancillas, using log-depth constructions for QRAM. This method can also make use of the spacetime tradeoffs mentioned above, as discussed in [9, 11].

Caveats

Classical pre-processing: computing the circuits for preparing \(\ket{\psi}\) given the list of \(N\) coefficients \(x\) can be a non-negligible classical cost. For example, computing each of the \(\mathcal{O}\left( N \right)\) single-qubit rotation angles requires computing sums and evaluating trigonometric functions, which can be done to precision \(\epsilon\) in \(\mathrm{polylog}(1/\epsilon)\) classical time. Moreover, computing Clifford+\(T\) gate sequences that approximate given rotation angles to error \(\epsilon\) likewise requires \(\mathrm{polylog}(1/\epsilon)\) classical time [4]. The total classical work scales as \(O(N\mathrm{polylog}(1/\epsilon))\), although this cost can be parallelized.
Coherent arithmetic: to avoid some of the classical pre-processing, one might try to perform the arithmetic coherently. This might be unavoidable if the entries of \(x\) arrive in an online fashion and rotation angles and other quantities need to be computed after superpositions have been created. Formally, the scaling of coherent arithmetic is mild, generally requiring just \(\mathrm{polylog}(N,1/\epsilon)\) number of gates and ancilla qubits, but in practice this is likely to be expensive (e.g., known methods for coherently computing \(\arcsin(\cdot)\) to nine bits of precision use order-\(10^4\) Toffoli gates and more than 100 ancilla qubits [12]). See [13] for a general black-box approach that avoids coherent arithmetic.
Too many ancilla qubits: achieving depths that scale logarithmically with \(N\) requires \(\mathcal{O}\left( N \right)\) ancilla qubits, which limits the size of \(N\) that might be practical. This could be mitigated if it is possible to develop a large-scale hardware element specialized for performing the sort of circuits that arise in these protocols, similar to a quantum random access memory.
Long-range gates: achieving \(\mathrm{polylog}(N)\) depth for state preparation requires \(\mathcal{O}\left( N \right)\) ancilla qubits and \(\mathcal{O}\left( N \right)\) gates, many of which act in parallel and on far-separated qubits. If spatial locality were imposed, it would likely be difficult to avoid \(\mathcal{O}\left( N \right)\) overhead in depth.
Dequantization: Consider the task of drawing samples from the same probability distribution induced by measuring \(\ket{\psi}\) in the computational basis in time \(\mathrm{polylog}(N)\) time. Preparing \(\ket{\psi}\) as described is a quantum method of doing so, but the same can be done classically by first constructing a certain classical data structure and assuming access to classical RAM [14]. In some machine learning applications, this idea leads to classical algorithms that effectively dequantize quantum algorithms that utilize the state preparation primitive [15, 16].

Example use cases

Hamiltonian simulation via linear combination of unitaries (LCU) requires a PREPARE step where a state is prepared with certain classically computed coefficients. Relatedly, the same PREPARE gadget is used to construct block-encodings of such Hamiltonians. However, in this application, state preparation with garbage is generally allowable.
In certain quantum machine learning protocols, classical data (e.g., image pixel values) are encoded into a quantum state via the so-called "amplitude encoding," where \(N\) classical features are stored in a quantum state of \(\log_2(N)\) qubits [17]. Following the preparation of the amplitude encoded data, the state is processed with the goal of, for example, classifying the image.
Creating a block-encoding of a matrix of classical data is performed using state preparation as a subroutine (more precisely, block-encoding classical data requires controlled state preparation). The block-encoding is then useful in a variety of contexts, for example in quantum interior point methods.

Bibliography

Martin Plesch and Časlav Brukner. Quantum-state preparation with universal gate decompositions. Physical Review A, 83:032302, 3 2011. arXiv: https://arxiv.org/abs/1003.5760. URL: https://link.aps.org/doi/10.1103/PhysRevA.83.032302, doi:10.1103/PhysRevA.83.032302.
Lov Grover and Terry Rudolph. Creating superpositions that correspond to efficiently integrable probability distributions. arXiv: https://arxiv.org/abs/quant-ph/0208112, 2002.
Xiaoming Sun, Guojing Tian, Shuai Yang, Pei Yuan, and Shengyu Zhang. Asymptotically optimal circuit depth for quantum state preparation and general unitary synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pages 1–1, 2023. arXiv: https://arxiv.org/abs/2108.06150. doi:10.1109/TCAD.2023.3244885.
Neil J Ross and Peter Selinger. Optimal ancilla-free clifford+ t approximation of z-rotations. arXiv: https://arxiv.org/abs/1403.2975, 2014.
Pei Yuan and Shengyu Zhang. Optimal (controlled) quantum state preparation and improved unitary synthesis by quantum circuits with any number of ancillary qubits. Quantum, 7:956, 3 2023. arXiv: https://arxiv.org/abs/2202.11302. URL: https://doi.org/10.22331/q-2023-03-20-956, doi:10.22331/q-2023-03-20-956.
Xiao-Ming Zhang, Tongyang Li, and Xiao Yuan. Quantum state preparation with optimal circuit depth: implementations and applications. Physical Review Letters, 129:230504, 11 2022. arXiv: https://arxiv.org/abs/2201.11495. URL: https://link.aps.org/doi/10.1103/PhysRevLett.129.230504, doi:10.1103/PhysRevLett.129.230504.
Kaiwen Gui, Alexander M Dalzell, Alessandro Achille, Martin Suchara, and Frederic T Chong. Spacetime-efficient low-depth quantum state preparation with applications. arXiv: https://arxiv.org/abs/2303.02131, 2023.
Michael A. Nielsen and Isaac L. Chuang. Quantum computation and quantum information. Cambridge University Press, 2000. doi:10.1017/CBO9780511976667.
Guang Hao Low, Vadym Kliuchnikov, and Luke Schaeffer. Trading t-gates for dirty qubits in state preparation and unitary synthesis. arXiv: https://arxiv.org/abs/1812.00954, 2018.
Ryan Babbush, Craig Gidney, Dominic W. Berry, Nathan Wiebe, Jarrod McClean, Alexandru Paler, Austin Fowler, and Hartmut Neven. Encoding electronic spectra in quantum circuits with linear t complexity. Physical Review X, 8(4):041015, 2018. arXiv: https://arxiv.org/abs/1805.03662. doi:10.1103/PhysRevX.8.041015.
Dominic W. Berry, Craig Gidney, Mario Motta, Jarrod R. McClean, and Ryan Babbush. Qubitization of arbitrary basis quantum chemistry leveraging sparsity and low rank factorization. Quantum, 3:208, 12 2019. arXiv: https://arxiv.org/abs/1902.02134. URL: https://doi.org/10.22331/q-2019-12-02-208, doi:10.22331/q-2019-12-02-208.
Thomas Häner, Martin Roetteler, and Krysta M Svore. Optimizing quantum circuits for arithmetic. arXiv: https://arxiv.org/abs/1805.12445, 2018.
Yuval R. Sanders, Guang Hao Low, Artur Scherer, and Dominic W. Berry. Black-box quantum state preparation without arithmetic. Physical Review Letters, 122:020502, 1 2019. arXiv: https://arxiv.org/abs/1807.03206. URL: https://link.aps.org/doi/10.1103/PhysRevLett.122.020502, doi:10.1103/PhysRevLett.122.020502.
Shantanav Chakraborty, András Gilyén, and Stacey Jeffery. The power of block-encoded matrix powers: improved regression techniques via faster hamiltonian simulation. In Proceedings of the 46th International Colloquium on Automata, Languages, and Programming (ICALP), 33:1–33:14. 2019. arXiv: https://arxiv.org/abs/1804.01973. doi:10.4230/LIPIcs.ICALP.2019.33.
Ewin Tang. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st ACM Symposium on the Theory of Computing (STOC), 217–228. 2019. arXiv: https://arxiv.org/abs/1807.04271. doi:10.1145/3313276.3316310.
Ewin Tang. Quantum principal component analysis only achieves an exponential speedup because of its state preparation assumptions. Physical Review Letters, 127(6):060503, 2021. arXiv: https://arxiv.org/abs/1811.00414. doi:10.1103/PhysRevLett.127.060503.
Maria Schuld and Francesco Petruccione. Machine learning with quantum computers. Springer, 2021. doi:10.1007/978-3-030-83098-4.
Mihir K Bhaskar, Stuart Hadfield, Anargyros Papageorgiou, and Iasonas Petras. Quantum algorithms and circuits for scientific computing. Quantum Information and Computation, 2016. arXiv: https://arxiv.org/abs/1511.08253. doi:10.26421/QIC16.3-4-2.
Edgard Muñoz-Coreas and Himanshu Thapliyal. T-count and qubit optimized quantum circuit design of the non-restoring square root algorithm. ACM Journal on Emerging Technologies in Computing Systems, 14(3):1–15, 2018. arXiv: https://arxiv.org/abs/1712.08254. doi:https://doi.org/10.1145/3264816.
Lov K Grover. Synthesis of quantum superpositions by quantum computation. Physical Review Letters, 85(6):1334, 2000. doi:10.1103/PhysRevLett.85.1334.
Shengbin Wang, Zhimin Wang, Guolong Cui, Shangshang Shi, Ruimin Shang, Lixin Fan, Wendong Li, Zhiqiang Wei, and Yongjian Gu. Fast black-box quantum state preparation based on linear combination of unitaries. Quantum Information Processing, 20(8):1–14, 2021. arXiv: https://arxiv.org/abs/2105.06230. doi:https://doi.org/10.1007/s11128-021-03203-z.
Johannes Bausch. Fast black-box quantum state preparation. Quantum, 2022. arXiv: https://arxiv.org/abs/2009.10709. doi:https://doi.org/10.22331/q-2022-08-04-773.
Sam McArdle, András Gilyén, and Mario Berta. Quantum state preparation without coherent arithmetic. arXiv: https://arxiv.org/abs/2210.14892, 2022.
Arthur G Rattew and Bálint Koczor. Preparing arbitrary continuous functions in quantum registers with logarithmic complexity. arXiv: https://arxiv.org/abs/2205.00519, 2022.
Steven Herbert. No quantum speedup with grover–rudolph state preparation for quantum monte carlo integration. Physical Review E, 103:063302, 6 2021. arXiv: https://arxiv.org/abs/2101.02240. URL: https://link.aps.org/doi/10.1103/PhysRevE.103.063302, doi:10.1103/PhysRevE.103.063302.
Alexei Kitaev and William A Webb. Wavefunction preparation and resampling using a quantum computer. arXiv: https://arxiv.org/abs/0801.0342, 2008.
Arthur G. Rattew, Yue Sun, Pierre Minssen, and Marco Pistoia. The efficient preparation of normal distributions in quantum registers. Quantum, 5:609, 12 2021. arXiv: https://arxiv.org/abs/2009.06601. URL: https://doi.org/10.22331/q-2021-12-23-609, doi:10.22331/q-2021-12-23-609.
M. Mottonen and J. J. Vartiainen. Decompositions of general quantum gates. arXiv: https://arxiv.org/abs/quant-ph/0504100, 2005.
Vivek V. Shende, Igor L. Markov, and Stephen S. Bullock. Minimal universal two-qubit controlled-not-based circuits. Physical Review A, 69:062321, 6 2004. arXiv: https://arxiv.org/abs/quant-ph/0308033. URL: https://link.aps.org/doi/10.1103/PhysRevA.69.062321, doi:10.1103/PhysRevA.69.062321.
Scott Aaronson. Open problems related to quantum query complexity. ACM Transactions on Quantum Computing, 12 2021. arXiv: https://arxiv.org/abs/2109.06917. URL: https://doi.org/10.1145/3488559, doi:10.1145/3488559.
Gregory Rosenthal. Query and depth upper bounds for quantum unitaries via grover search. arXiv: https://arxiv.org/abs/2111.07992, 2021.

When the amplitudes are given by some well-behaved function, rather than being arbitrarily chosen, different (related) protocols are used, see Further reading below. ↩