# QuickSilver: Efficient and Affordable Zero-Knowledge Proofs for Circuits and Polynomials over Any Field 

Kang Yang<br>State Key Laboratory of Cryptology<br>yangk@sklc.org

Pratik Sarkar<br>Boston University<br>pratik93@bu.edu

Chenkai Weng<br>Northwestern University<br>ckweng@u.northwestern.edu

Xiao Wang<br>Northwestern University<br>wangxiao@cs.northwestern.edu


#### Abstract

Zero-knowledge (ZK) proofs with an optimal memory footprint have attracted a lot of attention, because such protocols can easily prove very large computation with a small memory requirement. Such ZK protocol only needs $O(M)$ memory for both parties, where $M$ is the memory required to verify the statement in the clear. In this paper, we propose several new ZK protocols in this setting, which improve the concrete efficiency and, at the same time, enable sublinear amortized communication for circuits with some notion of relaxed uniformity.


1. In the circuit-based model, where the computation is represented as a circuit over a field, our ZK protocol achieves a communication complexity of 1 field element per non-linear gate for any field size while keeping the computation very cheap.
We implemented our protocol, which shows extremely high efficiency and affordability. Compared to the previous best-known implementation, we achieve $6 \times-7 \times$ improvement in computation and $3 \times-$ $7 \times$ improvement in communication. When running on intro-level AWS instances, our protocol only needs one US dollar to prove one trillion AND gates (or 2.5 US dollars for one trillion multiplication gates over a 61 -bit field).
2. In the setting where part of the computation can be represented as a set of polynomials, we can achieve communication sublinear to the polynomial size: the communication only depends on the input size and the highest degree of all polynomials, independent of the number of polynomials and the number of multiplications in the polynomials.
Using the improved ZK protocol, we can prove matrix multiplication with communication proportional to the input size, rather than the number of multiplications. Proving the multiplication of two $1024 \times 1024$ matrices, our implementation, with one thread and 1 GB of memory, only needs 10 seconds and communicates $25 \mathrm{MB}, 35 \times$ faster than the state-of-the-art protocol Virgo that would need more than 140 GB of memory for the same task.

## 1 Introduction

Zero-knowledge (ZK) proofs allow a prover in possession of a witness $w$ to prove to a verifier that $\mathcal{C}(w)=1$ for some public circuit $\mathcal{C}$. Until recently, most works focus on non-interactive ZK proofs with a small proof size [e.g., $\mathrm{BCC}^{+} 16, \mathrm{BBB}^{+} 18, \mathrm{WTs}^{+} 18, \mathrm{BCR}^{+} 19$, BBHR19, Set20, GKR08, BFS20, ZXZS20, AHIV17, $\left.\mathrm{BFH}^{+} 20\right]$, most of which suffer from a scalability problem. Specifically, to prove a circuit with $t$ gates, these protocols require memory of size proportional to $t$. As a result, the memory constraint quickly becomes the bottleneck on the scale of computation that can be efficiently processed by a ZK proof.

Recently, ZK protocols with essentially unlimited scalability gained a lot of attention. Here, the protocol should use the minimal amount of memory possible, which is the memory required to evaluate the circuit in the clear. For example, ZK protocols based on privacy-free garbled circuits [JKO13, FNO15, HK20] satisfy this requirement and is also concretely efficient. However, due to the lower bound on garbled circuits, the best communication complexity one can hope for is $\kappa$ bits per AND gate in the gate-by-gate paradigm, where $\kappa$ is the computational security parameter. Some ZK proofs [GMO16, CDG ${ }^{+}$17, KKW18] using the "MPC-in-the-head" paradigm [IKOS07] are also streamable, thus achieving a small memory overhead, but the overall computation and communication complexities are still high. For zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs), the only exception is to use recursive composition [BCCT13]. However, compared to the state-of-the-art memory-heavy zk-SNARKs [Set20, SL20], those using recursive composition are still orders of magnitude slower [COS20, BDFG20].

A recent promising line of work based on (subfield) Vector Oblivious Linear Evaluation [BCGI18, $\mathrm{BCG}^{+}$19b, SGRR19, $\mathrm{BCG}^{+} 19 \mathrm{a}, \mathrm{YWL}^{+}$20], or (s)VOLE for short, provides a new direction to accomplish this goal. Weng, Yang, Katz and Wang [WYKW20] proposed a ZK protocol (Wolverine) that can prove circuits over any field with the communication complexity roughly $(3 \rho / \log t+1)$ field elements per multiplication gate where $t$ is the circuit size and $\rho$ is the statistical security parameter; in the case of large fields ${ }^{1}$, the communication can be reduced to 4 field elements (or 2 field elements with much higher computational cost) per multiplication gate. Another recent work by Baum, Malozemoff, Rosen and Scholl (Mac' $\mathrm{n}^{\prime}$ Cheese) [BMRS20] proposed a similar ZK protocol for large fields with the communication cost of 3 field elements per multiplication gate. Line-Point ZK (LPZK) by Dittmer, Ishai, and Ostrovsky [DIO20] achieves seemingly optimal communication cost (i.e., one field element per multiplication gate) in the gate-by-gate paradigm, but their protocol is only for large fields and has not been studied for implementation yet. See Table 1 for details.

Challenges. The recent line of work has significantly improved the efficiency of ZK proofs with essentially unlimited scalability. However, some challenges remain that we would like to solve in this paper.

1. LPZK is great in communication but it only supports large fields. For small fields (e.g., binary fields), the best known ZK protocol Wolverine is still based on the cut-and-choose technique and thus incurs a high overhead.
2. All of the above memory-efficient protocols require communication linear to the number of multiplication gates in the circuits (except for zk-SNARKs with recursive composition). One would naturally desire concretely efficient protocols with the same memory efficiency while only requiring sublinear communication cost. Mac' $n^{\prime}$ Cheese partially addressed this issue by incorporating the "stacking" technique [HK20], but its applicability is rather limited to circuits with many branches.

### 1.1 Our Contributions

In this paper, we present a set of new ZK protocols that achieve unprecedented practical performance while being able to scale to billions and even trillions of gates without any memory issue. Our circuit-based protocol only needs to send one field element per multiplication gate (and free for addition gates) for any field size with very efficient computational efficiency. Our polynomial-based protocol only needs to send $n+d$ field elements to prove a set of degree- $d$ polynomials on $n$ input values, independent of the number of polynomials and the number of multiplications in each polynomial. Combining both protocols, we are able to achieve sublinear communication as long as the circuit satisfies a weak notion of uniformity. We elaborate our results below.

[^0]| Protocol | Boolean Circuit |  | Arithmetic Circuit |  |
| :---: | :---: | :---: | :---: | :---: |
|  | Size | Speed | Size | Speed |
| Wolverine [WYKW20] | 7 | $1.25 \mathrm{M} / \mathrm{sec}$ | 4 | $0.66 \mathrm{M} / \mathrm{sec}$ |
| Mac' ${ }^{\prime}$ 'Cheese [BMRS20] | - | - | 3 | 0.4 M/sec |
| LPZK [DIO20] | - | - | 1 | - |
| QuickSilver | 1 | $7.7 \mathrm{M} / \mathrm{sec}$ | 1 | $4.8 \mathrm{M} / \mathrm{sec}$ |

Table 1: Comparing our work (QuickSilver) with prior related work. Size represents the number of field elements to send for each multiplication gate, which is also the number of (s)VOLE correlations needed. Speed represents the number of multiplication gates that can be executed per second with unlimited bandwidth and a single thread.

Optimally efficient ZK proof for any field in the gate-by-gate paradigm. As summarized in Table 1, the recent LPZK protocol achieves the best communication cost in the gate-by-gate paradigm for large fields. However, when it comes to small fields like Boolean circuits, the cost of the best-known protocol [WYKW20] is still much high due to cut-and-choose. Our ZK protocol is developed on top of Wolverine and LPZK to achieve the best of both protocols: it needs to send one field element and a single sVOLE correlation per non-linear gate for any field (including the case of Boolean circuits). See Section 3.1 for a detailed explanation of the protocol.

Our implementation can prove Boolean circuits at a speed of 7.7 million AND gates per second (with one thread) and can prove arithmetic circuits over a 61 -bit field at a speed of 4.8 million multiplication gates per second (with one thread). The performance could be doubled using four threads (See Section 6).

Efficient ZK proof beyond the gate-by-gate paradigm. The above protocol reminds us of the story in garbled circuits. After a long history of optimizations [BMR90, NPS99, KS08, KMR14], the half-gates garbling scheme [ZRE15] achieves free for XOR gates and $2 \kappa$-bit communication per AND gate, which is also proven to be optimal when the garbling algorithm takes one gate as input at a time (a.k.a., the gate-by-gate paradigm). The lower bound soon spawned new research to bypass it by garbling not in the gate-by-gate framework. For example, Ball et al. [BMR16] observed that garbling Boolean formulas as a whole can further improve communication efficiency.

Our optimized protocol puts us in a similar situation, where the gate-by-gate paradigm limits further improvement. Inspired by the above success story in garbling, we study the case of proving polynomial satisfiability as a whole. As we discuss in Section 3.2, we find that it is possible to prove a set of degree- $d$ polynomials all on $n$ variables with communication cost of $n+d$ field elements, independent of the number of polynomials or the number of multiplications in the polynomials.

This ZK proof for polynomial satisfiability has a direct application to prove knowledge of a solution of the short-integer-solution (SIS) problem, where verifying that all secret inputs are bounded in a small range can be viewed as low-degree polynomials. Our ZK protocol could prove knowledge of a solution of the SIS problem with $8 \times$ improvement in proof size and $110 \times$ improvement in execution time, compared with the state-of-the-art implementation [WYKW20].

Sublinear ZK proof for circuits with relaxed uniformity. The above efficient polynomial-based ZK protocol is sublinear when we compare the communication cost with the number of multiplications in the polynomial set. Using this result, we can improve the efficiency of many applications and reach communication sublinear to the circuit size as long as the circuit satisfies some weak notion of uniformity. More specifically, as long as the circuit contains many sub-circuits such that their polynomial representations are all bounded by some degree $d$, our technique can reduce the communication cost for proving these sub-
circuits. For example, we can prove matrix multiplication in communication complexity of $O\left(n^{2}\right)$ where two matrices are of dimension $n \times n$, and a gate-by-gate protocol needs $O\left(n^{3}\right)$ communication. We discuss more complicated examples, as well as the general applicability in Section 3.3, which opens a new space of improvement for real-world applications.

For matrix multiplication, our implementation can prove an instance with $n=1024$, which contains about one billion multiplications over a 61 -bit field, in about 25 MB of communication and 10 seconds using just 1 GB of memory. This is $35 \times$ faster than the prior best-known implementation [ZXZS20], which needs 148 GB of memory to accomplish this task.

## 2 Preliminaries

### 2.1 Notation

We use $\kappa$ and $\rho$ to denote the computational and statistical security parameters, respectively. We use $x \leftarrow S$ to denote that sampling $x$ uniformly at random from a finite set $S$. For $n \in \mathbb{N}$, we denote by $[n]$ a set $\{1, \ldots, n\}$. For $a, b \in \mathbb{Z}$ with $a \leq b$, we write $[a, b]=\{a, \ldots, b\}$. We use bold lower-case letters like $\boldsymbol{x}$ for column vectors, and denote by $x_{i}$ the $i$-th component of vector $\boldsymbol{x}$ where $x_{1}$ is the first entry.

For an extension field $\mathbb{F}_{p^{r}}$ of a finite field $\mathbb{F}_{p}$, where $p \geq 2$ is a prime or a power of a prime and $r \geq 1$ is an integer, we fix some monic, irreducible polynomial $f(X)$ of degree $r$ and write $\mathbb{F}_{p^{r}} \cong \mathbb{F}_{p}[X] / f(X)$. Therefore, every field element $w \in \mathbb{F}_{p^{r}}$ can be denoted uniquely as $w=\sum_{h \in[r]} w_{h} \cdot X^{h-1}$ with $w_{h} \in \mathbb{F}_{p}$ for all $h \in[r]$. We could view the elements over $\mathbb{F}_{p^{r}}$ equivalently as the vectors in $\left(\mathbb{F}_{p}\right)^{r}$. When we write arithmetic expressions involving both elements of $\mathbb{F}_{p}$ and elements of $\mathbb{F}_{p^{r}}$, it is understood that field elements in $\mathbb{F}_{p}$ are viewed as the polynomials lying in $\mathbb{F}_{p^{r}}$ that have only constant terms.

A circuit $\mathcal{C}$ over a field $\mathbb{F}_{p}$ is defined by a set of input wires $\mathcal{I}_{\text {in }}$, along with a list of gates of the form $(\alpha, \beta, \gamma, T)$, where $\alpha, \beta$ are the indices of the input wires of the gate, $\gamma$ is the index of the output wire of the gate, and $T \in\{$ Add, Mult $\}$ is the type of the gate. For proving the satisfiability of a circuit $\mathcal{C}$, we consider that there is only one output wire in circuit $\mathcal{C}$. If $p=2$, then $\mathcal{C}$ is a Boolean circuit with Add $=\oplus$ and Mult $=\wedge$. If $p>2$ is a prime or a power of a prime, then $\mathcal{C}$ is an arithmetic circuit where Add/Mult corresponds to addition/multiplication in $\mathbb{F}_{p}$. We let $t$ denote the number of multiplication gates in the circuit, and let $n=\left|\mathcal{I}_{\text {in }}\right|$.

A set of $t$ multivariable polynomials over $\mathbb{F}_{p}$ is denoted as $\left\{f_{1}, \ldots, f_{t}\right\}$, where each polynomial has degree at most $d$ and takes the same $n$ variables $\left(x_{1}, \ldots, x_{n}\right)$ as input. Every polynomial $f_{i}$ can be denoted as $\sum_{h \in[0, d]} f_{i, h}$, where all terms in $f_{i, h}$ have degree exactly $h$. For a vector $\boldsymbol{z}=\left(z_{1}, \ldots, z_{n}\right) \in \mathbb{F}_{p}^{n}$, we define $f_{i}(\boldsymbol{z})=f_{i}\left(z_{1}, \ldots, z_{n}\right)$ as a value obtained by computing the polynomial $f_{i}$ on $x_{j}=z_{j}$ for $j \in[n]$. Given a polynomial $f$ over $\mathbb{F}_{p}$, we could naturally define a polynomial over $\mathbb{F}_{p^{r}}$ by interpreting operations over $\mathbb{F}_{p}$ (i.e., additions and multiplications) to be operations over $\mathbb{F}_{p^{r}}$.

### 2.2 MACs and Functionalities

Information-Theoretic MACs. We use information-theoretic message authentication codes (IT-MACs) [BDOZ11, NNOB12] to authenticate values over $\mathbb{F}_{p}$ or $\mathbb{F}_{p^{r}}$. Specifically, let $\Delta \in \mathbb{F}_{p^{r}}$ be a global key, which is sampled uniformly at random and known only by one party $\mathcal{V}$. A value $x \in \mathbb{F}_{p}$ or $\mathbb{F}_{p^{r}}$ known by the other party $\mathcal{P}$ can be authenticated by giving $\mathcal{V}$ a uniform key $k \in \mathbb{F}_{p^{r}}$ and giving $\mathcal{P}$ the corresponding MAC tag $m=k-\Delta \cdot x \in \mathbb{F}_{p^{r}}$. We denote such an authenticated value by $[x]$. Note that authenticated values are additively homomorphic. In particular, given the public coefficients $c_{1}, \ldots, c_{\ell}, c \in \mathbb{F}_{p}$ or $\mathbb{F}_{p^{r}}$, the parties can compute $[y]=\sum_{i=1}^{\ell} c_{i} \cdot\left[x_{i}\right]+c$ locally, where $\mathcal{P}$ compute $y:=\sum_{i=1}^{\ell} c_{i} \cdot x_{i}+c, m_{y}:=\sum_{i=1}^{\ell} c_{i} \cdot m_{x_{i}}$, and $\mathcal{V}$ computes $k_{y}:=\sum_{i=1}^{\ell} c_{i} \cdot k_{x_{i}}+c \cdot \Delta$. We extend the above notation to vectors of authenticated

## Functionality $\mathcal{F}_{\mathrm{ZK}}$

Input. Upon receiving (input, $i d, w$ ) from a prover $\mathcal{P}$ and (input, $i d$ ) from a verifier $\mathcal{V}$, with $i d$ a fresh identifier, store $(i d, w)$.

Prove circuit satisfiability. Upon receiving (prove, $\mathcal{C}, i d_{1}, \ldots, i d_{n}$ ) from $\mathcal{P}$ and (verify, $\mathcal{C}, i d_{1}, \ldots, i d_{n}$ ) from $\mathcal{V}$ where $i d_{1}, \ldots, i d_{n}$ are present in memory, retrieve $\left(i d_{i}, w_{i}\right)$ for $i \in[n]$ and define a vector $\boldsymbol{w}=\left(w_{1}, \ldots, w_{n}\right)$. Send true to $\mathcal{V}$ if $\mathcal{C}(\boldsymbol{w})=1$ and false otherwise.
Prove polynomial satisfiability. Upon receiving (prove, $\left\{f_{i}\right\}_{i \in[t]}, i d_{1}, \ldots, i d_{n}$ ) from $\mathcal{P}$ and (verify, $\left\{f_{i}\right\}_{i \in[t]}$, $\left.i d_{1}, \ldots, i d_{n}\right)$ from $\mathcal{V}$ where $i d_{1}, \ldots, i d_{n}$ are present in memory, retrieve $\left(i d_{i}, w_{i}\right)$ for $i \in[n]$ and define a vector $\boldsymbol{w}=\left(w_{1}, \ldots, w_{n}\right)$. Then send true to $\mathcal{V}$ if $f_{i}(\boldsymbol{w})=0$ for all $i \in[t]$ and false to $\mathcal{V}$ otherwise.

Figure 1: Zero-knowledge functionality for circuit and polynomial satisfiability.
values as well. In this case, $[\boldsymbol{x}]$ means that $\mathcal{P}$ holds $\boldsymbol{x} \in \mathbb{F}_{p}^{n}$ and $\boldsymbol{m} \in \mathbb{F}_{p^{r}}^{n}$, while $\mathcal{V}$ holds $\boldsymbol{k} \in \mathbb{F}_{p^{r}}^{n}$ with $\boldsymbol{k}=\boldsymbol{m}+\Delta \cdot \boldsymbol{x}$.
Security model and functionalities. All our protocols are proven in the universal composability (UC) framework [Can01] in the presence of a malicious, static adversary. We provide a brief overview of the UC framework in the Appendix A. Our ZK functionality is shown in Figure 1. As we provide the protocols for both circuits and polynomial sets, we extend the ZK functionality accordingly.

We use the standard sVOLE functionality $\left[\mathrm{BCG}^{+} 19 \mathrm{~b}, \mathrm{BCG}^{+} 19 \mathrm{a}\right]$ shown in Figure 6 of Appendix A to generate authenticated values, where $\mathcal{P}$ obtains $\boldsymbol{x} \in \mathbb{F}_{p}^{\ell}$ and $\boldsymbol{m} \in \mathbb{F}_{p^{r}}^{\ell}$, and $\mathcal{V}$ obtains $\Delta \in \mathbb{F}_{p^{r}}$ and $\boldsymbol{k} \in \mathbb{F}_{p^{r}}^{\ell}$, such that $\boldsymbol{m}=\boldsymbol{k}-\Delta \cdot \boldsymbol{x}$. This sVOLE functionality can be efficiently realized using the recent LPNbased protocols [SGRR19, $\mathrm{BCG}^{+} 19 \mathrm{a}, \mathrm{YWL}^{+} 20$, WYKW20]. These protocols have the communication complexity sublinear to the number of resulting sVOLE correlations.

## 3 Technical Overview

In this section, we provide the intuition of our ZK protocols, and leave the full protocol description, as well as the proofs of security, to later sections.

### 3.1 Zero-Knowledge Proof for Circuit Satisfiability over Any Sized Field

In the previous section, we introduced authenticated values, which can also be viewed as a way for the prover to commit values to the verifier. It is a non-interactive commitment with an interactive preprocessing phase. Given this tool, recent work [WYKW20, DIO20, BMRS20] designed efficient zero-knowledge proofs for circuit satisfiability with high scalability and communication linear to the circuit size. These works all follow the same paradigm below.

1. The prover first commits to all wire values in the circuit to the verifier, which takes $n+t$ field elements of communication and $n+t$ number of (s)VOLE correlations, where $n$ is the input size and $t$ is the number of multiplication gates. Because the underlying commitment is additively homomorphic, addition gates can be processed for free.
2. The prover proves that the committed values on multiplication gates are correct by executing a checking procedure with the verifier. This is where prior works differ. The first approach used in ZK proofs Wolverine [WYKW20] and Mac' $n^{\prime}$ Cheese [BMRS20] uses the VOLE-based commitment as a black box and follow the ideas in secure multi-party computation (MPC) to construct many authenticated/committed

Beaver triples. These Beaver triples can then be used to check the correctness of the first step. The second approach by LPZK [DIO20] uses the fact that IT-MACs are linear relationships. Their checking procedure needs only one extra field element per multiplication gate, which can be further eliminated using a random oracle. The resulting protocol appears to be optimal for large fields in this gate-by-gate paradigm.

Our ZK protocol in the gate-by-gate paradigm. Our protocol in the circuit-based setting can be viewed as a hybrid approach of Wolverine, which uses subfield VOLE to support any field size, and LPZK, which crucially relies on the fact that the VOLE-based commitments are linear relationships.

Similar to prior work, we use a linear scan on the circuit to compute the authenticated values on all the wires in the circuit. In particular, for each multiplication gate, the prover $\mathcal{P}$ has $\left(w_{\alpha}, m_{\alpha}\right),\left(w_{\beta}, m_{\beta}\right)$, $\left(w_{\gamma}, m_{\gamma}\right) \in \mathbb{F}_{p} \times \mathbb{F}_{p^{r}}$, and the verifier $\mathcal{V}$ holds $k_{\alpha}, k_{\beta}, k_{\gamma}, \Delta \in \mathbb{F}_{p^{r}}$ such that the following four equations hold:

$$
w_{\gamma}=w_{\alpha} \cdot w_{\beta} \quad \text { and } \quad m_{i}=k_{i}-w_{i} \cdot \Delta \text { for } i \in\{\alpha, \beta, \gamma\}
$$

If $\mathcal{P}$ is malicious, the first equation could potentially be incorrect and our task is to check that this relationship holds for all gates. Although the last three equations are linear equations from the perspective of the verifier, the first equation is not linear. The crucial observation is that it is possible to convert the non-linear checking to a linear checking. Specifically, we observe that for the $i$-th multiplication gate with wire values $\left(w_{\alpha}, w_{\beta}, w_{\gamma}\right)$, if it is computed correctly (i.e., $\left.w_{\gamma}=w_{\alpha} \cdot w_{\beta}\right)$, then we have the following relation:

$$
\begin{aligned}
& \overbrace{B_{i}=k_{\alpha} \cdot k_{\beta}-k_{\gamma} \cdot \Delta}^{\text {known to } \mathcal{V}} \\
& =\left(m_{\alpha}+w_{\alpha} \cdot \Delta\right) \cdot\left(m_{\beta}+w_{\beta} \cdot \Delta\right)-\left(m_{\gamma}+w_{\gamma} \cdot \Delta\right) \cdot \Delta \\
& =m_{\alpha} \cdot m_{\beta}+\left(w_{\beta} \cdot m_{\alpha}+w_{\alpha} \cdot m_{\beta}-m_{\gamma}\right) \cdot \Delta+\left(w_{\alpha} \cdot w_{\beta}-w_{\gamma}\right) \cdot \Delta^{2} \\
& =\underbrace{m_{\alpha} \cdot m_{\beta}}_{\begin{array}{c}
\text { known to } \mathcal{P} \\
\text { denoted as } A_{0, i}
\end{array}}+\underbrace{\left.w_{\beta} \cdot m_{\alpha}+w_{\alpha} \cdot m_{\beta}-m_{\gamma}\right)}_{\begin{array}{c}
\text { known to } \mathcal{P} \\
\text { denoted as } A_{1, i}
\end{array}} \cdot \underbrace{\Delta}_{\text {known to } \mathcal{V}} \\
& \text { global key }
\end{aligned}
$$

We can see that the above relationship is now linear and very similar to the IT-MAC relationship. What's more, we also show in Section 4 that if the underlying wire values (i.e., $w_{\alpha}, w_{\beta}, w_{\gamma}$ ) are not computed correctly, then the above relationship can hold only with probability $2 / p^{r}$ : now it becomes a quadratic equation of $\Delta$, where there are at most two values of $\Delta$ that satisfy the equation.

Now when we look at a circuit with $t$ multiplication gates, we can obtain one such relationship for each gate. Namely, for each $i \in[t], \mathcal{P}$ has $A_{0, i}, A_{1, i} \in \mathbb{F}_{p^{r}}$ and $\mathcal{V}$ has $B_{i} \in \mathbb{F}_{p^{r}}$ such that $B_{i}=A_{0, i}+A_{1, i} \cdot \Delta$. We can check all $t$ linear relations in a batch using a random linear combination. In particular, the verifier samples a uniform element $\chi \in \mathbb{F}_{p^{r}}$ after the above values have been defined, and then checks that the following relationship holds:


By the verifier sending just one field element (i.e., $\chi$ ), we are able to reduce checking $t$ equations in the circuit to checking the above single equation, that is $B=A_{0}+A_{1} \cdot \Delta$, where $\mathcal{V}$ has $B$ and $\Delta$, while $\mathcal{P}$ has $A_{0}$ and $A_{1}$. This could be easily checked by using a random linear relationship $B^{*}=A_{0}^{*}+A_{1}^{*} \cdot \Delta$ with
$B^{*}, A_{0}^{*}, A_{1}^{*} \in \mathbb{F}_{p^{r}}$ to mask field elements $A_{0}$ and $A_{1}$, and then opening the masked elements. In particular, $\mathcal{P}$ sends $U=A_{0}+A_{0}^{*}$ and $V=A_{1}+A_{1}^{*}$ to $\mathcal{V}$, who checks that $B+B^{*}=U+V \cdot \Delta$. Finally, this random linear relationship over $\mathbb{F}_{p^{r}}$ can be easily obtained by generating subfield VOLE correlations on $\mathbb{F}_{p}$ and packing them to $\mathbb{F}_{p^{r}}$.

Note that the online phase of the ZK protocol where the circuit and witness are known, can be made non-interactive by computing $\chi$ using a random oracle to hash the transcript up to that point. In Section 4, we provide the detailed description of our ZK protocol for circuit satisfiability and prove that it is UC-secure. We report the performance of the protocol in Section 6.

### 3.2 Zero-Knowledge Proof for Polynomial Sets

The above ZK protocol is great: for Boolean circuits, it sends only one bit per AND gate in the sVOLEhybrid model and supports free-XOR. We believe that in the gate-by-gate paradigm, the communication cost is optimal: intuitively, each AND gate has to require some communication when we process each gate individually.

This reminds us of the successful story of garbling, where a progression of research reached the halfgates garbling scheme [ZRE15] proven to be optimal in the gate-by-gate paradigm. However, a lower bound is merely a way to rule out certain approaches and in this case, the garbling lower bound implies that a better garbling scheme either needs to use something beyond random oracle and linear operations, or needs to go beyond the gate-by-gate paradigm and look at more gates as a whole. Indeed, the follow-up works [MPas15, BMR16, KP17] have shown that it is indeed possible to garble a formula more efficiently. In our second protocol, we apply the same philosophy to our ZK protocol for circuit satisfiability.

Our ZK protocol: proving inner product with smaller cost. Recall that in the above idea when we prove a multiplication gate with wire values $(x, y, z)$, two parties essentially prove $f(x, y, z)=x \cdot y-$ $z=0$. This could be viewed as a degree-2 polynomial on three variables. Let's first generalize it to a degree-2 polynomial with more than just one multiplication. Suppose $f$ is a degree- 2 polynomial such that $f\left(x_{1}, \ldots, x_{n}\right)=c_{0}+\sum_{i \in[n / 2]} c_{i} \cdot x_{i} \cdot x_{n / 2+i}$. Two parties hold authenticated values $\left[w_{1}\right], \ldots,\left[w_{n}\right]$, and the prover wants to prove $f\left(w_{1}, \ldots, w_{n}\right)=0$. We will follow the similar thinking process as above. In particular, observe that

$$
\begin{aligned}
f & \left(k_{1}, \ldots, k_{n}\right)=c_{0}+\sum_{i \in[n / 2]} c_{i} \cdot k_{i} \cdot k_{n / 2+i} \\
= & c_{0}+\sum_{i \in[n / 2]} c_{i} \cdot\left(m_{i}+w_{i} \cdot \Delta\right) \cdot\left(m_{n / 2+i}+w_{n / 2+i} \cdot \Delta\right) \\
= & \left(c_{0}+\sum_{i \in[n / 2]} c_{i} \cdot m_{i} \cdot m_{n / 2+i}\right)+\left(\sum_{i \in[n / 2]} c_{i} \cdot m_{i} \cdot w_{n / 2+i}+c_{i} \cdot m_{n / 2+i} \cdot w_{i}\right) \cdot \Delta+ \\
& \left(\sum_{i \in[n / 2]} c_{i} \cdot w_{i} \cdot w_{n / 2+i}\right) \cdot \Delta^{2} \\
= & \left(c_{0}+\sum_{i \in[n / 2]} c_{i} \cdot m_{i} \cdot m_{n / 2+i}\right)+\left(\sum_{i \in[n / 2]} c_{i} \cdot m_{i} \cdot w_{n / 2+i}+c_{i} \cdot m_{n / 2+i} \cdot w_{i}\right) \cdot \Delta-c_{0} \cdot \Delta^{2} .
\end{aligned}
$$

The last equation is due to the fact that $f\left(w_{1}, \ldots, w_{n}\right)=c_{0}+\sum_{i \in[n / 2]} c_{i} \cdot w_{i} \cdot w_{n / 2+i}=0$. Reorganizing the above equation a bit, we can obtain the following equation:

$$
\underbrace{f\left(k_{1}, \ldots, k_{n}\right)+c_{0} \cdot \Delta^{2}}_{\text {known to } \mathcal{V} \text {, denoted as } B}=\underbrace{\left(c_{0}+\sum_{i \in[n / 2]} c_{i} \cdot m_{i} \cdot m_{n / 2+i}\right)}_{\text {known to } \mathcal{P} \text {, denoted as } A_{0}}+\underbrace{\left(\sum_{i \in[n / 2]} c_{i} \cdot m_{i} \cdot w_{n / 2+i}+c_{i} \cdot m_{n / 2+i} \cdot w_{i}\right) \cdot \Delta .}_{\text {known to } \mathcal{P} \text {, denoted as } A_{1}}
$$

This is still a linear relationship $B=A_{0}+A_{1} \cdot \Delta$, which we could prove easily as in our first protocol. Essentially, we can prove a degree-2 polynomial with $n / 2$ multiplications with a communication cost of just $O(1)$, in addition to the cost of committing the witness. This is independent of the number of multiplications in the polynomial, which could be as many as $O(n)$. One immediate observation is that if we have $t$ such polynomials to be proven, the total communication cost is still $O(1)$ rather than $O(t)$, by using the same random-linear-combination idea to reduce all linear checks to a single check. This protocol immediately allows us to prove inner product faster than the circuit-based ZK protocol.
Our ZK protocol: proving arbitrary degree- $d$ polynomials. The above protocol is not as generalized as we want in two aspects: 1) it only handles degree-2 polynomials, not ones with a higher degree; 2) when being executed for $t$ times, we are assuming the polynomials are the same. Now we generalize it to the most extensive format. We assume that the witness is $\left(w_{1}, \ldots, w_{n}\right) \in \mathbb{F}_{p}^{n}$; there are totally $t$ polynomials to be proven and each multivariable polynomial $f_{i}\left(x_{1}, \ldots, x_{n}\right)$ over $\mathbb{F}_{p}$ has a degree at most $d$. The prover wants to prove that $f_{i}\left(w_{1}, \ldots, w_{n}\right)=0$ for all $i \in[t]$. Note that a polynomial can only take a subset of witness as input, and the witnesses that are not used will be not involved in the evaluation of the polynomial. Below, we show how to prove such polynomial set in communication of $d$ field elements over $\mathbb{F}_{p^{r}}$, in addition to the $n$ field elements over $\mathbb{F}_{p}$ to commit the witness.

When there are $n$ variables and degree up to $d$, there are surprisingly high number of terms, making it difficult to even write it out. Note that in practice commonly used polynomials only have a small number of terms, but we do not want to limit in any way for our ZK protocol. As a result, for every $n$-variable $d$-degree polynomial $f \in\left\{f_{1}, \ldots, f_{t}\right\}$, we will represent it as $f\left(x_{1}, \ldots, x_{n}\right)=\sum_{h \in[0, d]} g_{h}\left(x_{1}, \ldots, x_{n}\right)$, where $g_{h}$ is a degree- $h$ polynomial such that all terms in $g_{h}$ have exactly degree $h$. This turns out to help in our calculation. Now we apply the same idea as follows:

$$
\begin{aligned}
& f\left(k_{1}, \ldots, k_{n}\right)=f\left(m_{1}+w_{1} \cdot \Delta, \ldots, m_{n}+w_{n} \cdot \Delta\right) \\
& =\sum_{h \in[0, d]} g_{h}\left(m_{1}+w_{1} \cdot \Delta, \ldots, m_{n}+w_{n} \cdot \Delta\right) \\
& =\sum_{h \in[0, d]}\left(g_{h}\left(w_{1}, \ldots, w_{n}\right) \cdot \Delta^{h}+\sum_{j \in[0, h-1]} A_{h}^{j} \cdot \Delta^{j}\right) \\
& =\sum_{h \in[0, d]} g_{h}\left(w_{1}, \ldots, w_{n}\right) \cdot \Delta^{h}+\sum_{h \in[0, d]} \sum_{j \in[0, h-1]} A_{h}^{j} \cdot \Delta^{j} .
\end{aligned}
$$

In the above equation, $A_{h}^{j}$ is the coefficient in front of $x^{j}$ for the univariate polynomial $q_{h}(x)=g_{h}\left(m_{1}+\right.$ $\left.w_{1} \cdot x, \ldots, m_{n}+w_{n} \cdot x\right)$. Therefore, $q_{h}(\Delta)=g_{h}\left(w_{1}, \ldots, w_{n}\right) \cdot \Delta^{h}+\sum_{j \in[0, h-1]} A_{h}^{j} \cdot \Delta^{j}$.

At this point, we are stuck on the first part of the above equation. Recall that the protocol works in the multiplication-gate case, as the polynomial $f(x, y, z)=x \cdot y-z$ is exactly the coefficient of $\Delta^{2}$ and thus proving that the relationship is linear implies that this coefficient is zero. In the case of inner product, there is a constant term $c_{0}$, but we are able to get around with it since $c_{0}$ is public. Here, the polynomial is generic and there may be lower-order terms on $\Delta$. Ideally, we want an equation with the term $\left(\sum_{h \in[0, d]} g_{h}\left(w_{1}, \ldots, w_{n}\right)\right) \cdot \Delta^{d}$, and then prove that the rest part of the equation is a $(d-1)$-degree polynomial. To sum up, we need a way to shift each sub-polynomial so that the terms align. It turns out to be possible. See below.

Now instead of evaluating $f\left(k_{1}, \ldots, k_{n}\right)$ as we always do, we will decompose this polynomial and shift
each sub-polynomial. The verifier now computes

$$
\begin{aligned}
\sum_{h \in[0, d]} g_{h}\left(k_{1}, \ldots, k_{n}\right) \cdot \Delta^{d-h} & =\sum_{h \in[0, d]} g_{h}\left(m_{1}+w_{1} \cdot \Delta, \ldots, m_{n}+w_{n} \cdot \Delta\right) \cdot \Delta^{d-h} \\
& =\sum_{h \in[0, d]}\left(g_{h}\left(w_{1}, \ldots, w_{n}\right) \cdot \Delta^{d}+\sum_{j \in[0, h-1]} A_{h}^{j} \cdot \Delta^{j+d-h}\right) \\
& =\sum_{h \in[0, d]} g_{h}\left(w_{1}, \ldots, w_{n}\right) \cdot \Delta^{d}+\sum_{h \in[0, d-1]} A_{h} \cdot \Delta^{h} \\
& =f\left(w_{1}, \ldots, w_{n}\right) \cdot \Delta^{d}+\sum_{h \in[0, d-1]} A_{h} \cdot \Delta^{h} \\
& =\sum_{h \in[0, d-1]} A_{h} \cdot \Delta^{h} .
\end{aligned}
$$

Here $A_{h}^{j}$ is defined as above, and $A_{h}$ is the aggregated coefficient for all terms with $\Delta^{h}$. Note that the prover with witnesses $\left\{w_{i}\right\}$ and MACs $\left\{m_{i}\right\}$ can compute all the coefficients locally. The coefficients $\left\{A_{h}\right\}$ are polynomial coefficients when we treat it as a single-variable polynomial on $\Delta$. Therefore, we can compute $\left\{A_{h}\right\}$ efficiently by evaluating the polynomial on $d+1$ points and then use number theoretic transformation to interpolate it. In many practical applications, the polynomial is usually simple and thus the coefficients can be derived without using the above generic approach. This relationship can be viewed as an oblivious polynomial evaluation (OPE), where the verifier has $\Delta$ and the prover has a polynomial $P(x)=\sum_{h \in[0, d-1]} A_{h} \cdot x^{h}$ over $\mathbb{F}_{p^{r}}$. The verifier wants to check that the resulting evaluation in the above equation is the same as $P(\Delta)$. It is not hard to check the above polynomial relation, as sVOLE can be used to generate (V)OPE in an efficient way (see Section 4.1 for details). Similarly, we can perform the checks for all $t$ polynomials in a batch using the random linear combination. This results in a total communication of $(n+d r) \log p$ bits in the sVOLE-hybrid model.

### 3.3 Other Useful Results

By applying the above results in different settings, we can obtain the following interesting results.
Proving knowledge of solutions to lattice problems. Immediately, our ZK protocol shows a way to prove low-degree polynomials with very high efficiency. This could help us to prove knowledge of a solution of the short integer solution (SIS) problem. In particular, we assume that the prover knows a vector $s$, such that $\mathbf{A} \cdot \boldsymbol{s}=\boldsymbol{t}$ and $s \in[-B, B]^{m}$ for a small value $B \in \mathbb{N}\left(\right.$ or $\left.s \in\{0,1\}^{m}\right)$, where both parties know the public matrix $\mathbf{A} \in \mathbb{Z}_{q}^{n \times m}$ and vector $\boldsymbol{t} \in \mathbb{Z}_{q}^{n}$ (here we assume that $q$ is a prime). The main cost is to prove that all input values are bounded, which can be modeled as a set of low-degree polynomials, something our ZK protocol is very good at. As a result, our ZK protocol significantly outperforms all prior works for proving knowledge of solutions to SIS problems. In particular, for $s \in\{-1,0,1\}^{m}$, our protocol uses $8 \times$ less communication and $110 \times$ less execution-time compared to the state-of-the-art protocol Wolverine [WYKW20]. See Section 5.2 for details of our solution and Section 6.2 for performance evaluation.

Proving matrix multiplication with sublinear communication. Here, the prover intends to prove that $\mathbf{A} \cdot \mathbf{B}=\mathbf{C}$, where $\mathbf{A}, \mathbf{B} \in \mathbb{F}_{p}^{n \times n}$ are two secret matrices and $\mathbf{C} \in \mathbb{F}_{p}^{n}$ is a public matrix. Using the circuit-based ZK protocol, this will need $O\left(n^{3}\right)$ communication. In the previous section, we present a ZK protocol for inner product of two vectors with $O(1)$ communication (in addition to the cost of committing the witness), which is independent of the number of the inner-product executions. This immediately gives us a ZK protocol for proving matrix multiplication with $O\left(n^{2}\right)$ communication for committing two secret

## Functionality $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$

Initialize: Upon receiving (init) from $\mathcal{P}$ and $\mathcal{V}$, sample $\Delta \leftarrow \mathbb{F}_{p^{r}}$ if $\mathcal{V}$ is honest, and receive $\Delta \in \mathbb{F}_{p^{r}}$ from the adversary otherwise. Store $\Delta$ and send it to $\mathcal{V}$, and ignore all subsequent (init) commands.

Extend: This procedure can be run multiple times. Upon receiving (extend, $\ell$ ) from $\mathcal{P}$ and $\mathcal{V}$, do the following:

1. If $\mathcal{V}$ is honest, sample $\boldsymbol{k} \leftarrow \mathbb{F}_{p^{r}}^{\ell}$. Otherwise, receive $\boldsymbol{k} \in \mathbb{F}_{p^{r}}^{\ell}$ from the adversary.
2. If $\mathcal{P}$ is honest, sample $\boldsymbol{x} \leftarrow \mathbb{F}_{p}^{\ell}$ and compute $\boldsymbol{m}:=\boldsymbol{k}-\Delta \cdot \boldsymbol{x} \in \mathbb{F}_{p^{r}}^{\ell}$. Otherwise, receive $\boldsymbol{x} \in \mathbb{F}_{p}^{\ell}$ and $\boldsymbol{m} \in \mathbb{F}_{p^{r}}^{\ell}$ from the adversary, and then recompute $\boldsymbol{k}:=\boldsymbol{m}+\Delta \cdot \boldsymbol{x} \in \mathbb{F}_{p^{r}}^{\ell}$.
3. Send $(\boldsymbol{x}, \boldsymbol{m})$ to $\mathcal{P}$ and $\boldsymbol{k}$ to $\mathcal{V}$.

Vector Oblivious Polynomial Evaluation: Upon receiving (VOPE, $d$ ) from $\mathcal{P}$ and $\mathcal{V}$, do the following:

1. If $\mathcal{V}$ is honest, sample $B \leftarrow \mathbb{F}_{p^{r}}$. Otherwise, receive $B \in \mathbb{F}_{p^{r}}$ from the adversary.
2. If $\mathcal{P}$ is honest, sample $A_{i} \leftarrow \mathbb{F}_{p^{r}}$ for $i \in[d]$ and compute $A_{0}:=B-\sum_{i \in[d]} A_{i} \cdot \Delta^{i}$. Otherwise, receive $\left\{A_{i}\right\}_{i \in[0, d]}$ with $A_{i} \in \mathbb{F}_{p^{r}}$ from the adversary and re-compute $B:=\sum_{i \in[0, d]} A_{i} \cdot \Delta^{i}$.
3. Send $\left\{A_{i}\right\}_{i \in[0, d]}$ to $\mathcal{P}$ and $B$ to $\mathcal{V}$.

Figure 2: Functionality for extended subfield VOLE.
matrices and additional $O(1)$ communication for proving the matrix relationship $\left(\mathbf{A} \cdot \mathbf{B}\right.$ has $n^{2}$ executions of inner product between two vectors of dimension $n$ ).

Proving integer multiplication over a ring with a linear amortized communication cost. The multiplication of two $n$-bit integers needs $O\left(n^{2}\right)$ AND gates. Although multiplication is more efficient over large fields, we may prefer computing over a ring $\mathbb{Z}_{2^{n}}$ (e.g., $n=32$ ), for applications where matching cleartext computation is crucial. Now, we could actually prove the multiplication of two integers in communication of $O(n)$ bits. In particular, it is known that $n$-bit multiplier can be represented as a Boolean circuit of depth $2 \log n+3$ [BHWK16]. Therefore, the polynomial that represents each bit in the multiplication has degree at most $8 n^{2}$. If there are $t$ integer multiplications to be proven, the amortized communication cost of each multiplication will be $3 n+\frac{8 n^{2} \kappa}{t}$ bits, which becomes $4 n$ bits when $t \approx 8 n \kappa$. Here the communication of $3 n$ bits for each integer multiplication is used to commit input and output values.

Generic sublinear ZK proof via amortization. If the amortization is considered, then one immediate result is that we can prove $t$ number of circuits with each $N$ multiplication gates on totally $n$ variables in amortized communication sublinear to the circuit size. This is because the polynomial that represents an $N$-sized circuit can have a degree at most $N$. If the number $t=O(\sqrt{N})$ of circuits to be proven, then the amortized communication cost per circuit is $O\left(\frac{n+N}{t}\right) \approx O(\sqrt{N})$. If each circuit has a more compact polynomial representation (in terms of the highest degree), the communication saving will be even higher.

We could develop on top of this idea for general circuits with weak uniformity. Specifically, given a large circuit $C$, we can identify all sub-circuits denoted by $C_{1}, \ldots, C_{t}$, that have compact polynomial representations. Now, we can prove the evaluation of gates in $C \backslash\left\{C_{1}, \ldots, C_{t}\right\}$ using our ZK protocol for circuit satisfiability, which provides us with the commitments on all input wires and the wires related to the gates. Then we can prove all the sub-circuits $C_{1}, \ldots, C_{t}$ in a communication cost linear to the highest degree of all the circuits represented as polynomials. Note that the input and output values of all the sub-circuits have been committed in the first step, and thus no extra cost is needed. Here we only need a reasonably weak uniformity: we do not require all sub-circuits to be of the same size or the same topology; instead we only need that the polynomial representations of sub-circuits are all bounded by some degree $d$.

## 4 Zero-Knowledge Proof for Circuit Satisfiability over Any Field

In this section, we present our ZK protocol for circuit satisfiability over any field with communication of only one field element per multiplication gate using sVOLE as a subroutine. First of all, we introduce a functionality (and the corresponding protocol) that extends sVOLE to additionally support vector oblivious polynomial evaluation (VOPE), which is crucial for our ZK protocols in this section and the next section. Then, we provide the details of our ZK protocol, and give the formal security proof.

### 4.1 Extended Subfield Vector Oblivious Linear Evaluation

Extended sVOLE functionality. To accommodate our efficient ZK protocols for circuits and polynomial sets (described in Section 4.2 and Section 5), we propose an extended sVOLE functionality $\mathcal{F}_{\text {ext-sVoLE }}^{p, r}$ defined in Figure 2 to generate authenticated values and special correlations related to random polynomials. This functionality is the same as that shown in Figure 6 of Appendix A, except that it additionally allows two parties to obtain vector oblivious polynomial evaluation (VOPE) correlations over $\mathbb{F}_{p^{r}}$ with the guarantee that the same global key $\Delta$ is used between sVOLE and VOPE. In particular, given a polynomial-degree $d$ input by both parties, this functionality will sample $d+1$ uniform coefficients over extension field $\mathbb{F}_{p^{r}}$ to define a random polynomial $g$, and then output the coefficients to a party $\mathcal{P}$ and $g(\Delta)$ to the other party $\mathcal{V}$.
Protocol for realizing extended sVOLE functionality. We construct the protocol to UC-realize functionality $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$ by extending the sVOLE protocol. Recall that our extended functionality can be viewed as adding the support to output VOPE correlations over extension field $\mathbb{F}_{p^{r}}$. Our protocol to accomplish it takes two steps: 1) packing subfield VOLE correlations between $\mathbb{F}_{p}$ and $\mathbb{F}_{p^{r}}$ into VOLE correlations over $\mathbb{F}_{p^{r}} ; 2$ ) multiplying independent VOLE correlations to obtain a VOPE correlation. We note that a malicious party $\mathcal{V}$ could cause the outputting coefficients $A_{1}, \ldots, A_{d-1}$ of honest party $\mathcal{P}$ to be always 0 by setting $\Delta=0$ and all its keys as 0 . To prevent the attack, we iteratively multiply the VOLE correlations over $\mathbb{F}_{p^{r}}$, and use an extra independent VOLE correlation to randomize the product of VOLE correlations after multiplication is computed in every iteration. Details of the protocol are described in Figure 3.

The security of this protocol is proved in the following theorem, where the proof of this theorem is postponed to Appendix B.

Theorem 1. Protocol $\Pi_{\text {ext-sVole }}^{p, r}$ shown in Figure 3 UC-realizes functionality $\mathcal{F}_{\text {ext-sVole }}^{p, r}$ with statistical error $(d-1) / p^{r}$ and the information-theoretic security in the $\mathcal{F}_{\mathrm{sVOLE}}^{p, r}-h y b r i d$ model.

### 4.2 Our ZK Protocol using extended sVOLE

We have already discussed the intuition of our ZK protocol for circuit satisfiability over any field in Section 3.1, and thus here directly describe the details of the protocol in Figure 4. In the following, we prove the security of our ZK protocol, and show that the online phase can be made non-interactive in the random oracle model.

Proof of security. When both parties are honest, we easily see that the verifier will output true with probability 1. In particular, according to the description in Section 3.1, we have that the check in protocol $\Pi_{\mathrm{ZK}}^{p, r}$ always passes for an honest execution. For an honest protocol execution, we always have that $w_{h}=1$ (and thus $k_{h}=m_{h}+\Delta$ ) for the single output wire $h$. Overall, our ZK protocol $\Pi_{\mathrm{ZK}}^{p, r}$ shown in Figure 4 has the perfect completeness.

Theorem 2. Protocol $\Pi_{\mathrm{ZK}}^{p, r}$ UC-realizes functionality $\mathcal{F}_{\mathrm{ZK}}$ that proves the circuit satisfiability in the $\mathcal{F}_{\mathrm{ext-sVOLE}}{ }^{p}$ hybrid model with soundness error $(t+3) / p^{r}$ and the information-theoretic security.

## Protocol $\Pi_{\text {ext-sVOLE }}^{p, r}$

Initialize. $\mathcal{P}$ and $\mathcal{V}$ send (init) to $\mathcal{F}_{\text {sVOLE }}^{p, r}$, which returns a uniform $\Delta \in \mathbb{F}_{p^{r}}$ to $\mathcal{V}$.
Generate sVOLE correlations. On input (extend, $\ell$ ), two parties $\mathcal{P}$ and $\mathcal{V}$ call $\mathcal{F}_{\text {sVOLE }}^{p, r}$ to directly generate $\ell$ sVOLE correlations.

Generate VOPE correlations. On input (VOPE, $d$ ), two parties $\mathcal{P}$ and $\mathcal{V}$ execute the following:

1. For each $i \in[2 d-1]$, two parties perform as follows:
(a) Both parties send (extend, $r$ ) to $\mathcal{F}_{\text {sVOLE }}^{p, r}$, which returns $\left\{\left(m_{h}, u_{h}\right)\right\}_{h \in[r]}$ to $\mathcal{P}$ and $\left\{k_{h}\right\}_{h \in[r]}$ to $\mathcal{V}$ such that $k_{h}=m_{h}+u_{h} \cdot \Delta \in \mathbb{F}_{p^{r}}$ and $u_{h} \in \mathbb{F}_{p}$ for $h \in[r]$.
(b) $\mathcal{P}$ computes $M_{i}:=\sum_{h \in[r]} m_{h} \cdot X^{h-1} \in \mathbb{F}_{p^{r}}$ and $U_{i}:=\sum_{h \in[r]} u_{h} \cdot X^{h-1} \in \mathbb{F}_{p^{r}}$, and $\mathcal{V}$ computes $K_{i}:=\sum_{h \in[r]} k_{h} \cdot X^{h-1} \in \mathbb{F}_{p^{r}}$, where $K_{i}=M_{i}+U_{i} \cdot \Delta$.
2. $\mathcal{P}$ defines $g_{1}(x)=M_{1}+U_{1} \cdot x$, and $\mathcal{V}$ sets $B_{1}=K_{1}$. If $d>1$, from $i=1$ to $d-1$, two parties execute as follows:
(a) $\mathcal{P}$ computes the following univariate polynomial:

$$
g_{i+1}(x)=g_{i}(x) \cdot\left(M_{i+1}+U_{i+1} \cdot x\right)+\left(M_{d+i}+U_{d+i} \cdot x\right)
$$

(b) $\mathcal{V}$ computes $B_{i+1}:=B_{i} \cdot K_{i+1}+K_{d+i}$.

Then, $\mathcal{P}$ computes the coefficients $\left\{A_{i}\right\}_{i \in[0, d]}$ locally such that $g_{d}(x)=\sum_{i \in[0, d]} A_{i} \cdot x^{i}$, and $\mathcal{V}$ defines $B:=B_{d}$.
3. $\mathcal{P}$ outputs $\left\{A_{i}\right\}_{i \in[0, d]} ; \mathcal{V}$ outputs $B$.

Figure 3: Protocol for extended subfield VOLE in the $\mathcal{F}_{\mathrm{sVOLE}}^{p, r}$-hybrid model.
Proof. We first consider the case of a malicious prover (i.e., soundness and knowledge extraction) and then consider the case of a malicious verifier (i.e., zero knowledge). In each case, we construct a probabilistic polynomial time (PPT) simulator $\mathcal{S}$ given access to $\mathcal{F}_{\mathrm{ZK}}$, and running the adversary $\mathcal{A}$ as a subroutine while emulating $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$ for $\mathcal{A}$. We always implicitly assume that $\mathcal{S}$ passes all communication between adversary $\mathcal{A}$ and environment $\mathcal{Z}$.

Malicious prover. By emulating $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}, \mathcal{S}$ interacts with $\mathcal{A}$ as follows:

1. $\mathcal{S}$ emulates $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$ for $\mathcal{A}$ by choosing uniform $\Delta \in \mathbb{F}_{p^{r}}$, and recording all the values $\left\{\mu_{i}\right\}_{i \in[n]}$ and $\left\{\nu_{i}\right\}_{i \in[N]}$ and their corresponding MAC tags, which are received by $\mathcal{F}_{\text {ext-svole }}^{p, r}$ from adversary $\mathcal{A}$. These values define the corresponding keys in the natural way. When emulating $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}, \mathcal{S}$ also receives $\left(A_{0}^{*}, A_{1}^{*}\right) \in\left(\mathbb{F}_{p^{r}}\right)^{2}$ from $\mathcal{A}$ and defines $B^{*}$ accordingly.
2. When $\mathcal{A}$ sends $\left\{\delta_{i}\right\}_{i \in \mathcal{I}_{\text {in }}}$ in step $4, \mathcal{S}$ computes $w_{i}:=\delta_{i}+\mu_{i} \in \mathbb{F}_{p}$ for $i \in \mathcal{I}_{\text {in }}$.
3. $\mathcal{S}$ executes the rest of the protocol as an honest verifier, using $\Delta$ and the keys defined in the first step. If the honest verifier outputs false, then $\mathcal{S}$ sends $\boldsymbol{w}=\perp$ and $\mathcal{C}$ to $\mathcal{F}_{\text {ZK }}$ and aborts. If the honest verifier outputs true, then $\mathcal{S}$ sends $\boldsymbol{w}$ and $\mathcal{C}$ to $\mathcal{F}_{Z K}$ where $\boldsymbol{w}=\left(w_{1}, \ldots, w_{n}\right)$ is defined as above.

Clearly, the view of adversary $\mathcal{A}$ simulated by $\mathcal{S}$ has the identical distribution as its view in the real-world execution. Whenever the verifier in the real-world execution outputs false, the verifier in the ideal-world execution outputs false as well (since $\mathcal{S}$ sends $\perp$ to $\mathcal{F}_{\text {ZK }}$ in this case). Thus, it only remains to bound the probability that the verifier in the real-world execution outputs true but the witness $\boldsymbol{w}$ sent by $\mathcal{S}$ to $\mathcal{F}_{\text {ZK }}$

## $\underline{\text { Protocol } \Pi_{\text {ZK }}^{p, r}}$

Inputs: The prover $\mathcal{P}$ and the verifier $\mathcal{V}$ hold a circuit $\mathcal{C}$ over any field $\mathbb{F}_{p}$ with $t$ multiplication gates. Prover $\mathcal{P}$ also holds a witness $\boldsymbol{w}$ such that $\mathcal{C}(\boldsymbol{w})=1$ and $|\boldsymbol{w}|=n$ (i.e., $\left|\mathcal{I}_{\text {in }}\right|=n$ ).

Preprocessing phase: Both the circuit and witness are unknown.

1. $\mathcal{P}$ and $\mathcal{V}$ send (init) to $\mathcal{F}_{\text {ext-sVole }}^{p, r}$, which returns a uniform $\Delta \in \mathbb{F}_{p^{r}}$ to $\mathcal{V}$.
2. $\mathcal{P}$ and $\mathcal{V}$ send (extend, $n+t$ ) to $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$, which returns authenticated values $\left\{\left[\mu_{i}\right]\right\}_{i \in[n]}$ and $\left\{\left[\nu_{i}\right]\right\}_{i \in[t]}$ to the parties.
3. $\mathcal{P}$ and $\mathcal{V}$ send (VOPE, 1) to $\mathcal{F}_{\text {ext-sVoLE }}^{p, r}$, which returns uniform $\left(A_{0}^{*}, A_{1}^{*}\right)$ to $\mathcal{P}$ and $B^{*}$ to $\mathcal{V}$, such that $B^{*}=$ $A_{0}^{*}+A_{1}^{*} \cdot \Delta$.

Online phase: Now the circuit and witness are known by the parties.
4. For $i \in \mathcal{I}_{\text {in }}, \mathcal{P}$ sends $\delta_{i}:=w_{i}-\mu_{i} \in \mathbb{F}_{p}$ to $\mathcal{V}$, and then both parties compute $\left[w_{i}\right]:=\left[\mu_{i}\right]+\delta_{i}$.
5. For each gate $(\alpha, \beta, \gamma, T) \in \mathcal{C}$, in a topological order:

- If $T=$ Add, then two parties locally compute $\left[w_{\gamma}\right]:=\left[w_{\alpha}\right]+\left[w_{\beta}\right]$.
- If $T=$ Mult and this is the $i$-th multiplication gate, $\mathcal{P}$ sends $d_{i}:=w_{\alpha} \cdot w_{\beta}-\nu_{i} \in \mathbb{F}_{p}$ to $\mathcal{V}$, and then both parties compute $\left[w_{\gamma}\right]:=\left[\nu_{i}\right]+d_{i}\left(\right.$ with $w_{\gamma}=w_{\alpha} \cdot w_{\beta}$ in the honest case).

6. For the $i$-th multiplication gate, two parties hold an authenticated triple $\left(\left[w_{\alpha}\right],\left[w_{\beta}\right],\left[w_{\gamma}\right]\right)$ (with $k_{i}=m_{i}+$ $w_{i} \cdot \Delta$ for $\left.i \in\{\alpha, \beta, \gamma\}\right)$ from the previous step and execute the following:

- $\mathcal{P}$ computes $A_{0, i}:=m_{\alpha} \cdot m_{\beta} \in \mathbb{F}_{p^{r}}$ and $A_{1, i}:=w_{\alpha} \cdot m_{\beta}+w_{\beta} \cdot m_{\alpha}-m_{\gamma} \in \mathbb{F}_{p^{r}}$.
- $\mathcal{V}$ computes $B_{i}:=k_{\alpha} \cdot k_{\beta}-k_{\gamma} \cdot \Delta \in \mathbb{F}_{p^{r}}$.

7. $\mathcal{P}$ and $\mathcal{V}$ perform the following check to verify that $B_{i}=A_{0, i}+A_{1, i} \cdot \Delta$ for all $i \in[t]$.
(a) $\mathcal{V}$ samples $\chi \leftarrow \mathbb{F}_{p^{r}}$ and sends it to $\mathcal{P}$.
(b) $\mathcal{P}$ computes $U:=\sum_{i \in[t]} A_{0, i} \cdot \chi^{i}+A_{0}^{*}$ and $V:=\sum_{i \in[t]} A_{1, i} \cdot \chi^{i}+A_{1}^{*}$, and sends $(U, V)$ to $\mathcal{V}$.
(c) Then $\mathcal{V}$ computes $W:=\sum_{i \in[t]} B_{i} \cdot \chi^{i}+B^{*}$ and checks that $W=U+V \cdot \Delta$. If the check fails, $\mathcal{V}$ outputs false and aborts.
8. For the single output wire $h$ in the circuit $\mathcal{C}$, both parties hold $\left[w_{h}\right]$ with $k_{h}=m_{h}+w_{h} \cdot \Delta$, and check that $w_{h}=1$ as follows:

- In parallel with the previous step, $\mathcal{P}$ sends $m_{h}$ to $\mathcal{V}$.
- $\mathcal{V}$ checks that $k_{h}=m_{h}+\Delta$. If the check fails, then $\mathcal{V}$ outputs false. Otherwise, $\mathcal{V}$ outputs true.

Figure 4: Zero-knowledge protocol for circuit satisfiability over any field in the $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$-hybrid model.
satisfies $\mathcal{C}(\boldsymbol{w})=0$. In the following, we show that if $\mathcal{C}(\boldsymbol{w})=0$ then the probability that the honest verifier in the real-world execution outputs true is at most $(t+3) / p^{r}$.

By induction, we prove that all the values on the wires in the circuit are correct. It is trivial that the values associated with the input wires and the output wires of Add gates are computed correctly. Therefore, we focus on analyzing the correctness of the values related to the output wires of Mult gates. When we analyze the correctness of the output value with respect to the $i$-th multiplication gate, we always assume that the output values associated with the first $(i-1)$ multiplication gates are correct by induction. For the $i$-th multiplication gate, two parties hold an authenticated triple $\left(\left[w_{\alpha}\right],\left[w_{\beta}\right],\left[w_{\gamma}\right]\right)$ with $w_{\gamma}=w_{\alpha} \cdot w_{\beta}+e_{i}$, where $e_{i} \in \mathbb{F}_{p}$ is an error chosen by adversary $\mathcal{A}$ by sending an incorrect value $d_{i}^{\prime}$ in step 5 of protocol $\Pi_{\text {ZK }}^{p, r}$.

Thus, we have $k_{\gamma}=m_{\gamma}+w_{\gamma} \cdot \Delta=m_{\gamma}+\left(w_{\alpha} \cdot w_{\beta}\right) \cdot \Delta+e_{i} \cdot \Delta$. Further, we have:

$$
\begin{aligned}
B_{i} & =k_{\alpha} \cdot k_{\beta}-k_{\gamma} \cdot \Delta \\
& =\left(m_{\alpha}+w_{\alpha} \cdot \Delta\right) \cdot\left(m_{\beta}+w_{\beta} \cdot \Delta\right)-\left(m_{\gamma}+w_{\alpha} \cdot w_{\beta} \cdot \Delta+e_{i} \cdot \Delta\right) \cdot \Delta \\
& =m_{\alpha} \cdot m_{\beta}+\left(w_{\alpha} \cdot m_{\beta}+w_{\beta} \cdot m_{\alpha}-m_{\gamma}\right) \cdot \Delta-e_{i} \cdot \Delta^{2} \\
& =A_{0, i}+A_{1, i} \cdot \Delta-e_{i} \cdot \Delta^{2} .
\end{aligned}
$$

In step 7 of the ZK protocol, $\mathcal{A}$ sends $U^{\prime}=U+E_{u}$ and $V^{\prime}=V+E_{v}$ to the honest verifier, where $U, V \in \mathbb{F}_{p^{r}}$ are computed following the protocol description, and $E_{u}, E_{v} \in \mathbb{F}_{p^{r}}$ are the adversarially chosen errors. Furthermore, we have the following:

$$
\begin{aligned}
W & =\sum_{i \in[t]} B_{i} \cdot \chi^{i}+B^{*} \\
& =\sum_{i \in[t]}\left(A_{0, i}+A_{1, i} \cdot \Delta-e_{i} \cdot \Delta^{2}\right) \cdot \chi^{i}+A_{0}^{*}+A_{1}^{*} \cdot \Delta \\
& =U+V \cdot \Delta-\left(\sum_{i \in[t]} e_{i} \cdot \chi^{i}\right) \cdot \Delta^{2} \\
& =\left(U^{\prime}-E_{u}\right)+\left(V^{\prime}-E_{v}\right) \cdot \Delta-\left(\sum_{i \in[t]} e_{i} \cdot \chi^{i}\right) \cdot \Delta^{2} .
\end{aligned}
$$

If the check passes in step 7 , then we have that $W=U^{\prime}+V^{\prime} \cdot \Delta$. Therefore, we obtain that

$$
E_{u}+E_{v} \cdot \Delta+\left(\sum_{i \in[t]} e_{i} \cdot \chi^{i}\right) \cdot \Delta^{2}=0 .
$$

If $\sum_{i \in[t]} e_{i} \cdot \chi^{i} \neq 0$, then the above equation holds with probability at most $2 / p^{r}$, as $\Delta \in \mathbb{F}_{p^{r}}$ is uniformly random and kept secret from the adversary's view. Below, we consider that $\sum_{i \in[t]} e_{i} \cdot \chi^{i}=0$. If there exists some $i \in[t]$ such that $e_{i} \neq 0$, the probability that $\sum_{i \in[t]} e_{i} \cdot \chi^{i}=0$ is at most $t / p^{r}$, as $\chi$ is sampled uniformly at random after $e_{i}$ for all $i \in[t]$ have been determined. Overall, all the values on the wires in the circuit are correct, except with probability at most $(t+2) / p^{r}$.

Now, we assume that all the values on the wires in the circuit are correct. If $\mathcal{C}(\boldsymbol{w})=0$ but the honest verifier outputs true in step 8 , then adversary $\mathcal{A}$ must send $m_{h}+\Delta$ to the honest verifier where $m_{h}$ is a MAC tag on output wire $h$ known by $\mathcal{A}$. In other words, $\mathcal{A}$ learns $\Delta$, which occurs with probability at most $1 / p^{r}$.

In conclusion, any unbounded environment $\mathcal{Z}$ cannot distinguish between the real-world execution and ideal-world execution, except with probability $(t+3) / p^{r}$.
Malicious verifier. If $\mathcal{S}$ receives false from $\mathcal{F}_{\mathrm{ZK}}$, then it simply aborts. Otherwise, $\mathcal{S}$ interacts with adversary $\mathcal{A}$ as follows:

1. In the preprocessing phase, $\mathcal{S}$ emulates $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$ by recording the global key $\Delta$ and the keys for all the authenticated values, that are sent to this functionality by $\mathcal{A}$. $\mathcal{S}$ also receives $B^{*} \in \mathbb{F}_{p^{r}}$ from $\mathcal{A}$ when emulating $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$.
2. $\mathcal{S}$ executes steps $4-5$ of protocol $\Pi_{\mathrm{ZK}}^{p, r}$ by sending uniformly random $\delta_{i}$ for each $i \in \mathcal{I}_{\text {in }}$ and $d_{i}$ for the $i$-th multiplication gate to adversary $\mathcal{A}$.
3. $\mathcal{S}$ executes steps 6-7 of the protocol as an honest prover, except that sampling $V \leftarrow \mathbb{F}_{p^{r}}$ and computing $U:=W-V \cdot \Delta$ where $W$ is computed using $\Delta, B^{*}$ and the keys received from $\mathcal{A}$ following the protocol specification.
4. In step 8 of the protocol, $\mathcal{S}$ computes $k_{h}$ (based on the keys sent to $\mathcal{F}_{\text {ext-sVole }}^{p, r}$ by $\mathcal{A}$ ) and then sets $m_{h}:=k_{h}+\Delta$, where $h$ is the single output wire. Then, $\mathcal{S}$ sends $m_{h}$ to $\mathcal{A}$.

Since $\left\{\mu_{i}\right\},\left\{\nu_{i}\right\}$ and $A_{1}^{*}$ are uniformly random and perfectly hidden against the view of adversary $\mathcal{A}$, we easily obtain that the view of $\mathcal{A}$ simulated by $\mathcal{S}$ is distributed identically to its view in the real protocol execution. This completes the proof.

In the protocol $\Pi_{\mathrm{ZK}}^{p, r}$ shown in Figure 4, if we set $p=2^{61}-1$ and $r=1$, then the computation of $\chi^{i}$ for $i \in[t]$ is expensive (especially for large $t$ ). We can replace $\chi^{i}$ for $i \in[t]$ with independent uniform coefficient $\chi_{i}$ for $i \in[t]$ to obtain better computation efficiency. In this case, the verifier can send a random seed in $\{0,1\}^{\kappa}$ to the prover, and then both parties compute $\chi_{1}, \ldots, \chi_{t}$ using the seed and a random oracle. Now, the soundness error is bounded by $q / 2^{\kappa}+4 / p^{r}$, where $q$ is an upper bound of the number of random oracle queries made by the adversary. When using the random oracle, the security is guaranteed in the computational sense.
Non-interactive online phase. In the online phase of our protocol $\Pi_{\mathrm{ZK}}^{p, r}$, the verifier only sends a random coefficient $\chi$ to the prover. Thus, the communication cost is one field element per multiplication gate even without random oracle. But the online phase needs communication of three rounds.

We can use the Fiat-Shamir heuristic to make the online phase non-interactive at the cost of that the information-theoretic security is degraded to the computation security. Specifically, both parties can compute $\chi \in \mathbb{F}_{p^{r}}$ as $\mathrm{H}\left(d_{1}, \ldots, d_{t}\right)$, where $\mathrm{H}:\{0,1\}^{*} \rightarrow \mathbb{F}_{p^{r}}$ is a cryptographic hash function modeled as a random oracle and $p^{r} \geq 2^{\kappa}$. In this case, the soundness error for the batch check of multiplication gates together with the correctness of the single output is now bounded by $\left(q_{\mathbf{H}}+t+3\right) / p^{r} \leq\left(q_{\mathbf{H}}+t+3\right) / 2^{\kappa}$, where $q_{\mathrm{H}}$ is an upper bound of the number of H queries made by the adversary. When we set $p=2$ and $r=128$, we can obtain a non-interactive online phase with a blazing-fast computation given hardware-instruction support.

## 5 Zero-Knowledge Proof for Polynomial Sets over Any Field

Recall that we have explained the intuition of our ZK protocol for polynomial sets in Section 3.2. Thus, we directly show the detailed protocol in Figure 5, and provide the formal security proof. We also present the practical applications of the ZK protocol, including how to optimize the zero-knowledge proofs for proving matrix multiplication, proving knowledge of a solution to an SIS problem, proving integer multiplication over a ring and proving the circuits with some level of weak uniformity.

In the ZK protocol shown in Figure 5, the prover can compute the coefficients $\left\{A_{i, h}\right\}_{h \in[0, d-1]}$ of polynomial $g_{i}(x)$ for $i \in[t]$ in the following generic way.

- $\mathcal{P}$ computes $y_{i, j}:=g_{i}\left(\alpha_{j}\right)$ for $j \in[d+1]$, where $\alpha_{1}, \ldots, \alpha_{d+1}$ are any $d+1$ different fixed points over extension field $\mathbb{F}_{p^{r}}$.
- Then $\mathcal{P}$ can compute $g_{i}(x)=\sum_{j \in[d+1]} y_{i, j} \cdot \delta_{j}(x)$, where $\delta_{j}(x)=\prod_{k \neq j} \frac{x-\alpha_{k}}{\alpha_{j}-\alpha_{k}}$ is a fixed $d$-degree polynomial that can be precomputed in the preprocessing phase.

In a lot of practical applications, the polynomials $\left\{g_{i}(x)\right\}$ are usually simple, and thus the coefficients can be computed efficiently without the need of using the above generic approach.

### 5.1 Proof of Security

When both parties are honest, it is not hard to see that the verifier will always output true with probability 1. Specifically, from $k_{i}=m_{i}+w_{i} \cdot \Delta$ for $i \in[n]$ and the description in Section 3.2, we have that

## $\underline{\text { Protocol } \Pi_{\text {polyzK }}^{p, r}}$

Inputs: The prover $\mathcal{P}$ and verifier $\mathcal{V}$ hold $t$ number of $d$-degree polynomials $f_{1}, \ldots, f_{t}$ all over $n$ variables. Each polynomial $f_{i}$ is represented as $f_{i}=\sum_{h \in[0, d]} f_{i, h}$ where all terms in $f_{i, h}$ have degree $h . \mathcal{P}$ also holds a witness $\boldsymbol{w} \in \mathbb{F}_{p}^{n}$, such that $f_{i}(\boldsymbol{w})=0$ for all $i \in[t]$.

## Preprocessing phase:

1. $\mathcal{P}$ and $\mathcal{V}$ send (init) to $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$, which sends $\Delta \in \mathbb{F}_{p^{r}}$ to $\mathcal{V}$.
2. $\mathcal{P}$ and $\mathcal{V}$ send (extend, $n$ ) to $\mathcal{F}_{\text {ext-sVoLE }}^{p, r}$, which returns authenticated values $\left\{\left[s_{i}\right]\right\}_{i \in[n]}$ to the parties.
3. $\mathcal{P}$ and $\mathcal{V}$ send (VOPE, $d-1$ ) to $\mathcal{F}_{\text {ext-sVoLE }}^{p, r}$, which returns $\left\{A_{h}^{*}\right\}_{h \in[0, d-1]}$ to $\mathcal{P}$, $B^{*}$ to $\mathcal{V}$, such that $\sum_{h \in[0, d-1]} A_{h}^{*} \cdot \Delta^{h}=B^{*}$.

## Online phase:

4. For $i \in[n], \mathcal{P}$ sends $\delta_{i}:=w_{i}-s_{i} \in \mathbb{F}_{p}$ to $\mathcal{V}$, and both parties compute $\left[w_{i}\right]:=\left[s_{i}\right]+\delta_{i}$. Note that $k_{i}=m_{i}+w_{i} \cdot \Delta$ for $i \in[n]$.
5. From $i=1$ to $t$, for the $i$-th polynomial $f_{i}$, two parties perform the following:

- $\mathcal{V}$ computes $B_{i}:=\sum_{h \in[0, d]} f_{i, h}\left(k_{1}, \ldots, k_{n}\right) \cdot \Delta^{d-h}$.
- $\mathcal{P}$ defines a univariate $d$-degree polynomial over field $\mathbb{F}_{p^{r}}$ as $g_{i}(x)=\sum_{h \in[0, d]} f_{i, h}\left(m_{1}+w_{1} \cdot x, \ldots, m_{n}+\right.$ $\left.w_{n} \cdot x\right) \cdot x^{d-h}$, and computes the coefficients $\left\{A_{i, h}\right\}_{h \in[0, d]}$ such that $g_{i}(x)=\sum_{h \in[0, d]} A_{i, h} \cdot x^{h}$. Since $A_{i, d}=f_{i}\left(w_{1}, \ldots, w_{n}\right)=0, g_{i}(x)$ can be written as $\sum_{h \in[0, d-1]} A_{i, h} \cdot x^{h}$.

6. Two parties perform the following to check that $\sum_{h \in[0, d-1]} A_{i, h} \cdot \Delta^{h}=B_{i}$ for all $i \in[t]$ :
(a) $\mathcal{V}$ samples $\chi \leftarrow \mathbb{F}_{p^{r}}$ and sends it to $\mathcal{P}$.
(b) For all $h \in[0, d-1], \mathcal{P}$ computes $U_{h}:=\sum_{i \in[t]} A_{i, h} \cdot \chi^{i}+A_{h}^{*}$ and sends it to $\mathcal{V}$.
(c) $\mathcal{V}$ computes $W:=\sum_{i \in[t]} B_{i} \cdot \chi^{i}+B^{*}$ and checks that $W=\sum_{h \in[0, d-1]} U_{h} \cdot \Delta^{h}$. If the check fails, $\mathcal{V}$ outputs false; otherwise it outputs true.

Figure 5: Zero-knowledge for polynomial satisfiability over any field in the $\mathcal{F}_{\mathrm{ext} \text {-sVOLE }}^{p, r}$-hybrid model.
$B_{i}=\sum_{h \in[0, d-1]} A_{i, h} \cdot \Delta^{h}$ for all $i \in[t]$. Together with $\sum_{h \in[0, d-1]} A_{h}^{*} \cdot \Delta^{h}=B^{*}$, we obtain that the following holds:

$$
\begin{aligned}
W & =\sum_{i \in[t]} B_{i} \cdot \chi^{i}+B^{*} \\
& =\sum_{i \in[t]}\left(\sum_{h \in[0, d-1]} A_{i, h} \cdot \Delta^{h}\right) \cdot \chi^{i}+\sum_{h \in[0, d-1]} A_{h}^{*} \cdot \Delta^{h} \\
& =\sum_{h \in[0, d-1]}\left(\sum_{i \in[t]} A_{i, h} \cdot \chi^{i}+A_{h}^{*}\right) \cdot \Delta^{h}=\sum_{h \in[0, d-1]} U_{h} \cdot \Delta^{h}
\end{aligned}
$$

Thus, our protocol shown in Figure 5 has the perfect completeness.
Theorem 3. Protocol $\Pi_{\text {polyZK }}^{p, r}$ UC-realizes functionality $\mathcal{F}_{\mathrm{ZK}}$ that proves polynomial satisfiability in the $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$-hybrid model with soundness error $(d+t) / p^{r}$ and the information-theoretic security.
Proof. We first consider the case of a malicious prover (i.e., soundness and knowledge extraction) and then consider the case of a malicious verifier (i.e., zero knowledge). In each case, we construct a PPT simulator
$\mathcal{S}$, which given access to $\mathcal{F}_{\mathrm{ZK}}$, runs the adversary $\mathcal{A}$ as a subroutine while emulating $\mathcal{F}_{\text {ext-sVoLE }}^{p, r}$ for $\mathcal{A}$. We always implicitly assume that $\mathcal{S}$ passes all communication between adversary $\mathcal{A}$ and environment $\mathcal{Z}$.
Malicious prover. $\mathcal{S}$ emulates functionality $\mathcal{F}_{\text {ext-sVoLe }}^{p, r}$ and interacts with adversary $\mathcal{A}$ as follows:

1. $\mathcal{S}$ emulates $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$ for $\mathcal{A}$ by choosing uniform $\Delta \in \mathbb{F}_{p^{r}}$, and recording all the values $\left\{s_{i}\right\}_{i \in[n]}$ and their corresponding MAC tags that are received by $\mathcal{F}_{\text {ext-sVoLE }}^{p, r}$ from adversary $\mathcal{A}$. These values define the corresponding keys in the natural way. When emulating $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}, \mathcal{S}$ also receives $\left\{A_{h}^{*}\right\}_{h \in[0, d-1]}$ from $\mathcal{A}$ and defines $B^{*}=\sum_{h \in[0, d-1]} A_{h}^{*} \cdot \Delta^{h}$.
2. When $\mathcal{A}$ sends $\left\{\delta_{i}\right\}_{i \in[n]}$ in step $4, \mathcal{S}$ extracts the witness as $w_{i}:=\delta_{i}+s_{i}$ for $i \in[n]$.
3. $\mathcal{S}$ executes the remaining part of protocol $\Pi_{\text {polyzk }}^{p, r}$ as an honest verifier, using $\Delta$ and the keys defined in the first step. If the honest verifier outputs false, then $\mathcal{S}$ sends $\boldsymbol{w}=\perp$ and $\mathcal{C}$ to $\mathcal{F}_{\mathrm{ZK}}$ and aborts. If the honest verifier outputs true, $\mathcal{S}$ sends $\boldsymbol{w}$ and $\mathcal{C}$ to $\mathcal{F}_{\mathrm{ZK}}$ where $\boldsymbol{w}=\left(w_{1}, \ldots, w_{n}\right)$ is extracted by $\mathcal{S}$ as above.

It is easy to see that the view of the adversary simulated by $\mathcal{S}$ has the identical distribution as its view in the real-world execution. Whenever the honest verifier in the real-world execution outputs false, the honest verifier in the ideal-world execution outputs false as well (since $\mathcal{S}$ sends $\perp$ to $\mathcal{F}_{\text {ZK }}$ in this case). Therefore, we only need to bound the probability that the verifier in the real-world execution outputs true but the witness $\boldsymbol{w}$ sent by $\mathcal{S}$ to $\mathcal{F}_{\mathrm{ZK}}$ satisfies that $f_{i}(\boldsymbol{w}) \neq 0$ for some $i \in[t]$. Below, we show that this happens with probability at most $(d+t) / p^{r}$.

Let $f_{i}(\boldsymbol{w})=f_{i}\left(w_{1}, \ldots, w_{n}\right)=y_{i}$ with some $y_{i} \in \mathbb{F}_{p}$ for each $i \in[t]$, where $\boldsymbol{w}=\left(w_{1}, \ldots, w_{n}\right)$ is a vector extracted by $\mathcal{S}$. According to the definition of $B_{i}$ for $i \in[t]$, we have the following:

$$
\begin{aligned}
B_{i} & =\sum_{h \in[0, d]} f_{i, h}\left(k_{1}, \ldots, k_{n}\right) \cdot \Delta^{d-h} \\
& =\sum_{h \in[0, d]} f_{i, h}\left(m_{1}+w_{1} \cdot \Delta, \ldots, m_{n}+w_{n} \cdot \Delta\right) \cdot \Delta^{d-h} \\
& =\sum_{h \in[0, d-1]} A_{i, h} \cdot \Delta^{h}+y_{i} \cdot \Delta^{d} .
\end{aligned}
$$

In step $6, \mathcal{S}$ receives $U_{h}^{\prime}=U_{h}+E_{h}$ for $h \in[0, d-1]$ from adversary $\mathcal{A}$, where $U_{h}$ is computed with $\boldsymbol{w}$ and the corresponding MACs following the protocol specification, and $E_{h} \in \mathbb{F}_{p^{r}}$ is an adversarially chosen error. Together with $B^{*}=\sum_{h \in[0, d-1]} A_{h}^{*} \cdot \Delta^{h}$, we obtain that the following equation holds:

$$
\begin{aligned}
W & =\sum_{i \in[t]} B_{i} \cdot \chi^{i}+B^{*} \\
& =\sum_{i \in[t]}\left(\sum_{h \in[0, d-1]} A_{i, h} \cdot \Delta^{h}+y_{i} \cdot \Delta^{d}\right) \cdot \chi^{i}+\sum_{h \in[0, d-1]} A_{h}^{*} \cdot \Delta^{h} \\
& =\left(\sum_{i \in[t]} y_{i} \cdot \chi^{i}\right) \cdot \Delta^{d}+\sum_{h \in[0, d-1]}\left(\sum_{i \in[t]} A_{i, h} \cdot \chi^{i}+A_{h}^{*}\right) \cdot \Delta^{h} \\
& =\left(\sum_{i \in[t]} y_{i} \cdot \chi^{i}\right) \cdot \Delta^{d}+\sum_{h \in[0, d-1]} U_{h}^{\prime} \cdot \Delta^{h}-\sum_{h \in[0, d-1]} E_{h} \cdot \Delta^{h} .
\end{aligned}
$$

If the honest verifier outputs true, then we have $W=\sum_{h \in[0, d-1]} U_{h}^{\prime} \cdot \Delta^{h}$. Therefore, we have the following:

$$
\left(\sum_{i \in[t]} y_{i} \cdot \chi^{i}\right) \cdot \Delta^{d}-E_{d-1} \cdot \Delta^{d-1}-\cdots-E_{1} \cdot \Delta-E_{0}=0 .
$$

If $\sum_{i \in[t]} y_{i} \cdot \chi^{i} \neq 0$, the probability that the above equation holds is at most $d / p^{r}$, as $\Delta \in \mathbb{F}_{p^{r}}$ is uniformly random and kept secret from the adversary's view. In the following, we assume that $\sum_{i \in[t]} y_{i} \cdot \chi^{i}=0$. If there exists some $i \in[t]$ such that $y_{i} \neq 0$, then that probability that $\sum_{i \in[t]} y_{i} \cdot \chi^{i}=0$ is at most $t / p^{r}$, since $\chi \in \mathbb{F}_{p^{r}}$ is sampled uniformly at random after $y_{i}$ for all $i \in[t]$ have been defined. Overall, the probability that the honest verifier outputs true but $f_{i}(\boldsymbol{w}) \neq 0$ for some $i \in[t]$ is bounded by $(d+t) / p^{r}$. In conclusion, any unbounded environment $\mathcal{Z}$ cannot distinguish between the real-world execution and idealworld execution, except with probability $(d+t) / p^{r}$.

Malicious verifier. If $\mathcal{S}$ receives false from $\mathcal{F}_{\mathrm{ZK}}$, then it simply aborts. Otherwise, $\mathcal{S}$ interacts with adversary $\mathcal{A}$ as follows:

1. In the preprocessing phase, $\mathcal{S}$ emulates $\mathcal{F}_{\text {ext-sVoLE }}^{p, r}$ by recording the global key $\Delta$ and the keys for all the authenticated values, which are received from adversary $\mathcal{A}$. Additionally, $\mathcal{S}$ also receives $B^{*} \in \mathbb{F}_{p^{r}}$ from $\mathcal{A}$ by emulating $\mathcal{F}_{\text {ext-sVole }}^{p, r}$.
2. $\mathcal{S}$ executes the step 4 of protocol $\Pi_{\text {polyZk }}^{p, r}$ by sending uniform $\delta_{i} \in \mathbb{F}_{p}$ for $i \in[n]$ to adversary $\mathcal{A}$.
3. For steps 5-6 of the ZK protocol, $\mathcal{S}$ computes $W$ by using $\Delta$, the keys and $B^{*}$ received from $\mathcal{A}$ following the protocol description, and then samples $U_{1}, \ldots, U_{d-1} \leftarrow \mathbb{F}_{p^{r}}$ and computing $U_{0}:=W-\sum_{h \in[d-1]} U_{h}$. $\Delta^{h}$. Then, $\mathcal{S}$ sends $U_{0}, \ldots, U_{d-1}$ to $\mathcal{A}$.

Note that $\left\{s_{i}\right\}$ and $\left\{A_{h}^{*}\right\}_{h \in[d-1]}$ are sampled uniformly at random and kept secret from the view of adversary $\mathcal{A}$. Therefore, we easily obtain that the view of $\mathcal{A}$ simulated by $\mathcal{S}$ is distributed identically to its view in the real-world execution, which completes the proof.

For the polynomial-based ZK protocol, we can also use the Fiat-Shamir heuristic to make the online phase non-interactive at the cost of that the security is guaranteed in the computational sense.

### 5.2 Optimizing Practical Applications

In the following applications, for the sake of simplicity, we always assume that $p^{r} \approx 2^{\kappa}$ as the Fiat-Shamir heuristic is assumed to be implicitly used in the applications. For the interactive case, we can also extend the applications to smaller extension fields, as long as the soundness error is assured negligible in $\rho$. In this section, the communication cost is computed in the sVOLE-hybrid model. ${ }^{2}$
Optimizing matrix multiplication. The prover wants to prove that $\mathbf{A} \cdot \mathbf{B}=\mathbf{C}$, where $\mathbf{A}, \mathbf{B} \in \mathbb{F}_{p}^{n \times n}$ are two secret matrices and $\mathbf{C} \in \mathbb{F}_{p}^{n \times n}$ is a public matrix known by the verifier. Using the circuit-based ZK protocol shown in Figure 4, this will need communication of $\left(2 n^{2}+n^{3}\right) \log p+2 \kappa$ bits.

Using the polynomial-based ZK protocol described in Figure 5, we can directly obtain a ZK protocol for inner product of two $n$-length vectors with communication of $2 n \log p+2 \kappa$ bits, by defining a polynomial $f(\boldsymbol{x}, \boldsymbol{y})=\sum_{i \in[n]} x_{i} \cdot y_{i}$ for two vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathbb{F}_{p}^{n}$. The communication complexity remains unchanged, even if the inner product of $t$ vector pairs needs to be proved. This immediately gives us a ZK protocol for proving matrix multiplication with communication of $2 n^{2} \log p+2 \kappa$ bits, since a matrix multiplication can be written as the inner-product of $n^{2}$ vector pairs, where the communication of $2 n^{2} \log p$ bits is used to commit the entries in matrices $\mathbf{A}$ and $\mathbf{B}$ using sVOLE.
Proving solutions to lattice problems. Here, we assume the prover has a binary vector $s \in\{0,1\}^{m}$ and intends to prove that $\mathbf{A} \cdot \boldsymbol{s}=\boldsymbol{t}$, with public matrix $\mathbf{A} \in \mathbb{Z}_{q}^{n \times m}$ and vector $\boldsymbol{t} \in \mathbb{Z}_{q}^{n}$. Here we assume that $q$ is a prime. The SIS problem has been considered in prior work such as [BN20, WYKW20]. Our ZK

[^1]| Threads | Boolean Circuits |  |  |  |  | Arithmetic Circuits |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 10 Mbps | 20 Mbps | 30 Mbps | 50 Mbps | Local-host | 100 Mbps | 500 Mbps | 1 Gbps | 2 Gbps | Local-host |
| 1 | 4.4 M | 6.2 M | 7.0 M | 7.5 M | 7.6 M | 1.2 M | 3.4 M | 4.2 M | 4.8 M | 4.8 M |
| 2 | 5.3 M | 8.1 M | 9.9 M | 11.8 M | 11.8 M | 1.3 M | 4.4 M | 6.1 M | 7.0 M | 7.1 M |
| 3 | 5.7 M | 9.1 M | 11.4 M | 13.9 M | 14.3 M | 1.4 M | 4.9 M | 7.2 M | 8.4 M | 8.4 M |
| 4 | 5.8 M | 9.9 M | 12.2 M | 14.9 M | 15.8 M | 1.4 M | 5.0 M | 7.5 M | 8.9 M | 8.9 M |

Table 2: Benchmark the performance of our circuit-based ZK protocol. The benchmark results are the number of AND/MULT gates per second that can be proven using our protocol, where "M" means "million". Benchmark was obtained with different network settings and number of threads.
protocol for polynomial sets can be used to prove the statements more efficiently. To commit to all input bits $s_{1}, \ldots, s_{m}$, the ZK protocol takes $m \log q$ bits. Then we need to show that 1 ) the prover indeed commits bits and that 2) the above linear equation holds. All of the above can be modeled as a set of $m+n$ polynomials with degree at most 2 . In particular, we need to show that $s_{i} \cdot\left(s_{i}-1\right)=s_{i}^{2}-s_{i}=0$ for all $i \in[m]$ and that $\sum_{j \in[m]} a_{i, j} \cdot s_{j}-t_{i}=0$ for $i \in[n]$ where $a_{i, j}$ is the entry in the $i$-th row and $j$-th column of matrix $\mathbf{A}$. Since the polynomial degree is at most 2 , the communication cost would be 2 elements over the extension field, each of size roughly $\kappa$ bits. Therefore, the total communication cost is $m \log q+2 \kappa$ bits.

If the secret vector $s$ is in $[-B, B]^{m}$ (with a small integer $B$ ) instead of a binary vector, which has also been addressed by prior work [BLS19, BCOS20, ENS20, WYKW20], we would need a degree- $(2 B+1)$ polynomial to prove that $f\left(s_{i}\right)=0$ for all $i \in[m]$ where $f(x)=\Pi_{j \in[-B, B]}(x-j)$, with the total communication cost of $m \log q+(2 B+1) \kappa$ bits.

In Section 6.2, we evaluate the concrete performance to demonstrate that our polynomial-based ZK protocol significantly outperforms prior work for proving knowledge of solutions to SIS problems.

Optimizing integer operations over a ring. Arithmetic operations over a field may often be sufficient for some applications. However, for applications where matching cleartext computation is crucial, the statement to be proven may require native computation over a ring $\mathbb{Z}_{2^{n}}$ such as $\mathbb{Z}_{2^{32}}$. In this case, one may naturally think about ring operations. Here we explore an alternative approach.

Our idea is to view integer multiplication over $\mathbb{Z}_{2^{n}}$ as a set of $n$ polynomials that take $2 n$ variables as input. In this case, the maximum degree for these polynomials is $8 n^{2}$, since the Boolean circuit for integer multiplication has a depth $2 \log n+3$ [BHWK16]. The communication cost for proving a set of integer multiplications would become linear to $n$, when the number of integer multiplications to be proven is large. In particular, if there are $t$ integer multiplications to be proven with $t \approx 8 n \kappa$, the amortized communication cost for each multiplication will be $3 n+\frac{8 n^{2} \kappa}{t} \approx 4 n$ bits.
Optimizing for circuits with weak uniformity. Inspired by the above concrete examples, we summarize a blueprint to optimize circuits with some level of weak uniformity (i.e., the polynomial representations of sub-circuits are all bounded by some degree $d$ ).

Assume that the circuit to be proven is $C$, which contains $t$ multiplication gates. We let $C_{1}, \ldots, C_{k}$ be $k$ non-overlapping sub-circuits of $C$, such that for sub-circuit $C_{i}$, it has $t_{i}$ multiplication gates without counting the multiplication gates that include the output wires of $C_{i}$. Each sub-circuit can be represented as a set of polynomials with the degree at most $d$. In a nutshell, our ZK protocol can be constructed as follows:

1. Use sVOLE to commit to all the wire values in $C \backslash\left\{C_{1}, \ldots, C_{k}\right\}$, including the input wires go into the sub-circuits $C_{1}, \ldots, C_{k}$ and the output wires go out of these sub-circuits.
This step takes $t-\sum_{i=1}^{k} t_{i}$ elements over $\mathbb{F}_{p}$ for communication.
2. Prove that all multiplication gates in $C \backslash\left\{C_{1}, \ldots, C_{k}\right\}$ are computed correctly using our ZK protocol for circuit satisfiability shown in Figure 4.

This step takes $2 \kappa$ bits of communication.
3. For each sub-circuit $C_{i}$, represent it as a set of polynomials, one for each output of $C_{i}$. Prove that all the polynomials with respect to all sub-circuits are computed correctly using our ZK protocol for polynomial satisfiability shown in Figure 5.
This step takes $d \kappa$ bits of communication, because all input and output wire values have already been committed with sVOLE.

In summary, the communication of the above protocol is essentially $\left(t-\sum_{i \in[k]} t_{i}\right) \log p+(d+2) \kappa$ bits. Now the task is really about how to "dig" as many "holes" as possible from $C$, while keeping all holes relatively simple. In practice, this is fairly common, as the real-life computations are written in succinct libraries, which means the same subroutine is often called for many times. We leave it as a future work to fully explore its potential and build an automated optimizer to maximize the practical efficiency.

## 6 Implementation and Benchmarking

We implemented our ZK protocols and report their performance. Unless otherwise specified, our evaluation results are reported over two Amazon EC2 machines of type m5.2xlarge with throttled network bandwidth (with latency about 0.1 ms ) and one thread. Each machine has 8 virtual CPUs, which means 4 CPU cores. We instantiate the COT protocol (i.e., sVOLE with $p=2$ and $r=\kappa$ ) and the VOLE protocol over a 61 -bit field by using the recent protocols [YWL ${ }^{+} 20$, WYKW20], and use SHA- 256 as the cryptographic hash function modeled as a random oracle. We take advantage of hardware AES-NI and binary-field multiplication when applicable. All our implementations achieve computational security parameter $\kappa=128$ and statistical security parameter $\rho \approx 100$ for Boolean circuits, and $\kappa=128$ and $\rho \geq 40$ for arithmetic circuits over a 61-bit field where Mersenne prime $p=2^{61}-1$ is used as in prior work.

### 6.1 Benchmarking Our Circuit-based ZK Proof

We benchmarked the performance of our ZK protocol by proving circuits with $3 \times 10^{8}$ AND/MULT gates. Similar to prior work [WYKW20, BMRS20], we observe that the performance does not depend on the shape of the circuit and is linear to the circuit size; and thus we focus on the speed in terms of "million gates per second". In Table 2, we benchmarked the performance of our circuit-based ZK protocol under different network settings and number of threads. The performance of our protocol ranges from 4.4 million to 15.8 million AND gates per second (or from 1.2 million to 8.9 million multiplication gates per second), depending on the network setting and number of threads. When we increase the number of threads and/or the network bandwidth, we could see an increase in the performance. The computation becomes the efficiency bottleneck of our protocol for Boolean circuits (resp., arithmetic circuits) when the network bandwidth is increased to 50 Mbps (resp., 2 Gbps ), and thus the performance is not improved much beyond that.
Comparison with prior work. We compared the performance of our ZK protocol QuickSilver and prior related work in Table 1. Since Mac' $n^{\prime}$ Cheese [BMRS20] only reported the performance of their protocol with one thread and local-host, we compare the performance of all protocols using this setup. In the Boolean setting, we observe $6 \times$ improvement in computation and $7 \times$ improvement in communication compared to the state-of-the-art protocol Wolverine [WYKW20]. For arithmetic circuits, our protocol improves by at least $7 \times$ in computation and $3 \times-4 \times$ in communication compared to Wolverine and Mac' $n^{\prime}$ Cheese. Note that Wolverine studied the performance of their ZK protocol when used for DECO $\left[\mathrm{ZMM}^{+} 20\right]$ and Blind CA [WAP ${ }^{+}$19], as well as other applications like Merkle trees, and proving bugs in a set of code snippets [HK20]. Our performance improvement described as above directly translates to the improvements for all of these applications.

| Instance Information |  |  | Boolean Circuits |  | Arithmetic Circuits |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Type | Price cents/hour | CPU | Speed gates/sec | $\begin{gathered} \text { Cost } \\ \text { gates/cent } \end{gathered}$ | Speed gates/sec | Cost gates/cent |
| c6g.medium | 1.9 | ARM | 5.3 M | 10.0 B | 2.2 M | 4.1 B |
| c5.large | 4.7 | Intel | 5.9 M | 4.5 B | 2.9 M | 2.2 B |
| c5a.large | 4.2 | AMD | 7.3 M | 6.3 B | 3.0 M | 2.6 B |

Table 3: Performance of stress-testing our ZK protocol on different Amazon EC2 instances. All instances have 2 vCPUs and 1 GB memory.

| Length of vectors | Binary field $\mathbb{F}_{2}$ |  |  | Large field $\mathbb{F}_{2^{61}-1}$ |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | $10^{6}$ | $10^{7}$ | $10^{8}$ | $10^{6}$ | $10^{7}$ | $10^{8}$ |
| Process witness ( $s$ ) | 0.24 | 2.4 | 23.8 | 0.39 | 3.9 | 39.2 |
| Prove inner product ( $m s$ ) | 36.6 | 69.3 | 423.5 | 42.8 | 100.3 | 703.8 |

Table 4: Performance of our ZK protocol for inner product. We report separately the cost to process the witness and the cost to prove the inner product after the witness was processed.

| Protocol | Execution Time | Communication |
| :---: | :---: | :---: |
| Virgo [ZXZS20] | 357 s | 221 KB |
| Wolverine [WYKW20] | 1627 s | 34 GB |
| Mac' $\mathrm{n}^{\prime}$ Cheese [BMRS20] | 2684 s | 25.8 GB |
| QuickSilver (Circuit) | 316 s | 8.6 GB |
| QuickSilver (Polynomial) | 10 s | 25.2 MB |

Table 5: Performance of proving matrix multiplication using various protocols. Every protocol is used to prove knowledge of two $1024 \times 1024$ matrices over a 61 -bit field, whose product is a public matrix. The execution time for Wolverine and Mac' $n^{\prime}$ Cheese is based on local-host, while our protocols and Virgo are based on a 500 Mbps network. While Virgo uses about 148 GB memory, other protocols use the memory of about 1 GB .

Stress-testing of our ZK protocol. We stress-test our circuit-based ZK protocol on the cheapest instance of Amazon EC2 that only costs 2 to 5 cents per hour, and summarize the experimental results in Table 3. ${ }^{3}$ For all protocol executions, we use only a single thread. The Boolean circuits (resp., arithmetic circuits) are tested under the network bandwidth of 20 Mbps (resp., 500 Mbps ). Although the computational power and memory are limited, our protocol still achieves high throughput. The speed for computing Boolean circuits ranges from 5.3 million to 7.3 million gates per second and the speed for arithmetic circuits ranges from 2.2 million to 3 million gates per second. Taking the low cost into consideration, our ZK protocol is very affordable. The cost to prove Boolean circuits is about 10 billion gates per cent using the ARM CPU; the cost of proving arithmetic circuits is roughly $2.2-4.1$ billion gates per cent.

[^2]| Protocol | ENS [ENS20] | Wolverine [WYKW20] | QuickSilver |
| :---: | :---: | :---: | :---: |
| Communication | 53 KB | 32.8 KB | 4.1 KB |
| Execution time | - | 220 ms | 2 ms |

Table 6: Performance comparison of our ZK protocol QuickSilver vs. prior work for proving knowledge of an SIS solution. The solution is assumed to be a ternary-vector and $n=2048, m=1024, \log q=32$.

### 6.2 Benchmarking Our Polynomial-based ZK Proof

While our ZK protocol for polynomial satisfiability is generic, here we focus on some useful applications that use low-degree polynomials to demonstrate how powerful it can be. We leave exploration of compilerbased optimization and more complicated examples as the future work. In all of the experiments below, we use the network bandwidth of 20 Mbps for a binary field and 500 Mbps for a 61 -bit field, and always use a single thread.

Inner product. In this benchmark, the witness consists of two vectors of $n$ field elements (namely $\boldsymbol{x}$ and $\boldsymbol{y}$ ), and the prover wants to prove that the inner product of two vectors $\langle\boldsymbol{x}, \boldsymbol{y}\rangle=\sum_{i \in[n]} x_{i} \cdot y_{i}$ equals to some public value. We report the cost of processing the witness and the cost of proving the inner product separately in Table 4. We found that processing the witness could be free in a larger computation. This is because when using inner product as a sub-circuit in a larger circuit, the input witness of this subcircuit is the output of some prior computation and thus need not be processed again. In this case, the cost of the ZK proof for inner product is simply the second line. We can see that even for proving inner product of two vectors of length $10^{8}$, the cost is very small.

Matrix multiplication. We report the performance of our ZK protocol for proving matrix multiplication, and compare it with prior work in Table 5 . We observe that our polynomial-based ZK protocol is $31 \times$ faster than our own circuit-based protocol, which is already faster than prior protocols. It also uses $340 \times$ less communication than our circuit-based protocol. Our proof size is still significantly larger than Virgo, but our ZK protocols (QuickSilver) benefit in other aspects including execution time and memory usage. We attempted to test on a recent zk-SNARK implementation Spartan [Set20], but the program used more than 600 GB memory before killed.

Proving knowledge of solutions to lattice-based problems. Here we focus on proving knowledge of a solution to a short integer solution (SIS) problem. We assume that the prover knows a vector $s \in[-B, B]^{m}$, such that $\mathbf{A} \cdot \boldsymbol{s}=\boldsymbol{t}$, where both parties know the public matrix $\mathbf{A} \in \mathbb{Z}_{q}^{n \times m}$ and vector $\boldsymbol{t} \in \mathbb{Z}_{q}^{n}$ (here we assume that $q$ is a prime). Checking the matrix multiplication is easy since the matrix $\mathbf{A}$ is public and thus the main work is to check that all coordinates in $s$ are bounded. This can be done by proving that $\Pi_{j \in[-B, B]}\left(s_{i}-j\right)=0$ for all $i \in[m]$. In typical SIS problems, e.g., the one studied in the recent work [ENS20], $B$ is set to 1 , resulting in a degree-3 polynomial. The checking procedure of our ZK protocol is essentially free compared to the cost of obtaining the committed input to the polynomial. We show the performance comparison in Table 6, where the execution time for [ENS20] is not available from their paper. Due to our improved protocol for low-degree polynomials, our protocol is significantly faster and uses less communication than prior work.

## Acknowledgements

The authors would like to thank Steve Lu for the helpful discussion on the cost of Amazon EC2. This material is based upon work supported in part by DARPA under Contract No. HR001120C0087. The views,
opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. Work of Pratik Sarkar is supported by the DARPA SIEVE project and NSF awards 1931714, 1414119. Work of Kang Yang is supported in part by the National Key Research and Development Program of China (Grant Nos. 2018YFB0804105, 2020YFA0309703), and in part by the National Natural Science Foundation of China (Grant Nos. 61932019, 61802021).

## References

[AHIV17] Scott Ames, Carmit Hazay, Yuval Ishai, and Muthuramakrishnan Venkitasubramaniam. Ligero: Lightweight sublinear arguments without a trusted setup. In Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu, editors, ACM CCS 2017, pages 2087-2104, Dallas, TX, USA, October 31 - November 2, 2017. ACM Press.
$\left[\mathrm{BBB}^{+} 18\right]$ Benedikt Bünz, Jonathan Bootle, Dan Boneh, Andrew Poelstra, Pieter Wuille, and Greg Maxwell. Bulletproofs: Short proofs for confidential transactions and more. In 2018 IEEE Symposium on Security and Privacy, pages 315-334, San Francisco, CA, USA, May 21-23, 2018. IEEE Computer Society Press.
[BBHR19] Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev. Scalable zero knowledge with no trusted setup. In Alexandra Boldyreva and Daniele Micciancio, editors, CRYPTO 2019, Part III, volume 11694 of LNCS, pages 701-732, Santa Barbara, CA, USA, August 18-22, 2019. Springer, Heidelberg, Germany.
$\left[\mathrm{BCC}^{+} 16\right]$ Jonathan Bootle, Andrea Cerulli, Pyrros Chaidos, Jens Groth, and Christophe Petit. Efficient zero-knowledge arguments for arithmetic circuits in the discrete log setting. In Marc Fischlin and Jean-Sébastien Coron, editors, EUROCRYPT 2016, Part II, volume 9666 of $L N C S$, pages 327-357, Vienna, Austria, May 8-12, 2016. Springer, Heidelberg, Germany.
[BCCT13] Nir Bitansky, Ran Canetti, Alessandro Chiesa, and Eran Tromer. Recursive composition and bootstrapping for SNARKS and proof-carrying data. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, 45th ACM STOC, pages 111-120, Palo Alto, CA, USA, June 1-4, 2013. ACM Press.
$\left[\mathrm{BCG}^{+} 19 \mathrm{a}\right]$ Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, Peter Rindal, and Peter Scholl. Efficient two-round OT extension and silent non-interactive secure computation. In Lorenzo Cavallaro, Johannes Kinder, XiaoFeng Wang, and Jonathan Katz, editors, ACM CCS 2019, pages 291-308. ACM Press, November 11-15, 2019.
$\left[\mathrm{BCG}^{+} 19 b\right]$ Elette Boyle, Geoffroy Couteau, Niv Gilboa, Yuval Ishai, Lisa Kohl, and Peter Scholl. Efficient pseudorandom correlation generators: Silent OT extension and more. In Alexandra Boldyreva and Daniele Micciancio, editors, CRYPTO 2019, Part III, volume 11694 of LNCS, pages 489518, Santa Barbara, CA, USA, August 18-22, 2019. Springer, Heidelberg, Germany.
[BCGI18] Elette Boyle, Geoffroy Couteau, Niv Gilboa, and Yuval Ishai. Compressing vector OLE. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, ACM CCS 2018, pages 896-912, Toronto, ON, Canada, October 15-19, 2018. ACM Press.
[BCOS20] Cecilia Boschini, Jan Camenisch, Max Ovsiankin, and Nicholas Spooner. Efficient postquantum SNARKs for RSIS and RLWE and their applications to privacy. In Jintai Ding
and Jean-Pierre Tillich, editors, Post-Quantum Cryptography - 11th International Conference, PQCrypto 2020, pages 247-267, Paris, France, April 15-17 2020. Springer, Heidelberg, Germany.
[ $\mathrm{BCR}^{+}$19] Eli Ben-Sasson, Alessandro Chiesa, Michael Riabzev, Nicholas Spooner, Madars Virza, and Nicholas P. Ward. Aurora: Transparent succinct arguments for R1CS. In Yuval Ishai and Vincent Rijmen, editors, EUROCRYPT 2019, Part I, volume 11476 of LNCS, pages 103-128, Darmstadt, Germany, May 19-23, 2019. Springer, Heidelberg, Germany.
[BDFG20] Dan Boneh, Justin Drake, Ben Fisch, and Ariel Gabizon. Halo infinite: Recursive zksnarks from any additive polynomial commitment scheme. Cryptology ePrint Archive, Report 2020/1536, 2020. https://eprint.iacr.org/2020/1536.
[BDOZ11] Rikke Bendlin, Ivan Damgård, Claudio Orlandi, and Sarah Zakarias. Semi-homomorphic encryption and multiparty computation. In Kenneth G. Paterson, editor, EUROCRYPT 2011, volume 6632 of $L N C S$, pages 169-188, Tallinn, Estonia, May 15-19, 2011. Springer, Heidelberg, Germany.
[ $\mathrm{BFH}^{+}$20] Rishabh Bhadauria, Zhiyong Fang, Carmit Hazay, Muthuramakrishnan Venkitasubramaniam, Tiancheng Xie, and Yupeng Zhang. Ligero++: A new optimized sublinear IOP. In Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna, editors, ACM CCS 20, pages 2025-2038, Virtual Event, USA, November 9-13, 2020. ACM Press.
[BFS20] Benedikt Bünz, Ben Fisch, and Alan Szepieniec. Transparent SNARKs from DARK compilers. In Anne Canteaut and Yuval Ishai, editors, EUROCRYPT 2020, Part I, volume 12105 of LNCS, pages 677-706, Zagreb, Croatia, May 10-14, 2020. Springer, Heidelberg, Germany.
[BHWK16] Niklas Büscher, Andreas Holzer, Alina Weber, and Stefan Katzenbeisser. Compiling low depth circuits for practical secure computation. In Ioannis G. Askoxylakis, Sotiris Ioannidis, Sokratis K. Katsikas, and Catherine A. Meadows, editors, ESORICS 2016, Part II, volume 9879 of $L N C S$, pages $80-98$, Heraklion, Greece, September 26-30, 2016. Springer, Heidelberg, Germany.
[BLS19] Jonathan Bootle, Vadim Lyubashevsky, and Gregor Seiler. Algebraic techniques for short(er) exact lattice-based zero-knowledge proofs. In Alexandra Boldyreva and Daniele Micciancio, editors, CRYPTO 2019, Part I, volume 11692 of $L N C S$, pages 176-202, Santa Barbara, CA, USA, August 18-22, 2019. Springer, Heidelberg, Germany.
[BMR90] Donald Beaver, Silvio Micali, and Phillip Rogaway. The round complexity of secure protocols (extended abstract). In 22nd ACM STOC, pages 503-513, Baltimore, MD, USA, May 14-16, 1990. ACM Press.
[BMR16] Marshall Ball, Tal Malkin, and Mike Rosulek. Garbling gadgets for Boolean and arithmetic circuits. In Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, editors, ACM CCS 2016, pages 565-577, Vienna, Austria, October 24-28, 2016. ACM Press.
[BMRS20] Carsten Baum, Alex J. Malozemoff, Marc Rosen, and Peter Scholl. Mac'n'cheese: Zeroknowledge proofs for arithmetic circuits with nested disjunctions. Cryptology ePrint Archive, Report 2020/1410, 2020. https://eprint.iacr.org/2020/1410.
[BN20] Carsten Baum and Ariel Nof. Concretely-efficient zero-knowledge arguments for arithmetic circuits and their application to lattice-based cryptography. In Aggelos Kiayias, Markulf Kohlweiss, Petros Wallden, and Vassilis Zikas, editors, PKC 2020, Part I, volume 12110 of LNCS, pages 495-526, Edinburgh, UK, May 4-7, 2020. Springer, Heidelberg, Germany.
[Can01] Ran Canetti. Universally composable security: A new paradigm for cryptographic protocols. In 42nd FOCS, pages 136-145, Las Vegas, NV, USA, October 14-17, 2001. IEEE Computer Society Press.
[CDG ${ }^{+}$17] Melissa Chase, David Derler, Steven Goldfeder, Claudio Orlandi, Sebastian Ramacher, Christian Rechberger, Daniel Slamanig, and Greg Zaverucha. Post-quantum zero-knowledge and signatures from symmetric-key primitives. In Bhavani M. Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu, editors, ACM CCS 2017, pages 1825-1842, Dallas, TX, USA, October 31 - November 2, 2017. ACM Press.
[COS20] Alessandro Chiesa, Dev Ojha, and Nicholas Spooner. Fractal: Post-quantum and transparent recursive proofs from holography. In Anne Canteaut and Yuval Ishai, editors, EUROCRYPT 2020, Part I, volume 12105 of LNCS, pages 769-793, Zagreb, Croatia, May 10-14, 2020. Springer, Heidelberg, Germany.
[DIO20] Samuel Dittmer, Yuval Ishai, and Rafail Ostrovsky. Line-point zero knowledge and its applications. Cryptology ePrint Archive, Report 2020/1446, 2020. https://eprint.iacr. org/2020/1446.
[ENS20] Muhammed F. Esgin, Ngoc Khanh Nguyen, and Gregor Seiler. Practical exact proofs from lattices: New techniques to exploit fully-splitting rings. In ASIACRYPT 2020, Part II, LNCS, pages 259-288. Springer, Heidelberg, Germany, December 2020.
[FNO15] Tore Kasper Frederiksen, Jesper Buus Nielsen, and Claudio Orlandi. Privacy-free garbled circuits with applications to efficient zero-knowledge. In Elisabeth Oswald and Marc Fischlin, editors, EUROCRYPT 2015, Part II, volume 9057 of LNCS, pages 191-219, Sofia, Bulgaria, April 26-30, 2015. Springer, Heidelberg, Germany.
[GKR08] Shafi Goldwasser, Yael Tauman Kalai, and Guy N. Rothblum. Delegating computation: interactive proofs for muggles. In Richard E. Ladner and Cynthia Dwork, editors, 40th ACM STOC, pages 113-122, Victoria, BC, Canada, May 17-20, 2008. ACM Press.
[GMO16] Irene Giacomelli, Jesper Madsen, and Claudio Orlandi. ZKBoo: Faster zero-knowledge for Boolean circuits. In Thorsten Holz and Stefan Savage, editors, USENIX Security 2016, pages 1069-1083, Austin, TX, USA, August 10-12, 2016. USENIX Association.
[HK20] David Heath and Vladimir Kolesnikov. Stacked garbling for disjunctive zero-knowledge proofs. In Anne Canteaut and Yuval Ishai, editors, EUROCRYPT 2020, Part III, volume 12107 of LNCS, pages 569-598, Zagreb, Croatia, May 10-14, 2020. Springer, Heidelberg, Germany.
[HSS17] Carmit Hazay, Peter Scholl, and Eduardo Soria-Vazquez. Low cost constant round MPC combining BMR and oblivious transfer. In Tsuyoshi Takagi and Thomas Peyrin, editors, ASIACRYPT 2017, Part I, volume 10624 of LNCS, pages 598-628, Hong Kong, China, December 3-7, 2017. Springer, Heidelberg, Germany.
[IKOS07] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Zero-knowledge from secure multiparty computation. In David S. Johnson and Uriel Feige, editors, 39th ACM STOC, pages 21-30, San Diego, CA, USA, June 11-13, 2007. ACM Press.
[JKO13] Marek Jawurek, Florian Kerschbaum, and Claudio Orlandi. Zero-knowledge using garbled circuits: how to prove non-algebraic statements efficiently. In Ahmad-Reza Sadeghi, Virgil D. Gligor, and Moti Yung, editors, ACM CCS 2013, pages 955-966, Berlin, Germany, November 4-8, 2013. ACM Press.
[KKW18] Jonathan Katz, Vladimir Kolesnikov, and Xiao Wang. Improved non-interactive zero knowledge with applications to post-quantum signatures. In David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang, editors, ACM CCS 2018, pages 525-537, Toronto, ON, Canada, October 15-19, 2018. ACM Press.
[KMR14] Vladimir Kolesnikov, Payman Mohassel, and Mike Rosulek. FleXOR: Flexible garbling for XOR gates that beats free-XOR. In Juan A. Garay and Rosario Gennaro, editors, CRYPTO 2014, Part II, volume 8617 of LNCS, pages 440-457, Santa Barbara, CA, USA, August 17-21, 2014. Springer, Heidelberg, Germany.
[KP17] Yashvanth Kondi and Arpita Patra. Privacy-free garbled circuits for formulas: Size zero and information-theoretic. In Jonathan Katz and Hovav Shacham, editors, CRYPTO 2017, Part I, volume 10401 of LNCS, pages 188-222, Santa Barbara, CA, USA, August 20-24, 2017. Springer, Heidelberg, Germany.
[KS08] Vladimir Kolesnikov and Thomas Schneider. Improved garbled circuit: Free XOR gates and applications. In Luca Aceto, Ivan Damgård, Leslie Ann Goldberg, Magnús M. Halldórsson, Anna Ingólfsdóttir, and Igor Walukiewicz, editors, ICALP 2008, Part II, volume 5126 of LNCS, pages 486-498, Reykjavik, Iceland, July 7-11, 2008. Springer, Heidelberg, Germany.
[MPas15] Tal Malkin, Valerio Pastro, and abhi shelat. The whole is greater than the sum of its parts: Linear garbling and applications. https://simons.berkeley.edu/talks/ tal-malkin-2015-06-10, 2015.
[NNOB12] Jesper Buus Nielsen, Peter Sebastian Nordholt, Claudio Orlandi, and Sai Sheshank Burra. A new approach to practical active-secure two-party computation. In Reihaneh Safavi-Naini and Ran Canetti, editors, CRYPTO 2012, volume 7417 of $L N C S$, pages 681-700, Santa Barbara, CA, USA, August 19-23, 2012. Springer, Heidelberg, Germany.
[NPS99] Moni Naor, Benny Pinkas, and Reuban Sumner. Privacy preserving auctions and mechanism design. In Proceedings of the 1st ACM conference on Electronic commerce, pages 129-139, 1999.
[Set20] Srinath Setty. Spartan: Efficient and general-purpose zkSNARKs without trusted setup. In Daniele Micciancio and Thomas Ristenpart, editors, CRYPTO 2020, Part III, volume 12172 of LNCS, pages 704-737, Santa Barbara, CA, USA, August 17-21, 2020. Springer, Heidelberg, Germany.
[SGRR19] Phillipp Schoppmann, Adrià Gascón, Leonie Reichert, and Mariana Raykova. Distributed vector-OLE: Improved constructions and implementation. In Lorenzo Cavallaro, Johannes Kinder, XiaoFeng Wang, and Jonathan Katz, editors, ACM CCS 2019, pages 1055-1072. ACM Press, November 11-15, 2019.
[SL20] Srinath Setty and Jonathan Lee. Quarks: Quadruple-efficient transparent zkSNARKs. Cryptology ePrint Archive, Report 2020/1275, 2020. https://eprint.iacr.org/2020/ 1275.
[WAP ${ }^{+}$19] Liang Wang, Gilad Asharov, Rafael Pass, Thomas Ristenpart, and abhi shelat. Blind certificate authorities. In 2019 IEEE Symposium on Security and Privacy, pages 1015-1032, San Francisco, CA, USA, May 19-23, 2019. IEEE Computer Society Press.
[WTs ${ }^{+}$18] Riad S. Wahby, Ioanna Tzialla, abhi shelat, Justin Thaler, and Michael Walfish. Doublyefficient zkSNARKs without trusted setup. In 2018 IEEE Symposium on Security and Privacy, pages 926-943, San Francisco, CA, USA, May 21-23, 2018. IEEE Computer Society Press.
[WYKW20] Chenkai Weng, Kang Yang, Jonathan Katz, and Xiao Wang. Wolverine: Fast, scalable, and communication-efficient zero-knowledge proofs for boolean and arithmetic circuits. Cryptology ePrint Archive, Report 2020/925, 2020. https://eprint.iacr.org/2020/925.
$\left[\mathrm{YWL}^{+} 20\right]$ Kang Yang, Chenkai Weng, Xiao Lan, Jiang Zhang, and Xiao Wang. Ferret: Fast extension for correlated OT with small communication. In Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna, editors, ACM CCS 20, pages 1607-1626, Virtual Event, USA, November 913, 2020. ACM Press.
[ZMM ${ }^{+}$20] Fan Zhang, Deepak Maram, Harjasleen Malvai, Steven Goldfeder, and Ari Juels. DECO: Liberating web data using decentralized oracles for TLS. In Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna, editors, ACM CCS 20, pages 1919-1938, Virtual Event, USA, November 9-13, 2020. ACM Press.
[ZRE15] Samee Zahur, Mike Rosulek, and David Evans. Two halves make a whole - reducing data transfer in garbled circuits using half gates. In Elisabeth Oswald and Marc Fischlin, editors, EUROCRYPT 2015, Part II, volume 9057 of LNCS, pages 220-250, Sofia, Bulgaria, April 2630, 2015. Springer, Heidelberg, Germany.
[ZXZS20] Jiaheng Zhang, Tiancheng Xie, Yupeng Zhang, and Dawn Song. Transparent polynomial delegation and its applications to zero knowledge proof. In 2020 IEEE Symposium on Security and Privacy, pages 859-876, San Francisco, CA, USA, May 18-21, 2020. IEEE Computer Society Press.

## A More Details on Preliminaries

Universal composability. We use the universal composability (UC) framework [Can01] to prove security in the presence of a malicious, static adversary. We say that a protocol $\Pi U C$-realizes an ideal functionality $\mathcal{F}$ if for any PPT adversary $\mathcal{A}$, there exists a PPT simulator $\mathcal{S}$ such that for any PPT environment $\mathcal{Z}$ with arbitrary auxiliary input $z$, the output distribution of $\mathcal{Z}$ in the real-world execution where the parties interact with $\mathcal{A}$ and execute $\Pi$ is indistinguishable from the output distribution of $\mathcal{Z}$ in the ideal-world execution where the parties interact with $\mathcal{S}$ and $\mathcal{F}$.

We prove the security of our protocols in the $\mathcal{G}$-hybrid model, where the parties execute a protocol with real messages and also have access to an ideal functionality $\mathcal{G}$. We say that protocol $\Pi$ UC-realizes functionality $\mathcal{F}$ in the $\mathcal{G}$-hybrid model with statistical error $1 / 2^{\rho}$ and the information-theoretic security (resp., the computational security), if the distinguishing probability of environment $\mathcal{Z}$ between the realworld execution and ideal-world execution is bounded by $1 / 2^{\rho}$ (resp., $1 / 2^{\rho}+\operatorname{negl}(\kappa)$ ).
Global-key query for sVOLE. In the malicious setting, if the sVOLE protocol [WYKW20] over any field is used to securely realize $\mathcal{F}_{\text {sVOLE }}^{p, r}$ shown in Figure 6, then we can guarantee the UC security when the following single global-key query is added into $\mathcal{F}_{\text {sVOLE }}^{p, r}$.

## Functionality $\mathcal{F}_{\mathrm{sVOLE}}^{p, r}$

Initialize: Upon receiving (init) from $\mathcal{P}$ and $\mathcal{V}$, sample $\Delta \leftarrow \mathbb{F}_{p^{r}}$ if $\mathcal{V}$ is honest, and receive $\Delta \in \mathbb{F}_{p^{r}}$ from the adversary otherwise. Store $\Delta$ and send it to $\mathcal{V}$, and ignore all subsequent (init) commands.

Extend: This procedure can be run multiple times. Upon receiving (extend, $\ell$ ) from $\mathcal{P}$ and $\mathcal{V}$, execute the following:

1. If $\mathcal{V}$ is honest, sample $\boldsymbol{k} \leftarrow \mathbb{F}_{p^{r}}^{\ell}$. Otherwise, receive $\boldsymbol{k} \in \mathbb{F}_{p^{r}}^{\ell}$ from the adversary.
2. If $\mathcal{P}$ is honest, sample $\boldsymbol{x} \leftarrow \mathbb{F}_{p}^{\ell}$ and compute $\boldsymbol{m}:=\boldsymbol{k}-\Delta \cdot \boldsymbol{x} \in \mathbb{F}_{p^{r}}^{\ell}$. Otherwise, receive $\boldsymbol{x} \in \mathbb{F}_{p}^{\ell}$ and $\boldsymbol{m} \in \mathbb{F}_{p^{r}}^{\ell}$ from the adversary, and then recompute $\boldsymbol{k}:=\boldsymbol{m}+\Delta \cdot \boldsymbol{x} \in \mathbb{F}_{p^{r}}^{\ell}$.
3. Send $(\boldsymbol{x}, \boldsymbol{m})$ to $\mathcal{P}$ and $\boldsymbol{k}$ to $\mathcal{V}$.

## Figure 6: Functionality for subfield VOLE.

- If $\mathcal{P}$ is corrupted, receive (guess, $\Delta^{\prime}$ ) from the adversary. If $\Delta^{\prime}=\Delta$, then send success to $\mathcal{P}$ and ignore any subsequent global-key query. Otherwise, send abort to both parties and abort.

This global-key query allows the adversary to guess $\Delta$ only once, which only increases the successful probability of the adversary to guess $\Delta$ by at most $1 / p^{r}$. If the more efficient sVOLE protocol [YWL ${ }^{+}$20] for only the case of $p=2$ and $r=\kappa$ is used, then we need to strengthen the global-key query from the single query to any polynomial number of queries, where this functionality will not abort for an incorrect guess. This type of global-key queries has been used in MPC protocols such as [NNOB12, HSS17]. In this case, the successful probability of the adversary to guess $\Delta$ is increased to at most $q / 2^{\kappa}$, where $q$ is the number of global-key queries.

For the sake of simplicity, we omit the global-key query, even if we use the protocols [WYKW20, $\left.\mathrm{YWL}^{+} 20\right]$ to UC-realize $\mathcal{F}_{\mathrm{sVOLE}}^{p, r}$.

## B Proof of Theorem 1

Proof. We easily construct a PPT simulator $\mathcal{S}$ given access to $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$, and running the (unbounded) adversary $\mathcal{A}$ as a subroutine while emulating $\mathcal{F}_{\text {sVOLE }}^{p, r}$ for $\mathcal{A}$. In particular, there is no communication between $\mathcal{P}$ and $\mathcal{V}$. Thus, $\mathcal{S}$ can emulate $\mathcal{F}_{\text {sVOLE }}^{p, r}$ and record all the values sent by $\mathcal{A}$ to $\mathcal{F}_{\text {sVOLE }}^{p, r}$, and then compute the output value for the corrupted party following the protocol specification, and send it to functionality $\mathcal{F}_{\text {ext-sVOLE }}^{p, r}$. It is trivial to see that the simulation is perfect.

Below, we show that the output of the honest party is perfectly indistinguishable between the realworld execution and ideal-world execution. We first prove if both parties compute their output locally following the protocol specification, then their outputs satisfy the correct VOPE correlation. Specifically, from $k_{h}=m_{h}+u_{h} \cdot \Delta$ for $h \in[r]$, we easily obtain that for each $i \in[d]$,

$$
\begin{aligned}
K_{i} & =\sum_{h \in[r]} k_{h} \cdot X^{h-1}=\sum_{h \in[r]}\left(m_{h}+u_{h} \cdot \Delta\right) \cdot X^{h-1} \\
& =\sum_{h \in[r]} m_{h} \cdot X^{h-1}+\left(\sum_{h \in[r]} u_{h} \cdot X^{h-1}\right) \cdot \Delta \\
& =M_{i}+U_{i} \cdot \Delta .
\end{aligned}
$$

It is easy to see that $B_{1}=K_{1}=M_{1}+U_{1} \cdot \Delta=g_{1}(\Delta)$. Below, we prove by induction. In the $i$-th iteration
with $i \in[d-1]$, assuming that $B_{i}=g_{i}(\Delta)$, we have the following holds:

$$
\begin{aligned}
B_{i+1} & =B_{i} \cdot K_{i+1}+K_{d+i} \\
& =g_{i}(\Delta) \cdot\left(M_{i+1}+U_{i+1} \cdot \Delta\right)+\left(M_{d+i}+U_{d+i} \cdot \Delta\right) \\
& =g_{i+1}(\Delta)
\end{aligned}
$$

Therefore, we obtain that $B=B_{d}=g_{d}(\Delta)=\sum_{i \in[0, d]} A_{i} \cdot \Delta^{i}$.
If $\mathcal{V}$ is honest, then its output is always defined by the output of malicious party $\mathcal{P}$ and $\Delta$ (i.e., $B=$ $\sum_{i \in[0, d]} A_{i} \cdot \Delta^{i}$ ) in both worlds. In the following, we consider the case that $\mathcal{P}$ is honest but $\mathcal{V}$ is malicious. The output values $A_{0}, \ldots, A_{d}$ for $\mathcal{P}$ are uniformly random such that $B=\sum_{i \in[0, d]} A_{i} \cdot \Delta^{i}$ in the ideal-world execution, where $B$ is the output of malicious party $\mathcal{V}$. In the real protocol execution, $A_{i}$ for each $i \in[0, d]$ is computed as the coefficient of item $x^{i}$ for polynomial $g_{d}(x)$. According to the definition of $\mathcal{F}_{\text {sVOLE }}^{p, r}, u_{h}$ for $h \in[r]$ is uniform in $\mathbb{F}_{p}$. Therefore, for $i \in[2 d-1]$, we have that $U_{i}:=\sum_{h \in[r]} u_{h} \cdot X^{h-1}$ is uniformly random in $\mathbb{F}_{p^{r}}$. For $i \in[d]$, we prove by induction that each coefficient of $g_{i}(x)$ except for constant term is uniformly distributed in $\mathbb{F}_{p^{r}}$, except with probability at most $1 / p^{r}$. This holds for $g_{1}(x)=M_{1}+U_{1} \cdot x$ with probability 1 . In the $i$-th iteration with $i \in[d-1]$, we have that the coefficients $A_{i, 1}, \ldots, A_{i, i}$ of degree- $i$ polynomial $g_{i}(x)$ are uniform by the induction assumption. From the definition of $g_{i+1}(x)$, we obtain the following holds:

$$
\begin{aligned}
g_{i+1}(x)= & g_{i}(x) \cdot\left(M_{i+1}+U_{i+1} \cdot x\right)+\left(M_{d+i}+U_{d+i} \cdot x\right) \\
= & \left(\sum_{h=0}^{i} A_{i, h} \cdot x^{h}\right) \cdot\left(M_{i+1}+U_{i+1} \cdot x\right)+\left(M_{d+i}+U_{d+i} \cdot x\right) \\
= & \left(A_{i, 0} \cdot M_{i+1}+M_{d+i}\right)+\left(A_{i, 1} \cdot M_{i+1}+A_{i, 0} \cdot U_{i+1}+U_{d+i}\right) \cdot x \\
& \sum_{h=2}^{i}\left(A_{i, h} \cdot M_{i+1}+A_{i, h-1} \cdot U_{i+1}\right) \cdot x^{h}+A_{i, i} \cdot U_{i+1} \cdot x^{i+1} .
\end{aligned}
$$

From the uniformity of $U_{d+i}$, we directly obtain that the 1-degree term of $g_{i+1}(x)$ is uniform. If $U_{i+1} \neq 0$ except with probability $1 / p^{r}$, then the $h$-degree term of $g_{i+1}(x)$ for $h \in[2, i+1]$ is uniform from the uniformity of $A_{i, h-1}$. Overall, except with probability $1 / p^{r}$, each coefficient of $g_{i+1}(x)$ except for constant term is uniformly random. Therefore, the coefficients $A_{1}, \ldots, A_{d}$ of polynomial $g_{d}(x)$ are uniform over $\mathbb{F}_{p^{r}}$, except with probability at most $(d-1) / p^{r}$. Together with $B=g_{d}(\Delta)$, we have that $A_{0}=B-\sum_{i \in[d]} A_{i} \cdot \Delta^{i}$, which completes the proof.


[^0]:    ${ }^{1}$ In this paper, we say that a "large field" means that the size is at least $2^{\rho}$.

[^1]:    ${ }^{2}$ We note that the communication for generating sVOLE correlations is sublinear to the number of resulting sVOLE correlations, using the recent LPN-based protocols [SGRR19, $\mathrm{BCG}^{+} 19 \mathrm{a}, \mathrm{YWL}^{+}$20, WYKW20].

[^2]:    ${ }^{3}$ Price is based on AWS defined-duration spot instances. There are cheaper t3.medium, t3a.medium burstable instances, but the cost is higher than the instances in Table 3 unless the average CPU usage is kept below $20 \%$.

