# Concretely-Efficient Zero-Knowledge Arguments for Arithmetic Circuits and Their Application to Lattice-Based Cryptography 

Carsten Baum ${ }^{\star} \quad$ Ariel Nof*<br>Department of Computer Science, Bar-Ilan University<br>carsten.baum@biu.ac.il, ariel.nof@biu.ac.il


#### Abstract

In this work we present a new interactive Zero-Knowledge Argument of knowledge for general arithmetic circuits. Our protocol is based on the "MPC-in-the-head"-paradigm of Ishai et al. (STOC 2009) and uses MPC with preprocessing such as recently proposed by Katz, Kolesnikov and Wang (ACM CCS 2018). Our argument system uses only symmetric-key primitives and utilizes a version of the so-called SPDZ-protocol which has efficiency benefits for arithmetic circuits compared to other approaches. Based on specific properties of our protocol we then show how it can be used to construct an efficient Zero-Knowledge Argument of Knowledge for instances of the Short Integer Solution (SIS) problem. We present different protocols that are tailored to specific uses of SIS and show how our solution compares in terms of argument size to existing work. We moreover implemented our Zero-Knowledge argument for SIS and show that using our protocols it is possible to run a complete interactive proof, even for general SIS instances which result in a circuit with $>10^{6}$ gates, in less than 0.5 seconds. To the best of our knowledge, our construction outperforms all known approaches for the SIS problem with post-quantum security either in terms of computation or communication complexity.


## 1 Introduction

Zero-Knowledge Arguments of Knowledge (ZKAoK) are interactive protocols that allow a computationally bounded prover to convince a verifier that she knows a witness for a certain statement, without revealing any further information about the witness. Since their introduction in the 80ies [GMR89] these protocols have been important building blocks for applications in cryptography. While solutions for very specific languages have been plentiful, many applications require the use of arbitrary (algebraic) circuits in order to prove complex relationships. For example, proving that two homomorphic commitments contain the same committed message is generally an easy task, while proving knowledge of a preimage of a SHA-256 hash requires more generic solutions. Recent years saw a variety of different techniques which aim at providing such ZKAoK (see [PHGR16,GMO16,AHIV17,BSBHR18, WTS $\left.{ }^{+} 18, \mathrm{BSCR}^{+} 18\right]$ to just name a few), varying in terms of argument size, prover/verification time, interaction and assumptions. While many of these systems perform very well with large witnesses and circuit sizes, none of them are a one-size-fits-all solution.

As an example, consider the so-called Short Integer Solution (SIS) problem. Here, a verifier has a matrix $\mathbf{A}$ and a vector $\boldsymbol{t}$ while the prover wants to prove knowledge of a secret $s$ such that $\boldsymbol{t}=\mathbf{A} \boldsymbol{s} \bmod q$ and $\|\boldsymbol{s}\|_{\infty} \leq \beta$. The SIS problem and related problems are crucial building blocks for post-quantum lattice-based cryptography and constructing efficient ZKAoK with a small communication complexity has long been a problem: the soundness error of current special-purpose protocols is constant, meaning that the arguments have to be repeated many times to actually be convincing to a verifier. A particular, non-standard approach has been suggested by Bendlin \& Damgård [BD10],

[^0]who were the first to examine arguments of knowledge for SIS using generic proof systems. They observed that for certain argument schemes, the above SIS function has both a very low multiplication depth and low total number of multiplications, if the secret $s$ is a binary vector. However, many general ZKAoK systems only provide asymptotic efficiency, meaning that they require the circuit to be very big before their strengths play out [AHIV17, $\left.\mathrm{BBC}^{+} 18\right]$. Moreover, many of these approaches achieve sub-linear communication complexity at the cost of high prover's running time [PHGR16]. Other approaches are insecure to post-quantum attacks [WTS $\left.{ }^{+} 18, \mathrm{MBKM} 19, \mathrm{BCC}^{+} 16, \mathrm{PHGR} 16\right]$ or rely on knowledge assumptions that are poorly understood. Finally, none of existing general solutions takes advantage of the unique structure of the SIS problem.

## 1.1 'MPC-in-the-Head' and Preprocessing

The "MPC-in-the-head" paradigm is a general technique which is used in our construction. Before outlining our contributions, we first describe what this paradigm is.

MPC or Secure Multiparty Computation describes a type of interactive protocol which allows to securely compute functions on secret data. No information is leaked beyond the output of the function with correctness even in the presence of dishonest participants.
MPC-in-the-head was introduced by Ishai et al. [IKOS07] as a technique to construct generic zero-knowledge proofs from MPC protocols. Here, the statement to be proven is rewritten into a circuit $C$, which outputs $y$ if and only if its input $w$ is a correct witness for the statement to be proven. The prover simulates all parties of an MPC protocol as well as their interaction in his head. These parties obtain a secret-sharing of the witness $w$ as their input, run a protocol to evaluate $C$ and send the outputs to the verifier. Moreover, the prover commits to the inputs as well as randomness and exchanged messages of each party separately, and opens a verifier-chosen subset of these commitments to the verifier. The verifier then checks if these parties were simulated correctly by the prover and that the messages and the outputs are consistent. On a very high level, this is a proof of the statement if the MPC scheme is robust against active attacks, and it is zero-knowledge due to the privacy of it.
Preprocessing is a widely used optimization of practical MPC schemes. Here, each party begins the actual protocol with shares of correlated randomness, which is itself independent of the inputs of the protocol. This correlated randomness is then used to speed up the actual computation, and due to its independence it can be computed ahead of time. To the best of our knowledge, the only MPC-in-the-head scheme to use preprocessing was introduced recently in [KKW18].

### 1.2 Our Contributions

In this work, we construct a new practically efficient ZKAoK for arithmetic circuits, together with a multitude of extensions to make it applicable to SIS. Our scheme is based on the "MPC-in-thehead" approach and uses only symmetric-key primitives. It has an argument size that only depends on non-linear gates of the circuit $C$ and low prover running time. We implemented our construction and report on its practicality. In more details:
'MPC-in-the-Head' with Preprocessing. We first generalize the idea of [KKW18] to work over arithmetic circuits using a variant of the SPDZ MPC protocol [DPSZ12,LN17] and provide a formal
proof of security to their "cut-and-choose" preprocessing heuristic. Then, we present a new construction where we replace the "cut-and-choose" mechanism with a "sacrificing"-based approach. While both approaches have similar cost per MPC instance, our "sacrificing"-based approach yields a smaller cheating probability, which means that the number of MPC instances simulated in the proof can be significantly smaller, thus reducing the overall communication footprint. Our scheme is highly flexible in its choice of parameters. In particular, by changing the number of parties in the underlying MPC protocol, one can alternate between achieving low communication and low running time. Our construction only requires efficient standard symmetric primitives, and thus is plausibly post-quantum secure. The presentation of the two constructions is in Section 3.

Application to SIS. The MPC scheme which we use in our construction allows to perform additions and multiplications with public values "for free", meaning those do not have an impact on the size of the argument. In the SIS problem the verification of the input of the prover consists of computing a product with a public matrix $\mathbf{A}$ and a proof that the secret $s$ contains bounded values, so the first part comes essentially for free. We initially tweak the approach of [BD10] and only allow $s$ to consist of bits, which allows for a very fast argument of size using one square gate per element of $s$. Then, we show how to handle more general distributions of $s$ and introduce some specific optimizations to reduce communication and computation. This is described in Section 5.

Experimental Results. We implemented our zero-knowledge protocol for the Binary SIS problem (i.e., where the secret $s$ is a vector of bits) and ran extensive experiments with various sets of parameters - both for the SIS problem and for the simulated MPC protocol. For a 61-bit field and a matrix A of size $1024 \times 4096$ (which suffices for many applications such as encryption or commitments), we are able to run our argument in 1.2 seconds for 40-bits of statistical security when working with a single thread. When utilizing 32 threads, this reduces to only 250 ms . This shows that general lattice-based ZK arguments (which do not rely on structured lattices) are practical and can be used in real-world applications. To the best of our knowledge, we are also the first to implement ZK arguments for general SIS. The results and all the details can be found in Section 6.

Sampling circuits on the fly. A major source of optimizations to our application is the fact that our "MPC-in-the-head" protocol essentially allows the prover and the verifier to negotiate the circuit $C$ during the protocol, under certain circumstances. This fact is used by us to construct circuits "on the fly" with fewer non-linear gates, which helps to reduce the argument size. Thus, as an additional contribution of this work, we provide formal definitions for an argument system where the circuit is sampled jointly by the prover and the verifier during the execution and show how to incorporate this into our protocols. This is described in Section 4.

### 1.3 Related Work

The past few years saw a tremendous progress in the efficiency of zero-knowledge arguments of knowledge for arithmetic circuits. ZK-SNARKs (Succinct Non-Interactive Arguments of Knowledge) such as Pinocchio [PHGR16] and libsnark [BSCTV14] offer a very low argument size and verification time, at the expense of keys in the size of Megabytes and a large prover runtime due to the use of pairings. This also means that they are inherently not post-quantum secure. From weaker
assumptions, Bootle et al. $\left[\mathrm{BCC}^{+} 16\right]$ recently constructed a highly efficient argument system, a variant of which can also be adapted to the post-quantum setting $\left[\mathrm{BBC}^{+} 18\right]$. Albeit asymptotically more efficient, our construction outperforms $\left[\mathrm{BBC}^{+} 18\right]$ for small circuits. In addition to $\left[\mathrm{BCC}^{+} 16\right]$, Wahby et al. $\left[\mathrm{WTS}^{+} 18\right]$ present a practically efficient ZK-Snark from the Discrete Log assumption. MPC-in-the-head [IKOS07] was first shown to be practical by the work of Giacomelli et al. [GMO16]. Later works such as the Ligero argument system [AHIV17] improved upon that work and provided an argument with proof size that is sub-linear in the size of the circuit $C$. Recently, BenSasson et al. presented a new Zero-Knowledge Interactive Oracle Proof called Stark [BSBHR18] which improves asymptotically upon IKOS-style proof systems, as well as a practically efficient protocol called Aurora $\left[\mathrm{BSCR}^{+} 18\right]$ with proof size that is only poly-logarithmic in $C$. These constructions are asymptotically better than ours in terms of proof size. However, the computational work of the prover is orders of magnitudes higher than in our construction due to the extensive use of polynomial interpolation. In addition, they do not have the property that linear gates have no effect on the complexity, which makes our protocol fit to applications such as SIS.

Zero-Knowledge Arguments for SIS. While the approach of [BD10] can more be seen as a theoretical result, there are three main lines of work for SIS arguments of knowledge. The identification scheme of Stern [Ste96] was modified into a argument of knowledge by Kagawa et al. [KTX08] for their commitment scheme, which was then generalized to be a proper argument of knowledge by Ling et al. [LNSW13]. This argument system has constant soundness error and its efficiency quickly degrades with the bound $\beta$ on the witness. The argument is extremely versatile and allows constructions such as e.g. ring signatures [LLNW17].

A second line of work more directly follows standard Schnorr-style proofs and uses rejection sampling to be zero-knowledge [Lyu09,Lyu12]. Like Stern-style arguments it has constant soundness error, but it supports also non-binary secrets with negligible communication overhead. On the other hand, the soundness guarantee only holds with some "slack", meaning that while an honest prover starts with a secret $\|\boldsymbol{s}\|_{\infty} \leq \beta$, the protocol only guarantees that $\|\boldsymbol{s}\|_{\infty} \leq \tau \cdot \beta$ where $\tau$ depends, among other things, on the dimensions of $\mathbf{A}$.

A third was recently introduced by del Pino et al. [dPLS19]. They use a variant of the Bulletproof scheme to adapt range proofs for the structured lattice setting. As they argue in the paper, the computational efficiency of their scheme hinges on using Ring-SIS or Ring-LWE primitives, while the soundness also relies on the Discrete Log assumption.

We provide a detailed comparison of these works with our construction in Section 7, including comparison of proof size, whenever possible, for the SIS application.

## 2 Preliminaries

Unless stated otherwise, operations in this work are carried out over the field $\mathbb{F}=\mathbb{F}_{q}$ for an odd prime $q$. The elements are being represented by the interval $[-(q-1) / 2,(q-1) / 2]$. We use $\lambda$ as the computational and $\kappa$ as the statistical security parameters, and generally assume that $q \approx \operatorname{poly}(\lambda, \kappa)$. We use bold lower-case letters such as $s$ to denote a vector and bold upper-case letters like $\mathbf{A}$ for matrices. We let $s[i]$ denote the $i$ th component of the vector $s$. Furthermore, we use $[n]$ as an abbreviation for the set $\{1, \ldots, n\}$.

### 2.1 Programming Model

Our notation for the circuits that we use in this paper will be similar to [BHR12].
The circuit $C=\left(n_{\text {in }}, n_{\text {out }}, n_{C}, L, R, F\right)$ is defined over a field $\mathbb{F}$, and each wire $w$ will hold a value from this field or $\perp$ initially. $C$ has $n_{\text {in }}$ input wires as well as $n_{\text {out }}$ output wires and $n_{C} \geq n_{\text {in }}+n_{\text {out }}$ wires in total. We define $\mathcal{I}=\left\{1, \ldots, n_{\text {in }}\right\}, \mathcal{W}=\left\{1, \ldots, n_{C}\right\}$ and $\mathcal{O}=\left\{n_{C}-n_{\text {out }}+1, \ldots, n_{C}\right\}$. The circuit will have $n_{\text {gates }}=n_{C}-n_{\text {in }}$ gates in total and we define the set $\mathcal{G}=\left\{n_{\text {in }}+1, \ldots, n_{C}\right\}$.

We define functions $L: \mathcal{G} \rightarrow \mathcal{W} \backslash \mathcal{O}, R: \mathcal{G} \rightarrow(\mathcal{W} \backslash \mathcal{O}) \cup\{\perp\}$ such that $L(x)<x$ as well as $L(x)<R(x)<x$ if $R(x) \neq \perp$ (i.e., the function $L(x)$ returns the index of the left input wire of the gate whereas the function $R(x)$ returns the index of the right input wire if exists). The function $F: \mathcal{G} \times \mathbb{F} \times(\mathbb{F} \cup\{\perp\}) \rightarrow \mathbb{F}$ determines the function which is computed by each gate.

The algorithm $\operatorname{eval}(C, x)$ with $x \in \mathbb{F}^{n_{\text {in }}}$ then runs the circuit as follows

1. For $i \in\left[n_{\text {in }}\right]$ set $w_{i} \leftarrow x[i]$.
2. For each $g \in \mathcal{G}$ :
(a) $l \leftarrow L(g), r \leftarrow R(g)$
(b) $w_{g} \leftarrow F(g, l, r)$
3. Output $y=w_{n_{C}-n_{\text {out }}+1} \ldots w_{n_{C}}$

We further restrict $F$ to compute certain functions only: (i) Add: On input $x_{1}, x_{2}$ output $x_{1}+x_{2}$, (ii) Mult: On input $x_{1}, x_{2}$ output $x_{1} \times x_{2}$, (iii) CAdd: On input $x$ and for the hard-wired $a$ output $x+a$, (iv) CMult: On input $x$ and for the hard-wired $a$ output $x \times a$; and (v) Square: On input $x$ output $x^{2}$. We say that $C(x)=y$ if $\operatorname{eval}(C, x)$ returns the value $y \in \mathbb{F}^{n_{\text {out }}}$.

We denote by $n_{m u l}$ and $n_{s q}$ the number of multiplication and square gates in the circuit respectively.

### 2.2 Zero-Knowledge Proof of Knowledge

Let TM be an abbreviation for Turing Machines. An iTM is defined to be an interactive TM, i.e. a Turing Machine with a special communication tape and a PPT iTM is a probabilistic polynomialtime TM. Let $L_{R} \subseteq\{0,1\}^{*}$ be an NP language and $R$ be its related NP-relation for circuits over $\mathbb{F}$. Thus $(x=(C, y), w) \in R$ iff $(C, y) \in L_{R}$ and $\operatorname{eval}(C, w)=y$. We write $R_{x}=\{w \mid(x, w) \in R\}$ for the set of witnesses for a fixed $x$.

Honest Verifier Proof of Knowledge (HVPoK). Assume ( $\mathcal{P}, \mathcal{V}$ ) is a pair of PPT iTMs and let $\xi:\{0,1\}^{*} \rightarrow[0,1]$ be a function. We say that $(\mathcal{P}, \mathcal{V})$ is a proof of knowledge with knowledge error $\xi$ if the following properties hold:

Completeness: If $\mathcal{P}$ and $\mathcal{V}$ follow the protocol on input $x$ and private input $w$ to $\mathcal{P}$ where $(x, w) \in R$, then $\mathcal{V}$ always outputs 1 .
Knowledge Soundness: There exists a probabilistic algorithm $\mathcal{E}$ called the knowledge extractor, such that for every interactive prover $\hat{\mathcal{P}}$ and every $x \in L_{R}$, the algorithm $\mathcal{E}$ satisfies the following condition: let $\delta(x)$ the probability that $\mathcal{V}$ accepts on input $x$ after interacting with $\hat{\mathcal{P}}$. If $\delta(x)>\xi(x)$, then upon input $x \in L$ and oracle access to $\hat{\mathcal{P}}$, the algorithm $\mathcal{E}$ outputs a string $w$ such that $(x, w) \in L_{R}$ in expected number of steps bounded by $O\left(\frac{1}{\delta(x)-\xi(x)}\right)$.

We say that $(\mathcal{P}, \mathcal{V})$ is an honest verifier zero-knowledge proof of knowledge, if in addition to the two above properties, the following property is also satisfied.

Zero-Knowledge with Respect to an Honest Verifier: Let view $\mathcal{\mathcal { V }}(x)$ be a random variable describing the content of the random challenge of $\mathcal{V}$ and the messages $\mathcal{V}$ receives from $\mathcal{P}$ during the joint computation on common input $x$. Then, there exists a PPT simulator $\mathcal{S}$, such that $\{\mathcal{S}(x)\}_{x \in L} \approx_{c}\left\{\operatorname{view}_{\mathcal{V}}^{\mathcal{P}}(x)\right\}_{x \in L}$

This definition is sufficient, since public-coin protocols like the protocols we consider in this work, which satisfies the zero-knowledge with respect to the honest verifier, can be easily transformed to protocols which are zero-knowledge in general by having the verifier commits to his challenges at the beginning of the execution. Moreover, it is possible also to obtain a non-interactive zero-knowledge proof of knowledge (NIZKPoK) via the Fiat-Shamir transformation [FS86]. If the soundness of a proof only holds against a computationally bounded prover, then one generally talks about computationally sound proofs which are known or Arguments of Knowledge (AoK).

### 2.3 Commitments and Collision-Resistant Hash Functions (CRHF)

We use Commitments and Collision-Resistant Hash Functions (CRHF) as buildings blocks in our constructions and thus introduce them now shortly. A commitment scheme allows one party to commit to a message $m$ by sending a commitment value which satisfies the following two properties: (i) Hiding: the commitment reveals nothing about m.; and (ii) Binding: it is (computationally) infeasible for the committing party to open a committed message to different messages $m$ and $m^{\prime}$.

A Collision-Resistant Hash Functions (CRHF) is a function $H$ for which it is "hard" to find two inputs $x$ and $x^{\prime}$ such that $H(x)=H\left(x^{\prime}\right)$. Usually, hash functions can shrink a long message into a short digest one. This means that for almost all messages a collision must exist. The requirement is therefore that such a collision will be very hard to find for any PPT adversary.

### 2.4 The Short Integer Solution Problem

We will now formalize the SIS problem, which was already informally introduced in the introduction. Let $q$ be a prime such that $\mathbb{F}_{q}$ is the base field of the argument system. At the same time, $q$ will also be the modulus of the SIS instance which is defined over $\mathbb{Z}_{q}$. Though $\mathbb{F}_{q} \simeq \mathbb{Z}_{q}$ in our setting, we keep the distinction that we use $\mathbb{F}_{q}$ when we talk about circuits, while we use $\mathbb{Z}_{q}$ when we talk about SIS. To define the Short Integer Solution problem, let $n, m \in \mathbb{N}$ such that $n \ll m$. We naturally embed $\mathbb{Z}_{q}$ into $\mathbb{Z}$ by considering and identifying each $\bmod q$-number with an element from the interval $\left[-\frac{q-1}{2}, \frac{q-1}{2}\right] \subset \mathbb{Z}$. This allows us to consider the $\ell_{p}$-norm of vectors $s \in \mathbb{Z}_{q}^{m}$ and we let $\|s\|_{\infty}$ be the $\infty$-norm of the embedding of $s$ into the module $\mathbb{Z}^{m}$. We define $S_{\beta}^{m} \subset \mathbb{Z}_{q}^{m}$ to be the subset of $m$-element vectors with $\ell_{\infty}$-norm $\leq \beta$.

Definition 1 (Short Integer Solution (SIS)). Let $m, n, q$ be defined as above and $\beta \in \mathbb{N}$. Given $\mathbf{A} \in \mathbb{Z}_{q}^{n \times m}$ and $\boldsymbol{t} \in \mathbb{Z}_{q}^{n}$, the (inhomogeneous) SIS-problem is to find $\boldsymbol{s} \in \mathbb{Z}_{q}^{m}$ such that $\boldsymbol{t}=\mathbf{A} \boldsymbol{s} \bmod q$ and $s \in S_{\beta}^{m}$.

We can collect all possible sets of (A,s,t) that fulfill above definition in an NP-relation

$$
R_{\mathrm{SIS}}^{m, n, q, \beta}=\left\{(x, w)=((\mathbf{A}, \boldsymbol{t}), s) \mid \boldsymbol{s} \in S_{\beta}^{m} \wedge \mathbf{A} \in \mathbb{F}_{q}^{n \times m} \wedge \boldsymbol{t}=\mathbf{A} \boldsymbol{s}\right\}
$$

In practice, one often encounters proofs that do not show exactly that $s \in S_{\beta}^{m}$ even though the prover has such a value as witness. Instead, they guarantee that the bound might be a bit bigger,
by a factor at most $\tau$ which is mostly referred to as slack. We have that $R_{\text {SIS }}^{m, n, q, \beta} \subseteq R_{\mathrm{SIS}}^{m, n, q, \tau \cdot \beta}$ if $\tau \geq 1$, so any honest prover will still make the verifier accept if it proves for $R_{\text {SIS }}^{m, n, q, \tau \cdot \beta}$ instead.

For simplicity, we will also consider a special type of SIS where the vector $s$ of the witness has to be binary.

Definition 2 (Binary-SIS). Let $m, n, q$ be defined as above. Given $\mathbf{A} \in \mathbb{Z}_{q}^{n \times m}$ and $\boldsymbol{t} \in \mathbb{Z}_{q}^{n}$, the (inhomogeneous) Binary SIS-problem is to find $\boldsymbol{s} \in\{0,1\}^{m}$ such that $\boldsymbol{t}=\mathbf{A} \boldsymbol{s} \bmod q$.

This Binary-SIS problem is not uncommon and the versions of the schemes of [BD10,LNSW13] are actually defined for it. We specifically define its relation as

$$
R_{\mathrm{B}-\mathrm{SIS}}^{m, n, q}=\left\{(x, w)=((\mathbf{A}, \boldsymbol{t}), \boldsymbol{s}) \mid \boldsymbol{s} \in\{0,1\}^{m} \wedge \mathbf{A} \in \mathbb{F}_{q}^{n \times m} \wedge \boldsymbol{t}=\mathbf{A} \boldsymbol{s}\right\}
$$

## 3 Honest Verifier Arguments of Knowledge for Arithmetic Circuits

In this section, we introduce our honest verifier argument of knowledge (HVZKAoK) protocol for proving the satisfiability of arithmetic circuits. We begin by describing the underlying MPC protocol to securely compute an arithmetic circuits. Then, we present two HVZKAoK protocols based on the MPC protocol - one that relies on the "cut-and-choose" technique and one that relies on "sacrificing". While the first is a direct extension of a recent work of Katz et al. [KKW18], the second one is new to the best of our knowledge.

### 3.1 The MPC protocol

Our MPC protocol is a simplified version of the SPDZ protocol [DPSZ12]. Let $N$ denote the number of parties and let $P_{1}, \ldots, P_{N}$ denote the parties participating in the protocol.

Secret sharing scheme. Let $\llbracket x \rrbracket$ denote the a sharing of $x$. We use a simple additive secret sharing, i.e., a sharing of $x$ consists of random $x_{1}, \ldots, x_{N} \in \mathbb{Z}_{q}$ such that $x=x_{1}+\cdots+x_{N} \bmod q$, where $P_{i}$ holds $x_{i}$. We define the following operations on the secret sharing scheme.

- open $(\llbracket x \rrbracket)$ : In this procedure, the parties reveal the secret $x$ by having each party send its share. Upon receiving $x_{j}$ from each $P_{j}$, party $P_{i}$ computes $x=\sum_{j=1}^{N} x_{j} \bmod q$.
$-\llbracket x \rrbracket+\llbracket y \rrbracket$ : Given two shares $x_{i}$ and $y_{i}$ of $x$ and $y$, each party $P_{i}$ defines $x_{i}+y_{i}$ as its share of the result.
$-\llbracket x \rrbracket+\sigma$ : Given a sharing $\llbracket x \rrbracket$ and a public constant $\sigma$, party $P_{1}$ define $x_{1}+\sigma$ as its share of the result, whereas the other parties' shares remain the same.
$-\sigma \cdot \llbracket x \rrbracket$ : Given a sharing $\llbracket x \rrbracket$ and a public constant $\sigma$, each party $P_{i}$ define $\sigma \cdot x_{i}$ as its share of the product.

Multiplication operation. We say that the triple $(\llbracket a \rrbracket, \llbracket b \rrbracket, \llbracket c \rrbracket)$ is a random multiplication triple if $a$ and $b$ are random and $c=a \cdot b$. To multiply two shared values $\llbracket x \rrbracket$ and $\llbracket y \rrbracket$ using a preprocessed random triple ( $\llbracket a \rrbracket, \llbracket b \rrbracket, \llbracket c \rrbracket$ ), the parties work as follows:

1. The parties locally compute $\llbracket \alpha \rrbracket=\llbracket x \rrbracket-\llbracket a \rrbracket$ and $\llbracket \beta \rrbracket=\llbracket y \rrbracket-\llbracket b \rrbracket$.
2. The parties run open $(\llbracket \alpha \rrbracket)$ and $\operatorname{open}(\llbracket \beta \rrbracket)$ to obtain $\alpha$ and $\beta$.
3. Each party locally computes $\llbracket z \rrbracket=\llbracket c \rrbracket-\alpha \cdot \llbracket b \rrbracket-\beta \cdot \llbracket a \rrbracket+\alpha \cdot \beta$.

The above is the well-known Beaver Circuit Randomization [Bea91] which holds since

$$
\begin{aligned}
\llbracket z \rrbracket & =\llbracket c \rrbracket-\alpha \cdot \llbracket b \rrbracket-\beta \cdot \llbracket a \rrbracket+\alpha \cdot \beta \\
& =\llbracket a b \rrbracket-(x-a) \cdot \llbracket b \rrbracket-(y-b) \cdot \llbracket a \rrbracket+(x-a) \cdot(y-b) \\
& =\llbracket a b-x b-a b-y a-b a+x y+a y+x b+a b \rrbracket \\
& =\llbracket x y \rrbracket
\end{aligned}
$$

Square operation. We say that the pair $(\llbracket b \rrbracket, \llbracket d \rrbracket)$ is a random square if $b$ is random and $d=b^{2}$. To compute the square $\llbracket x^{2} \rrbracket$ given $\llbracket x \rrbracket$ using a preprocessed $(\llbracket b \rrbracket, \llbracket d \rrbracket)$, the parties work as follows:

1. The parties locally compute $\llbracket \alpha \rrbracket=\llbracket x \rrbracket-\llbracket b \rrbracket$.
2. The parties run open $(\llbracket \alpha \rrbracket)$ to obtain $\alpha$.
3. Each party locally computes $\llbracket z \rrbracket=\alpha \cdot(\llbracket x \rrbracket+\llbracket b \rrbracket)+\llbracket d \rrbracket$.

Note that the above holds since

$$
\llbracket z \rrbracket=\alpha \cdot(\llbracket x \rrbracket+\llbracket b \rrbracket)+\llbracket d \rrbracket=(x-b) \cdot(\llbracket x \rrbracket+\llbracket b \rrbracket)+\llbracket b^{2} \rrbracket=\llbracket x^{2}-b^{2}+b^{2} \rrbracket=\llbracket x^{2} \rrbracket .
$$

The protocol. The above building blocks can easily be combined to securely run eval(•) on a circuit $C$ : after the inputs are secret-shared using $\llbracket \cdot \rrbracket$, the parties apply $G$ as defined in Section 2.1 consecutively to the shares. That is, addition gates and multiplication/addition by-a-publicconstant gates are computed locally, whereas multiplication and square gates are computed using the above sub-protocols.

Security. For our purpose of using a MPC protocol to establish a zero-knowledge argument, the used protocol only needs to be secure in the presence of a semi-honest adversary. Furthermore, it suffices for the protocol to be secure in the client-server model, i.e., when the parties who run the protocol (the servers) do not hold input and do not see the final output, but rather receive shares of the inputs from the clients, perform the distributed computation and then send the output shares back to the clients.

Formally, let $\mathcal{F}_{\text {tr }}$ and $\mathcal{F}_{\text {sq }}$ be ideal functionalities that provide the parties with random multiplication triples and squares. We define $\operatorname{view}_{I, \pi}^{\mathcal{F}_{\mathrm{tr}}, \mathcal{F}_{\mathrm{sq}}}(C)$ to be the view of a subset of parties $I$ during the execution of a protocol $\pi$ on an arithmetic circuit $C$, in the ( $\left.\mathcal{F}_{\mathrm{tr}}, \mathcal{F}_{\mathrm{sq}}\right)$-hybrid model and in the client-server model described above. This consists of the input shares, the correlated randomness they receive from the ideal functionalities and the messages they obtain from the other parties while computing multiplication and square gates. We prove the security of the protocol in Theorem 1.

Theorem 1. Let $C$ be an arithmetic circuit over the field $\mathbb{F}$ and let $\pi$ be the protocol described above. Then, for every subset of parties $I \subset\left\{P_{1}, \ldots, P_{N}\right\}$ with $|I| \leq N-1$, there exists a probabilistic polynomial-time algorithm $\mathcal{S}$ such that $\left.\{\mathcal{S}(I, C)\}_{\equiv\left\{\operatorname{view}_{I, \pi} \mathcal{F}_{\text {tr }}, \mathcal{F}_{\text {sq }}\right.}(C)\right\}$

Proof. Intuitively, this follows from the fact that the corrupted parties see only shares of the values on the wires of the circuit that could open to any value or random public values. Formally, the simulator $\mathcal{S}$ begins by choosing $t$ random shares for each input wire and adding them to the view of the corrupted parties in $I$. Then, it goes over the circuit in topological order; for addition gates and multiplication-by-a-constant, it does the local operation on the corrupted parties' shares as defined by the protocol. For multiplication gates, $\mathcal{S}$ chooses $t$ shares of $a, b$ and $c$ and adds them to
the corrupted parties' view. Then, it chooses random $\alpha$ and $\beta$, compute the honest parties' shares of them accordingly (knowing the corrupted parties' shares) and adds them to the view. Then, it computes the corrupted parties' shares of $z$ as defined by the protocol. Similarly, for square gates, it chooses $t$ random shares for $b$ and $d$ and adds them to the view of the corrupted parties. Then, it chooses a random $\alpha$, computes the honest parties' shares accordingly, adds them to the view and computes the corrupted parties' shares of the output.

The only difference between the simulated view and the real execution is the way $\alpha$ and $\beta$ are chosen. However, in both cases the these values are uniformly distributed. Thus, the view generated by the simulator is identically distributed to the view in the real execution as required.

### 3.2 HVZKAoK Protocol using Cut-and-Choose

We are now ready to present our first HVZKAoK protocol $\Pi_{c \& c}$. It is based on the MPC protocol presented in the previous section and relying on the cut-and-choose technique to generate correct random multiplication triples and squares. The formal description appears in Fig. 1.a and Fig. 1.b.

The idea behind the protocol is that the prover $\mathcal{P}$ proves its knowledge of $w$ such that $C(\boldsymbol{w})=\boldsymbol{y}$ by simulating a secure $N$-party computation of the circuit over an additive sharing of $w$, using the MPC protocol described above. Since $\mathcal{P}$ knows the input and thus the values on each wire of the circuit, it can simulate the execution "in his head". Since our MPC protocol uses random triples and squares supplied by the ideal functionalities $\mathcal{F}_{\text {tr }}$ and $\mathcal{F}_{\text {sq }}$, the prover $\mathcal{P}$ needs to play their role as well. Clearly, $\mathcal{P}$ may try to cheat in the simulated computation, aiming to cause the verifier $\mathcal{V}$ to accept false statements. This is prevented by having $\mathcal{V}$ challenging $\mathcal{P}$ in two ways. First, after $\mathcal{P}$ has committed to $M$ sets of random triples and squares, $\mathcal{V}$ chooses randomly $\tau$ of them, which are revealed and opened to him. The remaining $M-\tau$ sets of the pre-processed data are used to support $M-\tau$ circuit computations - each with different randomness. The prover $\mathcal{P}$ performs these computations and commits to the views of the parties, to be then challenged for the second time by $\mathcal{V}$. The verifier chooses a random subset of $N-1$ parties in each execution, whose views are opened and tested for consistency. If theses two tests passed successfully and the output of the circuit is $\boldsymbol{y}$, then $\mathcal{V}$ outputs acc. Observe that $\mathcal{V}$ cannot learn any information about the witness $\boldsymbol{w}$ during the protocol: the opened pre-processing executions reveal only random data which is thrown away afterwards, and the $N-1$ views that are opened do not reveal anything since the MPC protocol is resilient to $N-1$ semi-honest parties.

In more details, in Round $1, \mathcal{P}$ commits to $M$ pre-processing executions. A major source of saving here is using pseudo-randomness instead of pure randomness. Specifically, $\mathcal{P}$ chooses a seed $\mathrm{sd}_{e}$ for each execution $e$, from which he derives the seeds $\mathrm{sd}_{e, i}$ for each party $P_{i}$. These seeds are used to generate all the random shares held by $P_{i}$ throughout the computation. Now, if execution $e$ is selected to be tested by $\mathcal{V}$ in Round 2, then $\mathcal{P}$ can send just $\mathrm{sd}_{e}$ to $\mathcal{V}$, thereby saving communication. For the $M-\tau$ preprocessings which are used in the on-line execution in Round 3, $\mathcal{P}$ cannot send the master seed but rather will have to send $N-1$ seeds of the $N-1$ parties chosen to be opened by $\mathcal{V}$ in Round 4 . Thereby the data of one of the parties is kept secret (in Section 3.4 we will see that it is possible to reduce the data sent here from $N$ seeds to $\log N$ seeds). Observe, however, that not all the data held by the parties is random. In particular, when generating a multiplication triple $\llbracket a_{e, k} \rrbracket, \llbracket b_{e, k} \rrbracket, \llbracket c_{e, k} \rrbracket(e$ is the execution index and $k$ is the index of the gate for which this triple is consumed), one can use the seeds of the parties to generate the sharing of $a_{e, k}$ and $b_{e, k}$, but once these are fixed, $c_{e, k}=a_{e, k} \cdot b_{e, k}$ is also fixed. Thus, when generating the sharing of $c_{e, k}$, it is necessary to "fix" the initial sharing derived from the random seeds. Therefore, the prover
also commits to the offset $\Delta_{e, k}$ for each execution $e$ and multiplication gate $g_{k}$, which is added to the initial random sharing $\llbracket c_{e, k} \rrbracket$. The same applies to the generation of random squares. Similarly, when the sharings of the inputs are generated in Round $3, \mathcal{P}$ can use the seeds of the parties to derive their shares, and then fix the initial sharing by adding the offset (denoted this time by $\phi_{e, k}$ ) to obtain a correct sharing of the given input. Thus, the prover $\mathcal{P}$ must commit to the offset on each input wire as well. To further reduce communication, we hash all the commitments together and send only the hash value to $\mathcal{V}$.

Cheating error (soundness). We compute the probability that $\mathcal{V}$ outputs acc when $C(\boldsymbol{w}) \neq \boldsymbol{y}$. Let $c$ be the number of pre-processing emulations where $\mathcal{P}$ cheats (i.e., by generating incorrect squares or multiplication triples). Since $\tau$ emulations out of $M$ are opened and tested by the verifier, we have that this step is passed without the cheating being detected with probability $\frac{\binom{M-c}{\tau}}{\binom{M}{\tau}}$. After this step, $M-\tau$ circuit computations are being simulated by the verifier. In order to make the output of the protocol be $\boldsymbol{y}, \mathcal{P}$ must cheat (i.e., deviate from the specification of the MPC protocol) in $M-\tau-c$ emulations. Since $N-1$ views are being opened in each such emulation, $\mathcal{P}$ clearly will not sabotage the view of more than one party. Thus, the probability that this is not detected is $\frac{1}{N^{M-\tau-c}}$. The overall success cheating probability is therefore

$$
\xi_{c \& c}(M, N, \tau)=\max _{0 \leq c \leq M-\tau}\left\{\frac{\binom{M-c}{\tau}}{\binom{M}{\tau} \cdot N^{M-\tau-c}}\right\}
$$

Formal proof. As mentioned before, the above protocol has appeared already in [KKW18] (for Boolean circuits, but extending it to Arithmetic circuits is straightforward). However, there it was described as an optimization to their baseline protocol and so was not formally proved. We therefore provide now a proof that the protocol $\Pi_{c \& c}$ is an honest verifier zero-knowledge argument of knowledge.

Theorem 2. Let $H$ be a collision-resistant hash function and let com be a secure commitment scheme. Then, the protocol $\Pi_{c \& c}$ is an HVZKAoK with knowledge error (soundness) $\xi_{c \& c}(M, N, \tau)$.

Proof. We prove the that each of the three properties defined in Section 2.2 are satisfied by our protocol.

Completeness. This follows trivially from the correctness of the MPC protocol.
Honest Verifier Zero Knowledge. This property follows from the security of the MPC protocol as defined in Theorem 1 and by the hiding property of the commitment scheme. Specifically, let $\mathcal{S}_{\pi}$ be the simulator that exists for the MPC protocol described in the proof of Theorem 3.1. We construct a honest-verifier zero-knowledge simulator $\mathcal{S}$ for our protocol, which works as follows:

1. $\mathcal{S}$ chooses random $E \subset[M]$ such that $|E|=\tau$. Then, for each $e \in \bar{E}=[M] \backslash E$, it chooses a random $i_{e} \in[N]$.
2. For each $e \in E, \mathcal{S}$ prepares the pre-processing data as an honest prover would do in Round 1 , with one exception: it computes $\Pi_{e}$ as a commitment to a 0 -string.
3. For each $e \in \bar{E}, \mathcal{S}$ chooses $\mathrm{sd}_{e, i}$ for each $i \in I_{e}=[N] \backslash\left\{i_{e}\right\}$. Then, for the pre-processing in Round 1, it generates state $e_{e}$ by choosing random $\Delta_{e, k}$ for each multiplication/square gate. In addition, it chooses random $\phi_{e, k}$ for each input wire $k$ and compute $\Pi_{e}$ accordingly. Proceeding

## The "Cut-and-Choose" Based HVZK Argument $\Pi_{c \& c}$ (Part 1)

Let $H$ be a collision-resistant hash function and com be a commitment scheme
Inputs: Both the prover $\mathcal{P}$ and the verifier $\mathcal{V}$ hold $\boldsymbol{y} \in \mathbb{F}^{n_{\text {out }}}$, a description of an arithmetic circuit $C$ over a finite field $\mathbb{F}$ and parameters $M, N, \tau$; the prover $\mathcal{P}$ also hold a witness $\boldsymbol{w} \in \mathbb{F}^{n_{\text {in }}}$ such that $C(\boldsymbol{w})=\boldsymbol{y}$.

## Round 1:

1. For each $e \in[M]$ :
(a) $\mathcal{P}$ initializes an empty string state $e$.
(b) $\mathcal{P}$ chooses a master seed sde ${ }_{e}$ and use it to generate $\mathrm{sd}_{e, 1}, \ldots, \mathrm{sd}_{e, N}$.
(c) For each multiplication gate $g_{k} \in \mathcal{G}$ :
i. $\mathcal{P}$ defines three random sharings $\llbracket a_{e, k} \rrbracket, \llbracket b_{e, k} \rrbracket$ and $\llbracket c_{e, k} \rrbracket$ by using sd $\mathrm{s}_{e, i}$ for each $i \in[N]$ to generate $a_{e, k, i}, b_{e, k, i}$ and $c_{e, k, i}$.
ii. $\mathcal{P}$ computes $a_{e, k}=\sum_{i=1}^{N} a_{e, k, i}$ and $b_{e, k}=\sum_{i=1}^{N} b_{e, k, i}$ and $c_{e, k}=a_{e, k} \cdot b_{e, k}$. Then, it sets $\Delta_{e, k}=c_{e, k}-\sum_{i=1}^{N} c_{e, k, i}$ and state $e_{e} \leftarrow$ state $_{e} \| \Delta_{e, k}$.
iii. $\mathcal{P}$ sets the random triple for this gate to be $\left(\llbracket a_{e, k} \rrbracket, \llbracket b_{e, k} \rrbracket, \llbracket c_{e, k} \rrbracket+\Delta_{e, k}\right)$
(d) For each square gate $g_{k} \in \mathcal{G}$ :
i. $\mathcal{P}$ defines two random sharings $\llbracket b_{e, k} \rrbracket$ and $\llbracket d_{e, k} \rrbracket$ by using $\mathrm{sd}_{e, i}$ for each $i \in[N]$ to generate $b_{e, k, i}$ and $d_{e, k, i}$.
ii. $\mathcal{P}$ computes $b_{e, k}=\sum_{i=1}^{N} b_{e, k, i}$.

Then, it computes $\Delta_{e, k}=\left(b_{e, k}\right)^{2}-\sum_{i=1}^{N} d_{e, k, i}$ and state $e_{e} \leftarrow$ state $_{e} \| \Delta_{e, k}$.
iii. $\mathcal{P}$ sets the random square to this gate to be $\left(\llbracket b_{e, k} \rrbracket, \llbracket d_{e, k} \rrbracket+\Delta_{e, k}\right)$.
(e) $\mathcal{P}$ chooses a linear random sharing of the inputs:
i. For each $i \in[N], \mathcal{P}$ uses $\operatorname{sd}_{e, i}$ to generate $w_{e, 1, i}, \ldots, w_{e, n_{\mathrm{in}}, i}$.
ii. For each $k \in \mathcal{I}, \mathcal{P}$ sets $\phi_{e, k}=w_{k}-\sum_{i=1}^{N} w_{e, k, i}$.
(f) $\mathcal{P}$ chooses a random string $g_{e} \in\{0,1\}^{\lambda}$ and then it computes $\Omega_{e}=\operatorname{com}\left(\phi_{e, 1}\|\cdots\| \phi_{e, n_{\text {in }}}, g_{e}\right)$.
(g) For each $i \in[N], \mathcal{P}$ uses sd ${ }_{e, i}$ to generate $r_{e, i} \in\{0,1\}^{\lambda}$ and then it computes $\Gamma_{e, i}=\operatorname{com}\left(\operatorname{sd}_{e, i}, r_{e, i}\right)$.
(h) $\mathcal{P}$ uses sd ${ }_{e}$ to generate a random string $s_{e} \in\{0,1\}^{\lambda}$ and computes $\Gamma_{e}=\operatorname{com}\left(\right.$ state $\left._{e}, s_{e}\right)$.
(i) Finally, $\mathcal{P}$ computes $h_{e}=H\left(\Gamma_{e}\left\|\Gamma_{e, 1}\right\| \cdots \| \Gamma_{e, N}\right)$.
2. $\mathcal{P}$ computes $h_{\Gamma}=H\left(h_{1}\|\cdots\| h_{M}\right), h_{\Omega}=H\left(\Omega_{1}\|\cdots\| \Omega_{M}\right)$ and sends them to $\mathcal{V}$.

Round 2: $\mathcal{V}$ chooses a random challenge $E \subset[M]$ such that $|E|=\tau$ and sends it to $\mathcal{P}$.
Round 3:

1. Let $\bar{E}=[M] \backslash E$. First, $\mathcal{P}$ chooses sd ${ }_{\bar{E}}$.
2. For each $e \in E$ :
(a) $\mathcal{P}$ initializes an empty string view . Then, it goes over the circuit in topological order and simulates each $^{\text {. }}$ gate's computation using the MPC protocol described in Section 3.1, while consuming the random triples and squares it prepared in Round 1.
i. For each multiplication gate $g_{k}, \mathcal{P}$ sets: view $_{e} \leftarrow \operatorname{view}_{e}\left\|\alpha_{e, k, 1}\right\| \cdots\left\|\alpha_{e, k, N}\right\| \beta_{e, k, 1}\|\cdots\| \beta_{e, k, N}$.
ii. For each square gate $g_{k}, \mathcal{P}$ sets: view $_{e} \leftarrow \operatorname{view}_{e}\left\|\alpha_{e, k, 1}\right\| \cdots \| \alpha_{e, k, N}$.
(b) Let $o_{e, 1, i}, \ldots, o_{e, n_{\text {out }}, i}$ be the shares on the output wires held by party $P_{i}$ at the end of the computation. Then, for each output wire $k \in \mathcal{O}, \mathcal{P}$ sets: view $_{e} \leftarrow \operatorname{view}_{e}\left\|o_{e, k, 1}\right\| \cdots \| o_{e, k, N}$.
(c) $\mathcal{P}$ uses sd $\bar{E}$ to generate $g_{e} \in\{0,1\}^{\lambda}$ and computes $\Pi_{e}=\operatorname{com}\left(\right.$ view $\left._{e}, g_{e}\right)$.
3. $\mathcal{P}$ computes $h_{\pi}=H\left(\Pi_{e_{1}}\|\cdots\| \Pi_{e_{|\bar{E}|}}\right)$.
4. $\mathcal{P}$ sends to $\mathcal{V}:\left\{\operatorname{sd}_{e}\right\}_{e \in E},\left\{\Omega_{e}\right\}_{e \in E}$ and $h_{\pi}$.

Fig. 1.a: HVZK argument using "Cut-and-Choose"
to Round $3, \mathcal{S}$ generates view $e$ by following the instructions of $\mathcal{S}_{\pi}$ with $I=I_{e}$ and using sd ${ }_{e, i}$ to generate the required randomness. For the commitments $\Gamma_{e, \bar{i}_{e}}, \mathcal{S}$ uses the 0 -string as the committed message.
4. For each $e \in \bar{E}$, for each $k \in \mathcal{O}, \mathcal{S}$ set $o_{e, k, i_{e}}=y_{k}-\sum_{i \in I_{e}} o_{e, i}$.
5. $\mathcal{S}$ computes all the hash values as an honest prover would do.
6. $\mathcal{S}$ outputs a transcript of the protocol.

Round 4: For each $e \in \bar{E}: \mathcal{V}$ chooses a random $\bar{i}_{e} \in[N]$ and sends it to $\mathcal{P}$.
Round 5: For each $e \in \bar{E}$ :
Let $I_{e}=[N] \backslash\left\{\bar{i}_{e}\right\}$. Then, $\mathcal{P}$ sends the following to $\mathcal{V}:(1) \operatorname{sd}_{\bar{E}},\left\{\operatorname{sd}_{e, i}\right\}_{i \in I_{e}}$ (2) $\Gamma_{e, \bar{i}_{e}}(3) \alpha_{e, k, \bar{i}_{e}}, \beta_{e, k, \bar{i}_{e}}$ and $\Delta_{e, k}$ for each multiplication or square gate $g_{k}(4) g_{e}$ and $\left\{\phi_{e, k}\right\}_{k=1}^{n_{\text {in }}}$ and (5) $o_{e, 1, \bar{i}_{e}}, \ldots, o_{e, n_{\text {out }}, \bar{i}_{e}}$.
Output: $\mathcal{V}$ output acc iff all the following checks succeeds:

1. For each $e \in E, \mathcal{V}$ uses sd $_{e}$ to compute $h_{e}$ as a honest prover would do.

For each $e \in \bar{E}, \mathcal{V}$ uses $\left\{\operatorname{sd}_{e, i}\right\}_{i \in I_{e}}$ to compute $\Gamma_{e, i}$ as a honest prover would do. Then, using $\Gamma_{e, \bar{i}_{e}}$ received from $\mathcal{P}$, the verifier $\mathcal{V}$ computes $h_{e}$.
Then, $\mathcal{V}$ checks that $h_{\Gamma}=H\left(h_{1}\|\cdots\| h_{M}\right)$.
2. For each $e \in \bar{E}, \mathcal{V}$ computes $\Omega_{e}$ using $\left\{\phi_{e, k}\right\}_{k=1}^{n_{\text {in }}}$ and $g_{e}$. Then, using $\left\{\Omega_{e}\right\}_{e \in E}$ received from $\mathcal{P}$, the verifier $\mathcal{V}$ checks that $h_{\Omega}=H\left(\Omega_{1}\|\cdots\| \Omega_{M}\right)$.
3. For each $e \in \bar{E}, \mathcal{V}$ computes $^{\text {view }}{ }_{e}$ as an honest prover would do by going over the circuit in topological order and using $\left\{\operatorname{sd}_{e, i}\right\}_{i \in I_{e}}$, the shares $\alpha_{e, k, \overline{\bar{i}}_{e}}, \beta_{e, k, \bar{i}_{e}}$ and $\Delta_{e, k}$ received from $\mathcal{P}$ for each multiplication and square gate, and $\left\{o_{e, k, \bar{i}_{e}}\right\}_{k=1}^{n_{\text {out }}}$. Then, it computes $\Pi_{e}$ as a honest prover would do.
Finally, $\mathcal{V}$ checks that $h_{\pi}=H\left(\Pi_{e_{1}}\|\cdots\| \Pi_{e_{|\bar{E}|}}\right)$.
4. For each $e \in \bar{E}$, for each $k \in \mathcal{O}, \mathcal{V}$ checks that $\sum_{i=1}^{N} o_{e, k, i}=y_{k}$

Fig. 1.b: HVZK Argument using "Cut-and-Choose", continued

From the indistinguishably of the transcript generated by $\mathcal{S}_{\pi}$ and the hiding property of the commitment scheme, it follows by an hybrid argument that the view generated by $\mathcal{S}$ is computationally indistinguishable from the view in a real execution.

Knowldege Soundness. We proceed to prove the soundness property of the protocol. For simplicity we assume that the commitment scheme is perfectly binding and that there are no collisions for the hash function.

We first argue that if the cheating probability $\delta(x)$ is higher than $\xi_{c \& c}(M, N, \tau)$, then there exists at least one MPC instance (out of $M$ ) where the prover has committed to a valid witness $\boldsymbol{w}$. Recall that we consider deterministic provers and so the first prover's message where he commits to the input $\boldsymbol{w}$ is fixed. Now, let $\mathbf{G}$ be a $0 / 1$-matrix where each column corresponds to a possible first challenge of $\mathcal{V}$ (i.e., $\tau$ pre-processings to be opened) and each row corresponds to a possible second challenge chosen by $\mathcal{V}$ (i.e., $M-\tau$ parties indices for which the view should not to be opened). Thus, $\delta(x)$ is the fraction of ' 1 ' entries in $\mathbf{G}$. Let $\xi_{c \& c}(M, N, \tau)=\frac{\binom{M-c *}{\tau}}{\binom{M}{\tau}} \cdot N^{M-\tau-c *}$ (i.e., $c *$ is the value for which the expression for $\xi$ written above is maximized). We can write $\xi_{c \& c}$ also as $\frac{\binom{M-c *}{\tau} \cdot N^{c *}}{\binom{M}{\tau} \cdot N^{M-\tau}}$. Observe that the number of entries in $\mathbf{G}$ is $\binom{M}{\tau} \cdot N^{M-\tau}$. From our assumption that $\delta(x)>\frac{\binom{M-c *}{\tau} \cdot N^{c *}}{\binom{M}{\tau} \cdot N^{M-\tau}}$ it thus follows that the number of ' 1 ' entries in $\mathbf{G}$ is higher than $\binom{M-c *}{\tau} \cdot N^{c *}$. Next, assume that in the interaction with the prover $\mathcal{P}^{*}$, it corrupts $c$ of the pre-processings. Clearly, if any of these are opened, then the transcript won't be accepted by $\mathcal{V}$. Thus, there can be ' 1 ' entries only in ( $\left.\begin{array}{c}M-c \\ \tau\end{array}\right)$ columns in $\mathbf{G}$. For each of these columns, there exists $N^{c}$ possible challenges for the MPC instances where the pre-processing is incorrect. Since there are more than $\binom{M-c *}{\tau} \cdot N^{c *} \geq\binom{ M-c}{\tau} \cdot N^{c}$ entries with ' 1 ' in $\mathbf{G}$, then there must exist two accepting transcripts with the same first challenge $E$ and with different second challenge $\left\{i_{e}\right\}_{e \in \bar{E}}$ and $\left\{i_{e}^{\prime}\right\}_{e \in \bar{E}}$, where $i_{e} \neq i_{e}^{\prime}$ for an MPC instance $e$ with
correct pre-processing. This means that all the views of the parties in $e$ are also correct. Thus, the witness used in this instance must be a valid witness.

Next, given that a valid witness $\boldsymbol{w}$ is used in execution $e$, observe that it is possible to extract this witness using two accepting transcripts $\left(E,\left\{i_{e}\right\}_{e \in \bar{E}}\right),\left(E^{\prime},\left\{i_{e}^{\prime}\right\}_{e \in \bar{E}^{\prime}}\right)$ such that the challenge for $e$ is different. Specifically, it is required that one of the following will hold: $e \in E \cup E^{\prime} \backslash E \cap E^{\prime}$ or $i_{e}^{\prime} \neq i_{e}$. This suffices since in the first case it is possible to extract the seeds of all parties from one transcript (where the pre-processing of $e$ is opened) and the adjustment sent by the $\mathcal{P}$ from the second transcript (where $e$ 's pre-processing is not opened), there by obtaining the input shares of all parties. In the second case, where $i_{e}^{\prime} \neq i_{e}$, one of the transcripts reveals $N-1$ input shares whereas the other reveals the remaining share, thereby again allowing us to compute the witness by adding all shares.

Let $\delta(x)=\xi_{c \& c}(M, N, \tau)+\epsilon$ for some $\epsilon>0$. We now describe an extractor $\mathcal{E}$ to obtain such two transcripts:

1. Probe the matrix $\mathbf{G}$ until the first ' 1 ' entry was found. Denote by $\boldsymbol{c}=\left(c_{1}, \ldots, c_{M}\right)$ be the challenge for this entry, where for each $e \in[M], c_{e}$ is the challenge for the $i$ th execution.
2. For each execution $e$ run an extractor $\mathcal{E}_{e}$, who probe $\mathbf{G}$ at random until an entry ' 1 ' is found for which the challenge $\boldsymbol{c}^{\prime}$ is such that $c_{e}^{\prime} \neq c_{e}$.
3. For each $\boldsymbol{c}^{\prime}$ outputted by $\mathcal{E}_{e}$, extract the witness $\boldsymbol{w}$ used in execution $e$ using $\boldsymbol{c}$ and $\boldsymbol{c}^{\prime}$ (as explained in the text above), and check that $C(\boldsymbol{w})=\boldsymbol{y}$. If yes, output $\boldsymbol{w}$ and halt.

First, observe that the expected running time of the first step is $\frac{1}{\delta}<\frac{1}{\epsilon}$. For the second step, we prove the following Lemma:

Lemma 1. Let $J=\left\{j_{e_{1}}, j_{e_{2}}, \ldots\right\} \subseteq[M]$ be the set of indices which correspond to executions with valid witnesses and let $|J|$ denote its size. Then there exists an $e \in J$ such that $\operatorname{Pr}\left[\operatorname{acc} \mid c_{e}^{\prime} \neq c_{e}\right] \geq$ $\epsilon / M$, where acc is the event where the verifier accepts.

Proof. We denote by Jeq the event $\forall e \in J: c_{e}^{\prime}=c_{e}$ and by $\overline{\text { Jeq }}$ its negation $\exists e \in J: c_{e}^{\prime} \neq c_{e}$. Assume in contradiction that for all $e \in J$ it holds that $\operatorname{Pr}\left[\operatorname{acc} \mid c_{e}^{\prime} \neq c_{e}\right]<\epsilon / M$.

It follows that

$$
\begin{align*}
\delta(x) & =\operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+\operatorname{Pr}[\mathrm{acc} \wedge \overline{\mathrm{Jeq}}]=\operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+\operatorname{Pr}[\mathrm{acc} \mid \overline{\mathrm{Jeq}}] \cdot \operatorname{Pr}[\mathrm{Jeq}] \\
& \leq \operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+\operatorname{Pr}[\mathrm{acc} \mid \overline{\mathrm{Jeq}}] \\
& =\operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+\operatorname{Pr}\left[\mathrm{acc} \mid c_{e_{j_{1}}}^{\prime} \neq c_{e_{j_{1}}} \vee{\left.c_{e_{j_{2}}}^{\prime} \neq c_{e_{j_{2}}} \vee \cdots\right]} \leq \operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+\sum_{e_{j} \in J} \operatorname{Pr}\left[\mathrm{acc} \mid c_{e_{j}}^{\prime} \neq c_{e_{j}}\right]\right. \\
& <\operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+\sum_{e_{j} \in J} \epsilon / M \\
& \leq \operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+M \cdot \epsilon / M \\
& =\operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]+\epsilon
\end{align*}
$$

where the second inequality is obtained by using Union Bound, the third inequality follows from our assumption on $\operatorname{Pr}\left[\operatorname{acc} \mid c_{e}^{\prime} \neq c_{e}\right]$ for all $e \in J$ and the last inequality holds since $|J| \leq M$.

We now proceed to bound $\operatorname{Pr}[\operatorname{acc} \wedge$ Jeq], which is the probability of acceptance with all challenges for the executions with the valid witness remaining unchanged. In the following, we say that $c_{e}=0$ if the challenge for $e$ is to open the pre-processing and $c_{e} \neq 0$ otherwise. Now, without loss of generality, assume that for $e \in\left\{e_{j_{1}}, \ldots, e_{j_{k}}\right\}$, it holds that $c_{e}=0$, whereas for $e \in\left\{e_{j_{k+1}}, \ldots, e_{j_{|J|} \mid}\right\}$ it holds that $c_{e} \neq 0$ (i.e., exactly $k$ of the executions with valid witness were chosen to be opened). It follows that

$$
\operatorname{Pr}\left[c_{e_{j_{1}}}=\cdots=c_{e_{j_{k}}}=0 \wedge c_{e_{j_{k+1}}} \neq 0 \wedge \cdots \wedge c_{e_{j_{|J|}}} \neq 0\right]=\frac{\binom{M-|J|}{\tau-k}}{\binom{M}{\tau}}
$$

as well as

$$
\begin{equation*}
\operatorname{Pr}\left[\operatorname{Jeq} \mid c_{e_{j_{1}}}=\cdots=c_{e_{j_{k}}}=0 \wedge c_{e_{j_{k+1}}} \neq 0 \wedge \cdots \wedge c_{e_{j_{|J|}}} \neq 0\right]=\frac{\binom{M-|J|}{\tau-k}}{\binom{M}{\tau}} \cdot \frac{1}{N^{|J|-k}} \tag{2}
\end{equation*}
$$

(recall that for executions $e_{j_{k+1}}, \ldots, e_{j_{|J|}}$ a second challenge is chosen with probability $1 / N$ ).
Next, observe that once the challenges of the executions with correct witness are fixed and remain unchanged, we can compute the probability of obtaining a second acc using our formula of $\xi_{c \& c}$. That is, conditioned on the event that $k$ MPC preprocessings are opened and $|J|-k$ are not opened, it holds that

$$
\begin{align*}
& \operatorname{Pr}\left[\operatorname{acc} \left\lvert\,\left(\begin{array}{l}
\left.\left.\mathrm{Jeq} \mid c_{e_{j_{1}}}=\cdots=c_{e_{j_{k}}}=0 \wedge c_{e_{j_{k+1}}} \neq 0 \wedge \cdots \wedge c_{e_{j_{|J|}}} \neq 0\right)\right] \\
\quad \leq \frac{\binom{M-|J|-\bar{c}}{\tau-k}}{\binom{M-|J|}{\tau-k}} \cdot \frac{1}{N^{M-|J|-\tau+k-\bar{c}}}
\end{array} .\right.\right.\right.
\end{align*}
$$

(note that here $\bar{c}$ is the number of corrupted pre-processings only within the executions with bad witness. This means that $\bar{c} \leq c$ ). This holds since if the probability was higher, then a valid witness must have been used in some of the executions. However, in (3), the distribution is only over executions with an invalid witness.

Combining (2) and (3) together, we conclude that

$$
\begin{aligned}
\operatorname{Pr}[\operatorname{acc} & \left.\wedge \text { Jeq } \mid c_{e_{j_{1}}}=\cdots=c_{e_{j_{k}}}=0 \wedge c_{e_{j_{k+1}}} \neq 0 \wedge \cdots \wedge c_{e_{j_{|J|}}} \neq 0\right] \\
& \leq \frac{\binom{M-|J|-\bar{c}}{\tau-k}}{\binom{M-|J|}{\tau-k}} \cdot \frac{1}{N^{M-|J|-\tau+k-\bar{c}}} \cdot \frac{\binom{M-|J|}{\tau-k}}{\binom{M}{\tau}} \cdot \frac{1}{N^{|J|-k}} \\
& =\frac{\binom{M-|J|-\bar{c}}{\tau-k}}{\binom{M}{\tau}} \cdot \frac{1}{N^{M-\tau-\bar{c}}}<\frac{\binom{M-c *}{\tau}}{\binom{M}{\tau}} \cdot \frac{1}{N^{M-\tau-c *}}
\end{aligned}
$$

where the last inequality holds since $\bar{c} \leq c$ ( $c$ is the number of overall corrupted pre-processings, where $\bar{c}$ is the derived by looking only on bad processings of the executions), $k+(c-\bar{c}) \leq|J|(c-\bar{c}$ is the number of corrupted pre-processings for executions with good witness while $k$ is the number of such executions that are opened and so their pre-procssing must be correct) and so $k+c \leq|J|+\bar{c}$ which means that

$$
\binom{M-|J|-\bar{c}}{\tau-k} N^{\bar{c}} \leq\binom{ M-k-c}{\tau-k} N^{c} \leq\binom{ M-c}{\tau} N^{c} \leq\binom{ M-c *}{\tau} N^{c *} .
$$

Here, the second inequality follows because $\binom{x-1}{y-1}<\binom{x}{y}$ and the last step is due to the definition of $c *$.

Finally, we show that $\operatorname{Pr}[\operatorname{acc} \wedge \mathrm{Jeq}] \leq \frac{\binom{M-c *}{\tau}}{\binom{M}{\tau}} \cdot \frac{1}{N^{M-\tau-c *}}$. This easily holds since

$$
\begin{align*}
\operatorname{Pr}[\mathrm{acc} \wedge \mathrm{Jeq}]= &  \tag{4}\\
= & \sum_{Q \subseteq J}\left(\operatorname{Pr}\left[\mathrm{acc} \wedge \mathrm{Jeq} \mid \forall e \in Q: c_{e}=0 \wedge \forall e \in J \backslash Q: c_{e} \neq 0\right]\right. \\
& \left.\operatorname{Pr}\left[\forall e \in Q: c_{e}=0 \wedge \forall e \in J \backslash Q: c_{e} \neq 0\right]\right) \\
\leq & \frac{\binom{M-c *}{\tau}}{\binom{M}{\tau}} \cdot \frac{1}{N^{M-\tau-c *}} \sum_{Q \subseteq J} \operatorname{Pr}\left[\forall e \in Q: c_{e}=0 \wedge \forall e \in J \backslash Q: c_{e} \neq 0\right] \\
= & \frac{\binom{M-c *}{\tau}}{\binom{M}{\tau}} \cdot \frac{1}{N^{M-\tau-c *}} .
\end{align*}
$$

Here the last equality holds since $1=\sum_{Q \subseteq J} \operatorname{Pr}\left[\forall e \in Q: c_{e}=0 \wedge \forall e \in J \backslash Q: c_{e} \neq 0\right]$ which follows from the fact that the events whose probability we compute are all disjunct, but together they cover all possible challenges occurring for the elements in $J$.

Going back to (1), we have that $\delta<\epsilon+\frac{\binom{M-c *}{\tau}}{\binom{M}{\tau}} \cdot \frac{1}{N^{M-c *-\tau}}$ in contradiction to the assumption that $\delta=\xi(M, N, \tau)+\epsilon$. Thus, there must exists an execution $e$ with a valid witness, for which $\operatorname{Pr}\left[\operatorname{acc} \mid c_{e}^{\prime} \neq c_{e}\right] \geq \epsilon / M$.

Recall that our extractor tries to extract the witness from all executions until it succeeds to extract a correct witness from some execution. From Lemma 1, there exists an execution $e$ with a valid witness for which the probability of probing an accepting transcript that allows us to extract is higher than $\frac{\epsilon}{M}$. Thus, the expected number of steps until the witness is extracted is bounded by $\frac{M}{\epsilon}$. Note that $M$ depends only on the statistical security parameter but its size in independent of the common input $x$ held by the prover and the verifier (which is the circuit in our ZKPOK system). Thus, we conclude that if the success cheating probability is higher than $\xi_{c \& c}(M, N, \tau)$, then a valid witness can be extracted in $O\left(\frac{|x|}{\epsilon}\right)$ expected number of steps (recall that the extractor checks the validity of witnesses by running $C(\boldsymbol{w})$. Thus, the running time also depends on the common input $x$ which is allowed by the definition). This concludes the proof.

### 3.3 HVZKAoK Protocol Using Imperfect Preprocessing and Sacrificing

In this section, we present our second HVZKAoK protocol $\Pi_{\mathrm{sac}}$. In this protocol, instead of ensuring that the preprocessings are correct, we rely on a method where one "sacrifices" random multiplication triples and squares in order to verify the correctness of multiplication and square operations.

The idea of this protocol is that $\mathcal{P}$ does not simulate the execution of a protocol to compute multiplication and square gates, but rather simulates an execution of a protocol to verify that the shares on the output wires of these gates are correctly defined. This means that now $\mathcal{P}$ will first define and commit to sharings of the values on each wire of the circuit and then will simulate an execution of a verification protocol for multiplication and square gates (recall that for other
gates the computation is local and thus no verification is required). We begin by describing the verification methods used in our protocol.

Verification of a multiplication triple using another. This procedure is reminiscent to the recent work of [LN17]. Given a random triple ( $\llbracket a \rrbracket, \llbracket b \rrbracket, \llbracket c \rrbracket$ ), it is possible to verify the correctness of a triple ( $\llbracket x \rrbracket$, $\llbracket y \rrbracket, \llbracket z \rrbracket$ ), i.e., that $z=x \cdot y$, without revealing any information on either of the triples, in the following way:

1. The parties generate a random $\epsilon \in \mathbb{F}$.
2. The parties locally compute $\llbracket \alpha \rrbracket=\epsilon \llbracket x \rrbracket+\llbracket a \rrbracket$ and $\llbracket \beta \rrbracket=\llbracket y \rrbracket+\llbracket b \rrbracket$.
3. The parties run open $(\llbracket \alpha \rrbracket)$ and open $(\llbracket \beta \rrbracket)$ to obtain $\alpha$ and $\beta$.
4. The parties locally compute $\llbracket v \rrbracket=\epsilon \llbracket z \rrbracket-\llbracket c \rrbracket+\alpha \cdot \llbracket b \rrbracket+\beta \cdot \llbracket a \rrbracket-\alpha \cdot \beta$.
5. The parties run open $(\llbracket v \rrbracket)$ to obtain $v$ and accept iff $v=0$.

Observe that if both triples are correct multiplication triples (i.e., $z=x y$ and $c=a b$ ) then the parties will always accept since

$$
\begin{aligned}
v & =\epsilon \cdot z-c+\alpha \cdot b+\beta \cdot a-\alpha \cdot \beta \\
& =\epsilon \cdot x y-a b+(\epsilon \cdot x+a) b+(y+b) a-(\epsilon \cdot x+a)(y+b) \\
& =\epsilon \cdot x y-a b+\epsilon \cdot x b+a b+y a+b a-\epsilon \cdot x y-a y-\epsilon \cdot x b-a b \\
& =0
\end{aligned}
$$

In contrast, if one (or both) of the triples are incorrect, then the parties will accept with probability of at most $1 /|\mathbb{F}|$ as shown in Lemma 2 .

Lemma 2. If $(\llbracket a \rrbracket, \llbracket b \rrbracket, \llbracket c \rrbracket)$ or $(\llbracket x \rrbracket, \llbracket y \rrbracket, \llbracket z \rrbracket)$ is an incorrect multiplication triple then the parties output acc in the sub-protocol above with probability $\frac{1}{|F|}$.

Proof. Let $\Delta_{z}=z-x \cdot y$ and $\Delta_{c}=c-a \cdot b$. If the parties output acc then it means that $v=0$, i.e.,

$$
\begin{aligned}
v & =\epsilon \cdot z-c+\alpha \cdot b+\beta \cdot a-\alpha \cdot \beta \\
& =\epsilon \cdot\left(x y+\Delta_{z}\right)-\left(a b+\Delta_{c}\right)+(\epsilon \cdot x+a) b+(y+b) a-(\epsilon \cdot x+a)(y+b) \\
& =\epsilon \Delta_{z}-\Delta_{c}=0 .
\end{aligned}
$$

Next, consider the following cases:

- Case 1: $\Delta_{z}=0, \Delta_{c} \neq 0$. In this case, $v=-\Delta_{c} \neq 0$. Thus, the parties will not output acc in contradiction to the assumption.
- Case 2: $\Delta_{z} \neq 0, \Delta_{c} \neq 0$. In this case, $v=0$ iff $\epsilon=\Delta_{c} \cdot\left(\Delta_{z}\right)^{-1}$. Since $\epsilon$ is uniformly distributed over $\mathbb{F}$, this happens with probability $1 /|\mathbb{F}|$.
- Case 3: $\Delta_{z} \neq 0, \Delta_{c}=0$. In this case, $v=0$ iff $\epsilon=0$ which, yet again, happens with probability $1 /|\mathbb{F}|$.

Going over all cases, we conclude that the lemma follows.

Verification of a square pair using another. Similarly, one can use a random square ( $\llbracket b \rrbracket, \llbracket d \rrbracket)$ to verify the correctness of a given square $(\llbracket x \rrbracket, \llbracket z \rrbracket)$ by working as follows:

1. The parties generate a random $\epsilon \in \mathbb{F}$.
2. The parties locally compute $\llbracket \alpha \rrbracket=\llbracket x \rrbracket-\epsilon \llbracket b \rrbracket$.
3. The parties run open $(\llbracket \alpha \rrbracket)$ to obtain $\alpha$.
4. Each party locally computes $\llbracket v \rrbracket=\llbracket z \rrbracket-\alpha \cdot(\llbracket x \rrbracket+\epsilon \llbracket b \rrbracket)-\epsilon^{2} \llbracket d \rrbracket$.
5. The parties run open $(\llbracket v \rrbracket)$ to obtain $v$ and accept iff $v=0$

As before, if the squares are correct, i.e., $z=x^{2}$ and $d=b^{2}$, then the parties will accept, since

$$
\begin{aligned}
v & =z-\alpha \cdot(x+\epsilon \cdot b)-\epsilon^{2} \cdot d=x^{2}-(x-\epsilon \cdot b) \cdot(x+\epsilon \cdot b)-\epsilon^{2} \cdot b^{2} \\
& =x^{2}-x^{2}+\epsilon^{2} b^{2}-\epsilon^{2} b^{2}=0
\end{aligned}
$$

In contrast, if one of the random squares (or both) is incorrect, then the parties will accept with probability $\frac{2}{|F|}$. This is proved in Lemma 3.

Lemma 3. If $(\llbracket x \rrbracket, \llbracket z \rrbracket)$ or $(\llbracket b \rrbracket, \llbracket d \rrbracket)$ is an incorrect square, then the parties output acc in the subprotocol above with probability $\frac{2}{|F|}$.

Proof. Let $\Delta_{d}=d-b^{2}$ and $\Delta_{z}=z-x^{2}$ and assume that the parties output acc. This means that

$$
\begin{aligned}
v & =z-\alpha \cdot(x+\epsilon \cdot b)-\epsilon^{2} \cdot d=x^{2}+\Delta_{z}-(x-\epsilon \cdot b) \cdot(x+\epsilon \cdot b)-\epsilon^{2} \cdot\left(b^{2}+\Delta_{d}\right) \\
& =\Delta_{z}-\epsilon^{2} \cdot \Delta_{d}=0 .
\end{aligned}
$$

We consider the following three cases:

- Case 1: $\Delta_{z}=0, \Delta_{d} \neq 0$. In this case, $v=0$ iff $\epsilon^{2}=0 \bmod |\mathbb{F}|$, which holds iff $\epsilon=0$. Since $\epsilon$ is chosen randomly from $\mathbb{F}$, this holds with probability $\frac{1}{|\mathbb{F}|}<\frac{2}{|\mathbb{F}|}$.
- Case 2: $\Delta_{z} \neq 0, \Delta_{d} \neq 0$. In this case, $v=0$ iff $\epsilon^{2}=\Delta_{z} \cdot\left(\Delta_{d}\right)^{-1}$. Now, if $\epsilon=0$, then $v=\Delta_{z}$ and the parties will reject. Otherwise, since there are $\frac{|\mathbb{F}|-1}{2}$ squares in $\mathbb{F}$, then the parties will accept with probability $\frac{2}{|F|-1}$. Thus, overall the parties accept with probability
$\quad \frac{1}{\left\lvert\, \frac{1}{|F|} \cdot 0+\left(1-\frac{1}{|\vec{F}|}\right) \cdot \frac{2}{|F|-1}=\frac{2}{\mid[\mathbb{F} \cdot} .\right.}$
- Case 3: $\Delta_{z} \neq 0, \Delta_{d}=0$. In this case, $v=\Delta_{z} \neq 0$ and thus the parties will not output acc.

Going over all possible cases, we conclude that the lemma follows.
The protocol. Our second PoK protocol is formally described in Fig. 2.a and Fig. 2.b. In this protocol, the prover $\mathcal{P}$ first commits in Round 1 to sharings of the values on each wire of the circuit and to sharings of random multiplication triples and squares for $M$ independent executions. As in the previous protocol, we save communication by deriving all the random shares from a single seed. Then, in Round 2, the verifier $\mathcal{V}$ challenges $\mathcal{P}$ by choosing the randomness required for the verification procedure, i.e., an $\epsilon$ value for each multiplication and square gate. Upon receiving the challenge from $\mathcal{V}$, the prover $\mathcal{P}$ simulates $M$ executions of the verification protocol in Round 3 and commits to the view of the parties in each execution. Then, in Round $4, \mathcal{V}$ picks his second challenge by choosing, for each execution, $N-1$ parties whose view will be opened and tested. In Round $5, \mathcal{P}$ sends to $\mathcal{V}$ the seeds from which the randomness of the $N-1$ parties was derived and all the messages sent to these parties from the remaining party $P_{\bar{i}_{e}}$. As in the previous protocol,
for values that are fixed, i.e., inputs, multiplications and squares, the prover sends also an offset (which was committed in the first round) to "fix" the sharing to the correct value. As before, we further reduce the communication cost by hashing the commitments together and sending only the hash value. Finally, $\mathcal{V}$ accepts if and only if all commitments are correct, the view of each party was computed correctly, the verification procedures conclude with the parties holding a sharing of 0 and the output of the circuit is $\boldsymbol{y}$.

Cheating probability (soundness). We compute the probability that $\mathcal{V}$ outputs acc when $C(\boldsymbol{w}) \neq \boldsymbol{y}$. Observe that each of the $M$ executions is independent from the other executions. When considering a single execution, $\mathcal{P}$ can cheat in either computing the view of one of the parties or cheat in defining the shares on the output wire of a multiplication/square gate. In the former case, it will succeed with probability $\frac{1}{N}$ whereas in the latter case it will succeed with probability $\frac{1}{|\mathbb{F}|}$ or $\frac{2}{|\mathbb{F}|}$ (note that if there are gates of both types in the circuit, it will be more beneficial for $\mathcal{P}$ to cheat in square gates since $\left.\frac{2}{|F|}>\frac{1}{|F|}\right)$. Furthermore, the best strategy for the prover is to first cheat in multiplication/square gates and then if it didn't receive the desired challenge that will cause the verification process to end successfully, it can manipulate one of the parties' view. Thus, if there are square gates in the circuit, then the overall cheating probability is

$$
\xi_{\mathrm{sac}}(M, N)=\left(\frac{2}{|\mathbb{F}|}+\left(1-\frac{2}{|\mathbb{F}|}\right) \cdot \frac{1}{N}\right)^{M}=\left(\frac{2 N+|\mathbb{F}|-2}{|\mathbb{F}| \cdot N}\right)^{M} .
$$

Similarly, if there are multiplication gates in the circuit (and no square gates), then the cheating probability is

$$
\xi_{\mathrm{sac}}(M, N)=\left(\frac{1}{|\mathbb{F}|}+\left(1-\frac{1}{|\mathbb{F}|}\right) \cdot \frac{1}{N}\right)^{M}=\left(\frac{N+|\mathbb{F}|-1}{|\mathbb{F}| \cdot N}\right)^{M}
$$

Formal proof. We are now ready to prove formally that the protocol $\Pi_{\mathrm{sac}}$ is an honest verifier zero-knowledge argument of knowledge.

Theorem 3. Let $H$ be a collision-resistant hash function and let com be a secure commitment scheme. Then the protocol $\Pi_{\mathrm{sac}}$ is a HVZKAoK with knowledge error (soundness) $\xi_{\mathrm{sac}}(M, N)$.

Proof. We prove that each of the three properties defined in Section 2.2 are satisfied by our protocol. Completeness. This follows trivially from the correctness of the MPC protocol.
Honest Verifier Zero Knowledge. We construct a simulator $\mathcal{S}$ for our protocol which works as follows:

1. For $e=1, \ldots, M, \mathcal{S}$ chooses random challenges $\epsilon_{e, k}$ for each multiplication and square gate $g_{k}$ in $C$ and random $\bar{i}_{e} \in[N]$.
2. $\mathcal{S}$ simulates the first step of the protocol: for each $e \in[M]$ and $i \in[N] \backslash \bar{i}_{e}$, it defines the shares of the random multiplication triples and squares and the shares on each wire of the circuit as an honest prover would do. For generating state $e, \mathcal{S}$ chooses ransom $\Delta_{e, k}$ for each random triple/square gate and random $\varphi_{e, k}$ for each multiplication/square gate and random $\phi_{e, k}$ for each input wire $k$. Then, it computes state $e, i$ and $\Gamma_{e, i}$ as an honest verifier. For $\Gamma_{e, \bar{i}_{r}}$, it uses the 0 -string as the committed message. Finally, $\mathcal{S}$ computes $h_{\Gamma}$ as an honest prover.
3. $\mathcal{S}$ simulates Round 3:
(a) First, it initializes an empty string view ${ }_{e}$ for all $e \in[M]$.

## The "Sacrificing" Based HVZK Argument $\Pi_{\mathrm{sac}}$ (Part 1)

Let $H$ be a collision-resistant hash function and com be a commitment scheme
Inputs: Both the prover $\mathcal{P}$ and the verifier $\mathcal{V}$ hold $\boldsymbol{y} \in \mathbb{F}^{n_{\text {out }}}$, a description of an arithmetic circuit $C$ over a finite field $\mathbb{F}$ and parameters $M, N$; the prover $\mathcal{P}$ also hold a witness $\boldsymbol{w} \in \mathbb{F}^{n_{\mathrm{in}}}$ such that $C(\boldsymbol{w})=\boldsymbol{y}$.

## Round 1:

1. For each $e \in[M]$ :
(a) $\mathcal{P}$ initializes an empty string state $e$.
(b) For each $i \in[N], \mathcal{P}$ initializes an empty string state $e_{e, i}$.
(c) $\mathcal{P}$ chooses a master seed $\mathrm{sd}_{e}$ and seeds $\mathrm{sd}_{e, 1}, \ldots, \mathrm{sd}_{e, N}$.

Then, for each $i \in[N]$, it sets state $e_{e, i} \leftarrow \operatorname{sd}_{e, i}$.
(d) $\mathcal{P}$ prepares the pre-processing data:

- For each multiplication gate $g_{k} \in \mathcal{G}$ :
i. For each $i \in[N], \mathcal{P}$ uses $\operatorname{sd}_{e, i}$ to generate $a_{e, k, i}, b_{e, k, i}, c_{e, k, i}$. This shares define the random sharings $\llbracket a_{e, k} \rrbracket, \llbracket b_{e, k} \rrbracket$ and $\llbracket c_{e, k} \rrbracket$, where $a_{e, k}=\sum_{i=1}^{N} a_{e, k, i}, b_{e, k}=\sum_{i=1}^{N} b_{e, k, i}$ and $c_{e, k}=\sum_{i=1}^{N} c_{e, k, i}$.
ii. $\mathcal{P}$ sets $\Delta_{e, k}=a_{e, k} \cdot b_{e, k}-c_{e, k}$ and state $_{e} \leftarrow \operatorname{state}_{e} \| \Delta_{e, k}$.
iii. $\mathcal{P}$ defines the random triple for this gate to be $\left(\llbracket a_{e, k} \rrbracket, \llbracket b_{e, k} \rrbracket, \llbracket c_{e, k} \rrbracket+\Delta_{e, k}\right)$.
- For each square gate $g_{k} \in \mathcal{G}$ :
i. For each $i \in[N], \mathcal{P}$ uses $\mathrm{sd}_{e, i}$ to generate $b_{e, k, i}$ and $d_{e, k, i}$. This shares define the random sharings $\llbracket b_{e, k} \rrbracket$ and $\llbracket d_{e, k} \rrbracket$, where $b_{e, k}=\sum_{i=1}^{N} b_{e, k, i}$ and $d_{e, k}=\sum_{i=1}^{N} d_{e, k, i}$.
ii. $\mathcal{P}$ sets $\Delta_{e, k}=\left(b_{e, k}\right)^{2}-d_{e, k}$ and state $e_{e} \leftarrow$ state $_{e} \| \Delta_{e, k}$.
iii. $\mathcal{P}$ defines the random square for this gate to be $\left(\llbracket b_{e, k} \rrbracket, \llbracket d_{e, k} \rrbracket+\Delta_{e, k}\right)$.
(e) $\mathcal{P}$ chooses a linear random sharing of the inputs:
i. For each $i \in[N], \mathcal{P}$ uses sd ${ }_{e, i}$ to generate $w_{e, 1, i}, \ldots, w_{e, n_{\text {in }}, i}$.

This shares define the random sharings $\llbracket w_{e, 1} \rrbracket, \ldots, \llbracket w_{e, n_{\text {in }}} \rrbracket$, where $w_{e, k}=\sum_{i=1}^{N} w_{e, k, i}$.
ii. For each input wire $k \in \mathcal{I}, \mathcal{P}$ sets $\phi_{e, k}=w_{k}-\sum_{i=1}^{N-1} w_{e, k, i}$ and state $e_{e} \leftarrow$ state $_{e} \| \phi_{e, k}$. The sharing on this wire is defined to be $\llbracket w_{e, k} \rrbracket+\phi_{e, k}$.
(f) $\mathcal{P}$ simulates the computation of the circuit $C$ going gate-by-gate in topological order. In particular:

- For each addition or multiplication-by-a-constant gates, $\mathcal{P}$ computes the parties' output shares via the local operation described in Section 3.1.
- For each multiplication gate $g_{k} \in \mathcal{G}$ with sharing $\llbracket x_{k} \rrbracket$ and $\llbracket y_{k} \rrbracket$ on its input wires:
i. For each $i \in[N], \mathcal{P}$ uses $\operatorname{sd}_{e, i}$ to generate $z_{e, k, i}$.

These shares define the random sharing $\llbracket z_{e, k} \rrbracket$ where $z_{e, k}=\sum_{i=1}^{N} z_{e, k, i}$.
ii. $\mathcal{P}$ sets: $\varphi_{e, k}=x_{k} \cdot y_{k}-\sum_{i=1}^{N} z_{e, k, i}$ and state $e_{e} \leftarrow \operatorname{state}_{e} \| \varphi_{e, k}$.

The sharing on the output wire is defined to be $\llbracket z_{e, k} \rrbracket+\varphi_{e, k}$.

- For each square gate $g_{k} \in \mathcal{G}$ with sharing $\llbracket x_{k} \rrbracket$ on its input wire:
i. For each $i \in[N], \mathcal{P}$ uses $\mathrm{sd}_{e, i}$ to generate $z_{e, k, i}$.

These shares define the random sharing $\llbracket z_{e, k} \rrbracket$ where $z_{e, k}=\sum_{i=1}^{N} z_{e, k, i}$.
ii. $\mathcal{P}$ sets: $\varphi_{e, k}=\left(x_{k}\right)^{2}-\sum_{i=1}^{N} z_{e, k, i}$ and state $e_{e} \leftarrow$ state $_{e} \| \varphi_{e, k}$.

The sharing on the output wire is defined to be $\llbracket z_{e, k} \rrbracket+\varphi_{e, k}$.
(g) $\mathcal{P}$ uses sd ${ }_{e}$ to generate $r_{e} \in\{0,1\}^{\lambda}$ and computes $\Gamma_{e}=\operatorname{com}\left(\right.$ state $\left._{e}, r_{e}\right)$.
(h) For each $i \in[N], \mathcal{P}$ uses $\operatorname{sd}_{e, i}$ to generate $r_{e, i} \in\{0,1\}^{\lambda}$ and then it computes $\Gamma_{e, i}=\operatorname{com}\left(\operatorname{state}_{e, i}, r_{e, i}\right)$.
(i) Finally, $\mathcal{P}$ computes $h_{e}=H\left(\Gamma_{e}\left\|\Gamma_{e, 1}\right\| \cdots \| \Gamma_{e, N}\right)$.
2. $\mathcal{P}$ computes $h_{\Gamma}=H\left(h_{1}\|\cdots\| h_{M}\right)$ and sends it to $\mathcal{V}$.

Round 2: $\mathcal{V}$ chooses $\operatorname{sd}_{\iota}$. Then, for each $e \in[M]$ it uses $\operatorname{sd}_{\iota}$ to generate random coefficients $\epsilon_{e, k}$ for each multiplication and square gate $g_{k}$ in $C$. Finally, $\mathcal{V}$ sends sd ${ }_{\iota}$ to $\mathcal{P}$.

Fig. 2.a: HVZK Argument using "Sacrificing"
(b) For each $e \in[M]$, it chooses random $\alpha_{e, k, i}$ and $\beta_{e, k, i}$ for each multiplication and square gate $g_{k}$ and $i \in[N]$ and adds them to view (note that here it chooses shares for all parties including party $\bar{i}_{e}$ ).

## Round 3:

1. $\mathcal{P}$ chooses a random seed $\mathrm{sd}_{E}$.
2. $\mathcal{P}$ uses sd ${ }_{\iota}$ to generate random coefficients $\epsilon_{e, k}$ as the verifier would do.
3. For each $e \in[M]$ :
(a) $\mathcal{P}$ initializes an empty string view $_{e}$.
(b) For each multiplication gate $g_{k}$ (in topological order), $\mathcal{P}$ simulates the verification procedure described in the text using $\epsilon_{e, k}$. In addition, it sets: view $e_{e} \leftarrow \operatorname{view}_{e}\left\|\alpha_{e, k, 1}\right\| \cdots\left\|\alpha_{e, k, N}\right\| \beta_{e, k, 1}\|\cdots\| \beta_{e, k, N}$.
(c) For each square gate $g_{k}$ (in topological order), $\mathcal{P}$ simulates the verification procedure described in the text using $\epsilon_{e, k}$. In addition, it sets: view $_{e} \leftarrow \operatorname{view}_{e}\left\|\alpha_{e, k, 1}\right\| \cdots \| \alpha_{e, k, N}$.
(d) Let $v_{e, k, i}$ be the sharing held by party $P_{i}$ at the end of the verification procedure of gate $g_{k}$.

Then, for each $i \in[N], \mathcal{P}$ sets: view $_{e} \leftarrow \operatorname{view}_{e}\left\|v_{e, k, 1}\right\| \cdots \| v_{e, k, N}$.
(e) Let $o_{e, 1, i}, \ldots, o_{e, n_{\text {out }}, i}$ be the shares on the output wires of $C$ held by party $P_{i}$. Then, for output wire $k \in \mathcal{O}, \mathcal{P}$ sets: view $_{e} \leftarrow \operatorname{view}_{e}\left\|o_{e, k, 1}\right\| \cdots \| o_{e, k, N}$.
4. $\mathcal{P}$ uses sd ${ }_{E}$ to generate $g_{e} \in\{0,1\}^{\lambda}$ and computes $\Pi_{e}=\operatorname{com}\left(\right.$ view $\left._{e}, g_{e}\right)$.
5. $\mathcal{P}$ computes $h_{\pi}=H\left(\Pi_{1}\|\cdots\| \Pi_{M}\right)$ and sends it to $\mathcal{V}$.

Round 4: For each $e \in[M]: \mathcal{V}$ chooses a random $\bar{i}_{e} \in[N]$ and sends it to $\mathcal{P}$.
Round 5: For each $e \in[M]$ :
Let $I_{e}=[N] \backslash\left\{\bar{i}_{e}\right\}$. Then, $\mathcal{P}$ sends the following to $\mathcal{V}: \operatorname{sd}_{E}, \operatorname{sd}_{e},\left\{\operatorname{sd}_{e, i}\right\}_{i \in I_{e}}, \Gamma_{e, \bar{i}_{e}},\left\{\phi_{e, k}\right\}_{k=1}^{n_{\text {in }}}$, the tuple
$\left(\Delta_{e, k}, \varphi_{e, k}, \alpha_{e, k, \bar{i}_{e}}, \beta_{e, k, \bar{i}_{e}}, v_{e, k, \bar{i}_{e}}\right)$ for each multiplication or square gate $g_{k}$, and $o_{e, 1, \bar{i}_{e}}, \ldots, o_{e, n_{\text {out }}, \bar{i}_{e}}$.
Output: $\mathcal{V}$ output acc iff all the following checks succeeds:

1. For each $e \in[M], \mathcal{V}$ uses $\left\{\operatorname{sd}_{e, i}\right\}_{i \in I_{e}}$ and the tuple received for each multiplication and square gate to compute the shares of the parties in $I_{e}$ on each wire and their shares of each random triple and square. Then, it uses sd ${ }_{e}$ to compute $\Gamma_{e}$ and uses $\left\{\operatorname{sd}_{e, i}\right\}_{i \in I_{e}}$ to compute $\left\{\Gamma_{e, i}\right\}_{i \in I_{e}}$ as an honest prover would do. Then, using $\Gamma_{e, \bar{i}_{e}}$ received from $\mathcal{P}$, the verifier $\mathcal{V}$ computes $h_{e}$. Then, $\mathcal{V}$ checks that $h_{\Gamma}=H\left(h_{1}\|\cdots\| h_{M}\right)$.
2. For each $e \in[M], \mathcal{V}$ computes view $w_{e}$ by going gate-by-gate in topological order and simulating the verification procedure using the tuple received from $\mathcal{P}$ for each multiplication and square gate, and using $\left\{o_{e, k, \bar{i}_{e}}\right\}_{k=1}^{n_{\text {out }}}$. Then, it computes $\Pi_{e}$ as a honest prover would do. Finally, $\mathcal{V}$ checks that $h_{\pi}=H\left(\Pi_{1}\|\cdots\| \Pi_{M}\right)$.
3. For each $e \in[M]$ and multiplication/square gate $g_{k}, \mathcal{V}$ checks that $\sum_{i=1}^{N} v_{e, k, i}=0$
4. For each $e \in[M]$, for each $k \in \mathcal{O}, \mathcal{V}$ checks that $\sum_{i=1}^{N} o_{e, k, i}=y_{k}$

Fig. 2.b: HVZK Argument using "Sacrificing", continued
(c) For each $e \in[M]$, multiplication/square gate $g_{k}$ and $i \in[N] \backslash\left\{\bar{i}_{e}\right\}, \mathcal{S}$ computes $v_{e, k, i}$ as an honest prover. Then, it sets $v_{e, k, \bar{i}_{e}}$ such that $\sum_{i=1}^{n} v_{e, k, i}=0$ and adds $v_{e, k, i}$ for all $i \in[N]$ to view $_{e}$.
(d) For each $e \in[M], i \in[N] \backslash\left\{\bar{i}_{e}\right\}$ and $k \in\left[n_{\text {out }}\right], \mathcal{S}$ computes $o_{e, k, i}$ as an honest prover. Then, it sets $o_{e, k, \bar{i}_{e}}$ such that $\sum_{i=1}^{n} o_{e, k, i}=y_{k}$ and adds $o_{e, i}$ for all $i \in[N]$ to view .
(e) $\mathcal{S}$ computes $\left\{\Pi_{e}\right\}_{e \in[M]}$ and $h_{\pi}$ as an honest prover would do.
4. $\mathcal{S}$ outputs the transcript of the protocol.

The only difference between the simulation and a real execution is the way the commitments to the shares of party $\bar{i}_{e}$ are computed and the way $\Delta_{e, k}, \phi_{e, k}, \varphi_{e, k}, \alpha_{e, k, i}$ and $\beta_{e, k, i}$ are chosen (in the simulation they are chosen uniformly whereas in the real execution they computed deterministically as the slack between random sharings and actual values that are on the wires). However, from the hiding property of the commitment the former does not change and since $\Delta_{e, k}, \phi_{e, k}, \varphi_{e, k}, \alpha_{e, k, i}$ and $\beta_{e, k, i}$ are all uniformly distributed over $\mathbb{F}$ in both executions, the latter does not change as well
(since all of them are defined by a random sharing which is kept secret from the verifier). Therefore, we conclude that the transcript generated by $\mathcal{S}$ is indistinguishable from a real execution.

Knowledge Soundness. As in the proof of Theorem 2, we assume for simplicity that the commitment scheme is perfectly binding and that there are no collisions for the hash function. In addition, we consider the case where there are only multiplication gates in the circuit. The proof for the case when there are also square gates is similar with the appropriate changes. Recall that we denote by $n_{m u l}$ the number of multiplication gates in the circuit. Clearly, a cheating prover might not need to cheat in every multiplication gate in order to make the output correct, as this depends on the structure of the circuit. Thus, throughout the proof, we assume that the prover cheats in $k$ multiplications and that this suffices to totally manipulate the output.

As the challenges of the different MPC instances are independent, we proceed by analyzing a single instance first (i.e. let $\left.M=1, \xi_{\text {sac }}(1, N)=\xi_{\text {sac }}(N)\right)$ and then argue how this generalized to arbitrary choices of $M$. We begin by showing that if the cheating probability $\delta(x)$ is higher than $\xi_{\text {sac }}(N)$, then the prover must have committed to the correct witness $\boldsymbol{w}$.

Lemma 4. Let $M=1$. If $\delta(x)>\xi_{\text {sac }}(N)$, then the prover $\mathcal{P}^{*}$ must have committed to the correct witness $\boldsymbol{w}$ in Round 1.

Proof. To see this, let $\mathbf{G}$ be a $0 / 1$ matrix, where each column corresponds to a possible first challenge of the verifier, each row corresponds to a possible second challenge and the bit in each cell indicates whether the verifier outputs acc or not. It follows that there are $|\mathbb{F}|^{n_{m u l}}$ columns and $N$ rows in $\mathbf{G}$. Thus, if $\delta(x)>\frac{N+|\mathbb{F}|-1}{N \cdot|\mathbb{F}|}$, then the number of ' 1 ' entries in $\mathbf{G}$ is larger than $(N+\mathbb{F}-1) \cdot|\mathbb{F}|^{n_{m u l}-1}$. Now, assume for the sake of contradiction that the prover has cheated in $k>0$ multiplication triples. By Lemma 2 there is one challenge for each of these gates for which the verifier won't detect cheating. With loss of generality we say that the challenges $\epsilon_{1}, \ldots, \epsilon_{k}$ are for the corrupted gates and challenges $\epsilon_{k+1}, \ldots, \epsilon_{n_{m u l}}$ are for the remaining gates, and denote the $k$ challenges, for which the verification procedure ends successfully, by $\bar{\epsilon}_{1}, \ldots, \bar{\epsilon}_{k}$. Thus, there are $|\mathbb{F}|^{n_{\text {mul }}-k}$ columns that can be filled with ' 1 ' entries (in each such column the challenges $\epsilon_{1}, \ldots, \epsilon_{k}$ are fixed to $\bar{\epsilon}_{1}, \ldots, \bar{\epsilon}_{k}$ and so it is required to choose a challenge only for the remaining $n_{m u l}-k$ multiplication gates). For the remaining columns, we claim that there must exist at least one column with more than a single ' 1 ' entry. This holds since otherwise the number of ' 1 ' entries in $\mathbf{G}$ is bounded by $|\mathbb{F}|^{n_{m u l}-k} \cdot N+\left(|\mathbb{F}|^{n_{m u l}}-|\mathbb{F}|^{n_{m u l}-k}\right) \cdot 1=\left(N+|\mathbb{F}|^{k}-1\right) \cdot|\mathbb{F}|^{n_{m u l}-k}$ (for $|\mathbb{F}|^{n_{m u l}-k}$ columns all $N$ entries are filled with ' 1 ' and for the other columns there is a single ' 1 ' entry). However, this is in contradiction to our assumption that there are more than $(N+|\mathbb{F}|-1) \cdot|\mathbb{F}|^{n_{m u l}-1}$ entries with ' 1 ' in $\mathbf{G}$, since

$$
\begin{equation*}
(N+|\mathbb{F}|-1) \cdot|\mathbb{F}|^{n_{m u l}-1} \geq\left(N+|\mathbb{F}|^{k}-1\right) \cdot|\mathbb{F}|^{n_{m u l}-k} \tag{5}
\end{equation*}
$$

(to see that this inequality holds, observe that it is equivalent to $(N+|\mathbb{F}|-1) \cdot|\mathbb{F}|^{k-1} \geq N+|\mathbb{F}|^{k}-1$ which is equivalent to $N \cdot|\mathbb{F}|^{k-1}+|\mathbb{F}|^{k}-|\mathbb{F}|^{k-1} \geq N+|\mathbb{F}|^{k}-1$ which can be written as $(N-1) \cdot|\mathbb{F}|^{k-1} \geq$ $N-1$ which holds for any $k>0$ ). We conclude that there must be a column with challenges $\epsilon_{1}^{\prime}, \ldots, \epsilon_{n_{m u l}}^{\prime}$ such that $\exists i \in[k]: \epsilon_{i}^{\prime} \neq \bar{\epsilon}_{i}$, that has at least two ' 1 ' entries. That is, for the first challenge which corresponds to that column, the prover can answer successfully at least two different second challenges. Let $c_{1}$ and $c_{2}$ the challenges corresponding to such two accepting transcripts, i.e., both have the same first challenge $\epsilon_{1}^{\prime}, \ldots, \epsilon_{n_{m u l}}^{\prime}$ and different second challenge $i_{1}$ and $i_{2}$. Note that it is possible to compute a witness from $c_{1}$ and $c_{2}$, since two different second challenges reveals the inputs of all the parties. Let $\boldsymbol{w}^{*}$ be the witness computed from $c_{1}$ and $c_{2}$. We argue that this is a valid witness, i.e., that $C(\boldsymbol{w})=\boldsymbol{y}$. This holds since when the verifier accepts, $C(\boldsymbol{w}) \neq \boldsymbol{y}$ only
if one of two events occur: (a) one of the parties' views is inconsistent but it is not chosen to be opened by $\mathcal{V}$ or (b) there are multiplication gates that were not correctly computed but $\mathcal{V}$ chooses the one single challenge that makes the verification procedure end successfully for each of these gates. However, the first event does not happen, since $c_{1}$ and $c_{2}$ differs in the second challenge, and thus all parties' views are covered and verified in the two executions. The second event does not happen as well for $c_{1}$ and $c_{2}$ since for both of them the first challenge has the property that $\exists i \in[k]: \quad \epsilon_{i}^{\prime} \neq \bar{\epsilon}_{i}$ (recall that $\bar{\epsilon}_{i}$ is the only challenge for the corrupted gate $i$ that makes the verification procedure succeed). This contradicts our assumption on the number of multiplication triples which are incorrect. We conclude that $\boldsymbol{w}$ must be a valid witness and thus the claim follows.

In the protocol however we will have $M>1 \mathrm{MPC}$ instances. We therefore define the set

$$
\left.\left.S=\left\{\begin{array}{l|l}
\tau=\left(\begin{array}{c}
\epsilon_{1}^{(1)}, \ldots, \epsilon_{n_{m u l}}^{(M)} \\
i_{1}, \ldots, i_{M}
\end{array}\right.
\end{array}\right) \in \mathbb{F}^{n_{m u l} \cdot M} \times[N]^{M} \right\rvert\, \begin{array}{l}
\tau \text { is challenge vector } \\
\text { of accepting transcript }
\end{array}\right\}
$$

where $\left.\tau\right|_{j}=\left(\epsilon_{1}^{(j)}, \ldots, \epsilon_{n_{m u l}}^{(j)}, i_{j}\right)$ denotes the challenges of MPC instance $j$ of the challenge vector $\tau$ and $\left.\tau\right|_{j} ^{P}=i_{j}$ denotes the choice of party which is not opened in the j-th MPC protocol. Furthermore, define $S_{1}, \ldots, S_{M}$ where

$$
S_{j}=\left\{\left(\epsilon_{1}^{(j)}, \ldots, \epsilon_{n_{m u l}}^{(j)}, i_{j}\right) \in \mathbb{F}^{n_{m u l}} \times[N]|\exists \tau \in S \wedge \tau|_{j}=\left(\epsilon_{1}^{(j)}, \ldots, \epsilon_{n_{m u l}}^{(j)}, i_{j}\right)\right\}
$$

We start out with the fact that $\delta(x)>\xi_{\mathrm{sac}}(M, N)$ and so $|S|=\delta(x) \cdot\left(|\mathbb{F}|^{n_{m u l}} \cdot N\right)^{M}>\xi_{\mathrm{sac}}(M, N)$. $\left(|\mathbb{F}|^{n_{m u l}} \cdot N\right)^{M}$, which follows from the definition of $S$. Now, assume that $\forall j \in[M]:\left|S_{j}\right| \leq$ $\xi(N) \cdot\left(|\mathbb{F}|^{n_{m u l}} \cdot N\right)$. Since $S=\left\{\tau|\forall j \in[M]: \tau|_{j} \in S_{j}\right\}$ it must hold that $|S| \leq\left|S_{1}\right| \cdots\left|S_{M}\right|$ because every $\tau$ must be a combination of different $\left.\tau\right|_{j}$. But that implies

$$
|S| \leq \xi_{\text {sac }}(N)^{M} \cdot\left(|\mathbb{F}|^{n_{m u l}} \cdot N\right)^{M}=\xi_{\text {sac }}(M, N) \cdot\left(|\mathbb{F}|^{n_{m u l}} \cdot N\right)^{M}
$$

in contradiction to our assumption on the upper-bound of each $\left|S_{j}\right|$. Therefore, there must exist a $j \in[M]$ such that $\left|S_{j}\right|>\xi_{\text {sac }}(N) \cdot\left(|\mathbb{F}|^{n_{m u l}} \cdot N\right)$. From Lemma 4 it thus follows that this MPC instance must have the correct witness committed:

Corollary 1. If $\delta(x)>\xi_{\mathrm{sac}}(M, N)$, then the prover $\mathcal{P}^{*}$ must have committed to the correct witness $\boldsymbol{w}$ in Round 1 in at least one of the MPC instances.

Observe that once a correct witness is committed to in the first round in one of the MPC instances, then two accepting transcripts where the second challenge is different for such an instance are sufficient to extract it. We therefore can define an extractor $\mathcal{E}$ (similar to the one in the proof of the previous protocol) which works as follows:

1. Choose random vectors $\tau \in \mathbb{F}^{n_{m u l} \cdot M} \times[N]^{M}$ until $\tau \in S$ was found.
2. For each $j \in[M]$ run the following in parallel:
(a) Choose random vectors $\tau^{\prime} \in \mathbb{F}^{n_{m u l} \cdot M} \times[N]^{M}$ such that $\left.\tau^{\prime}\right|_{j} ^{P} \neq\left.\tau\right|_{j} ^{P}$ until $\tau^{\prime} \in S$ was found.
(b) Extract $\boldsymbol{w}_{j}$ from MPC instance $j$ using $\left.\tau\right|_{j},\left.\tau^{\prime}\right|_{j}$. If $C(\boldsymbol{w})=\boldsymbol{y}$ then output $\boldsymbol{w}$ and stop, otherwise continue.

We now prove that the expected running time of our extractor satisfies the security definition.

Lemma 5. Let $\delta(x)=\xi_{\mathrm{sac}}(M, N)+\varepsilon$. Then, the expected running time of extractor $\mathcal{E}$ is $O(M|x| / \varepsilon)$.

Proof. The proof is very similar to the one of the previous protocol. We will find the first $\tau \in S$ in time $O(|x| / \varepsilon)$ by definition. Thus, let us argue that the overall expected runtime until we find a correct witness $\boldsymbol{w}$ is $M|x| / \varepsilon$ as in the previous proof.

By Corollary 1 there must be at least one MPC instance with a correct witness committed. Let $k \geq 1$ be the overall number of these correct committed witnesses, and without loss of generality they are in MPC instances $1, \ldots, k$. As argued above, we can extract if we find $\tau^{\prime}$ such that $\left.\tau^{\prime}\right|_{j} ^{P} \neq\left.\tau\right|_{j} ^{P}$ for $j \in[k]$. For the sake of contradiction, assume that $\operatorname{Pr}\left[\operatorname{acc}\left|\tau^{\prime}\right|_{j}^{P} \neq\left.\tau\right|_{j} ^{P}\right]<\varepsilon / M$ for all $j \in[k]$. We can rewrite the success probability as

$$
\begin{align*}
\delta(x)=\operatorname{Pr}[\operatorname{acc}]= & \operatorname{Pr}\left[\left.\operatorname{acc} \wedge \tau^{\prime}\right|_{1} ^{P}=\left.\left.\tau\right|_{1} ^{P} \wedge \cdots \wedge \tau^{\prime}\right|_{k} ^{P}=\left.\tau\right|_{k} ^{P}\right]+ \\
& \operatorname{Pr}\left[\operatorname{acc} \wedge\left(\left.\tau^{\prime}\right|_{1} ^{P} \neq\left.\left.\tau\right|_{1} ^{P} \vee \cdots \vee \tau^{\prime}\right|_{k} ^{P} \neq\left.\tau\right|_{k} ^{P}\right)\right] \\
\leq & \operatorname{Pr}\left[\left.\operatorname{acc} \wedge \tau^{\prime}\right|_{1} ^{P}=\left.\left.\tau\right|_{1} ^{P} \wedge \cdots \wedge \tau^{\prime}\right|_{k} ^{P}=\left.\tau\right|_{k} ^{P}\right]+\sum_{j=1}^{k} \operatorname{Pr}\left[\left.\operatorname{acc} \wedge \tau^{\prime}\right|_{j} ^{P} \neq\left.\tau\right|_{j} ^{P}\right] \\
& <\operatorname{Pr}\left[\left.\operatorname{acc} \wedge \tau^{\prime}\right|_{1} ^{P}=\left.\left.\tau\right|_{1} ^{P} \wedge \cdots \wedge \tau^{\prime}\right|_{k} ^{P}=\left.\tau\right|_{k} ^{P}\right]+M \cdot(\varepsilon / M)  \tag{6}\\
= & \operatorname{Pr}\left[\operatorname{acc}\left|\tau^{\prime}\right|_{1}^{P}=\left.\left.\tau\right|_{1} ^{P} \wedge \cdots \wedge \tau^{\prime}\right|_{k} ^{P}=\left.\tau\right|_{k} ^{P}\right] \cdot \operatorname{Pr}\left[\left.\tau^{\prime}\right|_{1} ^{P}=\left.\left.\tau\right|_{1} ^{P} \wedge \cdots \wedge \tau^{\prime}\right|_{k} ^{P}=\left.\tau\right|_{k} ^{P}\right]+\varepsilon \\
\leq & \left(\frac{(N+|\mathbb{F}|-1)}{|\mathbb{F}| \cdot N}\right)^{M-k} \cdot \frac{1}{N^{k}}+\varepsilon  \tag{7}\\
\leq & \frac{(N+|\mathbb{F}|-1)^{M}}{|\mathbb{F}|^{M} \cdot N^{M}}+\varepsilon  \tag{8}\\
= & \xi_{\text {sac }}(N, M)+\varepsilon=\delta(x) .
\end{align*}
$$

Here Eq. (6) uses the assumed upper-bound on all $\operatorname{Pr}\left[\operatorname{acc}\left|\tau^{\prime}\right|_{j}^{P} \neq\left.\tau\right|_{j} ^{P}\right]$ while Eq. (7) follows from the assumption that the $M-k$ instances have an incorrect witness and so by Lemma 4 , the acceptance probability in each of them is bounded by $\xi_{\text {sac }}(N)$ and since the probability that the second challenge in the other $k$ instances remains the same is $\frac{1}{N^{k}}$. Finally, Eq. (8) follows since $\frac{(N+|\mathbb{F}|-1)^{M-k}}{(\mid \mathbb{F} \cdot \cdot N)^{M-k} \cdot N^{k}}=\frac{(N+|\mathbb{F}|-1)^{M-k}}{|\mathbb{F}|^{M-k} \cdot N^{M}}$ and so $\frac{(N+|\mathbb{F}|-1)^{M-k}}{|\mathbb{F}|^{M-k} N^{M}}<\frac{(N+|\mathbb{F}|-1)^{M}}{|\mathbb{F}|^{M} \cdot N^{M}}$ since $|\mathbb{F}|^{k}<(N+|\mathbb{F}|-1)^{k}$, which holds because $N>1$ and $k>0$.

The resulting contradiction implies that there exists an instance $j$ with valid witness such that $\operatorname{Pr}\left[\operatorname{acc}\left|\tau^{\prime}\right|_{j}^{P} \neq\left.\tau\right|_{j} ^{P}\right] \geq \varepsilon / M$ which means that $\mathcal{E}$ will find the correct witness in expected runtime that is bounded by $O(M|x| / \varepsilon)$ as required (recall that $|x|$ in our protocol is the size of the circuit $C$ ).

We have proved that if the cheating probability is higher then $\xi_{\mathrm{sac}}(M, N)$ by $\varepsilon$, then the prover must have committed to the correct witness and this can be extracted from him with expected running time that is bounded by $O(M|x| / \varepsilon)$. Note that $M$ is independent from the common input and depends only on the statistical security parameter, and thus given a security parameter $M$ can be viewed as a constant. This concludes the proof.

### 3.4 Additional Optimizations

Reducing the number of seeds sent by the prover. In both of our protocols, the prover is required to send $N-1$ seeds for each execution $e$ that was not chosen to be opened. Each of these seeds is used to generate the randomness of one party throughout the execution. As in [KKW18], we can reduce the number of seeds that are sent from $N-1$ to $\log N$ by using a binary tree. Specifically, let $\operatorname{sd}_{e}$ be the root of a binary tree of height $\log N$ where the seed in each internal node is used to generate the seeds of it's two descendants. The value of the $i$ th leaf is labeled as $s d_{e, i}$ and used to generate the randomness of party $P_{i}$.

Now, when the verifier chooses a random $\bar{i}_{e} \in[N]$ as its challenge, instead of sending him sd ${ }_{e, i}$ for each $i \in[N] \backslash\left\{\bar{i}_{e}\right\}$, it suffices to send just $\log N$ seeds. Specifically, $\mathcal{P}$ can iterate over the tree beginning with the root, and for each node $j$, where $\operatorname{sd}_{e, \bar{i}_{e}}$ is not in the induced sub-tree rooted at $j$, send the seed of $j$ to the verifier. Once a seed has been chosen to be sent, $\mathcal{P}$ does not proceed to traverse in the sub-tree rooted in $j$ but will descend into the other direction. Overall, in the first protocol, the communication induced by sending the seeds in $M-\tau$ emulations is then reduced from $(M-\tau) \cdot(N-1)$ seeds to $(M-\tau) \log N$, whereas in the second protocol it is reduced from $M \cdot(N-1)$ seeds to $M \log N$, which can be significant when the number of parties is large.

Batch verification in $\Pi_{\mathrm{sac}}$. In $\Pi_{\mathrm{sac}}$ each multiplication triple is being verified separately. In order to save communication it is possible to batch-verify all triples by taking a linear combination of all $\llbracket v \rrbracket \mathrm{~s}$ and open only the result. Specifically, given a batch of triples $\left(\llbracket x_{1} \rrbracket, \llbracket y_{1} \rrbracket, \llbracket z_{1} \rrbracket\right), \ldots,\left(\llbracket x_{m} \rrbracket, \llbracket y_{m} \rrbracket, \llbracket z_{m} \rrbracket\right)$ to verify using a batch of random triples $\left(\llbracket a_{1} \rrbracket, \llbracket b_{1} \rrbracket, \llbracket c_{1} \rrbracket\right), \ldots,\left(\llbracket a_{m} \rrbracket, \llbracket b_{m} \rrbracket, \llbracket c_{m} \rrbracket\right)$ (the same applies to squares), the parties first compute $\llbracket v_{k} \rrbracket=\epsilon_{k} \cdot \llbracket z_{k} \rrbracket-\llbracket c_{k} \rrbracket+\alpha_{k} \cdot \llbracket b_{k} \rrbracket+\beta \cdot \llbracket a_{k} \rrbracket-\alpha_{k} \cdot \beta_{k}$ for each $k \in[m]$ (as in Lemma 2). Then, they jointly generate public random coefficients $\gamma_{1}, \ldots, \gamma_{m} \in \mathbb{F}$ and locally compute $\llbracket v \rrbracket=\sum_{i=1}^{m} \gamma_{k} \cdot \llbracket v_{k} \rrbracket$. Finally, the parties open $\llbracket v \rrbracket$ and check equality to 0 . If $v_{k}=0$ for all $k \in[m]$ then obviously $v=0$ as well. In contrast, if there exists $k \in[m]$ such that $v_{k} \neq 0$, then $v=0$ with probability $1 /|\mathbb{F}|$. This is summed up in the following two propositions.

Proposition 1. Let $\left(\llbracket x_{1} \rrbracket, \llbracket y_{1} \rrbracket, \llbracket z_{1} \rrbracket\right), \ldots,\left(\llbracket x_{m} \rrbracket, \llbracket y_{m} \rrbracket, \llbracket z_{m} \rrbracket\right)$ and $\left(\llbracket a_{1} \rrbracket, \llbracket b_{1} \rrbracket, \llbracket c_{1} \rrbracket\right), \ldots,\left(\llbracket a_{m} \rrbracket, \llbracket b_{m} \rrbracket, \llbracket c_{m} \rrbracket\right)$ be two lists of $m$ triples. If there exists $\hat{k} \in[m]$ such that $\left(\llbracket x_{\hat{k}} \rrbracket, \llbracket y_{\hat{k}} \rrbracket, \llbracket z_{\hat{k}} \rrbracket\right)$ or $\left(\llbracket a_{\hat{k}} \rrbracket, \llbracket b_{\hat{k}} \rrbracket, \llbracket c_{\hat{k}} \rrbracket\right)$ is incorrect, then the parties output acc in the batch verification protocol with probability at most $\frac{2}{|\mathbb{F}|}$.

Proof. If the parties output acc, this means that $v=\sum_{k=1}^{m} \gamma_{k} \cdot v_{k}=0$. From Lemma 2 it follows that $v_{\hat{k}}=0$ with probability $\frac{1}{|F|}$ and so $v_{\hat{k}} \neq 0$ with probability $1-\frac{1}{|F|}$. In the latter case, $v=0$ iff $\gamma_{\hat{k}}=\left(-\sum_{\substack{k=1 \\ k \neq \hat{k}}}^{m} \gamma_{k} \cdot v_{k}\right) \cdot\left(v_{\hat{k}}\right)^{-1}$ which happens with probability $1 /|\mathbb{F}|$ since $\gamma_{\hat{k}}$ is chosen uniformly from $\mathbb{F}$. Thus, we have that the overall probability that the parties output acc is bounded by $\frac{1}{|F|}+\left(1-\frac{1}{|F|}\right) \frac{1}{|F|}<\frac{2}{|F|}$ as required.

Proposition 2. Let $\left(\llbracket x_{1} \rrbracket, \llbracket z_{1} \rrbracket\right), \ldots,\left(\llbracket x_{m} \rrbracket, \llbracket z_{m} \rrbracket\right)$ and $\left(\llbracket b_{1} \rrbracket, \llbracket d_{1} \rrbracket\right), \ldots,\left(\llbracket b_{m} \rrbracket, \llbracket d_{m} \rrbracket\right)$ be two lists of $m$ squares. If there exists $\hat{k} \in[m]$ such that $\left(\llbracket x_{\hat{k}} \rrbracket, \llbracket z_{\hat{k}} \rrbracket\right)$ or $\left(\llbracket b_{\hat{k}} \rrbracket, \llbracket d_{\hat{k}} \rrbracket\right)$ is incorrect, then the parties output acc in the batch verification protocol with probability of at most $\frac{3}{|F|}$.

Proof. From Lemma 3 it follows that $v_{\hat{k}}=0$ with probability $\frac{2}{|F|}$. Thus, the statement follows from exactly by the same argument as in the proof of Proposition 1.

Plugging in the batch verification procedure, $\Pi_{\mathrm{sac}}$ is changed so that the random coefficients for the linear combination are chosen by the verifier and handed to the prover as and additional
challenge. Specifically, in Round 2, the verifier picks a random coefficient $\gamma_{e, k}$ for each instance $e$ and each multiplication/square gate, and hands it to the prover in addition to the random elements $\epsilon_{e, k}$ that are used inside the verification procedure. Then, in Round 3 the prover simulates the verification procedure for each multiplication/square gate and then simulates each party $i$ locally taking the linear combination $v_{e, i}=\sum_{k} \gamma_{e, k} \cdot v_{e, k, i}$, where $v_{e, k, i}$ is its share of $v_{e, k}$. Finally, $\mathcal{P}$ sets view $_{e} \leftarrow \operatorname{view}_{e}\left\|v_{e, 1}\right\| \cdots \| v_{e, N}$. This significantly reduces the communication of the protocol, since once the verifier chooses the one party $\bar{i}_{e}$ whose view is not opened in execution $e$, the prover does not need to send the share of party $\bar{i}_{e}$ for each $v_{k, e}$, but rather only its share of $v_{e}$, since only one single value is opened and checked for the entire circuit. On the other hand, observe that using this optimization also affects the soundness of the protocol, as the probability of not being caught in the verification procedure is now increased to $\frac{2}{|F|}$ for cheating in multiplication gates and $\frac{3}{|F|}$ for square gates (observe that once again manipulating the output of square gates would be more beneficial for the prover). Thus, if there are square gates in the circuit, the updated cheating probability is bounded by

$$
\xi_{\mathrm{sac}}(M, N)=\left(\frac{3}{|\mathbb{F}|}+\left(1-\frac{3}{|\mathbb{F}|}\right) \cdot \frac{1}{N}\right)=\left(\frac{3 N+|\mathbb{F}|-3}{|\mathbb{F}| \cdot N}\right)
$$

whereas if there are multiplication gates in the circuit (and no square gates), the updated cheating probability is bounded by

$$
\xi_{\mathrm{sac}}(M, N)=\left(\frac{2}{|\mathbb{F}|}+\left(1-\frac{2}{|\mathbb{F}|}\right) \cdot \frac{1}{N}\right)=\left(\frac{2 N+|\mathbb{F}|-2}{|\mathbb{F}| \cdot N}\right) .
$$

Batching the output correctness check. Similarly to this previous optimization, we can reduce communication by verifying the correctness of the circuit's output in a batched manner, i.e., take a random linear combination of all outputs before sending it to $\mathcal{V}$. Recall that in both $\Pi_{\mathrm{c} \& \mathrm{c}}, \Pi_{\mathrm{sac}}, \mathcal{V}$ checks for each output wire $k \in\left[n_{\text {out }}\right]$ in each execution $e$, that the shares $\left\{o_{e, k, i}\right\}_{i=1}^{N}$ add up to the output $y_{k}$ that should be on that wire. This is the same as checking that $\sum_{i=1}^{N} o_{e, k, i}-y_{k}=0$. Thus, we can take a random linear combination $\sum_{k \in\left[n_{\text {out }}\right]} \gamma_{e, k} \cdot\left(\sum_{i=1}^{N} o_{e, k, i}-y_{k}\right)$ with values $\gamma_{e, k}$ chosen by $\mathcal{V}$ and check that the result equals 0 . This reduces communication, because now the prover is required to send the verifier only one single output share of party $\bar{i}_{e}$ instead of a share per each output wire (recall that for the other parties their entire view is revealed and thus the share on the output wires can be computed by the verifier).

In order to plug this idea into $\Pi_{c \& c}$ while maintaining security, we need to add another round where, after the view during the circuit computation is committed, the verifier chooses randomly the random coefficients $\gamma_{e, k}$ for each execution $e$ and output wire $k$ and hands them to the prover who then computes for each party $i$ the random linear combination of its shares. Specifically, for each party $i$ with shares $\left\{o_{e, k, i}\right\}_{k \in\left[n_{\text {out }}\right]}$, it computes $o_{e, i}=\sum_{k \in\left[n_{\text {out }}\right]} \gamma_{e, k} \cdot o_{e, k, i}$, and then commits to $o_{e, 1}\|\cdots\| o_{e, N}$ and sends the commitment to the verifier. Only then, the verifier chooses the party $\bar{i}_{e}$, whose view is kept secret and hands it to the prover. Then, in the final round, the prover sends all the data specified in the protocol's description with the exception being that instead of sending $\left\{o_{e, k, \bar{i}_{e}}\right\}_{k \in\left[n_{\text {out }}\right]}$ for each execution $e$, it suffices to send $o_{e, \bar{i}_{e}}$ only. The verifier then computes $o_{e, i}$ for each $i \in[N] \backslash\left\{\bar{i}_{e}\right\}$, and using $o_{e, \bar{i}_{e}}$ checks that $\sum_{i=1}^{N} o_{e, i}-\sum_{k \in\left[n_{\text {out }}\right]} \gamma_{e, k} \cdot y_{k}=0$ for each execution $e$.

For $\Pi_{\text {sac }}$ no additional rounds are required. After the prover $\mathcal{P}$ has committed in the first round to the shares of the parties on all the wires of the circuit, $\mathcal{V}$ can send the random coefficients along
with the challenges it sends for the verification procedure. The prover $\mathcal{P}$ then computes the linear combination of the output shares and commits to the obtained shares as explained above.

As in the previous optimization, we need to update the soundness of the two protocols, since taking a random linear combination can result with having 0 , even though the sharing on the output wires were not correct (this happens with probability $\frac{1}{|F|}$ as in the previous optimization). In $\Pi_{c \& c}$ we denote by $c_{1}$ the number of pre-processings corrupted by the prover and by $c_{2}$ the number of emulations where the prover cheats in computing the view of one of the parties. In the remaining executions, the prover do not cheat (but only uses an invalid witness), hoping that the output verification will succeed due to the random linear combination. Then, we have that the cheating probability is

$$
\xi_{c \& c}(M, N, \tau)=\max _{\substack{0 \leq c_{1} \leq M-\tau \\ 0 \leq c_{2} \leq M-\tau-c_{1}}}\left\{\frac{\binom{M-c_{1}}{\tau}}{\binom{M}{\tau} \cdot N^{c_{2}} \cdot|\mathbb{F}|^{M-\tau-c_{1}-c_{2}}}\right\}
$$

We remark that in all our instantiations, it always hold that $|\mathbb{F}|>N$ and thus the best strategy for a cheating prover is to set $c_{2}=M-\tau-c_{1}$, which means that the cheating probability remains the same as before.

For $\Pi_{\text {sac }}$, we argue that the cheating probability $\xi_{\text {sac }}$ remains the same. This holds since the current optimization is independent of the verification of multiplications/squares process, meaning that both generate different outputs which the verifier checks in the last round. In particular, when the prover cheats in one of the multiplications/squares, it can get away with it only by receiving the "correct" challenge for the verification process or by changing one view, exactly as before. Thus, for each instance, the prover can either manipulate the output of multiplication/square gates or act honestly with an invalid witness and hope that taking the random linear combination of the incorrect outputs will equal to the random linear combination of the publicly known outputs. Therefore, the cheating probability for each instance is bounded by $\max \left\{\frac{2 N+|\mathbb{F}|-2}{|\mathbb{F}| \cdot N}, \frac{1}{|\mathbb{F}|}\right\}$ (in the case where the circuit consists of multiplication gates and no square gates; the analysis is similar for the opposite case). However, observe that $\frac{2 N+|\mathbb{F}|-2}{|\mathbb{F}| \cdot N}>\frac{1}{|\mathbb{F}|}($ since $N+|\mathbb{F}|-2>0)$, and thus the soundness remains the same as before.

To sum-up the discussion, this optimization does not increase the soundness error of our protocols. However, the number of rounds is increased in the first protocol.

### 3.5 Communication and Computation Cost Analysis

In this section, we estimate the cost of our two protocols. The analysis includes all three optimizations described in the previous section. We denote by |hash|, |sd| and |com| the length of the hash values, seeds and commitments.

Computation cost. By inspecting both $\Pi_{\text {sac }}$ and $\Pi_{c \& c}$ one sees that for each multiplication gate $O(M \cdot N)$ multiplications in $\mathbb{F}$ must be computed. In practice, their runtime dominates over those of the additions over $\mathbb{F}$ which can be optimized by carrying out multiple $\mathbb{F}$-additions over the integers before applying a modular reduction. For large enough $\mathbb{F}$ we have that $\xi_{\text {sac }}(M, N) \approx(1 / N)^{M}$, and so for statistical security parameter $\kappa$ we have $M \cdot \log N=\kappa$ which means that we will approximately have to perform $O(\kappa \cdot(N / \log N) \cdot|C|)$ multiplications both at proving and verification time, but only over the field over which $C$ is actually defined.

Communication cost for $\Pi_{\mathrm{c} \& \mathrm{c}}$. The communication cost of messages sent from $\mathcal{P}$ to $\mathcal{V}$ in each round is: (1) Round 1: 2|hash|; (2) Round 3: $\tau \cdot|\mathrm{sd}|+(M-\tau) \cdot|\operatorname{com}|+|h a s h| ; ~(3)$ Round 5: |sd| + $(M-\tau) \cdot\left(\log N \cdot|\mathrm{sd}|+|\operatorname{com}|+3 \log _{2}(|\mathbb{F}|) \cdot n_{\text {mul }}+2 \log _{2}(|\mathbb{F}|) \cdot n_{s q}+\log _{2}(|\mathbb{F}|) \cdot n_{\text {in }}+\log _{2}(|\mathbb{F}|)\right)$.

Summing the above, we obtain that the overall communication cost incurred by messages sent by $\mathcal{P}$ is
|hash $|\cdot 3+|$ sd $\left|\cdot(\tau+1+\log N(M-\tau))+|\operatorname{com}| \cdot 2(M-\tau)+\log _{2}(|\mathbb{F}|) \cdot(M-\tau)\left(3 n_{m u l}+2 n_{s q}+n_{\text {in }}+1\right)\right.$.
Communication cost for $\Pi_{\mathrm{sac}}$. The communication cost of messages sent from $\mathcal{P}$ to $\mathcal{V}$ in each round is: (1) Round 1: |hash|; (2) Round 3: |hash|; (3) Round 5: $|\mathrm{sd}|+M \cdot(\mid$ sd $|+\log N \cdot| \mathrm{sd}|+|$ com $\mid+$ $4 \log _{2}(|\mathbb{F}|) \cdot n_{m u l}+3 \log _{2}(|\mathbb{F}|) \cdot n_{s q}+\log _{2}(|\mathbb{F}|)+\log _{2}(|\mathbb{F}|) \cdot n_{\text {in }}+\log _{2}(|\mathbb{F}|)$.

We obtain that the overall cost of communication sent from $\mathcal{P}$ 's side is

$$
\mid \text { hash }|\cdot 2+| \text { sd }\left|\cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log _{2}(|\mathbb{F}|) \cdot M\left(4 n_{\text {mul }}+3 n_{s q}+n_{\text {in }}+2\right)\right.
$$

Asymptotically, by setting |hash $|=|$ sd $\left|=|\operatorname{com}|=O(\lambda), \log _{2}(|\mathbb{F}|)=O(\log (\lambda))\right.$ and $M, N$ as above we get that the communication cost of $\mathcal{P}$ is $O(\log (\lambda) \cdot \kappa \cdot(|C| / \log (N)))$.

One might be tempted to draw conclusions from the above expressions regarding which of the protocols is more communication-efficient. However, recall that the soundness error of the protocols is not the same. This means that different values will be chosen for the parameters $M$ and $N$ in each of the protocols, thereby affecting the proof size in different ways. In fact, as we will see in Section 6 , the soundness error of $\Pi_{\text {sac }}$ allows having smaller parameters for $M, N$ (e.g., for the same $N$ in both protocols, a smaller $M$ can be taken for $\Pi_{\mathrm{sac}}$ ), thus reducing the overall communication cost.

## 4 Sampling Circuits on the Fly

At the end of the previous section we introduced an optimization where the verifier $\mathcal{V}$ checks the circuit's output correctness by looking only at a linear combination of the outputs instead of checking each output separately. In particular, this is done by having $\mathcal{V}$ choosing the random coefficients which will be used to compute the linear combination after the prover $\mathcal{P}$ commits to the view of the parties during the circuit's computation. This process can also be viewed as an interaction where the parties determine the final circuit's structure during the execution, as here the challenge chosen by $\mathcal{V}$ adds a layer on top of the initial circuit which consists of 'multiplication-by-a-constant' and addition gates.

This idea, which we call "sampling the circuit on the fly" will be also used in some of the optimizations suggested for the application presented in Section 5. Thus, we take a pause in this section to formally establish this idea, so that security of all optimizations of this kind can be derived easily without the need to re-prove security each time.

Although in the above optimization of linear combination the verifier chooses solely the circuit that will be evaluated, we consider a more general definition where both the prover and the verifier sample the circuit together from a set of possible circuits. The sampling process must begin only after the prover have committed and fixed the witness that will be used. This means that from this point on any form of cheating is possible only during the simulation of the MPC protocol to compute the sampled circuit, as the witness cannot be tailored to the circuit which will later be used. We remark that although the circuit will be jointly sampled by both parties, we restrict the
sampling done by $\mathcal{V}$ to be independent of the messages of $\mathcal{P}$ and to not require him to keep a secret state. This makes it possible to still apply the Fiat-Shamir transform in order to make the protocol non-interactive. The prover $\mathcal{P}$, in contrast, will be allowed to make his choice depending on the witness that it committed or on other messages. At the same time, the choice of $\mathcal{P}$ should not allow him to break the soundness of the protocol or the zero-knowledge property.

The section is organized as follows. We first provide a formal definition for the notion of circuit sampling. Then, we show how to incorporate it into our argument system and argue that the obtained system remains a zero-knowledge argument of knowledge. Finally, we show how the output linear combination optimization described above is indeed an instantiation of the general notion.

### 4.1 Definition of Circuit Sampling

First, we define the notion of circuit sampling for an NP relation.
Definition 3 ( $R$-circuit Sampler). Let $R$ be an NP relation and $S_{\mathcal{P}}, S_{\mathcal{V}}$ be two non-empty sets that can be described with a string of polynomial length (in the security parameter $\lambda$ ). For $(x, w) \in R$ we say that Sample $=($ ExtWitness, Response, SampCircuit) is an $R$-circuit sampler if

ExtWitness is a PPT algorithm which on input $(x, w) \in R$ outputs an extended witness $\overline{\boldsymbol{w}}$.
Response is a PPT algorithm which on input ( $x, w, \overline{\boldsymbol{w}}, \tau_{\mathcal{V}}$ ) outputs a configuration $\tau_{\mathcal{P}}$.
SampCircuit is a deterministic algorithm which on input ( $x, \tau \mathcal{V}, \tau_{\mathcal{P}}$ ) outputs a circuit $C$ as well as a description of a set $Y$.

We next define a security game which follows the way we embed these algorithms into our argument system. Consider the following game, which we denote by $\operatorname{Game}_{R, \mathcal{P}}\left((x, w), S_{\mathcal{P}}, S_{\mathcal{V}}, \lambda\right)$, executed with a prover $\mathcal{P}$ :

1. $\mathcal{P}$ outputs $\overline{\boldsymbol{w}}$.
2. Choose a random $\tau_{\mathcal{V}} \leftarrow S_{\mathcal{V}}$ and hand it to $\mathcal{P}$.
3. $\mathcal{P}$ outputs $\tau_{\mathcal{P}} \in S_{\mathcal{P}}$.
4. Use the $R$-circuit sampling algorithm to compute $(C, Y) \leftarrow \operatorname{SampCircuit}\left(x, \tau_{\mathcal{P}}, \tau_{\mathcal{V}}\right)$.
5. Output 1 iff $C(\overline{\boldsymbol{w}}) \in Y$.

To understand the game, observe that Step 1 emulates the commitment to the witness, made by $\mathcal{P}$ in the first step of our argument systems, in Step 2 a challenge is chosen which is followed by the configuration chosen by $\mathcal{P}$ in Step 3. Once all the input for the SampCircuit algorithm is gathered, $(C, Y)$ are being determined, and $\mathcal{P}$ wins if computing the circuit $C$ on $\overline{\boldsymbol{w}}$ yields a valid output. Note that in the above definition there is no validation ensuring that the message $\tau_{\mathcal{P}}$ that is sent in the game is valid. This can be done by SampCircuit outputting $Y=\emptyset$ for an invalid choice of $\tau_{\mathcal{P}}$.

We have three requirements from the circuit sampler. First, an obvious requirement is that if the prover uses the correct $w$ and chooses $\tau_{\mathcal{P}}$ honestly, then the output of the game should be 1 (except for a negligible probability).

Definition 4 (Correct $R$-circuit Sampler). Let Sample be an $R$-circuit sampler. We say that Sample is correct, if when $\mathcal{P}$ on input $(x, w)$ computes $\overline{\boldsymbol{w}} \leftarrow \operatorname{ExtWitness}(x, w)$ and $\tau_{\mathcal{P}} \leftarrow \operatorname{Response}\left(x, w, \overline{\boldsymbol{w}}, \tau_{\mathcal{V}}\right)$, it holds that $\mathrm{Game}_{R, \mathcal{P}}\left((x, w), S_{\mathcal{P}}, S_{\mathcal{V}}, \lambda\right)=1$ with probability negligibly close to 1 .

The second property required from the circuit sampler is soundness. Similarly to the standard definition of this notion, we require that if the prover wins in the above game with probability larger than $\alpha$, then it will be possible to extract the correct witness.

Definition 5 ( $\alpha$-sound $R$-circuit Sampler). Let Sample be an $R$-circuit sampler. We say that Sample is $\alpha$-sound if given $\operatorname{Pr}\left[\operatorname{Game}_{R, \mathcal{P}}\left((x, w), S_{\mathcal{P}}, S_{\mathcal{V}}, \lambda\right)=1\right]>\alpha$ (where the distribution is over $\left.\tau_{\mathcal{V}} \in S_{\mathcal{V}}\right)$, there exists a deterministic PPT extractor $\mathcal{E}(\overline{\boldsymbol{w}})$ which outputs $w^{\prime}$ such that $\left(x, w^{\prime}\right) \in R$ with probability 1.

While the definition looks similar to that of knowledge soundness used for proofs of knowledge, there are crucial differences: the extraction is from $\overline{\boldsymbol{w}}$, and it is done in polynomial time and with probability 1 . This is because extraction of a candidate witness $w^{\prime}$ from $\overline{\boldsymbol{w}}$ is an "easy" task (as we will see in all our circuit sampling uses) and so the only question is whether $w^{\prime}$ is a valid witness or not. The definition thus says that if $\mathcal{P}$ wins with probability higher than $\alpha$, then it must have used the correct witness $w$ to compute $\overline{\boldsymbol{w}}$

Finally, we also need to ensure that the additional interaction does not leak any information about $w$. This is formalized in the standard way of requiring the existence of a simulator who can output an indistinguishable transcript without knowing $w$. Clearly, the message $\tau_{\mathcal{P}}$ should not reveal any information about $\overline{\boldsymbol{w}}$ to an outsider. However, we additionally need simulatability of $C(\overline{\boldsymbol{w}})$ : the sampled circuit may enforce the relation $R$ in different ways than a static circuit would do (thus the set $Y$ ) and this could potentially leak information. We will see an occurrence of this phenomenon in one of the optimizations which we present later, where we use rejection sampling inside the circuit.

Definition 6 (Simulatable $R$-circuit Sampler). Let $(x, w) \in R$ and Sample be an $R$-circuit sampler. Then, there exists a PPT algorithm $\mathcal{S}$ such that

$$
\left\{\left(\tau_{\mathcal{P}}, C(\overline{\boldsymbol{w}})\right) \leftarrow \mathcal{P}\left(x, w, \tau_{\mathcal{V}}\right)\right\} \approx_{s}\left\{\left(\tau_{\mathcal{P}}, C(\overline{\boldsymbol{w}})\right) \leftarrow \mathcal{S}\left(x, \tau_{\mathcal{V}}\right)\right\}
$$

where $\mathcal{P}$ assume to act honestly as in Definition 4.

### 4.2 Circuit Sampling and the Zero-Knowledge Argument.

We now include the above approach into the protocol $\Pi_{\text {sac }}$ due to the simplicity of the analysis and leave an adaptation of the first protocol as future work.

The modified protocol $\Pi_{\text {sac }}^{\text {samp }}$ works as follows, where we only highlight the additional steps:
Round 1: For each $e \in[M]$ (i.e. each MPC instance) $\mathcal{P}$ computes $\overline{\boldsymbol{w}}_{e} \leftarrow \operatorname{ExtWitness}(x, w)$. Then, it chooses the randomness used for the execution $e$ (i.e., the seeds used to derive all randomness). Finally, $\mathcal{P}$ commits to the extended witness and the randomness and send it to $\mathcal{V}$
Round 2: For each $e \in[M] \mathcal{V}$ samples $\tau_{\mathcal{V}, e}$ as in Step 2 of the above game. It then sends $\tau_{\mathcal{V}, 1}, \ldots, \tau_{\mathcal{V}, M}$ to $\mathcal{P}$.
Round 3: For each $e \in[M] \mathcal{P}$ locally computes $\tau_{\mathcal{P}, e} \leftarrow \operatorname{Response}\left(x, w, \overline{\boldsymbol{w}}_{e}, \tau_{\mathcal{V}, e}\right)$ as well as $\left(C_{e}, Y_{e}\right) \leftarrow \operatorname{SampCircuit}\left(x, \tau_{\mathcal{P}, e}, \tau_{\mathcal{V}, e}\right)$. It uses $C_{e}$ in for MPC protocol instance $e$ and sends the remaining first round messages together with $\tau_{\mathcal{P}, e}$ to $\mathcal{V}$.
Round 4: $\mathcal{V}$ runs round 2 as in the regular protocol.
Round 5: $\mathcal{P}$ runs round 3 as in the regular protocol.

Round 6: $\mathcal{V}$ runs round 4 as in the regular protocol.
Round 7: $\mathcal{P}$ runs round 5 as in the regular protocol.
Output: Upon receiving the last message, $\mathcal{V}$ recomputes $\left(C_{e}, Y_{e}\right) \leftarrow \operatorname{SampCircuit}\left(x, \tau_{\mathcal{P}, e}, \tau_{\mathcal{V}, e}\right)$ for each $e \in[M]$, verifies the MPC transcripts for the individual $C_{e}$ and then tests that each output lies in $Y_{e}$.

We next prove that protocol $\Pi_{\text {sac }}^{\text {samp }}$ is an honest verifier zero-knowledge argument of knowledge.

Lemma 6. Let $R$ be an NP-relation and Sample be a correct, $\alpha$-sound and simulatable $R$-circuit sampler. Then $\Pi_{\mathrm{sac}}^{\text {samp }}$ is an honest verifier zero-knowledge argument of knowledge that is statistically complete for the relation $R$ with knowledge error (soundness) $\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right)^{M}$.

Recall that $\xi_{\text {sac }}(N)=\frac{|\mathbb{F}|+N-1}{|\mathbb{F}| \cdot N}$ and so the soundness error of modified protocol, according to the lemma, is $\left(\alpha+(1-\alpha) \cdot \frac{|\mathbb{F}|+N-1}{\mid \mathbb{F} \cdot N}\right)^{M}$.

To prove the lemma, the main change here compared to the proof from Section 3 is that here the circuits are not identical throughout all instances. Fortunately, it turns out that this assumption can be relaxed without hurting the runtime of the extractor. Completeness and the zero-knowledge property, on the other hand, follow directly from the previous security argument.

Proof. We prove the three properties required by the definition in Section 2.
Completeness. $\Pi_{\mathrm{sac}}^{\mathrm{samp}}$ is complete, as Sample is a correct $R$-circuit sampler. We allowed Sample some slack to occasionally abort, but this will only happen with negligibly small probability.

Honest Verifier Zero-Knowledge. We can construct a simulator $\mathcal{S}^{\prime}$ for the new protocol as follows: first, $\mathcal{S}^{\prime}$ samples all messages of $\mathcal{V}$ as in the protocol, together with all the $\tau_{\mathcal{V}, e}$ which it chooses honestly. Next, $\mathcal{S}^{\prime}$ samples $\left(\tau_{\mathcal{P}, e}, C_{e}\left(\overline{\boldsymbol{w}}_{e}\right)\right)$ using $\mathcal{S}$ from Definition 6 and additionally generates $C_{e} \leftarrow \operatorname{SampCircuit}\left(x, \tau_{\mathcal{V}, e}, \tau_{\mathcal{P}, e}\right)$ for each $e \in[M]$. Then, run the simulator of the original argument of knowledge-scheme, where we hard-wire the outputs $C_{e}(\overline{\boldsymbol{w}})$ from $\mathcal{S}$ into the individual MPC instances.

The Zero-Knowledge property now follows by a hybrid argument: we let $\mathcal{H}_{0}$ be the distribution of transcripts of the protocol. In the first hybrid $\mathcal{H}_{1}$, we use the simulator of the overall argument of knowledge of Section 3 but compute $\left(\tau_{\mathcal{P}, e}, C_{e}\left(\boldsymbol{w}_{e}\right)\right) \leftarrow \operatorname{Response}\left(x, w, \boldsymbol{w}_{e}, \tau_{\mathcal{V}, e}\right)$ as in the protocol. By the proofs from the preceding section $\mathcal{H}_{0} \approx_{s} \mathcal{H}_{1}$. Next, we define a sequence of hybrids $\mathcal{H}_{2}^{i}$ for $i=0, \ldots, M$. In hybrid $\mathcal{H}_{2}^{i}$ we do the same as in $\mathcal{H}_{1}$, but replace ( $\tau_{\mathcal{P}, e}, C_{e}\left(\boldsymbol{w}_{e}\right)$ ) in the first $i$ instances by the output of $\mathcal{S}$ from Definition 6 .

By definition, we then have $\mathcal{H}_{1}=\mathcal{H}_{2}^{0}$. For each $i \in[M]$ we have $\mathcal{H}_{2}^{i-1} \approx_{s} \mathcal{H}_{2}^{i}$ by the definition of $\mathcal{S}$ as we only change one value. Finally, observe that $\mathcal{S}^{\prime}$ outputs the same distribution as $\mathcal{H}_{2}^{M}$ and thus the zero-knowledge property follows.

Knowledge Soundness. Since our proof here is only an adaptation of the proof of Theorem 3, we only highlight the changes that must be inserted to it.

The proof of Theorem 3 consists of three steps: (i) Show that for $M=1$ any $\delta(x)>\xi_{\text {sac }}(N)$ implies that a correct witness must have been committed. (ii) Show that this generalizes to $M>1$ executions, so that $\delta(x)>\xi_{\text {sac }}(M, N)$ means that at least one of the $M$ instances has a correct witness committed. (iii) Show that by sending different challenges, it is possible to extract the witness from at least one of the MPC instances with a correct witness.

The first step, where $M=1$ follows from the proof of Lemma 4 and the soundness of the sampler. Specifically, assume that the success probability $\delta(x)$ of the prover is higher than $\alpha+(1-\alpha) \cdot \xi_{\text {sac }}(N)$ and assume that it didn't commit to the correct witness $\boldsymbol{w}$ (and so the extended witness $\overline{\boldsymbol{w}}$ is also incorrect). Let $K$ be the set of circuits that might be returned by SampCircuit for a fixed $x$ and $\overline{\boldsymbol{w}}$. Then we have

$$
\begin{aligned}
\operatorname{Pr}[\mathrm{acc}] & =\operatorname{Pr}[\operatorname{acc} \mid C(\overline{\boldsymbol{w}}) \in Y] \cdot \operatorname{Pr}[C(\overline{\boldsymbol{w}}) \in Y]+\operatorname{Pr}[\operatorname{acc} \mid C(\overline{\boldsymbol{w}}) \notin Y] \cdot \operatorname{Pr}[C(\overline{\boldsymbol{w}}) \notin Y] \\
& =\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)
\end{aligned}
$$

where the distribution is over $K$ and over the challenges of the verifier. This contradicts the assumption that $\delta(x)=\operatorname{Pr}[\mathrm{acc}]>\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)$ and thus the prover must have committed to the correct witness.

Next, the above can be easily generalized to any $M>1$. That is, if we have $M>1$ executions, then $\delta(x)>\left(\alpha+(1-\alpha) \cdot \xi_{\text {sac }}(N)\right)^{M}$ implies that there is at least one execution where the prover have committed to a correct witness. This follows from the same argument as in the proof of Theorem 3 which led to Corollary 1 , with the only difference being that here the space of possible changes is larger and includes $|K|$.

Going into the third step of the proof, we can define the extractor $\mathcal{E}$ in the same way as in the proof of Theorem 3. Informally speaking, $\mathcal{E}$ first probe challenges at random till a first accepting transcript is found. Then, $\mathcal{E}$ runs $M$ processes in parallel where process $j$ runs until a second accepting transcript has been found where the second challenge in execution $j$ is different from the challenge in the first accepting transcript. Holding two accepting transcripts where the second challenge is different is sufficient for extracting the committed witness (as $\mathcal{E}$ has now the input shares of all parties), and so if the correct witness was found by one of the processes, then $\mathcal{E}$ halts (recall that once a correct extended witness $\overline{\boldsymbol{w}}$ is found then by the soundness of the sampler, $\boldsymbol{w}$ is computed in polynomial time).

Assume that $\delta(x)=\operatorname{Pr}[\mathrm{acc}]=\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right)^{M}+\varepsilon$ for some $\varepsilon>0$. Then, the first step requires $1 / \delta(x)<1 / \varepsilon$ expected number of steps. For the second step, we show that there exists an execution $j \in[M]$ with the correct witness for which each attempt in the second step of $\mathcal{E}$ succeeds with probability $>\varepsilon / M$, implying that the correct witness will be found in the second step within expected $M / \varepsilon$ number of steps.

Assume without loss of generality that the first $k$ executions are the executions with the correct witness and assume in contradiction that the probability of finding the desired second accepting transcript in each of them is $\leq \varepsilon / M$. Then, using exactly the same argument as in Lemma 5 (see Eq. 6 and 7) we have

$$
\delta(x)=\operatorname{Pr}[\mathrm{acc}] \leq\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right)^{M-k} \cdot \frac{1}{N^{k}}+\varepsilon
$$

To complete the proof, we therefore need to show that

$$
\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right)^{M-k} \cdot \frac{1}{N^{k}}<\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right)^{M} .
$$

This is equivalent to showing that

$$
1<\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right)^{k} \cdot N^{k}
$$

which holds if

$$
1<\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right) \cdot N=\alpha \cdot N+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N) \cdot N
$$

Recall that $\xi_{\text {sac }}(N)=\frac{N+|\mathbb{F}|-1}{N \cdot|\mathbb{F}|}$, and so we need to show that

$$
1<\alpha \cdot N+(1-\alpha) \cdot \frac{N+|\mathbb{F}|-1}{|\mathbb{F}|}
$$

Observe that $1<\frac{N+|\mathbb{F}|-1}{|\mathbb{F}|}<N$ (where the latter follows since $|\mathbb{F}|>1$ and $N>1$ ). Thus, it follows that

$$
1<\frac{N+|\mathbb{F}|-1}{|\mathbb{F}|}=\alpha \cdot \frac{N+|\mathbb{F}|-1}{|\mathbb{F}|}+(1-\alpha) \cdot \frac{N+|\mathbb{F}|-1}{|\mathbb{F}|}<\alpha \cdot N+(1-\alpha) \cdot \frac{N+|\mathbb{F}|-1}{|\mathbb{F}|}
$$

which is exactly what we wanted to show.
We conclude that $\delta(x)=\operatorname{Pr}[\operatorname{acc}]<\left(\alpha+(1-\alpha) \cdot \xi_{\mathrm{sac}}(N)\right)^{M}+\varepsilon$ in contradiction to our assumption. This means that our extractor can find the correct witness within expected number of $O(M|x| / \epsilon)$ steps, exactly as in the proof of Theorem 3 . This concludes the proof.

### 4.3 Batched Output Correctness Check as a Circuit Sampler

We now revisit the batching of the output verification from Section 3.4 and consider it in the context of circuit sampling. Recall that in this optimization, the verifier chooses random coefficients that are used to compute the linear combination of the outputs, so that only one value is eventually opened and checked instead of checking the correctness for each output wire of the original circuit.

```
Let \(C=\left(n_{\text {in }}, n_{\text {out }}, n_{C}, L, R, F\right)\) be a circuit over \(\mathbb{F}\).
ExtWitness: On input \((x=(C, \boldsymbol{y}), \boldsymbol{w})) \in R\) set \(\overline{\boldsymbol{w}}:=\boldsymbol{w}\).
SampCircuit: On input \(\tau_{\mathcal{V}}=(\gamma) \in \mathbb{F}^{n_{\text {out }}}\) output the circuit \(C^{\prime}\) which performs the following:
    1. Compute \(\boldsymbol{y}^{\prime}=C(\overline{\boldsymbol{w}})\) where \(\boldsymbol{y}^{\prime} \in \mathbb{F}_{\text {out }}^{n}\).
    2. Compute \(y_{1}=\sum_{i=1}^{n_{\text {out }}} \gamma[i] \cdot\left(\boldsymbol{y}^{\prime}[i]-\boldsymbol{y}[i]\right)\).
    3. Output \(y_{1}\).
Furthermore output the set \(Y=\{(0)\}\).
Response: Output 1.
```

Fig. 3: Batching the Output Check as a Circuit Sampler

We first define the three algorithms of the circuit sampler for this optimization: ExtWitness receives $((C, \boldsymbol{y}), \boldsymbol{w})$ as an input and returns the extended witness $\overline{\boldsymbol{w}}$, which in this case is just $\boldsymbol{w}$. Response receives as an input the tuple $\left((C, \boldsymbol{y}), \boldsymbol{w}, \overline{\boldsymbol{w}}, \tau_{\mathcal{V}}\right)$, but note that in this optimization, the verifier's challenge $\tau \mathcal{V}$ fully defines the circuit and thus the output of Response is just 1. Finally, SampCircuit receives $\left((C, \boldsymbol{y}), \tau_{\mathcal{V}}, \tau_{\mathcal{P}}\right)$ as its input and returns the circuit $C^{\prime}$ and the set $Y$ defined in the following way. The circuit $C^{\prime}$ consists of the original circuit $C$ and the following layers which are added on top of it: (i) subtraction gates for subtracting each value on an output wire $\boldsymbol{y}^{\prime}[k]$ by the expected public value $\boldsymbol{y}[k]$; (ii) 'multiplication-by-a-constant' gates for each result of the previous
layer, where the constants are defined by $\tau \mathcal{V}$; and (iii) addition gates for summing the results of the previous layer. The set $Y$ consists of one value only - 0 . We summarize the construction in Fig. 3.

The three algorithms defined above satisfies the properties of the Circuit Sampler. Correctness is straightforward. Soundness of the sampler is $\frac{1}{|\mathbb{F}|}$, since if $w$ is incorrect, then $C(w) \in Y$ with probability $\frac{1}{|\mathbb{F}|}$ due to the fact that the random coefficients are uniformly chosen (see Lemma 2). Finally, Simulation of the sampler is also trivial as here both $\tau_{\mathcal{P}}$ and $Y$ are deterministic and known in advance.

A remark about the soundness of the method. The reader might have observed that there is a disagreement between Section 3.4 and Section 4.2 regarding the soundness of the argument system when using the batched output correctness check. Specifically, in Section 3.4 we argued that the soundness of $\Pi_{\text {sac }}$ remains $\xi_{\text {sac }}(N, M)$ as when the optimization is not used (more precisely, the soundness is $\left(\max \left\{\xi_{\text {sac }}(N), \frac{1}{|\mathbb{F}|}\right\}\right)^{M}$ which equals to $\xi_{\text {sac }}(N, M)=\left(\xi_{\text {sac }}(N)\right)^{M}$ since $\left.\xi_{\text {sac }}(N)>\frac{1}{|\mathbb{F}|}\right)$, whereas Lemma 6 from Section 4.2 implies that the soundness is $\left(\frac{1}{|\mathbb{F}|}+\left(1-\frac{1}{|\mathbb{F}|}\right) \xi_{\text {sac }}(N)\right)^{M}$.

To understand the difference, note that the way the optimization is incorporated into the protocols in each of the sections is not the same. In this section, we incorporate the optimization through the general framework of circuit sampling. This means that the MPC protocol is being simulated by the prover only after the circuit was sampled and so when computing multiplication/square gates, the cheating prover knows whether $C(\boldsymbol{w})=\boldsymbol{y}$. Thus, it knows whether cheating in the computation of these gates is required or not. Therefore, in each of the $M$ executions, with probability $\frac{1}{|F|}$ cheating is successful in the circuit sampling step, and with probability $1-\frac{1}{|\mathbb{F}|}$ the prover will have to manipulate the MPC protocol simulation which succeeds with probability $\xi_{\text {sac }}(N)$. However, looking into the optimization (and not just plugging it into the general framework of circuit sampling), observe that it only adds layers on top of the initial circuit and that it does not add any multiplication nor square gate to the circuit. Thus, we can ask the prover to simulate the computation of all multiplication and square gate already in the first step before the additional layers were sampled. In this case, the prover needs to decide in each of the $M$ executions whether to manipulate the MPC computation or not in the first step. Thus, a cheating prover who wishes to maximize its success probability will choose to cheat in the MPC simulation only if the probability to succeed is higher than the probability that a random linear combination of incorrect outputs will yield 0 . This is exactly the way the optimized protocol is described in Section 3.4 and hence the "different" soundness error. We remark that in the other circuit sampling optimizations we will see in the next section, the circuit that is sampled contains also square gates, and thus the prover can delay its decision whether to cheat in the MPC simulation to after the circuit is known, which means that the obtained soundness is indeed as indicated by Lemma 6.

## 5 Proving Knowledge of SIS Instances

The protocols from Section 3 are asymptotically less communication-efficient than previous argument systems such as [AHIV17, $\left.\mathrm{BBC}^{+} 18\right]$. However, they have advantages when the circuit size is not too big and when there are many linear gates in the circuit, because the communication is dominated by the number of non-linear operations in the circuit $C$ and has very small circuit-independent cost. In this section, we exploit this fact to implement communication-efficient arguments of knowledge for different versions of the so-called Short-Integer Solution (SIS) problem.

The section is organized as follows. We begin by presenting an interactive argument for binary secrets which does not allow any slack. The approach can be simply generalized to secrets from a larger interval, but only at the expense of increasing the communication. Then, we introduce some optimizations that allow us to reduce the communication for the suggested arguments and then further squeeze down their size by introducing a slack factor.

Throughout the section, for each argument system we present, we will mention what is the resulted size of the proof, based on the analysis of $\Pi_{\text {sac }}^{\text {samp }}$, which is the same as that of $\Pi_{\text {sac }}$ from Section 3.5.

### 5.1 The Baseline Proof for SIS

We start by presenting a argument for the Binary SIS problem presented in Section 2.4. The reason behind that is because general range proofs are hard using a circuit over $\mathbb{F}_{q}$ whereas they are very simple for binary values. Moreover, the protocol we design for this problem will serve as a starting point for constructions supporting secrets from larger intervals.

Let us first recap the definition of the relation that we aim to prove, which was given in Section 2.4 as

$$
R_{\mathrm{B}-\mathrm{SIS}}^{m, n, q}=\left\{(x, w)=((\mathbf{A}, \boldsymbol{t}), \boldsymbol{s}) \mid \boldsymbol{s} \in\{0,1\}^{m} \wedge \mathbf{A} \in \mathbb{F}_{q}^{n \times m} \wedge \boldsymbol{t}=\mathbf{A} \boldsymbol{s}\right\}
$$

There are two main tasks that the protocol has to achieve, which is to show that the secret $s$ is a binary vector and the correctness of the product $\boldsymbol{t}=\mathbf{A} \boldsymbol{s}$. The matrix multiplication uses a publicly known matrix, and since linear operations are free in our used MPC scheme computing it can be done without increasing the proof size. What remains to show is that the witness consists of bits. As already mentioned in the introduction, this test is easy to perform because $s[i] \in\{0,1\}$ implies that $s[i]^{2}-s[i]=0$. We can therefore let the circuit $C$ compute the square of each element of $s$ and then perform a linear test.

The obtained circuit is described in Fig. 4. For ease of notation we let $a_{i, j} \in \mathbb{F}_{q}$ be the element in the $i$ th row and the $j$ th column of $\mathbf{A}$. The circuit can be evaluated using one of the protocols from Section 3, with $\mathcal{V}$ testing that the circuit's output $\hat{\boldsymbol{y}}$ equals $(\boldsymbol{t}[1], \ldots, \boldsymbol{t}[n], 0, \cdots, 0)$. This yields a highly efficient protocol, as there are only $m$ non-linear gates in the circuit that require communication, and all of them are square gates.

```
Witness: \(\boldsymbol{w}=(\boldsymbol{s}[1], \ldots, \boldsymbol{s}[m]) \in \mathbb{F}_{q}^{m}\)
Computation:
    1. \(\forall i \in[m]\) compute \(r_{i} \leftarrow \boldsymbol{s}[i]^{2}\)
    2. \(\forall j \in[n]\) compute \(y_{j} \leftarrow \sum_{i \in[m]} a_{j, i} s[i]\)
    3. \(\forall i \in[m]\) compute \(y_{i+n} \leftarrow r_{i}-\boldsymbol{s}[i]\)
Output: \(\hat{\boldsymbol{y}} \leftarrow\left(y_{1}, \ldots, y_{m+n}\right)\)
```

Fig. 4: An Arithmetic circuit representation of $R_{\mathrm{B}-\mathrm{SIS}}^{m, n, q}$; The circuit contains $m$ square gates, has $m$ inputs and $m+n$ outputs.

Using the cost analysis from Section 3.5 (for the scarifying-based protocol), We conclude that the total number of bits communicated by $\mathcal{P}$ is

$$
\mid \text { hash }|\cdot 2+|\mathrm{sd}| \cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log q \cdot M(4 m+2)
$$

Generalizing the Distribution of the input $s$. It is immediate to extend the construction from Fig. 4 to other input distributions. We first generalize it to secrets $s$ such that $\|s\|_{\infty} \leq 1$. Again, we want to mainly use squaring instead of multiplication gates. Since $s[i] \in\{-1,0,1\} \Longleftrightarrow$ $s[i]^{3}-s[i]=0$ we could implement this test using one squaring and one multiplication gate. To further optimize this, observe that the polynomial $X^{4}-X^{2}$ has the same roots as $X^{3}-X$ (albeit with different multiplicity) but can be computed using two squaring gates instead (since $\left.X^{4}-X^{2}=\left(X^{2}\right)^{2}-X^{2}\right)$. Thus, in this setting, $\mathcal{P}$ 's total communication is

$$
\mid \text { hash }|\cdot 2+| \text { sd }|\cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log q \cdot M(7 m+2)
$$

More generally, if we want to prove that $0 \leq s[i]<2^{r}$ for some fixed $r \in \mathbb{N}$, the most direct way is to again prove that $s[i]$ is the root of the polynomial $f(X)=\Pi_{j=0}^{2^{r}-1}(X-j)$. This will increase the argument size by a factor of $2^{r}$ compared with the Binary SIS argument, but for the aforementioned interval one can reduce this to $O(r)$ : a number $s[i]$ is in the interval $\left[0,2^{r}-1\right]$ if and only if there exists $s_{i, 1}, \ldots, s_{i, r}$ such that $s[i]=\sum_{j=0}^{r-1} s_{i, j+1} 2^{j}$ and $s_{i, j} \in\{0,1\}$. This means that the interval check can be done by providing the bit decomposition $s_{i, 1}, \ldots, s_{i, r}$ of each $s[i]$, testing if these $s_{i, j}$ are indeed bits (using one square gate per test) and then reconstructing $s[i]$ on the fly. The circuit construction is summarized in Fig. 5.

```
Witness: \(\boldsymbol{w}=\left(s_{1,1}, \ldots, s_{1, r}, \ldots, s_{m, 1}, \ldots, s_{m, r}\right) \in \mathbb{F}_{q}^{m \cdot r}\).
Computation:
    1. \(\forall i \in[m], j \in[r]\) compute \(r_{i, j} \leftarrow s_{i, j}^{2}\)
    2. \(\forall i \in[m]\) compute \(s[i] \leftarrow \sum_{j=0}^{r-1} s_{i, j+1} 2^{j}\)
    3. \(\forall j \in[n]\) compute \(y_{j} \leftarrow \sum_{i \in[m]} a_{j, i} s[i]\)
    4. \(\forall i \in[m], j \in[r]\) compute \(y_{i, j} \leftarrow r_{i, j}-s_{i, j}\)
```

Output: $\hat{\boldsymbol{y}} \leftarrow\left(y_{1}, \ldots, y_{n}, y_{1,1}, \cdots, y_{m, r}\right)$

Fig. 5: An Arithmetic circuit representation of $R_{\mathrm{SIS}}^{m, n, q}$ where each $s[i]$ is in the interval $\left[0,2^{r}\right)$; The circuit contains $m \cdot r$ square gates, has $m \cdot r$ inputs and $n+m \cdot r$ outputs.

The witness $w$ is valid if $\hat{\boldsymbol{y}}$ equals $(\boldsymbol{t}[1], \ldots, \boldsymbol{t}[n], 0, \cdots, 0)$. While the witness is now expanded by a factor $r$, we in total only have to compute $m \cdot r$ square gates. All other operations are linear and do not influence the argument size, which in total is

$$
\mid \text { hash }|\cdot 2+|\operatorname{sd}| \cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log q \cdot M(4 m \cdot r+2)
$$

Proving Knowledge of general SIS instances. Finally, we aim at constructing an argument of knowledge for SIS as defined in Definition 1, for which we need $s[i] \in[-\beta, \beta]$ or alternatively, that $((\mathbf{A}, \boldsymbol{y}), \boldsymbol{s}) \in R_{\mathrm{SIS}}^{m, n, q, \beta}$. Instead of proving this exact bound, we show that $((\mathbf{A}, \boldsymbol{y}), \boldsymbol{s}) \in R_{\mathrm{SIS}}^{m, n, q, 2 \beta}$ using the circuit representation from Fig. 5. Therefore, consider the following algorithm:

1. $\mathcal{P}, \mathcal{V}$ set $\hat{\boldsymbol{t}}=\boldsymbol{t}+\mathbf{A} \boldsymbol{\beta}$ and $\mathcal{P}$ additionally sets $\hat{\boldsymbol{s}}=\boldsymbol{s}+\boldsymbol{\beta}$.
2. Choose the smallest $r \in \mathbb{N}$ such that $\beta \leq 2^{r}-1$.
3. $\mathcal{P}, \mathcal{V}$ run one of our two protocols on the circuit from Fig. 5 using the interval $\left[0,2^{r+1}-1\right]$ and with the common output being ( $\hat{\boldsymbol{t}}[1], \ldots, \hat{\boldsymbol{t}}[n], 0, \ldots, 0)$.

Then, the above algorithm is a zero-knowledge argument of knowledge for the SIS relation with slack 2 as long as $2 \beta<\frac{q-1}{2}$. To see this, we only have to look at correctness and soundness as the zero-knowledge property follows trivially. For correctness, the chosen $r$ will always lead to $\hat{s}$ being in the right interval since $2 \beta \leq 2\left(2^{r}-1\right)<2^{r+1}-1$. And due to the bound on $\beta$ and $q$, adding and subtracting $\beta$ will not cause a wrap-around $\bmod q$. For soundness, it is immediate that every extracted $\hat{\boldsymbol{s}}^{\prime}$ with $\hat{\boldsymbol{t}}=\mathbf{A} \hat{\boldsymbol{s}}^{\prime}$ can be turned into a witness $\boldsymbol{s}^{\prime}$ for $\boldsymbol{t}$ by setting $\boldsymbol{s}^{\prime}=\hat{\boldsymbol{s}}^{\prime}-\boldsymbol{\beta}$. In this case, it holds for each coefficient $s^{\prime}[i]$ that

$$
\begin{aligned}
\boldsymbol{s}^{\prime}[i] & \in\left[-\beta, 2^{r+1}-1-\beta\right] \\
& \in\left[-\beta, 2^{r+1}-2^{r}\right] \\
& \in\left[-\beta, 2^{r}\right]
\end{aligned}
$$

Note that in the worst case, $\beta$ is a power of 2 which means that we must set $r$ such that $2^{r}-1=$ $2 \beta-1$. But then $s^{\prime}[i] \in[-\beta, 2 \beta]$ and the claimed slack follows.

Thus, we will have to expand the witness to contain $\left(\left\lfloor\log _{2}(\beta)\right\rfloor+2\right) \cdot m$ elements and we furthermore have to evaluate as many square gates. $\mathcal{P}$ sends

$$
\mid \text { hash }\left|\cdot 2+|\operatorname{sd}| \cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log q \cdot M\left(4 m \cdot\left(\left\lfloor\log _{2}(\beta)\right\rfloor+2\right)+2\right)\right.
$$

### 5.2 Reducing Verification Time

Our two protocols from Section 3 are almost symmetric in the amount of computation that $\mathcal{P}, \mathcal{V}$ have to perform. Increasing the number of parties $N$ in either protocol leads to a reduced number of instances $M$ for which one has to simulate the inner MPC protocol. At the same time, this increases the number of transcripts of parties which $\mathcal{V}$ has to check, namely $M \cdot(N-1)$ in total. As we do not assume any restrictions on $\mathbf{A}$ of the SIS-instance (such as using Ring- or Module-SIS), the multiplication with $\mathbf{A}$ on the side of $\mathcal{V}$ turns out to be a bottleneck, since it has to multiply the matrix with $N-1$ input shares (corresponding to the $N-1$ parties for whom the view is tested). We now describe how to counteract this problem by batching the linear tests together across multiple parties and MPC instances.

Recall that in the protocol (as presented in Section 3) the verifier $\mathcal{V}$ recomputes the circuit from the inputs to the outputs in every MPC instance. Therefore, to verify the linear relation as in the protocol, $\mathcal{V}$ recomputes the output share $o_{e, j, i}$ for each execution $e \in[M]$, output wire $j \in\left[n_{\text {out }}\right]$, and party $i \in I_{e}$ from the input shares $w_{e, 1, i}, \ldots, w_{e, n_{\text {in }}, i}$ of each party by computing $o_{e, j, i}=\sum_{k \in\left[n_{\text {in }}\right]} a_{j, k} w_{e, k, i}$. These output wire shares $o_{e, j, i}$ are then compared to the committed output shares in the commitment $\Pi_{e}$ which $\mathcal{V}$ obtained from $\mathcal{P}$ as part of the argument.

One observation is that this computation of $o_{e, j, i}$ is linear in all the values that depend on the simulated party $i$ as well as on the MPC instance. Thus instead of verifying each $o_{e, j, i}$ individually, we can check this with lower computational overhead by testing a linear combination across all parties and MPC instances instead. For this, we use the following standard argument:

Proposition 3. Let $\mathbf{A} \in \mathbb{F}^{\phi \times \tau}$ and consider the following game with adversary $\mathcal{A}$ :

1. $\mathcal{A}$ outputs $\boldsymbol{w}_{1}, \ldots, \boldsymbol{w}_{\rho} \in \mathbb{F}^{\tau}$ as well as $\boldsymbol{o}_{1}, \ldots, \boldsymbol{o}_{\rho} \in \mathbb{F}^{\phi}$.
2. Random $\lambda_{1}, \ldots, \lambda_{\rho} \in \mathbb{F}$ are chosen.
3. If $\exists j \in[\rho]: \boldsymbol{o}_{j} \neq \mathbf{A} \boldsymbol{w}_{j}$ and $\sum_{j \in[\rho]} \lambda_{j} \cdot \boldsymbol{o}_{j}=\mathbf{A}\left(\sum_{j \in[\rho]} \lambda_{j} \cdot \boldsymbol{w}_{j}\right)$ then the output of the game is 1 and and 0 otherwise.

Then, the probability that the game's output is 1 is at most $1 /|\mathbb{F}|$.
Proof. Assume that $\mathcal{A}$ wins (i.e., the output of the game is 1). Thus, there exists $\bar{j} \in[\rho]$ such that $\boldsymbol{o}_{j}=\mathbf{A} \boldsymbol{w}_{j}+\boldsymbol{\Delta}_{j}$, and $\boldsymbol{\Delta}_{j}$ is non-zero. It follows that

$$
\begin{aligned}
\sum_{j \in[\rho]} \lambda_{j} \cdot \boldsymbol{o}_{j} & =\sum_{j \in[\rho]} \lambda_{j} \cdot\left(\mathbf{A} \boldsymbol{w}_{j}+\boldsymbol{\Delta}_{j}\right)=\sum_{j \in[\rho]} \lambda_{j} \cdot \mathbf{A} \boldsymbol{w}_{j}+\sum_{j \in[\rho]} \lambda_{j} \cdot \boldsymbol{\Delta}_{j} \\
& =\mathbf{A}\left(\sum_{j \in[\rho]} \lambda_{j} \cdot \boldsymbol{w}_{j}\right)+\sum_{j \in[\rho] \backslash\{\bar{j}\}} \lambda_{j} \cdot \boldsymbol{\Delta}_{j}+\lambda_{\bar{j}} \boldsymbol{\Delta}_{\bar{j}} .
\end{aligned}
$$

Now, since $\mathcal{A}$ wins, this means that $\sum_{j \in[\rho] \backslash\{\bar{j}\}} \lambda_{j} \cdot \boldsymbol{\Delta}_{j}+\lambda_{j} \boldsymbol{\Delta}_{\bar{j}}=0$. As all the $\lambda_{j}$ are chosen uniformly at random and $\boldsymbol{\Delta}_{\bar{j}} \neq \mathbf{0}$, this happens only with probability $1 /|\mathbb{F}|$.

Using the above Proposition, we can reduce verification time as follows. The verifier $\mathcal{V}$ will now first sample uniformly random values $\lambda_{1}, \ldots, \lambda_{M(N-1)}$ locally. Next, instead of computing the $M(N-1)$ matrix products with $\mathbf{A}(N-1$ in each MPC instance $e)$, it will compute a weighted sum of the $M(N-1)$ vectors first and then compute a single matrix product with $\mathbf{A}$. While this approach has soundness error $1 /|\mathbb{F}|, \mathcal{V}$ can simply repeat it with different choices of $\lambda$ to reduce the error probability.

The drawback of this approach is that it comes at the cost of increasing the proof size. In particular, according to the description of our protocols in Section 3, $\mathcal{P}$ only needs to send the output shares of one party (the shares $o_{e, j, \bar{i}_{e}}$ of party $P_{\bar{i}_{e}}$ whose view is not opened and so cannot be locally computed by the verifier), whereas the above optimization requires $\mathcal{P}$ to send the output shares of all parties to $\mathcal{V}$. Thus, this optimization to the computation time increases the argument size by $|\log q| \cdot M \cdot(N-1) \cdot n$ bits.

### 5.3 Reducing Communication by Amortizing Bit Tests

We now discuss an optimization which aims at reducing the argument size for the Binary SIS problem by reducing the number of non-linear gates in the circuit. Recall that in Fig. 4, we defined a circuit for this problem that has $m$ square gate. Each of the gates was used to verify that one of $m$ inputs is a bit. We now show how the number of square gates can be reduced to 1 , at the cost of adding elements to the witness. This reduces the overall communication since adding an element to the witness increases the size of the argument per MPC instance by one field element, whereas evaluating a square gate requires sending at least two field elements (secret-shared random square, messages during evaluation of the gate etc.). The optimization uses circuit sampling as defined in Section 4 , where only $\mathcal{V}$ has a challenge and so only $\mathcal{V}$ is actually sampling the circuit.

Assume that we want to check if $m$ input sharings $s[1], \ldots, s[m]$ indeed are bits, and furthermore let $|\mathbb{F}| \gg 2 m$. We can implicitly define the polynomial $D(X) \in \mathbb{F}[X]$ of degree at most $m-1$ such that $\forall i \in[m]: \quad D(i)=s[i]$. Furthermore, we know that there exists a polynomial $B(X)=$ $D(X) \cdot D(X)$ of degree at most $2 m-2$ such that $\forall i \in \mathbb{F}: D(i)^{2}=B(i)$. We thus can say that $\forall i \in[m]: s[i] \in\{0,1\}$ if and only if $\forall i \in[m]: \quad B(i)=D(i)$.

This allows us to construct a new circuit-sampling procedure. Instead of testing all $s[i]$ separately for being bits, we let the prover $\mathcal{P}$ secret-share the predetermined $B(X)$ as part of the witness. Here, by our above observation that $\forall i \in[m]: B(i)=D(i)$ it is only necessary to share the points $B(m+1), \ldots, B(2 m-1)$ (in addition to sharing all $s[i])$. Then, using the fact that Lagrangeinterpolation requires only linear operations (so it is entirely local in the underlying MPC scheme)
we let $\mathcal{V}$ send a challenge $x \in \mathbb{F}$ that is the point at which we will evaluate $D, B$ and test that $B(x)-D(x)^{2}=0$. By the Schwartz-Zippel-Lemma, we then must have identity of $D(X)^{2}$ and $B(X)$ except with probability $\frac{2 m-2}{|\mathbb{F}|}$.

In Fig. 6 we summarize the above intuition. Given $h$ points $T=\left\{\left(x_{i}, y_{i}\right)\right\}_{i \in[h]}$ we define the Lagrange coefficients $\ell_{j}^{T}(x)=\prod_{\substack{1 \leq i \leq h \\ i \neq j}} \frac{x-x_{i}}{x_{j}-x_{i}}$ for polynomial evaluation. In addition to Lagrange interpolation we also use the random linear combination of the outputs optimization described in the previous section.

```
Let \(D(x)\) be a \((m-1)\)-degree polynomial such that \(\forall i \in[m]: D(i)=\boldsymbol{s}[i]\) and \(B(x)=(D(x))^{2}\) be a \((2 m-2)-\)
degree polynomial. Let \(S_{D}=\{(i, D(i))\}_{i \in[m]}\) and \(S_{B}=S_{D} \cup\{(i, B(i))\}_{i \in[2 m-1] \backslash[m]}\) be the evaluation points of
the polynomials \(D, B\). We set \(S_{\mathcal{V}}=\mathbb{F}^{n+1}\).
ExtWitness: On input
    \((x=(\boldsymbol{A}, \boldsymbol{t}[1], \ldots, \boldsymbol{t}[n]), w=(\boldsymbol{s}[1], \ldots, \boldsymbol{s}[m])) \in R_{\mathrm{B}-\mathrm{SIS}}^{m, n, q}\)
set \(\overline{\boldsymbol{w}}=\left(\boldsymbol{s}[1], \ldots, \boldsymbol{s}[m], b_{1}, \ldots, b_{m-1}\right) \in \mathbb{F}_{q}^{2 m-1}\) where \(b_{i}=B(i+m)=D(i+m)^{2}\).
SampCircuit: On input \(\tau_{\mathcal{V}}=(\gamma, x) \in \mathbb{F}^{n} \times \mathbb{F}\) output the circuit \(C\) which performs the following:
    1. \(\forall j \in[n]\) compute \(\left.\bar{y}_{j} \leftarrow \sum_{i \in[m]} a_{j, i} s i i\right]\)
    2. Set \(y_{1} \leftarrow \sum_{j \in[n]} \gamma[j] \cdot\left(\bar{y}_{j}-\boldsymbol{t}[j]\right)\)
3. \(z_{1} \leftarrow \sum_{j=1}^{m} \ell_{j}^{S_{D}}(x) \cdot \boldsymbol{s}[j]\)
4. \(z_{2} \leftarrow \sum_{j=1}^{m} \ell_{j}^{S_{B}}(x) \cdot \boldsymbol{s}[j]+\sum_{j=1}^{m-1} \ell_{j+m}^{S_{B}}(x) \cdot b_{j}\)
5. \(z_{3} \leftarrow z_{1}^{2}\)
6. \(y_{2} \leftarrow z_{3}-z_{2}\)
7. Output of \(C\) is \(\left(y_{1}, y_{2}\right)\)
Furthermore output the set \(Y=\{(0,0)\}\).
Response: Output 1.
```

Fig. 6: Sampling of a circuit for $R_{\mathrm{B}-\mathrm{SIS}}^{m, n, q}$. This circuit contains 1 squaring gate, has $2 m-1$ inputs and 2 outputs.

We now prove that the algorithm defined in Fig. 6 is a circuit sampler as defined in Section 4.1.

Theorem 4. The algorithms in Fig. 6 yield a perfectly correct, $\frac{2 m-2}{|\mathbb{F}|}$-sound and perfectly simulatable circuit sampler for the relation $R_{\mathrm{B}-\mathrm{SIS}}^{m, n, q}$.

Proof. Correctness and the simulation property follow directly from the definition of the above algorithms. What remains to show is the $\frac{2 m-2}{|\mathbb{F}|}$-soundness.

Fix a string $\overline{\boldsymbol{w}}$ and assume that soundness is higher than $\frac{2 m-2}{|\mathbb{F}|}$. As $\operatorname{Pr}\left[y_{1}=0\right]>\frac{2 m-2}{|\mathbb{F}|}>\frac{1}{|\mathbb{F}|}$ for all $m>1$, this implies that $\overline{\boldsymbol{w}}[1], \ldots, \overline{\boldsymbol{w}}[m]$ map to $\boldsymbol{t}$ under multiplication with $\mathbf{A}$ by the same argument as in Section 4.3. Consider the implicit polynomials $D(X), B(X)$ which were defined through $\overline{\boldsymbol{w}}$ and assume that $D(X)^{2} \neq B(X)$. Then by the Schwartz-Zippel Lemma the polynomial $F(X)=D(X)^{2}-B(X)$ can have at most $2 m-2$ roots. But since $\operatorname{Pr}\left[y_{2}=0\right]>\frac{2 m-2}{|F|}$ its number of roots is bigger and so $F(X)$ must be the zero-polynomial. Due to the way $D(X), B(X)$ are constructed (i.e., $B(x)=D(x)$ for all $i \in[m]$ ), it thus follows that $\overline{\boldsymbol{w}}[1], \ldots, \overline{\boldsymbol{w}}[m]$ are bits, which concludes the proof.

Applying this optimization and using $\Pi_{\text {sac }}^{\mathrm{samp}}$, we obtain that the total communication is

$$
\mid \text { hash }|\cdot 2+|\mathrm{sd}| \cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log q \cdot M(2 m+4)
$$

which is approximately $\log q \cdot M(3 m)$ bits smaller than the baseline approach.

### 5.4 Trading Argument Size for Slack

All of the arguments for SIS-instances that we have seen so far have in common that the gap between the norm of correct witnesses and the norm that the argument guarantees is small: if we start with $((\mathbf{A}, \boldsymbol{y}), \boldsymbol{s}) \in R_{\text {SIS }}^{m, n, q, \beta}$ (i.e., $\left.\|\boldsymbol{s}\|_{\infty} \leq \beta\right)$ then the soundness guarantee is that a witness $s^{\prime}$ with $\left((\mathbf{A}, \boldsymbol{y}), s^{\prime}\right) \in R_{\text {SIS }}^{m, n, q, \tau \beta}$ could be extracted (i.e., $\left.\|s\|_{\infty} \leq \tau \beta\right)$ where $\tau$ is a small constant. However, the argument size depends on $M \cdot m \cdot \log _{2}(q) \cdot \log _{2}(\beta)$ as we have to perform non-linear computations for the bit-decomposition of each input $s[i]$. The goal of this subsection is to give an approximate argument of size for the $s[i]$ without having to resort to bit-decomposition for each $s[i]$. This would allow for a smaller number of square- or multiplication-gates as well as a more compact witness. On the other hand, the arguments will have a larger slack $\tau$ which will now also depend on the number of inputs $m$.

To achieve a more compact argument, we will ask the prover to show that random linear combinations of elements from $s$ are small. For this we use a Lemma from [BL17] who showed that random linear combinations $\bmod q$ of elements from $s$ are with certain probability not much smaller than $\|s\|_{\infty}$. Formally, they proved the following:
Lemma 7. For all $s \in \mathbb{F}_{q}^{k}$ it holds that

$$
\operatorname{Pr}_{\boldsymbol{c} \leftarrow\{0,1\}^{k}}\left[|\langle\boldsymbol{c}, \boldsymbol{s}\rangle|<\frac{1}{2} \cdot\|\boldsymbol{s}\|_{\infty}\right] \leq \frac{1}{2} \quad \text { and } \quad \operatorname{Pr}_{\mathbf{C} \leftarrow\{0,1\}^{\times \times k}}\left[\|\mathbf{C} \cdot \boldsymbol{s}\|_{\infty}<\frac{1}{2} \cdot\|\boldsymbol{s}\|_{\infty}\right] \leq 2^{-\ell} .
$$

Proof. See [BL17, Lemma 2.3 \& Corollary 2.4].
The above Lemma only talks about the chance of detecting a vector of high norm by seeing one large element in the result of the product with a random binary matrix. We will now extend it to the case where we always see that lots such large elements in the product $\mathbf{C} \cdot s$.
Lemma 8. Let $\kappa, r \in \mathbb{N}^{+}, s \in \mathbb{F}_{q}^{k}, \beta=\|s\|_{\infty}$ and define

$$
S_{\kappa}^{\beta}=\left\{\boldsymbol{h} \in \mathbb{F}_{q}^{r \cdot \kappa}|\exists T \subseteq[r \cdot \kappa] \wedge| T\left|>\kappa \wedge \forall i \in T:|\boldsymbol{h}[i]| \geq \frac{1}{2} \cdot \beta\right\}\right.
$$

Then

$$
\underset{\mathbf{C} \leftarrow\{0,1\}^{(r \cdot \kappa) \times k}}{\operatorname{Pr}}\left[\mathbf{C} \cdot \boldsymbol{s} \notin S_{\kappa}^{\beta}\right] \leq \exp \left(-\kappa \frac{(r-2)^{2}}{2 r}\right) .
$$

Proof. Set $\boldsymbol{r}=\mathbf{C} \cdot \boldsymbol{s}$ and define $X_{i} \in\{0,1\}$ to be 0 iff $|\boldsymbol{r}[i]| \geq \beta / 2$ and 1 otherwise. From Lemma 7 it follows that

$$
\operatorname{Pr}_{\mathbf{C} \leftarrow\{0,1\}^{(r \cdot k) \times k}}\left[X_{i}=1\right] \leq 1 / 2
$$

for all $i \in[r \cdot \kappa]$. Consider the mean $\bar{X}=\frac{1}{r \cdot \kappa} \sum_{i=1}^{r \cdot \kappa} X_{i}$. Then clearly $E[\bar{X}]=E\left[X_{i}\right]=\operatorname{Pr}\left[X_{i}=1\right] \leq$ $1 / 2$ and thus $-E[\bar{X}] \geq-1 / 2$.

Using the one-sided Hoeffding inequality we obtain

$$
\operatorname{Pr}_{\mathbf{C} \in\{0,1\}(r \cdot k) \times k}\left[\bar{X}-E[\bar{X}] \geq \frac{r-1}{r}-\frac{1}{2}\right] \leq \exp \left(-2(r \cdot \kappa) \cdot\left(\frac{r-2}{2 r}\right)^{2}\right)
$$

Since $-E[\bar{X}] \geq-1 / 2$, we have that

$$
\operatorname{Pr}_{\mathbf{C} \in\{0,1\}(r \cdot \kappa) \times k}\left[\bar{X} \geq \frac{r-1}{r}\right] \leq \exp \left(-\kappa \cdot \frac{(r-2)^{2}}{2 r}\right)
$$

Observe that for the event $\bar{X} \geq 1-1 / r$ to happen, there must be $\leq \kappa$ variables for which $X_{i}=0$ (since in that case $\bar{X}=\frac{1}{r \cdot k} \sum_{i=1}^{r \cdot k} X_{i} \geq \frac{r \cdot k-k}{r \cdot k}=1-\frac{1}{r}$ ). Thus,

$$
\operatorname{Pr}_{\mathbf{C} \in\{0,1\}^{(r \cdot k) \times k}}\left[\bar{X} \geq \frac{r-1}{r}\right]=\operatorname{Pr}_{\mathbf{C} \leftarrow\{0,1\}^{(r \cdot k) \times k}}\left[\mathbf{C} \cdot \boldsymbol{s} \notin S_{\kappa}^{\beta}\right]
$$

and the claim follows.
From the above Lemma we can easily derive the following:
Corollary 2. For the same conditions as in Lemma 8, we have that if $r \geq 5$ then

$$
\operatorname{Pr}_{\mathbf{C} \leftarrow\{0,1\}^{(r, k) \times k}}\left[\mathbf{C} \cdot \boldsymbol{s} \notin S_{\kappa}^{\beta}\right] \leq 2^{-\kappa}
$$

The above statements can directly be implemented in our argument system by the means of circuit sampling. Unfortunately, this results in a new problem, which is that we cannot output the product of $s$ with a random binary matrix to $\mathcal{V}$ without necessarily leaking information about $s$.

We resolve this problem using circuit sampling on the side of the prover and give two different solutions. The first idea is that $\mathcal{P}$ can compute $\boldsymbol{u}=\mathbf{C} \boldsymbol{s}$ and output $\boldsymbol{u}+$ "small" where "small" is a value of small norm. To achieve good soundness guarantees we let "small" only be polynomially bigger than $\|\boldsymbol{u}\|_{\infty}$ and use Rejection Sampling to hide the information from the product. Alternatively, we can allow $\mathcal{P}$ to prove knowledge of the bit decomposition of each value of $\boldsymbol{u}=\mathbf{C} \boldsymbol{s}$. We now describe both ideas in more detail and then formally express them in the context of circuit sampling, which allows to directly combine them with $\Pi_{\text {sac }}^{\text {samp }}$.
$1^{\text {st }}$ Approach: Rejection Sampling. In this solution, we let the prover $\mathcal{P}$ add additional random elements $x_{1}, x_{2}, \ldots$ to the witness, which are supposed to be small. The verifier $\mathcal{V}$ will then, as part of his challenge in the circuit sampling, ask $\mathcal{P}$ to open a subset of $x_{1}, x_{2}, \ldots$ to show that most of the remaining ones are indeed of small size. $\mathcal{P}$ will then open sums of each $\boldsymbol{u}[i]$ with some $x_{j}$, subject to the constraint that this does not leak information about $s . \mathcal{V}$ later tests that each such $\boldsymbol{u}[i]+x_{j}$ is of bounded norm.

As part of rejection sampling a prover aborts whenever the argument would leak information. But our goal is that the argument is complete with overwhelming probability. To achieve this, we use an idea which is inspired by the "imperfect proof" of [BDLN16]. There, the authors gave a protocol that showed how to prove knowledge of $\ell-\kappa$ out of $\ell$ SIS instances using cut-and-choose and rejection sampling. Their approach aborts only with negligible probability and turns out to be compatible with our application. This, on a high level, works as follows:

1. $\mathcal{P}$ will sample $x_{1}, \ldots, x_{16 \kappa}$ uniformly at random from $[-\pi \cdot m \cdot \beta, \pi \cdot m \cdot \beta] \subset \mathbb{F}$ and commit them as part of $\overline{\boldsymbol{w}}$.
2. $\mathcal{V}$ with probability $1 / 2$ puts each $x_{i}$ into a set $E$. Then, he samples a random bit matrix $\mathbf{C} \in\{0,1\}^{5 \kappa \times m}$ and sends $E, \mathbf{C}$ as its challenges to the prover.
3. $\mathcal{P}$ now sets up a circuit $C$ as follows:
(a) $C$ will output $\left\{x_{e}\right\}_{e \in \bar{E}}$. Then $\mathcal{V}$ can check that $x_{e} \in[-\pi \cdot m \cdot \beta, \pi \cdot m \cdot \beta]$.
(b) Compute $\boldsymbol{u}=\mathbf{C} s$ in the circuit. $\mathcal{P}$ will go through $\boldsymbol{u}[1], \ldots, \boldsymbol{u}[5 \cdot \kappa]$, take the first unused $e \in E$ and test if $\boldsymbol{u}[i]+x_{e} \in[-(\pi-1) \cdot m \cdot \beta,(\pi-1) \cdot m \cdot \beta]$. If so, then it makes $C$ output $v_{i}=\boldsymbol{u}[i]+x_{e}$, otherwise it removes $e$ from $E$ and repeats this procedure with the next-largest $e^{\prime} \in E . \mathcal{V}$ later checks that indeed $v_{i} \in[-(\pi-1) \cdot m \cdot \beta,(\pi-1) \cdot m \cdot \beta]$.
We present the full sampler in Fig. 7.

Set some auxiliary value $\pi=100$. We will have $S_{\mathcal{V}}=\left\{(\gamma, E, \mathbf{C}) \in \mathbb{F}_{q}^{n} \times\{0,1\}^{16 \kappa} \times\{0,1\}^{5 \kappa \times m}\right\}$.
ExtWitness: On input

$$
(x=(t[1], \ldots, \boldsymbol{t}[n]), w=(s[1], \ldots, s[m])) \in R_{\mathrm{sis}}^{m, n, q, \beta}
$$

sample $x_{1}, \ldots, x_{16 \kappa}$ uniformly at random from $[-\pi \cdot m \cdot \beta, \pi \cdot m \cdot \beta] \subset \mathbb{F}$. Then set

$$
\overline{\boldsymbol{w}}=\left(s[1], \ldots, \boldsymbol{s}[m], x_{1}, \ldots, x_{16 \kappa}\right) \in \mathbb{F}_{q}^{m+16 \kappa} .
$$

Response: On input $x, v, \overline{\boldsymbol{w}}, \tau_{\mathcal{V}}$ do the following:

1. Set $T \leftarrow \emptyset$ and let $E$ be as in $\tau_{\mathcal{V}}$.
2. Let $\boldsymbol{u} \leftarrow \mathbf{C} \boldsymbol{s}$. For all $i \in[5 \kappa]$ :
(a) Set $v_{i} \leftarrow \boldsymbol{u}[i]+x_{e}$ for the smallest $e \in E$.
(b) Set $E \leftarrow E \backslash\{e\}$.
(c) Set $T \leftarrow T \cup\{(i, e)\}$ if $v_{i} \in[-(\pi-1) \cdot m \cdot \beta,(\pi-1) \cdot m \cdot \beta]$, otherwise begin again for the next element in $E$.
3. Output $\tau_{\mathcal{P}}=T$.

SampCircuit: On input $x, \tau_{\mathcal{P}}=T, \tau_{\mathcal{V}}=(\gamma, E, \mathbf{C})$ where $k=|\bar{E}|$ output the circuit $C$ which performs the following:

1. Compute $\overline{\boldsymbol{y}} \leftarrow \mathbf{A} \boldsymbol{s}$.
2. Set $y \leftarrow \sum_{j \in[n]} \boldsymbol{\gamma}[j] \cdot(\overline{\boldsymbol{y}}[j]-\boldsymbol{t}[j])$.
3. Write $\bar{E}=\left\{e_{1}, \ldots, e_{\ell}\right\}$. Then for $i \in[\ell]$ set $v_{i} \leftarrow x_{e}$.
4. Compute $\boldsymbol{u} \leftarrow \mathbf{C} \boldsymbol{s}$.
5. For $i \in[5 \kappa]$ set $v_{i+\ell} \leftarrow \boldsymbol{u}[i]+x_{t_{i}}$ where $\left(i, t_{i}\right) \in T$.
6. Output of $C$ is $\left(y, v_{1}, \ldots, v_{\ell+5 \kappa}\right)$.

We implicitly set

$$
Y \leftarrow\left\{\left(0, v_{1}, \ldots, v_{k+5 \kappa}\right) \in \mathbb{F}_{q}^{\ell+5 \kappa+1} \left\lvert\, \begin{array}{l}
\forall i \in[\ell]: v_{i} \in[-\pi \cdot m \cdot \beta, \pi \cdot m \cdot \beta] \wedge \\
\forall j \in[5 k]: v_{\ell+j} \in[-(\pi-1) \cdot m \cdot \beta,(\pi-1) \cdot m \cdot \beta]
\end{array}\right.\right\} .
$$

Fig. 7: Sampling of a circuit for $R_{\mathrm{SIS}}^{m, n, q, 4 \pi m \cdot \beta}$.

Theorem 5. The algorithms in Fig. 7 yield a statistically correct, $\alpha$-sound and perfectly simulatable circuit sampler for the relation $R_{\text {SIS }}^{m, n, q, 4 \pi m \cdot \beta}$ where $\alpha=\max \left\{1 /|\mathbb{F}|, 2^{-\kappa}\right\}$.

Proof. We prove that the sampler satisfies the definitions in Section 4.1.
Correctness. We show that if the prover follows the sampler instructions, then $C(\overline{\boldsymbol{w}}) \in Y$ except for a negligible probability in $\kappa$. Clearly, this event that $C(\overline{\boldsymbol{w}}) \notin Y$ occurs only when $16 \kappa$
samples of $x_{e}$ are not sufficient. Thus, we need to show that this indeed happens only with negligible probability. To prove this, we first compute the probability that $x_{e}$ is "thrown away" and not used in the sum with $\boldsymbol{u}[i]$. As we assume that the prover acts honestly, we know that $x_{e} \in[-\pi \cdot m \cdot \beta, \pi \cdot m \beta]$, i.e., $x_{e}$ is sampled from a set of size $2 \pi \cdot m \cdot \beta+1$. In addition $-\beta \leq \boldsymbol{s}[i] \leq \beta$ and so $-m \beta \leq \boldsymbol{u}[i] \leq m \beta$. Denote by bad $_{x}$ the event that $x_{e}$ is not used. Thus, given $\boldsymbol{u}[i]$, we can write

$$
\begin{aligned}
\operatorname{Pr}\left[\operatorname{bad}_{x}\right] & =\operatorname{Pr}\left[\boldsymbol{u}[i]+x_{e}<-\pi \cdot m \cdot \beta+m \cdot \beta\right]+\operatorname{Pr}\left[\boldsymbol{u}[i]+x_{e}>\pi \cdot m \cdot \beta-m \cdot \beta\right] \\
& =\operatorname{Pr}\left[x_{e}<-\pi \cdot m \cdot \beta+m \cdot \beta-\boldsymbol{u}[i]\right]+\operatorname{Pr}\left[x_{e}>\pi \cdot m \cdot \beta-m \cdot \beta-\boldsymbol{u}[i]\right] \\
& =\frac{\pi \cdot m \cdot \beta+m \cdot \beta-\boldsymbol{u}[i]-(-\pi \cdot m \cdot \beta)}{2 \pi \cdot m \cdot \beta+1}+\frac{\pi \cdot m \cdot \beta-(\pi \cdot m \cdot \beta-m \cdot \beta-\boldsymbol{u}[i])}{2 \pi \cdot m \cdot \beta+1} \\
& =\frac{2 m \cdot \beta}{2 \pi \cdot m \cdot \beta+1}<\frac{2 m \cdot \beta}{2 \pi \cdot m \cdot \beta}=\frac{1}{\pi} .
\end{aligned}
$$

Note that this probability is independent of the value $\boldsymbol{u}[i]$. Therefore, given that each $x_{e}$ is being sent in the clear to $\mathcal{V}$ with probability $1 / 2$, we obtain that each $x_{e}$ is not used in the sum with $\boldsymbol{u}[i]$ with probability $p<1 / 2+1 / 2 \cdot 1 / \pi$.

We now compute the probability that there aren't enough samples of $x_{e}$ to construct the circuit. Let $X_{e} \in\{0,1\}$ be a random variable such that $X_{e}=0$ with probability $p$ and $X_{e}=1$ with probability $1-p$. Furthermore, define $\bar{X}=\frac{1}{16 \kappa} \sum_{e=1}^{16 \kappa} X_{e}$. Since there are $16 \kappa$ samples of $X_{e}$ to begin with and $5 \kappa$ are required to complete the circuit construction successfully, we need to determine $\operatorname{Pr}\left[\sum_{e=1}^{16 \kappa} X_{e}<5 \kappa\right]=\operatorname{Pr}\left[\bar{X}<\frac{5}{16}\right]$. Observe that furthermore $E[\bar{X}]=p<1 / 2+1 / 2 \pi$ and therefore $\operatorname{Pr}\left[\bar{X}<\frac{5}{16}\right] \leq \operatorname{Pr}\left[\bar{X}-E[\bar{X}]<\frac{5}{16}-\frac{1}{2}-\frac{1}{2 \pi}\right]=\operatorname{Pr}\left[E[\bar{X}]-\bar{X}>\frac{3}{16}+\frac{1}{2 \pi}\right]$.

Using the Hoeffding bound we obtain

$$
\begin{aligned}
\operatorname{Pr}\left[\sum_{e=1}^{16 \kappa} X_{e}<5 \kappa\right] & \leq \exp \left(-2 \cdot 16 \kappa \cdot\left(\frac{3 \pi+8}{16 \pi}\right)^{2}\right) \\
& \leq \exp (-\kappa)
\end{aligned}
$$

where the last inequality holds for any $\pi \geq 1$.
Simulatability. Recall that the definition of in Section 4.1, we need to show that a simulator who receives $x, w$ and $\tau_{\mathcal{V}}$ as its input can output $\tau_{\mathcal{P}}$ and $C(\overline{\boldsymbol{w}})$ that are indistinguishable from an output of a real execution. As we have seen, the aforementioned "throwing away" probability is actually independent of the value $\boldsymbol{u}[i]$ as long as $\boldsymbol{u}[i] \in[-m \cdot \beta, m \cdot \beta]$. One can therefore simulate $\tau_{\mathcal{P}}$ by flipping a biased coin. To simulate $C(\overline{\boldsymbol{w}})$, each value $v_{1}, \ldots, v_{\ell+5 \kappa}$ is simply chosen from its respective interval. This is obviously a perfect simulation for $v_{1}, \ldots, v_{\ell}$ whereas $v_{\ell+1}, \ldots, v_{\ell+5 \kappa}$ is uniformly random in its interval by the choice of $x_{i}$ in Response.
$\alpha$-Soundness. Let $\overline{\boldsymbol{w}}$ be fixed and assume that the prover succeeds with probability $>\alpha=$ $\max \left\{1 /|\mathbb{F}|, 2^{-\kappa}\right\}$. As in previous arguments, this particularly implies that the committed $\overline{\boldsymbol{w}}$ contains a correct preimage of $\boldsymbol{t}$ under multiplication with $\mathbf{A}$ and it remains to show that this preimage is of correct size.

By the fact that each $x_{e}$ is chosen to be in the set $E$ with probability $1 / 2$, it follows that all but $\kappa$ of the unopened $x_{e}$ must be within the interval $[-\pi \cdot \beta \cdot m, \pi \cdot \beta \cdot m]$, as otherwise the success probability of the prover must be lower than $\frac{1}{2^{\kappa}}$.

Let's consider the vector $\boldsymbol{u}$ which the prover computes. By Corollary 2 we know that if $\mathcal{P}$ uses $\boldsymbol{s}$ such that $\|\boldsymbol{s}\|_{\infty} \geq(4 \pi-2) m \cdot \beta+2$ then $\boldsymbol{u} \in S_{\kappa}^{(4 \pi-2) m \beta+2}$ (i.e., there exist $>\kappa$ values $\boldsymbol{u}[j]$
for which $|\boldsymbol{u}[j]| \geq(2 \pi-1) m \cdot \beta+1)$, except with probability $2^{-\kappa}$. Thus for the output of the circuit to be in $Y$ each $\boldsymbol{u}[j]$ with $|\boldsymbol{u}[j]| \geq(2 \pi-1) m \beta+1>(2 \pi-1) m \beta$ must be paired with a value $x_{e}$ with $\left|x_{e}\right|>\pi m \beta$ so that $\left|v_{j}\right|=\left|\boldsymbol{u}[j]+x_{e}\right| \leq(\pi-1) m \beta$. As there are at most $\kappa$ many such "bad" $x_{i}$, the prover can make at most $\kappa$ bad sums and at least one generated $v_{j}$ will be outside of the interval and thus be noticed, except with probability $2^{-\kappa}$. Therefore, if the success probability of the prover is higher than $2^{-\kappa}$, then $\overline{\boldsymbol{w}}$ must contain a preimage of $\boldsymbol{t}$ of bound at most $4 \pi m \beta \geq(4 \pi-2) \cdot m \cdot \beta+2$.

A drawback of this approach is the rather big slack of $4 \pi \cdot m$. This slack is caused by two reasons. First, there is an inherent increase of $m$ due to the use of Lemma 7. In addition, using Rejection Sampling means that we lose another factor $\pi=100$. One could decrease the constant by using a discrete Gaussian distribution for the $x_{i}$ as in [Lyu12], but we opted for presenting the above idea due to its simplicity. On the positive side, there are no non-linear gates in the sampled circuit and $\mathcal{P}$ will only have to add $16 \cdot \kappa$ more values to the witness, independently of $\beta$. The sampled circuit will output $\ell+5 \kappa+1$ elements of $\mathbb{F}$, which in expectancy is around $13 \kappa+1$ (since each of the $16 \kappa$ random samples is opened with probability $1 / 2$ ).

Summing up, the communication of the actual argument (neglecting the length of $\tau_{\mathcal{P}}$ ) when using $\Pi_{\mathrm{sac}}^{\mathrm{samp}}$ is

$$
\mid \text { hash }\left|\cdot 2+|\mathbf{s d}| \cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log _{2} q \cdot M(m+29 \kappa+1)\right.
$$

$\mathbf{2}^{\text {nd }}$ Approach: The Power of Random Bits. The circuit sampler from Fig. 7 has the disadvantage of having a comparably high slack of $4 \pi m$. On the other hand, it does not use any non-linear gates. We will now show how to decrease the slack to be essentially $m$ by reintroducing one square gate.

To reduce the slack, we will again rely on Lemma 7 . But instead of performing rejection sampling on the output, we perform a range proof for each element of the matrix product $\boldsymbol{u}=\mathbf{C s}$. The problem that arises is that $\mathbf{C}$ is only chosen at runtime, while the committed witness must be independent of the actual values in $\mathbf{C}$. At the same time, we must construct the argument in such a way that the circuit $C$ will not reveal any information about the product except for bounds on each value.

We resolve this problem as follows: if the witness has $\|s\|_{\infty} \leq \beta$, then since $\mathbf{C} \in\{0,1\}^{\kappa \times m}$ it must hold that $\|\mathbf{C} \boldsymbol{s}\|_{\infty} \leq m \cdot \beta$. Thus, letting $r$ be the smallest integer such that $m \cdot \beta<2^{r}$, it suffices for the prover to show that $\boldsymbol{u}[i] \in\left[0,2^{r}-1\right]$ (we can also shift the interval as done in Section 5.1). To show the inclusion $\mathcal{P}$ can add random bits $x_{1}^{i}, \ldots, x_{r}^{i}$ to the witness. Then, once the challenge is received from $\mathcal{V}$ and $\boldsymbol{u}$ is known to $\mathcal{P}$, it can compute the bit decomposition $\boldsymbol{u}[i]=\sum_{j=1}^{r} 2^{j} h_{j}^{i}$ for each $i \in[\kappa]$ and tell $\mathcal{V}$ for each $j \in[r]$ if it should use $x_{j}^{i}$ or $1-x_{j}^{i}$ to represent $h_{j}^{i}$. As all $x_{i}^{j}$ are chosen randomly, this yields a simulatable circuit. The only issue that remains is for $\mathcal{P}$ to prove that each $x_{i}^{j}$ is indeed a bit. For this task, we use the method presented in Section 5.3, which uses polynomial evaluation and requires a single non-linear gate. We provide the full circuit sampler in Fig. 8.

Theorem 6. Assume that $(q-1) / 2>4 m \beta$. The algorithms in Fig. 8 yield a perfectly correct, $\alpha$-sound and perfectly simulatable circuit sampler for the relation $R_{\mathrm{SIS}}^{m, n, q, 2 m \cdot \beta+4}$ where $\alpha=$ $\max \left\{\frac{2(r+1) \kappa-1}{|\mathbb{F}|}, 2^{-\kappa}\right\}$ and $r$ is the smallest integer such that $m \cdot \beta \leq 2^{r}-1$.

We will have $S_{\mathcal{V}}=\left\{(\boldsymbol{\gamma}, \hat{x}, \mathbf{C}) \in \mathbb{F}_{q}^{n} \times \mathbb{F}_{q} \times\{0,1\}^{\kappa \times m}\right\}$.
Let $\ell_{j}^{T}(x)=\prod_{\substack{1 \leq i \leq h \\ i \neq j}} \frac{x-x_{i}}{x_{j}-x_{i}}$ be Lagrange coefficients, where $T=\left\{\left(x_{i}, y_{i}\right)\right\}_{i \in h}$ is the set of points used for polynomial evaluation.

ExtWitness: On input $(x=(\boldsymbol{t}[1], \ldots, \boldsymbol{t}[n]), w=(\boldsymbol{s}[1], \ldots, \boldsymbol{s}[m])) \in R_{\text {SIS }}^{m, n, q, \beta}$ do the following:

1. For $i \in[\kappa]$ sample the random bits $x_{1}^{i}, \ldots, x_{r+1}^{i}$.
2. Compute the unique polynomial $D(X)$ of degree $(r+1) \kappa-1$ where $D((i-1)(r+1)+j)=x_{j}^{i}$ for $i \in$ $[\kappa], j \in[r+1]$. Furthermore, compute the polynomial $B(X)=(D(X))^{2}$ and set $S_{D}=\{(i, D(i))\}_{i \in[(r+1) \kappa]}$ and $S_{B}=S_{D} \cup\{(i, B(i))\}_{i \in[2(r+1) \kappa-1] \backslash[(r+1) \kappa]}$ as the evaluation points of the polynomials $D, B$.
3. Set

$$
\overline{\boldsymbol{w}}=\left(\boldsymbol{s}[1], \ldots, \boldsymbol{s}[m], x_{1}^{1}, \ldots, x_{r+1}^{1}, \ldots, x_{1}^{\kappa}, \ldots, x_{r+1}^{\kappa}, z_{1}, \ldots, z_{(r+1) \kappa-1}\right) \in \mathbb{F}_{q}^{m+(r+1) \kappa}
$$

where $z_{i}=B(i+(r+1) \kappa)=(D(i+(r+1) \kappa))^{2}$.
Response: On input $x, v, \overline{\boldsymbol{w}}, \tau_{\mathcal{V}}$ do the following:

1. Set $\boldsymbol{u} \leftarrow \mathbf{C} \boldsymbol{s}$. Then for all $i \in[\kappa]$ do the following:
(a) Set $v_{i} \leftarrow \boldsymbol{u}[i]+m \cdot \beta$.
(b) Find $h_{0}^{i}, \ldots, h_{r}^{i} \in\{0,1\}$ such that $v_{i}=\sum_{j=0}^{r} h_{j}^{i} 2^{j}$.
(c) For each $j \in\{0, \ldots, r\}$ set

$$
b_{j}^{i} \leftarrow \begin{cases}0 & \text { if } h_{j}^{i}=x_{j}^{i} \\ 1 & \text { otherwise }\end{cases}
$$

2. Output $\tau_{\mathcal{P}}=\left(\left\{b_{0}^{i}, \ldots, b_{r}^{i}\right\}_{i \in[\kappa]}\right)$.

SampCircuit: On input $x, \tau_{\mathcal{P}}=\left(\left\{b_{0}^{i}, \ldots, b_{r}^{i}\right\}_{i \in[\kappa]}\right), \tau_{\mathcal{V}}=(\boldsymbol{\gamma}, \hat{x}, \mathbf{C})$ the circuit $C$ performs the following:

1. Compute $\overline{\boldsymbol{y}} \leftarrow \mathbf{A} \boldsymbol{s}$.
2. Set $y_{1} \leftarrow \sum_{j \in[n]} \gamma[j] \cdot(\overline{\boldsymbol{y}}[j]-\boldsymbol{t}[j])$.
3. Set $\boldsymbol{u} \leftarrow \mathbf{C s}$.
4. $z_{1} \leftarrow \sum_{i=1}^{\kappa} \sum_{j=1}^{r+1} \ell_{(i-1)(r+1)+j}^{S_{D}}(\hat{x}) \cdot x_{j}^{i}$
5. $z_{2} \leftarrow \sum_{i=1}^{\kappa} \sum_{j=1}^{r+1} \ell_{(i-1)(r+1)+j}^{S_{B}}(\hat{x}) \cdot x_{j}^{i}+\sum_{j=1}^{(r+1) \kappa-1} \ell_{j+(r+1) \kappa}^{S_{B}}(\hat{x}) \cdot z_{j}$
6. $z_{3} \leftarrow z_{1}^{2}$
7. $y_{2} \leftarrow z_{3}-z_{2}$
8. For each $i \in[\kappa], j \in\{0, \ldots, r\}$ set

$$
h_{j}^{i} \leftarrow \begin{cases}x_{j+1}^{i} & \text { if } b_{j}^{i}=0 \\ 1-x_{j-1}^{i} & \text { otherwise }\end{cases}
$$

9. For $i \in[\kappa]$ set $y_{i+2} \leftarrow \boldsymbol{u}[i]+m \cdot \beta-\left(\sum_{j=0}^{r} h_{j}^{i} 2^{j}\right)$.
10. Output $\left(y_{1}, \ldots, y_{\kappa+2}\right)$.

We output the set

$$
Y=\left\{(0, \cdots, 0) \in \mathbb{F}_{q}^{\kappa+2}\right\} .
$$

Fig. 8: Sampling of a circuit for $R_{\mathrm{SIS}}^{m, n, q, 3 m \cdot \beta}$.

Proof. Correctness: Most of the correctness follows simply by linearity as in the other constructions, so we will focus on the bit-decomposition part.

The multiplication of $s$ with $\mathbf{C}$ trivially yields a bound $\|\boldsymbol{u}\|_{\infty} \leq m \cdot\|s\|_{\infty}=m \cdot \beta$. By shifting each $\boldsymbol{u}[i]$ with the constant $m \cdot \beta$ we will have that $\boldsymbol{u}[i]+m \cdot \beta \in\left[0,2^{r+1}-1\right]$. This can always be represented with $r+1$ bits as in the protocol.
$\alpha$-Soundness: The prover has three ways it can get away with using an invalid witness. The first is to use a input vector $s$ which is not a preimage of $\boldsymbol{t}$ under multiplication with $\mathbf{A}$. But as in previous constructions this will only succeed with probability $1 /|\mathbb{F}|$ (due to the random linear combination of outputs). A second option to cheat is to use $x_{j}^{i}$ that are not bits. Following the
same idea as in the proof of Theorem 4, this prover can succeed here with probability of at most $\frac{2(r+1) \kappa-1}{|\mathbb{F}|}$. The third way is to use an input vector with norm that is not in allowed range. We will show now that the success probability in this case is at most $2^{-\kappa}$. Observe that $\frac{2(r+1) \kappa-1}{|\mathbb{F}|}>\frac{1}{|\mathbb{F}|}$ and so the overall soundness error is as defined in the Theorem.

Thus, for the rest of the proof we assume that the committed values $x_{j}^{i}$ are bits. Assume that $\boldsymbol{s}$ as committed in the witness $\overline{\boldsymbol{w}}$ is such that $\|\boldsymbol{s}\|_{\infty} \geq 2 m \beta+4$. Then by Lemma 7 we have that $\operatorname{Pr}\left[\|\boldsymbol{u}\|_{\infty}<m \cdot \beta+2\right] \leq 2^{-\kappa}$. If $\|\boldsymbol{u}\|_{\infty} \geq m \cdot \beta+2$ then there exists an index $i$ such that either $\boldsymbol{u}[i] \in[-(q-1) / 2,-m \cdot \beta-2]$ or $\boldsymbol{u}[i] \in[m \cdot \beta+2,(q-1) / 2]$. In the first case we have $v_{i}=\boldsymbol{u}[i]+m \cdot \beta$ which results in $v_{i} \in[-(q-1) / 2+m \cdot \beta,-2]$. But no such $v_{i}$ can be represented as $v_{i}=\sum_{j=0}^{r} h_{j}^{i} 2^{j}$ using bits $h_{j}^{i}$ only. If on the other hand $\boldsymbol{u}[i] \in[m \cdot \beta+2,(q-1) / 2]$ then $v_{i} \in[2 m \cdot \beta+2,(q-1) / 2] \cup[-(q-1) / 2,-(q-1) / 2+m \beta-1]$. But the largest number which the sum $\sum_{j=0}^{r} h_{j}^{i} 2^{j}$ can express with $h_{j}^{i} \in\{0,1\}$ is $2^{r+1}-1=2 m \beta+1$ and the values from $[-(q-1) / 2,-(q-1) / 2+m \beta-1]$ are not expressible by the bound on $q$. As the prover succeeds with probability strictly larger than $2^{-\kappa}$ to represent all $v_{i}$ as bits, it must hold by Lemma 7 that $\|s\|_{\infty}<2 m \beta+4$ and the claim follows.

Simulatable: The set $Y$ is fixed for all instances. $\tau_{\mathcal{P}}$ consists of bits $b_{j}^{i}$ which are essentially the XOR of the bit decomposition of the secret $\boldsymbol{u}[i]+m \beta$ with uniformly random bits $x_{j}^{i}$ and thus perfectly simulatable.

The circuit we obtain has $m+\kappa(r+1)$ inputs, one square gate and $\kappa+2$ outputs. Then the total communication of this argument when using $\Pi_{\mathrm{sac}}^{\mathrm{samp}}$ is

$$
\mid \text { hash }\left|\cdot 2+|\mathrm{sd}| \cdot(2+M \log N)+|\operatorname{com}| \cdot M+\log _{2} q \cdot M(m+\kappa(r+2)+5) .\right.
$$

## 6 Evaluation and Experimental Results

We ran extensive experiments to measure the performance of our two protocols for the Binary-SIS problem. As setup we used Amazon C5.9xlarge instances using two servers with Intel Platinum 8000 series processors (Skylake-SP) which have clock speed up to $3.4 \mathrm{GHZ}, 36$ virtual cores per server (utilized based on the experiment setup) and 72 Gb RAM. The network bandwidth between the nodes is 10 Gpbs . For our implementation we used only the baseline construction for the BinarySIS problem presented in Section 5.1. Nevertheless, this includes the three general optimizations described in Section 3.4. Hash functions as well as commitments were implemented using SHA-256. Generation of pseudo-randomness from a seed was done using AES in counter-mode where the seed is the AES key. Thus, $\mid$ hash $|=|$ com $\mid=256$ bits and $|\mathrm{sd}|=128$ bits.

Binary-SIS Problem parameters. We used five sets of parameters for our experiments:

1. $\log _{2}|\mathbb{F}|=15, n=256$ and $m=1024$.
2. $\log _{2}|\mathbb{F}|=15, n=256$ and $m=4096$.
3. $\log _{2}|\mathbb{F}|=31, n=512$ and $m=2048$.
4. $\log _{2}|\mathbb{F}|=59, n=1024$ and $m=4096$.
5. $\log _{2}|\mathbb{F}|=61, n=1024$ and $m=4096$.

The first parameter set reflects SIS-based constructions that do not need any additional functionality. For example, they can be used to instantiate [KTX08] with a binary secret. The second

|  | Cut-and-Choose |  |  |  |  |  | Sacrificing |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| N | $\xi \leq 2^{-40}$ |  |  | $\xi \leq 2^{-80}$ |  |  | $\xi \leq 2^{-40}$ |  | $\xi \leq 2^{-80}$ |  |
|  | $M$ | $\tau$ | Argument size | M | $\tau$ | Argument Size | $M$ | Argument Size | $M$ | Argument Size |
| 2 | 75 | 34 | $31+0.123 \cdot \rho m$ | 145 | 63 | $61.1+0.246 \cdot \rho m$ | 40 | $26.2+0.16 \cdot \rho m$ | 80 | $51.8+0.32 \cdot \rho m$ |
| 4 | 55 | 32 | $22.4+0.069 \cdot \mathrm{\rho m}$ | 105 | 57 | $44.8+0.144 \cdot \rho m$ | 20 | $16+0.08 \cdot \mathrm{\rho m}$ | 40 | $31.3+0.16 \cdot \rho m$ |
| 8 | 55 | 38 | $20.7+0.051 \cdot \mathrm{\rho m}$ | 95 | 57 | $42+0.114 \cdot \rho m$ | 14 | $13.2+0.056 \cdot \mathrm{\rho m}$ | 27 | $24.8+0.108 \cdot \mathrm{\rho m}$ |
| 16 | 45 | 26 | $23.4+0.057 \cdot \mathrm{\rho m}$ | 95 | 63 | $41.5+0.096 \cdot \rho m$ | 10 | $10.9+0.04 \cdot \rho m$ | 20 | $21.2+0.08 \cdot \mathrm{\rho m}$ |
| 32 | 45 | 28 | $23.8+0.051 \cdot \rho m$ | 85 | 47 | $50.4+0.114 \cdot \rho m$ | 8 | $9.9+0.032 \cdot \rho m$ | 16 | $19.1+0.064 \cdot \mathrm{\rho m}$ |
| 64 | 45 | 28 | $26+0.051 \cdot \rho m$ | 85 | 49 | $53+0.108 \cdot \rho \mathrm{~m}$ | 7 | $9.6+0.028 \cdot \mathrm{\rho m}$ | 14 | $18.6+0.056 \cdot \mathrm{\rho m}$ |

Table 1: Summary of Parameters used in the experiments for each protocol and the argument size in Kbits for each set of parameters as a function of the SIS problem parameters $\rho=\log _{2}|\mathbb{F}|$ and $m$.
parameter set is then used to study the impact of using a much larger message in the commitment scheme, which also shows how the matrix size impacts the runtimes.

The third set would be a typical example for SIS-based constructions such as somewhat homomorphic commitments and allows to prove that a committed message is small. An example for an application would be the commitment scheme of $\left[\mathrm{BDL}^{+} 18\right]$. The last two sets are used for applications such as somewhat homomorphic encryption schemes like [BGV14].

Protocol parameters. We ran experiments for 40 and 80 bits of statistical security $\kappa$. For the parameter $N$, which is the number of parties in the underlying MPC protocol, we used 6 different values: $2,4,8,16,32$ and 64 . Then, given the desired level of security and the number of parties, we searched for the set of parameters in each of the two protocols, that minimizes the overall cost.

In $\Pi_{c \& c}$, there are two parameters to define: $M$ (number of pre-processing executions) and $\tau$ (number of pre-processing executions to open). To obtain these, we wrote a script that finds the minimal $M$ and $\tau$ such that $\xi(M, N, \tau) \leq 2^{-40}$ or $2^{-80}$. In $\Pi_{\mathrm{sac}}$, we observe that for our choices of $|\mathbb{F}|$ and $N$, it holds that $\frac{3 N+|\mathbb{F}|-3}{N \cdot|\mathbb{F}|} \approx \frac{1}{N}$ and so it suffices to choose the minimal $M$ such that $\xi(M, N) \approx \frac{1}{N^{M}} \leq 2^{-40}$ or $2^{-80}$.

We summarize the parameters used in our experiments in Table 1. In addition, for each set of parameters we give the size of the argument in Kbits as a formula of the SIS problem parameters $\rho=\log _{2}|\mathbb{F}|$ and $m$. Observe that as the number of parties N grows, the number of MPC instances in $\Pi_{\mathrm{sac}}$ becomes much smaller than the number required in $\Pi_{\mathrm{c} \& \mathrm{c}}$, which is translated to smaller proof size.

Running times. In Table 2 and Table 3 we present the running times (in Msec.) of the two protocols for 40 and 80 bits of security respectively. For each SIS problem set of parameters, we report only the best running times achieved together with the MPC protocol parameters which lead to the result. As the number of non-linear gates in this circuit is small, it is not surprising that both schemes achieve similar results. Observe that small numbers of parties in the MPC protocol lead to faster running times, in contrast to proof size which is getting smaller when the number of parties is increased. When compared with the data presented in Table 1 which shows that the sacrificing protocol has lower argument size and that argument size usually becomes smaller as the number of

| $\rho$ | $n$ | $m$ | Cut-and-Choose |  |  | Sacrificing |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  | $N$ | $M$ | $\tau$ | time | $N$ | $M$ | time |
| 15 | 256 | 1024 | 2 | 75 | 34 | 73.2 | 4 | 20 | 59.4 |
| 15 | 256 | 4096 | 2 | 75 | 34 | 295.8 | 4 | 20 | 252.6 |
| 31 | 512 | 2048 | 2 | 75 | 34 | 252.3 | 4 | 20 | 217.5 |
| 59 | 1024 | 4096 | 2 | 75 | 34 | 1010.4 | 2 | 40 | 1075.1 |
| 61 | 1024 | 4096 | 2 | 75 | 34 | 1204.6 | 2 | 40 | 1228.8 |

Table 2: Best running times in MSec for different sets of SIS problem parameters with 40-bit security.

| $\rho$ | $n$ | $m$ | Cut-and-Choose |  |  | Sacrificing |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  | $N$ | $M$ | $\tau$ | time | $N$ | $M$ | time |
| 15 | 1024 | 256 | 2 | 145 | 63 | 146 | 4 | 40 | 118.6 |
| 15 | 256 | 4096 | 2 | 145 | 63 | 589.2 | 4 | 40 | 507.7 |
| 31 | 512 | 2048 | 2 | 145 | 63 | 505.4 | 4 | 40 | 436.8 |
| 59 | 1024 | 4096 | 2 | 145 | 63 | 2000.4 | 2 | 80 | 2138.8 |
| 61 | 1024 | 4096 | 2 | 145 | 63 | 2396.8 | 2 | 80 | 2432.3 |

Table 3: Best running times in MSec for different sets of SIS problem parameters with 80-bit security.
parties grows (which is to be expected), this indicates that, in the current state, computation time is the bottleneck of the protocol.

It is worth noting that a major source of improvement we discovered was to postpone the modular reduction in the matrix multiplication to the end. That is, when the prover/verifier multiply a row in the matrix $\mathbf{A}$ with a vector of shares of $\boldsymbol{s}$ (which is eventually what the computed circuit does), it is highly beneficial to do the reduction modulo $q$ only at the end of the matrix multiplication. This simple optimization alone yields an improvement of approximately $33 \%$ in our results.

Using Multi-threads. The above results were obtained using a single thread. As computation time is the bottleneck, we examined what happens when working with multiple threads which seems to be a straightforward optimization. In Fig. 9 we present the improvement in the running time when utilizing more threads. This experiment was run for the most "toughest" instance of the SIS problem, with $\rho=\log |\mathbb{F}|=61, n=1024$ and $m=4096$ and with the MPC protocol parameters who yielded the best running time in Table 2 and Table 3. As can be seen, using many threads speeds-up the performance by more than $80 \%$. As a consequence, we obtain a lattice ZKAoK that runs in less than 0.5 seconds even for the of SIS instance with the largest parameters. This is orders of magnitude faster than any previous implementation for arithmetic circuits of the same size.

Faster Matrix Products \& Structured Lattices. In this work we solely focus on unstructured matrices $A$ for SIS. By micro-benchmarking the results, we observe that as the size of the matrix $A$ grows, the time spent on computing the matrix multiplication becomes dominant. In particular, for the large instances, matrix multiplication takes $>85 \%$ of the overall local computation time. As we


Fig. 9: Running time in Msec. as a function of the number of threads used, for the instance with the parameters: $\rho=61, n=1024, m=4096$ and $N=2$
use only textbook matrix multiplication, this leaves plenty of room for improvement. Furthermore, on the verifier side it is possible to batch the matrix multiplications together as only verification is needed. Another direction would be to use structured matrices i.e. structured lattices, which opens the door for FFT-like algorithms.

## 7 Comparison to Previous Works

We now describe how our protocol compares in terms of proof size with other state-of-the-art arguments of knowledge. For this comparison, we consider only our second "sacrificing"-based protocol. We stress that comparing only the proof size does not provide a complete picture, since communication is not necessarily the bottleneck (as can be observed from the experimental results in the previous section). As we focus on practicality, the comparison focuses on two instances of the SIS problem: the Binary-SIS problem and a more general instance where $\|s\|_{\infty} \leq 15$ (i.e., $\beta=15$ and so the secrets are of 5 -bit size). For each instance, we consider three set of parameters: 40-bit and 80 -bit soundness security for $\log _{2}(|\mathbb{F}|) \approx 32$ and $\mathbf{A} \in \mathbb{F}^{512 \times 2048}$ as well as 128 -bit soundness for $\log _{2}(|\mathbb{F}|) \approx 61$ and $\mathbf{A} \in \mathbb{F}^{1024 \times 4096}$. As in the previous subsection, for seed-size we use 128 bits, whereas length of commitments and outputs of hash function-calls will be 256 bits. Whenever necessary, we will use the same values for comparable parameters in other protocols. Since our focus here is on communication only, we use $N=16$ simulated parties in the underlying MPC protocol, and so we set $M=11,21$ and 33 to achieve 40,80 and 128 bit of soundness security.

| Protocol | $\log _{2}(\|\mathbb{F}\|)=32$, Binary | $\log _{2}(\|\mathbb{F}\|)=32, \beta=15$ | $\log _{2}(\|\mathbb{F}\|)=61$, Binary | $\log _{2}(\|\mathbb{F}\|)=61, \beta=15$ |
| :---: | :---: | :---: | :---: | :---: |
| Stern [Ste96] | 971 KB | 7285 KB | 3703 KB | 27775 KB |
| Ligero [AHIV17] | 1158 KB | 1174 KB | 2821 KB | 2860 KB |
| Ours, baseline | 357 KB | 2138 KB | 1359 KB | 8148 KB |
| Ours, amortized | 179 KB | 1069 KB | 680 KB | 4075 KB |

Table 4: Proof sizes for Binary-SIS and secrets of 5 -bit size, small constant slack, 40-bit soundness.

| Protocol | $\log _{2}(\|\mathbb{F}\|)=32$, Binary | $\log _{2}(\|\mathbb{F}\|)=32, \beta=15$ | $\log _{2}(\|\mathbb{F}\|)=61$, Binary | $\log _{2}(\|\mathbb{F}\|)=61, \beta=15$ |
| :---: | :---: | :---: | :---: | :---: |
| Stern [Ste96] | 1457 KB | 10928 KB | 5555 KB | 41663 KB |
| Ligero [AHIV17] | 2218 KB | 2245 KB | 6009 KB | 6085 KB |
| Ours, baseline | 682 KB | 4082 KB | 2595 KB | 15557 KB |
| Ours, amortized | 342 KB | 2042 KB | 1299 KB | 7780 KB |

Table 5: Proof sizes for Binary-SIS and secrets of 5 -bit size, small constant slack, 80 -bit soundness.

| Protocol | $\log _{2}(\|\mathbb{F}\|)=61$, Binary | $\log _{2}(\|\mathbb{F}\|)=61, \beta=15$ |
| :---: | :---: | :---: |
| Stern [Ste96] | 11851 KB | 88882 KB |
| Ligero [AHIV17] | 10173 KB | 10309 KB |
| Ours, baseline | 4077 KB | 24461 KB |
| Ours, amortized | 2041 KB | 12225 KB |

Table 6: Proof sizes for Binary-SIS and secrets of 5-bit size, small constant slack, 128-bit soundness.

Protocols with Small Constant Slack. We subsume all protocols that achieve constant slack here. These are either based on Stern-type arguments [LNSW13], direct applications of MPC-in-the-head [AHIV17] or IOP-based constructions [ $\left.\mathrm{BSCR}^{+} 18\right]$. Though STARKs [BSBHR18] fall into the third category, we do not consider those here as they are rather tailored to computations with looping components.

While [LNSW13] is a specific technique tailored to problems such as SIS, [AHIV17,BSCR ${ }^{+}$18] require an arithmetic circuit (similar to us) for the verification of the statement. Recall that in our system, all linear gates of these circuits are for free whereas in [AHIV17,BSCR ${ }^{+}$18] they necessarily contribute to the size of the proof. The circuits for the two different problems follow directly from Section 5.1: for the Binary-SIS proof we have $m \cdot(n+2)$ arithmetic gates with $m$ inputs and $m+n$ outputs, whereas the proof for the general SIS problem has $m \cdot(n+2 r)$ arithmetic gates, $m \cdot r$ inputs and $n+m \cdot r$ outputs (recall that $r$ is the smallest integer such that $\beta \leq 2^{r}-1$ ).

In Table 4 we present the proof size for 40 bits of soundness, whereas Table 5 and Table 6 deal with 80 bits and 128 bits, respectively. For our protocol, we present the proof size when using the baseline protocol (Section 5.1) and when using the 'amortizing bit tests' optimization from Section 5.3.

Observe that the tables do not include proof sizes for the recent Aurora protocol [ $\left.\mathrm{BSCR}^{+} 18\right]$, as the authors there did not provide any general expression for the proof size, but rather only experimental results for the binary field $\mathbb{F}_{2^{192}}$. However, even for this large field, their proof size is around 250 KB for a circuit with the same size as ours. Though [ $\left.\mathrm{BSCR}^{+} 18\right]$ 's current implementation

| Protocol | Slack | $\log _{2}(\|\mathbb{F}\|)=3240$ bits $\log _{2}(\|\mathbb{F}\|)=3280 \mathrm{bits}^{2} \log _{2}(\|\mathbb{F}\|)=61128 \mathrm{bits}$ |  |  |
| :---: | :---: | :---: | :---: | :---: |
| Sigma-protocol [Lyu12] | 288 m | 184 KB | 368 KB | 1241 KB |
| Ours, Approach 1 $(\kappa=8)$ | 400 m | 100 KB | 191 KB | 1079 KB |
| Ours, Approach 2 $(\kappa=8)$ | $<3 m$ | 96 KB | 183 KB | 1057 KB |
| Ours, Exact | 1 | 179 KB | 342 KB | 2041 KB |

Table 7: Proof sizes for Binary-SIS non-constant slack.

| Protocol | Slack | $\log _{2}(\|\mathbb{F}\|)=3240$ bits | $\log _{2}(\|\mathbb{F}\|)=3280 \operatorname{bits}^{2} \log _{2}(\|\mathbb{F}\|)=61128 \mathrm{bits}$ |  |
| :---: | :---: | :---: | :---: | :---: |
| Sigma-protocol [Lyu12] | 288 m | 223 KB | 447 KB | 1494 KB |
| Ours, Approach 1 $(\kappa=8)$ | 400 m | 100 KB | 191 KB | 1079 KB |
| Ours, Approach 2 $(\kappa=8)$ | $<3 \mathrm{~m}$ | 97 KB | 184 KB | 1061 KB |
| Ours, Exact | 1 | 1069 KB | 2042 KB | 12225 KB |

Table 8: Proof sizes for SIS with $\beta=15$ and non-constant slack.
is only for binary fields and they require to use field extensions (whereas we can work over the prime field itself), it is fair to assume that the proof size for smaller fields will be smaller than the one obtained by our protocols. Nevertheless, the prover running time according to their experiments is approximately 200 sec , and so is expected to be at least one order of magnitude bigger than the prover's running time in our protocols (the same applies to [AHIV17] whose prover's running time is even higher than of $\left[\mathrm{BSCR}^{+} 18\right]$ ). Thus, in applications where only the proof size matters, their protocol might be preferable, whereas in applications where the overall running time of the interactive protocol is more important, our argument system will outperform theirs.

Protocols with Non-constant Slack. Here, we compare with argument system from the signature scheme of [Lyu12].

We present the comparison in Tables 7 and 8 . The scheme of Lyubashevsky consists of 3 steps in each round $i$ : (i) $\mathcal{P}$ sends a value $\boldsymbol{a}_{i}=\mathbf{A} \boldsymbol{x}_{i}$ where $\boldsymbol{x}_{i}$ is sampled according to a discrete Gaussian distribution; (ii) $\mathcal{V}$ sends a challenge bit $e_{i}$; and (iii) $\mathcal{P}$ finishes by either sending $\boldsymbol{z}_{i}=\boldsymbol{x}_{i}+e_{i} \cdot \boldsymbol{s}$ or aborting. For the sake of comparison we consider as message complexity only the messages that are sent in the last of the three rounds, as the first message can be compressed into a commitment and then the opening be obtained from the third message. A further optimization, which we do not consider, is that the third message itself must only be sent whenever $e_{i}=1$. Whenever $e_{i}=0$, it would instead be feasible to just send a PRG seed which was used to generate the vector $\boldsymbol{x}_{i}$. But this vector is sampled according to a discrete Gaussian distribution and re-running such a sampler would be rather time-consuming. The vector $\boldsymbol{x}_{i}$ could alternatively be sampled from a bounded uniform distribution as in our case, but that would increase the slack proportionally. We could arguably also further decrease the size of our new arguments by increasing the number of parties $N$, which would also lead to higher computational cost on our side.

We compare the proof size of [Lyu12] with our baseline protocol and with the two solutions described in Section 5.4. We see that in particular the 2 nd protocol of Section 5.4 improves upon [Lyu12] for all three considered cases. This is particularly true in the cases where the gap between $\beta$ and $|\mathbb{F}|$ is small, as our proof size increases as $|\mathbb{F}|$ grows whereas the size of [Lyu12] stays the same.

At the same time, increasing $\beta$ seems not to substantially change the communication complexity of either of our two proofs, whereas it has a direct impact on [Lyu12].

Protocols not based on post-quantum assumptions. Recently, del Pino et al. [dPLS19] showed how to obtain a ZK argument for our problem setting. While they have a drastically smaller proof size (in the order of 1.5 KB ), we think that comparing our work to theirs would not be fair. First, the soundness of their construction relies on the Discrete-Log assumption and is therefore not post-quantum secure. Moreover, their computational efficiency relies on using structured lattices, which we do not need.

For the same reason, we excluded Hyrax [WTS $\left.{ }^{+} 18\right]$ and Sonic [MBKM19] from the comparison. As [dPLS19] these constructions rely on the Discrete Logarithm-assumption for their security. Older ZK-SNARKs such as [PHGR16,BSCTV14] would offer low argument size and verification time but in addition to large keys and a high prover runtime also rely on very strong assumptions. Similarly,the work of $\left[\mathrm{BCC}^{+} 16\right]$ is also in the DLog setting. Its lattice-based variant $\left[\mathrm{BBC}^{+} 18\right]$ is so far not implemented, may have large hidden constants and itself uses ZKAoKs for SIS as building blocks.

Conclusions. We conclude that in terms of achieving both small proof size and low prover's running time, our scheme is the most efficient compared to other proof systems which rely on similar assumptions. We remark again that if the goal is only to minimize communication, it is possible to further increase the number of parties in the underlying MPC scheme and reduce the proof size even more.

## Acknowledgments

We thank Roey Sefi and Assi Barak for their help with the implementation as well as Carmit Hazay, Yehuda Lindell and Avishay Yanay for helpful comments.

## References

AHIV17. Scott Ames, Carmit Hazay, Yuval Ishai, and Muthuramakrishnan Venkitasubramaniam. Ligero: Lightweight sublinear arguments without a trusted setup. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017.
$\mathrm{BBC}^{+}$18. Carsten Baum, Jonathan Bootle, Andrea Cerulli, Rafael del Pino, Jens Groth, and Vadim Lyubashevsky. Sub-linear lattice-based zero-knowledge arguments for arithmetic circuits. In Advances in Cryptology CRYPTO 2018. Springer International Publishing, 2018.
$\mathrm{BCC}^{+}$16. Jonathan Bootle, Andrea Cerulli, Pyrros Chaidos, Jens Groth, and Christophe Petit. Efficient zeroknowledge arguments for arithmetic circuits in the discrete log setting. In Marc Fischlin and JeanSébastien Coron, editors, Advances in Cryptology - EUROCRYPT 2016. Springer Berlin Heidelberg, 2016.

BD10. Rikke Bendlin and Ivan Damgård. Threshold decryption and zero-knowledge proofs for lattice-based cryptosystems. In Theory of Cryptography Conference. Springer, 2010.
$\mathrm{BDL}^{+}$18. Carsten Baum, Ivan Damgård, Vadim Lyubashevsky, Sabine Oechsner, and Chris Peikert. More efficient commitments from structured lattice assumptions. In Dario Catalano and Roberto De Prisco, editors, Security and Cryptography for Networks, Cham, 2018. Springer International Publishing.
BDLN16. Carsten Baum, Ivan Damgård, Kasper Green Larsen, and Michael Nielsen. How to prove knowledge of small secrets. In Annual Cryptology Conference. Springer, 2016.
Bea91. Donald Beaver. Efficient multiparty protocols using circuit randomization. In Annual International Cryptology Conference. Springer, 1991.

BGV14. Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (leveled) fully homomorphic encryption without bootstrapping. ACM Transactions on Computation Theory (TOCT), 6(3), 2014.
BHR12. Mihir Bellare, Viet Tung Hoang, and Phillip Rogaway. Foundations of garbled circuits. In Proceedings of the 2012 ACM conference on Computer and communications security. ACM, 2012.
BL17. Carsten Baum and Vadim Lyubashevsky. Simple amortized proofs of shortness for linear relations over polynomial rings, 2017. https://eprint.iacr.org/2017/759.
BSBHR18. Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev. Scalable, transparent, and postquantum secure computational integrity. Cryptology ePrint Archive, Report 2018/046, 2018. https: //eprint.iacr.org/2018/046.
$\mathrm{BSCR}^{+}$18. Eli Ben-Sasson, Alessandro Chiesa, Michael Riabzev, Nicholas Spooner, Madars Virza, and Nicholas P. Ward. Aurora: Transparent succinct arguments for r1cs. Cryptology ePrint Archive, Report 2018/828, 2018. https://eprint.iacr.org/2018/828.

BSCTV14. Eli Ben-Sasson, Alessandro Chiesa, Eran Tromer, and Madars Virza. Succinct non-interactive zero knowledge for a von neumann architecture. In USENIX Security Symposium, 2014.
dPLS19. Rafael del Pino, Vadim Lyubashevsky, and Gregor Seiler. Short discrete log proofs for fhe and ring-lwe ciphertexts. Cryptology ePrint Archive, Report 2019/057, 2019. https://eprint.iacr.org/2019/057, to appear at PKC 2019K.
DPSZ12. Ivan Damgård, Valerio Pastro, Nigel P. Smart, and Sarah Zakarias. Multiparty computation from somewhat homomorphic encryption. In Advances in Cryptology - CRYPTO 2012-32nd Annual Cryptology Conference, 2012.
FS86. Amos Fiat and Adi Shamir. How to prove yourself: Practical solutions to identification and signature problems. In Advances in Cryptology - CRYPTO '86, 1986.
GMO16. Irene Giacomelli, Jesper Madsen, and Claudio Orlandi. Zkboo: Faster zero-knowledge for boolean circuits. In USENIX Security Symposium, 2016.
GMR89. Shafi Goldwasser, Silvio Micali, and Charles Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal on computing, 18(1), 1989.
IKOS07. Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Zero-knowledge from secure multiparty computation. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing. ACM, 2007.

KKW18. Jonathan Katz, Vladimir Kolesnikov, and Xiao Wang. Improved non-interactive zero knowledge with applications to post-quantum signatures. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, 2018.
KTX08. Akinori Kawachi, Keisuke Tanaka, and Keita Xagawa. Concurrently secure identification schemes based on the worst-case hardness of lattice problems. In International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2008.
LLNW17. Benoît Libert, San Ling, Khoa Nguyen, and Huaxiong Wang. Zero-knowledge arguments for lattice-based prfs and applications to e-cash. In Advances in Cryptology - ASIACRYPT 2017-23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 2017.

LN17. Yehuda Lindell and Ariel Nof. A framework for constructing fast MPC over arithmetic circuits with malicious adversaries and an honest-majority. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, 2017.
LNSW13. San Ling, Khoa Nguyen, Damien Stehlé, and Huaxiong Wang. Improved zero-knowledge proofs of knowledge for the isis problem, and applications. In Public-Key Cryptography-PKC 2013. Springer, 2013.
Lyu09. Vadim Lyubashevsky. Fiat-shamir with aborts: Applications to lattice and factoring-based signatures. In Mitsuru Matsui, editor, Advances in Cryptology - ASIACRYPT 2009. Springer Berlin Heidelberg, 2009.
Lyu12. Vadim Lyubashevsky. Lattice signatures without trapdoors. In Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2012.
MBKM19. Mary Maller, Sean Bowe, Markulf Kohlweiss, and Sarah Meiklejohn. Sonic: Zero-knowledge snarks from linear-size universal and updateable structured reference strings. Cryptology ePrint Archive, Report 2019/099, 2019. https://eprint.iacr.org/2019/099.
PHGR16. Bryan Parno, Jon Howell, Craig Gentry, and Mariana Raykova. Pinocchio: Nearly practical verifiable computation. Communications of the ACM, 59(2), 2016.
Ste96. Jacques Stern. A new paradigm for public key identification. IEEE Transactions on Information Theory, 42(6), 1996.
WTS $^{+}$18. Riad S. Wahby, Ioanna Tzialla, Abhi Shelat, Justin Thaler, and Michael Walfish. Doubly-efficient zksnarks without trusted setup. In 2018 IEEE Symposium on Security and Privacy, SP 2018, Proceedings, 2018.


[^0]:    * Supported by the BIU Center for Research in Applied Cryptography and Cyber Security in conjunction with the Israel National Cyber Bureau in the Prime Minister's Office.

