# Formal Verification of Side-channel Countermeasures via Elementary Circuit Transformations Jean-Sébastien Coron University of Luxembourg jean-sebastien.coron@uni.lu September 18, 2017 Abstract. We describe a technique to formally verify the security of masked implementations against side-channel attacks, based on elementary circuit transforms. We describe two complementary approaches: a generic approach for the formal verification of any circuit, but for small attack orders only, and a specialized approach for the verification of specific circuits, but at any order. We also show how to generate security proofs automatically, for simple circuits. We describe the implementation of CheckMasks, a formal verification tool for side-channel countermeasures, using the Common Lisp programming language. Using this tool, we show how to formally verify the security of the Rivain-Prouff countermeasure for AES. #### 1 Introduction The masking countermeasure. Masking is the most widely used countermeasure against side-channel attacks for block-ciphers and symmetric-key algorithms. In a first-order countermeasure, all intermediate variables x are masked into $x' = x \oplus r$ where r is a randomly generated value. For such countermeasure, it is usually straightforward to verify its security against first-order attacks; namely it suffices to check that all intermediate variables have the uniform distribution, or at least that their distribution is independent from the key; therefore an attacker processing the side-channel leakage of intermediate variables separately (as in a first-order attack) does not get useful information. However second-order attacks combining the leakage on x' and r can be mounted in practice (see for example [OMHT06]), so it makes sense to design masking algorithms resisting higher-order attacks. This is done by extending Boolean masking to n shares with $x = x_1 \oplus \cdots \oplus x_n$ ; in that case an implementation should be resistant against t-th order attacks, in which the adversary combines leakage information from at most t < n intermediate variables. **Security proofs.** In principle any countermeasure against high-order attacks should have a security proof, but such proof can be either missing, incomplete, or incorrect. In this paper we describe the construction of a tool, called CheckMasks, to formally verify the security of high-order masking schemes. The first step is to specify what it means for a masking countermeasure to have a security proof, *i.e.* what is the security model. Such formalization was initiated by Ishai, Sahai and Wagner in [ISW03]. In this model, the adversary can probe at most t wires in the circuit, but he should not learn anything about the secret key. The approach for proving security is based on simulation: one must show that any set of t wires probed by the adversary can be perfectly simulated without the knowledge of the secret-key. This shows that the t probes do not bring any useful information to the attacker, since he could run this simulation by himself. More precisely, the technique consists in showing that any set of t probes can be perfectly simulated by the knowledge of only a proper subset of the input shares $x_i$ . At the beginning of the algorithm one then uses a pre-sharing of x into n shares $x_i$ that cannot be probed by the adversary, when x is part of the secret-key; therefore any subset of at most n-1 of the input shares $x_i$ are uniformly and independently distributed. This implies that the simulation of the probed variables can be performed without knowing the secret-key. The main result in [ISW03] is to show that any circuit C can be transformed into a new circuit C' of size $\mathcal{O}(t^2 \cdot |C|)$ that is resistant against an adversary probing at most t wires in the circuit. The construction is based on secret-sharing every variable x into n shares with $x = x_1 \oplus \cdots \oplus x_n$ , and processing the shares in a way that prevents a t-limited adversary from leaning any information about the initial variable x, using $n \geq 2t + 1$ shares. Formal verification of masking. The formal verification of the masking countermeasure was initiated by Barthe et al. in [BBD+15]. The authors describe an automated method to prove the security of masked implementation against t-th order attacks, for small values of t (in practice, t < 5). The method only works for small values of t because the number of possible t-uples of intermediate variables to consider grows exponentially with t. To formally prove the security of a masking algorithm, the authors describe an algorithm to construct a bijection between the observations of the adversary (corresponding to a t-uple of intermediate variables) and a distribution that is syntactically independent from the secret inputs; this implies that the adversary learns nothing from this particular t-uple of intermediate variables. All possible t-uples of intermediates variables are then examined by exhaustive search. The authors also describe a divide-and-conquer approach to reduce the complexity of the formal verification (but this remains exponential in the attack order t). The approach initiated by the authors enables to obtain a formal verification of various masked implementations, up to second order masked implementation of AES, and up to 5-th order for the masked Rivain-Prouff multiplication [RP10]. In particular, the authors were able to rediscover some known attacks and discover new ways of attacking already broken schemes. Their approach is implemented in the framework of EasyCrypt [BDG<sup>+</sup>14], and relies on its internal representations of programs and expressions. The main drawback of the previous approach is that it can only work for small orders t, since the running time is exponential in t. To overcome this problem, in a follow-up work [BBD $^+16$ ], Barthe $et\ al.$ studied the composition property of masked algorithms. In particular, the authors introduce the notion of $strong\ simulatability$ , a stronger property which requires that the number of input shares necessary to simulate the observations of the adversary in a given gadget is independent from the number of observations made on output wires. This ensures some separation between the input and the output wires: no matter how many output wires must be simulated (to ensure the composition of gadgets), the number of input wires that must be known to perform the simulation only depends on the number of internal probes within the gadget. The paper [BBD<sup>+</sup>16] has a number of important contributions that we summarize below. Firstly, the authors introduce the t-NI and t-SNI definitions. The t-NI security notion corresponds to the original security definition in the ISW probing model [ISW03]; it requires that any t probes of the gadget circuit can be simulated from at most t of its input shares. The stronger t-SNI notion corresponds to the strong simulatability property mentioned above, in which the number of input shares required for the simulation is upper bounded by the number of probes t in the circuit, and is independent from the number of output variables $|\mathcal{O}|$ that must be simulated (as long as the condition $t + |\mathcal{O}| < n$ is satisfied). We recall these definitions in Section 2, as they are fundamental in our paper. The authors show that the t-SNI definition allows for securely composing masked algorithms; i.e. for a construction involving many gadgets, one can prove that the full construction is t-SNI secure, based on the t-SNI security of its components. The advantages are twofold: firstly the proof becomes modular and much easier to describe. Secondly as opposed to [ISW03] the masking order does not need to be doubled throughout the circuit, as one can work with $n \ge t+1$ shares, instead of $n \ge 2t+1$ shares. Since most gadgets have complexity $\mathcal{O}(n^2)$ , this usually gives a factor 4 improvement in efficiency. In [BBD<sup>+</sup>16], the authors prove the t-SNI property of several useful gadgets: the multiplication of Rivain-Prouff [RP10], the mask refreshing based on the same multiplication algorithm, and the multiplication between linearly dependent inputs from [CPRR13]. Moreover, in [BBD<sup>+</sup>16] the authors also machine-checked the multiplication of Rivain-Prouff and the multiplication-based mask refreshing in the EasyCrypt framework [BDG<sup>+</sup>14]. The main point is that their machine verification works for any order, whereas in [BBD<sup>+</sup>15] the formal verification could only be performed at small orders t, since the number of t-uples to consider grows exponentially with t. However, the approach seems difficult to understand (at least for a non-expert in formal methods), and when reading [BBD<sup>+</sup>16] it is far from obvious how the automated verification of the countermeasure can be implemented concretely; this seems to require a deep knowledge of the EasyCrypt framework. Finally, the authors built an automated approach for verifying that an algorithm constructed by composing provably secure gadgets is itself secure. They also implemented an algorithm for transforming an input program P into a program P' secure at order t; their algorithm automatically inserts mask refreshing gadgets whenever required. Our contributions. Our main goal in this paper is to achieve essentially the same formal verification results as in [BBD<sup>+</sup>15] and [BBD<sup>+</sup>16], but in a much simpler way. We describe two complementary approaches: a generic approach for the formal verification of any circuit, but for small attack orders only (as in [BBD<sup>+</sup>15]), and a specialized approach for the verification of specific circuits, but at any order (as in [BBD<sup>+</sup>16]). We describe the implementation of CheckMasks, a formal verification tool for side-channel countermeasures, using the Common Lisp programming language. Using this tool, we show how to formally verify the security of the Rivain-Prouff countermeasure for AES. For the generic verification at small orders, our approach is essentially the same as in [BBD<sup>+</sup>15], except that we use the Common Lisp programming language instead of the C language. In principle, this enables to get a much shorter implementation, because Common Lisp is very well suited to formal manipulations. In particular, we are able to formally verify the security of the Rivain-Prouff multiplication [RP10] with very few lines of code. Our running times for formal verification are similar to those in [BBD<sup>+</sup>15]. For the verification of specific gadgets at any order, our technique is quite different from $[BBD^+16]$ and consists in applying elementary transforms to the circuit, until the t-NI or t-SNI properties become straightforward to verify. We show that for a set of well-chosen elementary transforms, the formal verification time becomes polynomial in t (instead of exponential with the generic approach). This implies that there is no gap between the running time of the formal verification and the practical running time of the countermeasure. In particular, we provide a formally verified proof of t-SNI property of the multiplication algorithm in the Rivain-Prouff countermeasure, and of the mask refreshing based on the same multiplication algorithm; in both cases the running time of the formal verification is polynomial in the number of shares n. Finally, we show how to get the best of both worlds, at least for simple circuits. Namely we show how to automatically apply the circuit transforms that lead to a polynomial time verification, based on a limited set of generic rules. More precisely, we identify a set of three simple rules that enable to automatically prove the t-SNI property of the multiplication based mask refreshing [BBD<sup>+</sup>16], and also two security properties of mask refreshing considered in [Cor17a]. **Source Code.** The source code of our CheckMasks verification tool is publicly available at [Cor17b], under the GPL v2.0 license. # 2 Security Properties In this section we recall the t-NI and t-SNI security definitions from [BBD<sup>+</sup>16]. For simplicity we only provide the definitions for a simple gadget taking as input a single variable x (given by n shares $x_i$ ) and outputting a single variable y (given by n shares $y_i$ ). Given a vector of n shares $(x_i)_{1 \le i \le n}$ , we denote by $x_{|I|} := (x_i)_{i \in I}$ the sub-vector of shares $x_i$ with $i \in I$ . **Definition 1** (t-NI security). Let G be a gadget taking as input $(x_i)_{1 \leq i \leq n}$ and outputting the vector $(y_i)_{1 \leq i \leq n}$ . The gadget G is said t-NI secure if for any set of t intermediate variables, there exists a subset I of input indices with $|I| \leq t$ , such that the t intermediate variables can be perfectly simulated from $x_{|I|}$ . **Definition 2** (t-SNI security). Let G be a gadget taking as input $(x_i)_{1 \leq i \leq n}$ and outputting $(y_i)_{1 \leq i \leq n}$ . The gadget G is said t-SNI secure if for any set of t intermediate variables and any subset $\mathcal{O}$ of output indices such that $t + |\mathcal{O}| < n$ , there exists a subset I of input indices with $|I| \leq t$ , such that the t intermediate variables and the output variables $y_{|\mathcal{O}|}$ can be perfectly simulated from $x_{|I|}$ . The t-NI security notion corresponds to the original security definition in the ISW probing model; based on the ISW multiplication gadget, it allows to prove the security of a transformed circuit with $n \geq 2t+1$ shares. The stronger t-SNI notion allows to prove the security with $n \geq t+1$ shares only [BBD<sup>+</sup>16]. The difference between the two notions is as follows: in the stronger t-SNI notion, the size of the input shares subset I can only depend on the number of internal probes t and is independent of the number of output variables $|\mathcal{O}|$ that must be simulated (as long as the condition $t + |\mathcal{O}| < n$ is satisfied). The t-SNI security notion is very convenient for proving the security of complex constructions, as one can prove that the t-SNI security of a full construction based on the t-SNI security of its components. In this paper, for simplicity we always work in a finite field of characteristic 2, and we use the $\oplus$ and + operators indistinctly; our techniques could be easily adapted to any finite field. ### 3 Formal Verification of Generic Circuits for Small Order In this section, we show that the t-NI and t-SNI properties can be easily verified formally for any circuit, using a generic approach. As in [BBD<sup>+</sup>15] the complexity of the formal verification is exponential in the number of shares n, so this can only work for small n. In Section 4 we will show how to formally verify the above properties in time polynomial in n, but for specific circuits. # 3.1 The RefreshMasks Algorithm To illustrate our approach we consider the RefreshMasks algorithm below from [RP10]; see Figure 1 for an illustration. The RefreshMasks algorithm was also used is the randomized table countermeasure from [Cor14]. We first recall a straightforward property of the RefreshMasks algorithm: when the intermediate variables of the algorithm are not probed, any subset of n-1 output shares $y_i$ of RefreshMasks is uniformly and independently distributed. We first provide a pen-and-paper proof; we then explain in the next section how this property can be formally verified in our tool. **Lemma 1.** Let $(y_i)_{1 \leq i \leq n}$ be the output of RefreshMasks. Any subset of n-1 output shares $y_i$ is uniformly and independently distributed. ### Algorithm 1 RefreshMasks ``` Input: x_1, \ldots, x_n, where x_i \in \{0, 1\}^k Output: y_1, \ldots, y_n such that y_1 \oplus \cdots \oplus y_n = x_1 \oplus \cdots \oplus x_n 1: y_n \leftarrow x_n 2: for i = 1 to n - 1 do 3: r_i \leftarrow \{0, 1\}^k 4: y_i \leftarrow x_i \oplus r_i 5: y_n \leftarrow y_n \oplus r_i 6: end for 7: return y_1, \ldots, y_n ``` Fig. 1. The RefreshMasks algorithm, with the randoms $r_i$ accumulated on the last column. *Proof.* Let $S \subsetneq [1, n]$ be the corresponding subset. We distinguish two cases. If $n \notin S$ , we have $y_i = x_i \oplus r_i$ for all $i \in S$ , and therefore those $y_i$ 's are uniformly and independently distributed. If $n \in S$ , let $i^* \notin S$ . We have $y_i = x_i \oplus r_i$ for all $i \in S \setminus \{n\}$ . Moreover by definition: $$y_n = \left(x_n \oplus \bigoplus_{i=1, i \neq i^*}^{n-1} r_i\right) \oplus r_{i^*}$$ where $r_{i^*}$ is not used in another $y_i$ for $i \in S$ . Therefore the n-1 output $y_i$ 's are uniformly and independently distributed. #### 3.2 Formal Verification of Circuits We represent a circuit with nested lists, using the prefix notation. Consider the circuit taking as input x and y and outputting x + y; we represent it as (+ X Y). Similarly the circuit computing $x \cdot y$ is represented as (\* X Y). To represent more complex circuits the lists are recursively nested. For example, to represent the circuit $x + y \cdot z$ , we write (+ X (\* Y Z)). If a circuit has many outputs, we represent the list of outputs without any prefix operator; for example, the circuit outputting $(x + y, x \cdot y)$ can be represented as ((+ X Y) (\* X Y)). It is easy to write a program in Common Lisp that generates the circuit corresponding to RefreshMasks; we refer to [Cor17b] for the source code. For example, we obtain for n=3 input shares: ``` > (RefreshMasks '(x1 x2 x3)) ((+ R1 X1) (+ R2 X2) (+ R2 (+ R1 X3))) ``` Note that the above RefreshMasks function in Common Lisp takes as input a list of n shares (here n=3) and outputs a list of n shares; therefore it can be easily composed with other such Common Lisp functions to create a more complex circuit. We now show how Lemma 1 can be formally verified. Consider for example the two output variables (+ R1 X1)) and (+ R2 (+ R1 X3)) from above. We would like to show that these two variables are uniformly and independently distributed. Since the random R2 is used only once in those two outputs, it can play the role of a one-time pad, and we can perform the following substitution in the second output: $$(+ R2 (+ R1 X3)) \longrightarrow R2$$ Namely, since R2 is used only once, the distribution of (+ R2 (+ R1 X3)) is the same as the distribution of R2; therefore the knowledge of X3 is not needed to perform the simulation. Starting with the above list of two output variables, we can perform the following sequence of elementary substitutions: $$((+ R1 X1) (+ R2 (+ R1 X3))) \longrightarrow ((+ R1 X1) R2) \longrightarrow (R1 R2)$$ which shows that neither X1 nor X3 is required for the simulation of the two output variables; moreover since we have obtained two distinct randoms (R1 R2) at the end, the two probes are uniformly and independently distributed. These transforms on lists are easy to implement in Common Lisp. Namely it suffices to perform a tree search to count the number of times a given random R is used. If a random R is used only once, we can then perform the substitution: $$(+ R X) \longrightarrow R$$ (1) and such substitution can be recursively applied until the rule cannot be applied anymore. To formally verify Lemma 1, it suffices to consider all possible subsets of n-1 output shares $y_i$ among n, and check that for every subset, we obtain after a series of such elementary substitutions a list of n-1 distinct randoms. Since there are n such subsets, in this particular case the formal verification is polynomial time. We obtain for example for n=3: ``` > (Check-RefreshMasks-Uni 3) Input: (X0 X1 X2) Output: ((+ R1 X0) (+ R2 X1) (+ R2 (+ R1 X2))) Case 0: ((+ R2 X1) (+ R2 (+ R1 X2))) => ((+ R2 X1) (+ R2 R1)) => ((+ R2 X1) R1) => (R2 R1) Case 1: ((+ R1 X0) (+ R2 (+ R1 X2))) => ((+ R1 X0) R2) => (R1 R2) Case 2: ((+ R1 X0) (+ R2 X1)) => ((+ R1 X0) R2) => (R1 R2) ``` The above transcript shows that Lemma 1 is formally verified for n=3, as in all possible cases we obtain a list of distinct randoms after a sequence of elementary substitutions; see [Cor17b] for the source code. #### 3.3 Security properties of RefreshMasks We first recall another straightforward property of RefreshMasks, namely that it achieves the t-NI property. **Lemma 2** (t-NI of RefreshMasks). Let $(x_i)_{1 \leq i \leq n}$ be the input of RefreshMasks and let $(y_i)_{1 \leq i \leq n}$ be the output. For any set of t intermediate variables, there exists a subset I of input indices such that the t intermediate variables can be perfectly simulated from $x_{|I|}$ , with $|I| \leq t$ . *Proof.* The set I is constructed as follows. If for some $1 \le i \le n-1$ , any of the variables $x_i$ , $r_i$ or $y_i$ is probed, we add i to I. If $x_n$ or $y_n$ or any intermediate variable $y_{n,j}$ is probed, we add n to I. Since we add at most one index to I per probe, we must have $|I| \le t$ . The simulation of the probed variable is straightforward. All the randoms $r_i$ for $1 \le i \le n-1$ can be simulated as in the real algorithm, by generating a random element from $\{0,1\}^k$ . If $y_i$ is probed, then we must have $i \in I$ , so it can be perfectly simulated from $y_i = x_i \oplus r_i$ , from the knowledge of $x_i$ . Similarly, if any intermediate variable $y_{n,j}$ is probed, then we must have $n \in I$ , so it can be perfectly simulated from $x_n$ . Therefore all probes can be perfectly simulated from $x_{II}$ . While the RefreshMasks algorithm achieves the t-NI property, it is easy to see that it does not achieve the stronger t-SNI property, as already observed in [BBD<sup>+</sup>16]. Namely one can probe the output $y_1 = r_1 \oplus x_1$ and the internal variable $y_{n,1} = r_1 \oplus x_n$ ; see Fig. 1. This gives $y_1 \oplus y_{n,1} = x_1 \oplus x_n$ and therefore the knowledge of both inputs $x_1$ and $x_n$ is required for the simulation, while only t=1 internal variables has been probed (and therefore, at most one input $x_i$ can be used for the simulation to achieve the t-SNI property). More generally, using our CheckMasks formal tool, we can obtain the list of all (n-1)-uples of probes that contradict the t-SNI property of RefreshMasks with n input shares. For example, we obtain for n=4: ``` > (Check-Refreshmasks-Non-SNI 4 : all 't) Input: (X1 X2 X3 X4) Output: ((+ R1 X1) (+ R2 X2) (+ R3 X3) (+ R3 (+ R2 (+ R1 X4)))) ((+ R1 X1) X2 (+ R1 X4)) ((+ R1 X1) (+ R2 X2) (+ R1 X4)) ((+ R1 X1) (+ R2 X2) (+ R2 (+ R1 X4))) ((+ R1 X1) X3 (+ R1 X4)) ((+ R1 X1) (+ R3 X3) (+ R1 X4)) ((+ R1 X1) (+ R3 X3) (+ R1 X4)) ((+ R1 X1) (+ R1 X4) (+ R3 (+ R2 (+ R1 X4)))) ``` Consider for example the first 3-uple of probes ((+ R1 X1) X2 (+ R1 X4)). We see that the substitution rule (1) does not apply as the random R1 occurs twice. Therefore the simulation of this 3-uple requires the knowledge of the 3 inputs $x_1$ , $x_2$ and $x_4$ ; this contradicts the t-SNI property, as only t = 2 intermediate variables $x_2$ and $y_{4,1} = x_4 \oplus r_1$ are probed, since $y_1 = x_1 \oplus r_1$ is an output variable. Consider also the second 3-uple ((+ R1 X1) (+ R2 X2) (+ R1 X4)). Since R2 occurs only once, we can perform the substitution: $$((+ R1 X1) (+ R2 X2) (+ R1 X4)) \Rightarrow ((+ R1 X1) R2 (+ R1 X4))$$ However, since R1 appears twice we cannot perform any further substitution, and the knowledge of both $x_1$ and $x_4$ is required to simulate the 3-uple; this again contradicts the t-SNI property, as only t=1 intermediate variable $y_{4,1}=x_4\oplus r_1$ has been probed and the other 2 variables are output variables. ### 3.4 The FullRefresh Algorithm We recall below an improved mask refreshing algorithm that satisfies the t-SNI property, as opposed to the previous RefreshMasks, as shown in [BBD<sup>+</sup>16]. The algorithm is based on the masked multiplication from [ISW03] and was already used in [ISW03] and [DDF14]. Note that the algorithm has complexity $\mathcal{O}(n^2)$ instead of $\mathcal{O}(n)$ . **Lemma 3** (t-SNI of FullRefresh [BBD<sup>+</sup>16]). Let $(x_i)_{1 \le i \le n}$ be the input shares of the FullRefresh operation, and let $(y_i)_{1 \le i \le n}$ be the output shares. For any set of t intermediate variables and any subset $\mathcal{O}$ of output shares such that $t + |\mathcal{O}| < n$ , there exists a subset I of indices with $|I| \le t$ , such that the t intermediate variables as well as the output shares $y_{|\mathcal{O}|}$ can be perfectly simulated from $x_{|I|}$ . ### Algorithm 2 FullRefresh ``` Input: x_1, \ldots, x_n Output: y_1, \ldots, y_n such that \bigoplus_{i=1}^n y_i = \bigoplus_{i=1}^n x_i 1: for i = 1 to n do y_i \leftarrow x_i 2: for i = 1 to n do \mathbf{for}\ j=i+1\ \mathrm{to}\ n\ \mathbf{do} 3: r \leftarrow \{0,1\}^k \triangleright Referred by r_{i,j} 4: \triangleright Referred by y_{i,j} 5: y_i \leftarrow y_i \oplus r 6: \triangleright Referred by y_{j,i} y_j \leftarrow y_j \oplus r 7: end for 8: end for 9: return y_1, \ldots, y_n ``` Formal Verification of FullRefresh. In the following, we describe the formal verification of Lemma 3 using our CheckMasks tool. As previously we first implement the FullRefresh algorithm in Common Lisp; for example, we get the following output for n = 3 shares: ``` > (FullRefresh '(x1 x2 x3)) ((+ R2 (+ R1 X1)) (+ R3 (+ R1 X2)) (+ R3 (+ R2 X3))) ``` Using our CheckMasks tool, the t-SNI property in Lemma 3 can be easily verified for small values of n. Namely it suffices to compute the list of all (n-1)-uples of intermediate variables (including the outputs $y_i$ ) and check that every such (n-1)-uple can be perfectly simulated from the knowledge of at most t inputs $x_i$ , where t is the number of non-output variables in the (n-1)-uple. For example, considering the two variables (+ R2 (+ R1 X1)) and (+ R1 X2) in the circuit above for n=3, since (+ R2 (+ R1 X1)) is an output variable, the simulation must be performed using at most a single input $x_i$ . We obtain using elementary substitutions: $$((+ R2 (+ R1 X1)) (+ R1 X2)) => (R2 (+ R1 X2)) => (R2 R1)$$ and therefore no input $x_i$ is actually needed to simulate those two variables. Note that the running time to consider all possible (n-1)-uples of intermediate variables is exponential in n. We summarize in Table 1 the running time of the formal verification of FullRefresh, up to n=6. Although we are only able to verify Lemma 3 for small values of n, this still provides some confidence in the correctness of Lemma 3 for any n. In Section 4.3 we will show how to formally verify Lemma 3 in time polynomial in n, so that the formal verification can be performed for any number of shares n used in practice. | n | #variables | #tuples | Security | Time | |---|------------|-----------|----------|--------------------| | 3 | 12 | 66 | ✓ | ε | | 4 | 22 | 1,540 | ✓ | 0.02 s | | 5 | 35 | 52,360 | ✓ | $0.6 \mathrm{\ s}$ | | 6 | 51 | 2,349,060 | ✓ | 46 s | **Table 1.** Formal verification of the t-SNI property of FullRefresh, for small values of n. #### 3.5 Other Properties of RefreshMasks We now return to the RefreshMasks algorithm (Alg. 1) and consider a non-trivial property of RefreshMasks that was used in the Boolean to arithmetic conversion algorithm from [Cor17a]. The property is the following: if the output $y_n$ is among the t probed variables, then we can simulate those t probed variables with t-1 input shares only, instead of t as in Lemma 2. This property was crucial for obtaining a provably secure Boolean to arithmetic conversion algorithm in [Cor17a]. **Lemma 4** (RefreshMasks [Cor17a]). Let $x_1, \ldots, x_n$ be the input of a RefreshMasks where the randoms are accumulated on $x_n$ , and let $y_1, \ldots, y_n$ be the output. Let t be the number of probed variables, with t < n. If $y_n$ is among the probed variables, then there exists a subset I such that all probed variables can be perfectly simulated from $x_{|I|}$ , with $|I| \le t - 1$ . Remark 1. The lemma does not hold for other output variables. For example the adversary can probe both $y_1 = x_1 \oplus r_1$ and $y_{n,1} = x_n \oplus r_1$ . Since $y_1 \oplus y_{n,1} = x_1 \oplus x_n$ , both $x_1$ and $x_n$ are required for the simulation, which contradicts the bound $|I| \leq t - 1$ . Using our CheckMasks formal tool, Lemma 4 can be easily verified for small values of n. Namely we can check that all t-uples of probes containing $y_n$ require at most t-1 inputs $x_i$ to be simulated. We first claim that it is sufficient to check this property for t=n-1 only, instead of all $1 \le t \le n-1$ . Namely, assume that the property is not satisfied for some t < n-1; then there exists a set of t probes which can only be simulated by a subset I of inputs with $|I| \ge t$ . If |I| = n, then this also holds for some superset of n-1 probes. If $|I| \le n-1$ , then we can complement the set of t probes with n-1-t additional probes, among which n-1-|I| are directly on some input shares $x_i$ for $i \notin I$ . We obtain a set of t' = n-1 probes which can only be simulated by a subset I' of the inputs, with |I'| = n-1. In both cases this would contradict Lemma 4 for t = n-1. ``` > (Check-RefreshMasks-L2 4) Input: (X0 X1 X2 X3) Output: ((+ R1 X0) (+ R2 X1) (+ R3 X2) (+ R3 (+ R2 (+ R1 X3)))) (X0 X1 X2) (X0 X1 X3) (X0 X2 X3) ((+ R1 X0) X1 (+ R1 X3)) ((+ R1 X0) (+ R2 X1) (+ R2 (+ R1 X3))) ((+ R1 X0) X2 (+ R1 X3)) (X1 X2 X3) ``` **Fig. 2.** Formal verification of Lemma 4 for n=4. We compute the list of 3-uples of probes whose simulation require the knowledge of at least 3 inputs; none of these 3-uples contains the last output (+ R3 (+ R2 (+ R1 X3))) of the circuit. To formally verify Lemma 4, our CheckMasks tool computes in Figure 2 the list of t-uples that require at least t inputs $x_i$ to be simulated, for n=4 shares, with t=n-1=3. We see that as required none of these t-uples include the output $y_n$ ; therefore Lemma 4 is formally verified for n=4. Note that since the number of intermediate variables in RefreshMasks is 4n-3 and we must consider all possible subsets of n-1 variables, the formal verification of Lemma 4 takes $\binom{4n-3}{n-1} \simeq 2^{3\cdot 2n}$ time and is therefore exponential in n. We summarize the observed running times in Table 2, up to n=8. In Section 4.4 we will show how to formally verify the correctness of Lemma 4 for any value of n. We also verify Lemma 5 from [Cor17a]. We consider the RefreshMasks algorithm taking as input n+1 shares (instead of n), but we fix $x_{n+1}=0$ . In that case, any t probes in the circuit can be simulated from t-1 input shares (instead of t), except in the trivial case of the adversary probing the input $x_i$ 's only. **Lemma 5** (RefreshMasks [Cor17a]). Let $x_1, \ldots, x_n$ be n inputs shares, and let $x_{n+1} = 0$ . Consider the circuit $y_1, \ldots, y_{n+1} \leftarrow \text{RefreshMasks}_{n+1}(x_1, \ldots, x_n, x_{n+1})$ , where the randoms are | n | #variables | #tuples | Security | Time | |---|------------|-----------|----------|-------| | 3 | 9 | 36 | ✓ | ε | | 4 | 13 | 286 | ✓ | ε | | 5 | 17 | 2,380 | ✓ | ε | | 6 | 21 | 20,349 | ✓ | 0.1 s | | 7 | 25 | 177,100 | ✓ | 0.9 s | | 8 | 29 | 1,560,780 | ✓ | 10 s | **Table 2.** Formal verification of Lemma 4, for small values of n. accumulated on $x_{n+1}$ . Let t be the number of probed variables. There exists a subset I such that all probed variables can be perfectly simulated from $x_{|I|}$ , with $|I| \le t - 1$ , except if only the input $x_i$ 's are probed. From the previous reasoning, we only have to verify Lemma 5 for t = n; we summarize the timings in Table 3. In Appendix A, we also describe a formal verification of some other properties of RefreshMasks, namely lemmas 7 and 8 from [Cor17a], again for small values of n. | n | #variables | #tuples | Security | Time | |---|------------|-----------|----------|--------------------| | 3 | 12 | 220 | ✓ | ε | | 4 | 16 | 1,820 | ✓ | ε | | 5 | 20 | 15,504 | ✓ | $0.3 \mathrm{\ s}$ | | 6 | 24 | 134,596 | ✓ | 2.9 s | | 7 | 28 | 1,184,040 | ✓ | 34 s | **Table 3.** Formal verification of Lemma 5, for small values of n. #### 3.6 The Rivain-Prouff Countermeasure The Rivain-Prouff countermeasure for the AES block-cipher is based on the SecMult algorithm below [RP10]; it is an extension over $\mathbb{F}_{2^k}$ of the masked AND gate from [ISW03]. It enables to securely compute a n-sharing of the product $c = a \cdot b$ over $\mathbb{F}_{2^k}$ , from an n-sharing of a and b. # Algorithm 3 SecMult ``` Require: shares a_i satisfying \bigoplus_{i=1}^n a_i = a, shares b_i satisfying \bigoplus_{i=1}^n b_i = b Ensure: shares c_i satisfying \bigoplus_{i=1}^n c_i = a \cdot b 1: for i = 1 to n do c_i \leftarrow a_i \cdot b_i 3: end for 4: for i = 1 to n do for j = i + 1 to n do \triangleright referred by r_{i,j} 6: r \leftarrow \mathbb{F}_{2^k} \triangleright referred by c_{i,j} 7: c_i \leftarrow c_i \oplus r r \leftarrow (a_i \cdot b_j + r) + a_j \cdot b_i \triangleright referred by r_{j,i} 8: \triangleright referred by c_{j,i} 9: c_j \leftarrow c_j \oplus r 10: end for 11: end for 12: return (c_1, \ldots, c_n) ``` It was shown in [BBD<sup>+</sup>16] that the SecMult algorithm is t-SNI secure for any t < n; see also [CGPZ16] for a slightly more detailed security proof. **Lemma 6** (t-SNI of SecMult [BBD<sup>+</sup>16]). Let $(a_i)_{1 \leq i \leq n}$ and $(b_i)_{1 \leq i \leq n}$ be the input shares of the SecMult operation, and let $(c_i)_{1 \leq i < n}$ be the output shares. For any set of t intermediate variables and any subset $\mathcal{O}$ of output shares such that $t + \mathcal{O} < n$ , there exist two subsets I and J of indices with $|I| \leq t$ and $|J| \leq t$ , such that those t intermediate variables as well as the output shares $c_{|\mathcal{O}|}$ can be perfectly simulated from $a_{|I|}$ and $b_{|J|}$ . Formal verification of SecMult. As previously, the first step is to implement the SecMult algorithm in Common Lisp; this requires only 12 lines of Common Lisp (see [Cor17b] for the source code). For n = 3, we obtain: As previously, to formally verify the t-SNI property of SecMult as stated in Lemma 6, it suffices to compute the list of all (n-1)-uples of intermediate variables (including the output $c_i$ 's) and check that every such (n-1)-uple can be perfectly simulated from the knowledge of at most t inputs $a_i$ and at most t inputs $b_j$ , where t is the number of non-output variables in the (n-1)-uple. For example, if we probe the non-output variables (+R1 (\*A1 B1)) and (+(\*A1 B2) R1) (\*A2 B1)) from above, we cannot perform any substitution because the random R1 is used twice, so we must know the inputs (A1 A2) and (B1 B2). On the other hand, if we consider the first two outputs, we have the substitutions: ``` ((+ R2 (+ R1 (* A1 B1))) (+ R3 (+ (* A2 B2) (+ (+ (* A1 B2) R1) (* A2 B1))))) => ((+ R2 (+ R1 (* A1 B1))) R3) => (R2 R3) ``` and therefore no inputs $a_i$ or $b_i$ is needed, as required for the t-SNI property (since we have considered output variables only). We obtain the following timings for the formal verification of SecMult using our CheckMasks tool: | n | #variables | #tuples | Security | Time | |---|------------|-----------|----------|--------------------| | 3 | 30 | 435 | ✓ | ε | | 4 | 54 | 24,804 | ✓ | $0.5 \mathrm{\ s}$ | | 5 | 85 | 2,024,785 | ✓ | 80 s | **Table 4.** Formal verification of the t-SNI property of SecMult, for small values of n. As previously, using this generic approach we can only verify Lemma 6 for small values of n. In Section 4.5 we describe a specialized approach to formally verify the security of SecMult for any value of n. ### 3.7 Discussion Our approach is essentially the same as in [BBD<sup>+</sup>15], except that we use the Common Lisp language instead of the C language, in order to get a simpler implementation; namely Common Lisp is very well suited to formal manipulations. Another difference is that we verify the t-NI or t-SNI properties directly on the input shares $x_i$ , while in [BBD<sup>+</sup>15] the input shares $x_i$ come from a pre-sharing of the original secret variable x, where the pre-sharing cannot be probed by the adversary. More precisely, the pre-sharing would compute: $$(x_1,\ldots,x_n) \leftarrow (x \oplus r_1,r_2,\ldots,r_{n-1},r_1 \oplus \cdots \oplus r_{n-1})$$ and in [BBD<sup>+</sup>15] the authors do not consider the input shares $x_i$ but rather the variables $(x \oplus r_1, r_2, \dots, r_{n-1}, r_1 \oplus \dots \oplus r_{n-1})$ . Their approach consists in checking that in the masking algorithm the distribution of any set of n-1 probes can be made syntactically independent from the original secret variable x, while in our approach we show that the distribution of any set of n-1 probes only depends on at most n-1 of the input shares $x_i$ ; this in turn also implies that the distribution is independent from the original x, since any subset of n-1 input shares $x_i$ is uniformly and independently distributed. In principle the two approaches are equivalent when verifying the t-NI property, but the second approach also enables to easily verify the stronger t-SNI property. # 4 Formal Verification in Polynomial Time The main drawback of the previous approach is that it has exponential complexity in the number of shares n, because the number of t-uples to consider grows exponentially with n. In this section we describe a new approach for proving the security of a side-channel countermeasure. Instead of performing a simulation of the probed variables as in [ISW03], our approach consists in applying a sequence of elementary circuit transforms, until the transformed circuit becomes so simple that the security property becomes straightforward to verify. The main advantage is that in the context of formal verification, our new approach seems much easier to verify formally than the classical simulation-based approach from [ISW03]. Our technique is based on the following two elementary transforms: - The Random-zero transform: we set to 0 a subset of the randoms $r_i$ used in the circuit. - The One-time-pad transform: if a random r appears only once in a circuit, and moreover r is not probed, we can replace any variable $x \oplus r$ by r. We show that the security of the gadgets considered in the previous section can always be verified in polynomial time in n thanks to these two transforms. #### 4.1 The Random-zero Transform Our first circuit transformation consists in setting to 0 a subset of the randoms $r_i$ used in the circuit. This enables to significantly simplify the circuit; namely the variable $x \oplus r$ can be replaced by x, and we can then remove duplicate variables in the circuit. In the following we show that it is sufficient to verify the security of a masking algorithm when a subset of its randoms are set to 0, if the circuit is additively masked. **Definition 3 (Additive masking).** Let C be a circuit taking as input $x_1, \ldots, x_n$ . We say that C is additively masked if every intermediate variable y in the circuit can be written as $y = f(x_1, \ldots, x_n) + g(r_1, \ldots, r_n)$ , where g is a linear function. For example, the circuit computing $y = x_1 \cdot x_2 + r_1 + r_2$ is additively masked, while the circuit computing $y = x \cdot r$ is not. Most side-channel countermeasures for block-ciphers are additively masked. In particular, this is true for the RefreshMasks, FullRefresh and SecMult algorithms considered in the previous sections. In this paper, we only consider additively masked circuits. **Lemma 7 (Random-zero transform).** Let C be an additively masked circuit and let $C_0$ be the same circuit as C but with a subset of the randoms fixed to 0. Anything an adversary can compute from a set of probes in C, he can compute from the same set of probes in the circuit $C_0$ . Lemma 7 shows that it is sufficient to consider the security of the simpler circuit $C_0$ where a subset of the randoms are fixed to 0. Namely if there was an attack against the original circuit C, then the same attack would apply against $C_0$ . Note that this does not hold for general circuits; consider for example the circuit taking as input sk and outputting $(sk \cdot r, r)$ ; when considering the output only, the circuit would be secure when r is fixed to 0, but the output leaks the secret sk whenever $r \neq 0$ . *Proof* (of Lemma 7). Let y be a vector of probed intermediate variables. Let x be the vector of inputs of the circuit and let r be the vector of randoms used in the circuit. Since the circuit is additively masked, we can write: $$y = h(x, r) = f(x) + g(r)$$ for some functions h, f and g, where g is linear. We write $\mathbf{r} = \mathbf{r'} + \mathbf{r''}$ where the randoms corresponding to $\mathbf{r'}$ are distributed as in the real circuit, while the randoms corresponding to $\mathbf{r''}$ are distributed as in the real circuit in C and set to 0 in $C_0$ . Since g is a linear function, we have: $$h(x,r) = h(x,r'+r'') = f(x) + g(r'+r'') = f(x) + g(r') + g(r'')$$ which gives: $$h(\boldsymbol{x}, \boldsymbol{r}) = h(\boldsymbol{x}, \boldsymbol{r'}) + g(\boldsymbol{r''}) \tag{2}$$ In the circuit C, the adversary obtain the probes $\mathbf{y} = h(\mathbf{x}, \mathbf{r})$ , while in the circuit $C_0$ the adversary obtains the probes $\mathbf{y}_0 = h(\mathbf{x}, \mathbf{r}')$ . From (2), we have that anything the adversary can compute from $\mathbf{y} = h(\mathbf{x}, \mathbf{r})$ , he can compute from $\mathbf{y}_0 = h(\mathbf{x}, \mathbf{r}')$ , simply by first computing: $$\boldsymbol{y} \leftarrow \boldsymbol{y}_0 + g(\boldsymbol{r''})$$ using for r'' the same distribution as in the real circuit. This proves Lemma 7. **Application:** t-NI of RefreshMasks. The t-NI property of RefreshMasks, as stated in Lemma 2, is easily verified formally using the Random-zero transform. Namely, if we fix all randoms of RefreshMasks to 0, we obtain the identity function, which is trivially t-NI. For example, we obtain for n=4: ``` > (check-refreshmasks-tni-poly 4) Input: (X1 X2 X3 X4) Output: ((+ R1 X1) (+ R2 X2) (+ R3 X3) (+ R3 (+ R2 (+ R1 X4)))) Random zero => (X1 X2 X3 X4) Identity function: T ``` Note that the verification is performed in polynomial time in n, while in the generic approach the complexity would be exponential in n when examining all possible t-uples. #### 4.2 The One-time Pad Transform The One-time Pad transform is defined as follows: if a random r is used only once in a circuit, and moreover r is not probed, then we can replace the variable $x \oplus r$ by r. Note that in principle the variable x can still be probed, so it must not be removed from the circuit. Consider for example a circuit taking as input $x_1$ , $x_2$ and outputting $(x_1 \oplus r) \oplus x_2$ ; it can be represented as (+ (+ X1 R) X2). If we assume that R is not probed, we can replace (+ (+ X1 R) X2) by (+ R X2); however if the original (+ X1 R) can still be probed, we cannot assume that R will not be probed in the new circuit (+ R X2), so we cannot apply the same transformation again on (+ R X2). Moreover, even if the variable X1 does not appear anymore in the circuit (+ R X2), in principle it can still be probed, so we must keep it separately on a list of variables that can be probed. In general we cannot assume that a certain random r is not probed by the adversary. We can only make this assumption when we have an upper bound on the number of probes in the circuit, as it is the case for the t-NI and t-SNI properties. For example, if a circuit contains n randoms $r_i$ but the adversary has only access to t = n - 1 probes, then we are guaranteed that at least one of the random $r_i$ has not been probed, and we can apply the previous One-time Pad transform on this random (if $r_i$ is not used twice in the circuit). The proof technique then consists in considering all possible n cases separately (corresponding to the non-probed $r_i$ , for $1 \le i \le n$ ), and then applying the admissible One-time Pad transform in each case. In the next sections, we illustrate this approach by providing a formal verification of the same security properties of the RefreshMasks, FullRefresh and SecMult algorithms as considered in Section 3, but this time with complexity polynomial in n, instead of exponential. This implies that the security of these algorithms can be formally verified for any value of n for which the countermeasure would be used in practice. #### 4.3 Formal Verification of Lemma 3 for FullRefresh In this section we provide a formal proof of Lemma 3 for the t-SNI property of FullRefresh (see Alg. 2 in Section 3.4); as opposed to Section 3.4 the formal verification time is now polynomial in n. The proof strategy is to perform a sequence of elementary circuit transforms until we obtain a simple circuit C for which the t-SNI property is straightforward to verify. The proof can then be formally verified by computing those circuit transforms in Common Lisp and checking that we indeed obtain this simple circuit C. In the case of FullRefresh we obtain the following simple circuit C-otp: **Lemma 8** (t-SNI of $C_{otp}$ ). Let $C_{otp}$ be the circuit taking as input as input $x_1, \ldots, x_n$ and outputting $y_i = x_i \oplus r_i$ for all $1 \le i \le n$ , where the randoms $r_i$ are uniformly and independently distributed. The circuit C is t-SNI for any $t \le n$ . *Proof.* The proof is straightforward. If $x_i$ or $r_i$ or $y_i = x_i \oplus r_i$ is probed, we put i in I. We obtain $|I| \leq t$ . From the knowledge of $x_{|I|}$ we can simulate any probed variable $x_i$ , $r_i$ and $y_i = x_i \oplus r_i$ since in that case $i \in I$ . Consider now any $i \in \mathcal{O} \setminus I$ ; in that case $y_i = x_i \oplus r_i$ can be simulated by a random value since $r_i$ is not probed, because $i \in \mathcal{O} \setminus I$ . We recall the t-SNI property of FullRefresh below from Section 3.4; this t-SNI property was already proved in [BBD<sup>+</sup>16]. Below we provide an alternative proof of Lemma 3, based on elementary circuit transforms, so that the proof can be formally verified in our CheckMasks tool. **Lemma 3** (t-SNI of FullRefresh). Let $(x_i)_{1 \leq i \leq n}$ be the input shares of the FullRefresh operation, and let $(y_i)_{1 \leq i \leq n}$ be the output shares. For any set of t intermediate variables and any subset $\mathcal{O}$ of output shares such that $t + |\mathcal{O}| < n$ , there exists a subset I of indices with $|I| \leq t$ , such that the t intermediate variables as well as the output shares $y_{|\mathcal{O}|}$ can be perfectly simulated from $x_{|I|}$ . *Proof.* We first construct a subset I of indices as follows. We refer to the definition of Alg. 2 for the notations. If $x_i$ or any intermediate variable $y_{i,j}$ is probed (including $y_i$ ), we add the i to I. Since we have considered at most t probes, we obtain $|I| \leq t$ . Moreover we have **Fig. 3.** Proof of Lemma 3: after removing the row $i^*$ and setting all randoms to 0 except on the column $i^*$ , there remains only a one-time pad of the n-1 inputs $x_i$ for $i \neq i^*$ , corresponding to the circuit $C_{otp}$ from Lemma 8. $|I \cup \mathcal{O}| \leq |I| + |\mathcal{O}| \leq t + |\mathcal{O}| < n$ , therefore there exists some $1 \leq i^* \leq n$ such that $i^* \notin I \cup \mathcal{O}$ . Since neither $x_{i^*}$ nor any intermediate variable $y_{i^*,j}$ has been probed on the row $i^*$ , and moreover $y_{i^*}$ must not be simulated (since $i^* \notin \mathcal{O}$ ), we can remove the row $i^*$ from the circuit; see Fig 3 for an illustration. We obtain a circuit with n-1 inputs $x_i$ for $1 \leq i \leq n$ and $i \neq i^*$ . We now apply the Random-zero transform and set to 0 all randoms $r_{ij}$ in the circuit, except the randoms on the column $i^*$ , namely $r_{i,i^*}$ for $i \neq i^*$ . We obtain a circuit taking as input $x_i$ and outputting $x_i \oplus r_{i,i^*}$ for all $i \neq i^*$ ; see Fig 3 for an illustration. This is exactly the circuit $C_{otp}$ from Lemma 8 with n-1 inputs. Since from Lemma 8 this circuit is t-SNI for all $t \leq n-1$ , using Lemma 7 the FullRefresh circuit is also t-SNI for all t < n, which proves the lemma. Note that the main difference with the original proof of Lemma 3 in [BBD<sup>+</sup>16] is that we have not performed an explicit simulation of the probed variables; instead we have performed a sequence of elementary circuit transforms (conditioned on some of the intermediate variables being probed or not) until we have obtained a trivial circuit. The above proof can be formally verified by performing a loop over all possible $1 \leq i^* \leq n$ . For each $i^*$ we first remove the row $i^*$ from the circuit, and then we set to 0 all randoms in the circuit, except the randoms $r_{i,i^*}$ for $i \neq i^*$ . For any given n, we can check formally that this leads to a circuit equivalent to taking $a_1, \ldots, a_{n-1}$ as input and outputting $a_1 \oplus r_1, \ldots, a_{n-1} \oplus r_{n-1}$ , namely the $C_{otp}$ circuit. Since such circuit is t-SNI from Lemma 8, the original circuit is t-SNI. We illustrate in Fig. 4 the formal verification for n=3. Note that the formal verification has a running time polynomial in n (as opposed to exponential in Section 3.4); therefore it can be performed for any n for which the countermeasure is used in practice. ### 4.4 Formal Verification of Lemma 4 for RefreshMasks We now consider the RefreshMasks algorithm (see Fig. 1), and we recall the security property of RefreshMasks considered in Section 3.5: if the output $y_n$ is among the t probed variables, then we can simulate any t probed variables with t-1 input shares only, instead of t in the basic t-NI property in Lemma 2. We recall the corresponding Lemma 4 from [Cor17a]. Below we provide an alternative proof that can be formally verified in time polynomial in n, using our CheckMasks tool. **Lemma 4.** Let $x_1, \ldots, x_n$ be the input of a RefreshMasks where the randoms are accumulated on $x_n$ , and let $y_1, \ldots, y_n$ be the output. Let t be the number of probed variables, with t < n. If $y_n$ is among the probed variables, then there exists a subset I such that all probed variables can be perfectly simulated from $x_{|I|}$ , with $|I| \le t - 1$ . ``` > (check-fullrefresh-tsni-poly 3) Input: (X1 X2 X3) Output: ((+ R2 (+ R1 X1)) (+ R3 (+ R1 X2)) (+ R3 (+ R2 X3))) Case 0: no output, no probe in (+ R2 (+ R1 X1)) Subcircuit: ((+ R3 (+ R1 X2)) (+ R3 (+ R2 X3))) Set all randoms to 0 except (R1 R2) => ((+ R1 X2) (+ R2 X3)) Case 1: no output, no probe in (+ R3 (+ R1 X2)) Subcircuit: ((+ R2 (+ R1 X1)) (+ R3 (+ R2 X3))) Set all randoms to 0 except (R1 R3) => ((+ R1 X1) (+ R3 X3)) Case 2: no output, no probe in (+ R3 (+ R2 X3)) Subcircuit: ((+ R2 (+ R1 X1)) (+ R3 (+ R1 X2))) Set all randoms to 0 except (R2 R3) => ((+ R2 X1) (+ R3 X2)) ``` **Fig. 4.** Formal verification of the FullRefresh circuit for n=3. **Fig. 5.** Proof of Lemma 4: after removing the sub-circuit corresponding to $i^*$ and setting to zero all randoms except $i^*$ , the remaining circuit is the identity circuit except $y_n = x_n \oplus r_{i^*}$ . *Proof.* Without loss of generality, we can consider t=n-1 probes (see Section 3.5). We first construct a subset I of indices as follows. For any $1 \le i \le n-1$ , if $x_i$ or $r_i$ or $y_i = x_i \oplus r_i$ is probed, then we put i in I. Since by assumption $y_n$ has been probed, we have considered at most n-2 probes in the construction of I, and therefore we have $|I| \le n-2$ . Therefore there must be some $1 \le i^* \le n-1$ such that there was no probe in the subcircuit $y_{i^*} = x_{i^*} \oplus r_{i^*}$ , that is neither $y_{i^*}$ nor $x_{i^*}$ nor $r_{i^*}$ have been probed. For a given $i^*$ , we can remove the subcircuit $y_{i^*} = x_{i^*} \oplus r_{i^*}$ from the original circuit, since there are no probes in it. Note that $r_{i^*}$ is still used in the computation of $y_n$ . We then apply the Random-zero transform to all randoms except $r_{i^*}$ . As illustrated in Figure 5, we obtain a circuit taking as input the $x_i$ 's for $1 \le i \le n$ and $i \ne i^*$ , and outputting $y_i = x_i$ for $1 \le i \le n-1$ and $i \ne i^*$ , and $y_n = x_n \oplus r_{i^*}$ . It is easy to see that the transformed circuit satisfies the required property from Lemma 4. This could be proved using the classical simulation-based approach, but we can also continue with elementary transforms, as follows. Since by assumption $r_{i^*}$ has not been probed, we can apply the One-time-pad transform to $r_{i^*}$ , and we obtain $y_n = r_{i^*}$ (and we also keep $x_n$ in the circuit). Finally, we apply the Random-zero transform to $r_{i^*}$ , and we obtain $y_n = 0$ . Therefore we have obtained a final circuit taking as input $(x_1, \ldots, x_n)$ except $x_{i^*}$ and outputting $(x_1, \ldots, x_{n-1}, 0)$ . Moreover we have a set of n-1 probes, one of which is 0 (corresponding to $y_n$ ), and the remaining n-2 probes are on the inputs $x_i$ and can therefore be simulated from the knowledge of at most n-2 inputs. This proves Lemma 4. It is easy to verify the above proof with a formal tool, since it consists in elementary circuit transforms conditioned on the value of $1 \le i^* \le n-1$ ; we provide the transcript of the formal proof for n=4 in Fig. 6; see [Cor17b] for the source code. In Appendix B, we also provide a formally verifiable proof of Lemma 5 from [Cor17a]. ``` > (check-refreshmasks-last-poly 4) Input: (X1 X2 X3 X4) Output: ((+ R1 X1) (+ R2 X2) (+ R3 X3) (+ R3 (+ R2 (+ R1 X4)))) First probe: ((+ R3 (+ R2 (+ R1 X4)))) Case 0: no probe in (+ R1 X1) Subcircuit: ((+ R2 X2) (+ R3 X3) (+ R3 (+ R2 (+ R1 X4)))) Set all randoms to 0 except R1 \Rightarrow (X2 X3 (+ R1 X4)) One-time pad: (X2 X3 R1 X4). Random zero: (X2 X3 0 X4) First probe: 0. Other 2 probes in (X2\ X3\ X4) Case 1: no probe in (+ R2 X2) Subcircuit: ((+ R1 X1) (+ R3 X3) (+ R3 (+ R2 (+ R1 X4)))) Set all randoms to 0 except R2 \Rightarrow (X1 X3 (+ R2 X4)) One-time pad: (X1 X3 R2 X4). Random zero: (X1 X3 0 X4) First probe: 0. Other 2 probes in (X1 X3 X4) Case 2: no probe in (+ R3 X3) Subcircuit: ((+ R1 X1) (+ R2 X2) (+ R3 (+ R2 (+ R1 X4)))) Set all randoms to 0 except R3 \Rightarrow (X1 X2 (+ R3 X4)) One-time pad: (X1 X2 R3 X4). Random zero: (X1 X2 0 X4) First probe: 0. Other 2 probes in (X1 X2 X4) ``` **Fig. 6.** Formal verification of Lemma 4 for n = 4, using our CheckMasks tool for performing the sequence of elementary transforms. Remark 2. The above formal verification of Lemma 4 has time complexity polynomial in n, so we can perform the verification for any n. For example, generating the transcript of the formal proof for n=50 takes only a few seconds (since there are only n-1 cases to consider), while this would be completely unfeasible with the generic technique of Section 3.5, which has complexity $2^{3\cdot 2n}$ (see Table 2 for the corresponding timings). # 4.5 Formal Verification of the Rivain-Prouff Countermeasure In this section our goal is to provide a proof of the t-SNI property of SecMult from the Rivain-Prouff countermeasure [RP10], that can be formally verified in polynomial time; this corresponds to Lemma 6 from Section 3.6. As a warm-up we consider the weaker t-NI security property, for which a pen-and-paper proof was already given in [ISW03]; below we provide a proof that can be formally verified in our CheckMasks tool, based on circuit transforms. **Lemma 9** (t-NI of SecMult). Let $(a_i)_{1 \leq i \leq n}$ and $(b_i)_{1 \leq i \leq n}$ be the input shares of the SecMult circuit, and let $(c_i)_{1 \leq i < n}$ be the output shares. For any set of t intermediate variables and any subset $\mathcal O$ of output shares, there exists a subset I of indices such that $I = J \cup \mathcal O$ where $|J| \leq 2t$ , such that those t intermediate variables as well as the output shares $c_{|\mathcal O|}$ can be perfectly simulated from $a_{|I|}$ and $b_{|I|}$ . Proof. We prove the result recursively on n. The property holds for n=1. We now assume that it holds for n-1, and we prove that it must hold for n. We construct a set of indices U as follows, starting from $U=\mathcal{O}$ . If one the variables $\{a_i,b_i,a_i\cdot b_i,c_{i,j}\}$ is probed, we add i to U. If one of the variables $\{a_i\cdot b_j,r_{i,j},a_ib_j+r_{i,j}\}$ is probed (for any $i\neq j$ ), we add both i and j to U. We obtain $|U|\leq 2t+|\mathcal{O}|$ . We distinguish two cases. If |U|=n, we can perfectly simulate all variables in the circuit by letting I=U=[1,n], and we have $|I|\leq 2t+|\mathcal{O}|$ as required. We now consider the case |U| < n, so we can let $1 \le i^* \le n$ such that $i^* \notin U$ . Since none of the variables $c_{i^*,j}$ has been probed, we can remove them from the circuit. We now consider the $r_{i^*,j}$ variables; none of these variables has been probed. On the row $i^*$ and before the diagonal $(j < i^*)$ , the $r_{i^*,j} = (a_j b_{i^*} + r_{j,i^*}) + a_{i^*} b_j$ variables are only used in the $c_{i^*,j}$ variables on the same row (see Fig. 7). Since we have already removed the $c_{i^*,j}$ variables, we can also remove those $r_{i^*,j}$ variables for $j < i^*$ from the circuit. Moreover, since $a_j b_{i^*} + r_{j,i^*}$ has not been probed, $$\begin{vmatrix} 0 & \cdots & r_{1,i^{*}} & \cdots & r_{1,n} \\ \vdots & \ddots & & \vdots \\ r_{i^{*},1} & \cdots & 0 & \cdots & r_{i^{*},n} \\ \vdots & & \ddots & \vdots \\ r_{n,1} & \cdots & r_{n,i^{*}} & \cdots & 0 \end{vmatrix} \xrightarrow{c_{i^{*}}} \longrightarrow \begin{vmatrix} 0 & \cdots & r_{1,i^{*}} & \cdots & r_{1,n} \\ \vdots & & \vdots & & \vdots \\ r_{i^{*}-1,1} & \cdots & r_{i^{*}-1,i^{*}} & \cdots & r_{i^{*}-1,n} \\ \vdots & & \vdots & & \vdots \\ r_{n,1} & \cdots & r_{n,i^{*}} & \cdots & 0 \end{vmatrix} \xrightarrow{c_{i^{*}+1}} \begin{bmatrix} c_{i^{*}+1} \\ \vdots \\ c_{i^{*}+1} \\ \vdots \\ c_{i^{*}-1,1} & \cdots & 0 & \cdots & r_{i^{*}-1,n} \\ \vdots \\ r_{i^{*}-1,1} & \cdots & 0 & \cdots & r_{i^{*}+1,n} \\ \vdots \\ r_{i^{*}+1,1} & \cdots & 0 & \cdots & r_{i^{*}+1,n} \\ \vdots \\ r_{n,1} & \cdots & 0 & \cdots & 0 \end{vmatrix} \xrightarrow{c_{i^{*}+1}} \xrightarrow{c_{i^{*}+1}}$$ ig 7. After removing the $i^{*}$ -th row and applying the one-time pad transform, we obtain **Fig. 7.** After removing the $i^*$ -th row and applying the one-time pad transform, we obtain a column $i^*$ in which all variables $r_{j,i^*}$ are independent randoms. One can then apply the random-zero transform, and eventually remove the column $i^*$ . we can also remove the corresponding variables from the circuit. Therefore we can remove the row $i^*$ from the circuit. As illustrated in Figure 7, there remains a circuit in which the original randoms $r_{i^*,j}$ for $j > i^*$ (after the diagonal) are used only once, namely in the variable $r_{j,i^*} = (a_{i^*}b_j + r_{i^*,j}) + a_jb_{i^*}$ . Since $r_{i^*,j}$ is not probed, and moreover $a_{i^*}b_j + r_{i^*,j}$ is not probed, we can apply the One-time-pad transform twice and replace the variables $r_{j,i^*}$ below the diagonal by an independently generated random value, which we still denote by $r_{j,i^*}$ . We obtain a circuit in which on the column $i^*$ , all $r_{j,i^*}$ for $j \neq i^*$ are independently generated random values, which are used only once in the circuit. We can therefore apply the Random-zero transform to these randoms, *i.e.* we set to 0 all the randoms $r_{j,i^*}$ on the $i^*$ column; see Figure 7 for an illustration. Since all elements on the $i^*$ column are now zero, we can remove the $i^*$ column and eventually obtain a circuit with n-1 inputs $a_i$ and $b_i$ that is equivalent to the original SecMult circuit, but with n-1 inputs instead of n, and still the same value of t. We can therefore apply the recursive hypothesis: there exists a subset I of indices such that $I = J \cup \mathcal{O}$ where $|J| \leq 2t$ , such that those t intermediate variables as well as the output shares $c_{|\mathcal{O}|}$ can be perfectly simulated from $a_{|I|}$ and $b_{|I|}$ . This implies that the same property holds for the original circuit with n inputs; this proves the lemma. To verify the above proof formally, as previously it suffices to do a loop on all possible values of $1 \le i^* \le n$ . We provide in Figure 12 in Appendix E the transcript of the formal verification for n=3; we refer to [Cor17b] for the source code. We see that in each case, one obtains after a sequence of elementary transforms a circuit that is equivalent to the original circuit but with n-1 input shares; therefore one can apply the recursive hypothesis. Finally, we provide in Appendix C a formally verifiable proof of the full t-SNI property of SecMult (instead of t-NI only as above), corresponding to Lemma 3 from Section 3.6. ### 5 Towards Automatic Generation of Security Proofs The drawback of the previous approach is that for the security verification to happen in polynomial time, we must select ourselves the right circuit transforms. Instead we would like to have the circuit transforms being selected automatically by our verification tool, based on a limited set of elementary rules, and still in polynomial-time. In the following, we show that this can be achieved for simple circuits based on the three following rules. We denote by P the property that must be checked; for example, for t-NI security, the property P would require that any t-uple of intermediate variables is simulatable from a subset of the inputs $x_{|I|}$ , with $|I| \leq t$ . Below we denote by $C_{otp}$ the circuit $y_i = x_i \oplus r_i$ for $1 \leq i \leq n$ (see Lemma 8 in Section 4.3). We assume that the property P is already verified by $C_{otp}$ , so that P does not need to be verified explicitly for $C_{otp}$ . - (R1) Perform a loop to select and remove the subset of the circuit that is unprobed. - (R2) Apply the random-zero transform, except on randoms used only once in the circuit. - (R3) Check whether the resulting circuit is equal to $C_{otp}$ . Otherwise check the property P for all possible t-uple of probes. | Algorithm | Property | Lemma | Rules | Final circuit | |--------------|----------------------------------------------------------------------------|------------------|------------|-------------------------------------------------------------| | RefreshMasks | t-NI | Lemma 2 | R2, R3 | $(x_1,\ldots,x_n)$ | | FullRefresh | t-SNI | | R1, R2, R3 | | | RefreshMasks | $ I \le t - 1$ with probed $y_n$ | Lemma 4 [Cor17a] | R1, R2, R3 | $(x_1,\ldots,x_{n-1}, x_n \oplus r_i)$ | | RefreshMasks | $\begin{vmatrix} I \le t - 1 & \text{with} \\ x_{n+1} = 0 \end{vmatrix}$ | Lemma 5 [Cor17a] | R1, R2, R3 | $(x_1, \dots, x_{n-1}, x_n \oplus r_i) \text{ or } C_{otp}$ | **Table 5.** Rules and final circuit to verify a security property in polynomial-time in n. We show in Table 5 that from the three above rules, we can formally verify in polynomial time the main properties of RefreshMasks and FullRefresh considered in this paper; we refer to Appendix D for the details. Namely by applying rules R1 and R2, we always obtain a simplified circuit that is either the same as $C_{otp}$ , or a circuit where the number of intermediate variables is upper bounded by $n + \delta$ for a small constant $\delta$ , and therefore the number of t-uples of intermediate variables for t = n - 1 remains polynomial in n. This implies that the property P can be verified by exhaustive search on the t-uples, as in the generic approach from Section 3, but now in polynomial time. We refer to [Cor17b] for the source code. # 6 Conclusion We have described a simple technique to formally verify the security of masked implementations against side-channel attacks, using two complementary approaches. The generic approach is essentially the same as in [BBD $^+15$ ], but using the Common Lisp language to get a very short source code; it enables to verify the security of any circuit, but for small orders only. For the specialized approach, we have introduced an alternative proof technique based on elementary circuit transforms; we can then formally verify the security of specific circuits for any order n, in time polynomial in n. This new approach is potentially simpler than the technique in [BBD $^+16$ ] based on the EasyCrypt framework; the difference is that we do not perform an explicit simulation of the probed variables; instead we perform a sequence of elementary circuit transforms, until the simulation of the probes in the final circuit becomes trivial. Finally we have shown how to get automatic security proofs, at least for simple circuits. We have described the implementation of CheckMasks, our formal verification tool for sidechannel countermeasures, based on the Common Lisp programming language. Using this tool, we have shown how to formally verify the security of the Rivain-Prouff countermeasure for AES. ### References - BBD<sup>+</sup>15. Gilles Barthe, Sonia Belaïd, François Dupressoir, Pierre-Alain Fouque, Benjamin Grégoire, and Pierre-Yves Strub. Verified proofs of higher-order masking. In Advances in Cryptology EUROCRYPT 2015 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I, pages 457–485, 2015. - BBD<sup>+</sup>16. Gilles Barthe, Sonia Belaïd, François Dupressoir, Pierre-Alain Fouque, Benjamin Grégoire, Pierre-Yves Strub, and Rébecca Zucchini. Strong non-interference and type-directed higher-order masking. In *Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016*, pages 116–129, 2016. - BDG<sup>+</sup>14. Gilles Barthe, François Dupressoir, Benjamin Grégoire, César Kunz, Benedikt Schmidt, and Pierre-Yves Strub. *EasyCrypt: A Tutorial*, pages 146–166. Springer International Publishing, Cham, 2014. - CGPZ16. Jean-Sébastien Coron, Aurélien Greuet, Emmanuel Prouff, and Rina Zeitoun. Faster evaluation of sboxes via common shares. In *Cryptographic Hardware and Embedded Systems CHES 2016 18th International Conference, Santa Barbara, CA, USA, August 17-19, 2016, Proceedings*, pages 498–514, 2016. - Cor14. Jean-Sébastien Coron. Higher order masking of look-up tables. In Advances in Cryptology EU-ROCRYPT 2014 33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, May 11-15, 2014. Proceedings, pages 441-458, 2014. - Cor17a. Jean-Sébastien Coron. High-order conversion from boolean to arithmetic masking. In *CHES*, 2017. Cor17b. Jean-Sébastien Coron. CheckMasks: formal verification of side-channel countermeasures, 2017. Publicly available at https://github.com/coron/checkmasks. - CPRR13. Jean-Sébastien Coron, Emmanuel Prouff, Matthieu Rivain, and Thomas Roche. Higher-order side channel security and mask refreshing. In Fast Software Encryption 20th International Workshop, FSE 2013, Singapore, March 11-13, 2013. Revised Selected Papers, pages 410–424, 2013. - DDF14. Alexandre Duc, Stefan Dziembowski, and Sebastian Faust. Unifying leakage models: From probing attacks to noisy leakage. In Advances in Cryptology EUROCRYPT 2014 33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, May 11-15, 2014. Proceedings, pages 423–440, 2014. - ISW03. Yuval Ishai, Amit Sahai, and David Wagner. Private circuits: Securing hardware against probing attacks. In Advances in Cryptology CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa Barbara, California, USA, August 17-21, 2003, Proceedings, pages 463-481, 2003. - OMHT06. Elisabeth Oswald, Stefan Mangard, Christoph Herbst, and Stefan Tillich. Practical second-order DPA attacks for masked smart card implementations of block ciphers. In CT-RSA, pages 192–207, 2006. - RP10. Matthieu Rivain and Emmanuel Prouff. Provably secure higher-order masking of AES. In *CHES*, pages 413–427, 2010. ### A Other Properties of RefreshMasks We recall Lemma 7 from [Cor17a]. **Lemma 10.** Let $x_1, \ldots, x_n$ be the input of a RefreshMasks where the randoms are accumulated on $x_n$ , and let $y_1, \ldots, y_n$ be the output. Let t be the number of probed variables, with t = n. If $y_n$ is among the probed variables, then either all probed variables can be perfectly simulated from $x_1 \oplus \cdots \oplus x_n$ , or there exists a subset I with $|I| \leq n-1$ such that they can be perfectly simulated from $x_{|I|}$ . Using our CheckMasks formal tool, Lemma 10 can be easily verified for small values of n. In the proof of the lemma in [Cor17a], it appears that the knowledge of $x_1 \oplus \cdots \oplus x_n$ is only necessary when the n probes are the n outputs $y_1, \ldots, y_n$ of RefreshMasks. This case is already covered by the following straightforward lemma recalled in [Cor17a]. **Lemma 11.** Let $(x_i)_{1 \leq i \leq n}$ be the input and let $(y_i)_{1 \leq i \leq n}$ be the output of RefreshMasks. The distribution of $(y_i)_{1 \leq i \leq n}$ can be perfectly simulated from $x_1 \oplus \cdots \oplus x_n$ . Therefore, to formally verify Lemma 10, we can exclude the previous case; one must then verify that the n probes can always be perfectly simulated from the knowledge of at most n-1 variables. We obtain the following timings: We also formally verify Lemma 8 from [Cor17a], showing that if we xor the last two output variables $y_{n-1}$ and $y_n$ of RefreshMasks, then the circuit is t-NI for all $t \leq n-1$ ; as previously, for t = n-1 we must exclude the case of all n-1 output variables being probed. The proof is a straightforward application of Lemma 4 and Lemma 10. | n | #variables | #tuples | Security | Time | |---|------------|---------|----------|-------| | 3 | 9 | 84 | ✓ | ε | | 4 | 13 | 715 | ✓ | ε | | 5 | 17 | 6,188 | ✓ | ε | | 6 | 21 | 54,264 | ✓ | 0.4 s | | 7 | 25 | 480,700 | ✓ | 4.3 s | **Table 6.** Formal verification of Lemma 10, for small values of n. **Lemma 12.** Consider the circuit with $y_1, \ldots, y_n \leftarrow \text{RefreshMasks}(x_1, \ldots, x_n)$ , $z_i \leftarrow y_i$ for all $1 \leq i \leq n-2$ and $z_{n-1} \leftarrow y_{n-1} \oplus y_n$ . Let t be the number of probed variables. If t < n-1, there exists a subset I with $|I| \leq t$ such that all probed variables can be perfectly simulated from $x_{|I}$ . If t = n-1, then either all probed variables can be perfectly simulated from $x_1 \oplus \cdots \oplus x_n$ , or there exists a subset I with $|I| \leq n-1$ such that they can be perfectly simulated from $x_{|I|}$ . Using our CheckMasks tool, we obtain the following timings. As explained previously, it suffices to check the t-NI property for t = n - 1. | n | #variables | #tuples | Security | Time | |---|------------|---------|----------|--------------------| | 3 | 10 | 45 | ✓ | ε | | 4 | 14 | 364 | ✓ | ε | | 5 | 18 | 3,060 | ✓ | ε | | 6 | 22 | 26,334 | ✓ | $0.5 \mathrm{\ s}$ | | 7 | 26 | 230,230 | ✓ | 5.7 s | **Table 7.** Formal verification of Lemma 12, for small values of n. # B Formal Verification of Lemma 5 for RefreshMasks We consider the RefreshMasks algorithm (see Fig. 1), and we recall the other security property of RefreshMasks considered in Section 3.5; see [Cor17a, Lemma 5] for the pen-and-paper proof. Below we provide an alternative proof that can be formally verified in time polynomial in n, using our CheckMasks tool. We first prove the following simple lemma, on the same $C_{otp}$ circuit as considered in Section 4.3. **Lemma 13.** Let $C_{otp}$ be the circuit taking as input as input $x_1, \ldots, x_n$ and outputting $y_i = x_i \oplus r_i$ for all $1 \le i \le n$ , where the randoms $r_i$ are uniformly and independently distributed. For any set of t intermediate variables, there exists a subset I of indices with $|I| \le t - 1$ , such that the t intermediate variables can be perfectly simulated from $x_{|I|}$ , except if only the input shares $x_i$ are probed. *Proof.* By assumption, there exists an index $i^*$ such that $r_{i^*}$ or $y_{i^*}$ or both have been probed, with $1 \leq i^* \leq n$ . We construct the set I as follows. For any $i \neq i^*$ , if $x_i$ or $r_i$ or $y_i$ has been probed, we add i to I; moreover if $x_{i^*}$ has been probed, or if both $r_{i^*}$ and $y_{i^*}$ have been probed, we add $i^*$ to I. We first show that we must have $|I| \leq t - 1$ as required. Namely either a single variable among $r_{i^*}$ and $y_{i^*}$ has been probed, and this probe does not contribute to I, or both $r_{i^*}$ and $y_{i^*}$ have been probed, and these two probes contribute to only one index in I. One can then simulate any probed variable $x_i$ , $y_i$ and $r_i$ for $i \neq i^*$ from $i \in I$ . If $i^* \in I$ , then $x_{i^*}$ , $y_{i^*}$ and $r_{i^*}$ can also be simulated. Finally, if $i^* \notin I$ , then either $r_{i^*}$ or $y_{i^*}$ has been probed (but not both); in both cases such variable can be perfectly simulated. We now proceed with the proof of Lemma 5. As previously, the proof strategy is to perform a sequence of elementary circuit transforms until we obtain the above circuit $C_{otp}$ on which the property is proven by Lemma 13. **Lemma 5** (RefreshMasks). Let $x_1, \ldots, x_n$ be n inputs shares, and let $x_{n+1} = 0$ . Consider the circuit $y_1, \ldots, y_{n+1} \leftarrow$ RefreshMasks $_{n+1}(x_1, \ldots, x_n, x_{n+1})$ , where the randoms are accumulated on $x_{n+1}$ . Let t be the number of probed variables. There exists a subset I such that all probed variables can be perfectly simulated from $x_{|I|}$ , with $|I| \leq t - 1$ , except if only the input $x_i$ 's are probed. **Fig. 8.** The RefreshMasks circuit with n+1 inputs, with $x_{n+1}=0$ . Proof. From the reasoning of Section 3.5, we only have to prove the lemma for t=n. We distinguish two cases; see Figure 8 for an illustration. If none of the intermediate variables $y_{n+1,j}$ has been probed nor $y_{n+1}$ , we remove the corresponding subcircuit, and there remains the circuit $C_{otp}$ for which the property is proven by Lemma 13. In the second case, if one of the intermediate variables $y_{n+1,j}$ or $y_{n+1}$ has been probed, we apply the random-zero transform to all $r_i$ 's. There remains a circuit outputting $(x_1, \ldots, x_n, 0)$ , where one of the probe is now 0. Therefore the remaining n-1 probes can be simulated by $x_{|I|}$ with $|I| \leq n-1$ . We provide in Figure 9 the transcript of the formal proof for n = 4; see [Cor17b] for the source code. **Fig. 9.** Formal verification of Lemma 5 for n = 4, using our CheckMasks tool for performing the sequence of elementary transforms. ### C Proof of Lemma 6 via Circuit Transforms The proof of the t-SNI property of SecMult proceeds in two steps. In the first step, we define an index $i^*$ and we show that we can remove the row $i^*$ from the circuit; as in the t-NI proof from Section 4.5, we obtain a transformed circuit C in which all the variables $r_{j,i^*}$ on the column $i^*$ are independent randoms (see Fig. 10). We then show recursively that the resulting circuit C is t-SNI. For this, in the second step, we define another index $k^* \neq i^*$ , and we show that we can remove the row and column corresponding to $k^*$ . We then obtain a circuit similar to C but with n-1 inputs instead of n; one can then apply the recursive hypothesis. **Fig. 10.** In the first step, we remove the $i^*$ -th row, and we obtain a transformed circuit in which the variables $r_{j,i^*}$ on the column $i^*$ are all independent randoms. **First step.** We let U be the set of of indices i such that $r_{i,j}$ or $c_{i,j}$ has been probed (for any j). We also construct a set V using the following rule: If $$a_i b_j + r_{ij}$$ has been probed: put $j$ in $V$ if $i \in O$ or $i \in U$ , otherwise put $i$ in $V$ . (3) Since we have considered at most t probes in the definition of U and V, we must have $|U|+|V| \le t$ , which gives $|U|+|V|+|O| \le t+|O| < n$ . Therefore we can let $1 \le i^* \le n$ such that $i^* \notin U \cup V \cup O$ . By definition of $i^*$ , none of the $r_{i^*,j}$ or $c_{i^*,j}$ variables has been probed. In particular, on the row $i^*$ and before the diagonal $(j < i^*)$ , the variable $r_{i^*,j} = (a_j b_{i^*} + r_{j,i^*}) + a_{i^*,j}$ has not been probed. Therefore we can remove these variables from the circuit. This implies that we can remove the row corresponding to $i^*$ from the circuit; however the variables $a_{i^*}b_j$ or $a_jb_{i^*}$ can still be probed, so we must keep them in a separate list L of variables that can be probed. On the row $i^*$ and after the diagonal $(j > i^*)$ the variable $r_{i^*,j}$ is not probed; it is used only in the variable $a_{i^*}b_j + r_{i^*,j}$ , which is used in $r_{j,i^*} = (a_{i^*}b_j + r_{i^*,j}) + a_jb_{i^*}$ . We claim that the $a_{i^*}b_j + r_{i^*,j}$ variable is also not probed; namely, if it had been probed, since $i^* \notin O$ and $i^* \notin U$ , from Rule (3) we would have $i^* \in V$ , a contradiction. We can therefore apply the one-time pad transform twice on $r_{i^*,j}$ , and consider a modified circuit in which $r_{j,i^*}$ for $j > i^*$ (below the diagonal) is an independent random. In summary, we obtain a transformed circuit in which on the column $i^*$ , the variables $r_{j,i^*}$ are independent randoms for all $j \neq i^*$ ; see Figure 10 for an illustration. Moreover, above the diagonal $(j < i^*)$ , the variables $a_jb_{i^*} + r_{j,i^*}$ can still be probed; we note that for such j, we must have $j \notin U \cup O$ (otherwise, from Rule (3) we would have $i^* \in V$ , a contradiction). **Second step.** We consider the transformed circuit from the first step and taking as input n shares. We show recursively that the circuit is t-SNI. We still define the sets U and V as previously. We must have $|U| \leq n - 1$ . We distinguish two cases. If |U| = n - 1, we must have $t \geq n - 1$ . We again distinguish two cases. If none of the variables $a_j b_{i^*} + r_{j,i^*}$ has been probed, then neither $a_{i^*}$ nor $b_{i^*}$ is required for the simulation; we can therefore let $I = [1, n] \setminus \{i^*\}$ for the simulation of the full circuit. If at least one of the variables $a_j b_{i^*} + r_{j,i^*}$ has been probed, we must have $t \ge n$ and therefore we can let I = [1, n] for the simulation of the full circuit. In both cases we have $|I| \le t$ as required. We now consider the second case, namely |U| < n-1. In that case we can let $k^* \notin U \cup \{i^*\}$ . Recall that on the $i^*$ column, all variables $r_{j,i^*}$ are independent randoms (see Fig. 11), and moreover above the diagonal $(j < i^*)$ , the variables $a_j b_{i^*} + r_{j,i^*}$ can be probed. We distinguish two cases. If the variable $a_{k^*}b_{i^*} + r_{k^*,i^*}$ has been probed, from Rule (3) we must have $k^* \in V$ and $k^* \notin O$ . Since $k^* \notin U \cup O$ , the random $r_{k^*,i^*}$ has not been probed and is used only once, in the computation of the previous variable $a_{k^*}b_{i^*} + r_{k^*,i^*}$ . Therefore we can perfectly simulate the previous variable, without knowing $a_{k^*}$ and $b_{i^*}$ . We now consider the second case, in which the variable $a_{k^*}b_{i^*} + r_{k^*,i^*}$ has not been probed. Since in that case the random $r_{k^*,i^*}$ is used only once and in the computation of $c_{k^*}$ , the $c_{k^*}$ output variable can be perfectly simulated if $k^* \in O$ , without knowing $a_{k^*}$ and $b_{k^*}$ . $$\begin{pmatrix} 0 & \cdots & r_{1,k^*} & \cdots & r_{1,i^*} & \cdots & r_{1,n} \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ r_{k^*,1} & \cdots & 0 & \cdots & r_{k^*,i^*} & \cdots & r_{k^*,n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ r_{i^*-1,1} & \cdots & r_{i^*-1,k^*} & \cdots & r_{i^*-1,i^*} & \cdots & r_{i^*-1,n} \\ r_{i^*+1,1} & \cdots & r_{i^*+1,k^*} & \cdots & r_{i^*+1,i^*} & \cdots & r_{i^*+1,n} \\ \vdots & & \vdots & & \vdots & & \vdots \\ r_{n,1} & \cdots & r_{n,k^*} & \cdots & r_{n,i^*} & \cdots & 0 \end{pmatrix}$$ **Fig. 11.** In the second step, we define a second index $k^* \neq i^*$ . Thanks to the random $r_{k^*,i^*}$ , we can perfectly simulate the output $c_{k^*}$ , and then remove the row and column corresponding to $k^*$ . In both cases, on the row $k^*$ , none of the variables $c_{k^*,j}$ has been probed, so they can be removed from the circuit. Moreover, on the row $k^*$ , before the diagonal $(j < k^*)$ , the variables $r_{k^*,j} = (a_j b_{k^*} + r_{j,k^*}) + a_{k^*} b_j$ are also not probed, so they can also be removed from the circuit. After the diagonal $(j > k^*)$ , the randoms $r_{k^*,j}$ are not probed and are used only in the variable $a_{k^*} b_j + r_{k^*,j}$ , which are used in the variables $r_{j,k^*} = (a_{k^*} b_j + r_{k^*,j}) + a_j b_{k^*}$ . Therefore, we can replace the variable $r_{j,k^*}$ by an independent random, and replace the variable $a_{k^*} b_j + r_{k^*,j}$ by the identical variable $a_j b_{k^*} + r_{j,k^*}$ . Therefore, on the column $k^*$ , all the variables $r_{i,k^*}$ are independent randoms; moreover the variables $a_i b_{k^*} + r_{i,k^*}$ can possibly be probed. We apply the Random-zero transform to all these randoms on the $k^*$ column; the variables $a_i b_{k^*}$ are put in a separate list of variables that can be probed. We can then remove the column and row corresponding to $k^*$ . Therefore, for the simulation of the resulting circuit, the knowledge of $a_{k^*}$ and $b_{k^*}$ is not necessary anymore. After removing the $k^*$ row and column, we obtain a circuit with n-1 inputs, with the same structure as in the beginning of the second step. We can therefore apply the recursive hypothesis: all t probes and all output variables $c_i$ for $i \in O$ can be perfectly simulated from $a_{|I|}$ and $b_{|J|}$ , where $|I| \leq t$ and $|J| \leq t$ . This implies that the same property holds for the original circuit; this proves the lemma. **Formal verification.** As previously, the above proof can be formally verified by performing the elementary circuit transforms for all possible indices $i^* \neq k^*$ . Note that the number of cases to consider is now quadratic in n, instead of linear in the t-NI proof from Section 4.5; therefore, as opposed to the previous transcripts, it is too long to be included in this paper, even for small values of n. We refer to [Cor17b] for the source code to generate such transcript, for any value of n. # D Automatic Verification for Simple Circuits In this section, we show that by applying the rules R1, R2 and R3 from Section 5, the security properties of RefreshMasks from lemma 2, 4 and 5 can be formally verified in polynomial time, as well as Lemma 3 for FullRefresh; see Table 5. Namely, for verifying the t-NI property of RefreshMasks, as explained in Section 4.1, it suffices to apply the Random-zero transform to all randoms $r_i$ of the circuit, and one obtains the identity circuit which is trivially t-NI. More precisely, we can apply rule R2 which sets all randoms to 0, since all randoms are used twice in the circuit. Then the t-NI property for t = n - 1 is immediately verified, since there are $\binom{n}{n-1} = n$ uples of n-1 probes; for this we use the generic technique from Section 3, and in this case it works in polynomial time. Alternatively one can firstly apply Rule R1 and then rules R2 and R3 on each subcircuit, and the complexity of verification is still polynomial-time. For the t-SNI property of FullRefresh, the successive application of rules R1, R2 and R3 is exactly what is done in the proof of Lemma 3 in Section 4.3. Namely for each subcircuit obtained by removing the subcircuit corresponding to the output $y_i$ , after applying the random-zero transform to all $r_{ij}$ except those used only once, we obtain the $C_{otp}$ circuit whose t-SNI property is proven in Lemma 8. For the property of RefreshMasks proven in Lemma 4, the successive application of rules R1, R2 and R3 also leads to a polynomial-time verification. Namely, since $y_n$ is probed, rule R1 is applied by performing a loop on $1 \le i \le n-1$ and removing the subcircuit corresponding to $y_i$ . In the resulting subcircuit, the only random that is used only once is $r_i$ . Therefore by applying rule R2, one obtains the final circuit $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots x_{n-1}, x_n \oplus r_i)$ where the last output $y_n = x_n \oplus r_i$ is probed. The final circuit above has now n+2 intermediate variables, and since $y_n$ is already probed, the number of (n-1)-uples to consider is $\binom{n+1}{n-2} \le n^3$ . Therefore the property can be verified using the generic technique from Section 3, and in this case it works in polynomial time. Finally, for the property of RefreshMasks from Lemma 5, the successive application of rules R1, R2 and R3 also leads to a polynomial-time verification. Recall that in Lemma 5 the Refresh-Masks circuit takes as input n+1 shares with $x_{n+1}=0$ , and outputs n+1 shares $y_i$ . Firstly, rule R1 is applied by performing a loop on $1 \le i \le n+1$ and removing the subcircuit corresponding to $y_i$ . When the subcircuit corresponding to $y_{n+1}$ has been removed, the resulting circuit is exactly the $C_{otp}$ circuit; since all randoms are used once, the rule R2 does nothing, and eventually the rule R3 applies, where the required property is proved in Lemma 13. Moreover, when the subcircuit corresponding to $y_i$ has been removed for $1 \le i \le n$ , all randoms are used twice except $r_i$ . By applying Rule R2, one obtains the final circuit $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots x_n, r_i)$ . As previously, the required property can therefore be verified by rule R3 using the generic technique from Section 3, and in this case it also works in polynomial time. For the automatic verification of the four above properties, we refer to [Cor17b] for the source code, and to Appendix F for the transcript of the proof for n = 3. # E Transcript of Lemma 9 ``` > (check-secmult-ni-poly 3) Input: (X1 X2 X3) (Y1 Y2 Y3) Output: (M \ 1 \ 1) R1 R2 (M 1 2 R1) (M 2 2) R3 (M 2 3 R3) (M 1 3 R2) (M \ 3 \ 3) Case 0: no probe in (M 1 1) R1 R2 New circuit: (M 1 2 R1) R3 (M 2 3 R3) (M 3 3) (M 1 3 R2) Simplify: (M \ 2 \ 2) R1 R3 (M 2 3 R3) (M \ 3 \ 3) R2 (M \ 2 \ 2) Random zero: R3 (M 2 3 R3) (M \ 3 \ 3) Case 1: no probe in (M\ 1\ 2\ R1)\ (M\ 2\ 2)\ R3 New circuit: (M 1 1) R1 R2 (M 1 3 R2) (M 2 3 R3) (M \ 3 \ 3) Simplify: (M \ 1 \ 1) R1 R2 (M 1 3 R2) R3 (M \ 3 \ 3) Random zero: (M \ 1 \ 1) R2 (M 1 3 R2) (M \ 3 \ 3) Case 2: no probe in (M 1 3 R2) (M 2 3 R3) (M 3 3) (M 1 1) New circuit: R1 R2 (M 1 2 R1) (M 2 2) R3 (M 1 1) Simplify: R2 R1 (M 2 2) (M 1 2 R1) R3 Random zero: (M \ 1 \ 1) R1 (M 1 2 R1) (M 2 2) ``` **Fig. 12.** Formal verification of the t-NI property of the SecMult circuit for n = 3. For simplicity we use a different notation to represent the $a_ib_j$ and $r_{ij}$ variables, namely we write $(M \ i \ j)$ for $a_ib_j$ and $(M \ i \ j \ R)$ for $r_{ij} = (a_jb_i + r_{ji}) + a_ib_j$ where R corresponds to $r_{ji}$ , for j < i. # F Transcript of Automatic Verification ``` > (check-circuits 3) Refreshmasks: t-NI property: Input: (X1 X2 X3) Circuit: ((+ R1 X1) (+ R2 X2) (+ R2 (+ R1 X3))) R1: ((+ R2 X2) (+ R2 (+ R1 X3))) R2: (X2 (+ R1 X3)) R3: is OTP: NIL R3: check: T R1: ((+ R1 X1) (+ R2 (+ R1 X3))) R2: (X1 (+ R2 X3)) R3: is OTP: NIL R3: check: T R1: ((+ R1 X1) (+ R2 X2)) R2: ((+ R1 X1) (+ R2 X2)) R3: is OTP: T Verif: T FullRefresh: t-SNI property: Input: (X1 X2 X3) Circuit: ((+ R2 (+ R1 X1)) (+ R3 (+ R1 X2)) (+ R3 (+ R2 X3))) R1: ((+ R3 (+ R1 X2)) (+ R3 (+ R2 X3))) R2: ((+ R1 X2) (+ R2 X3)) R3: is OTP: T R1: ((+ R2 (+ R1 X1)) (+ R3 (+ R2 X3))) R2: ((+ R1 X1) (+ R3 X3)) R3: is OTP: T R1: ((+ R2 (+ R1 X1)) (+ R3 (+ R1 X2))) R2: ((+ R2 X1) (+ R3 X2)) R3: is OTP: T Verif: T Refreshmasks: with probed yn: Input: (X1 X2 X3) Circuit: ((+ R1 X1) (+ R2 X2) (+ R2 (+ R1 X3))) R1: ((+ R2 X2) (+ R2 (+ R1 X3))) R2: (X2 (+ R1 X3)) R3: is OTP: NIL R3: check: T R1: ((+ R1 X1) (+ R2 (+ R1 X3))) R2: (X1 (+ R2 X3)) R3: is OTP: NIL R3: check: T R1: ((+ R1 X1) (+ R2 X2)) R2: ((+ R1 X1) (+ R2 X2)) R3: is OTP: T Verif: T Refreshmasks: with probed x_{-}\{n+1\}=0: Input: (X1 X2 X3 0) Circuit: ((+ R1 X1) (+ R2 X2) (+ R3 X3) (+ R3 (+ R2 (+ R1 0)))) R1: ((+ R2 X2) (+ R3 X3) (+ R3 (+ R2 (+ R1 0)))) R2: (X2 X3 R1) R3: is OTP: NIL R3: check: T R1: ((+ R1 X1) (+ R3 X3) (+ R3 (+ R2 (+ R1 0)))) R2: (X1 X3 R2) R3: is OTP: NIL R3: check: T R1: ((+ R1 X1) (+ R2 X2) (+ R3 (+ R2 (+ R1 0)))) R2: (X1 X2 R3) R3: is OTP: NIL R3: check: T R1: ((+ R1 X1) (+ R2 X2) (+ R3 X3)) R2: ((+ R1 X1) (+ R2 X2) (+ R3 X3)) R3: is OTP: T Verif: T ``` **Fig. 13.** Automatic verification of lemmas 2, 3, 4 and 5 for n = 3, based on rules R1, R2 and R3.