We study the concrete security of high-performance implementations of half-gates garbling, which all rely on (hardware-accelerated)~AES. We find that current instantiations using k-bit wire labels can be completely broken—in the sense that the circuit evaluator learns all the inputs of the circuit garbler—in time O(2k/C), where C is the total number of (non-free) gates that are garbled, possibly across multiple independent executions. The attack can be applied to existing circuit-garbling libraries using k=80 when C≈109, and would require 267 machine-months and cost about USD 3500 to implement on the Google Cloud Platform. Since the attack can be entirely parallelized, the attack could be carried out in about a month using \approx 250 machines.
With this as our motivation, we seek a way to instantiate the hash function in the half-gates scheme so as to achieve better concrete security. We present a construction based on AES that achieves optimal security in the single-instance setting (when only a single circuit is garbled). We also show how to modify the half-gates scheme so that its concrete security does not degrade in the multi-instance setting. Our modified scheme is as efficient as prior work in networks with up to 2 Gbps bandwidth.
Secure two party computation (2PC) of arbitrary programs can be efficiently achieved using garbled circuits (GC). The bottleneck of GC efficiency is communication. It is widely believed that it is necessary to transmit the entire GC during 2PC, even for conditional branches that are not taken.
This folklore belief is false.
We propose a novel GC technique, stacked garbling, that eliminates the communication cost of inactive conditional branches. We extend the ideas of conditional GC evaluation explored in (Kolesnikov, Asiacrypt 18) and (Heath and Kolesnikov, Eurocrypt 20). Unlike these works, ours is for general 2PC where no player knows which conditional branch is taken.
Our garbling scheme, Stack, requires communication proportional to the longest execution path rather than to the entire circuit. Stack is compatible with state-of-the-art techniques, such as free XOR and half-gates.
Stack is a garbling scheme. As such, it can be plugged into a variety of existing protocols, and the resulting round complexity is the same as that of standard GC. The approach does incur computation cost quadratic in the conditional branching factor vs linear in standard schemes, but the tradeoff is beneficial for most programs: GC computation even on weak hardware is faster than GC transmission on fast channels.
We implemented Stack in C++. Stack reduces communication cost by approximately the branching factor: for 16 branches, communication is reduced by 10.5x. In terms of wall-clock time for circuits with branching factor 16 over a 50 Mbps WAN on a laptop, Stack outperforms state-of- the-art half-gates-based 2PC by more than 4x.
Zero-knowledge (ZK) proofs (ZKP) have received wide attention, focusing on non-interactivity, short proof size, and fast verification time. We focus on the fastest total proof time, in particular for large Boolean circuits. Under this metric, Garbled Circuit (GC)-based ZKP (Jawurek et al., [JKO], CCS 2013) remained the state-of-the-art technique due to the low-constant linear scaling of computing the garbling. We improve GC-ZKP for proof statements with conditional clauses. Our communication is proportional to the longest branch rather than to the entire proof statement. This is most useful when the number m of branches is large, resulting in up to factor $m\times$ improvement over JKO. In our proof-of-concept illustrative application, prover P demonstrates knowledge of a bug in a codebase consisting of any number of snippets of actual C code. Our computation cost is linear in the size of the codebase and communication is constant in the number of snippets. That is, we require only enough communication for a single largest snippet! Our conceptual contribution is stacked garbling for ZK, a privacy-free circuit garbling scheme that can be used with the JKO GC-ZKP protocol to construct more efficient ZKP. Given a Boolean circuit C and computational security parameter $\kappa$, our garbling is $L \cdot \kappa$ bits long, where $L$ is the length of the longest execution path in C. All prior concretely efficient garbling schemes produce garblings of size $|C| \cdot \kappa$. The computational cost of our scheme is not increased over prior state-of-the-art. We implement our GC-ZKP and demonstrate significantly improved ($m\times$ over JKO) ZK performance for functions with branching factor $m$. Compared with recent ZKP (STARK, Libra, KKW, Ligero, Aurora, Bulletproofs), our scheme offers much better proof times for larger circuits ($35-1000\times$ or more, depending on circuit size and compared scheme). For our illustrative application, we consider four C code snippets, each of about 30-50 LOC; one snippet allows an invalid memory dereference. The entire proof takes 0.15 seconds and communication is 1.5 MB.
Wang et al. (CCS 2017) recently proposed a protocol for malicious secure two-party computation that represents the state-of-the- art with regard to concrete efficiency in both the single-execution and amortized settings, with or without preprocessing. We show here several optimizations of their protocol that result in a significant improvement in the overall communication and running time. Specifically:
We show how to make the “authenticated garbling” at the heart of their protocol compatible with the half-gate optimization of Zahur et al. (Eurocrypt 2015). We also show how to avoid sending an information-theoretic MAC for each garbled row. These two optimizations give up to a 2.6x improvement in communication, and make the communication of the online phase essentially equivalent to that of state-of-the-art semi-honest secure computation.
We show various optimizations to their protocol for generating AND triples that, overall, result in a 1.5x improvement in the communication and a 2x improvement in the computation for that step.
Given a set S = {C_1,…,C_k } of Boolean circuits, we show how to construct a universal for S circuit C_0, which is much smaller than Valiant’s universal circuit or a circuit incorporating all C_1,…,C_k. Namely, given C_1,…,C_k and viewing them as directed acyclic graphs (DAGs) D_1,…,D_k, we embed them in a new graph D_0. The embedding is such that a GC garbling of any of C_1,…,C_k could be implemented by a corresponding garbling of a circuit corresponding to D_0.
We show how to improve Garbled Circuit (GC) and GMW-based secure function evaluation (SFE) of circuits with if/switch clauses using such S-universal circuit.
The most interesting case here is the application to the GMW approach. We provide a novel observation that in GMW the cost of processing a gate is almost the same for 5 (or more) Boolean inputs, as it is for the usual case of 2 Boolean inputs. While we expect this observation to greatly improve general GMW-based computation, in our context this means that GMW gates can be programmed almost for free, based on the secret-shared programming of the clause.
Our approach naturally and cheaply supports nested clauses. Our algorithm is a heuristic; we show that solving the circuit embedding problem is NP-hard. Our algorithms are in the semi-honest model and are compatible with Free-XOR.
We report on experimental evaluations and discuss achieved performance in detail. For 32 diverse circuits in our experiment, our construction results 6.1x smaller circuit than prior techniques.
In the setting of secure multiparty computation, a set of mutually distrustful parties carry out a joint computation of their inputs, without revealing anything but the output. Over recent years, there has been tremendous progress towards making secure computation practical, with great success in the two-party case. In contrast, in the multiparty case, progress has been much slower, even for the case of semi-honest adversaries.
In this paper, we consider the case of constant-round multiparty computation, via the garbled circuit approach of BMR (Beaver et al., STOC 1990). In recent work, it was shown that this protocol can be efficiently instantiated for semi-honest adversaries (Ben-Efraim et al., ACM CCS 2016). However, it scales very poorly with the number of parties, since the cost of garbled circuit evaluation is quadratic in the number of parties, per gate. Thus, for a large number of parties, it becomes expensive. We present a new way of constructing a BMR-type garbled circuit that can be evaluated with only a constant number of operations per gate. Our constructions use key-homomorphic pseudorandom functions (one based on DDH and the other on Ring-LWE) and are concretely efficient. In particular, for a large number of parties (e.g., 100), our new circuit can be evaluated faster than the standard BMR garbled circuit that uses only AES computations. Thus, our protocol is an important step towards achieving concretely efficient large-scale multiparty computation for Internet-like settings (where constant-round protocols are needed due to high latency).
A crucial issue, that mostly affects the performance of actively secure computation of RAM programs, is the task of reading/writing from/to memory in a private and authenticated manner. Previous works in the active security and multiparty settings are based purely on the SPDZ (reactive) protocol, hence, memory accesses are treated just like any input to the computation. However, a garbled-circuit-based construction (such as BMR), which benefits from a lower round complexity, must resolve the issue of converting memory data bits to their corresponding wire keys and vice versa.
In this work we propose three techniques to construct a secure memory access, each appropriates to a different level of abstraction of the underlying garbling functionality. We provide a comparison between the techniques by several metrics. To the best of our knowledge, we are the first to construct, prove and implement a concretely efficient garbled-circuit-based actively secure RAM computation with dishonest majority.
Our construction is based on our third (most efficient) technique, cleverly utilizing the underlying SPDZ authenticated shares (Damgård et al., Crypto 2012), yields lean circuits and a constant number of communication rounds per physical memory access. Specifically, it requires no additional circuitry on top of the ORAM’s, incurs only two rounds of broadcasts between every two memory accesses and has a multiplicative overhead of 2 on top of the ORAM’s storage size.
Our protocol outperforms the state of the art in this settings when deployed over WAN. Even when simulating a very conservative RTT of 100ms our protocol is at least one order of magnitude faster than the current state of the art protocol of Keller and Scholl (Asiacrypt 2015).
Consider key agreement by two parties who start out knowing a common secret (which we refer to as “pass-string”, a generalization of “password”), but face two complications: (1) the pass-string may come from a low-entropy distribution, and (2) the two parties’ copies of the pass-string may have some noise, and thus not match exactly. We provide the first efficient and general solutions to this problem that enable, for example, key agreement based on commonly used biometrics such as iris scans.
The problem of key agreement with each of these complications individually has been well studied in literature. Key agreement from low-entropy shared pass-strings is achieved by password-authenticated key exchange (PAKE), and key agreement from noisy but high-entropy shared pass-strings is achieved by information-reconciliation protocols as long as the two secrets are “close enough.” However, the problem of key agreement from noisy low-entropy pass-strings has never been studied.
We introduce (universally composable) fuzzy password-authenticated key exchange (fPAKE), which solves exactly this problem. fPAKE does not have any entropy requirements for the pass-strings, and enables secure key agreement as long as the two pass-strings are “close” for some notion of closeness. We also give two constructions. The first construction achieves our fPAKE definition for any (efficiently computable) notion of closeness, including those that could not be handled before even in the high-entropy setting. It uses Yao’s garbled circuits in a way that is only two times more costly than their use against semi-honest adversaries, but that guarantees security against malicious adversaries. The second construction is more efficient, but achieves our fPAKE definition only for pass-strings with low Hamming distance. It builds on very simple primitives: robust secret sharing and PAKE.
In this work, we present two new universally composable, actively secure, constant round multi-party protocols for generating BMR garbled circuits with free-XOR and reduced costs.
(1) Our first protocol takes a generic approach using any secret-sharing based MPC protocol for binary circuits, and a correlated oblivious transfer functionality.
(2) Our specialized protocol uses secret-sharing based MPC with information-theoretic MACs. This approach is less general, but requires no additional correlated OTs to compute the garbled circuit.
In both approaches, the underlying secret-sharing based protocol is only used for one secure F2
multiplication per AND gate. An interesting consequence of this is that, with current techniques, constant round MPC for binary circuits is not much more expensive than practical, non-constant round protocols.
We demonstrate the practicality of our second protocol with an implementation, and perform experiments with up to 9
parties securely computing the AES and SHA-256 circuits. Our running times improve upon the best possible performance with previous BMR-based protocols by 60 times.
We introduce {\em Free Hash}, a new approach to generating Garbled Circuit (GC) hash at no extra cost during GC generation. This is in contrast with state-of-the-art approaches, which hash GCs at computational cost of up to 6× of GC generation. GC hashing is at the core of the cut-and-choose technique of GC-based secure function evaluation (SFE).
Our main idea is to intertwine hash generation/verification with GC generation and evaluation. While we {\em allow} an adversary to generate a GC \GCˆ whose hash collides with an honestly generated \GC, such a \GCˆ w.h.p. will fail evaluation and cheating will be discovered. Our GC hash is simply a (slightly modified) XOR of all the gate table rows of GC. It is compatible with Free XOR and half-gates garbling, and can be made to work with many cut-and-choose SFE protocols.
With today’s network speeds being not far behind hardware-assisted fixed-key garbling throughput, eliminating the GC hashing cost will significantly improve SFE performance. Our estimates show substantial cost reduction in typical settings, and up to factor 6 in specialized applications relying on GC hashes.
We implemented GC hashing algorithm and report on its performance.
In cut-and-choose protocols for two-party secure computation (2PC) the main overhead is the number of garbled circuits that must be sent. Recent work (Lindell, Riva; Huang et al., Crypto 2014) has shown that in a batched setting, when the parties plan to evaluate the same function N times, the number of garbled circuits per execution can be reduced by a O(logN) factor compared to the single-execution setting. This improvement is significant in practice: an order of magnitude for N as low as one thousand. % Besides the number of garbled circuits, communication round trips are another significant performance bottleneck. Afshar et al. (Eurocrypt 2014) proposed an efficient cut-and-choose 2PC that is round-optimal (one message from each party), but in the single-execution setting.
In this work we present new malicious-secure 2PC protocols that are round-optimal and also take advantage of batching to reduce cost. Our contributions include: \begin{itemize} \item A 2-message protocol for batch secure computation (N instances of the same function). The number of garbled circuits is reduced by a O(logN) factor over the single-execution case. However, other aspects of the protocol that depend on the input/output size of the function do not benefit from the same O(logN)-factor savings. \item A 2-message protocol for batch secure computation, in the random oracle model. All aspects of this protocol benefit from the O(logN)-factor improvement, except for small terms that do not depend on the function being evaluated. \item A protocol in the offline/online setting. After an offline preprocessing phase that depends only on the function f and N, the parties can securely evaluate f, N times (not necessarily all at once). Our protocol’s online phase is only 2 messages, and the total online communication is only ℓ+O(κ) bits, where ℓ is the input length of f and κ is a computational security parameter. This is only O(κ) bits more than the information-theoretic lower bound for malicious 2PC.
We propose a new protocol for two-party computation, secure against malicious adversaries, that is significantly faster than prior work in the single-execution setting (i.e., non-amortized and with no pre-processing). In particular, for computational security parameter κ and statistical security parameter ρ, our protocol uses only ρ garbled circuits and O(κ) public-key operations, whereas previous work with the same number of garbled circuits required either O(ρn+κ) public-key operations (where n is the input/output length) or a second execution of a secure-computation sub-protocol. Our protocol can be based on the decisional Diffie-Hellman assumption in the standard model.
We implement our protocol to evaluate its performance. With ρ=40, our implementation securely computes an AES evaluation in 65 ms over a local-area network using a single thread without any pre-computation, 22x faster than the best prior work in the non-amortized setting. The relative performance of our protocol is even better for functions with larger input/output lengths.
We propose a new, constant-round protocol for multi-party computation of boolean circuits that is secure against an arbitrary number of malicious corruptions. At a high level, we extend and generalize recent work of Wang et al. in the two-party setting. Namely, we design an efficient preprocessing phase that allows the parties to generate authenticated information; we then show how to use this information to distributively construct a single “authenticated” garbled circuit that is evaluated by one party.
Our resulting protocol improves upon the state-of-the-art both asymptotically and concretely. We validate these claims via several experiments demonstrating both the efficiency and scalability of our protocol:
Efficiency: For three-party computation over a LAN, our protocol requires only 95 ms to evaluate AES. This is roughly a 700X improvement over the best prior work, and only 2.5X slower than the best known result in the two-party setting. In general, for n-party computation our protocol improves upon prior work (which was never implemented) by a factor of more than 230n, e.g., an improvement of 3 orders of magnitude for 5-party computation.
Scalability: We successfully executed our protocol with a large number of parties located all over the world, computing (for example) AES with 128 parties across 5 continents in under 3 minutes. Our work represents the largest-scale demonstration of secure computation to date.
We propose a simple and efficient framework for obtaining efficient constant-round protocols for maliciously secure two-party computation. Our framework uses a function-independent preprocessing phase to generate authenticated information for the two parties; this information is then used to construct a single“authenticated” garbled circuit which is transmitted and evaluated.We also show how to efficiently instantiate the preprocessing phase by designing a highly optimized version of the TinyOT protocol by Nielsen et al. Our overall protocol outperforms existing work in both the single-execution and amortized settings, with or without preprocessing: In the single-execution setting, our protocol evaluates an AES circuit with malicious security in37 ms with an online time of just 1 ms. Previous work with the best online time (also 1 ms)requires 124 ms in total; previous work with the best total time requires 62 ms (with 14 ms online time). If we amortize the computation over 1024 executions, each AES computation requires just 6.7 ms with roughly the same online time as above. The best previous work in the amortized setting has roughly the same total time but does not support function-independent preprocessing.Our work shows that the performance penalty for maliciously secure two-party computation (as compared to semi-honest security) is much smaller than previously believed.
Cut-and-choose (CC) is the standard approach to making Yao’s garbled circuit two-party computation (2PC) protocol secure against malicious adversaries. Traditional cut-and-choose operates at the level of entire circuits, whereas the LEGO paradigm (Nielsen & Orlandi, TCC 2009) achieves asymptotic improvements by performing cut-and-choose at the level of individual gates. In this work we propose a unified approach called DUPLO that spans the entire continuum between these two extremes. The cut-and-choose step in our protocol operates on the level of arbitrary circuit “components,” which can range in size from a single gate to the entire circuit itself.
With this entire continuum of parameter values at our disposal, we find that the best way to scale 2PC to computations of realistic size is to use CC components of intermediate size, and not at the extremes. On computations requiring several millions of gates or more, our more general approach to CC gives between 4-7x improvement over existing approaches.
In addition to our technical contributions of modifying and optimizing previous protocol techniques to work with general CC components, we also provide an extension of the recent Frigate circuit compiler (Mood et al, Euro S&P 2016) to effectively express any C-style program in terms of components which can be processed efficiently using our protocol.
Advancements in deep learning enable cloud servers to provide inference-as-a-service for clients. In this scenario, clients send their raw data to the server to run the deep learning model and send back the results. One standing challenge in this setting is to ensure the privacy of the clients’ sensitive data. Oblivious inference is the task of running the neural network on the client’s input without disclosing the input or the result to the server. This paper introduces XONN (pronounced /ZAn/), a novel end-to-end framework based on Yao’s Garbled Circuits (GC) protocol, that provides a paradigm shift in the conceptual and practical realization of oblivious inference. In XONN, the costly matrix-multiplication operations of the deep learning model are replaced with XNOR operations that are essentially free in GC. We further provide a novel algorithm that customizes the neural network such that the runtime of the GC protocol is minimized without sacrificing the inference accuracy.
We design a user-friendly high-level API for XONN, allowing expression of the deep learning model architecture in an unprecedented level of abstraction. We further provide a compiler to translate the model description from high-level Python (i.e., Keras) to that of XONN. Extensive proof-of-concept evaluation on various neural network architectures demonstrates that XONN outperforms prior art such as Gazelle (USENIX Security’18) by up to 7×, MiniONN (ACM CCS’17) by 93×, and SecureML (IEEE S&P’17) by 37×. State-of-the-art frameworks require one round of interaction between the client and the server for each layer of the neural network, whereas, XONN requires a constant round of interactions for any number of layers in the model. XONN is first to perform oblivious inference on Fitnet architectures with up to 21 layers, suggesting a new level of scalability compared with state-of-the-art. Moreover, we evaluate XONN on four datasets to perform privacy-preserving medical diagnosis. The datasets include breast cancer, diabetes, liver disease, and Malaria.