The counter mode (CTR) is a simple, efficient and widely used encryption mode using a block cipher. It comes with a security proof that guarantees no attacks up to the birthday bound (i.e. as long as the number of encrypted blocks σ satisfies σ≪2n/2), and a matching attack that can distinguish plaintext/ciphertext pairs from random using about 2n/2 blocks of data.
The main goal of this paper is to study attacks against the counter mode beyond this simple distinguisher. We focus on message recovery attacks, with realistic assumptions about the capabilities of an adversary, and evaluate the full time complexity of the attacks rather than just the query complexity. Our main result is an attack to recover a block of message with complexity O~(2n/2). This shows that the actual security of CTR is similar to that of CBC, where collision attacks are well known to reveal information about the message.
To achieve this result, we study a simple algorithmic problem related to the security of the CTR mode: the missing difference problem. We give efficient algorithms for this problem in two practically relevant cases: where the missing difference is known to be in some linear subspace, and when the amount of data is higher than strictly required.
As a further application, we show that the second algorithm can also be used to break some polynomial MACs such as GMAC and Poly1305, with a universal forgery attack with complexity O~(22n/3).
We design, implement, and evaluate a zkSNARK for Rank-1 Constraint Satisfaction (R1CS), a widely-deployed NP-complete language that is undergoing standardization. Our construction uses a transparent setup, is plausibly post-quantum secure, and uses lightweight cryptography. A proof attesting to the satisfiability of n constraints has size O(log2n); it can be produced with O(nlogn) field operations and verified with O(n). At 128 bits of security, proofs are less than 130kB even for several million constraints, more than 20x shorter than prior zkSNARK with similar features.
A key ingredient of our construction is a new Interactive Oracle Proof (IOP) for solving a univariate analogue of the classical sumcheck problem [LFKN92], originally studied for multivariate polynomials. Our protocol verifies the sum of entries of a Reed–Solomon codeword over any subgroup of a field.
We also provide libiop, an open-source library for writing IOP-based arguments, in which a toolchain of transformations enables programmers to write new arguments by writing simple IOP sub-components. We have used this library to specify our construction and prior ones.
Tor is a primary tool for maintaining anonymity online. It provides a low-latency, circuit-based, bidirectional secure channel between two parties through a network of onion routers, with the aim of obscuring exactly who is talking to whom, even to adversaries controlling part of the network. Tor relies heavily on cryptographic techniques, yet its onion encryption scheme is susceptible to tagging attacks (Fu and Ling, 2009), which allow an active adversary controlling the first and last node of a circuit to deanonymize with near-certainty. This contrasts with less active traffic correlation attacks, where the same adversary can at best deanonymize with high probability. The Tor project has been actively looking to defend against tagging attacks and its most concrete alternative is proposal 261, which specifies a new onion encryption scheme based on a variable-input-length tweakable cipher.
We provide a formal treatment of low-latency, circuit-based onion encryption, relaxed to the unidirectional setting, by expanding existing secure channel notions to the new setting and introducing circuit hiding to capture the anonymity aspect of Tor. We demonstrate that circuit hiding prevents tagging attacks and show proposal 261’s relay protocol is circuit hiding and thus resistant against tagging attacks.
This paper revisits the multi-user (mu) security of symmetric encryption, from the perspective of delivering an analysis of the AES-GCM-SIV AEAD scheme. Our end result shows that its mu security is comparable to that achieved in the single-user setting. In particular, even when instantiated with short keys (e.g., 128 bits), the security of AES-GCM-SIV is not impacted by the collisions of two user keys, as long as each individual nonce is not re-used by too many users. Our bounds also improve existing analyses in the single-user setting, in particular when messages of variable lengths are encrypted. We also validate security against a general class of key-derivation methods, including one that halves the complexity of the final proposal.
As an intermediate step, we consider mu security in a setting where the data processed by every user is bounded, and where user keys are generated according to arbitrary, possibly correlated distributions. This viewpoint generalizes the currently adopted one in mu security, and can be used to analyze re-keying practices.
State machine replication, or “consensus”, is a central abstraction for distributed systems where a set of nodes seek to agree on an ever-growing, linearly-ordered log. In this paper, we propose a practical new paradigm called Thunderella for achieving state machine replication by combining a fast, asynchronous path with a (slow) synchronous “fall-back” path (which only gets executed if something goes wrong); as a consequence, we get simple state machine replications that essentially are as robust as the best synchronous protocols, yet “optimistically” (if a super majority of the players are honest), the protocol “instantly” confirms transactions. We provide instantiations of this paradigm in both permissionless (using proof-of-work) and permissioned settings. Most notably, this yields a new blockchain protocol (for the permissionless setting) that remains resilient assuming only that a majority of the computing power is controlled by honest players, yet optimistically—if 3/4 of the computing power is controlled by honest players, and a special player called the “accelerator”, is honest—transactions are confirmed as fast as the actual message delay in the network. We additionally show the 3/4 optimistic bound is tight for protocols that are resilient assuming only an honest majority.
SPDZ denotes a multiparty computation scheme in the preprocessing model based on somewhat homomorphic encryption (SHE) in the form of BGV. At CCS ’16, Keller et al. presented MASCOT, a replacement of the preprocessing phase using oblivious transfer instead of SHE, improving by two orders of magnitude on the SPDZ implementation by Damgård et al. (ESORICS ’13). In this work, we show that using SHE is faster than MASCOT in many aspects:
We present a protocol that uses semi-homomorphic (addition-only) encryption. For two parties, our BGV-based implementation is 6 times faster than MASCOT on a LAN and 20 times faster in a WAN setting. The latter is roughly the reduction in communication.
We show that using the proof of knowledge in the original work by Damgård et al. (Crypto ’12) is more efficient in practice than the one used in the implementation mentioned above by about one order of magnitude.
We present an improvement to the verification of the aforementioned proof of knowledge that increases the performance with a growing number of parties, doubling it for 16 parties.
Polynomial-based authentication algorithms, such as GCM and Poly1305, have seen widespread adoption in practice. Due to their importance, a significant amount of attention has been given to understanding and improving both proofs and attacks against such schemes. At EUROCRYPT 2005, Bernstein published the best known analysis of the schemes when instantiated with PRPs, thereby establishing the most lenient limits on the amount of data the schemes can process per key. A long line of work, initiated by Handschuh and Preneel at CRYPTO 2008, finds the best known attacks, advancing our understanding of the fragility of the schemes. Yet surprisingly, no known attacks perform as well as the predicted worst-case attacks allowed by Bernstein’s analysis, nor has there been any advancement in proofs improving Bernstein’s bounds, and the gap between attacks and analysis is significant. We settle the issue by finding a novel attack against polynomial-based authentication algorithms using PRPs, and combine it with new analysis, to show that Bernstein’s bound, and our attacks, are optimal.
A crucial issue, that mostly affects the performance of actively secure computation of RAM programs, is the task of reading/writing from/to memory in a private and authenticated manner. Previous works in the active security and multiparty settings are based purely on the SPDZ (reactive) protocol, hence, memory accesses are treated just like any input to the computation. However, a garbled-circuit-based construction (such as BMR), which benefits from a lower round complexity, must resolve the issue of converting memory data bits to their corresponding wire keys and vice versa.
In this work we propose three techniques to construct a secure memory access, each appropriates to a different level of abstraction of the underlying garbling functionality. We provide a comparison between the techniques by several metrics. To the best of our knowledge, we are the first to construct, prove and implement a concretely efficient garbled-circuit-based actively secure RAM computation with dishonest majority.
Our construction is based on our third (most efficient) technique, cleverly utilizing the underlying SPDZ authenticated shares (Damgård et al., Crypto 2012), yields lean circuits and a constant number of communication rounds per physical memory access. Specifically, it requires no additional circuitry on top of the ORAM’s, incurs only two rounds of broadcasts between every two memory accesses and has a multiplicative overhead of 2 on top of the ORAM’s storage size.
Our protocol outperforms the state of the art in this settings when deployed over WAN. Even when simulating a very conservative RTT of 100ms our protocol is at least one order of magnitude faster than the current state of the art protocol of Keller and Scholl (Asiacrypt 2015).
While there has been a lot of progress in designing efficient custom protocols for computing Private Set Intersection (PSI), there has been less research on using generic Multi-Party Computation (MPC) protocols for this task. However, there are many variants of the set intersection functionality that are not addressed by the existing custom PSI solutions and are easy to compute with generic MPC protocols (e.g., comparing the cardinality of the intersection with a threshold or measuring ad conversion rates).
Generic PSI protocols work over circuits that compute the intersection. For sets of size n, the best known circuit constructions conduct O(nlogn) or O(nlogn/loglogn) comparisons (Huang et al., NDSS’12 and Pinkas et al., USENIX Security’15). In this work, we propose new circuit-based protocols for computing variants of the intersection with an almost linear number of comparisons. Our constructions are based on new variants of Cuckoo hashing in two dimensions.
We present an asymptotically efficient protocol as well as a protocol with better concrete efficiency. For the latter protocol, we determine the required sizes of tables and circuits experimentally, and show that the run-time is concretely better than that of existing constructions.
The protocol can be extended to a larger number of parties. The proof technique for analyzing Cuckoo hashing in two dimensions is new and can be generalized to analyzing standard Cuckoo hashing as well as other new variants of it.
Consider key agreement by two parties who start out knowing a common secret (which we refer to as “pass-string”, a generalization of “password”), but face two complications: (1) the pass-string may come from a low-entropy distribution, and (2) the two parties’ copies of the pass-string may have some noise, and thus not match exactly. We provide the first efficient and general solutions to this problem that enable, for example, key agreement based on commonly used biometrics such as iris scans.
The problem of key agreement with each of these complications individually has been well studied in literature. Key agreement from low-entropy shared pass-strings is achieved by password-authenticated key exchange (PAKE), and key agreement from noisy but high-entropy shared pass-strings is achieved by information-reconciliation protocols as long as the two secrets are “close enough.” However, the problem of key agreement from noisy low-entropy pass-strings has never been studied.
We introduce (universally composable) fuzzy password-authenticated key exchange (fPAKE), which solves exactly this problem. fPAKE does not have any entropy requirements for the pass-strings, and enables secure key agreement as long as the two pass-strings are “close” for some notion of closeness. We also give two constructions. The first construction achieves our fPAKE definition for any (efficiently computable) notion of closeness, including those that could not be handled before even in the high-entropy setting. It uses Yao’s garbled circuits in a way that is only two times more costly than their use against semi-honest adversaries, but that guarantees security against malicious adversaries. The second construction is more efficient, but achieves our fPAKE definition only for pass-strings with low Hamming distance. It builds on very simple primitives: robust secret sharing and PAKE.
A boomerang attack is a cryptanalysis framework that regards a block cipher E as the composition of two sub-ciphers E1∘E0 and builds a particular characteristic for E with probability p2q2 by combining differential characteristics for E0 and E1 with probability p and q, respectively. Crucially the validity of this figure is under the assumption that the characteristics for E0 and E1 can be chosen independently. Indeed, Murphy has shown that independently chosen characteristics may turn out to be incompatible. On the other hand, several researchers observed that the probability can be improved to p or q around the boundary between E0 and E1 by considering a positive dependency of the two characteristics, e.g.~the ladder switch and S-box switch by Biryukov and Khovratovich. This phenomenon was later formalised by Dunkelman et al.~as a sandwich attack that regards E as E1∘Em∘E0, where Em satisfies some differential propagation among four texts with probability r, and the entire probability is p2q2r. In this paper, we revisit the issue of dependency of two characteristics in Em, and propose a new tool called Boomerang Connectivity Table (BCT), which evaluates r in a systematic and easy-to-understand way when Em is composed of a single S-box layer. With the BCT, previous observations on the S-box including the incompatibility, the ladder switch and the S-box switch are represented in a unified manner. Moreover, the BCT can detect a new switching effect, which shows that the probability around the boundary may be even higher than p or q. To illustrate the power of the BCT-based analysis, we improve boomerang attacks against Deoxys-BC, and disclose the mechanism behind an unsolved probability amplification for generating a quartet in SKINNY. Lastly, we discuss the issue of searching for S-boxes having good BCT and extending the analysis to modular addition.
Bootstrapping is a crucial operation in Gentry’s breakthrough work on fully homomorphic encryption (FHE), where a homomorphic encryption scheme evaluates its own decryption algorithm. There has been a couple of implementations of bootstrapping, among which HElib arguably marks the state-of-the-art in terms of throughput, ciphertext/message size ratio and support for large plaintext moduli.
In this work, we apply a family of “lowest digit removal” polynomials to improve homomorphic digit extraction algorithm which is crucial part in bootstrapping for both FV and BGV schemes. If the secret key has 1-norm h=l1(s) and the plaintext modulus is t=pr, we achieved bootstrapping depth logh+log(logp(ht)) in FV scheme. In case of the BGV scheme, we bring down the depth from logh+2logt to logh+logt.
We implemented bootstrapping for FV in the SEAL library. Besides the regular mode, we introduce another “slim mode’”, which restrict the plaintexts to batched vectors in Zpr. The slim mode has similar throughput as the regular mode, while each individual run is much faster and uses much smaller memory. For example, bootstrapping takes 6.75 seconds for 7 bit plaintext space with 64 slots and 1381 seconds for GF(257128) plaintext space with 128 slots. We also implemented our improved digit extraction procedure for the BGV scheme in HElib.
In this work we construct efficient aggregate signatures from the RSA assumption in the synchronized setting. In this setting, the signing algorithm takes as input a (time) period t as well the secret key and message. A signer should sign at most once for each t. A set of signatures can be aggregated so long as they were all created for the same period t. Synchronized aggregate signatures are useful in systems where there is a natural reporting period such as log and sensor data, or for signatures embedded in a blockchain protocol where the creation of an additional block is a natural synchronization event.
We design a synchronized aggregate signature scheme that works for a bounded number of periods T that is given as a parameter to a global system setup. The big technical question is whether we can create solutions that will perform well with the large T values that we might use in practice. For instance, if one wanted signing keys to last up to ten years and be able to issue signatures every second, then we would need to support a period bound of upwards of 228.
We build our solution in stages where we start with an initial solution that establishes feasibility, but has an impractically large signing time where the number of exponentiations and prime searches grows linearly with T. We prove this scheme secure in the standard model under the RSA assumption with respect to honestly-generated keys. We then provide a tradeoff method where one can tradeoff the time to create signatures with the space required to store private keys. One point in the tradeoff is where each scales with T√.
Finally, we reach our main innovation which is a scheme where both the signing time and storage scale with lgT which allows for us to keep both computation and storage costs modest even for large values of T. Conveniently, our final scheme uses the same verification algorithm, and has the same distribution of public keys and signatures as the first scheme. Thus we are able to recycle the existing security proof for the new scheme.
We also show how to extend our results to the identity-based setting in the random oracle model, which can further reduce the overall cryptographic overhead. We conclude with a detailed evaluation of the signing time and storage requirements for various practical settings of the system parameters.
Masking provides a high level of resistance against side-channel analysis. However, in practice there are many possible pitfalls when masking schemes are applied, and implementation flaws are easily overlooked. Over the recent years, the formal verification of masked software implementations has made substantial progress. In contrast to software implementations, hardware implementations are inherently susceptible to glitches. Therefore, the same methods tailored for software implementations are not readily applicable. In this work, we introduce a method to formally verify the security of masked hardware implementations that takes glitches into account. Our approach does not require any intermediate modeling steps of the targeted implementation and is not bound to a certain leakage model. The verification is performed directly on the circuit’s netlist, and covers also higher-order and multivariate flaws. Therefore, a sound but conservative estimation of the Fourier coefficients of each gate in the netlist is calculated, which characterize statistical dependence of the gates on the inputs and thus allow to predict possible leakages. In contrast to existing practical evaluations, like t-tests, this formal verification approach makes security statements beyond specific measurement methods, the number of evaluated leakage traces, and the evaluated devices. Furthermore, flaws detected by the verifier are automatically localized. We have implemented our method on the basis of an SMT solver and demonstrate the suitability on a range of correctly and incorrectly protected circuits of different masking schemes and for different protection orders. Our verifier is efficient enough to prove the security of a full masked AES S-box, and of the Keccak S-box up to the third protection order.
In this paper, we focus on collision attacks against Keccak hash function family and some of its variants. Following the framework developed by Dinur et al. at FSE~2012 where 4-round collisions were found by combining 3-round differential trails and 1-round connectors, we extend the connectors one round further hence achieve collision attacks for up to 5 rounds. The extension is possible thanks to the large degree of freedom of the wide internal state. By linearization of all S-boxes of the first round, the problem of finding solutions of 2-round connectors are converted to that of solving a system of linear equations. However, due to the quick freedom reduction from the linearization, the system has solution only when the 3-round differential trails satisfy some additional conditions. We develop a dedicated differential trail search strategy and find such special differentials indeed exist. As a result, the first practical collision attack against 5-round SHAKE128 and two 5-round instances of the Keccak collision challenges are found with real examples. We also give the first results against 5-round Keccak224 and 6-round Keccak collision challenges. It is remarked that the work here is still far from threatening the security of the full 24-round Keccak family.
We introduce {\em Free Hash}, a new approach to generating Garbled Circuit (GC) hash at no extra cost during GC generation. This is in contrast with state-of-the-art approaches, which hash GCs at computational cost of up to 6× of GC generation. GC hashing is at the core of the cut-and-choose technique of GC-based secure function evaluation (SFE).
Our main idea is to intertwine hash generation/verification with GC generation and evaluation. While we {\em allow} an adversary to generate a GC \GCˆ whose hash collides with an honestly generated \GC, such a \GCˆ w.h.p. will fail evaluation and cheating will be discovered. Our GC hash is simply a (slightly modified) XOR of all the gate table rows of GC. It is compatible with Free XOR and half-gates garbling, and can be made to work with many cut-and-choose SFE protocols.
With today’s network speeds being not far behind hardware-assisted fixed-key garbling throughput, eliminating the GC hashing cost will significantly improve SFE performance. Our estimates show substantial cost reduction in typical settings, and up to factor 6 in specialized applications relying on GC hashes.
We implemented GC hashing algorithm and report on its performance.
Supersingular isogeny Diffie-Hellman (SIDH) is an attractive candidate for post-quantum key exchange, in large part due to its relatively small public key sizes. A recent paper by Azarderakhsh, Jao, Kalach, Koziel and Leonardi showed that the public keys defined in Jao and De Feo’s original SIDH scheme can be further compressed by around a factor of two, but reported that the performance penalty in utilizing this compression blew the overall SIDH runtime out by more than an order of magnitude. Given that the runtime of SIDH key exchange is currently its main drawback in relation to its lattice- and code-based post-quantum alternatives, an order of magnitude performance penalty for a factor of two improvement in bandwidth presents a trade-off that is unlikely to favor public-key compression in many scenarios.
In this paper, we propose a range of new algorithms and techniques that accelerate SIDH public-key compression by more than an order of magnitude, making it roughly as fast as a round of standalone SIDH key exchange, while further reducing the size of the compressed public keys by approximately 12.5%. These improvements enable the practical use of compression, achieving public keys of only 330 bytes for the concrete parameters used to target 128 bits of quantum security and further strengthens SIDH as a promising post-quantum primitive.
Recent work on searchable symmetric encryption (SSE) has focused on increasing its expressiveness. A notable example is the OXT construction (Cash et al., CRYPTO ’13 ) which is the first SSE scheme to support conjunctive keyword queries with sub-linear search complexity. While OXT efficiently supports disjunctive and boolean queries that can be expressed in searchable normal form, it can only handle arbitrary disjunctive and boolean queries in linear time. This motivates the problem of designing expressive SSE schemes with worst-case sub-linear search; that is, schemes that remain highly efficient for any keyword query.
In this work, we address this problem and propose non-interactive highly efficient SSE schemes that handle arbitrary disjunctive and boolean queries with worst-case sub-linear search and optimal communication complexity. Our main construction, called IEX, makes black-box use of an underlying single keyword SSE scheme which we can instantiate in various ways. Our first instantiation, IEX-2Lev, makes use of the recent 2Lev construction (Cash et al., NDSS ’14 ) and is optimized for search at the expense of storage overhead. Our second instantiation, IEX-ZMF, relies on a new single keyword SSE scheme we introduce called ZMF and is optimized for storage overhead at the expense of efficiency (while still achieving asymptotically sub-linear search). Our ZMF construction is the first adaptively-secure highly compact SSE scheme and may be of independent interest. At a very high level, it can be viewed as an encrypted version of a new Bloom filter variant we refer to as a Matryoshka filter. In addition, we show how to extend IEX to be dynamic and forward-secure.
To evaluate the practicality of our schemes, we designed and implemented a new encrypted search framework called Clusion. Our experimental results demonstrate the practicality of IEX and of its instantiations with respect to either search (for IEX-2Lev) and storage overhead (for IEX-ZMF).
Reducing latency overhead while maintaining critical security guarantees like forward secrecy has become a major design goal for key exchange (KE) protocols, both in academia and industry. Of particular interest in this regard are 0-RTT protocols, a class of KE protocols which allow a client to send cryptographically protected payload in zero round-trip time (0-RTT) along with the very first KE protocol message, thereby minimizing latency. Prominent examples are Google’s QUIC protocol and the upcoming TLS protocol version 1.3.
Intrinsically, the main challenge in a 0-RTT key exchange is to achieve forward secrecy and security against replay attacks for the very first payload message sent in the protocol. According to cryptographic folklore, it is impossible to achieve forward secrecy for this message, because the session key used to protect it must depend on a non-ephemeral secret of the receiver. If this secret is later leaked to an attacker, it should intuitively be possible for the attacker to compute the session key by performing the same computations as the receiver in the actual session.
In this paper we show that this belief is actually false. We construct the first 0-RTT key exchange protocol which provides full forward secrecy for all transmitted payload messages and is automatically resilient to replay attacks. In our construction we leverage a puncturable key encapsulation scheme which permits each ciphertext to only be decrypted once. Fundamentally, this is achieved by evolving the secret key after each decryption operation, but without modifying the corresponding public key or relying on shared state.
Our construction can be seen as an application of the puncturable encryption idea of Green and Miers (S&P 2015). We provide a new generic and standard-model construction of this tool that can be instantiated with any selectively secure hierarchical identity-based key encapsulation scheme.
Micropayments (payments worth a few pennies) have numerous potential applications. A challenge in achieving them is that payment networks charge fees that are high compared to “micro” sums of money.
Wheeler (1996) and Rivest (1997) proposed probabilistic payments as a technique to achieve micropayments: a merchant receives a macro-value payment with a given probability so that, in expectation, he receives a micro-value payment. Despite much research and trial deployment, micropayment schemes have not seen adoption, partly because a trusted party is required to process payments and resolve disputes.
The widespread adoption of decentralized currencies such as Bitcoin (2009) suggests that decentralized micropayment schemes are easier to deploy. Pass and Shelat (2015) proposed several micropayment schemes for Bitcoin, but their schemes provide no more privacy guarantees than Bitcoin itself, whose transactions are recorded in plaintext in a public ledger.
We formulate and construct decentralized anonymous micropayment (DAM) schemes, which enable parties with access to a ledger to conduct offline probabilistic payments with one another, directly and privately. Our techniques extend those of Zerocash (2014) with a new probabilistic payment scheme; we further provide an efficient instantiation based on a new fractional message transfer protocol.
Double spending in our setting cannot be prevented. Our second contribution is an economic analysis that bounds the additional utility gain of any cheating strategy, and applies to virtually any probabilistic payment scheme with offline validation. In our construction, this bound allows us to deter double spending by way of advance deposits that are revoked when cheating is detected.
The security analysis of Keccak, the winner of SHA-3, has attracted considerable interest. Recently, some attention has been paid to the analysis of keyed modes of Keccak sponge function. As a notable example, the most efficient key recovery attacks on Keccak-MAC and Keyak were reported at EUROCRYPT’15 where cube attacks and cubeattack- like cryptanalysis have been applied. In this paper, we develop a new type of cube distinguisher, the conditional cube tester, for Keccak sponge function. By imposing some bit conditions for certain cube variables, we are able to construct cube testers with smaller dimensions. Our conditional cube testers are used to analyse Keccak in keyed modes. For reduced-round Keccak-MAC and Keyak, our attacks greatly improve the best known attacks in key recovery in terms of the number of rounds or the complexity. Moreover, our new model can also be applied to keyless setting to distinguish Keccak sponge function from random permutation.We provide a searching algorithm to produce the most efficient conditional cube tester by modeling it as an MILP (mixed integer linear programming) problem. As a result, we improve the previous distinguishing attacks on Keccak sponge function significantly. Most of our attacks have been implemented and verified by desktop computers. Finally we remark that our attacks on the the reduced-round Keccak will not threat the security margin of Keccak sponge function.
It is widely accepted that higher-order masking is a sound countermeasure to protect implementations of block ciphers against side-channel attacks. The main issue while designing such a countermeasure is to deal with the nonlinear parts of the cipher \textit{i.e.} the so-called s-boxes. The prevailing approach to tackle this issue consists in applying the Ishai-Sahai-Wagner (ISW) scheme from CRYPTO 2003 to some polynomial representation of the s-box. Several efficient constructions have been proposed that follow this approach, but higher-order masking is still considered as a costly (impractical) countermeasure. In this paper, we investigate efficient higher-order masking techniques by conducting a case study on ARM architectures (the most widespread architecture in embedded systems). We follow a bottom-up approach by first investigating the implementation of the base field multiplication at the assembly level. Then we describe optimized low-level implementations of the ISW scheme and its variant (CPRR) due to Coron \textit{et al.} (FSE 2013). Finally we present improved state-of-the-art methods with custom parameters and various implementation-level optimizations. We also investigate an alternative to polynomials methods which is based on bitslicing at the s-box level. We describe new masked bitslice implementations of the AES and PRESENT ciphers. These implementations happen to be significantly faster than (optimized) state-of-the-art polynomial methods. In particular, our bitslice AES masked at order 10 runs in 0.48 megacycles, which makes 8 milliseconds in presence of a 60 MHz clock frequency.
This paper reports on the number field sieve computation of a 768-bit prime field discrete logarithm, describes the different parameter optimizations and resulting algorithmic changes compared to the factorization of a 768-bit RSA modulus, and briefly discusses the cryptologic relevance of the result.
In cut-and-choose protocols for two-party secure computation (2PC) the main overhead is the number of garbled circuits that must be sent. Recent work (Lindell, Riva; Huang et al., Crypto 2014) has shown that in a batched setting, when the parties plan to evaluate the same function N times, the number of garbled circuits per execution can be reduced by a O(logN) factor compared to the single-execution setting. This improvement is significant in practice: an order of magnitude for N as low as one thousand. % Besides the number of garbled circuits, communication round trips are another significant performance bottleneck. Afshar et al. (Eurocrypt 2014) proposed an efficient cut-and-choose 2PC that is round-optimal (one message from each party), but in the single-execution setting.
In this work we present new malicious-secure 2PC protocols that are round-optimal and also take advantage of batching to reduce cost. Our contributions include: \begin{itemize} \item A 2-message protocol for batch secure computation (N instances of the same function). The number of garbled circuits is reduced by a O(logN) factor over the single-execution case. However, other aspects of the protocol that depend on the input/output size of the function do not benefit from the same O(logN)-factor savings. \item A 2-message protocol for batch secure computation, in the random oracle model. All aspects of this protocol benefit from the O(logN)-factor improvement, except for small terms that do not depend on the function being evaluated. \item A protocol in the offline/online setting. After an offline preprocessing phase that depends only on the function f and N, the parties can securely evaluate f, N times (not necessarily all at once). Our protocol’s online phase is only 2 messages, and the total online communication is only ℓ+O(κ) bits, where ℓ is the input length of f and κ is a computational security parameter. This is only O(κ) bits more than the information-theoretic lower bound for malicious 2PC.
Private set intersection (PSI) refers to a special case of secure two-party computation in which the parties each have a set of items and compute the intersection of these sets without revealing any additional information. In this paper we present improvements to practical PSI providing security in the presence of {\em malicious} adversaries. Our starting point is the protocol of Dong, Chen & Wen (CCS 2013) that is based on Bloom filters. We identify a bug in their malicious-secure variant and show how to fix it using a cut-and-choose approach that has low overhead while simultaneously avoiding one the main computational bottleneck in their original protocol. We also point out some subtleties that arise when using Bloom filters in malicious-secure cryptographic protocols. We have implemented our PSI protocols and report on its performance. Our improvements reduce the cost of Dong et al.’s protocol by a factor of 14−110× on a single thread. When compared to the previous fastest protocol of De Cristofaro et al., we improve the running time by 8−24×. For instance, our protocol has an online time of 14 seconds and an overall time of 2.1 minutes to securely compute the intersection of two sets of 1 million items each.