We present a scalable protocol for database joins on secret shared data in the honest-majority three-party setting. The key features of our protocol are a rich set of SQL-like join/select queries and the ability to compose join operations together due to the inputs and outputs being generically secret shared between the parties. Provided that all joins operate on unique primary keys, no information is revealed to any party during the protocol. In particular, not even the sizes of intermediate joins are revealed. All of our protocols are constant-round and achieve O(n) communication and computation overhead for joining two tables of n rows.
These properties make our protocol ideal for outsourced secure computation. In this setting several non-colluding servers are setup and the input data is shared among them. These servers then perform the relevant secret shared computation and output the result. This model has recently been gaining traction in industry, e.g. Facebook’s Crypten, Cape Privacy’s TFEncrypted, Mozilla Telemetry.
We additionally implement two applications on top of our framework. The first application detects voter registration errors within and between agencies of 50 US states, in a privacy-preserving manner. The second application allows several organizations to compare network security logs to more accurately identify common security threats, e.g. the IP addresses of a bot net. In both cases, the practicality of these applications depends on efficiently performing joins on millions of secret shared records. For example, our three party protocol can perform a join on two sets of 1 million records in 4.9 seconds or, alternatively, compute the cardinality of this join in just 3.1 seconds.
Secure search looks for and retrieves records from a (possibly cloud-hosted) encrypted database while ensuring the confidentiality of the queries. Researchers are paying increasing attention to secure search in recent years due to the growing concerns about database privacy. However, the low efficiency of (especially multiplicative) homomorphic operations in secure search has hindered its deployment in practice. To address this issue, Akavia et al. [CCS 2018, PETS 2019] proposed new protocols that bring down the number of multiplications in the search algorithm from O(n2) to O(n log2 n), and then to O(n log n), where n is the size of the database.
In this paper, we present the first secure search protocol – LEAF and its variant LEAF+ – which only requires $O(n)$ multiplications. Specifically, at the core of LEAF are three novel methods we propose, referred to as Localization, Extraction, and Reconstruction. In addition, LEAF enjoys low communication complexity and only requires the client to perform decryption, which adds its advantage in deployment on weak-power devices such as mobile phones.
In the past few years, we have seen multiple attacks on one-dimensional databases that support range queries. These attacks achieve full database reconstruction by exploiting access pattern leakage along with known query distribution or search pattern leakage. We are the first to go beyond one dimension, exploring this threat in two dimensions. We unveil an intrinsic limitation of reconstruction attacks by showing that there can be an exponential number of distinct databases that produce equivalent leakage. Next, we present a full database reconstruction attack. Our algorithm runs in polynomial time and returns a poly-size encoding of all databases consistent with the given leakage profile. We implement our algorithm and observe real-world databases that admit a large number of equivalent databases, which aligns with our theoretical results.
Recent foundational work on leakage-based attacks on encrypted databases has broadened our understanding of what an adversary can accomplish with a standard leakage profile. Nevertheless, all known value reconstruction attacks succeed under strong assumptions that may not hold in the real world. The most prevalent assumption is that queries should be issued uniformly at random by the client. We present the first value reconstruction attacks for encrypted databases without any assumptions about the query or data distribution. Our approach uses the search pattern leakage, which exists in all known structured encryption schemes but has not been effectively utilized so far. At the core of our method lies a support size estimator, a technique that utilizes the repetition of search tokens with the same response to estimate distances between encrypted values without any assumptions about the underlying distribution. We develop distribution-agnostic reconstruction attacks for both range queries and k-nearest-neighbor (k-NN) queries based on information extracted from the search pattern leakage. Our new range attack follows a different algorithmic approach than state-of-the-art attacks, which are fine-tuned to succeed under the uniform queries. Instead, we reconstruct plaintext values under a variety of skewed query distributions and even outperform the accuracy of previous approaches under uniform query distribution. Our new k-NN attack succeeds with far fewer samples than a previously proposed attack and scales to much larger values of k. We demonstrate the effectiveness of our attacks by experimentally testing them on a wide range of query distributions and database densities, both unknown to the adversary.
Building expressive encrypted databases that can scale to large volumes of data while enjoying formal security guar- antees has been one of the holy grails of security and cryp- tography research. Searchable Encryption (SE) is considered to be an attractive implementation choice for this goal: It naturally supports basic database queries such as point, join, group-by and range, and is very practical at the expense of well-defined leakage such as search and access pattern. Never- theless, recent attacks have exploited these leakages to recover the plaintext database or the posed queries, casting doubt to the usefulness of SE in encrypted systems. Defenses against such leakage-abuse attacks typically require the use of Obliv- ious RAM or worst-case padding—such countermeasures are however quite impractical. In order to efficiently defend against leakage-abuse attacks on SE-based systems, we pro- pose SEAL, a family of new SE schemes with adjustable leakage. In SEAL, the amount of privacy loss is expressed in leaked bits of search or access pattern and can be defined at setup. As our experiments show, when protecting only a few bits of leakage (e.g., three to four bits of access pattern), enough for existing and even new more aggressive attacks to fail, SEAL query execution time is within the realm of practical for real-world applications (a little over one order of magnitude slowdown compared to traditional SE-based en- crypted databases). Thus, SEAL could comprise a promising approach to build efficient and robust encrypted databases.
Motivated by the problem of data breaches, we formalize a notion of security for dynamic structured encryption (STE) schemes that guarantees security against a snapshot adversary; that is, an adversary that receives a copy of the encrypted structure at various times but does not see the transcripts related to any queries. In particular, we focus on the construction of dynamic encrypted multi-maps which are used to build efficient searchable symmetric encryption schemes, graph encryption schemes and encrypted relational databases. Interestingly, we show that a form of snapshot security we refer to as breach resistance implies previously-studied notions such as a (weaker version) of history independence and write-only obliviousness. Moreover, we initiate the study of dual-secure dynamic STE constructions: schemes that are forward-private against a persistent adversary and breach-resistant against a snapshot adversary. The notion of forward privacy guarantees that updates to the encrypted structure do not reveal their association to any query made in the past. As a concrete instantiation, we propose a new dual-secure dynamic multi-map encryption scheme that outperforms all existing constructions; including schemes that are not dual-secure. Our construction has query complexity that grows with the selectivity of the query and the number of deletes since the client executed a linear-time rebuild protocol which can be de-amortized. We implemented our scheme (with the de-amortized rebuild protocol) and evaluated its concrete efficiency empirically. Our experiments show that it is highly efficient with queries taking less than 1 microsecond per label/value pair.
Encrypted search algorithms (ESA) are cryptographic algorithms that support search over encrypted data. ESAs can be designed with various primitives including searchable/structured symmetric encryption (SSE/STE) and oblivious RAM (ORAM). Leakage abuse attacks attempt to recover client queries using knowledge of the client’s data. An important parameter for any leakage-abuse attack is its known-data rate; that is, the fraction of client data that must be known to the adversary.
In this work, we revisit leakage abuse attacks in several ways. We first highlight some practical limitations and assumptions underlying the well-known IKK (Islam et al. NDSS ’12) and Count (Cash et al., CCS ’15) attacks. We then design four new leakage-abuse attacks that rely on much weaker assumptions. Three of these attacks are volumetric in the sense that they only exploit leakage related to document sizes. In particular, this means that they work not only on SSE/STE-based ESAs but also against ORAM-based solutions. We also introduce two volumetric injection attack which use adversarial file additions to recover queries even from ORAM-based solutions. As far as we know, these are the first attacks of their kind.
We evaluated all our attacks empirically and considered many experimental settings including different data collections, query selectivities, known-data rates, query space size and composition. From our experiments, we observed that the only setting that resulted in reasonable recovery rates under practical assumptions was the case of high-selectivity queries with a leakage profile that includes the response identity pattern (i.e., the identifiers of the matching documents) and the volume pattern (i.e., the size of the matching documents). All other attack scenarios either failed or relied on unrealistic assumptions (e.g., very high known-data rates). For this specific setting, we propose several suggestions and countermeasures including the use of schemes like PBS (Kamara et al, CRYPTO ’18), VLH/AVLH (Kamara and Moataz, Eurocrypt ’19 ), or the use of padding techniques like the ones recently proposed by Bost and Fouque (Bost and Fouque, IACR ePrint 2017/1060).
Recent foundational work on leakage-abuse attacks on encrypted databases has broadened our understanding of what an adversary can accomplish with a standard leakage profile. Nevertheless, all known value reconstruction attacks succeed under strong assumptions that may not hold in the real world. The most prevalent assumption is that queries are issued uniformly at random by the client. We present the first value reconstruction attacks that succeed without any knowledge about the query or data distribution. Our approach uses the search-pattern leakage, which exists in all known structured encryption schemes but has not been fully exploited so far. At the core of our method lies a support size estimator, a technique that utilizes the repetition of search tokens with the same response to estimate distances between encrypted values without any assumptions about the underlying distribution. We develop distribution-agnostic reconstruction attacks for both range queries and k-nearest neighbor (k-NN) queries based on information extracted from the search-pattern leakage. Our new range attack follows a different algorithmic approach than state-of-the-art attacks, which are fine-tuned to succeed under the uniformly distributed queries. Instead, we reconstruct plaintext values under a variety of skewed query distributions and even outperform the accuracy of previous approaches under the uniform query distribution. Our new k-NN attack succeeds with far fewer samples than previous attacks and scales to much larger values of k. We demonstrate the effectiveness of our attacks by experimentally testing them on a wide range of query distributions and database densities, both unknown to the adversary.
Secure search is the problem of securely retrieving from a database table (or any unsorted array) the records matching specified attributes, as in SQL ``SELECT…WHERE…’’ queries, but where the database and the query are encrypted. Secure search has been the leading example for practical applications of Fully Homomorphic Encryption (FHE) since Gentry’s seminal work in 2009, attaining the desired properties of a single-round low-communication protocol with semantic security for database and query (even during search). Nevertheless, the wide belief was that the high computational overhead of current FHE candidates is too prohibitive in practice for secure search solutions (except for the restricted case of searching for a uniquely identified record as in SQL UNIQUE constrain and Private Information Retrieval). This is due to the high degree in existing solutions that is proportional at least to the number of database records m.
We present the first algorithm for secure search that is realized by a polynomial of logarithmic degree (log m)^c for a small constant c>0. We implemented our algorithm in an open source library based on HElib, and ran experiments on Amazon’s EC2 cloud with up to 100 processors. Our experiments show that we can securely search to retrieve database records in a rate of searching in millions of database records in less than an hour on a single machine.
We achieve our result by: (1) Designing a novel sketch that returns the first strictly-positive entry in a (not necessarily sparse) array of non-negative real numbers; this sketch may be of independent interest. (2) Suggesting a multi-ring evaluation of FHE – instead of a single ring as in prior works – and leveraging this to achieve an exponential reduction in the degree.
Pattern matching is essential in applications such as deep-packet inspection (DPI), searching on genomic data, or analyzing medical data. A simple task to do on plaintext data, pattern matching is much harder to do when the privacy of the data must be preserved. Existent solutions involve searchable encryption mechanisms with at least one of these three drawbacks: requiring an exhaustive (and static) list of keywords to be prepared before the data is encrypted (like in symmetric searchable encryption); requiring tokenization, i.e., breaking up the data to search into substrings and encrypting them separately (e.g., like BlindBox); relying on symmetric-key cryptography, thus implying a token-regeneration step for each encrypted-data source (e.g., user). Such approaches are ill-suited for pattern-matching with evolving patterns (e.g., updating virus signatures), variable searchword lengths, or when a single entity must filter ciphertexts from multiple parties.
In this work, we introduce Searchable Encryption with Shiftable Trapdoors (SEST): a new primitive that allows for pattern matching with universal tokens (usable by all entities), in which keywords of arbitrary lengths can be matched to arbitrary ciphertexts. Our solution uses public-key encryption and bilinear pairings. It consists of projecting keywords on polynomials of degree equal to the length of the keyword and using a sliding-window-like technique to match the trapdoor to successive fragments of the encrypted data. In addition, very minor modifications to our solution enable it to take into account regular expressions, such as fully- or partly-unknown characters in a keyword (wildcards and interval/subset searches). Our trapdoor size is at most linear in the keyword length (and independent of the plaintext size), and we prove that the leakage to the searcher is only the trivial one: since the searcher learns whether the pattern occurs and where, it can distinguish based on different search results of a single trapdoor on two different plaintexts.
To better show the usability of our scheme, we implemented it to run DPI on all the SNORT rules. We show that even for very large plaintexts, our encryption algorithm scales well. The pattern-matching algorithm is slightly slower, but extremely parallelizable, and it can thus be run even on very large data. Although our proofs use a (marginally) interactive assumption, we argue that this is a relatively small price to pay for the flexibility and privacy that we are able to attain.
Recent works by Kellaris et al. (CCS’16) and Lacharite et al. (SP’18) demonstrated attacks of data recovery for encrypted databases that support rich queries such as range queries. In this paper, we develop the first data recovery attacks on encrypted databases supporting one-dimensional k-nearest neighbor (k-NN) queries, which are widely used in spatial data management. Our attacks exploit a generic k-NN query leakage profile: the attacker observes the identifiers of matched records. We consider both unordered responses, where the leakage is a set, and ordered responses, where the leakage is a k-tuple ordered by distance from the query point.
As a first step, we perform a theoretical feasibility study on exact reconstruction, i.e., recovery of the exact plaintext values of the encrypted database. For ordered responses, we show that exact reconstruction is feasible if the attacker has additional access to some auxiliary information that is normally not available in practice. For unordered responses, we prove that exact reconstruction is impossible due to the infinite number of valid reconstructions. As a next step, we propose practical and more realistic approximate reconstruction attacks so as to recover an approximation of the plaintext values. For ordered responses, we show that after observing enough query responses, the attacker can approximate the client’s encrypted database with considerable accuracy. For unordered responses we characterize the set of valid reconstructions as a convex polytope in a k-dimensional space and present a rigorous attack that reconstructs the plaintext database with bounded approximation error.
As multidimensional spatial data can be efficiently processed by mapping it to one dimension via Hilbert curves, we demonstrate our approximate reconstruction attacks on privacy-sensitive geolocation data. Our experiments on real-world datasets show that our attacks reconstruct the plaintext values with relative error ranging from 2.9% to 0.003%.
Order-preserving encryption and its generalization order-revealing encryption (OPE/ORE) are used in a variety of settings in practice in order to allow sorting, performing range queries, and filtering data — all while only having access to ciphertexts. But OPE and ORE ciphertexts necessarily leak information about plaintexts, and what level of security they provide has been unclear. In this work, we introduce new leakage-abuse attacks that show how to recover plaintexts from OPE/ORE-encrypted databases. Underlying our new attacks against practically-used schemes is a framework in which we cast the adversary’s challenge as a non- crossing bipartite matching problem. This allows easy tailoring of attacks to a specific scheme’s leakage profile. In a case study of customer records, we show attacks that recover 99% of first names, 97% of last names, and 90% of birthdates held in a database, despite all values being encrypted with the OPE scheme most widely used in practice. We also show the first attack against the recent frequency- hiding Kerschbaum scheme, to which no prior attacks have been demonstrated. Our attack recovers frequently occurring plaintexts most of the time.
Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly; systems are offered by academia, start-ups, and established companies.
However, there is no best protected search system or set of techniques. Design of such systems is a balancing act between security, functionality, performance, and usability. This challenge is made more difficult by ongoing database specialization, as some users will want the functionality of SQL, NoSQL, or NewSQL databases. This database evolution will continue, and the protected search community should be able to quickly provide functionality consistent with newly invented databases.
At the same time, the community must accurately and clearly characterize the tradeoffs between different approaches. To address these challenges, we provide the following contributions:
We present a range of novel attacks which exploit information about the volume of answers to range queries in encrypted database. Our attacks rely on a strategy which is simple yet robust and effective. We illustrate the robustness of our strategy in a number of ways. We show how i) to adapt the attack for several variations of a basic usage scenario ii) to defeat countermeasures intended to thwart the premise of our basic attack and iii) to perform partial reconstruction of secret data when unique reconstruction is information theoretically impossible. Furthermore, over the state of the art, our attacks require one order of magnitude fewer queries. We show how to improve the attacks even further, under the assumption that some partial information is known to the adversary. We validate experimentally all of our attacks through extensive experiments on real-world medical data and justify theoretically the effectiveness of our strategy for the basic attack scenario. Our new attacks further underscore the difficulty of striking an appropriate functionality-security trade-off for encrypted databases.
We show that the problem of reconstructing encrypted databases from access pattern leakage is closely related to statistical learning theory. This new viewpoint enables us to develop broader attacks that are supported by streamlined performance analyses. First, we address the problem of ε-approximate database reconstruction (ε-ADR) from range query leakage, giving attacks whose query cost scales only with the relative error ε, and is independent of the size of the database, or the number N of possible values of data items. This already goes significantly beyond the state-of-the-art for such attacks, as represented by Kellaris et al. (ACM CCS 2016) and Lacharité et al. (IEEE S&P 2018). We also study the new problem of ε-approximate order reconstruction (ε-AOR), where the adversary is tasked with reconstructing the order of records, except for records whose values are approximately equal. We show that as few as O(ε^−1 log ε^−1) uniformly random range queries suffice. Our analysis relies on an application of learning theory to PQ-trees, special data structures tuned to compactly record certain ordering constraints. We then show that when an auxiliary distribution is available, ε-AOR can be enhanced to achieve ε-ADR; using real data, we show that devastatingly small numbers of queries are needed to attain very accurate database reconstruction. Finally, we generalize from ranges to consider what learning theory tells us about the impact of access pattern leakage for other classes of queries, focusing on prefix and suffix queries. We illustrate this with both concrete attacks for prefix queries and with a general lower bound for all query classes. We also show a very general reduction from reconstruction with known or chosen queries to PAC learning.
We analyse the security of database encryption schemes supporting range queries against persistent adversaries. The bulk of our work applies to a generic setting, where the adversary’s view is limited to the set of records matched by each query (known as access pattern leakage). We also consider a more specific setting where rank information is also leaked, which is inherent inherent to multiple recent encryption schemes supporting range queries. We provide three attacks.
First, we consider full reconstruction, which aims to recover the value of every record, fully negating encryption. We show that for dense datasets, full reconstruction is possible within an expected number of queries N log N + O(N), where N is the number of distinct plaintext values.
This directly improves on a quadratic bound in the same setting by Kellaris et al. (CCS 2016).
Second, we present an approximate reconstruction attack recovering all plaintext values in a dense dataset within a constant ratio of error, requiring the access pattern leakage of only O(N) queries.
Third, we devise an attack in the common setting where the adversary has access to an auxiliary distribution for the target dataset. This third attack proves highly effective on age data from real-world medical data sets. In our experiments, observing only 25 queries was sufficient to reconstruct a majority of records to within 5 years.
In combination, our attacks show that current approaches to enabling range queries offer little security when the threat model goes beyond snapshot attacks to include a persistent server-side adversary.
Search indices are fundamental building blocks of many systems, and there is great interest in running them on encrypted data. Unfortunately, many known schemes that enable search queries on encrypted data achieve efficiency at the expense of security, as they reveal access patterns to the encrypted data. In this paper we present Oblix, a search index for encrypted data that is oblivious (provably hides access patterns), is dynamic (supports inserts and deletes), and has good efficiency. Oblix relies on a combination of novel oblivious-access techniques and recent hardware enclave platforms (e.g., Intel SGX). In particular, a key technical contribution is the design and implementation of doubly-oblivious data structures, in which the client’s accesses to its internal memory are oblivious, in addition to accesses to its external memory at the server. These algorithms are motivated by hardware enclaves like SGX, which leak access patterns to both internal and external memory. We demonstrate the usefulness of Oblix in several applications: private contact discovery for Signal, private retrieval of public keys for Key Transparency, and searchable encryption that hides access patterns and result sizes.
Cloud database systems such as Amazon RDS or Google Cloud SQL enable the outsourcing of a large database to a server who then responds to SQL queries. A natural problem here is to efficiently verify the correctness of responses returned by the (untrusted) server. In this paper we present vSQL, a novel cryptographic protocol for publicly verifiable SQL queries on dynamic databases. At a high level, our construction relies on two extensions of the CMT interactive-proof protocol [Cormode et al., 2012]: (i) supporting outsourced input via the use of a polynomial-delegation protocol with succinct proofs, and (ii) supporting auxiliary input (i.e., non-deterministic computation) efficiently. Compared to previous verifiable-computation systems based on interactive proofs, our construction has verification cost polylogarithmic in the auxiliary input (which for SQL queries can be as large as the database) rather than linear. In order to evaluate the performance and expressiveness of our scheme, we tested it on SQL queries based on the 1PC-H benchmark on a database with 6 x 106 rows and 13 columns. The server overhead in our scheme (which is typically the main bottleneck) is up to 120 x lower than previous approaches based on succinct arguments of knowledge (SNARKs), and moreover we avoid the need for query-dependent pre-processing which is required by optimized SNARK-based schemes. In our construction, the server/client time and the communication cost are comparable to, and sometimes smaller than, those of existing customized solutions which only support specific queries.
Free cloud-based services are powerful candidates for deploying ubiquitous encryption for messaging. In the case of email and increasingly chat, users expect the ability to store and search their messages persistently. Using data from a major mail provider, we confirm that for a searchable encryption scheme to scale to millions of users, it should be highly IO-efficient (locality) and handle a very dynamic message corpi. We observe that existing solutions fail to achieve both properties simultaneously. We then design, build, and evaluate a provably secure Dynamic Searchable Symmetric Encryption (DSSE) scheme with significant reduction in IO cost compared to preceding works when used for email or other highly dynamic material.
We propose EnclaveDB, a database engine that guarantees confidentiality, integrity, and freshness for data and queries. EnclaveDB guarantees these properties even when the database administrator is malicious, when an attacker has compromised the operating system or the hypervisor, and when the database runs in an untrusted host in the cloud. EnclaveDB achieves this by placing sensitive data (tables, indexes and other metadata) in enclaves protected by trusted hardware (such as Intel SGX). EnclaveDB has a small trusted computing base, which includes an in-memory storage and query engine, a transaction manager and pre-compiled stored procedures. A key component of EnclaveDB is an efficient protocol for checking integrity and freshness of the database log. The protocol supports concurrent, asynchronous appends and truncation, and requires minimal synchronization between threads. Our experiments using standard database benchmarks and a performance model that simulates large enclaves show that EnclaveDB achieves strong security with low overhead (up to 40% for TPC-C) compared to an industry strength in-memory database engine.