1. SecureML: A System for Scalable Privacy-Preserving Machine Learning 2017 2PC MachineLearning Oakland
    Payman Mohassel and Yupeng Zhang
    [View PDF on eprint.iacr.org]

    @INPROCEEDINGS{7958569,
    author={P. {Mohassel} and Y. {Zhang}},
    booktitle={2017 IEEE Symposium on Security and Privacy (SP)},
    title={SecureML: A System for Scalable Privacy-Preserving Machine Learning},
    year={2017},
    pages={19-38},
    keywords={C++ language;data privacy;gradient methods;learning (artificial intelligence);neural nets;regression analysis;security of data;stochastic processes;SecureML;scalable privacy-preserving machine learning;data collection;data privacy;linear regression;logistic regression;neural network training;stochastic gradient descent method;two-party computation;2PC;C++;Training;Logistics;Protocols;Data models;Privacy;Linear regression;Neural networks;Privacy-preserving machine learning;secure computation},
    doi={10.1109/SP.2017.12},
    ISSN={2375-1207},
    month={May},
    }

Machine learning is widely used in practice to produce predictive models for applications such as image processing and speech and text recognition. These models are more accurate when trained on large amounts of data collected from different sources. However, such massive data collection raises privacy concerns. In this paper, we present new and efficient protocols for privacy-preserving machine learning for linear regression, logistic regression, and neural network training using the stochastic gradient descent method. Our protocols fall in the two-server model, where data owners distribute their private data among two non-colluding servers who train various models on the joint data using secure two-party computation (2PC). We develop new techniques to support secure arithmetic operations on shared decimal numbers, and propose MPC-friendly alternatives to non-linear functions such as sigmoid and softmax that are superior to prior work. We implement our system in C++. Our experiments validate that our protocols are several orders of magnitude faster than state-of-the-art implementations for privacy-preserving linear and logistic regression, and scale to millions of data samples with thousands of features. We also implement the first privacy-preserving system for training neural networks.
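To make the abstract's two key ingredients concrete, here is a minimal single-process sketch (not the paper's C++ implementation; all function names are illustrative) of additive secret sharing of fixed-point numbers over Z_{2^64}, multiplication via a Beaver triple with SecureML-style local truncation of shares, and the paper's piecewise-linear replacement for the sigmoid. A real deployment would run the two shares on separate servers and evaluate the comparison inside 2PC; here everything runs in one process for clarity.

```python
import random

MOD = 1 << 64       # ring Z_{2^64}
FRAC = 13           # fractional bits (the paper's experiments use 13)
SCALE = 1 << FRAC

def encode(x):
    """Map a real number to a fixed-point ring element mod 2^64."""
    return int(round(x * SCALE)) % MOD

def decode(v):
    """Map a ring element back to a real; the top half of the ring is negative."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE

def share(v):
    """Additively secret-share v between two parties."""
    r = random.randrange(MOD)
    return (r, (v - r) % MOD)

def reconstruct(sh):
    return (sh[0] + sh[1]) % MOD

def add(a, b):
    """Addition is local: each party adds its own shares."""
    return ((a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD)

def mul(a, b):
    """Multiply two shared values using a Beaver triple (u, v, w = u*v).
    Here a dealer stands in for the paper's offline phase."""
    u, v = random.randrange(MOD), random.randrange(MOD)
    w = (u * v) % MOD
    us, vs, ws = share(u), share(v), share(w)
    # The parties open the masked differences e = x - u and f = y - v;
    # in this single-process sketch we just reconstruct them directly.
    e = (reconstruct(a) - u) % MOD
    f = (reconstruct(b) - v) % MOD
    z0 = (ws[0] + e * vs[0] + f * us[0] + e * f) % MOD
    z1 = (ws[1] + e * vs[1] + f * us[1]) % MOD
    return (z0, z1)

def truncate(z):
    """SecureML's key trick: after a fixed-point multiply, each party simply
    truncates its OWN share; the reconstruction is then off by at most one
    unit in the last place, except with negligible probability."""
    z0 = z[0] >> FRAC
    z1 = (MOD - ((MOD - z[1]) >> FRAC)) % MOD
    return (z0, z1)

def approx_sigmoid(x):
    """The paper's MPC-friendly piecewise-linear sigmoid replacement
    (evaluated with a small garbled circuit in the real protocol)."""
    if x < -0.5:
        return 0.0
    if x > 0.5:
        return 1.0
    return x + 0.5
```

For example, sharing 1.5 and 2.25, multiplying the shares, truncating, and reconstructing yields approximately 3.375, with at most a one-bit fixed-point error from the local truncation.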
