Secure Multiple Linear Regression Based on Homomorphic Encryption
Rob Hall, Stephen E. Fienberg, Yuval Nardi
We consider the problem of linear regression where the data are split up and held by different parties. We conceptualize a single combined database containing all of the information for the individuals in the separate databases and for the union of the variables. We propose an approach that carries out the full statistical calculation on this combined database without actually combining the information sources. We focus on computing linear regression and ridge regression estimates, as well as certain goodness-of-fit statistics. We use homomorphic encryption to construct a protocol for regression analysis that adheres to the definitions of security laid out in the cryptography literature. Our approach reveals only the final result of the calculations, in contrast with other methods that share intermediate values and thus create an opportunity for privacy to be compromised. We perform an experiment on a dataset extracted from the Current Population Survey, with 51,016 cases and 22 covariates, to show that our approach is practical for moderate-sized problems.
Combining data sources, confidentiality, homomorphic encryption, privacy-preserving statistical calculation, secure multi-party computation
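
As a minimal illustration of the homomorphic property the abstract relies on, the following Python sketch shows how encrypted values can be added without being decrypted. The abstract does not name an encryption scheme; a toy Paillier-style additively homomorphic scheme is assumed here purely for illustration. The small primes, the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `secure_sum`) and the example values are hypothetical. This is not the protocol of the paper: a single party holds the private key, only a sum is computed, and the key size offers no security.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier-style key pair from two small primes (assumed, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n with fresh randomness."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of plaintexts."""
    return (c1 * c2) % (n * n)


def secure_sum(values):
    """Each hypothetical party encrypts its local value; only the total is decrypted."""
    n, priv = keygen()
    total = encrypt(n, 0)
    for v in values:
        total = add_encrypted(n, total, encrypt(n, v))
    return decrypt(n, priv, total)


if __name__ == "__main__":
    # Three parties each contribute one local integer statistic.
    print(secure_sum([120, 45, 310]))  # 475
```

In this sketch the individual contributions are seen only in encrypted form, and only their total is decrypted. Sums must stay below the modulus `n`, so real-valued statistics would first need a fixed-point integer encoding.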