Journal of Official Statistics, Vol. 19, No. 1, 2003, pp. 1–16
Multiple Imputation for Statistical Disclosure Limitation
T.E. Raghunathan, J.P. Reiter and D.B. Rubin
Abstract: This article evaluates the use of the multiple imputation framework to protect the confidentiality of respondents' answers in sample surveys. The basic proposal is to simulate multiple copies of the population from which these respondents have been selected and release a random sample from each of these synthetic populations. Users can analyze the synthetic sample data sets with standard complete-data software for simple random samples, then obtain valid inferences by combining the point and variance estimates using the methods in this article. Both parametric and nonparametric approaches for simulating these synthetic databases are discussed and evaluated. It is shown, using actual and simulated data sets in simple settings, that statistical inferences from these simulated research databases and the actual data sets are similar, at least for a class of analyses. Arguably, this class will be large enough for many users of public-use data. Users with more detailed demands may have to apply for special access to the confidential data.
Keywords: Bayesian approach; Bayesian bootstrap; combining rules; confidentiality protection; sample survey; synthetic data sets.
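The combining step mentioned in the abstract can be illustrated with a minimal sketch. The abstract itself does not state the rules, so the form used here is an assumption: the estimator commonly given for fully synthetic data, in which the point estimate is the average of the m per-data-set estimates and the variance estimate is T_s = (1 + 1/m) b_m - v_bar, where b_m is the between-data-set variance and v_bar is the average within-data-set variance. The function name `combine_synthetic` and its arguments are hypothetical, and the article should be consulted for the exact rules and the accompanying reference distribution.

```python
from statistics import mean, variance


def combine_synthetic(point_estimates, variance_estimates):
    """Combine estimates from m synthetic data sets (illustrative sketch).

    point_estimates[i]    : estimate of the quantity from synthetic data set i
    variance_estimates[i] : its estimated variance from the same data set
    Returns (q_bar, t_s). The formula for t_s is an assumption, not taken
    from the abstract: T_s = (1 + 1/m) * b_m - v_bar.
    """
    m = len(point_estimates)
    if m < 2 or m != len(variance_estimates):
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(point_estimates)      # combined point estimate
    b_m = variance(point_estimates)    # between-data-set variance (m - 1 denominator)
    v_bar = mean(variance_estimates)   # average within-data-set variance

    # Note the subtraction: this estimator can come out negative when m is small.
    t_s = (1 + 1 / m) * b_m - v_bar
    return q_bar, t_s


# Usage with made-up numbers (not results from the article).
q_bar, t_s = combine_synthetic([10.0, 12.0, 11.0], [0.5, 0.5, 0.5])
```

Because the within-data-set variance is subtracted rather than added, the sketch differs from the combining rule used for multiple imputation of missing data; a practical implementation would also need a fallback for negative values of `t_s`.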
Copyright © Statistics Sweden 1996-2017. Open Access. ISSN 0282-423X.