Journal of Official Statistics, Vol. 19, No. 1, 2003, pp. 1-16

Multiple Imputation for Statistical Disclosure Limitation

This article evaluates the use of the multiple imputation framework to protect the confidentiality of respondents' answers in sample surveys. The basic proposal is to simulate multiple copies of the population from which these respondents have been selected and release a random sample from each of these synthetic populations. Users can analyze the synthetic sample data sets with standard complete-data software for simple random samples, then obtain valid inferences by combining the point and variance estimates using the methods in this article. Both parametric and nonparametric approaches for simulating these synthetic databases are discussed and evaluated. It is shown, using actual and simulated data sets in simple settings, that statistical inferences from these simulated research databases and the actual data sets are similar, at least for a class of analyses. Arguably, this class will be large enough for many users of public-use data. Users with more detailed demands may have to apply for special access to the confidential data.
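
The abstract refers to combining point and variance estimates across the synthetic samples without stating the rules themselves. As a minimal sketch, assuming the estimator commonly cited for fully synthetic data, T_s = (1 + 1/m) * b_m - u_bar_m, where b_m is the between-sample variance of the m point estimates and u_bar_m is the average of the m within-sample variance estimates, the combination step might look like the following. The function name and its inputs are hypothetical and are not taken from the article.

```python
from statistics import mean, variance


def combine_synthetic(point_estimates, variance_estimates):
    """Combine estimates from m synthetic samples (illustrative sketch).

    point_estimates:    one point estimate q_i per synthetic sample
    variance_estimates: the matching variance estimate u_i per sample
    Assumes the fully synthetic combining rule T_s = (1 + 1/m) * b_m - u_bar.
    """
    m = len(point_estimates)
    if m < 2 or len(variance_estimates) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = mean(point_estimates)        # combined point estimate
    b_m = variance(point_estimates)      # between-sample variance (divisor m - 1)
    u_bar = mean(variance_estimates)     # average within-sample variance

    # Under the assumed rule the variance estimate can be negative for
    # small m; this sketch returns it unadjusted so the caller can decide.
    t_s = (1 + 1 / m) * b_m - u_bar
    return q_bar, t_s


if __name__ == "__main__":
    q_bar, t_s = combine_synthetic([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
    print(q_bar, t_s)
```

Each (q_i, u_i) pair would come from running the same complete-data analysis on one released synthetic sample. The subtraction of u_bar is what distinguishes this assumed rule from the usual multiple imputation rule for missing data, which adds the within-imputation variance instead.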

Keywords: Bayesian approach; Bayesian bootstrap; combining rules; confidentiality protection; sample survey; synthetic data sets.

Copyright Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X