Journal of Official Statistics, Vol. 28, No. 4, 2012, pp. 583-590

Inferentially Valid, Partially Synthetic Data: Generating from Posterior Predictive Distributions not Necessary

To avoid disclosures in public use microdata, one approach is to release partially synthetic data sets. These comprise the units originally surveyed with some collected values, for example sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. In practice, partially synthetic data typically are generated from Bayesian posterior predictive distributions; that is, one draws repeated values of parameters in the synthesis models before generating data from them. We show, however, that inferentially valid, partially synthetic data can be generated by fixing the parameters of the synthesis models at their modes. We do so with both a theoretical example and illustrative simulation studies. We also discuss implications of these results for agencies generating synthetic data.
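The contrast between the two generation strategies can be illustrated with a minimal sketch. It is not the authors' code: it assumes a normal linear regression as the synthesis model, a standard noninformative prior for the posterior draws, and maximum likelihood estimates standing in for the modes. The function names, the simulated data, and the choice of m = 5 synthetic data sets are all hypothetical.

```python
import numpy as np


def fit_linear_model(x, y):
    """Least-squares fit of y on an intercept and x (assumed synthesis model)."""
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    rss = float(np.sum((y - X @ beta_hat) ** 2))
    return X, xtx_inv, beta_hat, rss


def synthesize_posterior_predictive(x, y, m, rng):
    """Draw new parameters for each data set, then generate replacement values."""
    X, xtx_inv, beta_hat, rss = fit_linear_model(x, y)
    n, p = X.shape
    synthetic = []
    for _ in range(m):
        # Posterior draw under the noninformative prior (an assumption here):
        # sigma^2 is RSS divided by a chi-square draw, beta is normal given sigma^2.
        sigma2 = rss / rng.chisquare(n - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        synthetic.append(X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n))
    return synthetic


def synthesize_fixed_parameters(x, y, m, rng):
    """Hold the parameters fixed at point estimates for every data set."""
    X, _, beta_hat, rss = fit_linear_model(x, y)
    n = X.shape[0]
    sigma_hat = np.sqrt(rss / n)  # maximum likelihood estimate, used as the fixed value
    return [X @ beta_hat + rng.normal(0.0, sigma_hat, size=n) for _ in range(m)]


# Hypothetical example: y is the sensitive variable, x is released unchanged.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)

pp_sets = synthesize_posterior_predictive(x, y, m=5, rng=rng)
fixed_sets = synthesize_fixed_parameters(x, y, m=5, rng=rng)
```

In both cases the agency would release m copies of the data in which only y has been replaced. The only difference is whether the parameter values change from one synthetic data set to the next.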

Confidentiality, disclosure, imputation, microdata, privacy, survey

Copyright Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X