Journal of Official Statistics, Vol. 25, No. 2, 2009, pp. 245–268
Multiply Imputed Synthetic Data: Evaluation of Hierarchical Bayesian Imputation Models
Patrick Graham, Jim Young, Richard Penny
Abstract: The use of synthetic data has been proposed as a method for allowing official statistics agencies to honour confidentiality commitments while facilitating researcher access to data. Because fully synthetic data does not contain unit records for real individuals, confidentiality concerns are much reduced compared to release of the data actually collected. Synthetic datasets are draws from the posterior predictive distribution of responses for a new sample, given the data from the observed study sample. The generation of synthetic data is underpinned by a model for the distribution of the observable data. Hierarchical Bayesian modelling is a promising framework for generating (imputing) synthetic data, because hierarchical Bayes models provide some protection against model misspecification. In this article we use a simulation study to compare the performance of hierarchical Bayes and conventional generalised linear imputation models for creating synthetic data. We conclude that the frequentist properties of synthetic data estimators are superior under hierarchical Bayes imputation models, compared to conventional generalised linear imputation models.
Keywords: Confidentiality, hierarchical Bayesian modelling, multiple imputation, synthetic data
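The abstract describes synthetic datasets as draws from the posterior predictive distribution of responses for a new sample. The sketch below illustrates that two-step idea on a toy problem only. It assumes a single binary response with a conjugate Beta prior, which is neither the hierarchical Bayes nor the generalised linear imputation model evaluated in the article. The function name `draw_synthetic_datasets`, its parameters, and the example data are hypothetical and chosen purely for illustration.

```python
import numpy as np


def draw_synthetic_datasets(y_obs, n_new, m, a=1.0, b=1.0, seed=0):
    """Draw m synthetic binary datasets from a Beta-Bernoulli posterior predictive.

    Toy assumption: responses are i.i.d. Bernoulli(theta) with a Beta(a, b) prior.
    """
    rng = np.random.default_rng(seed)
    y_obs = np.asarray(y_obs)
    successes = int(y_obs.sum())
    failures = y_obs.size - successes

    synthetic = []
    for _ in range(m):
        # Step 1: draw the parameter from its posterior given the observed sample.
        theta = rng.beta(a + successes, b + failures)
        # Step 2: draw responses for a new sample of size n_new given that parameter.
        synthetic.append(rng.binomial(1, theta, size=n_new))
    return synthetic


# Made-up observed sample, for illustration only.
observed = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 1])
datasets = draw_synthetic_datasets(observed, n_new=200, m=5)

# Each released dataset is analysed separately; the point estimate averages them.
estimates = [d.mean() for d in datasets]
point_estimate = float(np.mean(estimates))
```

Drawing a fresh parameter value for each synthetic dataset, rather than plugging in one estimate, is what makes these posterior predictive draws: the variation between the m datasets reflects uncertainty about the model parameter as well as sampling variation.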
Copyright © Statistics Sweden 1996-2018. Open Access. ISSN 0282-423X.