Journal of Official Statistics, Vol. 25, No. 2, 2009, pp. 245-268

Multiply Imputed Synthetic Data: Evaluation of Hierarchical Bayesian Imputation Models

The use of synthetic data has been proposed as a method for allowing official statistics agencies to honour confidentiality commitments while facilitating researcher access to data. Because fully synthetic data does not contain unit-records for real individuals, confidentiality concerns are much reduced compared to release of the data actually collected. Synthetic datasets are draws from the posterior predictive distribution of responses for a new sample, given the data from the observed study sample. The generation of synthetic data is underpinned by a model for the distribution of the observable data. Hierarchical Bayesian modelling is a promising framework for generating (imputing) synthetic data, because hierarchical Bayes models provide some protection against model misspecification. In this article we use a simulation study to compare the performance of hierarchical Bayes and conventional generalised linear imputation models for creating synthetic data. We conclude that the frequentist properties of synthetic data estimators are superior under hierarchical Bayes imputation models, compared to conventional generalised linear imputation models.
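
The idea that synthetic datasets are draws from the posterior predictive distribution can be illustrated with a minimal sketch. It is not the imputation model evaluated in the article: it assumes a single continuous variable, a one-level normal model, and the standard noninformative prior on the mean and variance, whereas the article compares hierarchical Bayes and generalised linear imputation models. The function name, its parameters, and the simulated "observed" sample are all hypothetical and exist only for illustration.

```python
import numpy as np


def draw_synthetic_datasets(y, m=5, n_syn=None, seed=0):
    """Draw m synthetic datasets from the posterior predictive distribution.

    Assumed model (illustrative only): y_i ~ Normal(mu, sigma^2) with the
    noninformative prior p(mu, sigma^2) proportional to 1 / sigma^2.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    n_syn = n if n_syn is None else n_syn
    ybar = y.mean()
    s2 = y.var(ddof=1)

    datasets = []
    for _ in range(m):
        # Step 1: draw the parameters from their posterior given the observed data.
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # Step 2: draw responses for a new sample given the drawn parameters.
        datasets.append(rng.normal(mu, np.sqrt(sigma2), size=n_syn))
    return datasets


# Hypothetical observed sample, simulated here so the sketch is self-contained.
observed = np.random.default_rng(1).normal(50.0, 10.0, size=200)
synthetic = draw_synthetic_datasets(observed, m=5)
```

Each of the m datasets uses a fresh parameter draw, so the variation between datasets carries the uncertainty about the model parameters. No synthetic record corresponds to a real unit, since every released value is simulated from the fitted model.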

Keywords: Confidentiality, hierarchical Bayesian modelling, multiple imputation, synthetic data

Copyright Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X