JOS

Abstract
Journal of Official Statistics, Vol. 25, No. 4, 2009, pp. 549-567


Using Bayesian Networks to Create Synthetic Data

Abstract:
A Bayesian network is a graphical model of the joint probability distribution for a set of variables. A Bayesian network could be used to create multiple synthetic data sets that are then released by an official statistics agency while the original data remain confidential, so that an analyst outside the agency can explore associations between an attribute of interest and other variables. The process is illustrated with an example. Inferences from the original data are compared to inferences from synthetic data created by a single Bayesian network and by Bayesian model averaging over a set of networks. Informative prior information is needed in order to assign appropriate weights to each network in this set if synthetic data are to have both good inferential properties and an acceptable risk of disclosure. This sensitivity to prior information will make it difficult for an official statistics agency to use Bayesian networks to automate the process of creating synthetic data.
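The sketch below shows the general idea in code. It is a minimal illustration and not the authors' procedure. A Bayesian network factorizes the joint distribution as p(x_1, ..., x_k) = ∏ p(x_i | parents(x_i)), so synthetic records can be drawn one variable at a time in topological order. The sketch makes several assumptions of its own. All variables are discrete. The conditional probability tables get Dirichlet priors, and each synthetic data set uses a fresh posterior draw of those tables, in a multiple-imputation style. Model averaging is approximated by picking a network for each synthetic data set according to user-supplied weights. The schema (`DOMAINS`), the candidate networks, the weights, and all function names are hypothetical. Choosing the weights is exactly where the abstract says informative prior information is needed.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical toy schema: every variable is discrete with a small finite domain.
DOMAINS = {"age": ["young", "old"], "smoker": ["no", "yes"], "disease": ["no", "yes"]}


def topological_order(parents):
    """Order variables so each parent precedes its children (assumes the graph is a DAG)."""
    order, seen = [], set()

    def visit(v):
        if v not in seen:
            seen.add(v)
            for p in parents[v]:
                visit(p)
            order.append(v)

    for v in parents:
        visit(v)
    return order


def draw_dirichlet(alphas, rng):
    """Sample a probability vector from Dirichlet(alphas) by normalising gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_cpts(data, parents, rng, prior_alpha=1.0):
    """Draw one set of conditional probability tables from their Dirichlet posteriors.

    prior_alpha is an assumed symmetric Dirichlet prior, not a value from the paper.
    """
    cpts = {}
    for var, pa in parents.items():
        counts = Counter((tuple(r[p] for p in pa), r[var]) for r in data)
        table = {}
        # product() with no parents yields a single empty configuration ().
        for config in product(*(DOMAINS[p] for p in pa)):
            alphas = [prior_alpha + counts[(config, val)] for val in DOMAINS[var]]
            table[config] = draw_dirichlet(alphas, rng)
        cpts[var] = table
    return cpts


def ancestral_sample(parents, cpts, n, rng):
    """Generate n synthetic records, sampling each variable given its parents' values."""
    order = topological_order(parents)
    records = []
    for _ in range(n):
        rec = {}
        for var in order:
            probs = cpts[var][tuple(rec[p] for p in parents[var])]
            rec[var] = rng.choices(DOMAINS[var], weights=probs)[0]
        records.append(rec)
    return records


def synthesize(data, networks, weights, m, rng):
    """Create m synthetic data sets, each from a network picked with probability
    proportional to its weight (a crude stand-in for Bayesian model averaging)."""
    out = []
    for _ in range(m):
        parents = rng.choices(networks, weights=weights)[0]
        cpts = sample_cpts(data, parents, rng)
        out.append(ancestral_sample(parents, cpts, len(data), rng))
    return out


# Usage with randomly generated toy "confidential" data (illustrative only).
rng = random.Random(0)
original = [{v: rng.choice(vals) for v, vals in DOMAINS.items()} for _ in range(200)]
net_a = {"age": (), "smoker": ("age",), "disease": ("age", "smoker")}
net_b = {"age": (), "smoker": (), "disease": ("smoker",)}
# Hypothetical network weights; in practice these would come from posterior model probabilities.
synthetic = synthesize(original, [net_a, net_b], weights=[0.7, 0.3], m=5, rng=rng)
```

In this sketch the weights decide how often each structure generates a released data set. That makes concrete why the abstract finds the choice of priors, and therefore of weights, so consequential for both inferential quality and disclosure risk.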

Keywords:
Confidentiality, hierarchical Bayesian modelling, multiple imputation

Copyright Statistics Sweden 1996-2018. Open Access
ISSN 0282-423X