Journal of Official Statistics, Vol. 25, No. 4, 2009, pp. 549–567
Using Bayesian Networks to Create Synthetic Data
Jim Young, Patrick Graham, Richard Penny
Abstract: A Bayesian network is a graphical model of the joint probability distribution for a set of variables. A Bayesian network could be used to create multiple synthetic data sets that are then released by an official statistics agency while the original data remain confidential, so that an analyst outside the agency can explore associations between an attribute of interest and other variables. The process is illustrated with an example. Inferences from the original data are compared to inferences from synthetic data created by a single Bayesian network and by Bayesian model averaging over a set of networks. Informative prior information is needed in order to assign appropriate weights to each network in this set if synthetic data are to have both good inferential properties and an acceptable risk of disclosure. This sensitivity to prior information will make it difficult for an official statistics agency to use Bayesian networks to automate the process of creating synthetic data.
Keywords: Confidentiality, hierarchical Bayesian modelling, multiple imputation
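As a rough illustration of the idea in the abstract, the sketch below fits a single small Bayesian network to binary data and draws several synthetic data sets from it by sampling each variable given its parents. It is not the authors' procedure: the variable names, the network structure in `PARENTS`, the add-alpha smoothing, and the helper functions are all hypothetical, and the sketch uses point estimates of the conditional probabilities rather than drawing parameters from a posterior or averaging over a set of networks.

```python
import random
from collections import Counter

# Hypothetical structure: each binary variable maps to its parents.
# Keys are listed in topological order (parents before children).
PARENTS = {
    "age_group": (),
    "employed": ("age_group",),
    "income_high": ("age_group", "employed"),
}

def fit_cpts(rows, parents):
    """Count, for each variable and parent configuration, how often it equals 1."""
    cpts = {}
    for var, pa in parents.items():
        ones, totals = Counter(), Counter()
        for row in rows:
            key = tuple(row[p] for p in pa)
            totals[key] += 1
            ones[key] += row[var]
        cpts[var] = (ones, totals)
    return cpts

def prob_one(cpts, var, key, alpha=1.0):
    """Smoothed estimate of P(var = 1 | parents = key); alpha is an assumed prior count."""
    ones, totals = cpts[var]
    return (ones[key] + alpha) / (totals[key] + 2 * alpha)

def sample_row(cpts, parents, rng):
    """Ancestral sampling: draw each variable after its parents have been drawn."""
    row = {}
    for var, pa in parents.items():
        key = tuple(row[p] for p in pa)
        row[var] = int(rng.random() < prob_one(cpts, var, key))
    return row

def synthesize(rows, parents, m=5, seed=0):
    """Return m synthetic data sets, each the same size as the original."""
    rng = random.Random(seed)
    cpts = fit_cpts(rows, parents)
    return [[sample_row(cpts, parents, rng) for _ in rows] for _ in range(m)]
```

Calling `synthesize(original_rows, PARENTS, m=5)` on a list of dictionaries with these three keys would return five synthetic lists of records; an analyst would then fit the same model to each and combine the results with multiple-imputation rules.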
Copyright © Statistics Sweden 1996-2018. Open Access. ISSN 0282-423X.