Journal of Official Statistics, Vol. 21, No. 3, 2005, pp. 441–462


Using CART to Generate Partially Synthetic Public Use Microdata

Abstract:
One approach to limiting disclosure risk is to release partially synthetic public use microdata sets. These comprise the units originally surveyed, but some collected values, for example sensitive values at high risk of disclosure or values of key identifiers, are replaced with multiple imputations. This article presents and evaluates the use of classification and regression trees (CART) to generate partially synthetic data. Two potential applications of CART are studied via simulation: (i) generating synthetic data for sensitive variables, and (ii) generating synthetic data for variables that are key identifiers.
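
As a rough illustration of the general idea, the sketch below grows a small regression tree for a sensitive variable on the other variables, then replaces each unit's sensitive value with a draw from the observed values in that unit's leaf, repeated m times to give multiple imputations. It is a minimal sketch under stated assumptions, not the article's procedure: the abstract does not say how values are drawn within leaves, so a plain draw with replacement is assumed here, and the names `grow_tree`, `donors`, and `synthesize`, the `min_leaf` stopping rule, and the toy data are all hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow_tree(X, y, min_leaf=5):
    """Greedy regression tree on numeric predictors.

    Returns a nested dict; each leaf stores the observed y values
    (the "donors") that fell into it. Assumes X is non-empty.
    """
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    # Stop when no admissible split reduces the within-node SSE.
    if best is None or best[0] >= sse(y):
        return {"leaf": list(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left], min_leaf),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right], min_leaf),
    }

def donors(tree, row):
    """Walk the tree and return the donor values in the leaf containing row."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]

def synthesize(X, y, m=3, min_leaf=5, seed=0):
    """Return m synthetic copies of y, each drawn leaf-by-leaf (assumed scheme)."""
    rng = random.Random(seed)
    tree = grow_tree(X, y, min_leaf)
    return [[rng.choice(donors(tree, row)) for row in X] for _ in range(m)]

# Toy usage with made-up data: one predictor, one sensitive variable.
toy_rng = random.Random(1)
X_toy = [[toy_rng.uniform(18, 70)] for _ in range(60)]
y_toy = [1000 * row[0] + toy_rng.gauss(0, 5000) for row in X_toy]
copies = synthesize(X_toy, y_toy, m=3)
```

Each element of `copies` is one synthetic version of the sensitive variable; the predictors in `X_toy` are left as collected, which is what makes the released data only partially synthetic.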

Keywords:
CART, confidentiality, disclosure, multiple imputation, synthetic data, trees

Copyright Statistics Sweden 1996-2018. Open Access.
ISSN 0282-423X