|
A Comparison of Multiple Imputation and Data Perturbation for Masking Numerical Variables

Krishnamurty Muralidhar and Rathindra Sarathy

Abstract: Statistical disclosure limitation techniques are designed to give legitimate users access to useful data while preventing disclosure of sensitive information. Two techniques that can be used to limit disclosure of sensitive numerical data are multiple imputation and data perturbation. While many studies have addressed the effectiveness of perturbation and multiple imputation individually, no study has directly compared the two techniques. In this study, we compare the effectiveness of multiple imputation and data perturbation for numerical microdata. The results indicate that, in the absence of missing data, data perturbation performs better than multiple imputation. In addition, because only a single perturbed data set is released, whereas multiple imputation releases several imputed data sets, data perturbation eases the burden on users of such data.

Keywords: Confidentiality, privacy, data dissemination
|