Journal of Official Statistics, Vol. 21, No. 4, 2005, pp. 635-655

Data Swapping as a Decision Problem

We construct a decision-theoretic formulation of data swapping in which quantitative measures of disclosure risk and data utility are employed to select one release from a possibly large set of candidates. The decision variables are the swap rate, swap attribute(s) and, possibly, constraints on the unswapped attributes. Risk–utility frontiers, consisting of those candidates not dominated in (risk, utility) space by any other candidate, are a principal tool for reducing the scale of the decision problem. Multiple measures of disclosure risk and data utility, including utility measures based directly on use of the swapped data for statistical inference, are introduced. Their behavior and resulting insights into the decision problem are illustrated using data from the U.S. Current Population Survey, the well-studied “Czech auto worker data” and data on schools and administrators generated by the U.S. National Center for Education Statistics.
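
To illustrate the frontier idea described above, the following is a minimal sketch of filtering candidate releases down to those not dominated in (risk, utility) space. It is not the authors' implementation. The names `Candidate`, `dominates` and `frontier` are hypothetical, and the sketch assumes that each candidate has a single risk score (lower is better) and a single utility score (higher is better). If a utility measure is instead a distortion, where lower is better, its sign would need to be flipped first.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One candidate release (hypothetical structure for illustration)."""
    label: str      # e.g. a description of swap rate and swap attribute(s)
    risk: float     # disclosure risk; assumed lower is better
    utility: float  # data utility; assumed higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.risk <= b.risk and a.utility >= b.utility
    strictly_better = a.risk < b.risk or a.utility > b.utility
    return no_worse and strictly_better


def frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
```

Only the candidates returned by `frontier` need to be weighed against one another, which is how the frontier reduces the scale of the decision problem. The pairwise comparison is quadratic in the number of candidates, which should be adequate for modest candidate sets.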

Keywords: Data swapping, disclosure risk, data utility, risk-utility frontier, data confidentiality, categorical data

Copyright Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X