Data Swapping as a Decision Problem
Shanti Gomatam, Alan F. Karr and Ashish P. Sanil
We construct a decision-theoretic formulation of data swapping in which quantitative measures of disclosure risk and data utility are employed to select one release from a possibly large set of candidates. The decision variables are the swap rate, swap attribute(s) and, possibly, constraints on the unswapped attributes. Risk-utility frontiers, consisting of those candidates not dominated in (risk, utility) space by any other candidate, are a principal tool for reducing the scale of the decision problem. Multiple measures of disclosure risk and data utility, including utility measures based directly on use of the swapped data for statistical inference, are introduced. Their behavior and resulting insights into the decision problem are illustrated using data from the U.S. Current Population Survey, the well-studied Czech auto worker data and data on schools and administrators generated by the U.S. National Center for Education Statistics.
Data swapping, disclosure risk, data utility, risk-utility frontier, data confidentiality, categorical data
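
As an illustration of the risk-utility frontier described in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It assumes that each candidate release is summarized by a single risk value (lower is better) and a single utility value (higher is better), and that one candidate dominates another when it is at least as good on both measures and strictly better on at least one. The function name, the candidate labels and the numeric values are hypothetical.

```python
from typing import List, Tuple

# (label, disclosure risk, data utility); all values below are made up.
Candidate = Tuple[str, float, float]


def risk_utility_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates not dominated in (risk, utility) space.

    Assumption: lower risk and higher utility are both preferred.
    """
    frontier = []
    for label, risk, utility in candidates:
        dominated = any(
            # Another candidate is no worse on both measures ...
            (r <= risk and u >= utility)
            # ... and strictly better on at least one of them.
            and (r < risk or u > utility)
            for _, r, u in candidates
        )
        if not dominated:
            frontier.append((label, risk, utility))
    # Order by increasing risk for easier reading.
    return sorted(frontier, key=lambda c: c[1])


# Hypothetical candidate releases, indexed here by swap rate only.
releases = [
    ("rate=0.01", 0.30, 0.95),
    ("rate=0.05", 0.20, 0.90),
    ("rate=0.10", 0.25, 0.80),  # dominated by rate=0.05
    ("rate=0.20", 0.10, 0.70),
]

for label, risk, utility in risk_utility_frontier(releases):
    print(f"{label}: risk={risk:.2f}, utility={utility:.2f}")
```

In this made-up example the release with swap rate 0.10 is discarded because the release with swap rate 0.05 has both lower risk and higher utility. The remaining three releases form the frontier from which a final choice would be made.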