Journal of Official Statistics, Vol. 13, No. 4, 1997, pp. 417–434

Statistical Disclosure Control and Sampling Weights

Before a statistical office disseminates a microdata set, it should check whether a potential intruder could disclose sensitive information about individual respondents. This check usually amounts to examining how much so-called (indirectly) identifying information the microdata set contains. If it contains too much identifying information, the microdata set is considered unsafe for release. When a statistical office releases a microdata set, it usually includes sampling weights to facilitate analyses, together with a description of the auxiliary variables, their categories and the sampling method underlying the weights. Unfortunately, the sampling weights, innocent as they may seem, can give an intruder additional identifying information when they are based on identifying variables that are not contained in the released microdata set. A simple idea to prevent disclosure through sampling weights would be not to publish which weight corresponds to which stratum. Surprisingly, this is not sufficient. In this article we demonstrate that in many cases an intruder with sufficient knowledge about the population can determine which stratum corresponds to a specific weight.
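The following is a minimal illustrative sketch of why hiding the weight-to-stratum mapping can fail, not the method developed in this article. It assumes simple stratified random sampling with weight w_h = N_h / n_h, a distinct weight in every stratum, and an intruder who knows the population stratum sizes N_h. Counting the released records that carry each weight gives n_h, so the product w_h * n_h recovers N_h, which can then be looked up among the known population sizes. The function name `match_weights_to_strata`, the tolerance `tol` and the example data are hypothetical.

```python
from collections import Counter


def match_weights_to_strata(weights, population_sizes, tol=0.5):
    """Hypothetical sketch: link each distinct sampling weight to candidate strata.

    Assumes simple stratified random sampling with weight w_h = N_h / n_h and
    a distinct weight per stratum (strata sharing a weight would be merged).

    weights: sampling weights of the released records, stratum labels removed.
    population_sizes: dict mapping stratum -> N_h, assumed known to the intruder.
    tol: absolute tolerance for rounding in the published weights.
    """
    # n_h: the number of released records carrying each distinct weight.
    counts = Counter(weights)
    matches = {}
    for w, n in counts.items():
        # Under the assumed design, w_h * n_h equals the stratum size N_h.
        estimated_size = w * n
        matches[w] = [
            stratum
            for stratum, size in population_sizes.items()
            if abs(size - estimated_size) <= tol
        ]
    return matches


if __name__ == "__main__":
    # Hypothetical design: N = (1200, 800, 500), n = (12, 4, 10).
    released = [100.0] * 12 + [200.0] * 4 + [50.0] * 10
    known = {"A": 1200, "B": 800, "C": 500}
    print(match_weights_to_strata(released, known))
```

In this toy case every weight links to exactly one stratum. A weight is only ambiguous when several strata have (nearly) the same population size, and the matching breaks down under the stated assumptions when two strata share a weight or when the weights have been adjusted beyond N_h / n_h.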

Key words: Statistical disclosure control; sampling weights; microdata.

