Journal of Official Statistics, Vol. 17, No. 4, 2001, pp. 499–520

Applying Pitman's Sampling Formula to Microdata Disclosure Risk Assessment

The Ewens sampling formula (Ewens 1972), studied mainly in statistical ecology, has been used to assess microdata disclosure risk. Pitman (1995) proposed an extension of the Ewens sampling formula, and in the present article we evaluate the usefulness of this Pitman sampling formula for disclosure risk assessment. First, we clarify some theoretical implications of the Pitman model as a tool for assessing the risk. We then compare various models using the Akaike Information Criterion (AIC), applying them to real data sets from the Japanese labor force survey. Our comparison strongly supports the Pitman model. These results suggest that the Pitman sampling formula is very promising for the microdata disclosure problem as well as for statistical ecology.
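
For readers unfamiliar with the formula, the following is a minimal sketch of how the log-probability of an observed size index could be evaluated under the two-parameter Pitman sampling formula, together with the usual AIC definition. The abstract does not state the parameterization or the fitting procedure, so the notation here is an assumption: `theta > 0` and `0 <= alpha < 1` follow the standard textbook form, `alpha = 0` recovers the Ewens sampling formula, and the function names and the toy size index are hypothetical.

```python
import math
from collections import Counter


def pitman_log_prob(sizes, theta, alpha):
    """Log-probability of a size index under the Pitman sampling formula.

    sizes maps a cell size j to s_j, the number of cells of that size.
    Assumed parameter range (not taken from the abstract):
    theta > 0 and 0 <= alpha < 1. Setting alpha = 0 gives the Ewens formula.
    """
    n = sum(j * s for j, s in sizes.items())  # sample size
    u = sum(sizes.values())                   # number of non-empty cells
    # log of theta * (theta + alpha) * ... * (theta + (u - 1) * alpha)
    log_num = sum(math.log(theta + i * alpha) for i in range(u))
    # log of theta * (theta + 1) * ... * (theta + n - 1)
    log_den = sum(math.log(theta + i) for i in range(n))
    logp = math.lgamma(n + 1) + log_num - log_den
    for j, s in sizes.items():
        # log of (1 - alpha) * (2 - alpha) * ... * (j - 1 - alpha)
        log_rising = sum(math.log(i - alpha) for i in range(1, j))
        logp += s * (log_rising - math.lgamma(j + 1)) - math.lgamma(s + 1)
    return logp


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 * log-likelihood + 2 * parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Toy size index (hypothetical): three unique cells, one pair, one cell of four.
toy = Counter({1: 3, 2: 1, 4: 1})
ewens_aic = aic(pitman_log_prob(toy, theta=2.0, alpha=0.0), n_params=1)
pitman_aic = aic(pitman_log_prob(toy, theta=2.0, alpha=0.3), n_params=2)
```

In an actual comparison the parameters would be estimated by maximum likelihood before the AIC values are computed; the fixed values above only illustrate the calculation.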

Key words: Privacy; uniqueness; species abundance; superpopulation; random clustering.

Copyright Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X
