Journal of Official Statistics, Vol. 9, No. 4, 1993, pp. 729-745

Error Control of Automated Industry and Occupation Coding

The Automated Industry and Occupation Coding System (AIOCS) was used in the 1990 Decennial Census to classify natural-language responses into one of 243 industry and 504 occupation categories. This paper presents empirical results from a new error control procedure for estimating the production and error rates of the AIOCS. The procedure, which combines the cutoff method (a per-class threshold) with a weighted approach, was implemented for the 1990 census to control the production and error rates. A basic assumption of the cutoff method is that the score associated with classifying a response is positively correlated with the probability that the response is correctly classified. For each code category, the “cutoff score” is the score below which selected phrases have an unacceptable probability of error. The weighted approach compensates for the procedure used to select the validation data with which the AIOCS was evaluated. The paper also shows that the potential bias from estimating the cutoff score and the production and error rates from the same data set is very small if the sample size is large.
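As an illustration only, the cutoff method and the weighted approach might be sketched as follows. The sketch is not the procedure used in the census: it assumes that each validation record carries a category, a classification score, a correctness flag, and a weight (taken here to be the inverse of the record's selection probability), and that the cutoff for a category is the lowest score at which the weighted error rate among responses scoring at or above it stays within a tolerance. The names `choose_cutoffs`, `weighted_rates`, and `max_error` are hypothetical.

```python
from collections import defaultdict


def choose_cutoffs(records, max_error):
    """Pick one cutoff score per category (illustrative rule, an assumption).

    records: iterable of (category, score, is_correct, weight) tuples.
    Returns {category: cutoff}, where cutoff is the lowest score such that
    the weighted error rate among records scoring >= cutoff is <= max_error,
    or None if no score satisfies the tolerance.
    """
    by_category = defaultdict(list)
    for category, score, is_correct, weight in records:
        by_category[category].append((score, is_correct, weight))

    cutoffs = {}
    for category, rows in by_category.items():
        rows.sort(key=lambda r: r[0], reverse=True)  # highest score first
        total_w = error_w = 0.0
        best = None
        i = 0
        while i < len(rows):
            score = rows[i][0]
            # Absorb every record tied at this score before testing it.
            while i < len(rows) and rows[i][0] == score:
                _, is_correct, weight = rows[i]
                total_w += weight
                if not is_correct:
                    error_w += weight
                i += 1
            if total_w > 0 and error_w / total_w <= max_error:
                best = score
        cutoffs[category] = best
    return cutoffs


def weighted_rates(records, cutoffs):
    """Weighted production rate and error rate under the given cutoffs.

    production rate = weighted share of records coded automatically
    error rate      = weighted share of coded records that are wrong
    """
    total_w = coded_w = error_w = 0.0
    for category, score, is_correct, weight in records:
        total_w += weight
        cutoff = cutoffs.get(category)
        if cutoff is not None and score >= cutoff:
            coded_w += weight
            if not is_correct:
                error_w += weight
    production = coded_w / total_w if total_w else 0.0
    error = error_w / coded_w if coded_w else 0.0
    return production, error
```

In this sketch the same records are used both to choose the cutoffs and to estimate the rates, which is the source of the potential bias that the paper reports to be very small when the sample is large.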

Classification; estimations; cutoff.

