Journal of Official Statistics, Vol. 3, No. 1, 1987, pp. 45–67
Methods and Problems in Coding Natural Language Survey Data
Abstract: This paper discusses a computer algorithm for coding (i.e., classifying) natural language survey data. The algorithm uses “semantic vectors” over the set of codes to be assigned, and its database can be constructed automatically from manually coded records. When applied to industry descriptions from the 1970 U.S. Population and Housing Census, the algorithm agreed with expert manual coding in 80% of the cases, a rate comparable to the agreement between a novice and an expert coder.
Keywords: Coding; classifying; natural language; survey data; semantic pattern matching.
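
The abstract does not spell out how the semantic vectors are built or combined, so the following is only a minimal illustrative sketch of the general idea, not the paper's algorithm. It assumes that each word gets a vector of relative code frequencies learned from manually coded records, and that a new description is coded by summing the vectors of its words and taking the highest-scoring code. The function names, the tokenizer, the scoring rule, and the code labels (`GROCERY`, `AUTO_REPAIR`, `SHOES`) are all hypothetical and are not taken from the paper or from the census classification.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def build_database(coded_records):
    """Build a word -> {code: relative frequency} mapping from coded records.

    Each record is a (description, code) pair. A word is counted at most
    once per record. This weighting is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for description, code in coded_records:
        for word in set(tokenize(description)):
            counts[word][code] += 1
    database = {}
    for word, code_counts in counts.items():
        total = sum(code_counts.values())
        database[word] = {code: n / total for code, n in code_counts.items()}
    return database


def assign_code(description, database):
    """Sum the vectors of the known words and return the best-scoring code.

    Returns None when no word of the description is in the database.
    Ties are broken by alphabetical order of the code labels.
    """
    scores = Counter()
    for word in set(tokenize(description)):
        for code, weight in database.get(word, {}).items():
            scores[code] += weight
    if not scores:
        return None
    return max(sorted(scores), key=scores.get)


# Hypothetical manually coded records (invented for this example).
records = [
    ("retail grocery store", "GROCERY"),
    ("grocery store", "GROCERY"),
    ("auto repair shop", "AUTO_REPAIR"),
    ("auto body repair", "AUTO_REPAIR"),
    ("retail shoe store", "SHOES"),
]
database = build_database(records)

print(assign_code("grocery store downtown", database))
print(assign_code("auto repair", database))
```

In a sketch like this, a description with no known words is left uncoded (`None`), which would correspond to referring the case to a manual coder; whether the paper handles such cases this way is not stated in the abstract.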
Copyright © Statistics Sweden 1996-2018. Open Access. ISSN 0282-423X.