Journal of Official Statistics, Vol. 3, No. 1, 1987, pp. 45–67


Methods and Problems in Coding Natural Language Survey Data

Abstract:
This paper discusses a computer algorithm for coding, i.e., classifying, natural language survey data. The algorithm uses “semantic vectors” over the set of codes to be assigned, and its database can be constructed automatically from manually coded records. When applied to industry descriptions from the 1970 U.S. Population and Housing Census, the algorithm agreed with expert manual coding in 80% of the cases, a rate comparable to the agreement between a novice and an expert coder.
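
The abstract does not spell out how the semantic vectors are built or scored, so the following Python sketch is only one plausible reading, not the paper's method. It assumes that each word maps to a vector of relative code frequencies learned from manually coded records, and that a new description receives the code with the largest summed weight. The function names, the toy records, and the code labels (RETAIL, MANUF, EDUC) are hypothetical and are not actual Census industry codes.

```python
from collections import Counter, defaultdict


def build_semantic_vectors(coded_records):
    """Learn a word -> {code: weight} mapping from manually coded records.

    Assumption: a word's weight for a code is the share of that word's
    training records that carry the code.
    """
    counts = defaultdict(Counter)
    for text, code in coded_records:
        # Count each word once per record.
        for word in set(text.lower().split()):
            counts[word][code] += 1
    vectors = {}
    for word, code_counts in counts.items():
        total = sum(code_counts.values())
        vectors[word] = {code: n / total for code, n in code_counts.items()}
    return vectors


def assign_code(text, vectors):
    """Sum the vectors of the known words and return the top-scoring code."""
    scores = Counter()
    for word in set(text.lower().split()):
        for code, weight in vectors.get(word, {}).items():
            scores[code] += weight
    if not scores:
        return None  # no known word: leave the record for manual coding
    # Sorting first makes ties resolve to the alphabetically first code.
    return max(sorted(scores), key=scores.get)


def agreement_rate(coded_records, vectors):
    """Fraction of records where the assigned code matches the expert code."""
    hits = sum(assign_code(text, vectors) == code for text, code in coded_records)
    return hits / len(coded_records)


# Hypothetical manually coded records (illustrative labels only).
training = [
    ("retail grocery store", "RETAIL"),
    ("department store", "RETAIL"),
    ("auto parts manufacturing", "MANUF"),
    ("steel manufacturing plant", "MANUF"),
    ("public high school", "EDUC"),
]
vectors = build_semantic_vectors(training)
```

Calling `assign_code("grocery store", vectors)` on this toy data selects the hypothetical RETAIL label, and `agreement_rate` applied to a held-out set of expert-coded records would give a figure analogous to the agreement rate reported above.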

Keywords:
Coding; classifying; natural language; survey data; semantic pattern matching.

Copyright Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X