Journal of Official Statistics, Vol. 3, No. 4, 1987, pp. 419-429

Correction for Misclassification Using Doubly Sampled Data

In doubly sampled data, the units of a subsample are classified jointly by two methods: (i) a fallible but inexpensive method, and (ii) a reliable but expensive one. The remaining units are classified only by method (i). We propose an extension of the generalized linear model (Nelder and Wedderburn (1972)) for such data. We explicitly model the nonsampling errors, i.e., the probabilities of misclassification, and then incorporate these into the model for the dependence of the response on the explanatory factors. The model allows for misclassification in both the response and the explanatory factors.
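To illustrate the doubly sampled design, the following minimal sketch corrects a single binary proportion. It is not the generalized linear model extension proposed in the paper. It uses the simpler classical double-sampling estimator, which combines P(true | fallible) estimated from the jointly classified subsample with P(fallible) estimated from all units. The function names and all counts are hypothetical and chosen only for illustration.

```python
def misclassification_rates(validation):
    """Estimate error rates of the fallible method from the subsample.

    validation maps (true_label, fallible_label) -> count, labels in {0, 1}.
    Returns (false_positive_rate, false_negative_rate).
    """
    true_neg_total = validation[(0, 0)] + validation[(0, 1)]
    true_pos_total = validation[(1, 0)] + validation[(1, 1)]
    false_pos = validation[(0, 1)] / true_neg_total
    false_neg = validation[(1, 0)] / true_pos_total
    return false_pos, false_neg


def corrected_proportion(validation, fallible_only):
    """Double-sampling estimate of P(true label = 1).

    fallible_only maps fallible_label -> count for units that were
    classified by the inexpensive method only.
    """
    n_total = sum(validation.values()) + sum(fallible_only.values())
    estimate = 0.0
    for j in (0, 1):
        # Subsample units that the fallible method put in class j.
        val_j = validation[(0, j)] + validation[(1, j)]
        # P(true = 1 | fallible = j), from the jointly classified subsample.
        p_true_given_j = validation[(1, j)] / val_j
        # P(fallible = j), from every unit in the study.
        p_j = (val_j + fallible_only[j]) / n_total
        estimate += p_true_given_j * p_j
    return estimate


# Hypothetical counts (not taken from the accident data set).
validation = {(1, 1): 40, (1, 0): 10, (0, 1): 5, (0, 0): 45}
fallible_only = {1: 300, 0: 600}

false_pos, false_neg = misclassification_rates(validation)
naive = (validation[(0, 1)] + validation[(1, 1)] + fallible_only[1]) / 1000
corrected = corrected_proportion(validation, fallible_only)
print(f"false positive rate: {false_pos:.3f}")
print(f"false negative rate: {false_neg:.3f}")
print(f"naive proportion:     {naive:.3f}")
print(f"corrected proportion: {corrected:.3f}")
```

With these hypothetical counts, the fallible method misses more true cases than it falsely reports, so the corrected proportion is higher than the naive one based on the fallible labels alone.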

A car accident data set is analyzed in which 80 084 accidents were categorized only by the police, and 1 796 accidents were categorized both by the police and by personal interview of the accident victims. Our model is more explicit concerning the nonsampling errors than the models used for these data by Hochberg (1977) and by Espeland and Odoroff (1985).

Error in explanatory factor; error in binary response; exponential family nonlinear model; generalized linear model; GLIM; misclassification model; structural model.

Copyright Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X