PRIMA: A New Multiple Imputation Procedure for Binary Variables
Ralf Münnich and Susanne Rässler
When investigating unemployment data, one may be interested in estimating the total number of unemployed persons in subpopulations, e.g., defined by region or by subject-matter characteristics. Population totals are typically estimated with Horvitz-Thompson-type estimators. However, the data are often subject to item nonresponse. To obtain valid results for these estimators from a randomization-based perspective, variance correction methods are generally needed.
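As a brief illustration of the estimator referred to above, the following minimal sketch computes a Horvitz-Thompson estimate of a total by weighting each sampled value with the inverse of its inclusion probability. The function name, the toy sample, and the inclusion probabilities are hypothetical and are not taken from the Microcensus; item nonresponse is ignored here.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total.

    y  : sampled values (here 1 = unemployed, 0 = not unemployed)
    pi : first-order inclusion probabilities of the sampled units
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each sampled unit represents 1 / pi_i units of the population.
    return sum(yi / p for yi, p in zip(y, pi))


# Hypothetical sample of five persons with unequal inclusion probabilities.
y = [1, 0, 0, 1, 0]
pi = [0.01, 0.02, 0.02, 0.05, 0.05]
estimate = horvitz_thompson_total(y, pi)
```

With item nonresponse, some entries of y would be unobserved, which is the situation the imputation methods discussed in this article are meant to handle.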
In this article we discuss imputation issues in large-scale datasets containing variables measured on different scales, with special emphasis on binary variables. Since fitting a multivariate imputation model can be cumbersome, we propose univariate specifications, which are much easier to implement. We use the regression-switching, or chained-equations, Gibbs sampler and address possible theoretical shortcomings of this approach as well as data problems.
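The idea of regression switching can be sketched as follows: each incomplete variable is imputed in turn from a univariate regression on all other variables, and the cycle is repeated. The sketch below is an assumption-laden toy version, not the procedure studied in this article: it uses simulated data, one continuous and one binary incomplete variable, and, for brevity, plugs in point estimates of the regression parameters instead of drawing them from their posterior distribution, so the imputations are not "proper" in the multiple imputation sense. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_logistic(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson (tiny ridge term for stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + 1e-6 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def chained_equations(x, z, b, n_sweeps=10):
    """One imputed data set for continuous z and binary b (NaN = missing)."""
    z, b = z.copy(), b.copy()
    miss_z, miss_b = np.isnan(z), np.isnan(b)
    # Starting values: random draws from the observed values.
    z[miss_z] = rng.choice(z[~miss_z], size=miss_z.sum())
    b[miss_b] = rng.choice(b[~miss_b], size=miss_b.sum())
    ones = np.ones_like(x)
    for _ in range(n_sweeps):
        # Step 1: linear regression of z on (x, b), fitted on cases with z observed.
        X = np.column_stack([ones, x, b])
        coef, *_ = np.linalg.lstsq(X[~miss_z], z[~miss_z], rcond=None)
        resid = z[~miss_z] - X[~miss_z] @ coef
        sigma = resid.std(ddof=X.shape[1])
        z[miss_z] = X[miss_z] @ coef + rng.normal(0.0, sigma, miss_z.sum())
        # Step 2: logistic regression of b on (x, z), fitted on cases with b observed.
        X = np.column_stack([ones, x, z])
        beta = fit_logistic(X[~miss_b], b[~miss_b])
        p = 1.0 / (1.0 + np.exp(-X[miss_b] @ beta))
        b[miss_b] = (rng.random(miss_b.sum()) < p).astype(float)
    return z, b


# Simulated toy data (assumption): x complete, about 20% of z and b missing at random.
n = 500
x = rng.normal(size=n)
z_full = 1.0 + 0.5 * x + rng.normal(size=n)
b_full = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.3 * x - 0.5)))).astype(float)
z_obs = np.where(rng.random(n) < 0.2, np.nan, z_full)
b_obs = np.where(rng.random(n) < 0.2, np.nan, b_full)

z_imp, b_imp = chained_equations(x, z_obs, b_obs)
```

Repeating the call several times would yield several imputed data sets; a proper multiple imputation routine would additionally draw the regression parameters anew in every step.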
We conduct a simulation study based on data from the German Microcensus, which is often used to analyze unemployment. Multiple imputation, raking, and calibration techniques are compared for estimating the number of unemployed in different settings. We find that, in some settings, the logistic multiple imputation routine for binary variables may lead to poor point estimates as well as poor variance estimates. To overcome these possible shortcomings of logistic regression imputation, we derive a multiple imputation-matching algorithm, which turns out to work well.
Keywords: complex surveys, missing data, multiple imputation, logistic regression, Horvitz-Thompson estimator, GREG estimator