Journal of Official Statistics, Vol.16, No.3, 2000. pp. 229–241

Survey Estimation for Highly Skewed Populations in the Presence of Zeroes


Estimating the population total of a highly skewed survey variable from a small sample using straightforward methods is problematic for two reasons: (i) when the sample contains no extreme values, the estimates will be too small, and (ii) when extreme values are sampled, the estimates will become grotesquely large. Traditional methods for outlier treatment usually compensate for outliers in the sample, thereby avoiding (ii), whereas the small negative bias of (i) persists. Here, an estimator based on a lognormal-logistic superpopulation model is proposed.

A particular strength of the model estimator is that the lognormal structure of the survey variable is used for estimation -- even in the absence of extremely large values in the sample. Another advantage of the model estimator is that it can be applied to situations in which the survey variable, while highly skewed, may assume the value zero for a number of units.
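
The idea of combining a zero/nonzero component with a lognormal component can be illustrated with a minimal sketch. It is not the estimator from the article: it assumes simple random sampling and no auxiliary variables, so the logistic part collapses to the sample proportion of nonzero values, and the function name `model_total_estimate` is hypothetical.

```python
import math


def model_total_estimate(sample, population_size):
    """Predict a population total from a zero-inflated lognormal sketch.

    Assumptions (not taken from the article): simple random sampling,
    no covariates, and at least two positive sample values.
    """
    n = len(sample)
    positives = [y for y in sample if y > 0]
    if len(positives) < 2:
        raise ValueError("need at least two positive values")

    # Probability of a nonzero value; with no covariates the logistic
    # model reduces to the observed proportion.
    p_nonzero = len(positives) / n

    # Lognormal parameters estimated on the log scale.
    logs = [math.log(y) for y in positives]
    mu = sum(logs) / len(logs)
    sigma2 = sum((v - mu) ** 2 for v in logs) / (len(logs) - 1)

    # Expected value of one unsampled unit: P(nonzero) * lognormal mean.
    predicted_mean = p_nonzero * math.exp(mu + sigma2 / 2)

    # Observed sample total plus predictions for the unsampled units.
    return sum(sample) + (population_size - n) * predicted_mean


sample = [0.0, 0.0, 1.0, math.exp(2)]
estimate = model_total_estimate(sample, population_size=10)
```

Because the predicted mean includes the term `sigma2 / 2`, the skewness of the fitted lognormal distribution raises the prediction for unsampled units even when the sample itself contains no extreme value.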

The model estimator is applied to an agricultural survey variable in a simulation study, in which it is compared to a design-based (regression) estimator as well as a Winsorization-based estimator specifically constructed for outlier treatment. The simulation results indicate that the lognormal-logistic model estimator constitutes a sensible alternative to the other estimators, in particular when the sample size is small.
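
For comparison, a Winsorization-based estimator can be sketched in its simplest form. The abstract does not specify which Winsorization variant or cutoff rule the study uses, so the sketch below assumes one-sided capping at a fixed, user-supplied cutoff under simple random sampling; the function names are hypothetical.

```python
def expansion_total(sample, population_size):
    """Plain expansion estimator N/n * sum(y) under simple random sampling."""
    return population_size / len(sample) * sum(sample)


def winsorized_total(sample, population_size, cutoff):
    """Expansion estimator after capping every value at `cutoff`.

    Assumption: a fixed cutoff chosen by the user; the article's
    estimator may determine the cutoff differently.
    """
    capped = [min(y, cutoff) for y in sample]
    return expansion_total(capped, population_size)


sample = [1.0, 2.0, 100.0]
raw = expansion_total(sample, population_size=30)
capped = winsorized_total(sample, population_size=30, cutoff=10.0)
```

Capping limits the influence of a sampled extreme value, but it does nothing when the sample contains no extreme value, which is the situation the model estimator is meant to address.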

Keywords: Extreme values; model-based inference; superpopulation; lognormal distribution.

Copyright © Statistics Sweden 1996-2018.  Open Access
ISSN 0282-423X