Survey Estimation for Highly Skewed Populations in the Presence of Zeroes
Estimation of the population total
of a highly skewed survey variable from a small sample
using straightforward methods is problematic for two reasons: (i) when there are no
extreme values in the sample, the estimates will be too small, and (ii) when
extreme values are sampled, the estimates will
become grossly inflated. Traditional methods for outlier treatment will usually
compensate for outliers in the sample, thereby avoiding (ii), whereas the small
negative bias of (i) will persist. Here, an estimator based on a
lognormal-logistic superpopulation model is proposed.
A particular strength of the model estimator is that the
lognormal structure of the survey variable is used for estimation -- even in the
absence of extremely large values in the sample. Another advantage of the model
estimator is that it can be applied to situations in which the survey variable,
while highly skewed, may assume the value zero for a number of units.
The model estimator is applied to an agricultural survey variable in a
simulation study, in which it is compared to a design-based (regression)
estimator as well as a Winsorization-based estimator specifically constructed
for outlier treatment. The simulation results indicate that the
lognormal-logistic model estimator constitutes a sensible alternative to the
other estimators, in particular when the sample size is small.
Keywords: Extreme values; model-based inference; superpopulation; lognormal distribution.
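
As an illustration of the two-part idea described above, the following is a minimal sketch and not the estimator of the paper. It assumes an intercept-only model with no auxiliary variable, so the logistic part reduces to the sample proportion of nonzero values, and it uses a naive plug-in lognormal mean without any bias correction. The function names lognormal_logistic_total and expansion_total are hypothetical, and simple random sampling is assumed.

```python
import math
from statistics import fmean, variance


def lognormal_logistic_total(sample, population_size):
    """Naive plug-in prediction of a population total (illustrative only).

    Assumptions, none taken from the paper:
    - intercept-only logistic part, so P(y > 0) is estimated by the
      sample proportion of nonzero values;
    - log(y) given y > 0 is normal with a common mean and variance;
    - no bias correction is applied to the lognormal mean.
    """
    n = len(sample)
    positives = [y for y in sample if y > 0]
    if len(positives) < 2:
        raise ValueError("need at least two positive observations")

    p_hat = len(positives) / n             # estimated P(y > 0)
    logs = [math.log(y) for y in positives]
    mu_hat = fmean(logs)                   # mean of log(y) given y > 0
    sigma2_hat = variance(logs)            # sample variance of log(y)

    # Expected value of one non-sampled unit under the fitted model:
    # P(y > 0) * E[y | y > 0] = p * exp(mu + sigma^2 / 2).
    mean_nonsampled = p_hat * math.exp(mu_hat + sigma2_hat / 2)

    # Observed part plus predicted part for the N - n non-sampled units.
    return sum(sample) + (population_size - n) * mean_nonsampled


def expansion_total(sample, population_size):
    """Expansion estimator N * (sample mean), shown for comparison."""
    return population_size * fmean(sample)
```

For a sample without extreme values, the expansion estimator uses only the observed mean, whereas the plug-in predictor also uses the estimated log-scale variance, which is how the lognormal structure enters even when no large values have been sampled.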