Efficient Stratification Based on Nonparametric Regression Methods
Enrico Fabrizi, Carlo Trivisano
Classical rules for optimal one-way stratification, such as the Dalenius and Hodges rule, assume that a single stratification variable is used. In this article, we consider an information setting in which a set of candidate stratification variables is available and a proxy of the target variable (or the target variable itself) is known for a random sample of units from the population. Under these assumptions, we propose several extensions of the Dalenius and Hodges rule based on either linear prediction or nonparametric regression methods. The resulting stratification rules are compared in a Monte Carlo exercise based on a set of pseudo-populations covering a wide range of possible relationships between the target and the stratification variables. We also discuss the use of regression trees as stratification rules, an option that may be intuitively appealing in this information setting.
Keywords: Dalenius and Hodges rule, one-way optimal stratification, regression trees, additive models, MARS, boosted regression trees
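
As an illustration of the kind of extension described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the usual cum-sqrt(f) form of the Dalenius and Hodges rule (bin the stratification variable finely, cumulate the square roots of the bin frequencies, and cut at equally spaced points of that cumulative scale), and it assumes that the "linear prediction" variant amounts to fitting ordinary least squares on the sample and applying the rule to the predicted values for the whole population. The function names, the number of bins, and the simulated data are all hypothetical.

```python
import numpy as np

def cum_sqrt_f_boundaries(x, n_strata, n_bins=100):
    """Cum-sqrt(f) rule: return n_strata - 1 cut points for the variable x."""
    counts, edges = np.histogram(x, bins=n_bins)
    cum = np.cumsum(np.sqrt(counts))
    # Equally spaced targets on the cumulative sqrt(f) scale.
    targets = cum[-1] * np.arange(1, n_strata) / n_strata
    # Index of the first bin whose cumulative sqrt(f) reaches each target.
    idx = np.searchsorted(cum, targets, side="left")
    # Use the upper edge of that bin as the stratum boundary.
    return edges[idx + 1]

def stratify_by_linear_prediction(X_pop, X_sample, y_sample, n_strata, n_bins=100):
    """Assumed variant: OLS fit on the sample, cum-sqrt(f) on population predictions."""
    A = np.column_stack([np.ones(len(X_sample)), X_sample])
    beta, *_ = np.linalg.lstsq(A, y_sample, rcond=None)
    y_hat = np.column_stack([np.ones(len(X_pop)), X_pop]) @ beta
    cuts = cum_sqrt_f_boundaries(y_hat, n_strata, n_bins)
    # Stratum labels 0 .. n_strata - 1 for every population unit.
    return np.digitize(y_hat, cuts), cuts

# Hypothetical pseudo-population with three candidate stratification variables.
rng = np.random.default_rng(0)
X_pop = rng.normal(size=(5000, 3))
y_pop = X_pop @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=5000)
sample = rng.choice(5000, size=300, replace=False)  # units where the target is known
strata, cuts = stratify_by_linear_prediction(
    X_pop, X_pop[sample], y_pop[sample], n_strata=4
)
```

Swapping the least-squares step for a nonparametric fit (for example an additive model or boosted trees) would leave the cum-sqrt(f) step unchanged, since it only needs a single predicted score per population unit. With very concentrated predictions, neighbouring cut points can coincide, so in practice the number of bins may need tuning.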