Reliability of estimates in socio-demographic groups with small samples

Dario Buono[1], Agne Bikauskaite[2]
Eurostat, European Commission, Luxembourg

Abstract

Keywords: Small Area Estimation, Empirical Bayes, Hierarchical Bayes, Nested Error model, Fay-Herriot model

Introduction

The objective of the European Statistical System (ESS) is to provide trustworthy and sufficiently reliable statistics. Direct methods do not always produce accurate estimates of parameters for some sub-populations or sub-domains, because the sample sizes are too small.

The aim of this work is twofold: to investigate whether model-based approaches can be implemented in official statistics to ensure the reliability of indicator estimates for specific breakdowns with small samples; and to discuss the advantages, disadvantages, and potential usability of small area estimation (SAE) models and tools for the production of official statistics.

In this talk we focus only on at-risk-of-poverty rate estimates from social statistics, but extensions to other social and economic indicators and domains are possible.

Methods / Problem statement

SAE addresses the production of reliable estimates for sub-populations covered by small samples. To analyse the feasibility of fitting the models to a particular type of data with different sample sizes, various runs were conducted applying SAE techniques (such as empirical Bayes and hierarchical Bayes) already implemented in the R software (packages sae, hbsae). These runs produced at-risk-of-poverty rate estimates based on the area-level Fay-Herriot model and the unit-level Nested Error model, together with the mean squared errors of the estimates, for data from European countries.
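To illustrate how an area-level Fay-Herriot estimator combines a direct estimate with a regression prediction, here is a minimal sketch. It is not the implementation used in the sae or hbsae packages. It assumes a single auxiliary variable plus an intercept, treats the sampling variances as known, and takes the random-effect variance (the hypothetical argument sigma2_u) as given, whereas in practice it is estimated from the data. All function and variable names are illustrative.

```python
def fay_herriot_blup(direct, psi, x, sigma2_u):
    """Shrinkage estimates under a Fay-Herriot model (intercept + one covariate).

    direct   : direct survey estimates, one per area
    psi      : sampling variances of the direct estimates (treated as known)
    x        : one area-level auxiliary variable
    sigma2_u : variance of the area random effects (ASSUMED given here)
    Requires sigma2_u + psi[d] > 0 for every area d.
    """
    # Weighted least squares: weight = inverse of the total variance per area.
    w = [1.0 / (sigma2_u + p) for p in psi]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, direct)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, direct))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    estimates = []
    for yi, p, xi in zip(direct, psi, x):
        synthetic = intercept + slope * xi   # regression-based prediction
        gamma = sigma2_u / (sigma2_u + p)    # weight given to the direct estimate
        estimates.append(gamma * yi + (1.0 - gamma) * synthetic)
    return estimates


# Illustrative (made-up) poverty rates for three groups.
rates = fay_herriot_blup(
    direct=[0.10, 0.30, 0.20],
    psi=[0.01, 0.01, 0.01],
    x=[1.0, 2.0, 3.0],
    sigma2_u=0.01,
)
```

The weight gamma approaches 1 when the sampling variance of a group is small (large sample), so the estimate stays close to the direct one, and approaches 0 when the sampling variance is large (small sample), so the estimate leans on the regression prediction.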

The primary micro-data source used for the applications is the European Union Statistics on Income and Living Conditions (EU-SILC 2011), while auxiliary variables at unit and area levels were obtained from 2011 census data. The data sample of each analysed country is divided into 18 disjoint socio-demographic groups of small and large sizes by specific breakdowns of interest.
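For comparison with the model-based estimators, the direct at-risk-of-poverty rate per group can be sketched as a weighted share of persons below a poverty threshold. The sketch below assumes the usual threshold of 60% of the weighted national median equivalised disposable income. The function names, argument names, and toy data are hypothetical and do not reflect the EU-SILC variable names or the authors' code.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


def direct_poverty_rate_by_group(income, weights, group, threshold_share=0.6):
    """Weighted share of persons below the poverty threshold in each group."""
    # The threshold is computed once on the whole (national) sample.
    threshold = threshold_share * weighted_median(income, weights)
    poor, total = {}, {}
    for y, w, g in zip(income, weights, group):
        total[g] = total.get(g, 0.0) + w
        if y < threshold:
            poor[g] = poor.get(g, 0.0) + w
    return {g: poor.get(g, 0.0) / total[g] for g in total}


# Toy data (made up): five persons in two groups with equal weights.
rates_by_group = direct_poverty_rate_by_group(
    income=[10, 20, 30, 40, 100],
    weights=[1, 1, 1, 1, 1],
    group=["a", "a", "b", "b", "b"],
)
```

In groups with few sampled persons, this ratio rests on a handful of observations, which is why its variance can be large.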

Results

The results show a large reduction in the coefficient of variation of model-based small area estimators compared to the direct approach. The most efficient method for the data analysed in this case study is Empirical Bayes based on the unit-level Nested Error model, which produced the lowest coefficients of variation of at-risk-of-poverty rate estimates for both small and large samples.
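The comparison above relies on the coefficient of variation (CV). A minimal sketch of the usual computation follows, assuming the CV is expressed in percent and that, for model-based estimators, the estimated mean squared error takes the place of the sampling variance. The numbers are made up for illustration and are not results from this study.

```python
import math


def coefficient_of_variation(estimate, mse):
    """CV in percent: root of the (estimated) MSE relative to the estimate."""
    return 100.0 * math.sqrt(mse) / estimate


# Hypothetical example: the same poverty rate with two different error levels.
cv_direct = coefficient_of_variation(0.20, 0.0016)  # larger variance
cv_model = coefficient_of_variation(0.20, 0.0004)   # smaller MSE
```

A lower CV for the same estimate indicates a more reliable figure, which is the sense in which the model-based estimators are compared with the direct approach.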

Conclusions

The purpose of this paper is to analyse whether data reliability could be improved by applying SAE techniques in official statistics. We check how SAE methodology treats various sub-populations of a particular type of data with different sample sizes. We compare the results obtained with SAE Bayesian models and with direct methods.

As expected, SAE model-based methods perform significantly better than direct methods for breakdowns of interest with small sample sizes, while for large samples the gap in the coefficient of variation of the estimates is not considerable. Hence we note that applying SAE techniques could increase the reliability and availability of estimates.

[1] Eurostat, Unit of Methodology and Corporate Architecture

[2] Sogeti, Luxembourg