Updating the urban/rural classification of NUTS-3 regions

Hugo Poelman

Introduction

This document contains technical notes explaining the workflow to determine or update the urban/rural classification of NUTS3 regions.

Input data

The classification relies upon the following input data:

- GEOSTAT 1 sqkm population grid, in vector (polygon) format: GEOSTAT_GRD_RG

- NUTS3 polygons at the best available resolution (01M): NUTS_RG

- Urban Audit polygons of cities and greater cities (01M): URAU_RG and the related attribute table.

Initial data preparation

The first implementation of the revised urban/rural typology in 2008/2010[1] relied essentially on data stored in raster format. Some steps of the classification methodology were subsequently re-used to determine the degree of urbanisation (DEGURBA) classification of LAU2 units. During that process, the initial workflow was changed in favour of a more vector-oriented one. This has made the methodology easier to implement, especially because several steps of the workflow can be carried out in a purely tabular way. For that reason, this note suggests a revised implementation method.

The GEOSTAT grid is composed of regular 1 sqkm cells. These are the basic units of analysis when determining urban and rural areas at grid level. The main reference layer is the polygon layer GEOSTAT_GRD_RG. We will use a derived point layer containing the centres of the 1 sqkm cells: GEOSTAT_GRD_PT, containing the cell identifier GRD_ID.

To this point feature class we join the attribute table GEOSTAT_GRD_AT on GRD_ID. This table contains several attributes characterising the cells.

Initially, GEOSTAT_GRD_AT needs to contain the grid cell population, the proportion of land in the cell (LAND_PROPORTION, provided by JRC), and the population density. Due to small inconsistencies in the coastline definition between the data used in the production of the GEOSTAT grid and the data used by JRC to calculate the land proportion, it can happen that a cell contains population > 0 but has a land proportion of zero. In these (exceptional) cases, we propose to set the population density equal to the population.
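The density rule, including the fallback for populated cells without land, can be illustrated with a minimal Python sketch. It assumes that LAND_PROPORTION is stored as a fraction between 0 and 1 and that the density is obtained by dividing the cell population by that fraction (each cell covers 1 sqkm); the function name is hypothetical and not part of the original workflow.

```python
def population_density(population, land_proportion):
    """Return inhabitants per sqkm of land for one 1 sqkm grid cell.

    Assumption: land_proportion is a fraction in [0, 1], not a percentage.
    """
    if land_proportion > 0:
        # Cell area is 1 sqkm, so the land area equals the land proportion.
        return population / land_proportion
    # Exceptional case described in the text: the cell has no land according
    # to the land-proportion data, so the density is set equal to the population.
    return float(population)
```

For a cell lying entirely on land (proportion 1) the density equals the population, which matches the behaviour of the two rasters described further on.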

In addition, the NUTS3 codes need to be included. This is done by applying a spatial join between the cell centroids and the NUTS boundaries. Again, due to coastline and/or national boundary inconsistencies, populated cell centroids may fall outside any NUTS polygon. These points are selected separately and spatially joined to the nearest NUTS3 region. In a similar way, the codes of the cities and greater cities in which the points are located are added to the attribute table.

The join between the attribute table GEOSTAT_GRD_AT and the point layer GEOSTAT_GRD_PT is now used to generate two raster layers:

- POPL_DENS_GR_1KM_GEOSTAT (population density, float)

- POPL_GR_1KM_GEOSTAT (population counts, integer)

Pixels of both rasters have the same value when they lie entirely on land, but they differ when they lie partly in water, because POPL_DENS_GR_1KM_GEOSTAT has been calculated using a land-proportion factor.

Based on these two rasters, the urban clusters (and high-density clusters) can be computed using tools in Model Builder (see CLST_CALCULATION.tbx).

Urban cluster generation

Urban clusters are groups of contiguous cells, each with a density of at least 300 inhabitants/km², that together have a total population of at least 5,000 inhabitants. The model URBAN_cluster_generation in CLST_CALCULATION.tbx produces:

- a raster of Urban Clusters with, for each one, its total population;

- a mask raster giving the location of Urban Clusters (with values 0 and 1).

The urban cluster mask raster is converted into polygons. The point feature class GEOSTAT_GRD_PT is then intersected with these polygons to identify cells located in the urban clusters: item URBAN_CLST of the GEOSTAT_GRD_AT table is populated with value 1 when a cell is located in an urban cluster.
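The cluster definition can also be expressed without raster tools. The following Python sketch is not the Model Builder implementation; it only illustrates the rule on a dictionary of cells keyed by hypothetical (column, row) indices. It assumes 8-neighbour contiguity (diagonal neighbours count as contiguous), which this note does not state explicitly and which should be checked against the model in CLST_CALCULATION.tbx.

```python
from collections import deque

MIN_DENSITY = 300        # inhabitants per km2 (from the text)
MIN_CLUSTER_POP = 5000   # inhabitants (from the text)

def urban_clusters(density, population):
    """Map each cell located in an urban cluster to the cluster's total population.

    density, population: dicts keyed by (col, row) of the 1 sqkm cell.
    Cells missing from the result are not part of any urban cluster.
    """
    dense = {cell for cell, d in density.items() if d >= MIN_DENSITY}
    # Assumption: 8-neighbour contiguity, i.e. diagonals are included.
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]
    seen = set()
    result = {}
    for start in dense:
        if start in seen:
            continue
        # Breadth-first search collects one group of contiguous dense cells.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            x, y = queue.popleft()
            members.append((x, y))
            for dx, dy in offsets:
                nb = (x + dx, y + dy)
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        total = sum(population.get(cell, 0) for cell in members)
        if total >= MIN_CLUSTER_POP:
            for cell in members:
                result[cell] = total
    return result
```

The keys of the returned dictionary play the role of the mask (URBAN_CLST = 1), and the values correspond to the raster of cluster populations.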

Determining the initial urban/rural classification

The initial urban/rural classification can now be determined (without geoprocessing) by a set of SQL queries on GEOSTAT_GRD_AT.

A) calculation of the total grid population by NUTS3

B) calculation of the total population of cells in urban clusters, by NUTS3

C) calculation of the share of rural population by NUTS3: ((A)-(B)) / (A)

D) initial classification of NUTS3 regions: if (C) < 20%: urban; if (C) >= 20% and < 50%: intermediate; if (C) >= 50%: rural.

We store these results in a table of NUTS3 regions, for further treatment.
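Steps A) to D) amount to a group-by aggregation followed by two thresholds. As a minimal sketch of that logic (in Python rather than SQL, with a hypothetical tuple layout standing in for the rows of GEOSTAT_GRD_AT), one could write:

```python
from collections import defaultdict

def classify_share(rural_share):
    """Step D: class from the share of rural population (a fraction, not a percentage)."""
    if rural_share < 0.20:
        return "urban"
    if rural_share < 0.50:
        return "intermediate"
    return "rural"

def initial_classification(cells):
    """cells: iterable of (nuts3_code, population, urban_clst) tuples,
    where urban_clst is 1 for cells located in an urban cluster.

    Returns a dict nuts3_code -> (rural_share, class).
    """
    total = defaultdict(float)   # A) total grid population by NUTS3
    urban = defaultdict(float)   # B) population of urban-cluster cells by NUTS3
    for nuts3, pop, urban_clst in cells:
        total[nuts3] += pop
        if urban_clst == 1:
            urban[nuts3] += pop
    result = {}
    for nuts3, a in total.items():
        if a == 0:
            # Assumption: regions without grid population are left unclassified.
            continue
        rural_share = (a - urban[nuts3]) / a   # C)
        result[nuts3] = (rural_share, classify_share(rural_share))
    return result
```

The region codes used to exercise such a function would of course come from the spatial join described earlier; the thresholds are those given in step D.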

Classification of small NUTS3 regions

The urban/rural typology contains a special treatment of small NUTS3 regions, in order to avoid some of the classification distortions due to the variety in size of NUTS3 regions throughout Europe. NUTS3 regions with a surface of less than 500 km² will be combined with one or more neighbours in order to determine their classification.

The initial workflow referred to the length of the boundaries between small NUTS3 regions and their neighbours in order to determine which regions would be grouped. However, this method has proven to be quite cumbersome, and it has needed some specific (manual) adaptations, as explained in chapter 15 of the Eurostat Regional Yearbook 2010.

Therefore, we suggest a slightly different approach. First, we select all NUTS3 regions with a surface of less than 500 km². For all NUTS3 regions, we determine the population-weighted centroid point (e.g. see for the NUTS 2010 regions: GISREGIO.NUTS_POPL_WEIG_CTR_PT_2010). Then, we calculate the distance from the centroid of each small NUTS3 region to the nearest centroid among its neighbouring regions (e.g. using the Proximity > Near tool, after having projected the centroid points to LAEA). If the small region and its nearest neighbour have the same urban/rural class, nothing changes. If they have a different class, they are considered as an ad-hoc NUTS group, for which the share of rural population needs to be assessed in order to determine an adjusted urban/rural classification. In the case of very small adjacent NUTS3 regions, this process may need to be repeated in order to add more neighbours. Especially in areas with many small NUTS3 regions, the results may need visual inspection and possibly some ad-hoc adjustments.

The clustering of small NUTS3 regions should not result in combinations of regions of different countries.
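A single pass of this grouping can be sketched as follows. This is an illustration under several assumptions, not the procedure actually used: the dictionary layout and key names are hypothetical, the adjacency list is assumed to be available from a separate step, centroid coordinates are assumed to be already projected (e.g. LAEA, in metres), and the country is taken from the first two characters of the NUTS code. The sketch only reports the class of the ad-hoc group; repeated passes and manual adjustments are left out.

```python
import math

SMALL_AREA_KM2 = 500  # threshold from the text

def classify_share(rural_share):
    """Class from the share of rural population (fraction)."""
    if rural_share < 0.20:
        return "urban"
    if rural_share < 0.50:
        return "intermediate"
    return "rural"

def centroid_distance(a, b):
    """Euclidean distance between two projected (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def group_small_regions(regions, neighbours):
    """regions: dict code -> dict with keys 'area_km2', 'centroid',
    'total_pop', 'urban_pop' and 'ur_class' (hypothetical layout).
    neighbours: dict code -> list of adjacent region codes.

    Returns dict small_code -> (neighbour_code, group_rural_share, group_class)
    for small regions whose nearest neighbour has a different class.
    """
    groups = {}
    for code, reg in regions.items():
        if reg["area_km2"] >= SMALL_AREA_KM2:
            continue
        # Never combine regions of different countries.
        candidates = [n for n in neighbours.get(code, []) if n[:2] == code[:2]]
        if not candidates:
            continue
        nearest = min(
            candidates,
            key=lambda n: centroid_distance(reg["centroid"], regions[n]["centroid"]),
        )
        other = regions[nearest]
        if other["ur_class"] == reg["ur_class"]:
            continue  # same class: nothing changes
        total = reg["total_pop"] + other["total_pop"]
        if total == 0:
            continue
        urban = reg["urban_pop"] + other["urban_pop"]
        share = (total - urban) / total
        groups[code] = (nearest, share, classify_share(share))
    return groups
```

Restricting the candidates to adjacent regions before taking the minimum distance keeps the sketch close to the wording above ("nearest centroid among its neighbouring regions").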

Adapted classification taking into account the presence of main cities

The urban/rural typology foresees two adaptations to take into account the presence of main cities:

- a rural region becomes intermediate if at least 25% of its population lives in a city of at least 200,000 inhabitants.

- an intermediate region becomes urban if at least 25% of its population lives in a city of at least 500,000 inhabitants.

For the update of this aspect of the methodology, we propose to refer to the current definition of cities and greater cities[2], i.e. to take into account all greater cities plus all cities located outside the greater cities. The geometry of this collection of greater cities and cities (called "CGC") is available from a spatial view created on the basis of the URAU_RG polygons:

CREATE OR REPLACE FORCE VIEW "GISREGIO"."V_URAU_2011_RG_CGC" (
    "OBJECTID", "SHAPE", "URAU_CODE_2011", "URAU_NAME_2011", "URAU_CATG",
    "CAPITAL_CITY", "CITY_IN_KERNEL", "URAU_NAME_NLAT", "URBAN_CNTRE_SIZE",
    "URBAN_CNTRE_SIZE_SRC", "ADMIN_GRD_POPL", "ADMIN_GRD_POPL_SRC"
) AS
SELECT
    URAU_2011_RG.OBJECTID,
    URAU_2011_RG.SHAPE,
    URAU_2011_AT.URAU_CODE_2011,
    URAU_2011_AT.URAU_NAME_2011,
    URAU_2011_AT.URAU_CATG,
    URAU_2011_AT.CAPITAL_CITY,
    URAU_2011_AT.CITY_IN_KERNEL,
    URAU_2011_AT.URAU_NAME_NLAT,
    URAU_2011_AT.URBAN_CNTRE_SIZE,
    URAU_2011_AT.URBAN_CNTRE_SIZE_SRC,
    URAU_2011_AT.ADMIN_GRD_POPL,
    URAU_2011_AT.ADMIN_GRD_POPL_SRC
FROM GISREGIO.URAU_2011_RG,
     GISREGIO.URAU_2011_AT
-- Join the geometry to its attribute record
WHERE URAU_2011_RG.URAU_CODE_2011 = URAU_2011_AT.URAU_CODE_2011
  -- Keep all greater cities ('K'), plus the cities ('C') located outside a greater city
  AND (URAU_2011_AT.URAU_CATG = 'K'
       OR (URAU_2011_AT.URAU_CATG = 'C' AND URAU_2011_AT.CITY_IN_KERNEL IS NULL));

By joining the GEOSTAT_GRD_AT table with the URAU_AT table, we determine which points are located in cities/greater cities (CGC) with at least 200,000 inhabitants and which are located in CGC with at least 500,000 inhabitants.

Based on that query, we can calculate:

A) the total population living in CGC of at least 200,000 inhabitants, by NUTS3 region

B) the total population living in CGC of at least 500,000 inhabitants, by NUTS3 region

These results are added to the NUTS classification table created earlier.

The rural NUTS3 regions where the share of (A) in the total NUTS3 population is >= 25% will change to intermediate.

The intermediate NUTS3 regions where the share of (B) in the total NUTS3 population is >= 25% will change to urban.

These adaptations result in the final urban/rural typology of NUTS3 regions.
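The two city-related rules can be sketched in Python as follows. The tuple layout of the cells and the lookup `cgc_size` (CGC code to CGC population) are hypothetical; which attribute of the URAU table provides the city size is not specified in this note. The sketch applies each rule to the class as it stands before this step, so a rural region moves up by one class at most.

```python
from collections import defaultdict

def cgc_population_by_nuts3(cells, cgc_size, min_size):
    """Sum the grid population living in CGC of at least min_size inhabitants, by NUTS3.

    cells: iterable of (nuts3_code, population, cgc_code or None).
    cgc_size: dict cgc_code -> population of the city/greater city (hypothetical lookup).
    """
    totals = defaultdict(int)
    for nuts3, pop, cgc in cells:
        if cgc is not None and cgc_size.get(cgc, 0) >= min_size:
            totals[nuts3] += pop
    return dict(totals)

def adjust_for_cities(ur_class, total_pop, pop_cgc_200k, pop_cgc_500k):
    """Apply the two adaptations for the presence of main cities.

    pop_cgc_200k: (A) population living in CGC of at least 200,000 inhabitants.
    pop_cgc_500k: (B) population living in CGC of at least 500,000 inhabitants.
    """
    if total_pop <= 0:
        return ur_class
    if ur_class == "rural" and pop_cgc_200k / total_pop >= 0.25:
        return "intermediate"
    if ur_class == "intermediate" and pop_cgc_500k / total_pop >= 0.25:
        return "urban"
    return ur_class
```

Calling `cgc_population_by_nuts3` once with 200,000 and once with 500,000 yields (A) and (B), which can then be passed to `adjust_for_cities` together with the class obtained after the treatment of small regions.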

For the sake of clarity, we suggest keeping the information on the initial classification, the effect of the grouping of small regions, and the effect of the presence of big cities/greater cities.


[1] Description of the typology in chapter 15 of the Eurostat regional yearbook 2010: http://epp.eurostat.ec.europa.eu/portal/page/portal/product_details/publication?p_product_code=KS-HA-10-001-15

[2] See: Cities in Europe – The new OECD-EC definition: http://ec.europa.eu/regional_policy/sources/docgener/focus/2012_01_city.pdf