Italian National Statistical Institute

Session No.1

Paper No.8

Country: Italy

Progress Report

Giuseppe Garofalo

ISTAT

The ASIA project

(Setting-up of the Italian Business Register)

Synthesis of the methodological manual

Helsinki, September 1998

This paper is a synthesis of the methodological manual of the ASIA project (Italian Business Statistical Register). Its sections present the main conceptual aspects, the production process and the statistical methodologies adopted for data input, estimation and checking. The document also contains some tables quantifying the results obtained. To produce this synthesis, documents and publications drawn up by several researchers involved in the project have been adopted, integrated and reworked. Specific references are given in a special bibliographic note.

The synthesis has been edited by Giuseppe Garofalo; the translation into English is by Maria Fustaci.

1. Introduction

Knowledge of the universe of enterprises in a territory is important information for the decisions of economic operators and for the analysis of a country's production system and its evolution over time. In Italy this universe was known only every 10 years, on the occasion of the general censuses, and with a delay of a few years (3 for the last census, in 1991) with respect to the reference date, because of the time technically needed to process the survey forms. Besides considerable problems of cost and organization, census surveys suffer, because of the “door to door” survey technique adopted, from reduced coverage of some sectors of activity (professionals, trade intermediaries, construction, and transport).

The limits of the census survey, together with continuous technological innovation, which shortens the time over which an industrial system changes structurally, have stimulated the demand to define and set up a new instrument at the centre, and at the service, of an integrated system of economic information: a single, complete and updated statistical business register.

To be economically feasible, the planning and construction of a complete and updated statistical register has to exploit, in a synergic process, all the information on enterprises already stored by institutions or public administrations. As a matter of fact, enterprises systematically and frequently produce administrative acts during their lives: they pay taxes, sign telephone or electricity contracts, insure their employees against accidents at work, etc. These are all administrative acts, but each of them contains information that can be identified and interpreted from a statistical point of view.

On the use of administrative registers for statistical purposes, a series of convictions and significant research experiences has matured since the end of 1980, inside and outside the Italian National Institute of Statistics, which have allowed know-how to accumulate and have proved technical feasibility.

Since 1994 the Italian National Institute of Statistics has been carrying out[1] a complex project (called ASIA) for the creation of a statistical business register as the result of the logical and physical integration of data held in administrative and statistical sources and of their treatment with statistical methodologies.

2. Phases and timetable of ASIA project

In 1994 ISTAT set up a working group to plan and build the new statistical register of enterprises on the basis of the information available in the administrative field. The group completed its work in December 1994, outlining the conceptual and organizational architecture on the basis of which the ASIA project has developed in 4 phases.

First phase, 1995. Experiment on 3 provinces. During 1995 the first set-up of the ASIA base, covering all sectors of economic activity except agriculture, forestry, fishing, public administration and services of public utility (education, health, welfare, culture, etc.), and its application at national level were tested experimentally. The first phase is divided into 5 subprojects:
a) system of definitions and classifications of units and characters, and its correspondence with those contained in the administrative and statistical sources that constitute the input data of ASIA;
b) development and testing on 3 provinces of the checking and code-matching procedures that lead to the physical integration of the administrative registers used, DATIN (integrated administrative data);
c) development and testing on 3 provinces of the linkage procedures, coherence checks, and methodologies for data imputation and missing-data estimation that lead to the list of elementary statistical units of ASIA;
d) preparation, training and organization of the resources needed for the first set-up of the ASIA base (sources, people, IT tools); definition of the procedures for filing, distributing and publishing elementary data, LISTER (territorial lists), and aggregated data, DATER (territorial statistical data);
e) preparation and organization of the permanent territorial network (regional and provincial) needed to carry out the checking surveys and to disseminate data locally.
Second phase, 1996. First experimental national set-up. On the basis of the experiment on 3 provinces, in the second phase a complex system operating at national level was put in place; the personnel of the regional offices of ISTAT and of the provincial statistical offices of the Chambers of Commerce (the territorial checking network) were trained; and the first experimental national set-up of ASIA enterprises was completed (with identification data referring to 1995 and structural data referring to 1994).
Third phase, 1997. Final national set-up. In 1997 the quality check of the first national set-up of the ASIA base started; DATINT’95 was produced and returned to the suppliers, and new data for 1996 were collected; the production process of the final first set-up, using the 1996 matched data and the 1995 stratification data, was carried out.
Fourth phase, 1998/99. Development. The fourth phase covers development in 1998/99 and the field check of the first national set-up of ASIA through the Intermediate Census. Afterwards, satellite registers will be built for particular sectors (for instance trade, crafts, and tourism), for which specific stratification characters will be collected from appropriate sources.

3. ASIA Project: the main features

3.1. ASIA as centre of the Statistical Information System

The 1996-98 National Statistical Programme specifies, especially for the economic area, that during these three years ISTAT will develop the “progressive passage from statistics on enterprises, actually relevant, to an integrated system on economical statistics”. This means that the Statistical Information System will be structured and integrated as the whole of the information available in the economic field, both collected directly by the Institute through surveys and obtained from other sources (for instance administrative registers).

A systematic approach to the economic information collected by ISTAT can increase the quantity of information “distributed” and improve its quality; it also guarantees greater coherence among the information gathered by different surveys on the same sub-universe of statistical units. At the same time, time, costs and statistical burden are reduced when the system provides for the integration of “external”, non-statistical data.

In general, an information system is designed to let an organization (for instance an enterprise) manage the information necessary for its activity. We speak of a SIS (Statistical Information System) when large databases, drawn from information sources managed by different organizations, are made available solely for the study of particular phenomena. In particular, a SIS supports the first phase of statistical analysis, that is, the collection of basic information and the correct use of meta-information. In the general sense a SIS is an integrated system because it contains all the information produced and managed by several statistical surveys. Each statistical survey can be considered as an organizational structure which produces and manages its own database independently of the others.

The central element of a SIS is a complete and updated register of the statistical survey units (not necessarily the observation units), which is both the physical connecting centre, since the various types of information gathered or acquired are linked to the individual elementary units, and the logical connecting centre, since it holds the meta-information (definitions, classifications) necessary for the consistency of the whole system.

The Business Statistical Register (ASIA) was designed and realized in order to:

define an updated list of elementary units which is both the target frame and the sampling frame of different surveys on enterprises;

define statistical data on the economic units structure and on its evolution;

be the gravity centre of the Integrated Information System on economic statistics of ISTAT.

3.2. ASIA as a complex filing system

The aim of defining a statistical register of economic units, seen as the connecting centre of a complex statistical information system, implies that the data filing system should assure:

completeness: it should include all the elementary units in the national territory and should guarantee, for each unit, the completeness of the characters necessary for the management of the whole system (possibly through appropriate estimation methodologies);

reliability: it should assure correct and constant updating of units and characters;

confidentiality: it should protect individual information during the various phases of data processing;

manageability: completeness and reliability can be secured only if recording and updating are limited to the necessary, main characters;

flexibility: it should guarantee the management of particular sub-universes defined according to the known demands of different types of users;

inexpensiveness: it should avoid redundancies in data, duplications and inconsistencies in updating.

This means proposing a system that:

1. defines different types of data for different types of units;

2. separates the moment of “processing data” from the “access” to data by users;

3. allows the management of “partial views” of units and of data defined on the basis of individual users’ requirements.

Therefore, the System has been built as an integrated set of registers organized as follows:

a) - central, sectorial (or satellite) and survey registers,

b) - management and dissemination registers,

c) - national and territorial registers,

d) - separate registers for different typologies of recorded units.

The need for typology a) (shown schematically in Fig. 1) arises from the requirement to separate the subregisters according to each sector's need to record particular stratification characters. Three central registers have been identified: ASIA-enterprises, ASIA-agriculture and ASIA-institutions. The first includes all units carrying out their main activity in the industrial and services sectors as profit organizations (units set up with a profit aim, including co-operatives and unions). ASIA-agriculture includes all units carrying out their main activity in the agriculture, forestry and fishing sectors. Finally, ASIA-institutions includes the whole of Public Administration, public organizations which provide services not for sale, and private non-profit organizations.

A distinctive feature of the logical design of the ASIA system is the definition of satellite registers (SR). This typology has been introduced because the information requirements of some sectors of activity (or of some types of unit) demand the recording of particular characters, different from those of the central registers, with the main aims of:

stratifying the units better, in order to reduce the variability within strata,
classifying the units better, in order to ensure greater knowledge of the whole structure of a given sector.

In defining and building an SR, the unity of the filing system is preserved by the following three essential conditions:

1. all units of the satellite register are also included in the general or central register,

2. the unit identification code is the same in all the types of register in which the unit appears,

3. information is inherited: the SR takes the information of interest from the central registers, and it is possible to modify it.

So an SR can be considered as an enriched partial view of the central register.
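The three conditions can be illustrated with a minimal Python sketch. The register layout, the unit codes, the characters (`nace`, `employees`, `hotel_category`) and the function `satellite_view` are all hypothetical, chosen only for illustration; they are not the structures actually used in ASIA.

```python
# Hypothetical central register: unit code -> characters held centrally.
central = {
    "U001": {"nace": "55.1", "employees": 12},
    "U002": {"nace": "52.1", "employees": 3},
    "U003": {"nace": "55.1", "employees": 40},
}

# Hypothetical sector-specific characters, collected only for the satellite register.
tourism_extra = {
    "U001": {"hotel_category": 3},
    "U003": {"hotel_category": 4},
}


def satellite_view(central, extra):
    """Build a satellite register as an enriched partial view of the central one."""
    view = {}
    for code, sector_chars in extra.items():
        # Condition 1: every satellite unit must also exist in the central register.
        if code not in central:
            raise KeyError(f"unit {code} is missing from the central register")
        # Condition 2: the same unit code is the key in both registers.
        # Condition 3: central characters are inherited, then enriched
        # (a sector-specific value would override a central one of the same name).
        view[code] = {**central[code], **sector_chars}
    return view


tourism_register = satellite_view(central, tourism_extra)
```

In this sketch the satellite register never stores a unit on its own: it is always derived from the central register plus the additional characters, which is one simple way of keeping the two aligned.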

The need to distinguish between management registers and registers for users arises in order to guarantee that users have information verified and checked at a specific date, whereas the management registers, serving the needs of ASIA "management", may contain information at several levels of updating and correctness.

The complexity of a system like ASIA, which includes several million units, requires the involvement of many organizational structures, particularly those closest to the territory, which can verify the units in their own territory better and more directly and, at the same time, can ensure the dissemination of ASIA products, "customizing" them to particular demands. Accordingly, the organizational structure of ASIA has been designed on 3 levels: national (central ISTAT), regional (regional offices of ISTAT) and provincial (statistical offices of the Chambers of Commerce), and each level is assigned the management and dissemination activities for a set of units. The typology named "territorial and national registers" therefore defines a division of ASIA that identifies the sets of units to be managed only at central level (large enterprises, enterprises operating nationwide, groups of enterprises, etc.) and those to be managed at local level.

Finally, in order to avoid redundancies, different register structures must be identified for different types of units. So, logically, the register of local units is distinct from that of enterprises (the latter will not contain, for instance, the character address, which will be recorded in the corresponding local unit defined as the enterprise's head office). Logical links between different types of units (local units belonging to the same enterprise, enterprises belonging to the same group, etc.) are guaranteed by physical references to unit codes.
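A minimal sketch of this separation, under the assumption that each register is a simple list of records, might look as follows. The class names, field names and sample values are hypothetical and serve only to show how the address is stored once, on the local unit, and reached from the enterprise through unit codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enterprise:
    code: str                         # unit code of the enterprise
    legal_form: str                   # example of an enterprise-level character
    group_code: Optional[str] = None  # code of the group it belongs to, if any


@dataclass
class LocalUnit:
    code: str                         # unit code of the local unit
    enterprise_code: str              # reference to the owning enterprise
    address: str                      # recorded here, not on the enterprise
    is_head_office: bool = False


def head_office_address(enterprise_code, local_units):
    """Return the address of the enterprise's head office, or None if not found."""
    for unit in local_units:
        if unit.enterprise_code == enterprise_code and unit.is_head_office:
            return unit.address
    return None


# Hypothetical sample data.
enterprises = [Enterprise(code="E1", legal_form="SpA", group_code="G1")]
local_units = [
    LocalUnit("L1", "E1", "Via Roma 1, Milano", is_head_office=True),
    LocalUnit("L2", "E1", "Via Po 7, Torino"),
]
```

Here `head_office_address("E1", local_units)` follows the code reference from the enterprise to its local units, so the address never has to be duplicated in the enterprise register.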

3.3. ASIA as integration of administrative sources.

Every administrative body collects data and manages the corresponding records as part of its own function, under specific legislation and rules which govern relations between individuals and between them and the public administration. Thus, each source makes use of definitions, classifications and rules on entry and cessation that are peculiar to itself and depend on the functions of the authority concerned. The administrative body defines, classifies, collects and records information on economic subjects and their characteristics that, strictly speaking, does not have statistical validity. In other words, using administrative data poses statisticians a problem that is not easy to solve: the inconsistency of data. In a survey, or within a system for the collection of statistical data, consistency is a problem addressed ex-ante, since it is closely linked to the process of microdata collection and macrodata production. When we want to use data stored in non-statistical (administrative) databases, over whose production process statisticians have no control, the problem of consistency arises in a different context and can be resolved only ex-post.

So the main problem that arises in using administrative sources for statistical purposes is to identify the correspondences between the statistical concepts and the administrative rules and laws through which those sources observe the universe, or population, of reference. It is therefore necessary to process the administrative sources in order to align them with the statistical concepts and definitions. This is possible if, on the one hand, we have a deep knowledge of the sources to be used and, on the other hand, suitable statistical methodologies are available.

The conceptual elements mentioned above show how treating the use of administrative data (and their possible integration) as a purely computing problem, tied to the handling of large databases, can lead to serious errors. Only the appropriate use of statistical methodologies can assure consistent results.

With reference to a defined statistical universe, the types of error generated by the use of an administrative source for statistical purposes can be summarised as follows:

E1 – error of under-recording
a) legal subjects not recorded because of evasion, delays, etc.
b) legal subjects not recorded because they are not obliged to register

E2 – error of over-recording
a) registration of non-active legal subjects because of duplications or delays in recording cessations
b) registration of legal subjects without any feature of an enterprise

E3 – error in the assignment of characters
a) wrong recording due to delays in acquiring variations or to errors in declarations, in recording, in checking
b) wrong recording due to different definitions and classifications

E4 – missing assignment of characters
a) partial or total failure to attribute a character

The integration process is useful when the input sources considered do not assure completeness of units and of units' characters; it then reduces errors of type E1 and E4. The process is less useful for reducing over-recording errors and wrong attribution of characters. In fact, using more sources can increase the type E2 error, while if one source is clearly and considerably better than the others, further information for the imputation of statistical characters may do harm. In such a case the probable gain in information depends on the hypotheses about the structure and quality of the data in the available sources and on the validity of the statistical hypotheses on which the process of conceptual and physical integration and optimisation is based.
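The effect of integration on coverage errors can be illustrated with a small sketch that treats each source as a set of unit codes. The data are invented, and the "true" universe is assumed known here only for the sake of the example; in practice it is exactly what the register is trying to estimate.

```python
# Invented example: the true universe of active enterprises (unknown in practice).
universe = {"A", "B", "C", "D", "E"}

# Two hypothetical administrative sources.
source_1 = {"A", "B", "C", "X"}   # X: ceased unit still registered (E2)
source_2 = {"C", "D", "Y"}        # Y: legal subject that is not an enterprise (E2)


def coverage_errors(recorded, universe):
    """Return (under-recorded units, over-recorded units) for a set of records."""
    under = universe - recorded   # E1: units missing from the source
    over = recorded - universe    # E2: units that should not be there
    return under, over


# Integration taken here as the plain union of the sources, with no cleaning.
integrated = source_1 | source_2

under_1, over_1 = coverage_errors(source_1, universe)
under_2, over_2 = coverage_errors(source_2, universe)
under_int, over_int = coverage_errors(integrated, universe)
```

With these data, under-recording falls from two units (source 1) or three units (source 2) to one unit after integration, while over-recording rises from one unit per source to two, which is the trade-off described above.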

As to the formal aspect of the integration process, let $x_i$ be the true value of attribute $X$ for the $i$-th unit, and let $x_{i1}, \ldots, x_{ij}, \ldots, x_{iM}$ be the values recorded in the $M$ available sources. The relation between the recorded values and the true ones can be described as follows:

$$x_{ij} = g(x_i, e_j, \varepsilon_{ij})$$

where $e_j$ is the error due to the bias of the $j$-th source (the structural errors of type “b” described above), and $\varepsilon_{ij}$ describes the random error (errors of type “a”).
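To make the roles of the two error terms concrete, the sketch below simulates one source under an additive form of g, that is, recorded value = true value + source bias + random error. The additive form, the normal distribution of the random error and all numeric values are assumptions made only for this illustration; the paper leaves g unspecified.

```python
import random


def simulate_source(true_values, bias, noise_sd, rng):
    """Simulate one source, ASSUMING an additive g:
    recorded = true value + source bias e_j + random error."""
    return [x + bias + rng.gauss(0.0, noise_sd) for x in true_values]


rng = random.Random(0)  # fixed seed so the run is reproducible

# Invented true values of the attribute X for 10,000 units.
true_values = [rng.uniform(1, 100) for _ in range(10_000)]

# One hypothetical source with a structural bias of 2.0 and unit random noise.
recorded = simulate_source(true_values, bias=2.0, noise_sd=1.0, rng=rng)

# Averaging the deviations cancels the random error and leaves the bias.
mean_deviation = sum(r - t for r, t in zip(recorded, true_values)) / len(true_values)
```

Under these assumptions the mean deviation comes out close to the bias of 2.0: the random component averages out over many units, while the structural component does not, which is why it has to be handled separately.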

When the $j$-th source is fully known, it is possible to identify rules which standardise (or harmonise, or normalise) the units and variables of the input source into statistical units and variables. The standardisation function is thus defined as the mapping

$f_s : X_j \rightarrow X$, which transforms the values $x_j \in X_j$ into values $x_i \in X$

In other words, a standardisation rule converts administrative concepts and classifications into statistical ones. Such rules, generally deterministic, are of three types:

coding rules: which convert codes (e.g. economic activity, legal form, and location) into statistical classifications (Nace, Nuts, etc.);
link rules: by which the different records corresponding to legal or administrative units in one source can be combined to define one statistical unit (enterprise or local unit);
conversion rules: to obtain statistical variables from administrative characters.
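The three types of rule can be sketched as three small deterministic functions applied in sequence. Everything specific in this example is hypothetical: the administrative activity codes, their mapping to Nace, the use of a fiscal code as linking key, and the choice of summing insured workers to obtain an employment variable. None of these is taken from the ASIA procedures.

```python
# Hypothetical lookup from an administrative activity code to a Nace code.
ACTIVITY_TO_NACE = {"ADM-741": "52.11", "ADM-902": "55.10"}


def coding_rule(record):
    """Coding rule: translate the administrative activity code into Nace."""
    return {**record, "nace": ACTIVITY_TO_NACE.get(record["activity_code"])}


def link_rule(records):
    """Link rule: group administrative records that share a fiscal code
    (assumed key) into one statistical unit."""
    units = {}
    for rec in records:
        units.setdefault(rec["fiscal_code"], []).append(rec)
    return units


def conversion_rule(linked_records):
    """Conversion rule: derive a statistical variable (employees) from an
    administrative character (insured workers) by summing over linked records."""
    return sum(rec["insured_workers"] for rec in linked_records)


# Invented administrative records: two positions for F1, one for F2.
records = [
    {"fiscal_code": "F1", "activity_code": "ADM-741", "insured_workers": 4},
    {"fiscal_code": "F1", "activity_code": "ADM-741", "insured_workers": 6},
    {"fiscal_code": "F2", "activity_code": "ADM-902", "insured_workers": 2},
]

coded = [coding_rule(r) for r in records]
linked = link_rule(coded)
statistical_units = {
    code: {"nace": recs[0]["nace"], "employees": conversion_rule(recs)}
    for code, recs in linked.items()
}
```

In this sketch the Nace code of a unit is simply taken from its first linked record; a real procedure would need a rule for resolving conflicting codes among the records of the same unit.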

After the standardisation process, the error component of the model is reduced to the random error $\varepsilon_{ij}$; the sources could then be treated as independent and unbiased random variables of equal, or at least constant, quality, so that procedures appropriate to certain experimental frameworks, such as the theory of repeated experiments, could be adopted. But the situation is more complicated. The systematic errors in each source, produced by its administrative and legal functions, are the result of two separate elements: the first, tied to the laws, classifications and definitions proper to the source, can be identified and easily standardised; the second, tied to avoidance and evasion phenomena, cannot be identified and is mixed into the random error. The model therefore changes and becomes more complex, as follows: