ESSnet on DATA INTEGRATION

Course on statistical methods for data integration

Venue: Centro Congressi Cavour,

Via Cavour, 50/a – Rome, Italy

Date: 28-30 September 2011

Dear colleague,

The idea of establishing ESSnets in the field of statistics was launched in 2005 as a way to reinforce cooperation between National Statistical Institutes.

In this way the various institutes in Europe can benefit from each other’s experience and together raise the level of their statistical production processes.

The ESSnet on Data Integration (ESSnet-DI) is a 2-year project partially funded by Eurostat and active from December 2009 to December 2011. An overview of the activities related to this ESSnet can be found on the ESSnet portal:

The institutions involved in the project are Istat (Italy, coordinator), CBS (The Netherlands), INE (Spain), SSB (Norway), Statistical Office in Poznan (Poland) and SFSO (Switzerland).

This ESSnet is organizing a training course for interested NSIs in the area “Statistical methods for data integration”.

Please refer to the attached documentation for the course programme.

Trainees’ airfare will be covered by the ESSnet-DI budget up to a maximum of 500 euro.

Applications for the course should be emailed to Mauro Scanu () by 30 June 2011, with the completed application form attached.

Looking forward to hearing from you,

Yours truly,

Mauro Scanu: ESSnet-DI coordinator

Istituto Nazionale di Statistica (Istat)

Via Cesare Balbo 16, 00184 Roma (ITALY)

Tel: +39 06 4673 2887

Fax: +39 06 4673 2972

Email:

Application form

Date: 28-30 September 2011

Place: Centro Congressi Cavour

Via Cavour 50/A

Roma - Italy

Name of first applicant:

Name
Full Address
Email:
Phone:
Fax:

Name of second (optional) applicant:

Name
Full Address
Email:
Phone:
Fax:

Please return this form by e-mail to Mauro Scanu (for any information, phone: +39 06 4673 2887, fax: +39 06 4673 2972).

The following questionnaire should be completed by each applicant.
Preparatory Questionnaire

Surname and First name

This part aims to give the trainers detailed information on the background and level of the audience.

1. In your current professional activity, are you working in the:

Production of statistics (as user of methodology)
Development of methodology

2. Please describe your previous professional activity:

Institution
Department
Field of Activity
Hierarchical position
Seniority in that hierarchical position (years)

3. What background do you have in the field of this course?

Mainly theoretical experience
Mainly practical experience
Some overview through seminars or reading articles
Neither courses / seminars nor reading articles

4. Present one or two problems within the field of this course in which you have a personal interest. If you refer to some data, please enclose a short description.

5. Before registering, did you receive a sufficient description of the course programme to judge whether it would be of real use to you?

Yes
No (if not, please elaborate)

6. Please state why and how you selected our course.

7. What do you expect from it?

8. Is there any subject in the programme you would like emphasis to be put on?

Outline for the ESSnet-DI course on statistical methods for data integration

Location: Centro Congressi Cavour, Roma, Italy

Lecturers: Nicoletta Cibella, Marcello D’Orazio, Marco Di Zio, Marco Fortini, Monica Scannapieco, Mauro Scanu, Luca Valentino (Istat), Eric Schulte Nordholt (Statistics Netherlands)

Language: the course language will be English.

Outline

More statistical data are produced in today’s society than ever before. These data are analysed and cross-referenced for innumerable reasons. For National Statistical Institutes, the joint analysis of two or more statistical and administrative sources results from a rational organization of all available information sources, and it allows survey costs and response burden to be reduced. However, data sets are often hard to combine: errors in record identifiers, or a lack of record identifiers, may jeopardize any meaningful integrated use of the data sets.

The combination of different surveys, or of surveys with administrative data, needs to be analysed with appropriate statistical methodologies. Broadly speaking, two main procedures can be considered:

  • Record linkage: complete records at unit level are obtained by fusing records of two or more data sets with appropriate unit identifiers;
  • Statistical matching: complete (synthetic) records at unit level are obtained with appropriate imputation procedures, where the data sets to be integrated play the roles of donor and recipient files, respectively.
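
As an informal illustration of the difference between the two procedures, the following minimal Python sketch may help. It is not taken from the course material: the toy records, the function names (normalise, record_linkage, statistical_matching) and the choice of a nearest-neighbour hot deck as the imputation procedure are all assumptions made for this example.

```python
# Toy illustration only: all names, records and values below are hypothetical.

def normalise(name):
    """Crude identifier harmonisation: lower-case and keep letters only."""
    return "".join(ch for ch in name.lower() if ch.isalpha())

def record_linkage(file_a, file_b, key):
    """Fuse records that refer to the same unit, using a shared identifier."""
    index_b = {normalise(rec[key]): rec for rec in file_b}
    linked = []
    for rec_a in file_a:
        rec_b = index_b.get(normalise(rec_a[key]))
        if rec_b is not None:
            # Same unit observed in both files: merge the two records.
            linked.append({**rec_b, **rec_a})
    return linked

def statistical_matching(recipient, donor, match_var, donated_var):
    """Nearest-neighbour hot deck: each recipient record receives the value
    of donated_var from the donor record closest on match_var."""
    synthetic = []
    for rec in recipient:
        closest = min(donor, key=lambda d: abs(d[match_var] - rec[match_var]))
        synthetic.append({**rec, donated_var: closest[donated_var]})
    return synthetic

survey = [
    {"name": "Anna Rossi", "age": 34, "income": 28000},
    {"name": "Jan de Vries", "age": 51, "income": 41000},
]
register = [
    {"name": "anna rossi", "age": 34, "employed": True},
    {"name": "Ola Nordmann", "age": 45, "employed": False},
]
other_survey = [
    {"age": 36, "expenditure": 19000},
    {"age": 50, "expenditure": 30000},
]

# Record linkage: the same units appear in both files.
linked = record_linkage(survey, register, key="name")

# Statistical matching: different units, so income is imputed from donors.
synthetic = statistical_matching(other_survey, survey, "age", "income")
```

In the first call only units present in both files yield a complete record; in the second, no unit is shared, so the resulting records are synthetic. Real applications would replace the exact comparison of normalised names with a probabilistic decision rule, and the single matching variable with a carefully chosen set.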

Once a match has been produced, it may be appropriate to take actions that ensure better quality of the matched results. These actions are usually collectively called:

  • Micro integration processing: quality and timeliness of the matched files; defining checks; editing procedures to get better estimates; imputation procedures to get better estimates; weighting (to population totals) issues of matched files.

This course aims to provide a comprehensive view of the state of the art in record linkage, statistical matching and micro integration processing. The course will combine teaching sessions, where the theory will be explained, with software sessions and practical exercises. Each topic will focus on one main application.

The target audience is researchers working on projects involving the integrated use of more than one data source.

The course leaders will provide handouts of all the material used during the sessions.

Day 1: topic record linkage

Lecturers: Nicoletta Cibella, Marco Fortini, Monica Scannapieco, Luca Valentino (ISTAT – Italy)

9:00-9:20 Introduction to the ESSnet DI course. Opening of the course. Course orientation. Knowing each other.

9:20-10:15 Definition of record linkage. Examples and motivations. Harmonization of different files: harmonization of populations, variables, classifications. Blocking of files.

10:15-10:30 Coffee break

10:30-12:00 A statistical model for the record linkage problem. Decision rules: the Fellegi-Sunter approach and its extensions. Estimators of the components of decision rules. One-to-one matching.

12:00-13:30 Lunch break

13:30-14:45 Quality in record linkage: the false match rate. Effects of linkage errors on statistical analysis.

14:45-15:00 Tea break

15:00-17:00 RELAIS: a toolkit for record linkage.

All the topics will be illustrated with the use of one case study: record linkage of the census and the post-enumeration survey for the evaluation of census underenumeration.

Day 2: topic statistical matching

Lecturers: Marcello D’Orazio, Marco Di Zio, Mauro Scanu (ISTAT – Italy)

9:00-10:15 Definition of statistical matching: characteristics of the files and similarities with imputation problems. Definition of statistical matching objectives: micro and macro.

10:15-10:30 Coffee break

10:30-12:00 Statistical matching under conditional independence models. Feasibility of conditional independence. Parametric and nonparametric methods for conditional independence.

12:00-13:30 Lunch break

13:30-14:45 Statistical matching methods with auxiliary information. Exercises and examples on auxiliary information and choice of the matching variables.

14:45-15:00 Tea break

15:00-17:00 Some remarks on uncertainty: use of multiple imputation techniques; use of maximum likelihood techniques. Accuracy of statistical matching results.

All the topics will be illustrated with the use of one case study: the construction of the social accounting matrix.

Day 3: topic micro integration processing

Lecturer: Eric Schulte Nordholt (Statistics Netherlands)

9:00-10:15 Introduction, a model for social statistics and the Social Statistical Database as a place to integrate two different data sets.

10:15-10:30 Coffee break

10:30-12:00 Group work on registers and surveys.

12:00-13:30 Lunch break

13:30-14:15 Quality of integrated data.

14:15-14:30 Tea break

14:30-15:30 The Virtual Census as a more difficult example of micro integration processing.

15:30-16:00 Evaluation of the course.