2 Short Description of the National CIS4 Methodology Used

Reference period / 2004
Observation period / 2002-2004
Person who filled the report / Maria Predonu
Date / 10.11.2006

TABLE OF CONTENTS

1 Overview

2 Short Description of the national CIS4 methodology used

3 Relevance

4 Accuracy

5 Timeliness and Punctuality

6 Accessibility and Clarity

7 Comparability

8 Coherence

9 Cost and Burden

10 Annexes

1 OVERVIEW

The purpose of this report is to get an overview of the quality of the Fourth Community Innovation Survey (CIS 4) carried out in each member state. The quality report is to be established for the CIS 4. The same is also envisaged for subsequent Community Innovation Surveys.

This quality assessment will be based on different quality dimensions and indicators. The quality dimensions are based on the standard ones as defined in the Eurostat standard statistical quality framework. The indicators themselves are also in line with these recommendations. Each criterion used to judge statistical quality corresponds to a specific chapter in the report. These criteria are: Relevance, Accuracy, Timeliness and Punctuality, Accessibility and Clarity, Comparability, Coherence, and Cost and Burden. In addition, each report should contain a short description of the national methodology used for the CIS 4.

2 SHORT DESCRIPTION OF THE NATIONAL CIS 4 METHODOLOGY USED

2.1 Target population

NACE

In accordance with section 2 of the annex of the Commission Regulation No 1450/2004 on innovation statistics, the following industries are included in the core target population of the CIS 4:

-mining and quarrying (NACE 10-14)

-manufacturing (NACE 15-37)

-electricity, gas and water supply (NACE 40-41)

-wholesale trade (NACE 51)

-transport, storage and communication (NACE 60-64)

-financial intermediation (NACE 65-67)

-computer and related activities (NACE 72)

-architectural and engineering activities (NACE 74.2)

-technical testing and analysis (NACE 74.3)

Please list all “non-core” industries that were covered in addition:

Comments: in addition research and development (NACE 73)


Size-classes

The target population follows the minimum coverage requirement, which is all enterprises with 10 or more employees.

Please indicate if there were some deviations.

Comments: no deviations

Statistical units

The main statistical unit for CIS 4 is the enterprise, as defined in the Council Regulation 696/1993 on statistical units or as defined in the national statistical business register. EU Regulation 2186/1993 requires that Member States set up and maintain a register of enterprises, as well as associated legal units and local units.

Please indicate if there were some deviations.

Comments: no deviations

The observation and reference periods

The observation period to be covered by the survey is 2002-2004 inclusive i.e. the three-year period from the beginning of 2002 to the end of 2004. The reference period of the CIS 4 is the year 2004.

Please indicate if there were some deviations.

Comments: no deviations

2.2 Sampling design

The target population of the CIS 4 is broken down into similarly structured subgroups or strata (which should be as homogeneous as possible and form mutually exclusive groups).

The stratification variables to be used for the CIS 4, i.e. the characteristics used to break down the sample into similarly structured groups, are:

- The economic activities (in accordance with NACE)[1].

In accordance with the requirements of section 5, paragraph 2 of the annex of the Commission Regulation 1450/2004 on innovation statistics, stratification by NACE should be done at least at two-digit (division) level, except for NACE 74. Here the three-digit sections NACE 74.2 and 74.3 should be treated as separate NACE categories, while NACE 74.1 and 74.4 to 74.8 should be treated as a single NACE category.

- Enterprise size according to the number of employees[2].

The size-classes used should at least be the following:

0-9 employees

10-49 employees
50-249 employees
250+ employees.

- Regional aspects at NUTS 2 level:

In accordance with section 7, paragraph 2 of the annex of the Commission Regulation 1450/2004 on innovation statistics, the regional allocation of the sample is taken into consideration when sampling.

The selection of the sample should be based on random sampling techniques, with known selection probabilities, applied to strata.

Please describe the sampling and allocation scheme used (number of strata, number of samples…).

Comments: The sampling design used was stratified sampling with simple random sampling within the strata. The strata were defined according to geographical region, economic activity and enterprise size. For the sample allocation we used the Neyman allocation.
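As an illustration of the allocation named above, the following is a minimal sketch of Neyman allocation, in which the sample size of stratum h is proportional to N_h * S_h (stratum size times the standard deviation of the allocation variable within the stratum). The function name, the stratum sizes and the standard deviations are hypothetical and are not taken from the national survey; rounding to integers and capping at the stratum size are deliberately left out.

```python
def neyman_allocation(n_total, stratum_sizes, stratum_sds):
    """Split n_total across strata proportionally to N_h * S_h.

    stratum_sizes: population counts N_h (hypothetical values below).
    stratum_sds:   within-stratum standard deviations S_h (hypothetical).
    Returns real-valued allocations; rounding is left to the caller.
    """
    products = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    if total == 0:
        raise ValueError("all N_h * S_h products are zero")
    return [n_total * p / total for p in products]


# Hypothetical strata, for illustration only.
sizes = [1000, 500, 100]
sds = [1.0, 2.0, 5.0]
allocation = neyman_allocation(250, sizes, sds)
print(allocation)  # [100.0, 100.0, 50.0]
```

In this sketch the smallest stratum receives half of its units because its variability is highest, which is the behaviour Neyman allocation is designed to produce.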

2.3 Sampling frame

The official, up-to-date, statistical business register[3] of the country was used.

Please indicate if there were some deviations.

Comments: no deviations

2.4 Sample size and overall sample rate

There is no minimum sample size. However, if a particular stratum has fewer than 6 enterprises, then all the enterprises in this stratum were selected for the survey.

Please indicate national deviations from this rule as well as the overall sample rate if available.

Comments: no deviations

2.5 Data collection method

Data are collected through a census, sample survey or a combination of both.

Please indicate the data collection method used.

Comments: The method used was a combination of a sample survey (for enterprises with 10-99 employees) and a census (for enterprises with 100 or more employees).

2.6 Weights calculation method (short description)

The survey results are weighted in order to adjust for the sampling design and for unit non-response to produce valid results for the target population.

The basic method for adjusting for different probabilities of selection used in the sampling process is to use the inverse of the sampling fraction i.e. using the number of enterprises or employees. This would be based on the figure Nh/nh where Nh is the total number of enterprises/employees in stratum h of the population and nh is the number of enterprises/employees in the realised sample in stratum h of the population, assuming that each unit in the stratum had the same inclusion probability. This will automatically adjust the sample weights of the respondents to compensate for unit non-response.
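The Nh/nh weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the stratum labels and counts are hypothetical, and the realised sample counts stand for responding enterprises, so that the weights also compensate for unit non-response within each stratum.

```python
def stratum_weights(population_counts, realised_counts):
    """Return the weight N_h / n_h for every stratum h.

    population_counts: N_h, number of enterprises in the frame per stratum.
    realised_counts:   n_h, number of enterprises in the realised sample.
    """
    weights = {}
    for stratum, N_h in population_counts.items():
        n_h = realised_counts.get(stratum, 0)
        if n_h == 0:
            # An empty stratum has no defined weight; strata would have to
            # be collapsed before weighting.
            raise ValueError(f"no realised sample in stratum {stratum!r}")
        weights[stratum] = N_h / n_h
    return weights


# Hypothetical counts, for illustration only.
population = {"small": 400, "medium": 120, "large": 30}
realised = {"small": 80, "medium": 60, "large": 30}
weights = stratum_weights(population, realised)

# Weighted count of respondents reproduces the population size.
estimated_total = sum(weights[h] * realised[h] for h in realised)
print(weights, estimated_total)
```

A census stratum (here "large") receives a weight of 1, and the weighted respondent count equals the frame total, which is a quick consistency check on the weights.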

However, if a non-response analysis is carried out (and the results indicate that there is a difference between respondents and non-respondents), then the results of the non-response analysis should also be used when calculating the final weighting factors. One approach is to divide each stratum into a number of response homogeneity groups with (assumed) equal response probabilities within groups. A second approach could be to use auxiliary information at the estimation stage for reducing the non-response bias.
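The first approach above (response homogeneity groups) can be sketched as below. The sketch assumes that the design weight of a stratum is computed from the drawn sample and is then multiplied, within each group, by the inverse of the observed response rate. The group names and counts are hypothetical.

```python
def rhg_adjusted_weights(design_weight, groups):
    """Adjust one stratum's design weight for non-response by group.

    design_weight: N_h / n_h, with n_h the number of sampled units.
    groups: mapping of group name -> (sampled units, responding units).
    Returns the adjusted weight of a respondent in each group.
    """
    adjusted = {}
    for name, (sampled, responded) in groups.items():
        if responded == 0:
            raise ValueError(f"no respondents in group {name!r}")
        # Inverse of the group's observed response rate.
        adjusted[name] = design_weight * sampled / responded
    return adjusted


# Hypothetical stratum: N_h = 400, 100 units sampled, two groups.
groups = {"group_a": (40, 32), "group_b": (60, 30)}
adjusted = rhg_adjusted_weights(4.0, groups)
print(adjusted)  # {'group_a': 5.0, 'group_b': 8.0}
```

Summing the adjusted weights over all respondents (32 x 5 + 30 x 8) gives 400, the assumed stratum size, so the adjustment preserves the stratum total while giving more weight to the group that responded less.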

Various software packages are available to do the calculations needed to derive calibrated weights. These include:

CLAN. This was developed by Statistics Sweden and it is a suite of SAS-macro commands.
CALMAR (Calibration on Margins). This is another SAS macro developed by INSEE in France.
CALJACK. This is also a SAS macro developed by Statistics Canada.

Please describe the calibration method and the software used:

Comments: The calibration was carried out with CLAN.

2.7 Transmission

CIS 4 data will be transmitted to Eurostat via STADIUM. This safe, secure procedure guarantees a method of tracking transmission. All necessary steps will be taken to ensure that the STADIUM system is working at national level.

Please indicate if there are some deviations.

Comments: no deviations

2.8 Overall assessment of the survey

Please give a short overall assessment of the quality of the CIS 4 (in listing the main strengths and weaknesses of the CIS 4 by also referring to the standard quality criteria).

Comments: It is difficult to answer, but we consider the following:

Strengths: relevance, timeliness and punctuality, accuracy, accessibility and clarity

Weaknesses: comparability, cost and burden

3 RELEVANCE

3.1 Introduction

Relevance is the degree to which statistics meet current and potential users’ needs. It includes the production of all needed statistics and the extent to which concepts used (definitions, classifications etc.) reflect user needs. The aim is to describe the extent to which the statistics are useful to, and used by, the broadest array of users. For this purpose, statisticians need to compile information, firstly about their users (who they are, how many they are, how important is each one of them), secondly on their needs, and finally to assess how far these needs are met.

The CIS is useful to, and in demand among, many users. As an example, the CIS data are used for the European Innovation Scoreboard and many other analytical publications.

3.2 Description and classification of users and users’ needs

The CIS4 is based on a common questionnaire and a common survey methodology, as laid down in the Oslo Manual (1997), in order to achieve comparable, harmonised and high-quality results for EU Member States, candidate countries, associated countries and EFTA countries.

Table 3.1: Users and users’ needs at national level (an example is given in the table)

Users’ class / Classification of users / Description of users / Users’ needs
1 / European level / The European Commission (DG ENTR) / Data used for the European Innovation Scoreboard and its further development
1 / National level / Ministry of Finance, Ministry of Economy and Trade, Ministry of Research and Education, National Authority for Scientific Research, Regional Authorities for Development, National Institute of Economy, National Institute of World Economy / Data used for R&D and Innovation national strategy, publications, training
1 / International organisations / OECD / Data used for international comparability
2 / Social actors / Authorities for Regional Development / Data used for development of regional indicators
3 / Media / National and regional media / Data used for analyses and comments to the general public.
4 / Researchers and students / Data used for analyses
5 / Enterprises or businesses / Market analysis, marketing strategy, consultancy services

Please use the following user classes when completing the table:

1- Institutions:

• European level: Commission (DGs, Secretariat General), Council, European Parliament, ECB, other European agencies etc.

• in Member States, at the national or regional level: Ministries of Economy or Finance, other Ministries (for sectoral comparisons), National Statistical Institutes and other statistical agencies (norms, training, etc.), and

• International organisations: OECD, UN, IMF, ILO, etc.

2- Social actors:

Employers’ associations, trade unions, lobbies, among others, at the European, national or regional level.

3- Media

International, national or regional media, specialised or for the general public, interested both in figures and in analyses or comments. The media are the main channels of statistics to the general public.

4- Researchers and students

Researchers and students need statistics, analyses, ad hoc services, access to specific data.

5- Enterprises or businesses

Either for their own market analysis, their marketing strategy (large enterprises) or because they offer consultancy services

Table 3.2: Unmet users’ needs at national level

Please use the user classes shown above when completing the table.

Users’ class / Unmet users’ needs / Plans for improvement

If there are any actions for reducing the unmet user needs, please specify them:

Comments: We have not met such a situation.

3.3 User satisfaction

To evaluate whether users’ needs have been satisfied, the best way is to use user satisfaction surveys. However, if no user satisfaction survey has been conducted, a proxy is to measure how closely the delivered data correspond to the requested data. This aspect of relevance is measured by the main deviations from the information specified in the CIS4 data collection, in terms of:

NACE deviations
Size class deviations
Variable deviations

Please describe the national user satisfaction survey if it has been undertaken.

Comments: We have not used a satisfaction survey.

Please calculate the number of missing cells in the standard CIS 4 output tabulation at national level.

(On CIRCA: CIS4 documentation/CIS4 National tables)

Table 3.3: National tables

Table / Number of all cells / Number of compulsory cells / Compulsory cells missing / Number of voluntary cells / Voluntary cells missing
INN_BASIC1 / 95x6 / none / 570 / none
INN_BASIC2 / 95x8 / 232 / none / 528 / none
INN_GEN / 95x12 / none / 1140 / none
INN_ENTER / 95x8 / 78 / none / 682 / none
INN_DEVELOP / 95x6 / none / 570 / none
INN_NEWPROD / 95x7 / 233 / none / 432 / none
INN_EXPEND / 95x15 / none / 1425 / none
INN_FUNDING / 95x5 / none / 475 / none
INN_SOURCES / 95x10 / none / 950 / none
INN_COOP / 95x19 / 617 / none / 1188 / none
INN_EFFECTS / 95x9 / 693 / none / 162 / none
INN_DELAY / 95x3 / none / 285 / none
INN_HAMP1 / 95x9 / 693 / none / 162 / none
INN_HAMP2 / 95x9 / 693 / none / 162 / none
INN_HAMP3 / 95x4 / 308 / none / 72 / none
INN_PATENT / 95x8 / none / 760 / none
INN_ORGMKT / 95x6 / none / 570 / none
INN_EFFORG / 95x8 / none / 760 / none

Please calculate the number of missing cells in the standard CIS 4 output tabulation at regional level.

(On CIRCA: CIS4 documentation/CIS4 tabulation scheme for NUTS2 regions, by country/ CIS4 tables for country x.)

Table 3.4: Regional tables (all tables are delivered on a voluntary basis)

TABLE / NUMBER OF ALL CELLS / Number of missing cells
INN_BASIC1 / complete / none
INN_BASIC2 / complete / none
INN_GEN / complete / none
INN_ENTER / complete / none
INN_DEVELOP / complete / none
INN_NEWPROD / complete / none
INN_EXPEND / complete / none
INN_FUNDING / complete / none
INN_SOURCES / complete / none
INN_COOP / complete / none
INN_EFFECTS / complete / none
INN_DELAY / complete / none
INN_HAMP1 / complete / none
INN_HAMP2 / complete / none
INN_HAMP3 / complete / none
INN_PATENT / complete / none
INN_ORGMKT / complete / none
INN_EFFORG / complete / none

4 ACCURACY

4.1 Introduction

Accuracy in the statistical sense denotes the closeness of computations or estimates to the exact or true values. Statistics are not equal to the true values because of variability (the statistics change from implementation to implementation of the survey due to random effects) and bias (the average of the possible values of the statistics from implementation to implementation is not equal to the true value due to systematic effects).

Several types of error occur during the survey process; together they make up the error of the statistics (their bias and variability). The following typology of errors has been adopted:

1. Sampling errors. These only affect sample surveys; they are simply due to the fact that only a subset of the population, usually randomly selected, is enumerated.

2. Non-sampling errors. Non-sampling errors affect sample surveys and complete enumerations alike and comprise:

a) Coverage errors,

b) Measurement errors,

c) Processing errors,

d) Non-response errors and

e) Model assumption errors.

4.2 Sampling errors

The aim of this sub-chapter is to measure the sampling errors for CIS4 data. The main indicator used is the coefficient of variation (CV).

Table 4.1: Coefficient of variation for key variables by NACE and size (cf. Annex 10.1)

NACE / Breakdown / [1] / [2] / [3] / [4] / [5]
Total NACE / Total / 0.3495 / 0.0294 / 0.0543 / 0.0874 / 0.7298
Total NACE / Small [10-49] / 0.2028 / 0.0487 / 0.0884 / 0.1414 / 0.7464
Total NACE / Medium-sized [50-249] / 0.6554 / 0.0288 / 0.0549 / 0.0701 / 0.0255
Total NACE / Large [> 249] / 0.9293 / 0.0087 / 0.0165 / 0.0997 / 0.0191
Industry (NACE 10-14, 15-37, 40-41) / Total / 0.4930 / 0.0634 / 0.0836 / 0.1316 / 0.3320
Services (NACE 51, 60-64, 65-67, 72, 74.2, 74.3) / Total / 0.4065 / 0.1265 / 0.1861 / 0.2678 / 0.2149

[1] = Coefficient of variation for the percentage of innovating enterprises.

[2] = Coefficient of variation for the percentage of innovators that introduced new or improved products to the market.

[3] = Coefficient of variation for the turnover of new or improved products, as a percentage of total turnover.

[4] = Coefficient of variation for the percentage of innovation active enterprises involved in innovation cooperation.

[5] = Coefficient of variation for total turnover per employee.
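For orientation, the coefficient of variation of an estimated percentage can be sketched as the standard error divided by the estimate. The sketch below assumes stratified simple random sampling without replacement and uses the textbook variance estimator for a stratified proportion; it is not necessarily the computation used for Table 4.1, and all stratum figures are hypothetical.

```python
from math import sqrt


def cv_stratified_proportion(strata):
    """Estimate a proportion and its CV under stratified SRS.

    strata: list of (N_h, n_h, p_h) with population size, sample size
            and sample proportion per stratum (hypothetical values below).
    Returns (estimate, coefficient of variation).
    """
    N = sum(N_h for N_h, _, _ in strata)
    estimate = sum(N_h / N * p_h for N_h, _, p_h in strata)
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    variance = 0.0
    for N_h, n_h, p_h in strata:
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled units")
        fpc = 1 - n_h / N_h  # finite population correction
        variance += (N_h / N) ** 2 * fpc * p_h * (1 - p_h) / (n_h - 1)
    return estimate, sqrt(variance) / estimate


# Hypothetical strata: two sampled strata and one census stratum.
strata = [(5000, 400, 0.15), (4000, 800, 0.30), (1500, 1500, 0.50)]
estimate, cv = cv_stratified_proportion(strata)
print(round(estimate, 4), round(cv, 4))
```

The census stratum contributes nothing to the variance because its finite population correction is zero, which is why fully enumerated size classes tend to show low coefficients of variation.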

4.3 Non-sampling errors

Non-sampling errors occur in all phases of a survey. They add to the sampling errors (if present) and contribute to decreasing overall accuracy. It is important to assess their relative weight in the total error and devote appropriate resources for their control and assessment.

4.3.1 Coverage errors

Coverage errors (or frame errors) are due to divergences between the target population and the frame population.

Coverage error

The indicator measures the percentage of enterprises that changed strata from when the survey design was created to the date when the actual survey was done. This is defined as the number of enterprises that changed stratum from the frame population to the realised sample, as a % of the number of enterprises in the sample population.

Table 4.2: Frame misclassification rate by size class

Size class / Small [10-49] / Medium [50-249] / Large [>249] / Total
Total number of enterprises / 5702 / 4213 / 1627 / 11542
Number of enterprises that changed strata / 299 / 93 / 26 / 418
Share of enterprises that changed strata / 0.0524377 / 0.02207453 / 0.0159803 / 0.03621556
Share of enterprises that did not change strata / 0.9475623 / 0.97792547 / 0.9840197 / 0.96378444
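The shares in Table 4.2 follow directly from the two count rows. As a minimal sketch, using the counts reported in the table (the function and variable names are illustrative only):

```python
def misclassification_rate(changed, total):
    """Share of enterprises that changed stratum between frame and survey."""
    if total <= 0:
        raise ValueError("total number of enterprises must be positive")
    return changed / total


# (enterprises that changed strata, total enterprises), from Table 4.2.
counts = {"small": (299, 5702), "medium": (93, 4213), "large": (26, 1627)}

rates = {size: misclassification_rate(c, t) for size, (c, t) in counts.items()}
total_changed = sum(c for c, _ in counts.values())
total_units = sum(t for _, t in counts.values())
overall = misclassification_rate(total_changed, total_units)

for size, rate in rates.items():
    print(size, round(rate, 4), round(1 - rate, 4))
print("total", round(overall, 4), round(1 - overall, 4))
```

The size-class counts add up to the reported totals (418 of 11542 enterprises), and the share that did not change strata is simply one minus the misclassification rate.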

Please include an assessment of under-coverage if the information is available (if possible in quantitative terms):

Comments: Only 827 units are in this situation.

4.3.2 Processing errors

Between data collection and the beginning of statistical analysis on the basis of the statistics produced, data must undergo a certain amount of processing: coding, data entry, data editing, imputation, etc. Errors introduced at these stages are called processing errors. Data editing identifies inconsistencies in the data, which usually represent errors.

Please describe the editing process and method (give the editing rate if possible).

Comments:

Please describe the data processing in adding quantitative information if possible:

4.3.3 Non-response errors

Non-response occurs when a survey fails to collect data on all survey variables from all the population units designated for data collection in a sample or complete enumeration.

There are two elements of non-response:

Unit non-response which occurs when no data (or so little as to be unusable) are collected about a designated population unit.
Item non-response which occurs when only data on some, but not all survey variables are collected for a designated population unit.

The extent of response (and accordingly of non-response) is measured with response rates.

4.3.3.1 Unit response rate

In this part, the main interest is to judge whether the response from the target population was satisfactory by computing the weighted and unweighted response rates.

Table 4.4: Unweighted and weighted unit response rate

NACE / Breakdown / [1] / [2] / [3] / [4]
Total NACE / Total / 9180 / 11698 / 0.7847 / 0.7551
Total NACE / Small [10-49] / 3893 / 5754 / 0.6765 / 0.6948
Total NACE / Medium-sized [50-249] / 3791 / 4302 / 0.8812 / 0.8987
Total NACE / Large [> 249] / 1496 / 1642 / 0.9111 / 0.9296
Industry (NACE 10-14, 15-37, 40-41) / Total / 5563 / 6067 / 0.9117 / 0.7900
Services (NACE 51, 60-64, 65-67, 72, 74.2, 74.3) / Total / 3617 / 5631 / 0.6423 / 0.6970

[1] = Number of units with a response in the realised sample

[2] = Total number of units in the sample

[3] = Unweighted unit response rate

[4] = Weighted unit response rate

NB: The weight to be taken is the sampling weight
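The two rates can be sketched as follows: the unweighted rate is the number of responding units divided by the number of sampled units, and the weighted rate replaces each count with the sum of the sampling weights. The unit-level data below are hypothetical; only the final check uses the totals reported in the table above.

```python
def unit_response_rates(units):
    """Compute the unweighted and weighted unit response rates.

    units: list of (sampling_weight, responded) pairs, one per sampled unit.
    """
    if not units:
        raise ValueError("the sample is empty")
    unweighted = sum(1 for _, responded in units if responded) / len(units)
    total_weight = sum(w for w, _ in units)
    responding_weight = sum(w for w, responded in units if responded)
    return unweighted, responding_weight / total_weight


# Hypothetical sample: two census units (weight 1) and two sampled
# units (weight 5), one of which did not respond.
units = [(1.0, True), (1.0, True), (5.0, True), (5.0, False)]
unweighted, weighted = unit_response_rates(units)
print(unweighted, round(weighted, 4))

# Unweighted rate for Total NACE from the reported counts [1] and [2].
total_rate = round(9180 / 11698, 4)
print(total_rate)  # 0.7847
```

When non-response is concentrated among units with large sampling weights, as in this hypothetical sample, the weighted rate falls below the unweighted one.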

4.3.3.2 Item non-response rate

The first step is to analyse the unit response, i.e. how many of the sampled enterprises responded. The second is to evaluate the item non-response (INR).

Please fill in Table 10.3 (Item non-response rate per NACE) and Table 10.4 (Item non-response rate per SIZE) in the annexes to this report if the information has not yet been added.