WGMApril05/EN

WG Methodology

5 APRIL 2017

Item 2.2 of the agenda

Waivers implementation in business statistics

Eurostat1WGM April2017/EN

1. Executive summary

Detailed tables in business statistics usually contain several cells with small counts or dominant businesses, owing to skewed distributions. These cells constitute a disclosure risk, which is often dealt with by suppressing them. This can easily lead to the suppression of many cells and thus to a significant loss of information. A potential solution is the use of waivers (disclosure permissions). The EU Statistical Law allows publishing data with the agreement of the company in question. However, this option is currently not used by all ESS countries. In order to facilitate a more widespread use of waivers, Eurostat, in collaboration with the Expert Group on Statistical Disclosure Control, has drawn up a document discussing some of the issues surrounding waivers, based on experiences in some National Statistical Institutes (NSIs) and in particular in ProdCom (see annex).

Discussions within the Expert Group are ongoing on the issues presented in paragraph 2.

The Working Group is invited to give feedback on the document on the use of waivers in business statistics, including on the open issues.

2. Open issues

Tick-boxes

In one example, an NSI has included a question on permission to publish confidential data (if needed) directly in the questionnaire, in the form of a tick-box. Some members of the Expert Group questioned this approach, as it would not be clear whether the person giving the authorisation has the proper mandate to do so (e.g. external companies sometimes complete the statistical questionnaires for big enterprises).

What is the perception of the Working Group in view of formulating a recommendation? Do similar practices exist in other NSIs?

(2,k)-rule

On the basis of the (2,k)-rule, a cell is confidential if two companies dominate it (together over k% of the total). If the two largest companies have given a waiver, the cell can be published. The ongoing discussion focusses on the case where the largest company has given a waiver but the second largest has not. According to some, this does not change the situation and the cell should remain confidential. According to others, keeping this cell confidential would overprotect the second largest company, especially if this company is not dominant in the remainder of the cell total (see the table below); publication of the cell on the basis of the largest company's waiver should therefore be subject to a further check.

Example: two cases for the (2,90)-rule, contributions in percentage of the cell total

Contributor / Case a / Case b
Largest / 90 / 46
Second / 1 / 45
Remainder / 9 / 9

Consider the cases a and b in the table.

1) Without a waiver of the biggest and second biggest contributors to the cell, the cell should remain confidential in both cases.

2) If only the second largest has given a waiver the cell should remain confidential in both cases.

Publication of the cell total would leave the second largest company in the best position to estimate the contribution of the largest (taking the cell total minus its own contribution); the accuracy of that estimate is measured relative to the actual contribution of the largest company. This risk is not changed by the waiver; therefore the cell total should remain confidential.

(In case a the second largest company may approximate the contribution of the largest with a margin of 10 % (99/90) and in case b with a margin of about 20 % (55/46).)

3) If only the largest company has a waiver, two interpretations can be considered:

- the (2,90) rule defines a 10% safety margin (100-90) relative to the cell total; the cell should still remain confidential as the approximation of the second biggest contributor relative to the cell total is less than 10%.

(In cases a and b the largest contributor can estimate the second contribution within less than 10% of the cell total (case a: (100-90-1)/100; case b: (100-46-45)/100), so the cell should remain confidential.)

- the waiver of the largest company shifts the highest risk to the largest company disclosing the contribution of the second largest (taking the cell total minus its own contribution); the accuracy of the estimation is relative to the contribution of the second largest. It depends on the dominance of the second largest in the remainder of the cell total. An additional test (e.g. (1,k) rule or p% rule on the confidential remainder) is needed to decide whether there is a significant disclosure risk.

(In case a, where the second largest is small compared to the remainder of the cell, the largest has no good approximation (10/1). In case b, where the largest and the second company are about the same size, the margin for the largest to guess the contribution of the second is about 20% (54/45).)

The difference between the two interpretations lies in the denominator. In both, the error in the largest company's approximation of the second contributor's value is the same (case a: 100-90-1; case b: 100-46-45). In the first interpretation this error is calculated in relation to the cell total; in the second, in relation to the actual value of the second biggest contributor.
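The two interpretations can be compared numerically. The following Python sketch recomputes the margins for cases a and b of the table; the function names are illustrative assumptions and do not come from any Eurostat tool.

```python
def margin_relative_to_total(total, largest, second):
    """Interpretation 1: error of the largest company's estimate of the
    second contribution, expressed as a share of the cell total."""
    estimate = total - largest  # the largest subtracts its own contribution
    return (estimate - second) / total


def margin_relative_to_second(total, largest, second):
    """Interpretation 2: the same error, expressed relative to the actual
    contribution of the second largest company."""
    estimate = total - largest
    return (estimate - second) / second


# Contributions in percent of the cell total: (total, largest, second)
cases = {"a": (100, 90, 1), "b": (100, 46, 45)}

for name, (total, largest, second) in cases.items():
    m1 = margin_relative_to_total(total, largest, second)
    m2 = margin_relative_to_second(total, largest, second)
    print(f"case {name}: {m1:.0%} of cell total, {m2:.0%} of second contribution")
```

Under the first interpretation both cases give 9%, which is below the 10% safety margin. Under the second interpretation the error is nine times the true value in case a and about 20% of it in case b.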

What is the view of the Working Group? The final document should probably present both interpretations. The issue is whether or not to come to a common understanding. Would a clear recommendation be preferred or not?

Passive confidentiality

The document on waivers does not address passive confidentiality (a situation in which data are considered non-confidential by default); it only distinguishes passive confidentiality clearly from the use of waivers. From the perspective of statistical production, passive confidentiality is easier to implement than waivers, and it is also likely to generate more valuable output. Currently, over half of the cells are suppressed in detailed tables such as those for ProdCom. If such detailed information is needed in European statistics, then the confidentiality approach should support the availability of information.

It is thus proposed that the Expert Group, in its future work, look into criteria for deciding where to apply passive confidentiality. A new project in the Centre of Excellence could also be envisaged. What is the view of the Working Group?

EUROPEAN COMMISSION
EUROSTAT
Directorate B: Methodology; Corporate statistical and IT services
Directorate G: Global business statistics
Unit B-1: Methodology and corporate architecture
Unit G-1: Coordination and infrastructure development

Luxembourg, 10 March 2017

[Annex to document 2.2 for the Working Group on Methodology on 5 April 2017]

Deployment of waivers to reduce confidentiality suppressions in business statistics in the ESS

Introduction

In September 2016 the EG SDC (Expert Group on Statistical Disclosure Control) endorsed recommendations for confidentiality management in business statistics. One of these was to encourage the use of waivers (disclosure permissions). The EU Statistical Law allows publishing data with the agreement of the company in question. However, this option is currently not used by all ESS countries. In order to facilitate a more widespread deployment of waivers, this paper aims to discuss some of the issues surrounding waivers, and share good practices and experiences of some NSIs already using them.

What is the legal status of waivers?

The EU Statistical Law (Regulation 223/2009) provides for waivers, which is in itself a sufficient legal basis for their use in European statistics. It would be awkward if, for lack of national legislation on waivers, some data were published in European statistics but not in national ones. Based on European law the statistics may be published nationally, but problems may arise where nationally disseminated statistics are not identical to European statistics (e.g. additional variables or codes). It may then be necessary to follow the national provisions on exceptions to confidentiality, such as the use of waivers. Note that waivers constitute an exception to active confidentiality; they are not the same as passive confidentiality, in which data are non-confidential by default (an approach stipulated for international trade in goods statistics).

What data sources can waivers be applied to?

Waivers are applicable to all data sources, including administrative data. Even if a company is not directly surveyed, it can still be asked for a waiver. Communication with companies on this point can easily become confusing, as the company might be unaware of the statistical use of the data.

Do waivers pay off; will companies give consent?

This is the key question when considering the implementation of a waiver system. There are some fresh, encouraging examples in the domain of industrial production statistics (ProdCom):

Slovakia sent a letter asking for a waiver together with the survey questionnaire to 751 selected companies, out of which 41% gave their permission to publish (otherwise) confidential volume data for the coming five years. Some of the companies were contacted also by email or phone to increase acceptance. As a result, the proportions of confidential data in SK ProdCom 2015 went down from 87% to 45% for Total quantity and from 69% to 44% for Sold quantity.

Hungary introduced a question into their online questionnaire asking everybody whether the reported data can be disclosed; 60% of the respondents agreed. The share of confidential cells in HU ProdCom 2015 went down: Value from 52% to 24%, Total quantity from 77% to 45% and Sold quantity from 55% to 26%. The additional question was seen to be cost neutral.

Poland conducted a special survey on giving consent to data disclosure. A letter and the questionnaire were sent to 1403 companies whose data had been protected earlier with confidentiality suppressions; 1014 replies were received containing 665 consents. These consents, valid until withdrawn, should clearly reduce suppressions starting from reference year 2016; the share of confidential ProdCom headings is projected to go down from 43% to 31%.

The Netherlands sent a letter asking for a five-year waiver to 1428 companies that were either behind suppressed data or new respondents. Of these, 62% replied, and 25% of the replies were positive: the share of confidential cells was reduced from 65% to below 60%. The gain was smaller than in the other pilots, but after considering costs and benefits NL nevertheless decided to continue with waivers.

It seems fair to conclude that these pilot undertakings demonstrate the high potential of asking for waivers from a large number of companies, improving significantly the availability of data for the users.

In what ways can one ask for a waiver?

First let’s distinguish two different ways of deploying waivers: systematic or targeted. The preceding examples are ‘systematic’ in that they aim to cover a broad base of companies, resulting in maximal overall reductions in confidentiality suppressions. Waivers can also be asked for in a targeted way, singling out just a few companies whose significance in the statistics is very high. This is already done in several countries, often in the form of ‘contracts’ with the companies in question. Targeted waivers can also be asked for after the data collection, whereas systematic waivers need to be linked to the data collection (as part of the survey) or acquired separately beforehand. From a European perspective, it is enough if the consent is indicated simply by ticking a box in the survey questionnaire. Separate letters, forms or contracts can also be used, depending on what is deemed necessary to comply with national legal requirements and practices.

It is also useful to explain to the companies what the consent means, why it is asked for and possibly how improved publication of data may lead to better market information for the companies themselves. In the Hungarian online questionnaire, the consent statement was kept short and a longer version was shown in the related help window, on the assumption that people refuse more readily if the statement is too long. It was not possible to submit the questionnaire without answering the consent question. If the answer was NO, the respondent was still shown a window asking them to reconsider and presenting some arguments to that end.

What information should the waiver contain?

Referring to the preceding paragraphs, there is quite some diversity in country practices. Consider the following:

- Business name and ID

- Scope of the waiver: which data can be published

- Where that data can be published: NSI’s own publications and international data needs

- The source of the data covered by the waiver (e.g. survey, administrative data etc.)

- Validity period of the consent: until withdrawn?
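As an illustration, the items above could be held in a simple record. The following Python sketch is hypothetical: the field names, the coding of scope and publication channels, and the way withdrawal is handled are assumptions, not an agreed ESS format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Waiver:
    business_id: str
    business_name: str
    scope: Tuple[str, ...]        # which data can be published (assumed coding)
    publish_in: Tuple[str, ...]   # e.g. ("national", "european") (assumed coding)
    data_source: str              # e.g. "survey" or "administrative"
    valid_from: date
    valid_until: Optional[date] = None   # None means valid until withdrawn
    withdrawn_on: Optional[date] = None

    def covers(self, variable: str, on: date) -> bool:
        """Return True if the waiver covers `variable` on the date `on`."""
        if variable not in self.scope:
            return False
        if on < self.valid_from:
            return False
        if self.valid_until is not None and on > self.valid_until:
            return False
        if self.withdrawn_on is not None and on >= self.withdrawn_on:
            return False
        return True


# Hypothetical five-year waiver for one variable
waiver = Waiver(
    business_id="X-0001",
    business_name="Example Ltd",
    scope=("sold_quantity",),
    publish_in=("national", "european"),
    data_source="survey",
    valid_from=date(2016, 1, 1),
    valid_until=date(2020, 12, 31),
)
print(waiver.covers("sold_quantity", date(2017, 6, 30)))
print(waiver.covers("value", date(2017, 6, 30)))
```

Whether a withdrawal should also affect reference periods before the withdrawal date is a policy choice that this sketch leaves open.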

How does one take waivers into consideration in determining which cells should be suppressed?

In the following we refer to the cells that would be suppressed according to normal rules in place. It is assumed that the contributions of the companies (also those with a waiver) are not publicly known.

- If using frequency rule (confidential if too few contributors, usually 2): the cell can be published if all have given a waiver.

- If using (1,k) rule (one contributor dominating the cell total T with a share >k%): the cell can be published if the biggest contributor has given a waiver.

- If using (2,k) rule (two companies together are >k% of cell total T): the cell can be published if the two biggest contributors A and B have given a waiver. If A has not given a waiver, the cell stays confidential. If A has given the waiver but B not, the rule needs to be modified to avoid overprotection. Consider that with a (2, 90) rule, out of the total, A is 89% and B is 2%. It would make no sense to suppress the cell. Therefore, an additional test is needed: Being the biggest contributor, A is best placed to estimate the value of B, knowing the subtotal T-A by deducting its own contribution. Therefore if B dominates that subtotal, the cell should not be published. The logic is similar to (1,k) rule, for which a common value is k=85. However, the value to be used is logically linked to the value of k used in the applied (2,k) rule. To maintain the same level of protection as offered by the rule to the biggest contributor, the value should usually be set 5-10% lower.[1] E.g. with (2,90) rule, the cell could be published if the share of B is below 85% of the subtotal which excludes the contribution of its competitor A: B<0.85(T-A). The waivers of smaller contributors C, D and E don’t matter.

- If using p% rule: the cell can be published if the two biggest contributors A and B have given a waiver. In case only A has given a waiver, the cell cannot be published if A can estimate B within a margin of p%. This is reflected by the formula: (T-A-B)<(p/100)B. The waivers of smaller contributors C, D and E don’t matter.
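The four rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a cell is a list of (value, has_waiver) pairs, the cell is already known to be sensitive under the normal rule, and the function names and default thresholds (85 for the subtotal test, p = 10) are chosen for the example only. For the p% rule, the sketch assumes that the cell stays confidential when A has not given a waiver, by analogy with the (2,k) bullet.

```python
def _ranked(cell):
    """Sort (value, has_waiver) pairs by value, largest first."""
    return sorted(cell, key=lambda c: c[0], reverse=True)


def unlock_frequency(cell):
    """Frequency rule: publish only if all contributors have given a waiver."""
    return all(has_waiver for _, has_waiver in cell)


def unlock_1k(cell):
    """(1,k) rule: publish if the biggest contributor has given a waiver."""
    return _ranked(cell)[0][1]


def unlock_2k(cell, k_sub=85.0):
    """(2,k) rule with the additional test B < k_sub% of (T - A)."""
    ranked = _ranked(cell)
    total = sum(value for value, _ in cell)
    a, a_waiver = ranked[0]
    if not a_waiver:
        return False               # A has not consented: stays confidential
    if len(ranked) < 2:
        return True                # A is the only contributor
    b, b_waiver = ranked[1]
    if b_waiver:
        return True                # both A and B have consented
    return b < (k_sub / 100.0) * (total - a)


def unlock_p(cell, p=10.0):
    """p% rule: with only A's waiver, suppress if (T - A - B) < p% of B."""
    ranked = _ranked(cell)
    total = sum(value for value, _ in cell)
    a, a_waiver = ranked[0]
    if not a_waiver:
        return False               # assumption, see lead-in
    if len(ranked) < 2:
        return True
    b, b_waiver = ranked[1]
    if b_waiver:
        return True
    return not ((total - a - b) < (p / 100.0) * b)


# Example from the text: (2,90) rule, A = 89 with waiver, B = 2 without
print(unlock_2k([(89, True), (2, False), (9, False)]))
# B dominates the subtotal T - A: 45 is not below 0.85 * 50
print(unlock_2k([(50, True), (45, False), (5, False)]))
```

Waivers of smaller contributors are ignored in the (1,k), (2,k) and p% functions, as stated in the bullets; only the frequency rule looks at every contributor.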

When applying secondary confidentiality and selecting the counterpart cells to be suppressed, it might be preferable to try to avoid selecting cells that have been ‘unlocked’ with waivers. The motivation of the dominant company having given a waiver may have exactly been to see the total of ‘its’ data cell.

How can one incorporate waivers into the data production system?

When the number of waivers is considerable, it is useful to automate their handling in the data production system. Some countries have already developed their own programs for taking the waivers into consideration when implementing confidentiality suppressions. Currently tau-Argus does not have the capability to incorporate waivers in an automated way, but that may be developed in future versions. However, it allows setting cells ‘safe’ manually.

Statistics Sweden has developed a SAS program that works with tau-Argus (SAS2Argus) and includes treatment of waivers (using p%). It is available at:

Note that although waivers complicate implementing primary confidentiality suppressions, at the same time they facilitate secondary confidentiality by reducing it, increasing data availability for the users further.

If the consent is given in the questionnaire as part of the normal data collection round, it naturally follows the data in the production process and does not need to be stored separately anywhere. Usually in that case the consent would apply only to the data given in that data collection, not to future reference periods. The situation is different when the consent is given for a longer period of time, or for multiple statistical domains. These waivers need to be registered somewhere, in some kind of database if their number is high.


[1] With (2,90) rule when the cell is suppressed, the share of A in T minus B can never be smaller than 81,8%. Using this value would guarantee exactly the same protection in all cases. Consider situation T=100, A=45, B=45. 45/55=81,8%. With (2,85) that value would be 73,9%.
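The values in this footnote can be reproduced with a short calculation. With contributions in percent of T and a cell suppressed under the (2,k) rule, the share of A in T minus B is smallest when A = B = k/2, which gives (k/2)/(100 - k/2) = k/(200 - k). The Python sketch below evaluates this expression; the function name is an illustrative assumption.

```python
def min_share_of_largest(k):
    """Lower bound, in percent, of the share of A in (T - B) for a cell
    suppressed under the (2,k) rule; reached when A = B = k/2."""
    return 100.0 * k / (200.0 - k)


for k in (90, 85):
    print(f"(2,{k}) rule: {min_share_of_largest(k):.1f}%")
```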