Statistical analysis of Clean Energy Regulator inspection sample for non-compliant SGUs

Updated analysis including new samples

Prepared by: Data Statistics Information Consulting Pty Ltd, June 2015

Table of Contents

Introduction and Executive Summary

Population Characteristics

Sample Structure

Geography

Unit Size

Inspector Variation

Timing

Type of Fault

Introduction and Executive Summary

The following report provides an updated and expanded analysis of the outcomes of the investigation of the information collected from SGU samples conducted by the Clean Energy Regulator from mid-2010 to the end of 2014. This report focuses on extending the fundamental estimates from the initial data analysis performed in September 2012 and updated in August 2013, as well as investigating whether there are any significant trends in these estimates over the time period of the samples. As in the original analyses, the final overall estimates are appropriately adjusted for potential bias associated with the fact that the sampling scheme was not a simple random sample, but instead focussed on those SGU installations which fell into an area where inspection was available. In addition, an adjustment is made to account for potential variability in the assessment propensities of the various inspectors.

Overall, a sample of 12,523 SGUs was analysed, with the following primary outcomes:

  1. There were 489 SGU installations deemed to be “unsafe”, which leads to an overall raw rate of 3.9%. This value is lower than the previous raw rate of 4.3% seen in each of the two previous reports. Indeed, for the 4,954 sampled SGUs which were installed after April, 2012 (the cut-off for installations analysed in the previous reports) the raw rate of “unsafe” installations was only 3.5%. This represents a statistically significant (p = 0.0249) downward trend in the raw “unsafe” installation rate.
  2. There were 1,907 SGU installations deemed to be “sub-standard”, which leads to an overall raw rate of 15.2%, a decrease from the 16.8% value in the previous report (which was itself a notable decrease on the original analysis value of 21%). Indeed, only 14.5% of installations in the sampled SGUs installed after April, 2012 were deemed to be “sub-standard”. This represents a highly statistically significant (p < 0.0001) downward trend in the raw rate of “sub-standard” installations.
  3. The overall distribution of sampled SGUs is now much more in line with the structure of the population, and this is a direct result of the new sampled SGUs tending to target areas where there was a previous under-representation.
  4. Using post-stratification techniques to adjust for the bias in these raw rates, the overall estimate of the proportion of unsafe installations is 3.4% with a standard error of 0.3%; and,
  5. The bias adjusted estimate of the overall proportion of sub-standard installations is 14.6% with a standard error of 1.1%.
  6. These adjusted estimates correct for observed differences in geographic composition of the sampled SGUs from the population of all SGUs and also account for notable variation in the propensity to assess installations as either “unsafe” or “sub-standard” among the 79 inspectors.
  7. There is no significant difference in “unsafe” and “sub-standard” installation rates across different rated output categories.
  8. There is a clear and statistically significant downward trend in the rates of both “unsafe” and “sub-standard” installations. The rate of “unsafe” installations is decreasing at an estimated rate of 0.9% per month (or 10.1% per annum). By contrast, the rate of “sub-standard” installations made a sharp increase in the latter part of 2012, concomitant with an assessment protocol change around that time, and has since been decreasing at an estimated rate of 3.3% per month (or 33.3% per annum). Further, there was no statistically significant evidence to suggest different time trends in “unsafe” or “sub-standard” installation rates across geographic areas.
  9. The variability in inspector propensity to assess installations as either “unsafe” or “sub-standard” appears to have diminished for latter inspections. Further, there appears to be a change in the reasons leading to “unsafe” and “sub-standard” assessments over time. While small numbers mean that statistically significant trends are not apparent, there are still some notable patterns which are detailed in the final section of the report.
  10. In breaking down the overall levels of “unsafe”, the predominant cause of “unsafe” installations (approximately 80%) is due to DC isolator enclosure water ingress issues. Further breakdowns are provided at Tables 7 and 8.
  11. In breaking down the overall levels of “sub-standard”, the main causes of “sub-standard” installations are due to issues with the DC isolator wiring (approximately 60%) and other wiring issues (approximately 23%). Further breakdowns are provided at Tables 9 and 10.
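
The monthly and annual rates of decline quoted in item 8 are related by compounding. The report does not state the conversion it used, so the following is only a sketch under that assumption, and the function name is illustrative. With the rounded monthly figures it gives roughly 10.3% and 33.1% per annum, close to the quoted 10.1% and 33.3%; the gap is plausibly due to rounding of the monthly rates.

```python
def annual_decline(monthly_decline: float) -> float:
    """Annual proportional decline implied by compounding a monthly decline.

    Assumption: the rate falls by the same proportion every month.
    """
    return 1.0 - (1.0 - monthly_decline) ** 12


# Rounded monthly rates of decline as quoted in item 8.
for label, monthly in [("unsafe", 0.009), ("sub-standard", 0.033)]:
    print(f"{label}: {annual_decline(monthly):.1%} per annum")
```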

Population Characteristics

The full population of SGUs installed numbered 1,340,158 (this value covers the time period from mid-April 2001 until late December 2014). Installations were far from uniform over that period, with installation numbers growing exponentially from the start of 2006 until the middle of 2011, as indicated in Figure 1. Installations increased by approximately 11.8% per month during this growth period. From the middle of 2011, following a sharp fall of 67.7% in July 2011 (presumably linked to changes in the Solar Credits Multiplier), installation numbers have been generally decreasing (by about 1.3% per month, on average).

Figure 1: Monthly breakdown of SGU Installations

The time period for sampled SGUs ranges from the start of July 2010 through the end of November 2013. During this period 1,002,841 SGUs were installed. This represents 74.8% of the full population and these SGUs form the relevant population against which the sampled SGU analysis is compared and adjusted. As Figure 2 indicates, installation numbers by geography appear consistent through the relevant time period, particularly with regard to remoteness, which was determined by cross-correlating the postcodes of installation addresses with the ABS 2006 Remote Area Index.

Figure 2: Quarterly breakdown of SGU Installations by:

(a) State/Territory (b) ABS Remoteness Index

For completeness, Table 1 shows the installation numbers in the population across the relevant sampling timeframe (i.e., July 1, 2010 to November 30, 2013) broken down by states and territories. Furthermore, by cross-correlating the postcodes of the installation addresses with the ABS 2006 Remote Area Index, Table 1 also breaks down the population by five levels of remoteness: Major Cities, Inner Regional, Outer Regional, Remote and Very Remote.

Table 1: Breakdown of SGU Population by State and Remote Area Index

Remoteness Index / ACT / NSW / NT / QLD / SA / TAS / VIC / WA
Major Cities / 11,737 / 122,954 / - / 199,190 / 95,770 / - / 105,578 / 90,508
Inner Regional / 25 / 67,651 / - / 79,449 / 21,291 / 10,031 / 53,527 / 23,168
Outer Regional / - / 20,155 / 1,286 / 39,324 / 18,975 / 5,771 / 13,105 / 8,729
Remote / - / 1,626 / 704 / 2,918 / 4,640 / 250 / 225 / 1,823
Very Remote / - / 192 / 96 / 876 / 723 / 98 / - / 446
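
The cell counts in Table 1 can be cross-checked against the totals quoted in the text. The sketch below is an illustrative check, not part of the original analysis; it treats the "-" cells as zero (an assumption) and uses variable names chosen here. It sums the cells and computes the marginal shares by remoteness and by state, which can be compared with the 1,002,841 total, the 74.8% figure, and the "Overall" rows of Tables 2a and 2b.

```python
STATES = ["ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"]

# Table 1 cell counts in STATES order; "-" cells are entered as 0 (assumption).
POPULATION = {
    "Major Cities":   [11737, 122954, 0, 199190, 95770, 0, 105578, 90508],
    "Inner Regional": [25, 67651, 0, 79449, 21291, 10031, 53527, 23168],
    "Outer Regional": [0, 20155, 1286, 39324, 18975, 5771, 13105, 8729],
    "Remote":         [0, 1626, 704, 2918, 4640, 250, 225, 1823],
    "Very Remote":    [0, 192, 96, 876, 723, 98, 0, 446],
}
FULL_POPULATION = 1340158  # all SGUs installed, as quoted in the text

total = sum(sum(row) for row in POPULATION.values())

# Marginal shares of the relevant population.
remoteness_share = {r: sum(row) / total for r, row in POPULATION.items()}
state_share = {
    s: sum(row[i] for row in POPULATION.values()) / total
    for i, s in enumerate(STATES)
}

print(f"Relevant population: {total:,} ({total / FULL_POPULATION:.1%} of all SGUs)")
for name, share in remoteness_share.items():
    print(f"{name}: {share:.1%}")
for name, share in state_share.items():
    print(f"{name}: {share:.1%}")
```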

As in the original samples investigated, the most notable source of potential bias exists in the form of a difference in structure between the population and the sampling frame. This bias will again be corrected for through the process of post-stratification. However, as the post-stratification process requires a geographic breakdown of the installed SGUs in the population, we now note that, while the general installation numbers appear reasonably consistent from Figure 2, there has been a notable shift in the geographic distribution of SGUs installed between the two original sample timeframes and the full updated sample timeframe. Table 2a indicates the state-by-state installation pattern for the SGUs installed in the population during the three separate sections of the sampling timeframe (corresponding to the original two analyses and the current analysis, respectively).

Table 2a: Proportion of SGU Population by State for Original and New Timeframes

Timeframe / ACT / NSW / NT / QLD / SA / TAS / VIC / WA
Original (7/2010 – 8/2011) / 1.8% / 30.0% / 0.1% / 26.1% / 12.9% / 0.6% / 15.6% / 12.9%
Update (9/2011 – 4/2012) / 0.5% / 11.5% / 0.1% / 31.8% / 21.1% / 1.1% / 19.1% / 14.8%
Current (5/2012 – 11/2013) / 0.8% / 16.6% / 0.3% / 37.6% / 12.8% / 2.7% / 17.9% / 11.2%
Overall / 1.2% / 21.2% / 0.2% / 32.1% / 14.1% / 1.6% / 17.2% / 12.4%

Table 2b: Proportion of SGU Population by Remoteness for Original and New Timeframes

Timeframe / Major Cities / Inner Regional / Outer Regional / Remote / Very Remote
Original (7/2010 – 8/2011) / 65.5% / 24.8% / 8.3% / 1.2% / 0.2%
Update (9/2011 – 4/2012) / 63.2% / 24.2% / 10.8% / 1.4% / 0.3%
Current (5/2012 – 11/2013) / 59.3% / 26.4% / 12.8% / 1.2% / 0.3%
Overall / 62.4% / 25.4% / 10.7% / 1.2% / 0.2%

Clearly, installations in New South Wales have diminished substantially relative to those in the other large states (i.e., Queensland, South Australia, Victoria and Western Australia) over the sampling timeframe, with Queensland showing the largest sustained increase in installation numbers. However, Table 2b indicates that the installation pattern with regard to remoteness has changed very little, with only a mild shift of Major City installations into the Regional areas (primarily Outer Regional).

Table 2b also shows that SGUs installed in the Remote and Very Remote regions of the country account for only 1.4% of the population. Furthermore, in the sample data, of the 12,523 inspected SGUs, only 92 (0.7%) were from either remote or very remote locations (very slightly up from the 0.6% value seen in the original analyses). For this reason, in the analysis of the sampled SGUs, the remote installations will once again be grouped together with those from the Outer Regional areas, to ensure sufficient statistical reliability. As noted in the previous report, this re-categorisation entails an intrinsic potential for bias if the installation characteristics of the remote SGUs differ from those in the Outer Regional areas. Fortunately, the maximum size of this potential bias is kept extremely small by the fact that so few SGUs have been installed in these areas to date.
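
A back-of-envelope bound makes the last point concrete. If the remote strata make up a share w of the population and their true rate differs from the Outer Regional rate by some gap, then pooling shifts the overall estimate by at most about w times that gap. The sketch below is illustrative only: the function name is chosen here, and the 10-percentage-point gap is purely hypothetical, not a figure from the data.

```python
def pooling_bias_bound(remote_share: float, rate_gap: float) -> float:
    """Approximate upper bound on the shift in the overall rate caused by
    treating remote SGUs as if they had the Outer Regional rate."""
    return remote_share * abs(rate_gap)


remote_share = 0.014     # Remote + Very Remote share of the population (Table 2b)
hypothetical_gap = 0.10  # hypothetical difference in rates, for illustration only

bound = pooling_bias_bound(remote_share, hypothetical_gap)
print(f"Maximum shift in the overall rate: {bound:.2%}")
```

Even a gap this large would move the overall rate by well under a quarter of a percentage point.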

In contrast to the small shift through time in remoteness breakdown of installation numbers, Figure 3 clearly demonstrates that there was a notable shift during the sampling timeframe in the type of SGUs installed, with low rated output units (in kW) constituting the vast majority of installations prior to 2010 and then a steady shift toward higher rated output units during the sampling period. As such, investigation of the relative likelihood of “unsafe” and “sub-standard” installations by level of rated output will need to be undertaken later in this report to assess whether there are differential rates of “unsafe” and “sub-standard” installations within the sample. If so, there will need to be appropriate adjustment to the overall estimates. Furthermore, as the shift in population structure is clearly temporal, if there are different rates of “unsafe” and “sub-standard” installations by rated output, then any time trends assessed will need to account for the change in the output of SGUs throughout the sampling timeframe as well.

Figure 3: Quarterly breakdown of SGU Installations by Rated Output:

Sample Structure

An updated sample of 12,523 inspected SGUs was provided, which included the original samples of 3,058 and 3,745 inspected SGUs. Detailed breakdowns of these SGUs by state and remoteness, as well as other characteristics, are provided in the following sections. Overall, there were 489 (3.9%) installations deemed as unsafe by inspectors and 1,907 (15.2%) installations deemed to be sub-standard. However, as in the original analyses, these raw rates are likely to be biased and require appropriate adjustment to account for the details of the sampling procedure.
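
For reference, the raw rates can be reproduced directly from the counts. The sketch below also attaches the textbook binomial standard error, which assumes a simple random sample; the report states that this assumption does not hold here, so these standard errors are illustrative only and are smaller than the adjusted standard errors of 0.3% and 1.1% quoted in the executive summary. The function name is chosen here for illustration.

```python
import math


def rate_and_se(count: int, n: int) -> tuple[float, float]:
    """Raw proportion and its binomial standard error (simple random sample assumed)."""
    p = count / n
    return p, math.sqrt(p * (1.0 - p) / n)


N_SAMPLED = 12523  # inspected SGUs

unsafe_rate, unsafe_se = rate_and_se(489, N_SAMPLED)
substandard_rate, substandard_se = rate_and_se(1907, N_SAMPLED)

print(f"unsafe: {unsafe_rate:.1%} (naive SE {unsafe_se:.2%})")
print(f"sub-standard: {substandard_rate:.1%} (naive SE {substandard_se:.2%})")
```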

The sampling procedure indicates that samples selected were restricted by various timeframe and postcode constraints. As noted above, this gives rise to a sampling frame which differs from the true population (i.e., there are some SGUs which have no possibility of being sampled). Without proper adjustment, such a difference between sampling frame and population may lead to biased estimates. However, under minimal structural assumptions, appropriate adjustment can be made to arrive at unbiased estimates.

Within the given constraints of postcode and timeframe, SGUs were sampled at random for inspection, and thus should be representative of the timeframe and geographic location from which they arose. Nevertheless, it must be noted that each sampled SGU for inspection was subject to the consent of the installation owner. As such, there is the potential for a “response bias” if those owners who either refused or were unavailable to provide consent to inspection were substantively different from the owners of the SGUs which were actually sampled. In the current circumstances, it seems unlikely that owner consent or otherwise would be linked to the likelihood of unsafe or substandard installation, and thus it is deemed that there is minimal risk of a response bias. Finally, there was no information provided as to the number of sampled SGUs which could not be inspected.

Geography

Clearly, the sampling procedure leaves open the possibility of notable differences in the geographic breakdown of the sampling frame and the actual population of SGUs. Table 3 provides the breakdown of the 12,523 sampled SGUs by state and remoteness area. In addition, Table 3 provides the proportion of each cell which is comprised of newly sampled SGUs (i.e., SGUs which were not in either of the two original collections of SGUs analysed).

Table 3: Sampled SGUs by State and Remote Area Index (value in parentheses is proportion not in the original samples)

Remoteness Index / ACT / NSW / NT / QLD / SA / TAS / VIC / WA
Major Cities / 143 (22.4%) / 1,826 (13.9%) / - / 2,526 (43.1%) / 1,382 (34.9%) / - / 1,622 (33.1%) / 1,186 (39.3%)
Inner Regional / - / 701 (33.5%) / - / 863 (41.9%) / 262 (45.0%) / 98 (40.8%) / 522 (46.2%) / 358 (32.1%)
Outer Regional / - / 138 (53.6%) / 12 (25.0%) / 381 (41.2%) / 193 (36.8%) / 45 (46.7%) / 111 (46.8%) / 62 (43.5%)
Remote / - / 7 (42.9%) / 9 (0%) / 19 (52.6%) / 31 (38.7%) / 1 (100%) / - / 7 (100%)
Very Remote / - / - / 1 (100%) / 5 (60%) / 7 (57.1%) / - / - / 5 (100%)

In the previous samples, there were some notable differences between the geographic distribution of the sampled installations and that of the population. These differences will still need to be adjusted for. However, while there remain “small sample” issues in various regions, the overall distribution of sampled SGUs is now much more in line with the structure of the population, and this is a direct result of the new sampled SGUs tending to target areas where there was a previous under-representation.

Of course, the shift in target areas for sampled SGUs may potentially be the reason that apparent “unsafe” and “sub-standard” rates have declined, and care will need to be taken in appropriately adjusting rates to ensure that final estimates are not biased. As before, though, these discrepancies in geography only give rise to an actual bias in the estimation procedure if the rate of “unsafe” or “sub-standard” installations differs by state or remoteness.

Table 4 provides the observed breakdown of the number and proportion of unsafe or substandard installations in the sample by state and remoteness. Note that, as mentioned above, given the small number of SGUs sampled from remote and very remote locations, these categories have been amalgamated with the Outer Regional category. This re-categorisation has the benefit of creating statistical stability; however, the validity of any estimates based on this re-categorisation presupposes that the rates of unsafe or substandard installations in remote areas are similar to those in the outer regional areas of the corresponding state.

Table 4: Breakdown of Sampled SGUs by State and Remote Area Index which were deemed Unsafe or Substandard

(a) Unsafe

Remoteness Index / ACT / NSW / NT / QLD / SA / TAS / VIC / WA
Major Cities / 9 (6.3%) / 67 (3.7%) / - / 74 (2.9%) / 37 (2.7%) / - / 89 (5.5%) / 61 (5.1%)
Inner Regional / - / 26 (3.7%) / - / 31 (3.6%) / 4 (1.5%) / 8 (8.2%) / 24 (4.6%) / 21 (5.9%)
Outer Regional & Remote / - / 8 (5.5%) / 3 (13.6%) / 11 (2.7%) / 4 (1.7%) / 5 (10.9%) / 4 (3.6%) / 3 (4.1%)

(b) Substandard

Remoteness Index / ACT / NSW / NT / QLD / SA / TAS / VIC / WA
Major Cities / 11 (7.7%) / 378 (20.7%) / - / 373 (14.8%) / 240 (17.4%) / - / 157 (9.7%) / 188 (15.9%)
Inner Regional / - / 100 (14.3%) / - / 102 (11.8%) / 37 (14.1%) / 10 (10.2%) / 61 (11.7%) / 81 (22.6%)
Outer Regional & Remote / - / 18 (12.4%) / 7 (31.9%) / 67 (16.5%) / 44 (19.0%) / 7 (15.2%) / 18 (16.2%) / 8 (10.8%)

Small sample issues are clearly a problem. For example, 3 unsafe installations were found among the sampled SGUs in the Northern Territory, leading to an observed rate of 13.6%; however, as there were only 22 sampled SGUs from the Northern Territory, if just one fewer unsafe installation had been found, the observed rate would have been 9.1%, while if one more had been found the observed rate would have jumped all the way to 18.2%. Nevertheless, there are clear differences in rates across states and levels of remoteness.

In order to adequately deal with these issues, and ultimately to adjust for differences in the geographic composition of the sample from the population, we fit a logistic regression model to capture the fundamental relationship between state and remoteness and the proportion of unsafe and substandard installations. The results of this logistic regression model may then be used to construct post-stratification adjusted estimates of the rates of unsafe or substandard installations. In order to account as completely as possible for the observed pattern of unsafe or substandard installations, we choose a logistic regression model with main effect terms for both state and remoteness level, as well as a number of interaction effects, to account for the potentially different relationship between rates of unsafe or sub-standard installations and remoteness level within individual states.

However, as the Australian Capital Territory, Northern Territory, South Australia and Tasmania sub-samples contain very small numbers, we ignore any potential interaction effects for these areas as practically inestimable. While the validity of our adjusted estimates therefore requires the supposition of similar structure in these states and territories, this is not a large practical issue, as the overall population does not contain a high proportion of SGUs from these areas and thus any mis-estimation in these areas will have a minimal effect on the overall estimated rates.
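
The instability of the Northern Territory rate can be illustrated numerically. The sketch below recomputes the observed rate with one fewer and one more unsafe installation, and adds a 95% Wilson score interval. The Wilson interval is not used in the report; it is shown here only as one common way to quantify uncertainty for a small count, and the function name is chosen for illustration. For 3 unsafe out of 22 it spans roughly 5% to 33%.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


NT_SAMPLED = 22  # sampled SGUs in the Northern Territory
NT_UNSAFE = 3    # of which deemed unsafe

# Sensitivity of the observed rate to a single installation either way.
for unsafe in (NT_UNSAFE - 1, NT_UNSAFE, NT_UNSAFE + 1):
    print(f"{unsafe} of {NT_SAMPLED}: {unsafe / NT_SAMPLED:.1%}")

lo, hi = wilson_interval(NT_UNSAFE, NT_SAMPLED)
print(f"95% Wilson interval for {NT_UNSAFE}/{NT_SAMPLED}: {lo:.1%} to {hi:.1%}")
```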

Using the above model to adjust for the geographic effects of sampling frame bias yields estimates of unsafe and sub-standard installation proportions of 3.9% and 15.0%, respectively. Note that the adjusted estimates are nearly identical to the raw estimates (unlike the case in the previous analyses), which is in large part a reflection of the fact that the geographic structure of the sample is now in quite reasonable alignment with that of the population over the relevant time period. Further, both of these estimates are notably lower than their counterparts from the original investigation.
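
The post-stratification step can be sketched as a population-weighted average of stratum rates. The example below is a simplification, not the report's procedure: it uses the raw cell rates from Tables 3 and 4 in place of the fitted values from the logistic regression model. It also assumes that Remote and Very Remote counts are merged into Outer Regional ("ORR") in both population and sample, and that the ACT Inner Regional cell (25 SGUs in the population, none sampled) is dropped. All names are illustrative. Under these assumptions the weighted rates come out at about 3.9% and 15%, in line with the figures above.

```python
# (state, remoteness): (population N_h, sampled n_h, unsafe, sub-standard)
# Population from Table 1, sample sizes from Table 3, outcomes from Table 4.
STRATA = {
    ("ACT", "MC"):  (11737, 143, 9, 11),
    ("NSW", "MC"):  (122954, 1826, 67, 378),
    ("NSW", "IR"):  (67651, 701, 26, 100),
    ("NSW", "ORR"): (21973, 145, 8, 18),
    ("NT", "ORR"):  (2086, 22, 3, 7),
    ("QLD", "MC"):  (199190, 2526, 74, 373),
    ("QLD", "IR"):  (79449, 863, 31, 102),
    ("QLD", "ORR"): (43118, 405, 11, 67),
    ("SA", "MC"):   (95770, 1382, 37, 240),
    ("SA", "IR"):   (21291, 262, 4, 37),
    ("SA", "ORR"):  (24338, 231, 4, 44),
    ("TAS", "IR"):  (10031, 98, 8, 10),
    ("TAS", "ORR"): (6119, 46, 5, 7),
    ("VIC", "MC"):  (105578, 1622, 89, 157),
    ("VIC", "IR"):  (53527, 522, 24, 61),
    ("VIC", "ORR"): (13330, 111, 4, 18),
    ("WA", "MC"):   (90508, 1186, 61, 188),
    ("WA", "IR"):   (23168, 358, 21, 81),
    ("WA", "ORR"):  (10998, 74, 3, 8),
}


def post_stratified_rate(strata: dict, outcome_index: int) -> float:
    """Weight each stratum's sample rate by its share of the population."""
    total_population = sum(cell[0] for cell in strata.values())
    weighted = sum(cell[0] * cell[outcome_index] / cell[1] for cell in strata.values())
    return weighted / total_population


unsafe_adjusted = post_stratified_rate(STRATA, 2)
substandard_adjusted = post_stratified_rate(STRATA, 3)
print(f"unsafe: {unsafe_adjusted:.1%}, sub-standard: {substandard_adjusted:.1%}")
```

Fitting a model first, as the report does, smooths the rates in sparsely sampled cells such as the Northern Territory before they are weighted.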

Unit Size

As noted previously, there was a clear temporal shift towards units with higher rated outputs. If rates of “unsafe” and/or “sub-standard” installations vary across output levels, then there will be a need to adjust for this effect in the final overall “unsafe” and “sub-standard” rate estimates.