Fitting Data to a Weibull Distribution
We will use SAS to do three things:
1) Construct a Weibull quantile-quantile (Q-Q) plot for the Theft Claims data set. If a Weibull distribution models the data well, the points on the plot should lie close to a straight line.
2) Estimate the parameters of a Weibull distribution from the Theft Claims data. SAS computes maximum likelihood estimates, so the result is the Weibull distribution that best fits the data in the maximum likelihood sense.
3) Perform goodness-of-fit tests, in particular the Anderson-Darling test. The null hypothesis is that the data set is well modeled by a Weibull distribution; the alternative hypothesis is that it is not.
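For reference, the Weibull model that PROC UNIVARIATE fits can be written with a threshold θ, a scale σ and a shape c. These correspond to the Theta, Sigma and C rows of its parameter table, and the output below reports θ = 0. The sketch gives the distribution function and density in that parameterization, both defined for x > θ:

```latex
F(x) = 1 - \exp\left[-\left(\frac{x-\theta}{\sigma}\right)^{c}\right],
\qquad
f(x) = \frac{c}{\sigma}\left(\frac{x-\theta}{\sigma}\right)^{c-1}
       \exp\left[-\left(\frac{x-\theta}{\sigma}\right)^{c}\right],
\qquad x > \theta .
```

When the shape is c < 1, as estimated below, the density decreases on x > θ and the right tail is heavier than an exponential tail.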
The SAS program is shown below, followed by the output.
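data one;
   input claim;   /* one theft claim amount per data line */
   /* Only the first and last few of the 120 claims are shown. The lines
      containing a single period mark omitted data lines; SAS would read a
      lone period as a missing value, so fill in the actual claim amounts
      before running the program. */
   datalines;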
3
11
27
36
.
.
.
.
8316
11453
22274
32043
;
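/* Step 1: Weibull Q-Q plot (WEIBULL2 requests a two-parameter Weibull Q-Q plot) */
proc univariate data=one;
   var claim;
   qqplot claim / weibull2;
   title "Distribution of Theft Claim Amounts";
   title2 "Weibull Quantile-Quantile Plot";
;
ods select ParameterEstimates GoodnessOfFit FitQuantiles MyHist;
;
/* Steps 2 and 3: histogram with a fitted Weibull curve; the WEIBULL
   option also produces the parameter estimates, the goodness-of-fit
   tests and the fitted quantiles */
proc univariate data=one;
   var claim;
   histogram / midpoints=3 to 32043 by 2912.727273
               weibull
               vaxis = axis1
               name = 'MyHist';
   /* box of summary statistics in the upper-right (northeast) corner */
   inset n mean(5.3) std='Std Dev'(5.3) skewness(5.3)
         / pos = ne header = 'Summary Statistics';
   /* angle the vertical-axis label to run along the axis */
   axis1 label = (a=90 r=0);
   title "Distribution of Theft Claim Amounts";
   title2 "Tests of Fit to a Weibull Distribution";
;
run;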
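The MIDPOINTS= list starts at the smallest claim (3) and ends at the largest (32043), as the Extreme Observations table in the output confirms. The step 2912.727273 appears to be that range split into 11 equal steps and rounded to six decimal places:

```latex
h = \frac{32043 - 3}{11} = \frac{32040}{11} = 2912.\overline{72} \approx 2912.727273
```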
First, the quantile-quantile plot:
The points lie roughly along a straight line, which suggests that a Weibull distribution might provide a good fit to the data.
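To see why a straight line is the pattern to look for, solve the Weibull distribution function (threshold 0) for the p-th quantile x_p and take natural logarithms twice. This is a sketch of the idea behind a two-parameter Weibull Q-Q plot. It assumes the plot shows the log of the ordered claims x_(i) against log[−log(1 − p_i)] for plotting positions p_i such as (i − 0.375)/(n + 0.25):

```latex
p = 1 - \exp\left[-\left(\frac{x_p}{\sigma}\right)^{c}\right]
\quad\Longrightarrow\quad
\log x_p = \log \sigma + \frac{1}{c}\,\log\bigl[-\log(1-p)\bigr]
```

So for Weibull data the points fall near a line with intercept log σ and slope 1/c. The output reports σ = 1557.191 and c = 0.715735, so that line would have intercept log 1557.191 ≈ 7.35 and slope 1/0.715735 ≈ 1.40.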
The histogram of the data:
The SAS output for the tests of fit:
Distribution of Theft Claim Amounts
Tests of Fit to a Weibull Distribution
The UNIVARIATE Procedure
Variable: claim
Moments
N                      120    Sum Weights             120
Mean            2020.29167    Sum Observations     242435
Std Deviation   3949.85736    Variance         15601373.2
Skewness        5.16230336    Kurtosis         33.0954403
Uncorrected SS  2346352817    Corrected SS     1856563407
Coeff Variation 195.509264    Std Error Mean   360.570996
Basic Statistical Measures
Location             Variability
Mean    2020.292     Std Deviation           3950
Median   868.500     Variance            15601373
Mode     656.000     Range                  32040
                     Interquartile Range     1477
Note: The mode displayed is the smallest of 2 modes with a count of 2.
Tests for Location: Mu0=0
Test           -Statistic-    -----p Value------
Student's t    t  5.603034    Pr > |t|    <.0001
Sign           M        60    Pr >= |M|   <.0001
Signed Rank    S      3630    Pr >= |S|   <.0001
Quantiles (Definition 5)
Quantile    Estimate
100% Max     32043.0
99%          22274.0
95%           7770.5
90%           5176.0
75% Q3        1746.0
50% Median     868.5
25% Q1         269.0
10%            125.5
5%              51.5
1%              11.0
0% Min           3.0
Extreme Observations
----Lowest-----    ----Highest----
Value       Obs    Value       Obs
    3         1     8079       116
   11         2     8316       117
   27         3    11453       118
   36         4    22274       119
   47         5    32043       120
Fitted Weibull Distribution for claim
Parameters for Weibull Distribution
Parameter    Symbol    Estimate
Threshold    Theta            0
Scale        Sigma     1557.191
Shape        C         0.715735
Mean                   1930.722
Std Dev                2752.813
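The Mean and Std Dev rows are not separate estimates. With θ = 0 they follow from σ and c through the Weibull moment formulas. Plugging in the estimates, with gamma-function values rounded, reproduces the output up to rounding:

```latex
\frac{1}{c} = \frac{1}{0.715735} \approx 1.39717

E[X] = \sigma\,\Gamma\!\left(1 + \frac{1}{c}\right)
     \approx 1557.191 \times \Gamma(2.39717)
     \approx 1557.191 \times 1.23988 \approx 1930.7

\mathrm{SD}[X] = \sigma\sqrt{\Gamma\!\left(1 + \frac{2}{c}\right) - \Gamma\!\left(1 + \frac{1}{c}\right)^{2}}
     \approx 1557.191\sqrt{\Gamma(3.79433) - 1.23988^{2}}
     \approx 1557.191\sqrt{4.662 - 1.537} \approx 2752.8
```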
Goodness-of-Fit Tests for Weibull Distribution
Test                 ----Statistic----    ------p Value------
Cramer-von Mises     W-Sq   0.21278547    Pr > W-Sq    <0.010
Anderson-Darling     A-Sq   1.27473110    Pr > A-Sq    <0.010
Quantiles for Weibull Distribution
           --------Quantile---------
Percent      Observed      Estimated
    1.0       11.0000        2.51801
    5.0       51.5000       24.55178
   10.0      125.5000       67.12141
   25.0      269.0000      273.11995
   50.0      868.5000      933.14389
   75.0     1746.0000     2457.74795
   90.0     5176.0000     4993.63801
   95.0     7770.5000     7212.65806
   99.0    22274.0000    13152.42347
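The Estimated column is the fitted quantile function: it inverts the fitted Weibull CDF F(x) = 1 − exp[−((x − θ)/σ)^c] with θ = 0. The block checks two rows by hand, with intermediate values rounded:

```latex
x_p = \theta + \sigma\bigl[-\log(1-p)\bigr]^{1/c}

x_{0.50} = 1557.191\,(\log 2)^{1.39717} \approx 1557.191 \times 0.59925 \approx 933.1

x_{0.99} = 1557.191\,(\log 100)^{1.39717} \approx 1557.191 \times 8.4463 \approx 13152
```

The same comparison shows where the fit is weakest. At p = 0.99 the fitted quantile (about 13152) is far below the observed 22274, so the fitted Weibull has a lighter upper tail than the data.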
The test of fit, using the Anderson-Darling test statistic (we will use α = 0.05):
H0: The Theft Claims data were sampled from a Weibull distribution.
Ha: The Theft Claims data were not sampled from a Weibull distribution.
n = 120, α = 0.05.
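The test statistic that we will use is the Anderson-Darling statistic. It is often appropriate in actuarial situations because it gives more weight to the tails of the distribution than the Cramer-von Mises statistic does, and the upper tail (the largest claims) is usually of particular interest.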
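For reference, the block gives the textbook computational form of the Anderson-Darling statistic for a sample of size n. Here U_(i) = F̂(x_(i)) is the fitted Weibull CDF evaluated at the i-th smallest claim; SAS's documentation may arrange the sum differently. The block also gives the integral form, with F_n the empirical CDF. The integral form shows where the tail weighting comes from: the weight 1/{F(x)[1 − F(x)]} grows large as F(x) approaches 0 or 1, while the Cramer-von Mises statistic W² weights all parts of the distribution equally:

```latex
A^{2} = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\Bigl[\log U_{(i)} + \log\bigl(1 - U_{(n+1-i)}\bigr)\Bigr]

A^{2} = n\int_{-\infty}^{\infty} \frac{\bigl[F_n(x) - F(x)\bigr]^{2}}{F(x)\bigl[1 - F(x)\bigr]}\,dF(x),
\qquad
W^{2} = n\int_{-\infty}^{\infty} \bigl[F_n(x) - F(x)\bigr]^{2}\,dF(x)
```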
From the output, A² = 1.27473110 and the p-value is < 0.010.
Because the p-value is less than α = 0.05, we reject H0 at the 0.05 level of significance. There is sufficient evidence to conclude that the data were not sampled from a Weibull distribution.