Fitting Data to a Weibull Distribution

We will use SAS to do three things:

1) Construct a Weibull quantile-quantile (Q-Q) plot for the Theft Claims data set. If the data are well modeled by a Weibull distribution, the points on the plot should lie close to a straight line.

2) Estimate the parameters of a Weibull distribution from the Theft Claims data. PROC UNIVARIATE estimates them by maximum likelihood, so the result is the Weibull distribution that best fits the data in that sense.

3) Perform goodness-of-fit tests, in particular the Anderson-Darling test. The null hypothesis is that the data were sampled from a Weibull distribution; the alternative hypothesis is that they were not.
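For reference, PROC UNIVARIATE describes the Weibull distribution with three parameters: a threshold θ, a scale σ, and a shape c. These are the symbols Theta, Sigma, and C in the fitted-parameter table of the output. For x > θ, the density and cumulative distribution function are given below. With θ = 0, as in this example, the model reduces to the two-parameter Weibull distribution.

```latex
f(x) = \frac{c}{\sigma}\left(\frac{x-\theta}{\sigma}\right)^{c-1}
       \exp\!\left[-\left(\frac{x-\theta}{\sigma}\right)^{c}\right],
\qquad
F(x) = 1 - \exp\!\left[-\left(\frac{x-\theta}{\sigma}\right)^{c}\right],
\qquad x > \theta .
```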

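The SAS program is shown below, followed by the output. The DATALINES block lists only the first four and the last four of the 120 claim amounts. The lines containing a single period mark where the other 112 values were omitted. Replace those lines with the actual claims before running the program, because SAS reads a lone period as a missing value.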

data one;
   input claim;   /* one claim amount per data line */
   datalines;
3
11
27
36
.
.
.
.
8316
11453
22274
32043
;

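/* Two-parameter Weibull Q-Q plot of the claim amounts */
proc univariate data=one;
   var claim;
   qqplot claim / weibull2;
   title "Distribution of Theft Claim Amounts";
   title2 "Weibull Quantile-Quantile Plot";
run;

/* For the next step, display only the fitted parameters, the
   goodness-of-fit tests, the fitted quantiles, and the histogram MyHist */
ods select ParameterEstimates GoodnessOfFit FitQuantiles MyHist;

/* Histogram with a fitted Weibull density curve and a summary inset */
proc univariate data=one;
   var claim;
   histogram / midpoints=3 to 32043 by 2912.727273
               weibull
               vaxis = axis1
               name = 'MyHist';
   inset n mean(5.3) std='Std Dev'(5.3) skewness(5.3)
         / pos = ne header = 'Summary Statistics';
   axis1 label = (a=90 r=0);
   title "Distribution of Theft Claim Amounts";
   title2 "Tests of Fit to a Weibull Distribution";
run;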

First, the quantile-quantile plot:

The points lie roughly along a straight line, which suggests that a Weibull distribution might fit the data well.
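To see why a straight line is expected, set θ = 0 and solve p = F(x_p) for the p-th quantile. Taking natural logarithms then gives a relation that is linear in log[−log(1 − p)]. Assume the WEIBULL2 plot shows the log of each ordered claim against log[−log(1 − p_i)] for some plotting position p_i. Under that assumption, Weibull data should fall near a line with intercept log σ and slope 1/c:

```latex
p = 1 - \exp\!\left[-\left(\frac{x_p}{\sigma}\right)^{c}\right]
\;\Longrightarrow\;
x_p = \sigma\,\bigl[-\log(1-p)\bigr]^{1/c}
\;\Longrightarrow\;
\log x_p = \log\sigma + \frac{1}{c}\,\log\bigl[-\log(1-p)\bigr].
```

The output reports the estimates σ = 1557.191 and c = 0.715735. With these values, the corresponding line would have intercept log σ ≈ 7.35 and slope 1/c ≈ 1.40.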

The histogram of the data, with the fitted Weibull density curve superimposed:

The SAS output for the tests of fit:

Distribution of Theft Claim Amounts

Tests of Fit to a Weibull Distribution

The UNIVARIATE Procedure

Variable: claim

Moments

N                       120    Sum Weights              120
Mean             2020.29167    Sum Observations      242435
Std Deviation    3949.85736    Variance          15601373.2
Skewness         5.16230336    Kurtosis          33.0954403
Uncorrected SS   2346352817    Corrected SS      1856563407
Coeff Variation  195.509264    Std Error Mean    360.570996

Basic Statistical Measures

    Location          Variability

Mean     2020.292     Std Deviation            3950
Median    868.500     Variance             15601373
Mode      656.000     Range                   32040
                      Interquartile Range      1477

Note: The mode displayed is the smallest of 2 modes with a count of 2.

Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------
Student's t    t  5.603034    Pr > |t|    <.0001
Sign           M        60    Pr >= |M|   <.0001
Signed Rank    S      3630    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile      Estimate
100% Max       32043.0
99%            22274.0
95%             7770.5
90%             5176.0
75% Q3          1746.0
50% Median       868.5
25% Q1           269.0
10%              125.5
5%                51.5
1%                11.0
0% Min             3.0

Extreme Observations

----Lowest----    ----Highest----
 Value     Obs     Value      Obs
     3       1      8079      116
    11       2      8316      117
    27       3     11453      118
    36       4     22274      119
    47       5     32043      120

Fitted Weibull Distribution for claim

Parameters for Weibull Distribution

Parameter   Symbol   Estimate
Threshold   Theta           0
Scale       Sigma    1557.191
Shape       C        0.715735
Mean                 1930.722
Std Dev              2752.813

Goodness-of-Fit Tests for Weibull Distribution

Test                ----Statistic-----    ------p Value------
Cramer-von Mises    W-Sq    0.21278547    Pr > W-Sq    <0.010
Anderson-Darling    A-Sq    1.27473110    Pr > A-Sq    <0.010

Quantiles for Weibull Distribution

            ------Quantile------
Percent     Observed   Estimated
    1.0      11.0000     2.51801
    5.0      51.5000    24.55178
   10.0     125.5000    67.12141
   25.0     269.0000   273.11995
   50.0     868.5000   933.14389
   75.0    1746.0000  2457.74795
   90.0    5176.0000  4993.63801
   95.0    7770.5000  7212.65806
   99.0   22274.0000 13152.42347
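The fitted Mean, Std Dev, and Estimated quantiles in these tables all follow from the scale and shape estimates. With threshold 0, the relevant formulas are:

- mean = σΓ(1 + 1/c)
- standard deviation = σ√(Γ(1 + 2/c) − Γ(1 + 1/c)²)
- quantile x_p = σ(−log(1 − p))^(1/c)

The DATA step below is a sketch for checking this and is not part of the original program. The data set name weibull_check and its variables are hypothetical. It uses the rounded estimates printed above, so its results may differ from the tables in the last decimal places.

```sas
/* Hypothetical check, not part of the original program: recompute the
   fitted Weibull summaries from the printed estimates (threshold = 0). */
data weibull_check;
   sigma = 1557.191;    /* Scale (Sigma) from the output */
   c     = 0.715735;    /* Shape (C) from the output     */

   /* Mean and standard deviation of a two-parameter Weibull */
   mean = sigma * gamma(1 + 1/c);
   sd   = sigma * sqrt(gamma(1 + 2/c) - gamma(1 + 1/c)**2);

   /* Fitted quantiles: solve p = 1 - exp(-(x/sigma)**c) for x */
   do percent = 1, 5, 10, 25, 50, 75, 90, 95, 99;
      estimated = sigma * (-log(1 - percent/100))**(1/c);
      output;
   end;
run;

proc print data=weibull_check noobs;
   var percent estimated mean sd;
run;
```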

The test of fit, using the Anderson-Darling test statistic (we will use α = 0.05):

H0: The Theft Claims data were sampled from a Weibull distribution.

Ha: The Theft Claims data were not sampled from a Weibull distribution.

n = 120, α = 0.05.

The test statistic that we will use is the Anderson-Darling statistic. This statistic is often appropriate in actuarial situations because it gives more weight to the tails of the distribution than the Cramer-von Mises statistic does.
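The extra tail weight is visible in the definitions. In their usual forms, both statistics integrate the squared difference between the empirical distribution function F_n and the fitted distribution function F. The Cramer-von Mises statistic W² weights every point equally. The Anderson-Darling statistic A² divides by F(x)[1 − F(x)], which approaches 0 in both tails, so discrepancies there are magnified. The last expression is the usual computing formula. It uses U_(i) = F(x_(i)), the fitted CDF evaluated at the i-th smallest observation, and log denotes the natural logarithm.

```latex
W^2 = n\int_{-\infty}^{\infty}\bigl[F_n(x)-F(x)\bigr]^2\,dF(x),
\qquad
A^2 = n\int_{-\infty}^{\infty}
      \frac{\bigl[F_n(x)-F(x)\bigr]^2}{F(x)\bigl[1-F(x)\bigr]}\,dF(x)
    = -n-\frac{1}{n}\sum_{i=1}^{n}(2i-1)
      \Bigl[\log U_{(i)}+\log\bigl(1-U_{(n+1-i)}\bigr)\Bigr].
```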

From the output, the Anderson-Darling statistic is A² = 1.27473, with p-value < 0.010.

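Since the p-value is less than α = 0.05, we reject H0 at the 0.05 level of significance. There is sufficient evidence to conclude that the data were not sampled from a Weibull distribution. Although the Q-Q plot looked roughly linear, the table of fitted quantiles shows where the fit breaks down. The Weibull quantiles fall well below the observed ones at the extremes: 2.52 versus 11.0 at the 1st percentile, and 13152 versus 22274 at the 99th.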