Fitting data to a distribution: In Arena, choose Tools > Input Analyzer, then File > New. An input window opens with a new control bar. Go to File > Data File > Use Existing and select your data file. Your data will most likely be in an Excel file, so save it as a tab-delimited .txt file first (read the Input Analyzer help if you get lost). A histogram and other information will appear. Choose Fit > Fit All and a distribution summary will appear. The higher the p-value, the better the fit.
Save the Excel file as tab-delimited text.
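If the observations are already in a script rather than in a spreadsheet, the text file can also be written directly. The sketch below assumes that a plain ASCII file with one number per line is acceptable input (check the Input Analyzer help to confirm); the function names and the file name mydata.txt are made up for illustration.

```python
def to_data_file_text(values):
    """Format observations as plain text, one number per line."""
    return "\n".join(repr(float(v)) for v in values) + "\n"


def write_data_file(values, path):
    """Write the observations to a plain ASCII text file."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(to_data_file_text(values))


# Hypothetical usage:
# write_data_file([1.5, 2, 3.25], "mydata.txt")
```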
After the data file has been loaded and displayed as a histogram in a data fit window, the next step is to fit a probability distribution function to the data. To do this, first select the Fit menu item. A drop-down menu displays all of the available distribution functions. Note that the Poisson distribution will be inactive unless the Input Analyzer detects all integer data.
Next select the desired probability distribution function. The Input Analyzer will then determine the parameters that will fit the distribution function to the data. As soon as the curve-fitting calculations are complete, the resulting probability density function is drawn on top of the histogram. (In the case of the empirical distribution, the cumulative distribution function is shown instead.) Information characterizing the curve-fit, including an expression that could be included in an Arena model, is shown in the bottom section of the window.
More detailed information, including tabulations of the probability densities (histogram and probability distribution function) and the corresponding cumulative distributions, is provided in a text file that is written to the default directory with the file name [distribution].out, where [distribution] is the name of the selected distribution function. For example, if the exponential distribution was chosen, the information would be written to a file called expon.out. In addition, a summary of all distributions fitted to your data file (e.g., myfile.dst) will be written to a summary file of the same name, with the extension .sum (e.g., myfile.sum). These text files may be viewed within the Input Analyzer by choosing the Window menu option and clicking on Curve Fit Summary. For more information on these functions, refer to the Viewing Tabular Data section.
The results of the Fit All calculations should be interpreted as guidelines rather than precise scientific calculations, since the relative rankings can be affected by the number of intervals within the histogram or choice of histogram end points. Thus, if two or more distribution functions show small square errors that are relatively close to each other, it is not clear that the function with the smallest square error is necessarily "the best." It often happens that multiple distribution functions offer satisfactory representations for a given data file, and the final choice might be determined by other factors, such as the results of the goodness-of-fit tests or the computational efficiency of the functions within Arena. On the other hand, the results of the Fit All calculations do allow you to distinguish clearly between those functions that fit the data well and those that do not.
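To see why the rankings can move with the histogram settings, here is a rough sketch of a square-error calculation. It assumes the square error is the sum over intervals of the squared difference between the histogram relative frequency and the fitted distribution's probability for that interval, and it fits an exponential distribution using the sample mean. Both are assumptions made for illustration; the Input Analyzer's own estimators and definitions may differ, and all names and data below are made up.

```python
import math


def histogram_relative_freqs(data, lower, upper, n_intervals):
    """Relative frequency of the data in each equal-width interval.

    Assumes at least one data value lies between lower and upper.
    """
    width = (upper - lower) / n_intervals
    counts = [0] * n_intervals
    for x in data:
        if lower <= x <= upper:
            # Put a value equal to the upper bound in the last interval.
            i = min(int((x - lower) / width), n_intervals - 1)
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


def expon_interval_probs(mean, lower, upper, n_intervals):
    """Probability an exponential(mean) variable falls in each interval."""
    width = (upper - lower) / n_intervals

    def cdf(x):
        return 1.0 - math.exp(-max(x, 0.0) / mean)

    return [cdf(lower + (i + 1) * width) - cdf(lower + i * width)
            for i in range(n_intervals)]


def square_error(freqs, probs):
    """Sum of squared differences between observed and fitted values."""
    return sum((f - p) ** 2 for f, p in zip(freqs, probs))


# Made-up sample data; the same fit scored with two interval counts.
data = [0.2, 0.5, 0.9, 1.1, 1.4, 2.0, 2.7, 3.3, 4.8, 6.1]
mean_hat = sum(data) / len(data)
for k in (5, 10):
    freqs = histogram_relative_freqs(data, 0, 7, k)
    probs = expon_interval_probs(mean_hat, 0, 7, k)
    print(k, "intervals: square error =", square_error(freqs, probs))
```

The same data and the same fitted mean give a different square error for each interval count, which is why close rankings should not be over-interpreted.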
Histogram command (Options, Parameters menu)
In the Options, Parameters menu, choosing the Histogram option will make a dialog appear that allows you to change the number of intervals, the lower bound (ignoring all data below this bound), and the upper bound (ignoring all data above this bound). The number of intervals must be at least 5 and not more than 40. In addition, the histogram lower bound must be greater than or equal to the largest integer that does not exceed the minimum data value in the file. The histogram upper bound, then, must be less than or equal to the smallest integer that equals or exceeds the maximum data value in the file.
If a histogram parameter is changed after a distribution function has been selected, a new curve-fit will automatically be carried out utilizing the new histogram parameters.
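The limits on the Histogram dialog values can be restated as a small check. This is only a sketch of the rules as described above, not code from Arena, and the function name is made up.

```python
import math


def check_histogram_params(data, n_intervals, lower, upper):
    """Return a list of rule violations (empty if the settings are valid)."""
    problems = []
    if not 5 <= n_intervals <= 40:
        problems.append("number of intervals must be between 5 and 40")
    # Lower bound may not go below the minimum data value rounded down.
    if lower < math.floor(min(data)):
        problems.append("lower bound is below floor(min data value)")
    # Upper bound may not go above the maximum data value rounded up.
    if upper > math.ceil(max(data)):
        problems.append("upper bound is above ceil(max data value)")
    return problems


# Example: data ranging from 2.3 to 7.8 allows bounds from 2 to 8.
print(check_histogram_params([2.3, 4.1, 7.8], 10, 2, 8))
```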
Distribution command (Options, Parameters menu)
The Distribution option only becomes active once a distribution function (other than Empirical) has been fitted to the data. If you choose this option, a dialog appears allowing you to change the distribution function parameters. Once a distribution function parameter has been changed, a new evaluation of the goodness of the distribution’s fit to the data will be performed.
A p-value over .05 means we do not reject the null hypothesis that the predicted values match the actual values; the data are consistent with the fitted distribution. If the p-value is .05 or less, we reject the null hypothesis and conclude that the modeled distribution does not fit the actual distribution. If the test statistic is large relative to the degrees of freedom, p will be small. Think of the Z test: when Z is large, the probability of reaching that point is small. In hypothesis testing, we are asking how probable a measurement at least this far from the null value would be if the null hypothesis were true; the further the measurement is from the null value, the smaller that probability p.
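The Z test analogy can be made concrete with a short sketch. It assumes a two-sided test against the standard normal distribution and uses the .05 cutoff from these notes; the function names are made up.

```python
import math


def two_sided_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def reject_null(p, alpha=0.05):
    """Reject the null hypothesis when p is alpha or less."""
    return p <= alpha


# The larger |z| is, the smaller the p-value becomes.
for z in (0.5, 1.0, 1.96, 3.0):
    p = two_sided_p_from_z(z)
    print(z, p, "reject" if reject_null(p) else "do not reject")
```

A Z of 1.96 gives a p-value of about .05, which is where the familiar cutoff comes from.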
An attractive feature of the Kolmogorov-Smirnov test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size for the approximations to be valid). Despite these advantages, the K-S test has several important limitations:
- It only applies to continuous distributions.
- It tends to be more sensitive near the center of the distribution than at the tails.
- Perhaps the most serious limitation is that the distribution must be fully specified. That is, if location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid. It typically must be determined by simulation.
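One way to picture "determined by simulation" for the last limitation: when a parameter is estimated from the data, the K-S p-value can be approximated by refitting and retesting many simulated samples. The sketch below does this for an exponential distribution whose mean is estimated from the data. It illustrates the general idea only, not what the Input Analyzer does, and the function names, simulation count, and seed are arbitrary.

```python
import math
import random


def ks_statistic_expon(data, mean):
    """Largest gap between the empirical CDF and the exponential CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-x / mean)
        # Check the gap just before and just after each step of the ECDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulated_ks_p_value(data, n_sims=1000, seed=0):
    """Approximate p-value when the mean is estimated from the data."""
    rng = random.Random(seed)
    n = len(data)
    mean_hat = sum(data) / n
    d_obs = ks_statistic_expon(data, mean_hat)
    exceed = 0
    for _ in range(n_sims):
        sample = [rng.expovariate(1.0 / mean_hat) for _ in range(n)]
        # Re-estimate the mean for every simulated sample, exactly as
        # was done for the real data; this is the step that standard
        # K-S tables do not account for.
        sim_mean = sum(sample) / n
        if ks_statistic_expon(sample, sim_mean) >= d_obs:
            exceed += 1
    # Add one to numerator and denominator so p is never exactly zero.
    return (exceed + 1) / (n_sims + 1)


# Made-up data that is clearly not exponential (tightly packed near 10).
clustered = [10 + 0.01 * i for i in range(50)]
print(simulated_ks_p_value(clustered, n_sims=200))
```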
2/2010