A KPI discussion and some examples

When designing hardware or software we investigate certain features of the object. For hardware we most likely investigate the electronic as well as the mechanical features. Likewise we investigate the accuracy, reliability, etc. of the software. Surely we need to do the same thing when we design a new KPI (Key Performance Indicator)?

Example 1. Some time ago we were involved in measuring a data-transfer process. The main variable was the number of interruptions per time unit. Such a variable is usually best described as a Poisson process (other examples of Poisson processes: number of telephone calls per hour, number of customers per day, number of particles per cubic foot, etc.).

Earlier, the KPI was the mean number of interruptions calculated over a number of time intervals. However, the process was very unstable and the mean value exposed this. The results were published on the web and were of course embarrassing. It was therefore decided to change the KPI to the median value.

The median is the middle value of a set of sorted numerical values and is a more robust measure than the mean. This means that if some time intervals contained a lot of interruptions, this did not influence the median; bad quality was thus suppressed in the data and not shown.
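A small illustration of this robustness, using made-up interruption counts for ten time intervals (the data and the names `calm` and `bursty` are invented for this sketch and do not come from the actual process):

```python
from statistics import mean, median

# Hypothetical interruption counts for ten time intervals (made-up data).
calm = [0, 1, 0, 2, 1, 0, 1, 0, 1, 0]
# The same intervals, but two of them now contain a lot of interruptions.
bursty = [0, 1, 0, 30, 1, 0, 40, 0, 1, 0]

for name, counts in (("calm", calm), ("bursty", bursty)):
    # The mean reacts strongly to the two bad intervals, the median does not.
    print(f"{name:7s} mean = {mean(counts):4.1f}   median = {median(counts):3.1f}")
```

In this sketch the mean rises from 0.6 to 7.3 while the median stays at 0.5, so a KPI based on the median would report no change at all.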

If we look at the features of this KPI, we see some interesting things. The median, the minimum and the maximum are so-called order statistics. (A 'statistic' is something that is calculated from the data: e.g. mean, median, min, max, variance, etc.) Order statistics generally have a larger natural variation than the mean value. A simple explanation is that the mean is calculated using all the data; this is not the case with the order statistics. A KPI with large natural variation is of course less reliable.

Diagram 1 has two curves; one indicating the average of the means and one the average of the medians. The intensity in a Poisson process is well monitored using the average. (This is clearly visible on the diagram. Draw a line from an X-value up to the plus sign and then over to the Y-axis where it will hit the chosen value.)
The median seems to be biased (i.e. it does not correspond to the fault intensity) and at small fault intensities, say below 0.3, it is practically zero. This is also reflected in the low standard deviation of diagram 2 for the same region of the X-axis.
Diagram 2 has two curves; one indicating the standard deviation of the means and one showing the standard deviation of the medians. The X-axis shows different values of lambda (number of interruptions per time unit).
The variation of the median is almost everywhere larger than the variation of the mean. There is also a peculiar behaviour of the median at lambda in the range of, say, 0.4 – 0.8 with a local increase in the variation.
At the lower end of the scale the median of the Poisson process is practically zero, which of course gives less variation. But it also gives an extremely insensitive KPI, as the fault intensity might increase or decrease without having any impact on the KPI!
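The behaviour of the mean and the median can be explored with a small simulation. The sketch below is not the calculation behind the two diagrams: the sample size of 25 time intervals, the 2000 repetitions, the seed, the chosen values of lambda and the helper names `poisson` and `simulate` are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, median, stdev

def poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(lam, n_intervals=25, n_repeats=2000, seed=1):
    """Return (average of means, average of medians, sd of means, sd of medians)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_repeats):
        # One simulated measurement period: interruption counts per interval.
        sample = [poisson(lam, rng) for _ in range(n_intervals)]
        means.append(mean(sample))
        medians.append(median(sample))
    return mean(means), mean(medians), stdev(means), stdev(medians)

print("lambda  avg mean  avg median  sd mean  sd median")
for lam in (0.1, 0.3, 0.6, 1.0, 2.0, 4.0):
    avg_mean, avg_med, sd_mean, sd_med = simulate(lam)
    print(f"{lam:6.1f}  {avg_mean:8.3f}  {avg_med:10.3f}  {sd_mean:7.3f}  {sd_med:9.3f}")
```

With these settings the average of the means should follow lambda closely, while the median stays at or near zero for small lambda, which is the insensitivity described above.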

Example 2. Some years ago a factory invented a KPI consisting of three parts. One part was a quality measure (fault rate or similar), one part was the number of unplanned stops and one part was some utilisation measure of the process.

The KPI was intended to be part of the wage system for the staff, and the main idea was that a decrease in e.g. the fault rate would show as a better value of the KPI. Each of the terms in the KPI of course had its own natural variation, i.e. even when the process was stable the value of each term varied.

According to the theory of linear combinations of random variables, the variances of independent terms add up, so the natural variation of the KPI was inflated. A course participant simulated the whole process and it turned out that the natural variation of the KPI was larger than expected (if anything was expected at all). The KPI was therefore abandoned.
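The additivity can be checked with a small simulation. For independent parts X, Y and Z with weights a, b and c, the variance of aX + bY + cZ is a²Var(X) + b²Var(Y) + c²Var(Z). The sketch below is not the factory's KPI: how the three parts were actually combined is not known here, so the weights, the normal distributions and their levels and spreads are all made-up assumptions.

```python
import random
from statistics import pvariance

rng = random.Random(7)   # fixed seed so the run is repeatable
n = 20000                # number of simulated periods (assumption)

# Hypothetical, independent parts of a stable process (made-up levels and spreads).
fault_rate = [rng.gauss(2.0, 0.5) for _ in range(n)]
stops = [rng.gauss(5.0, 1.5) for _ in range(n)]
utilisation = [rng.gauss(80.0, 3.0) for _ in range(n)]
parts = (fault_rate, stops, utilisation)

# Assumed weights. A minus sign would not reduce the variation,
# because the weights enter the variance squared.
weights = (1.0, 1.0, 1.0)

# The combined KPI for each simulated period.
kpi = [sum(w * x for w, x in zip(weights, row)) for row in zip(*parts)]

var_parts = [pvariance(p) for p in parts]
predicted = sum(w * w * v for w, v in zip(weights, var_parts))
observed = pvariance(kpi)

print("variance of each part:    ", [round(v, 2) for v in var_parts])
print("sum of weighted variances:", round(predicted, 2))
print("variance of simulated KPI:", round(observed, 2))
```

The simulated variance of the combined KPI should land close to the sum of the weighted variances, and it is larger than the variance of any single part.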

Simulation of a KPI is a good way to get a grip on its natural, inherent variation. In this way it is possible to set, at least approximately, two limits within which we regard a change as purely random and take no action.
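A minimal sketch of how such limits could be obtained by simulation. As an example KPI we use the mean number of interruptions over 25 time intervals from a stable Poisson process with lambda = 2. These figures, the 95 % coverage, the seed and the names `random_limits` and `mean_interruptions` are assumptions for illustration only.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def mean_interruptions(rng, lam=2.0, n_intervals=25):
    """Example KPI: mean number of interruptions over the time intervals."""
    counts = [poisson(lam, rng) for _ in range(n_intervals)]
    return sum(counts) / n_intervals

def random_limits(simulate_kpi, n_repeats=5000, coverage=0.95, seed=3):
    """Simulate the KPI many times and return empirical lower and upper limits."""
    rng = random.Random(seed)
    values = sorted(simulate_kpi(rng) for _ in range(n_repeats))
    # Number of simulated values to cut off in each tail.
    k = round((1.0 - coverage) / 2.0 * n_repeats)
    return values[k], values[n_repeats - 1 - k]

lower, upper = random_limits(mean_interruptions)
print(f"KPI values between {lower:.2f} and {upper:.2f} are regarded as random variation")
```

Any function that simulates one value of the KPI from a stable process can be passed to `random_limits`, so the same approach works for a composite KPI.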

Ing-Stat – statistics for the industry
