The Economic Information Quantity (EIQ) of Uncertain Variables in a Cost/Benefit Analysis

Douglas W. Hubbard

February 29, 1996

Abstract

We present a systematic approach for calculating the economically optimal amount of information about uncertain quantities in many situations. If we use a formal definition of a quantity of information (Shannon), then existing methods from decision theory allow us to calculate the value of information in a given situation. If we generalize this approach, we can find an expected value of information (EVI) function for a given decision. Likewise, an expected cost of information (ECI) function can be generalized for a given information gathering method. We find the economic information quantity (EIQ) where marginal EVI = marginal ECI. This calculation should prove useful in a large number of practical situations. We will show the EIQ calculation to be especially useful where we must make a typical information technology (IT) investment decision based on a cost/benefit analysis (CBA) whose variables have uncertain values.

Introduction

Making decisions with imperfect knowledge of several relevant factors is a reality for the business decision maker, who must be able to act rationally even under apparently extreme uncertainty. This predicament is best formulated in decision theory, where the optimal choice under conditions of uncertainty is a well-developed body of knowledge. Decision theory also defines the basic method for calculating the value of a given amount of information (which, by definition, reduces uncertainty).

We will build on this work to develop a practical method by which decision makers can estimate the economically optimal amount of information for a given decision. The paper will do this in the following sections:

1) An overview of information theory and an information quantity. The object of this section is to review how information theory defines a quantity of information, identify the practical implications of information theory for the decision maker and introduce some notation to be used in this paper.

2) An overview of the decision-theoretic approach to information value. Here we will review the key concepts of the economic value of information and introduce the notation we will use.

3) Calculating the expected cost of information. In this section we will give a general solution for the cost of information. A special solution for binomial sampling will be worked out in detail.

4) Overview of the economic information quantity calculation. Here we will show what the scope of the EIQ calculation is and what approach must be taken to calculate it for various situations.

5) A case study of an EIQ calculation for uncertain variables in a cost/benefit analysis. We will show how the uncertainties that are typical in a cost/benefit analysis can be used to quantify both the risk and return for the proposed IT investment. We then build on this to compute how additional information about each variable will affect the decision. We will build on the previously worked out binomial sampling solution to explain how we found the EIQ for a specific variable.

In each section we will treat the problem in a finite element framework. This should turn out to be the most practical approach for decision makers since some of the special continuous solutions are too complicated to derive in many situations. Also, the computing power that is available to most decision makers today will make this approach easy to implement.

1. An Overview Of Information Theory And An Information Quantity

To most decision makers in the business world, the term “information” is a non-quantifiable and ambiguous concept. Some common definitions used in business are “Information is...”:

  • “...data in the right place at the right time” (An Information Engineering seminar)
  • “...data in a usable form” (An Information Engineering seminar)
  • “...any formal, structured data that is required to support a business and can be stored in or retrieved from a computer” (Martin 89)

These definitions don’t necessarily agree with one another, and they don’t give us much to go on for treating information as a quantity that can have a calculable economic value. Fortunately, there is a much less ambiguous definition of information from information theory. Simply put, information theory considers information to be “a reduction in uncertainty” (Shannon 48). Uncertainty (synonymously, “entropy”) and the change in uncertainty can be put in quantifiable terms in the following manner. Let’s consider the discrete situation where θ_j is a possible state with probability p(θ_j) and:

  • p(θ_j) ≥ 0, and
  • Σ_j p(θ_j) = 1, where z is the number of discrete states θ_1...θ_z.

The “entropy” of this situation is defined as:

(1.1)  H = -Σ_j p(θ_j) log₂ p(θ_j)

After a measurement (or some other type of observation) the individual probability of each θ_j may change. We will call this p(θ_j|r), the probability of θ_j given the measurement r. Then the adjusted level of entropy after taking this measurement (or receiving some other type of message) is:

(1.2)  H_r = -Σ_j p(θ_j|r) log₂ p(θ_j|r)

The Shannon formulation of information quantity, written I, is the change in entropy due to this measurement, which we write as:

(1.3)  I(r) = H - H_r

The “expected” quantity of information is the average quantity of information weighted with respect to the probability of receiving each possible message. This can be written as:

(1.4)  E[I] = Σ_i p(r_i) (H - H_{r_i}),  where H_{r_i} = -Σ_j p(θ_j|r_i) log₂ p(θ_j|r_i)

Where:

  • r_i is a result of an information gathering effort, with i=1...k discrete possible results.
  • p(θ_j|r_i) is the probability of θ_j given the result r_i.

The possible results and the associated implications for the probability of each θ_j can be expressed as a matrix M constructed as:

(1.5)  M = [M_{i,j}],  where M_{i,j} = p(θ_j|r_i), i = 1...k, j = 1...z

In any matrix , we have the following important relationships from Bayes theorem:

(1.6)

(1.7)

(1.8) {the general form of Bayes theorem}

Note: When we write p(θ_j) we are referring to the a priori probability of θ_j, the probability assessed prior to engaging in a study about the value of θ. The probability p(θ_j|r_i) is the a posteriori probability of θ_j, or the adjusted probability given the additional knowledge from a study that resulted in r_i.

We can now describe information quantity in terms of “bits” of information for any discrete situation. Note that I is a real number, not an integer. This confuses many first-time observers of Shannon’s formula (especially those with a computer programming background). It makes perfect sense, however, when we consider that I is a measure of uncertainty reduction (which is, of course, a continuous quantity) and not the number of digits in a binary number (which is an integer). The terms “message” and “data” (that which carries the information) will be considered synonyms. For our purposes, this is a much more precise and unambiguous definition of information, data, and the relationship between them. Later we will see that this definition fits well into the decision theory problem.
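To make these quantities concrete, the short Python sketch below computes the entropy (1.1), the post-measurement entropies (1.2), and the expected information quantity (1.4) for a small discrete example. (The sketch is only an illustration; the prior and the likelihoods in it are invented numbers, not values from any study.)

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution; 0*log2(0) is taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical example: z = 3 states theta_j with prior p(theta_j), and a
# matrix of likelihoods p(r_i | theta_j) for k = 2 possible results (rows).
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([[0.8, 0.4, 0.1],    # p(r_1 | theta_j)
                       [0.2, 0.6, 0.9]])   # p(r_2 | theta_j)

p_r = likelihood @ prior                         # (1.7): p(r_i)
posterior = likelihood * prior / p_r[:, None]    # (1.8): p(theta_j | r_i)

H = entropy(prior)                               # (1.1)
H_r = np.array([entropy(row) for row in posterior])   # (1.2) for each result
expected_I = np.sum(p_r * (H - H_r))             # (1.4): expected bits gained

print(f"H = {H:.3f} bits, E[I] = {expected_I:.3f} bits")
```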

While this definition of information should always hold in our discussions, it may not always be the most intuitive way of denoting a reduction in uncertainty. There are other equivalent or approximate units-of-measure that we can use instead of “bits of uncertainty reduction”. For example, in an observation that has a binary result, the information quantity can be expressed as the equivalent type I and type II errors of the result or, conversely, the chance of being correct in either situation. We may also choose to represent an information quantity by the number of samples in a survey at a given level of expected confidence in the findings. Whichever unit-of-measure we choose, the basic concept that information is a measure of uncertainty reduction still holds.

2. An Overview Of The Decision Theoretic Approach To Information Value

The decision under uncertainty in a discrete situation has been constructed as a “one-player game against states of nature”. This means that there is a single decision maker facing multiple possible outcomes, and each possible action has a different “payoff” depending on which of the uncertain outcomes actually happens. This can be described in what we will call a V matrix:

(2.1)  V = [V_{a,j}],  where rows a = 1...l correspond to decisions d_a and columns j = 1...z to states θ_j

Where:

  • d_a is one of l possible decisions, where a = 1...l.
  • V_{a,j} is the payoff for d_a if state θ_j comes to pass.

Considering all possible outcomes θ_j, each decision d_a has an expected value, written EV(d_a), calculated from:

(2.2)  EV(d_a) = Σ_j p(θ_j) V_{a,j}

The best decision, written d*, given the current state of uncertainty (no additional information is to be gathered) has an expected value given by:

(2.3)  EV(d*) = max_a Σ_j p(θ_j) V_{a,j}

Information gathering efforts meant to reduce the uncertainty about the best decision can have two or more results r_i, where the probabilities of the θ_j can be changed according to a matrix M as shown in (1.5).

The expected value of information for general discrete decision matrices is then:

(2.4)  EVI = Σ_i p(r_i) max_a Σ_j p(θ_j|r_i) V_{a,j} - max_a Σ_j p(θ_j) V_{a,j}

A continuous form of EVI can be stated as:

(2.5)  EVI = ∫ p(r) max_d E[V(d,θ) | r] dr - max_d E[V(d,θ)]

To find the optimal decision in the continuous case, set the derivative of the expected payoff with respect to d equal to zero and solve for d.

Of course, continuous decision problems can be approximated with a finite element approach using the discrete formulations with sufficiently large numbers of discrete results r_i and states θ_j. Since the finite element approach will suffice for practical decisions (and it will actually be solvable in more situations), we will continue to focus on the discrete approach.
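The discrete formulation is easy to implement. The following is a minimal Python sketch of (2.2) through (2.4); the function name and array conventions are our own, and the matrices V and M would come from the decision at hand:

```python
import numpy as np

def expected_value_of_information(V, prior, likelihood):
    """EVI per (2.4).  V[a, j] is the payoff V_{a,j} of decision d_a in state
    theta_j; prior[j] = p(theta_j); likelihood[i, j] = p(r_i | theta_j)."""
    p_r = likelihood @ prior                        # (1.7): p(r_i)
    posterior = likelihood * prior / p_r[:, None]   # (1.8): p(theta_j | r_i)
    ev_now = (V @ prior).max()                      # (2.3): act on the prior alone
    ev_informed = np.sum(p_r * (posterior @ V.T).max(axis=1))  # best d_a per result
    return ev_informed - ev_now
```

Given the V matrix, the prior p(θ_j), and the likelihoods p(r_i|θ_j), the function returns the EVI in the same units as the payoffs.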

To illustrate the usefulness of this approach, let’s consider the example of a loan officer deciding whether or not to grant a loan to an applicant. If we make some simplifying assumptions, we can set up the decision in a 2x2 V matrix as follows:

(2.6)
               G          D
  d_G      V_{G,G}    V_{G,D}
  d_D      V_{D,G}    V_{D,D}

Let’s define G, D,p(G ), p(D ), a good applicant, an applicant that will default, the probability of an applicant being a good one and the probability of defaulting, respectively. Also, let’s write the decision to grant or deny the loan as dGand dD, respectively. Finally, we can write VG,G for the payoff from granting a good applicant a loan, VG,D for the payoff of granting a loan to an applicant that defaults, etc.

Let’s set values as follows:

p(G ) = 70%

p(D ) = 30%

VG,G = $5,000

VG,D = -$15,000

VD,G = 0

VD,D = 0

In other words, without additional information, we assign a probability of 70% to the state that an applicant will not default and 30% to the state that the applicant will default. The payoff for granting this particular type of loan to an applicant who will pay it back is $5,000 (net profit from interest on the loan), and the payoff is -$15,000 if an applicant who will default is granted the loan (average unrecovered amount for this type of loan).

With this information we can derive the special EVI formula from the general formula (2.4) with the following result:

(2.7)  EVI = p(r_G) max(p(G|r_G)($5,000) + p(D|r_G)(-$15,000), 0)
             + p(r_D) max(p(G|r_D)($5,000) + p(D|r_D)(-$15,000), 0)
             - max(p(G)($5,000) + p(D)(-$15,000), 0)

Add to this the constraint that the probabilities of all possible outcomes must add up to 1, or p(X|Y) + p(~X|Y) = 1. With this and Bayes theorem (1.6) we can derive that:

(2.8)  p(r_G) = (p(G) + p(D|r_D) - 1) / (p(G|r_G) + p(D|r_D) - 1)

Now, if we are given only p(G|r_G) and p(D|r_D), we can calculate the value of the information for the decision described by (2.6). We have enough to construct a surface graph where the axes on the horizontal plane are p(G|r_G) and p(D|r_D) and the vertical axis is the value of the information.
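As a numeric illustration (the study accuracies of 90% and 80% below are invented for the example, not values from the paper), suppose a proposed credit study would yield p(G|r_G) = 0.9 and p(D|r_D) = 0.8. Then (2.8) gives p(r_G) = (0.70 + 0.80 - 1)/(0.90 + 0.80 - 1) ≈ 0.714, and (2.7) evaluates as follows:

```python
p_G, p_D = 0.70, 0.30            # a priori probabilities
V_GG, V_GD = 5_000, -15_000      # payoffs from the V matrix in (2.6)

p_G_given_rG = 0.90              # hypothetical study accuracies,
p_D_given_rD = 0.80              # not values from the paper

p_rG = (p_G + p_D_given_rD - 1) / (p_G_given_rG + p_D_given_rD - 1)   # (2.8)
p_rD = 1 - p_rG

EVI = (p_rG * max(p_G_given_rG * V_GG + (1 - p_G_given_rG) * V_GD, 0)
       + p_rD * max((1 - p_D_given_rD) * V_GG + p_D_given_rD * V_GD, 0)
       - max(p_G * V_GG + p_D * V_GD, 0))                             # (2.7)

print(f"p(rG) = {p_rG:.3f}, EVI = ${EVI:,.0f}")   # p(rG) ~ 0.714, EVI ~ $2,143
```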

This definition of the value of information is consistent with Shannon’s formal definition of information quantity in an important way. Specifically, it can be shown that unless the expected information quantity E[I] is positive, the data will have no value regardless of the other characteristics of the decision matrix.

Consider equation (2.8) in the situations where p(G|r_G) < p(G) or p(D|r_D) < p(D). The constraint that 0 ≤ p(r_G) ≤ 1 is violated in either case. Furthermore, if p(G|r_G) = p(G) then p(r_G) = 1, and if p(D|r_D) = p(D) then p(r_G) = 0. In both cases the EVI = 0. The reader can also verify that it must be the case that p(G|r_G) > p(G) and p(D|r_D) > p(D) for Shannon’s definition of expected information quantity (1.4) to be positive. Therefore, if EVI > 0 then E[I] > 0.

3. Calculating The Expected Cost Of Information

The expected cost of information (ECI) can be calculated for a given information gathering approach. Perhaps the simplest cost calculation is for an outside study offered at a fixed price and a fixed information quantity: the cost of the information is simply the purchase price. If the buyer of the information knows the V matrix for the decision and the vendor of the report provides an M matrix, then the buyer can compute the EVI (it is easy to see how we can derive a 2x2 M matrix for a report with binary results if we are given the type I and type II errors of the report). If the alternatives are defined as purchasing this report or not, then the report would be purchased if EVI > ECI and declined if EVI < ECI.

Another information gathering approach, which is more complicated and probably more common, is random sampling, where setup and sampling costs are incurred by the decision maker. If the initial setup cost of a study were c_0 and the cost per sample were c_1, then the cost of sampling is c_0 + c_1·n, where n is the number of samples taken. (Other nonlinear sampling cost functions are possible, but we will develop the simple linear function here.)
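As a sketch in Python (the cost figures are invented placeholders):

```python
def sampling_cost(n, c0=2_000.0, c1=15.0):
    """Linear sampling cost: setup c0 plus c1 per sample (figures hypothetical)."""
    return c0 + c1 * n
```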

Once we know the cost of a sample size, we need to derive the expected information from the sample size in order to calculate the ECI. It is practical to approximate the expected information quantity of n samples with a finite element approach. Take, for example, binomial sampling. Let’s take θ_j to be one of the possible proportions of a population that has a certain characteristic, s as the number of successes from a sample of size n, and r_s as the result of getting s successes out of n; then the binomial distribution equation gives us:

(3.1)  p(r_s|θ_j, n) = C(n,s) θ_j^s (1 - θ_j)^(n-s),  where C(n,s) = n!/(s!(n-s)!)

Here we introduce the expanded notation p(r_s|θ_j, n) to denote the probability of getting a result of s successes given 1) a θ_j probability of success and 2) a sample size n. Then if we take the general form of Bayes theorem (1.8) with this expanded notation, we have:

(3.2)  p(θ_j|r_s, n) = p(r_s|θ_j, n) p(θ_j) / Σ_j' p(r_s|θ_j', n) p(θ_j')

It is then a matter of substitution and finite element iteration to generate the probability of each result and the adjusted probability distribution over θ associated with each result. If we desired, it would be a straightforward exercise to calculate Shannon’s expected information quantity from these results.
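A Python sketch of this iteration follows; the grid size, the flat prior, and the sample size n are illustrative assumptions, and scipy’s binomial pmf stands in for (3.1):

```python
import numpy as np
from scipy.stats import binom

z = 1001                                 # discrete states theta_1 ... theta_z
theta = np.linspace(0.0, 1.0, z)         # candidate population proportions
prior = np.full(z, 1.0 / z)              # a flat prior p(theta_j) (an assumption)
n = 30                                   # sample size (also an assumption)

# likelihood[s, j] = p(r_s | theta_j, n) per (3.1), for s = 0 ... n successes
likelihood = binom.pmf(np.arange(n + 1)[:, None], n, theta[None, :])

p_r = likelihood @ prior                       # p(r_s), per (1.7)
posterior = likelihood * prior / p_r[:, None]  # (3.2): p(theta_j | r_s, n)
```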

4. Overview Of The Economic Information Quantity Calculation

Of course, the information quantity where EVI - ECI is maximized is the most economical. In the case of the binary report purchased at a fixed price, the value of the most economical amount of information (given only the alternatives to purchase or not to purchase) is max(EVI - ECI, 0). The information quantity associated with the economically optimal choice in this situation is the economic information quantity (EIQ). This situation, however, is not quite as common as other situations the average business decision maker faces.

A comprehensive formulation of the EIQ concept should address all of the following:

1) What is the optimal amount of information when there is a continuous or discrete choice of information quantity?

2) What is the optimal information gathering method?

3) Given a choice among different unknown variables to gather information about:

a. which is the most economical one to study?

b. is there a combination of variables I should study which has a higher payoff than any one variable by itself?

4) If we assume we can recalculate the EIQ at any point during a study and that we will take the most rational course of action at that point, how is the EIQ changed?

Continuous and Discrete

The choice of whether or not to purchase a prepared Gartner Group report has been shown to be fairly straightforward. But what is the optimal amount of information when there is a continuous choice of information quantity, or a very large number of discrete choices? For example, how accurate does a demand forecast need to be? How accurately do I need to estimate the development cost of a computer system? What are the economically optimal type I and type II errors in a bank’s decision of whether to grant a loan (here the decision may be binary, but the errors are continuous quantities)? In all these cases the EIQ must give an optimal value along a continuum, or a “near” continuum with a large number of discrete choices (should I take 100 samples, or 250, or 1,000...?).

We can show the situation with a large number of discrete choices to be very common and we can show that the continuous case can always be approximated as a large number of discrete choices. Additionally, we will, in reality, implement the calculation with a finite element method. Therefore, we will focus on developing the EIQ for situations of a large number of discrete levels of expected information quantity.

There are different types of probability distributions and different methods for gathering information. Each of these will have different implications for the EIQ calculation. The general procedure for a finite element solution can be written as follows:

1) Divide the possible states of θ and the possible results r into discrete units θ_1...θ_z and r_1...r_k. (A large number of discrete units is required for a good approximation of a continuous decision problem; a good rule of thumb would be to use at least 1,000. For the binomial sampling problem it is convenient to define an r_s for each possible number of successes.)

2) Assign the initial probabilities p(θ_j). This is a simple step even for a large number of discrete states of θ if we apply some standard probability distribution. (For example, if θ has an expected value of 0.4 with a standard deviation of 0.1, then we can compute the probability of each possible θ_j.)

3) Use (1.7) and the relevant function for p(r_i|θ_j) (use eq. 3.1 in the binomial situation) to calculate the probability p(r_i) of each possible discrete result.

4) Solve for each p(θ_j|r_s, n) by applying (1.8), the findings of step 3, and the relevant probability distribution function for that kind of analysis.

5) Determine the EVI by applying (2.4) to the results of steps 3 and 4.

6) Calculate the cost of each discrete information quantity (a sample size n, for example).

7) Determine the EIQ by finding the information quantity where EVI-ECI is maximized.

When we finish executing this procedure we will have the most economical information quantity (or equivalent sample size, confidence, or type I and type II errors) for the decision.
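To tie the seven steps together, the following end-to-end sketch searches for the EIQ in the binomial sampling case. Every number in it (the prior, the payoffs, the costs, the search range) is an invented illustration, and the payoff structure, a simple invest/don’t-invest choice whose payoff is linear in θ, is our own stand-in for a real decision matrix:

```python
import numpy as np
from scipy.stats import binom, norm

# Steps 1-2: discretize theta and assign prior probabilities (here a normal
# prior with expected value 0.4 and standard deviation 0.1, as in step 2).
z = 501
theta = np.linspace(0.0, 1.0, z)
prior = norm.pdf(theta, loc=0.4, scale=0.1)
prior /= prior.sum()

# Hypothetical V matrix: d_1 = invest (payoff linear in theta, breaking even
# at theta = 0.45), d_2 = don't invest (payoff 0).
V = np.vstack([1_000_000 * (theta - 0.45),
               np.zeros(z)])

c0, c1 = 2_000.0, 15.0              # invented setup and per-sample costs
ev_now = (V @ prior).max()          # (2.3): best we can do with no new data

best_n, best_net = 0, 0.0
for n in range(1, 301):
    # Steps 3-4: p(r_s) and p(theta_j | r_s, n) via (3.1), (1.7) and (1.8).
    like = binom.pmf(np.arange(n + 1)[:, None], n, theta[None, :])
    p_r = like @ prior
    post = like * prior / p_r[:, None]
    # Step 5: EVI per (2.4).  Step 6: ECI = c0 + c1 * n.
    evi = np.sum(p_r * (post @ V.T).max(axis=1)) - ev_now
    net = evi - (c0 + c1 * n)
    if net > best_net:
        best_n, best_net = n, net

# Step 7: best_n is the EIQ expressed as an equivalent sample size.
print(f"EIQ ~ {best_n} samples, expected net gain ${best_net:,.0f}")
```

Because the EVI curve flattens as n grows (it can never exceed the value of perfect information) while the cost function rises linearly, the net EVI - ECI peaks at a finite sample size, and that sample size is the EIQ.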

As mentioned earlier, we will show the binomial sampling case and leave it to the reader to derive the special solutions for other information gathering methods.