Review of Basic Statistical Concepts

Review of MGT 2110

• Descriptive Statistics

• Probability Distribution

• Estimation (Confidence Interval)

• Inference (Hypothesis Testing)

Descriptive Statistics

• Numerical measures

  o Mean, median, mode

  o Variance and standard deviation

  o Percentiles

  o Quartiles and interquartile range

  o Frequency distribution (use the FREQUENCY array function)

• Graphical presentations

  o Histogram

  o Scatter diagram (for two columns of data)

Probability Distribution

Random Variable (RV): A numerical description of the outcome of an experiment.

Discrete RV: A random variable that can take a countable set of values. For instance, if an experiment consists of inspecting 10 laptops produced by a manufacturer, then a random variable X can be defined as the number of defective laptops in the lot. The possible values for X are the whole numbers from 0 to 10.

Continuous RV: A random variable that can take an uncountable range of values. For instance, if an experiment consists of measuring the amount of toothpaste in a 6 oz. tube, then a random variable X can be defined as the amount of toothpaste in a tube. The possible values for X could be any value between 5.8 oz. and 6.2 oz. The values within the range are not countable.

Probability Distribution: A description of how the probabilities are distributed over the values the random variable can assume. Probability distribution for a discrete RV is called a discrete probability distribution. Probability distribution for a continuous RV is called a continuous probability distribution.

Continuous probability distribution:

Normal Probability Distribution: A continuous probability distribution. The normal distribution is a symmetrical distribution with a mean, μ, and a standard deviation, σ.

Example

A department store has determined that its customers charge an average of $500 per month, with a standard deviation of $80. Assume the amounts of charges are normally distributed.

a. What percentage of customers charges less than $340 per month?

b. What percentage of customers charges more than $380 per month?

c. What percentage of customers charges between $644 and $700 per month?

d. What is the least dollar amount among the top 10% of customer charges?

e. What are the minimum and maximum of the middle 95% of customer charges?

Four Excel functions for answering the above questions

To find probabilities using the normal distribution:
=NORM.S.DIST(z,1) / Returns the cumulative probability for z. The value z = (X – μ)/σ must be calculated before using this function.
=NORM.DIST(X,μ,σ,1) / Returns the cumulative probability for X.

To find the value of X, given a cumulative normal probability:
=NORM.S.INV(probability) / Returns the normal-table value of z. X may then be computed using X = μ + zσ.
=NORM.INV(probability,μ,σ) / Returns the value of X for the given cumulative probability.
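
The department store example can also be checked outside Excel. The following is a minimal Python sketch, assuming the standard library's statistics.NormalDist as a stand-in for the Excel functions: cdf plays the role of NORM.DIST with the cumulative flag set to 1, and inv_cdf plays the role of NORM.INV. The variable names are illustrative and not part of the course material.

```python
from statistics import NormalDist

# Monthly charges: mean $500, standard deviation $80 (from the example).
charges = NormalDist(mu=500, sigma=80)

# a. P(X < 340), comparable to =NORM.DIST(340,500,80,1)
p_a = charges.cdf(340)

# b. P(X > 380) = 1 - P(X < 380)
p_b = 1 - charges.cdf(380)

# c. P(644 < X < 700) = P(X < 700) - P(X < 644)
p_c = charges.cdf(700) - charges.cdf(644)

# d. Least amount in the top 10% is the 90th percentile,
#    comparable to =NORM.INV(0.90,500,80)
x_d = charges.inv_cdf(0.90)

# e. Middle 95% leaves 2.5% in each tail
low_e = charges.inv_cdf(0.025)
high_e = charges.inv_cdf(0.975)

print(f"a. {p_a:.2%} charge less than $340")
print(f"b. {p_b:.2%} charge more than $380")
print(f"c. {p_c:.2%} charge between $644 and $700")
print(f"d. top 10% starts at ${x_d:.2f}")
print(f"e. middle 95% runs from ${low_e:.2f} to ${high_e:.2f}")
```

Under these assumptions the answers come out to roughly 2.28%, 93.32%, 2.97%, $602.52, and $343.20 to $656.80.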

Estimation (Confidence Interval)

Confidence Interval for population mean (μ)

Assume a simple random sample of size n

Point Estimation:

Measure / Sample Statistic / Population Parameter
Size / n / N
Mean / x̄ / μ
Standard deviation / S / σ

Confidence interval for μ = x̄ ± SE

SE = Sampling Error = t_{α/2} · S/√n (always use t; use z only if σ is known)

Then, confidence interval for μ = x̄ ± t_{α/2} · S/√n

Two methods for calculating confidence interval

Method A – Using Excel T.INV.2T function

Step 1 / Find the t-table value using the Excel function / =T.INV.2T(α,df), where α = 1 – confidence level and df = degrees of freedom = n – 1
Step 2 / Determine the sampling error (SE) / SE = t_{α/2} · S/√n
Step 3 / Calculate the lower and upper limits of the confidence interval / LL = x̄ – SE, UL = x̄ + SE

Method B – Using Excel Data Analysis command

Step 1 / Run the Descriptive Statistics command from the Data Analysis command with Confidence Level for Mean checked / The output includes the sampling error as the last item of the output table, labeled Confidence Level
Step 2 / Calculate the lower and upper limits of the confidence interval / LL = x̄ – SE, UL = x̄ + SE

Example 1

A sample of 100 cans of coffee showed an average weight of 13 ounces with a standard deviation of 0.8 ounces. Develop and interpret a 98% confidence interval for the mean weight of coffee in the cans.
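
Example 1 follows Method A step by step. The sketch below reproduces those steps in Python under a loud assumption: the Python standard library has no t-distribution, so the hypothetical helper t_inv_2t approximates Excel's T.INV.2T by numerically integrating the t density and bisecting for the critical value. It is an illustration of the arithmetic, not a replacement for the Excel function.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_inv_2t(alpha, df):
    """Hypothetical stand-in for =T.INV.2T(alpha, df), found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha / 2:
            lo = mid   # tail still too large, so the critical value is higher
        else:
            hi = mid
    return (lo + hi) / 2

# Sample data from Example 1.
n, x_bar, s = 100, 13.0, 0.8
confidence = 0.98

# Step 1: t-table value.
alpha = 1 - confidence
df = n - 1
t_crit = t_inv_2t(alpha, df)

# Step 2: sampling error.
se = t_crit * s / math.sqrt(n)

# Step 3: lower and upper limits.
lower, upper = x_bar - se, x_bar + se

print(f"t = {t_crit:.4f}, SE = {se:.4f}")
print(f"98% confidence interval: {lower:.2f} to {upper:.2f} ounces")
```

With these inputs the interval comes out to roughly 12.81 to 13.19 ounces, which can be read as: we are 98% confident that the mean weight of all cans lies in that range.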

Example 2

For the Net Income as a % of equity, develop and interpret a 97% confidence interval for the mean.

Confidence Interval for population proportion (p)

Assume a simple random sample of size n

Point Estimation:

Measure / Sample Statistic / Population Parameter
Size / n / N
Proportion / p̄ / p

Confidence interval for p = p̄ ± SE

Estimating Sampling Error (SE) = z_{α/2} · √(p̄(1 – p̄)/n)

Then, confidence interval for p = p̄ ± z_{α/2} · √(p̄(1 – p̄)/n)

Step 1 / Find the z-table value using the Excel function / =NORM.S.INV(1 - α/2)
Step 2 / Determine the standard error estimate / √(p̄(1 – p̄)/n)
Step 3 / Determine the sampling error (SE) / SE = z_{α/2} · √(p̄(1 – p̄)/n)
Step 4 / Calculate the lower and upper limits of the confidence interval / LL = p̄ – SE, UL = p̄ + SE

Example

In a poll 600 voters were asked whether they were in favor of eliminating plastic bags in grocery stores. 390 of the voters were in favor and 210 of the voters were opposed. Develop a 92% confidence interval estimate for the proportion of all the voters who are opposed to the proposal.
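
The four steps for a proportion can be traced on the poll example. This is a minimal Python sketch, assuming statistics.NormalDist().inv_cdf as the counterpart of Excel's NORM.S.INV; the variable names are illustrative. Note that the question asks about voters who are opposed, so the sample proportion uses 210, not 390.

```python
from math import sqrt
from statistics import NormalDist

# Poll data from the example.
n = 600
opposed = 210
confidence = 0.92

p_bar = opposed / n                       # sample proportion opposed = 0.35
alpha = 1 - confidence

# Step 1: z-table value, comparable to =NORM.S.INV(1 - alpha/2)
z = NormalDist().inv_cdf(1 - alpha / 2)

# Step 2: standard error estimate
std_error = sqrt(p_bar * (1 - p_bar) / n)

# Step 3: sampling error
se = z * std_error

# Step 4: lower and upper limits
lower, upper = p_bar - se, p_bar + se

print(f"z = {z:.4f}, SE = {se:.4f}")
print(f"92% confidence interval for p: {lower:.4f} to {upper:.4f}")
```

Under these assumptions the interval is roughly 0.316 to 0.384, i.e. between about 31.6% and 38.4% of all voters are estimated to oppose the proposal.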

Inference (Hypothesis Testing)

Step 1: Set up the null and the alternative hypotheses.

Three types of hypotheses

Type / For population mean μ / For population proportion p
Two-tailed / H0: μ = a; Ha: μ ≠ a / H0: p = p0; Ha: p ≠ p0
One-tailed (right) / H0: μ ≤ a; Ha: μ > a / H0: p ≤ p0; Ha: p > p0
One-tailed (left) / H0: μ ≥ a; Ha: μ < a / H0: p ≥ p0; Ha: p < p0

Step 2: Decision rule for testing the hypotheses

Possible results of a hypothesis test

Actual situation / H0 is accepted / H0 is rejected
H0 is true / Correct decision / Type I error
H0 is false / Type II error / Correct decision

Decision rule: Reject H0 if the probability of a Type I error ≤ α, where

α = level of significance, i.e., the maximum tolerable probability of a Type I error at which H0 can still be rejected

Note: Probability of a Type II error = β

Step 3: Compute the p-value and reject H0 if p-value ≤ α.

Case 1: For hypotheses about μ, use the t-distribution for the p-value

p-value = T.DIST.2T(ABS(t),df) for a two-tailed test

p-value = T.DIST.RT(ABS(t),df) for a one-tailed test

Where t = (x̄ – a)/(S/√n) and df = degrees of freedom = n – 1.

Case 2: For hypotheses about p, use the z-distribution for the p-value

p-value = 1 - NORM.S.DIST(ABS(z),1) for one-tailed tests

p-value = 2*(1 - NORM.S.DIST(ABS(z),1)) for two-tailed tests

Where z = (p̄ – p0)/√(p0(1 – p0)/n).

Example 1:

A sample of 81 account balances of a credit company showed an average balance of $1,200 with a standard deviation of $126. Determine if the mean of all account balances is significantly different from $1,150. Use a .05 level of significance.
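
Example 1 is a two-tailed test about a mean (H0: mean balance = $1,150 against Ha: mean balance ≠ $1,150). The sketch below works through it in Python. As a clearly labeled assumption, the hypothetical helper t_right_tail approximates the tail area that Excel's T.DIST.RT returns by numerically integrating the t density, because the Python standard library has no t-distribution. It illustrates the calculation only.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Sample data from Example 1.
n, x_bar, s = 81, 1200.0, 126.0
hypothesized_mean = 1150.0
alpha = 0.05

# Test statistic, computed from the formula rather than a table function.
df = n - 1
t_stat = (x_bar - hypothesized_mean) / (s / math.sqrt(n))

# Two-tailed p-value, comparable to =T.DIST.2T(ABS(t),df)
p_value = 2 * t_right_tail(abs(t_stat), df)

reject_h0 = p_value <= alpha
print(f"t = {t_stat:.4f}, df = {df}, p-value = {p_value:.5f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Here t = 50/14, about 3.57, and the p-value falls well below .05, so H0 is rejected: the sample indicates that the mean balance differs significantly from $1,150.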

Example 2:

It is assumed that at least half the membership of a national trade union is female. A random sample of 400 members showed 168 women. Does the sample show that the proportion of women among the membership is less than 50%? Use a .05 level of significance for this hypothesis test.
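
Example 2 is a one-tailed test about a proportion (H0: p ≥ 0.50 against Ha: p < 0.50). A minimal Python sketch follows, assuming statistics.NormalDist().cdf as the counterpart of NORM.S.DIST(z,1); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample data from Example 2.
n = 400
women = 168
p0 = 0.50          # hypothesized proportion under H0
alpha = 0.05

p_bar = women / n  # sample proportion = 0.42

# Test statistic: the standard error uses p0 because H0 is assumed true.
z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)

# One-tailed p-value, comparable to =1 - NORM.S.DIST(ABS(z),1).
# Taking ABS(z) is appropriate here because the sample proportion
# falls on the side of p0 that Ha points to (p_bar < p0).
p_value = 1 - NormalDist().cdf(abs(z))

reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, p-value = {p_value:.6f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With z = -3.2 the p-value is about 0.0007, which is below .05, so H0 is rejected: the sample supports the claim that fewer than 50% of the members are women.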

Example 3:

It is normally assumed that the net income as % equity for the companies in the population is no more than 13%. However, test whether the sample data shows that the net income as % equity for the companies in the population is now greater than 13%. Use a .01 level of significance.

When to use .INV and .DIST functions

Use .INV functions to find table values, for confidence intervals only

Use .DIST functions to find p-values, for hypothesis testing only

Using .INV functions for Confidence Interval

If σ is known: table value z_{α/2} = NORM.S.INV(cell containing the value of 1 - α/2)

If σ is unknown: table value t_{α/2} = T.INV.2T(α,df)

Using .DIST functions for Hypothesis testing

If σ is known:

Step 1: Find Z using formula (don’t use functions like NORM….)

Step 2: p-value for 2-tailed test: (1-NORM.S.DIST(ABS(Z-calculated),1))*2

p-value for 1-tailed test: (1-NORM.S.DIST(ABS(Z-calculated),1))

If σ is unknown:

Step 1: Find t using formula (don’t use functions like T.INV or T.DIST...)

Step 2: p-value for 2-tailed test: T.DIST.2T(ABS(t-calculated),df)

p-value for 1-tailed test: T.DIST.RT(ABS(t-calculated),df)