Solutions to Spring 1998 exam
Business Statistics
Question 1
a. i. Scale: 2|5 = 25
0 | 5 7
1 | 0 1 2 2 2 4 5 6 7 7 8 8
2 | 2 3 3 4 5 8 9
3 | 1 4
4 | 2
After examining the stem and leaf plot, it is clear that the accounts are unimodal and positively skewed.
ii.
iii. The five figure summary consists of 5 numbers chosen so that 25% of the data values lie between each pair of adjacent numbers. The five numbers are
Lowest data value = 5
25th percentile: position = (0.25)(24) = 6
Therefore the 25th percentile is the average of the 6th and 7th data values, i.e. 12
Median: position = (0.5)(24) = 12
Therefore the median is the average of the 12th and 13th data values, i.e. 17.5
75th percentile: position = (0.75)(24) = 18
Therefore the 75th percentile is the average of the 18th and 19th data values, i.e. 24.5
Highest data value = 42
The boxplot is then drawn to scale using these 5 numbers.
[Boxplot: Fred’s household accounts. Horizontal axis: Number of litres, 0 to 45. Whiskers run from 5 to 42; the box spans 12 to 24.5 with the median at 17.5.]
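The five numbers can be checked with a short Python sketch. The data list is read off the stem and leaf plot in i.; the helper name `percentile` and the rule for non-whole positions (round up) are assumptions made for illustration, as only whole-number positions occur in this question.

```python
# Account sizes read off the stem and leaf plot (scale: 2|5 = 25).
data = [5, 7,
        10, 11, 12, 12, 12, 14, 15, 16, 17, 17, 18, 18,
        22, 23, 23, 24, 25, 28, 29,
        31, 34,
        42]

def percentile(sorted_values, pct):
    """Percentile by the position rule used above: position = (pct/100) * n.

    If the position is a whole number k, average the k-th and (k+1)-th values.
    Otherwise round up to the next position (assumed convention).
    Intended for 0 < pct < 100.
    """
    n = len(sorted_values)
    k, remainder = divmod(pct * n, 100)
    if remainder == 0:
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[k]  # 1-based position k + 1

values = sorted(data)
five_number_summary = (
    values[0],
    percentile(values, 25),
    percentile(values, 50),
    percentile(values, 75),
    values[-1],
)
print(five_number_summary)
```

Other textbooks and software use slightly different percentile conventions, so results from a statistics package may differ a little from the hand calculation.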
iv. If Fred is interested in household accounts only, then the corner store account should be left out as it is not a household account. If however Fred is interested in all account holders, then the corner store account should be included. It all depends on what the target population is.
v. Fred’s concern is warranted. These account customers are not a random sample of his customers so no inferences about his customers can be made.
b. i.
ii. Company B, since it has the lowest standard deviation and coefficient of variation. Both of these are measures of risk. The smaller the measure of variability, the smaller the risk associated with the investment.
Question 2
a.
A Binomial problem where
X = number of shipments that arrive late = 0, 1, 2, …, 8
n = 8
p = probability a shipment will arrive late = 0.3
Wherever possible, binomial probabilities should be determined from the binomial tables in the Appendix at the rear of the text.
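As a cross-check on the table values, the binomial probabilities for n = 8 and p = 0.3 can be computed directly. This is a minimal sketch; the function names `binomial_pmf` and `binomial_cdf` are illustrative and not taken from the text.

```python
from math import comb

n, p = 8, 0.3  # values given in the question

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), the cumulative form that binomial tables usually list."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Print k, P(X = k) and P(X <= k) for every possible number of late shipments.
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 4), round(binomial_cdf(k, n, p), 4))
```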
i.
ii.
iii. If n = 30 we can no longer use the binomial tables to find the required probability.
Since np = 9 and nq = 21 are both greater than 5 we can use the normal distribution to approximate the required probability.
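Let Y be a normally distributed random variable with mean np = 9 and standard deviation sqrt(npq) = sqrt(6.3) = 2.51 (approx.)
[Figure: normal curve for Y with 9 and 15.5 marked on the horizontal axis]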
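The normal approximation can be sketched in Python as follows. The event is an assumption: the value 15.5 on the sketch suggests a continuity correction for P(X >= 16), but the exact probability asked for is not reproduced in these solutions, so the upper tail below is for illustration only.

```python
from math import sqrt
from statistics import NormalDist

n, p = 30, 0.3
mu = n * p                      # np = 9
sigma = sqrt(n * p * (1 - p))   # sqrt(npq) = sqrt(6.3)

Y = NormalDist(mu, sigma)

# Assumed event for illustration: X >= 16, approximated by Y > 15.5
# (continuity correction of 0.5).
upper_tail = 1 - Y.cdf(15.5)
print(round(mu, 2), round(sigma, 2), round(upper_tail, 4))
```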
b.
Completing the row and column totals for this contingency table we have.
Cost
Type / $10 / $15 / $20 / Totals
Fiction (F) / 100 / 80 / 30 / 210
Biography (B) / 120 / 100 / 90 / 310
Historical (H) / 40 / 170 / 20 / 230
Totals / 260 / 350 / 140 / 750
i.
ii.
iii.
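The row and column totals in the contingency table can be verified with a few lines of Python. The three probabilities at the end are illustrative events chosen to show marginal, joint and conditional calculations; they are assumptions and not necessarily the events asked for in i. to iii.

```python
# Cell counts from the contingency table (type by cost).
counts = {
    "Fiction":    {"$10": 100, "$15": 80,  "$20": 30},
    "Biography":  {"$10": 120, "$15": 100, "$20": 90},
    "Historical": {"$10": 40,  "$15": 170, "$20": 20},
}

row_totals = {book_type: sum(row.values()) for book_type, row in counts.items()}

col_totals = {}
for row in counts.values():
    for cost, count in row.items():
        col_totals[cost] = col_totals.get(cost, 0) + count

grand_total = sum(row_totals.values())

# Illustrative probabilities (assumed events).
p_fiction = row_totals["Fiction"] / grand_total                    # marginal
p_fiction_and_10 = counts["Fiction"]["$10"] / grand_total          # joint
p_fiction_given_10 = counts["Fiction"]["$10"] / col_totals["$10"]  # conditional

print(row_totals, col_totals, grand_total)
```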
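c. Weekly costs (X) ~ N(410, 90^2)
P(X < 300) = P(Z < (300 - 410)/90) = P(Z < -1.22) = 0.5 - 0.3888 = 0.1112
[Figure: normal curve for X with 300 and 410 marked on the horizontal axis; the area to the left of 300 is shaded]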
Therefore the probability that the manager will be able to keep costs below $300 this week is 0.1112.
Given the low probability (approximately 1 chance in 10), I would not be confident he could achieve this goal.
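The probability in c. can be reproduced with Python's standard library. The sketch computes it twice: once with z rounded to two decimal places, as when reading the standard normal tables, and once without rounding. The variable names are illustrative.

```python
from statistics import NormalDist

costs = NormalDist(mu=410, sigma=90)

# Without rounding z.
p_exact = costs.cdf(300)

# With z rounded to two decimals, as when using the standard normal tables.
z_table = round((300 - 410) / 90, 2)   # -1.22
p_table = NormalDist().cdf(z_table)

print(round(p_table, 4), round(p_exact, 4))
```

The two figures differ slightly in the fourth decimal place because of the rounding of z; either way the chance is roughly 1 in 10.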
Question 3
a. i.
There is insufficient evidence to conclude that the proportion of visitors that go to the Tourist Information Centre is less than 40%.
ii.
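b. i. Since σ is unknown and we are using s to estimate σ, we must use the following form of the confidence interval estimator, based on the t distribution with n - 1 degrees of freedom.
x̄ ± t(α/2, n-1) · s/√n
Therefore we estimate with 95% confidence that the true mean processing time is between 1.14 hours and 1.36 hours.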
ii. The easiest way to answer this question is to use the confidence interval obtained in i. above in the following way.
Since the 95% confidence interval does not contain 1 hour, it is unlikely that the company’s claim of a one-hour service is correct. In fact it appears that developing and printing takes more than an hour.
Many students, however, chose to perform a complete hypothesis test. This is fine; however, since this part was worth only two marks, it meant a lot of work for little return.
If the test was
If the test was
We could in fact argue for either form of the alternative hypothesis, hence both of the above were marked correct.
Question 4
a. i. From the scatterplot it appears that there is a positive linear relationship between temperature and ice cream sales. As the temperature increases, so too do the ice cream sales.
ii.
iii. Slope coefficient = 0.1093
This implies that as the daily high temperature increases by 1°C, the daily sales increase on average by 0.1093 ($000’s), i.e. by $109.30
iv. The normality assumption can be evaluated by examining the histogram of the residuals. The residuals do appear to be approximately bell-shaped, so it seems reasonable to conclude that there is no overwhelming evidence of a violation of the normality assumption.
v.
There is sufficient evidence to conclude that there is a linear relationship between temperature and sales.
vi.
b. omit
Part B
1. D.
If we were to draw a simple picture of the data we would see a shape similar to the rough sketch below.
Clearly this picture has two peaks and appears to be skewed in the negative direction. Since the picture has two peaks (although not of the same height), we refer to the data as bimodal.
or C.
If we were to use the strict definition of the mode here, then there is only a single mode, ie 4, as this is the value with the highest frequency. As this is the textbook definition I will also accept C.
2. C.
There were 40 people sampled. The position of the median can be found by (0.5)(40) = 20
Therefore the median is the average of the 20th and 21st data values.
From the distribution in i. above, these values are 3 and 4 respectively.
Therefore the median is 3.5.
3. D.
The distribution would vary in the following way.
Number of people / Frequency
0 / 1
1 / 3
2 / 10
3 / 6
4 / 15
5 / 4
15 / 1
The data value of 15 would cause the mean and the standard deviation to increase as both these measures use all the data values in their calculation.
There are still 40 data values therefore the median is still the average of the 20th and 21st values. These values have not changed, therefore the median will not change.
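The effect of the extreme value can be illustrated in Python using the frequency table above. The original distribution is not reproduced in these solutions, so the comparison list below, in which the 15 is replaced by a 5, is a hypothetical stand-in used only to show the direction of the changes.

```python
from statistics import mean, median, stdev

# Frequency table from the solution: number of people -> frequency.
frequency = {0: 1, 1: 3, 2: 10, 3: 6, 4: 15, 5: 4, 15: 1}

# Expand the table into the 40 individual data values.
values = [value for value, count in frequency.items() for _ in range(count)]

# Hypothetical comparison (assumption): the same data with 15 replaced by 5.
without_extreme = [5 if value == 15 else value for value in values]

print(len(values), median(values), mean(values), round(stdev(values), 3))
print(median(without_extreme), mean(without_extreme),
      round(stdev(without_extreme), 3))
```

The median is the same in both lists, while the mean and standard deviation are larger when the value 15 is present.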
4. E.
5. E.
The labelling used along the x-axis in graph A. is inappropriate. This method should only be used to label a bar graph.
The labelling used along the x-axis in graph B. is incorrect. The upper limit of each class has been plotted at the midpoint of each column.
Graphs C. and D. are bar charts as there are gaps between the columns.
Graph E. is correct as it is a histogram with the midpoint of each class plotted at the midpoint of each column.
6. D.
A Poisson problem with X = number of planes, μ = 5.5 planes per minute.
We require the probability of more than 7 planes arriving at the airport per minute.
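The required probability, P(X > 7) = 1 - P(X <= 7), can be computed directly rather than read from the Poisson tables. A minimal sketch, with the helper name `poisson_pmf` chosen for illustration:

```python
from math import exp, factorial

mu = 5.5  # mean number of planes per minute

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

# P(X > 7) = 1 - P(X <= 7); range(8) covers k = 0, 1, ..., 7.
p_more_than_7 = 1 - sum(poisson_pmf(k, mu) for k in range(8))
print(round(p_more_than_7, 4))
```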
7. A.
P(0.3 < Z < 2.4) = 0.4918 – 0.1179
= 0.3739
[Figure: standard normal curve with 0, 0.3 and 2.4 marked on the Z axis; the area between 0.3 and 2.4 is shaded]
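The table lookup for question 7 can be checked with the standard library. The tables give areas measured from 0, so the sketch subtracts the two cumulative areas instead; the difference is the same.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(0.3 < Z < 2.4) as a difference of cumulative probabilities.
p_between = Z.cdf(2.4) - Z.cdf(0.3)
print(round(p_between, 4))
```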
8. D.
Driving times are uniformly distributed with a = 60 and b = 110.
This distribution is represented graphically below.
From the graph it is clear that the required probability can be represented graphically as a rectangular area under the line between 60 and 90. Therefore P(60 < X < 90) = (90 - 60)/(110 - 60) = 0.6
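The rectangle-area calculation for a uniform distribution can be sketched as a small function. The name `uniform_prob` is illustrative; the interval is clipped to [a, b] so that values outside the range of driving times contribute no probability.

```python
a, b = 60, 110  # shortest and longest driving times

def uniform_prob(lower, upper, a, b):
    """P(lower < X < upper) for X uniformly distributed on [a, b]."""
    lower, upper = max(lower, a), min(upper, b)
    width = max(upper - lower, 0)
    return width / (b - a)   # base of the rectangle times height 1/(b - a)

p_60_to_90 = uniform_prob(60, 90, a, b)
print(p_60_to_90)
```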
9. D.
P(A < Z < 0) = 0.4
From the standard normal tables we find
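P(-1.28 < Z < 0) = 0.3997 (approx. 0.4)
Equivalently, P(Z < A) = 0.5 - 0.4 = 0.1, therefore A = -1.28
[Figure: standard normal curve with A and 0 marked on the Z axis; the area to the left of A is 0.1 and the area between A and 0 is 0.4]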
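Reading the tables in reverse corresponds to the inverse of the cumulative distribution function. A minimal check, assuming Python 3.8 or later for `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(A < Z < 0) = 0.4 means the area to the left of A is 0.5 - 0.4 = 0.1.
A = Z.inv_cdf(0.5 - 0.4)
print(round(A, 2))
```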
10. E.
Since P(H) = P(T) = 0.5 for a fair coin, then
P(THTHHT) = (0.5)^6 and this probability is the same for each sequence.
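The claim that every sequence of six tosses is equally likely can be illustrated by listing all the sequences. A short sketch, with variable names chosen for illustration:

```python
from itertools import product

# All possible sequences of six tosses of a fair coin.
sequences = ["".join(tosses) for tosses in product("HT", repeat=6)]

# Each toss is independent with probability 0.5, so every sequence
# has probability (0.5)^6.
p_each = 0.5 ** 6

print(len(sequences), p_each, len(sequences) * p_each)
```

There are 2^6 = 64 sequences, each with probability 1/64, so the probabilities sum to 1.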
11. C. Refer to S&S Section 8.4, pp. 218-220
12. B.
The standard error of the sample mean is denoted
13. D.
Since the numbers showing on the face of the die occur with equal probabilities, their distribution is uniform and definitely not normal.
14. C.
The null hypothesis is rejected when p-value < α. Since α is generally 0.01, 0.05 or 0.1, we would not reject the null since the p-value > α in all these instances.
Some students selected A; however, this is not correct. Refer to S&S p265 and p271. The conclusion is never “accept the null hypothesis”. We always conclude “reject the null” or “do not reject the null”.
15. E.
None of the statements in A. through D. describe an interval, they all describe a single value.
16. A.
17. D.
18. omit
19. omit
20. omit