Math 131 Labs

Working with the Albuquerque home data. The qualitative variable is NE (V1): located in the northeast sector of the city (1) or not (0). The quantitative variables are PRICE (V2), the selling price (in $hundreds), and SQFT (V3), the square feet of living space.

Lab 1: Sample the data.

Select a random sample of 40 from the 117 cases.

Calc -> Random Data -> Sample from Columns. Sample 40 rows from columns: Select all the columns (PRICE SQFT AGE FEATS NE CUST COR TAX)

Store samples in: type C1 C2 C3 C4 C5 C6 C7 C8.
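For readers without Minitab, the sampling step can be sketched in Python. This is only an illustration: the seed value is hypothetical, and the rows it draws will not match the 40 rows listed in this lab.

```python
import random

N_CASES = 117      # rows in the full data set (from the lab text)
SAMPLE_SIZE = 40   # rows to draw

# Hypothetical seed, used only so the sketch is repeatable.
rng = random.Random(131)

# Draw 40 distinct row numbers (1-based) without replacement.
rows = sorted(rng.sample(range(1, N_CASES + 1), SAMPLE_SIZE))
print(rows)
```

Sampling row numbers rather than single values keeps each home's eight variables together, which is what sampling all the columns at once does in Minitab.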

PRICE SQFT AGE FEATS NE CUST COR TAX

945 1580 9 3 0 0 0 810

739 970 4 4 0 0 1 541

729 1007 19 6 1 0 0 513

660 1159 * 0 0 0 0 225

876 1156 * 1 1 0 0 *

1560 1920 1 5 1 1 0 1161

755 1275 * 5 1 0 0 *

750 1030 * 1 1 0 0 486

995 1500 15 4 1 0 0 743

899 1464 * 2 1 1 0 566

2050 2650 13 7 1 1 0 1639

780 1080 * 3 0 1 0 600

1050 1920 8 4 0 0 0 944

710 1083 22 4 1 0 0 504

2150 2664 6 5 1 1 0 1193

1045 1630 6 4 0 0 0 750

872 1229 6 3 0 0 0 721

1020 1478 53 3 1 0 1 626

540 1142 * 0 0 0 0 223

1449 1710 1 3 1 1 0 1010

875 1173 6 4 1 0 0 456

2150 2848 4 6 1 1 0 1487

730 1027 * 3 1 0 0 427

1270 1880 8 6 1 0 0 930

1109 1740 4 3 0 0 0 816

749 1733 43 6 1 0 0 656

805 1258 7 4 1 0 1 821

975 1500 7 3 0 1 1 700

1030 1540 6 2 0 0 1 826

600 1198 * 4 0 0 0 *

1250 2180 17 4 1 0 1 1141

1160 1720 5 4 0 0 0 867

810 1365 * 2 1 0 0 673

759 997 4 4 1 0 0 461

670 1350 * 2 1 0 0 622

725 1140 * 3 1 0 1 490

700 1505 * 2 0 0 1 591

975 1430 * 3 1 0 0 752

1695 2931 28 3 1 0 1 1142

1170 1928 18 8 1 1 0 600

Lab 2: Organize the data into frequency distributions and graphs.

A. Prepare frequency and relative frequency distributions for the qualitative data.

Stat -> Tables -> Tally Individual Variables. Choose Count and Percent. The result presented in the Session Window is:

Tally for Discrete Variables: NE
NE Count Percent
0 14 35.00
1 26 65.00
N= 40
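As a cross-check on the tally, the same counts and percents can be computed in Python. This is a sketch, not Minitab output; the list is the NE column of the 40 sampled rows, typed in row order.

```python
from collections import Counter

# NE column of the 40 sampled homes, in the order listed in Lab 1.
ne = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
      1, 0, 0, 1, 1, 0, 0, 1, 0, 1,
      1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
      1, 0, 1, 1, 1, 1, 0, 1, 1, 1]

counts = Counter(ne)
n = len(ne)
percents = {value: 100 * count / n for value, count in counts.items()}

for value in sorted(counts):
    print(value, counts[value], f"{percents[value]:.2f}")
print("N=", n)
```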

B. Preparing a Pie Chart

Graph -> Pie Chart. Choose Chart raw data. Choose NE as the Categorical variable. Click Labels, select Slice Labels, and check Category name and Percent. Click OK twice.

C. Prepare a Pareto Chart (Bar Chart)

Graph -> Bar Chart. Bars represent Counts of Unique Values. Choose Simple Table. Click OK. Choose NE as the Categorical variable.

D. Both charts give a good idea of the relative size of each category.

E. The northeast sector has more sales: 26 of the 40 sampled homes (65%).

Part 2: Quantitative Data

Relative Frequency Histogram for PRICE

Graph -> Histogram. Select Simple and click OK, then select PRICE. Click on Scale and, under the Y-Scale Type tab, choose Percent and click OK. To include the values on the graph, click on the Data Labels tab and select Use y-value labels. Click OK twice.

Part 3: Quantitative Data

Frequency Histogram for SQFT

Graph -> Histogram. Select Simple and click OK, then select SQFT. To include the values on the graph, click on the Data Labels tab and select Use y-value labels. Click on Scale and, under the Y-Scale Type tab, choose Frequency and click OK.

C. Prepare a frequency polygon using class midpoints. (Note: I let Minitab choose the midpoints.)

Graph -> Histogram Select Simple Click OK, Select SQFT. To change the title to indicate Polygon, click on the Labels button, and under the Titles/Footnotes tab specify the title to ‘Polygon of SQFT’. To include the values on the graph, click on the Data Labels tab and select Use y-value labels. To make a polygon instead of a histogram, click on the Data view button. On the Data Display tab, remove check mark from Bars and place check mark on Symbols. Under the Smoother Tab, choose Lowess for Smoother, make the Degree of smoothing 0 and the Number of steps 1. Then click OK twice.

D. Prepare an ogive using upper class boundaries

Place SQFT data in column C1 of a new Worksheet then sort the data to make it easier to find frequencies in each class.

Data -> Sort -> Sort column C1 by Column C1. Choose Store sorted data in original column.

970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156, 1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365, 1430, 1464, 1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733, 1740, 1880, 1920, 1920, 1928, 2180, 2650, 2664, 2848, 2931

Dividing the range by the number of classes: (2931 - 970)/9 = 1961/9 = 217.9 (approximately). Rounding up gives a class width of 218.

Class / Boundaries / Midpoint (rounded) / Freq / Rel freq / Cum freq / Cum rel freq
970 - 1187 / 969.5 - 1187.5 / 1079 / 12 / 0.30 / 12 / 0.30
1188 - 1405 / 1187.5 - 1405.5 / 1297 / 6 / 0.15 / 18 / 0.45
1406 - 1623 / 1405.5 - 1623.5 / 1515 / 8 / 0.20 / 26 / 0.65
1624 - 1841 / 1623.5 - 1841.5 / 1733 / 5 / 0.125 / 31 / 0.775
1842 - 2059 / 1841.5 - 2059.5 / 1951 / 4 / 0.10 / 35 / 0.875
2060 - 2277 / 2059.5 - 2277.5 / 2169 / 1 / 0.025 / 36 / 0.90
2278 - 2495 / 2277.5 - 2495.5 / 2387 / 0 / 0 / 36 / 0.90
2496 - 2713 / 2495.5 - 2713.5 / 2605 / 2 / 0.05 / 38 / 0.95
2714 - 2931 / 2713.5 - 2931.5 / 2823 / 2 / 0.05 / 40 / 1.00
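The class width, frequencies and cumulative relative frequencies in the table above can be checked with a short Python sketch. It assumes 9 classes starting at the smallest value, as in the table; the variable names are my own.

```python
# Sorted SQFT values of the 40 sampled homes.
sqft = [970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156,
        1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365, 1430, 1464,
        1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733,
        1740, 1880, 1920, 1920, 1928, 2180, 2650, 2664, 2848, 2931]

NUM_CLASSES = 9
n = len(sqft)

# Range divided by the number of classes, rounded up (ceiling division).
width = -(-(max(sqft) - min(sqft)) // NUM_CLASSES)

rows = []
cum = 0
for i in range(NUM_CLASSES):
    lower = min(sqft) + i * width      # lower class limit
    upper = lower + width - 1          # upper class limit
    freq = sum(lower <= x <= upper for x in sqft)
    cum += freq
    # (limits, upper boundary, freq, rel freq, cum freq, cum rel freq)
    rows.append((lower, upper, upper + 0.5, freq, freq / n, cum, cum / n))

# Ogive points: start at the first lower boundary with cumulative 0.
ogive = [(min(sqft) - 0.5, 0.0)] + [(r[2], r[6]) for r in rows]

for r in rows:
    print(r)
```

The `ogive` list holds the same (boundary, cumulative relative frequency) pairs that are typed into C1 and C2 for the Minitab scatterplot.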

To use Minitab to plot the Ogive, in a new worksheet make the Class Boundaries column C1 and the Cum rel freq column C2 as follows:

969.5 0.000

1187.5 0.300

1405.5 0.450

1623.5 0.650

1841.5 0.775

2059.5 0.875

2277.5 0.900

2495.5 0.900

2713.5 0.950

2931.5 1.000

Then select Graph -> Scatterplot -> With Connect Line. Select C2 for the Y-variable and C1 for the X-variable. Click on the Data View button and be sure that both Symbols and Connect line are selected, so that Minitab connects the dots at each data point on the graph. Click on Labels and title the ogive ‘Ogive of SQFT’. To label the points, click the Data Labels tab and choose Use y-value labels. Click OK. After the graph is created, it should be edited to show each class boundary. Right-click on the X-axis of the graph and select Edit X scale. Enter the Position of ticks as 969.5: 2931.5/218. This tells Minitab that the tick marks should start at 969.5 and go up to 2931.5 in steps of 218.

The results are:

E. The graphs show that homes with around 1200 square feet are the most common in the sample.

Lab 3: Finding descriptive statistics for the quantitative data.

A-D. Stat -> Basic Statistics -> Display Descriptive Statistics. Choose PRICE, click on Statistics, and choose Mean, Median, Range and Standard deviation. Note that Minitab does not provide Mode.

Descriptive Statistics: PRICE
Variable N N* Mean StDev Median Range
PRICE 40 0 1019.5 405.9 887.5 1610.0
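The descriptive statistics above can be reproduced with Python's standard `statistics` module. This is a sketch for checking the Minitab output; it also reports the mode, which the Minitab step above leaves out.

```python
import statistics

# PRICE column ($hundreds) of the 40 sampled homes, in row order.
price = [945, 739, 729, 660, 876, 1560, 755, 750, 995, 899,
         2050, 780, 1050, 710, 2150, 1045, 872, 1020, 540, 1449,
         875, 2150, 730, 1270, 1109, 749, 805, 975, 1030, 600,
         1250, 1160, 810, 759, 670, 725, 700, 975, 1695, 1170]

mean = statistics.mean(price)
stdev = statistics.stdev(price)          # sample standard deviation (n - 1)
median = statistics.median(price)
data_range = max(price) - min(price)
modes = sorted(statistics.multimode(price))  # all values tied for most frequent

print(f"Mean {mean:.1f}  StDev {stdev:.1f}  Median {median}  Range {data_range}")
print("Mode(s):", modes)
```

For this sample two prices (975 and 2150) each occur twice, so the data are bimodal.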

E. List the five-number summary and make the box-and-whisker plot.

Stat -> Basic Statistics -> Display Descriptive Statistics. Choose PRICE, click on Statistics, and choose Minimum, Maximum, First quartile, Median, Third quartile and Interquartile range.

Descriptive Statistics: PRICE
Variable N N* Minimum Q1 Median Q3 Maximum IQR
PRICE 40 0 540.0 741.5 887.5 1147.3 2150.0 405.8
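The five-number summary can also be checked in Python. Quartile conventions differ between packages; in this sketch I use the `exclusive` method of `statistics.quantiles`, which reproduces the Minitab values for this sample.

```python
import statistics

# PRICE column ($hundreds) of the 40 sampled homes, in row order.
price = [945, 739, 729, 660, 876, 1560, 755, 750, 995, 899,
         2050, 780, 1050, 710, 2150, 1045, 872, 1020, 540, 1449,
         875, 2150, 730, 1270, 1109, 749, 805, 975, 1030, 600,
         1250, 1160, 810, 759, 670, 725, 700, 975, 1695, 1170]

# 'exclusive' places the p-th quantile at position (n + 1) * p in the sorted data.
q1, median, q3 = statistics.quantiles(price, n=4, method="exclusive")
iqr = q3 - q1
five_number = (min(price), q1, median, q3, max(price))

print("Five-number summary:", five_number)
print("IQR:", iqr)
```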

Click on Graph->Boxplot and select Simple boxplot. Click on OK. Select PRICE for the Graph variable. To view a horizontal boxplot (rather than a vertical one) click on Scale and select Transpose value and category scales. Click on OK twice.

To find the standard scores (z-scores): Calc -> Standardize. Specify PRICE as the input column. Store results in C9 (an empty column). Choose Subtract mean and divide by std dev.

The results for each PRICE are given. The highest PRICE (2150) has a standard score of 2.78490, the mean has a standard score of 0, and the lowest PRICE (540) has a standard score of -1.18130.
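The standardizing step is z = (x - mean)/StDev for each value. A minimal Python sketch of the same calculation, using the sample standard deviation, is:

```python
import statistics

# PRICE column ($hundreds) of the 40 sampled homes, in row order.
price = [945, 739, 729, 660, 876, 1560, 755, 750, 995, 899,
         2050, 780, 1050, 710, 2150, 1045, 872, 1020, 540, 1449,
         875, 2150, 730, 1270, 1109, 749, 805, 975, 1030, 600,
         1250, 1160, 810, 759, 670, 725, 700, 975, 1695, 1170]

mean = statistics.mean(price)
stdev = statistics.stdev(price)

# Subtract the mean and divide by the standard deviation.
z_scores = [(x - mean) / stdev for x in price]

z_highest = max(z_scores)   # standard score of PRICE = 2150
z_lowest = min(z_scores)    # standard score of PRICE = 540
print(f"highest: {z_highest:.5f}  lowest: {z_lowest:.5f}")
```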

Part 2:

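A. Estimating the mean for the SQFT variable using the technique from the text p 62: mean = Σ(x*f)/n (approximately), where x is a class midpoint, f is the class frequency, and n = 40.

From p 78: s = sqrt( Σ(x - mean)²*f / (n - 1) ). (Note: on p 78 the x’s are specific values, so the formula is exact; on p 79 the x’s are the midpoints of classes, so the formula is an approximation.)

Substituting x = 1000, 1250,…,3000 and the corresponding class frequencies gives a mean of about 1575 and a standard deviation of about 532.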
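The grouped-data calculation can be sketched in Python. The class frequencies below are an assumption: they are not listed in these notes, so I tallied them from the sorted SQFT list using classes of width 250 centered on the midpoints 1000, 1250, ..., 3000. With those frequencies the sketch gives the same estimates quoted in Part B (about 1575 and 532).

```python
import math

# Class midpoints from the lab text.
midpoints = [1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000]

# ASSUMPTION: frequencies tallied from the sorted SQFT data for classes
# 875-1125, 1125-1375, ..., 2875-3125; not stated in the original notes.
freqs = [7, 11, 8, 5, 4, 1, 0, 3, 1]

n = sum(freqs)

# Grouped mean: sum of (midpoint * frequency) divided by n.
mean_est = sum(x * f for x, f in zip(midpoints, freqs)) / n

# Grouped sample standard deviation with n - 1 in the denominator.
ss = sum(f * (x - mean_est) ** 2 for x, f in zip(midpoints, freqs))
sd_est = math.sqrt(ss / (n - 1))

print(f"estimated mean {mean_est:.0f}, estimated StDev {sd_est:.0f}")
```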

B. Chebychev’s Theorem: at least 1 - 1/k² of the data lie within k standard deviations of the mean (for k > 1). For k = 2 this is 1 - 1/4 = 75%.

For the SQFT, the mean is about 1575 and the standard deviation is about 532. Thus 2 standard deviations on either side of the mean goes from 511 to 2639. Chebychev’s Theorem says that at least 75% of the data should lie between these two values. If we look at the histogram for SQFT in Lab 2 Part 3 we see that this is easily the case.
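The Chebychev check can also be done by counting. This Python sketch uses the estimated mean (1575) and standard deviation (532) from Part A and the sorted SQFT values from Lab 2.

```python
# Sorted SQFT values of the 40 sampled homes.
sqft = [970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156,
        1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365, 1430, 1464,
        1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733,
        1740, 1880, 1920, 1920, 1928, 2180, 2650, 2664, 2848, 2931]

mean_est, sd_est, k = 1575, 532, 2

low = mean_est - k * sd_est     # 511
high = mean_est + k * sd_est    # 2639

inside = sum(low <= x <= high for x in sqft)
fraction_inside = inside / len(sqft)
chebychev_bound = 1 - 1 / k ** 2    # at least 75% for k = 2

print(f"{inside} of {len(sqft)} values ({fraction_inside:.0%}) lie in [{low}, {high}]")
print("Chebychev minimum:", chebychev_bound)
```

Only the four largest homes fall outside the interval, so the observed share is well above the guaranteed minimum.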

Lab 4: Finding simple probabilities, conditional probabilities and using the Multiplication and Addition Rules.

Part 1

Price / NE Corner / Not NE Corner / Total
Median Price (887.5, in $hundreds) and Under / 13 / 7 / 20
Higher than the Median / 13 / 7 / 20
Total / 26 / 14 / 40
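The counts in this table can be verified directly from the sample. The Python sketch below pairs each PRICE with its NE value (both in the row order of Lab 1) and counts the four cells; the "low"/"high" labels are my own.

```python
import statistics
from collections import Counter

# PRICE ($hundreds) and NE for the 40 sampled homes, in row order.
price = [945, 739, 729, 660, 876, 1560, 755, 750, 995, 899,
         2050, 780, 1050, 710, 2150, 1045, 872, 1020, 540, 1449,
         875, 2150, 730, 1270, 1109, 749, 805, 975, 1030, 600,
         1250, 1160, 810, 759, 670, 725, 700, 975, 1695, 1170]
ne = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
      1, 0, 0, 1, 1, 0, 0, 1, 0, 1,
      1, 1, 1, 1, 0, 1, 1, 0, 0, 0,
      1, 0, 1, 1, 1, 1, 0, 1, 1, 1]

median = statistics.median(price)   # 887.5

# Classify each home by price group ("low" means at or below the median).
cells = Counter(("low" if p <= median else "high", sector)
                for p, sector in zip(price, ne))

print("low  price:", cells[("low", 1)], "NE,", cells[("low", 0)], "not NE")
print("high price:", cells[("high", 1)], "NE,", cells[("high", 0)], "not NE")
```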

Part 2

A.  Probability that Price is less than or equal to the median = 0.5

B.  Probability that Price is greater than the median = 0.5

C.  Probability of being in the NE Corner = 0.65

D.  Probability of being in NE Corner and less than or equal to the median = 0.325

Part 3

A.  Probability that Price is less than or equal to the median given home is in NE corner = 0.5

B.  Let A be NE Corner and B be Price <= median. P(B|A) = 0.5 = P(B), so the events are independent.

C.  Probability of being in NE Corner given Price is higher than the median = 0.65

D.  Let B’ be Price > median. P(A|B’) = 0.65 = P(A), so the events are independent.
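The conditional probabilities in Part 3 come straight from the table counts. A small Python sketch with exact fractions (the variable names are my own) shows the independence check:

```python
from fractions import Fraction

# Counts from the Part 1 table.
ne_low, not_ne_low = 13, 7      # price at or below the median
ne_high, not_ne_high = 13, 7    # price above the median
total = ne_low + not_ne_low + ne_high + not_ne_high

p_a = Fraction(ne_low + ne_high, total)         # P(A): NE Corner
p_b = Fraction(ne_low + not_ne_low, total)      # P(B): price <= median

p_b_given_a = Fraction(ne_low, ne_low + ne_high)            # P(B|A)
p_a_given_not_b = Fraction(ne_high, ne_high + not_ne_high)  # P(A|B')

# Independence holds when conditioning does not change the probability.
print(float(p_b_given_a), float(p_b), p_b_given_a == p_b)
print(float(p_a_given_not_b), float(p_a), p_a_given_not_b == p_a)
```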

Part 4

Probability both homes are in NE Corner (two homes selected without replacement) = P(A1)*P(A2|A1) = (26/40)*(25/39) = 0.65*0.641 = 0.4167
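The Multiplication Rule calculation can be sketched with exact fractions in Python, assuming the two homes are drawn without replacement from the 40 sampled homes:

```python
from fractions import Fraction

ne_homes, total = 26, 40

p_first = Fraction(ne_homes, total)                        # first home is NE
p_second_given_first = Fraction(ne_homes - 1, total - 1)   # second is NE, given the first was

p_both = p_first * p_second_given_first
print(p_both, round(float(p_both), 4))
```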

Part 5

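  1. P(A or B) = P(A) + P(B) - P(A and B) = 0.65 + 0.5 - 0.325 = 0.825
  2. P(B) + P(B’) = 0.5 + 0.5 = 1
  3. Two pairs of mutually exclusive events: being in row 1 and being in row 2 (Price at or below the median, or above it), and being in column 1 and being in column 2 (NE Corner, or not).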
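The Addition Rule and the complement rule can be checked the same way. This Python sketch uses the counts from the Part 1 table; counting the homes in A or B directly gives the same answer as the formula.

```python
from fractions import Fraction

total = 40
p_a = Fraction(26, total)          # NE Corner
p_b = Fraction(20, total)          # price <= median
p_a_and_b = Fraction(13, total)    # NE Corner and price <= median

# Addition Rule: subtract the overlap so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

# Direct count: 13 + 7 homes at or below the median, plus 13 NE homes above it.
p_direct = Fraction(13 + 7 + 13, total)

# Complement rule: B and B' are mutually exclusive and cover every home.
p_not_b = 1 - p_b

print(float(p_a_or_b), float(p_direct), float(p_b + p_not_b))
```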

Lab 5: Standardizing data, computing probabilities using the standard normal distribution, and finding values given probabilities.

The mean and standard deviation for the variable SQFT were estimated in Lab 3 Part 2, using the techniques on p 62 and 79, to be about 1575 and 532, respectively.

Using Minitab to get the exact values:

Stat -> Basic Statistics -> Display Descriptive Statistics. Select SQFT, click Statistics, Choose Mean and Standard Deviation

Results for: ALBHOMEPRICESLAB.MTW
Descriptive Statistics: SQFT
Variable Mean StDev
SQFT 1552.3 513.8

The values are slightly different because the technique used in Lab 3 Part 2 is an estimate based on the midpoints in the ranges.

Sort the SQFT data

Data -> Sort -> Sort Columns: Select SQFT, Sort by: Select SQFT, Store sorted data in New worksheet. Click OK

To compute the z-score in Minitab:

Calc -> Standardize. Input column is SQFT. Store results in C2. Click on Subtract mean and divide by std.dev., click OK

Line Number / SQFT / z-score = (x - mean)/StDev
1 / L = 970 / -1.13312
20 / M = 1464 / -0.17174
40 / H = 2931 / 2.68318
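These z-scores can be reproduced in Python from the sorted SQFT data. Note that line 20 of the sorted list is 1464, and that is the value whose z-score is -0.17174. The sketch uses the sample standard deviation, as Minitab does.

```python
import statistics

# Sorted SQFT values of the 40 sampled homes.
sqft = [970, 997, 1007, 1027, 1030, 1080, 1083, 1140, 1142, 1156,
        1159, 1173, 1198, 1229, 1258, 1275, 1350, 1365, 1430, 1464,
        1478, 1500, 1500, 1505, 1540, 1580, 1630, 1710, 1720, 1733,
        1740, 1880, 1920, 1920, 1928, 2180, 2650, 2664, 2848, 2931]

mean = statistics.mean(sqft)      # 1552.25
stdev = statistics.stdev(sqft)    # about 513.8

def z_score(x):
    """Standard score of x relative to the SQFT sample."""
    return (x - mean) / stdev

# Lines 1, 20 and 40 of the sorted data (list indexes 0, 19 and 39).
low, mid, high = sqft[0], sqft[19], sqft[39]
z_low, z_mid, z_high = z_score(low), z_score(mid), z_score(high)

for label, x, z in (("L", low, z_low), ("M", mid, z_mid), ("H", high, z_high)):
    print(f"{label} = {x}: z = {z:.5f}")
```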

Data values that go with the following z-scores (x = mean + z*StDev):

z-score / SQFT
-2.50 / None that small (x = 267.7)
3.20 / None that large (x = 3196.6)
0 / 1540 is nearest 0 (x = 1552.3)
0.5 / 1740 is nearest 0.5 (x = 1809.2)

Normal curve: arrows locate L = 970, M = 1464, and H = 2931 on it, based on p in Table 4, Appendix B, for the corresponding z-score:

L = 970, z = -1.13312 (p = 0.13); M = 1464, z = -0.17174 (p = 0.43); H = 2931, z = 2.68318 (p = 0.996)

Probability / In terms of Z / Value
P(L < X < H) / P(-1.13312 < Z < 2.68318) / .9963 - .1292 = 0.8671
P(L < X < M) / P(-1.13312 < Z < -0.17174) / .4325 - .1292 = 0.3033
P(M < X < H) / P(-0.17174 < Z < 2.68318) / .9963 - .4325 = 0.5638
P(X < L) / P(Z < -1.13312) / 0.1292
P(X > M) / P(Z > -0.17174) / 1 - .4325 = 0.5675
P(X < H) / P(Z < 2.68318) / 0.9963
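The table lookups can be checked with `statistics.NormalDist` from Python's standard library. In this sketch the z-scores are rounded to two decimals, as they would be when reading Table 4, so the areas agree with the table values to four decimals.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# z-scores for L, M and H, rounded to two decimals as in a table lookup.
z_l, z_m, z_h = -1.13, -0.17, 2.68

area_l = std_normal.cdf(z_l)   # area to the left of L, about 0.1292
area_m = std_normal.cdf(z_m)   # area to the left of M, about 0.4325
area_h = std_normal.cdf(z_h)   # area to the left of H, about 0.9963

probabilities = {
    "P(L < X < H)": area_h - area_l,
    "P(L < X < M)": area_m - area_l,
    "P(M < X < H)": area_h - area_m,
    "P(X < L)": area_l,
    "P(X > M)": 1 - area_m,
    "P(X < H)": area_h,
}

for name, p in probabilities.items():
    print(f"{name} = {p:.4f}")
```

A probability between two values is always the difference of the two left-hand areas, and a right-tail probability is 1 minus the left-hand area.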
Sketch of normal curve with the area for the lowest 10% indicated by a line, determined by finding z.10 on Table 4, Appendix B, and translating to x as shown:

x = mean + z*StDev = 1552.3 + (-1.28)(513.8) = 894.6

10% of my population are below 894.6.

Sketch of normal curve with the area for the highest 5% shaded, determined by finding z.95 on Table 4, Appendix B, and translating to x as shown:

x = mean + z*StDev = 1552.3 + (1.645)(513.8) = 2397.5

5% of my population are above 2397.5.
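Going from a probability back to a data value uses x = mean + z*StDev. The Python sketch below assumes the table z-values -1.28 (lowest 10%) and 1.645 (highest 5%), which reproduce the cutoffs 894.6 and 2397.5, and compares them with the unrounded z-values from `NormalDist.inv_cdf`.

```python
from statistics import NormalDist

MEAN, STDEV = 1552.3, 513.8   # SQFT mean and standard deviation from Minitab

# ASSUMPTION: z-values as read from a standard normal table.
z_10 = -1.28     # area 0.10 to the left
z_95 = 1.645     # area 0.95 to the left (0.05 to the right)

x_low = MEAN + z_10 * STDEV    # cutoff for the lowest 10%
x_high = MEAN + z_95 * STDEV   # cutoff for the highest 5%

# Unrounded z-values for comparison with the table values.
z_10_exact = NormalDist().inv_cdf(0.10)
z_95_exact = NormalDist().inv_cdf(0.95)

print(f"lowest 10% below {x_low:.1f}, highest 5% above {x_high:.1f}")
print(f"unrounded z-values: {z_10_exact:.4f}, {z_95_exact:.4f}")
```

Using the unrounded z-values would shift the cutoffs slightly, since the table value -1.28 is itself rounded.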