Introductory Statistics for Engineers

Project 1: Descriptive Statistics

Name ______   Due 9/09/2016

Attach any work as separate sheets. Each solution needs to be organized, neatly written and clearly labeled with its corresponding problem number and attached in the order assigned. If you use Excel on a particular problem, attach a labeled printout of the relevant part of the spreadsheet. All attached spreadsheet printouts should be set up in Print Preview so that margins, orientation, and scaling yield a readable output that avoids confusing split pages.

I. (11 points)

Use the following two data sets to compute the requested sums. Put only the answers on this sheet and attach work separately. You may wish to use Excel to do the calculations.

i / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 11 / 12 / 13
xi / 3.2 / 3.5 / 3.8 / 4.0 / 4.3 / 4.4 / 4.7 / 4.8 / 4.9 / 5.0 / 5.1 / 5.3 / 5.5
yi / 5.0 / 5.0 / 4.9 / 4.6 / 4.0 / 3.9 / 4.1 / 3.8 / 3.7 / 3.2 / 3.0 / 2.8 / 2.7

Complete the table below and answer the questions that follow. Here the shorthand convention $\sum = \sum_{i=1}^{n}$ is used, n being the number of data points.

Summation Formula / a = -4.5 / a = 0 / a = 4.5 / a = 9.0
A.
B.
C.
D.
E.
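The sums in this table can be checked with a short Python sketch (Python is our choice, not part of the assignment). The formulas for rows A through E are not reproduced in this copy, so the two summands below, the sum of (x_i - a) and the sum of (x_i - a)², are hypothetical illustrations only; replace each lambda with the formula for a row. The helper name `table_row` is likewise hypothetical.

```python
# x_i and y_i from the table above (n = 13 data points).
x = [3.2, 3.5, 3.8, 4.0, 4.3, 4.4, 4.7, 4.8, 4.9, 5.0, 5.1, 5.3, 5.5]
y = [5.0, 5.0, 4.9, 4.6, 4.0, 3.9, 4.1, 3.8, 3.7, 3.2, 3.0, 2.8, 2.7]
A_VALUES = [-4.5, 0.0, 4.5, 9.0]


def table_row(summand):
    """Evaluate sum over i = 1..n of summand(x_i, y_i, a) for each tabulated a."""
    return [sum(summand(xi, yi, a) for xi, yi in zip(x, y)) for a in A_VALUES]


# Hypothetical example rows; these are NOT formulas A-E from the handout.
example_rows = {
    "sum(x_i - a)": lambda xi, yi, a: xi - a,
    "sum((x_i - a)**2)": lambda xi, yi, a: (xi - a) ** 2,
}
rows = {label: table_row(g) for label, g in example_rows.items()}
for label, values in rows.items():
    print(label, [round(v, 4) for v in values])
```

Spreadsheet users can do the same thing with one column per summand and a SUM at the bottom; the script is only a cross-check.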

What pattern, if any, do you see between A. and B. above? Prove your assertion.

What do you notice about C.? Prove your assertion.

What pattern, if any, do you see between D. and E. above? Prove your assertion.

What value of a above makes D. the smallest?

Using the data on page 1, compute the following:

Summation Formula / Answer
/ ______
/ ______
/ ______
/ ______
/ ______
/ ______

For any fixed set of data xi, i running from 1 to n, determine the value of a that minimizes each of the following functions.

minimizing value of a = ______

minimizing value of a = ______

II. (11 points)

The placement scores of MATC students enrolled in Elementary Algebra were as follows:

36 / 39 / 29 / 37 / 50 / 32 / 38 / 36 / 34 / 32 / 32 / 45 / 32 / 45 / 34 / 31
48 / 48 / 54 / 38 / 44 / 44 / 38 / 38 / 39 / 42 / 44 / 42 / 44 / 36 / 36 / 29
38 / 47 / 44 / 52 / 31 / 36 / 40 / 48 / 32 / 34 / 49 / 31 / 34 / 38 / 31 / 38
36 / 32 / 42 / 40 / 32 / 34 / 44 / 38 / 47 / 44 / 36 / 40 / 38 / 34 / 36 / 45
38 / 49 / 34 / 23 / 40 / 47 / 32 / 47 / 38 / 38 / 34 / 36 / 32 / 48 / 38 / 48
32 / 23 / 42 / 34 / 47 / 42 / 36 / 38 / 29 / 42 / 31 / 29 / 42 / 31 / 29 / 36
34 / 36 / 44 / 45 / 36 / 38 / 44 / 42 / 29 / 34 / 44 / 42 / 34 / 34 / 36 / 34
38 / 40 / 38 / 29 / 42 / 38 / 51 / 48 / 34 / 44 / 48 / 38 / 38 / 34 / 42 / 47
44 / 34 / 27 / 34 / 36 / 38 / 40 / 32 / 29 / 44 / 34 / 42 / 44 / 49 / 47 / 32
40 / 42 / 26 / 36 / 34 / 45 / 38 / 44 / 36 / 45 / 36 / 40 / 42 / 42 / 48 / 27
36 / 32 / 45 / 44 / 36 / 42 / 36 / 27 / 48 / 40 / 38 / 42 / 38

Use of the free Winstats program available at makes this analysis particularly simple, but the data must be pasted in as a single column.

First, organize the data into a stem-and-leaf diagram. For each stem, use two lines of leaves: on the top line place the leaf digits 0 through 4, and on the bottom line the leaf digits 5 through 9.

Stem / Leaves
2
3
4
5
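A split stem-and-leaf display can be generated with a short Python sketch, useful for checking a hand-drawn diagram. It assumes the scores are two-digit integers (stem = tens digit, leaf = units digit); `split_stem_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Placement scores from Part II, entered row by row (173 scores).
SCORES = [
    36, 39, 29, 37, 50, 32, 38, 36, 34, 32, 32, 45, 32, 45, 34, 31,
    48, 48, 54, 38, 44, 44, 38, 38, 39, 42, 44, 42, 44, 36, 36, 29,
    38, 47, 44, 52, 31, 36, 40, 48, 32, 34, 49, 31, 34, 38, 31, 38,
    36, 32, 42, 40, 32, 34, 44, 38, 47, 44, 36, 40, 38, 34, 36, 45,
    38, 49, 34, 23, 40, 47, 32, 47, 38, 38, 34, 36, 32, 48, 38, 48,
    32, 23, 42, 34, 47, 42, 36, 38, 29, 42, 31, 29, 42, 31, 29, 36,
    34, 36, 44, 45, 36, 38, 44, 42, 29, 34, 44, 42, 34, 34, 36, 34,
    38, 40, 38, 29, 42, 38, 51, 48, 34, 44, 48, 38, 38, 34, 42, 47,
    44, 34, 27, 34, 36, 38, 40, 32, 29, 44, 34, 42, 44, 49, 47, 32,
    40, 42, 26, 36, 34, 45, 38, 44, 36, 45, 36, 40, 42, 42, 48, 27,
    36, 32, 45, 44, 36, 42, 36, 27, 48, 40, 38, 42, 38,
]


def split_stem_leaf(data):
    """Map (stem, half) -> sorted leaves; half 0 holds leaves 0-4, half 1 holds 5-9."""
    lines = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        lines[(stem, 0 if leaf <= 4 else 1)].append(leaf)
    return {key: sorted(leaves) for key, leaves in lines.items()}


diagram = split_stem_leaf(SCORES)
for stem in range(2, 6):
    for half in (0, 1):  # top line (0-4), then bottom line (5-9)
        leaves = diagram.get((stem, half), [])
        print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```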

Next, organize the data into a frequency distribution and enter the results into an Excel spreadsheet. This full data set will be referred to as the ‘Ungrouped Data’. For this set of scores, calculate and record to the nearest thousandth the descriptive statistics requested in the left side of Table 2.

Now group the data so that the lowest class is 20.5–24.5 and fill in Table 1. Using this grouped data, construct a histogram of the scores, plotting relative frequency along the vertical axis and the classes along the horizontal axis. Use the midpoint (class mark) of each class to represent all of the scores in that class, and repeat the same calculations as for the Ungrouped Data. Next, construct the ogive of the Grouped Data by plotting relative cumulative frequency (less than or equal) versus the class boundaries. Fill in the right side of Table 2, labeled Grouped Data. Finally, make a box plot of both the Ungrouped and Grouped Data.

You should use Excel to present the frequency distributions, do the calculations, and graph the histogram and ogive. Excel does not have a ‘built-in’ function to calculate the mean and standard deviation of a frequency distribution (the Excel functions AVERAGE, STDEV and STDEVP assume each score in the argument list occurs only once). However, by setting up a column of f·x and a column of f·x², the mean and standard deviation can be calculated from the formulas:

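$$\bar{x} = \frac{\sum f x}{n}, \qquad s_x = \sqrt{\frac{\sum f x^2 - \left(\sum f x\right)^2 / n}{n - 1}}, \qquad n = \sum f$$

(Dividing by n instead of n − 1 gives the population standard deviation σx.)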

The Excel sample spreadsheet shown below illustrates such a calculation for the following frequency distribution.

x / f
5 / 3
6 / 13
7 / 23
8 / 31
9 / 19
10 / 4

The output of the above formulas is shown below.
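Outside Excel, the same column-sum calculation can be sketched in Python for the sample distribution above. The sketch builds only the sums n = Σf, Σf·x and Σf·x², then cross-checks the results against Python's `statistics` module applied to the table expanded into 93 individual scores, which is the analogue of applying AVERAGE, STDEV and STDEVP to a list in which each score appears once. The function name `frequency_stats` is hypothetical.

```python
import math
import statistics


def frequency_stats(xs, fs):
    """Mean, sample SD and population SD of a frequency distribution,
    using only the column sums n = sum(f), sum(f*x) and sum(f*x^2)."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    mean = sum_fx / n
    ss = sum_fx2 - sum_fx ** 2 / n  # sum of squared deviations from the mean
    return n, mean, math.sqrt(ss / (n - 1)), math.sqrt(ss / n)


x = [5, 6, 7, 8, 9, 10]
f = [3, 13, 23, 31, 19, 4]
n, mean, s_x, sigma_x = frequency_stats(x, f)
print(n, round(mean, 3), round(s_x, 3), round(sigma_x, 3))  # 93 7.667 1.173 1.167

# Cross-check: expand the table into individual scores and use the
# one-score-per-entry functions.
expanded = [xi for xi, fi in zip(x, f) for _ in range(fi)]
assert math.isclose(mean, statistics.mean(expanded))
assert math.isclose(s_x, statistics.stdev(expanded))
assert math.isclose(sigma_x, statistics.pstdev(expanded))
```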

To make the histogram bars fill the full class width as shown above, click one of the rectangles in the histogram, right-click, and select Format Data Series from the menu. Under Series Options, set the Gap Width to 0%. To generate the ogive in Excel, choose a line chart type that connects the plotted points.

Excel can even generate the frequency distribution of the classes. This requires the Analysis ToolPak add-in, which places a Data Analysis command on the Data tab. If Data Analysis is not shown on the Data tab, click the Office Button, choose Excel Options, then choose Add-Ins; select Excel Add-ins in the Manage box, click Go, check Analysis ToolPak and click OK. You will need to set up a column of right (upper) class boundaries for the grouped data; Excel calls this column the “Bin Range”. Once Data Analysis is chosen from the Data tab, select Histogram and click OK. In the Histogram dialog, select as the Input Range the cells in the column of the ungrouped data, as the Bin Range the column of right class boundaries, and as the Output Range a cell where you want the resulting frequency distribution to begin. Click OK to generate the frequency distribution.
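The Histogram tool's counting rule (a value is counted in a bin if it is greater than the previous upper boundary and less than or equal to the bin's own upper boundary) can be mimicked in Python to check Table 1. This is a sketch, assuming class width 4 (from the lowest class 20.5–24.5) and the nine classes listed in Table 1; `grouped_table` is a hypothetical helper name.

```python
# Placement scores from Part II, entered row by row (173 scores).
SCORES = [
    36, 39, 29, 37, 50, 32, 38, 36, 34, 32, 32, 45, 32, 45, 34, 31,
    48, 48, 54, 38, 44, 44, 38, 38, 39, 42, 44, 42, 44, 36, 36, 29,
    38, 47, 44, 52, 31, 36, 40, 48, 32, 34, 49, 31, 34, 38, 31, 38,
    36, 32, 42, 40, 32, 34, 44, 38, 47, 44, 36, 40, 38, 34, 36, 45,
    38, 49, 34, 23, 40, 47, 32, 47, 38, 38, 34, 36, 32, 48, 38, 48,
    32, 23, 42, 34, 47, 42, 36, 38, 29, 42, 31, 29, 42, 31, 29, 36,
    34, 36, 44, 45, 36, 38, 44, 42, 29, 34, 44, 42, 34, 34, 36, 34,
    38, 40, 38, 29, 42, 38, 51, 48, 34, 44, 48, 38, 38, 34, 42, 47,
    44, 34, 27, 34, 36, 38, 40, 32, 29, 44, 34, 42, 44, 49, 47, 32,
    40, 42, 26, 36, 34, 45, 38, 44, 36, 45, 36, 40, 42, 42, 48, 27,
    36, 32, 45, 44, 36, 42, 36, 27, 48, 40, 38, 42, 38,
]

LOWEST_BOUNDARY = 20.5  # left boundary of the lowest class
CLASS_WIDTH = 4.0
N_CLASSES = 9           # 20.5-24.5 through 52.5-56.5, as in Table 1


def grouped_table(data, lowest, width, n_classes):
    """Rows of (lower, upper, f, midpoint, rel f, cum f, rel cum f).

    Each class counts values with lower < x <= upper.
    """
    n = len(data)
    rows, cum = [], 0
    for k in range(n_classes):
        lower = lowest + k * width
        upper = lower + width
        f = sum(1 for x in data if lower < x <= upper)
        cum += f
        rows.append((lower, upper, f, (lower + upper) / 2, f / n, cum, cum / n))
    return rows


for lower, upper, f, mid, rel, cum, rel_cum in grouped_table(
        SCORES, LOWEST_BOUNDARY, CLASS_WIDTH, N_CLASSES):
    print(f"{lower:.1f}-{upper:.1f}  f={f:3d}  mid={mid:.1f}  "
          f"rel={rel:.4f}  cum={cum:3d}  relcum={rel_cum:.4f}")
```

Because the scores are integers and the boundaries end in .5, no score falls exactly on a boundary, so the choice of which side is closed does not affect these counts.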

Table 1: Algebra Placement Scores Grouped Data

Class / f / Class Mid Point / Relative f / Cumulative f / Rel. Cum. f
20.5 – 24.5 / 22.5
24.5 – 28.5
28.5 – 32.5
32.5 – 36.5
36.5 – 40.5
40.5 – 44.5
44.5 – 48.5
48.5 – 52.5
52.5 – 56.5

Table 2: Algebra Placement Scores

Descriptive Statistic / Ungrouped Data / Grouped Data
Minimum
Maximum
Range
Mode
Median, Md
Mean, x̄
Q1
Q3
IQR
60th Percentile, P60
Sample Standard Deviation, sx
Population Standard Deviation, σx
Sample coefficient of variation
Sample Variance, sx²
Population Variance, σx²
Box Plot of Ungrouped Data
Box Plot of Grouped Data
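The ungrouped column of Table 2 can be cross-checked with Python's `statistics` module. Quartiles and percentiles have several competing definitions; this sketch assumes `method="inclusive"`, which uses the same interpolation rule as Excel's QUARTILE.INC and PERCENTILE.INC. If your text uses a different rule, Q1, Q3, IQR and P60 may differ slightly from a hand calculation. The helper name `describe` is hypothetical, and the coefficient of variation is taken as sx / x̄.

```python
import statistics

# Placement scores from Part II, entered row by row (173 scores).
SCORES = [
    36, 39, 29, 37, 50, 32, 38, 36, 34, 32, 32, 45, 32, 45, 34, 31,
    48, 48, 54, 38, 44, 44, 38, 38, 39, 42, 44, 42, 44, 36, 36, 29,
    38, 47, 44, 52, 31, 36, 40, 48, 32, 34, 49, 31, 34, 38, 31, 38,
    36, 32, 42, 40, 32, 34, 44, 38, 47, 44, 36, 40, 38, 34, 36, 45,
    38, 49, 34, 23, 40, 47, 32, 47, 38, 38, 34, 36, 32, 48, 38, 48,
    32, 23, 42, 34, 47, 42, 36, 38, 29, 42, 31, 29, 42, 31, 29, 36,
    34, 36, 44, 45, 36, 38, 44, 42, 29, 34, 44, 42, 34, 34, 36, 34,
    38, 40, 38, 29, 42, 38, 51, 48, 34, 44, 48, 38, 38, 34, 42, 47,
    44, 34, 27, 34, 36, 38, 40, 32, 29, 44, 34, 42, 44, 49, 47, 32,
    40, 42, 26, 36, 34, 45, 38, 44, 36, 45, 36, 40, 42, 42, 48, 27,
    36, 32, 45, 44, 36, 42, 36, 27, 48, 40, 38, 42, 38,
]


def describe(data):
    """Ungrouped descriptive statistics requested in Table 2."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    p60 = statistics.quantiles(data, n=100, method="inclusive")[59]  # 60th cut point
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return {
        "min": min(data), "max": max(data), "range": max(data) - min(data),
        "mode": statistics.multimode(data),
        "median": statistics.median(data), "mean": mean,
        "Q1": q1, "Q3": q3, "IQR": q3 - q1, "P60": p60,
        "s": s, "sigma": statistics.pstdev(data), "CV": s / mean,
        "s^2": statistics.variance(data), "sigma^2": statistics.pvariance(data),
    }


for name, value in describe(SCORES).items():
    print(f"{name:8s} {value if isinstance(value, list) else round(value, 3)}")
```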

III. (26 points)

The data set that follows consists of measurements of the average diameters of a sample of lymphocytes.

A scanning electron microscope (SEM) image of a single human lymphocyte (white blood cell).
A stained lymphocyte surrounded by red blood cells viewed using a light microscope.

The measurements are in µm and were extracted from an image analysis of an SEM image. The software which generated the data does not always distinguish between intact lymphocytes and “pieces” or fragments of lymphocytes or between single lymphocytes and adjacent “clusters”. This data was supplied by Mike Kostma of MATC’s Electron Microscopy program.

16.3975 / 18.63855 / 15.20965 / 18.63918 / 19.4107 / 18.38818 / 3.0105 / 29.24289
14.83309 / 30.56236 / 20.1218 / 15.87201 / 15.17715 / 17.98224 / 44.1056 / 19.45832
15.07998 / 17.05593 / 19.36242 / 20.9678 / 18.22499 / 20.82515 / 17.33582 / 17.78974
18.56787 / 17.86239 / 21.59801 / 17.57864 / 17.29442 / 15.53546 / 39.16 / 19.19576
16.55765 / 18.13202 / 21.23444 / 13.99835 / 17.07939 / 2.236068 / 22.12852 / 15.85017
17.46741 / 17.8864 / 21.35174 / 19.35696 / 4.472136

This full data set will be referred to as the ‘Ungrouped Data’. For this data set, calculate and record to the nearest thousandth the descriptive statistics requested in the left side of Table 3.

Now, using a constant class width, group the data so that the lowest class is 1.5–2.5. Using this grouped data, construct a histogram of the scores, plotting relative frequency along the vertical axis and the classes along the horizontal axis. Also generate the ogive of relative cumulative frequency (less than or equal) versus the class boundaries. Use the midpoint (class mark) of each class to represent all of the scores in that class. Repeat the same calculations as for the Ungrouped Data and fill in the right side of Table 3, under the heading ‘Grouped Data’. Finally, make and attach a box plot of both the Ungrouped and Grouped Data.
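The grouped mean and sample standard deviation follow from the class marks and class frequencies alone. The sketch below assumes the constant width is 1.0 µm (implied by the lowest class 1.5–2.5) and that each class includes its upper boundary; `class_frequencies` is a hypothetical helper name. Only nonempty classes are stored, since empty classes contribute nothing to the sums.

```python
import math

# Lymphocyte diameters (µm) from Part III, entered row by row (45 values).
DIAMETERS = [
    16.3975, 18.63855, 15.20965, 18.63918, 19.4107, 18.38818, 3.0105, 29.24289,
    14.83309, 30.56236, 20.1218, 15.87201, 15.17715, 17.98224, 44.1056, 19.45832,
    15.07998, 17.05593, 19.36242, 20.9678, 18.22499, 20.82515, 17.33582, 17.78974,
    18.56787, 17.86239, 21.59801, 17.57864, 17.29442, 15.53546, 39.16, 19.19576,
    16.55765, 18.13202, 21.23444, 13.99835, 17.07939, 2.236068, 22.12852, 15.85017,
    17.46741, 17.8864, 21.35174, 19.35696, 4.472136,
]

LOWEST_BOUNDARY = 1.5
CLASS_WIDTH = 1.0


def class_frequencies(data, lowest, width):
    """Map class index k (lowest + k*width < x <= lowest + (k+1)*width) -> f."""
    counts = {}
    for x in data:
        k = math.ceil((x - lowest) / width) - 1
        counts[k] = counts.get(k, 0) + 1
    return counts


counts = class_frequencies(DIAMETERS, LOWEST_BOUNDARY, CLASS_WIDTH)
marks = {k: LOWEST_BOUNDARY + (k + 0.5) * CLASS_WIDTH for k in counts}
n = sum(counts.values())
sum_fx = sum(f * marks[k] for k, f in counts.items())
sum_fx2 = sum(f * marks[k] ** 2 for k, f in counts.items())
grouped_mean = sum_fx / n
grouped_s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
print(round(grouped_mean, 3), round(grouped_s, 3))
```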

Table 3: Lymphocyte Diameters

Descriptive Statistic / Ungrouped Data / Grouped Data
Minimum
Maximum
Range
Mode / N. A.
Median, Md
Mean, x̄
Q1
Q3
IQR
60th Percentile, P60
Sample Standard Deviation, sx
Population Standard Deviation, σx
Sample coefficient of variation
Sample Variance, sx²
Population Variance, σx²
Box Plot of Ungrouped Lymphocyte Data
Box Plot of Grouped Lymphocyte Data

How closely do the descriptive statistics of the grouped and ungrouped scores compare?

How well does grouping the scores into classes represent the actual data?

For the ungrouped data, what fraction of the scores is within one standard deviation of the mean?

For the ungrouped data, what fraction of the scores is within two standard deviations of the mean?

For the ungrouped data, what fraction of the scores is within three standard deviations of the mean?

For the ungrouped data, what fraction of the scores have a z score larger than 1?

For the ungrouped data, what fraction of the scores have a z score smaller than 1?

For the ungrouped data, what fraction of the scores have an absolute z score, |z|, larger than 1?
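The fractions above can be counted directly from z scores. This sketch assumes z is computed with the sample standard deviation sx (the handout does not say which standard deviation to use; swap in `statistics.pstdev` for σx) and counts a value as "within k standard deviations" when |z| ≤ k. The helper name `fraction` is hypothetical.

```python
import statistics

# Lymphocyte diameters (µm) from Part III, entered row by row (45 values).
DIAMETERS = [
    16.3975, 18.63855, 15.20965, 18.63918, 19.4107, 18.38818, 3.0105, 29.24289,
    14.83309, 30.56236, 20.1218, 15.87201, 15.17715, 17.98224, 44.1056, 19.45832,
    15.07998, 17.05593, 19.36242, 20.9678, 18.22499, 20.82515, 17.33582, 17.78974,
    18.56787, 17.86239, 21.59801, 17.57864, 17.29442, 15.53546, 39.16, 19.19576,
    16.55765, 18.13202, 21.23444, 13.99835, 17.07939, 2.236068, 22.12852, 15.85017,
    17.46741, 17.8864, 21.35174, 19.35696, 4.472136,
]

mean = statistics.mean(DIAMETERS)
s = statistics.stdev(DIAMETERS)  # assumption: sample SD sx
z = [(x - mean) / s for x in DIAMETERS]
n = len(z)


def fraction(predicate):
    """Fraction of z scores for which predicate(z) is true."""
    return sum(1 for zi in z if predicate(zi)) / n


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction(lambda zi: abs(zi) <= k):.4f}")
print(f"z > 1:   {fraction(lambda zi: zi > 1):.4f}")
print(f"z < 1:   {fraction(lambda zi: zi < 1):.4f}")
print(f"|z| > 1: {fraction(lambda zi: abs(zi) > 1):.4f}")
```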

Using the box plot of the ungrouped data, eliminate all “outliers”, i.e., all data beyond the outer fences (3.0 IQRs beyond the box hinges). Now recompute the mean and the sample standard deviation of this reduced data set.

Sample Mean= ______

Sample Standard Deviation sx= ______
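The outlier screen can be sketched in Python. The sketch assumes Tukey's hinges (medians of the lower and upper halves of the sorted data, with the overall median included in both halves when n is odd) and keeps values that lie exactly on a fence. Other quartile rules, such as Excel's QUARTILE.EXC, give different hinges and can flag a different set of points. The helper name `tukey_hinges` is hypothetical.

```python
import statistics

# Lymphocyte diameters (µm) from Part III, entered row by row (45 values).
DIAMETERS = [
    16.3975, 18.63855, 15.20965, 18.63918, 19.4107, 18.38818, 3.0105, 29.24289,
    14.83309, 30.56236, 20.1218, 15.87201, 15.17715, 17.98224, 44.1056, 19.45832,
    15.07998, 17.05593, 19.36242, 20.9678, 18.22499, 20.82515, 17.33582, 17.78974,
    18.56787, 17.86239, 21.59801, 17.57864, 17.29442, 15.53546, 39.16, 19.19576,
    16.55765, 18.13202, 21.23444, 13.99835, 17.07939, 2.236068, 22.12852, 15.85017,
    17.46741, 17.8864, 21.35174, 19.35696, 4.472136,
]


def tukey_hinges(data):
    """Medians of the lower and upper halves (median shared when n is odd)."""
    xs = sorted(data)
    half = (len(xs) + 1) // 2
    return statistics.median(xs[:half]), statistics.median(xs[-half:])


lower_hinge, upper_hinge = tukey_hinges(DIAMETERS)
iqr = upper_hinge - lower_hinge
low_fence, high_fence = lower_hinge - 3.0 * iqr, upper_hinge + 3.0 * iqr
kept = [x for x in DIAMETERS if low_fence <= x <= high_fence]
removed = [x for x in DIAMETERS if not low_fence <= x <= high_fence]

full_mean, full_s = statistics.mean(DIAMETERS), statistics.stdev(DIAMETERS)
new_mean, new_s = statistics.mean(kept), statistics.stdev(kept)
print(f"outer fences: {low_fence:.3f} to {high_fence:.3f}; removed {sorted(removed)}")
print(f"mean {full_mean:.3f} -> {new_mean:.3f} ({100 * (new_mean - full_mean) / full_mean:+.1f}%)")
print(f"s    {full_s:.3f} -> {new_s:.3f} ({100 * (new_s - full_s) / full_s:+.1f}%)")
```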

Eliminating the outliers changed the values of both of these statistics. Compared to the results for the full ungrouped data set, which statistic showed the greatest percent change by eliminating the outliers? Explain this observation.

In general, if outliers are eliminated, will the value of this same statistic increase or decrease? Explain your answer.

For this data set give a possible justification for eliminating the outliers.
Project 2: Basic Probability

Name ______   Due 9/19/2016

Attach any work as separate sheets. Each solution needs to be organized, neatly written and clearly labeled with its corresponding problem number and attached in the order assigned. If you use Excel on a particular problem, attach a labeled printout of the relevant part of the spreadsheet. All attached spreadsheet printouts should be set up in Print Preview so that margins, orientation, and scaling yield a readable output that avoids confusing split pages.

I.

1. (2 points) The following four Venn diagrams represent the two events A and B as circles in a sample space S represented as a large rectangle. In the Venn diagram on the left below shade the event . In the Venn diagram on the right below shade the event . What conclusion do you make?

In the Venn diagram on the left below shade the event . In the Venn diagram on the right below shade the event . What conclusion do you make?

2. (8 points) The Venn diagram below represents the three events A, B and C as circles in a sample space S represented as a large rectangle. S has a total of 11 outcomes, e1 through e11. When an outcome is an element of an event, it is drawn within that event in the Venn diagram.

The probabilities of the eleven outcomes are listed below.

Determine the following probabilities:

3. (4 points)

A test is performed on a manufactured product to detect a specific electronic defect. On a product without this defect, the test will erroneously indicate the presence of the defect 0.15% of the time. On a product with this defect, the test will erroneously fail to detect it 0.50% of the time. Past history indicates that 3.85% of the products have this electronic defect. Calculate the following:

a) What percent of all products don’t have this specific defect?

b) What percent of all products don’t have this specific defect and the test confirms this condition?

c) What percent of all products don’t have this specific defect but the test says otherwise?

d) What percent of all products have this specific defect and the test fails to detect it?

e) What percent of all products have this specific defect and the test confirms this condition?

f) For what percent of all products does the test give incorrect results?

g) Given that a product tests as having this specific defect, what is the probability that it really does not have the defect?

h) Given that a product tests as having this specific defect, what is the probability that it really does have the defect?

i) Given that a product tests as not having this specific defect, what is the probability that it really does not have the defect?

j) Given that a product tests as not having this specific defect, what is the probability that it really does have the defect?
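All ten parts follow from the four joint probabilities of (defect status, test result), with Bayes' theorem for parts g) through j). A minimal sketch with the stated rates written as probabilities; the variable names are our own.

```python
P_DEFECT = 0.0385            # 3.85% of products have the defect
P_POS_GIVEN_OK = 0.0015      # false alarm: 0.15% of good products test positive
P_NEG_GIVEN_DEFECT = 0.0050  # miss: 0.50% of defective products test negative

p_ok = 1 - P_DEFECT
ok_and_neg = p_ok * (1 - P_POS_GIVEN_OK)
ok_and_pos = p_ok * P_POS_GIVEN_OK
defect_and_neg = P_DEFECT * P_NEG_GIVEN_DEFECT
defect_and_pos = P_DEFECT * (1 - P_NEG_GIVEN_DEFECT)

p_pos = ok_and_pos + defect_and_pos  # total probability of a positive test
p_neg = ok_and_neg + defect_and_neg  # total probability of a negative test

answers = {
    "a": p_ok,
    "b": ok_and_neg,
    "c": ok_and_pos,
    "d": defect_and_neg,
    "e": defect_and_pos,
    "f": ok_and_pos + defect_and_neg,  # either kind of incorrect result
    "g": ok_and_pos / p_pos,           # Bayes: P(no defect | positive)
    "h": defect_and_pos / p_pos,
    "i": ok_and_neg / p_neg,
    "j": defect_and_neg / p_neg,
}
for part, p in answers.items():
    print(f"{part}) {100 * p:.4f}%")
```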

4. (1 point)

A student must answer 10 of 15 questions (each worth 10 points) on a 100-point exam.

a) How many different choices as to which set of questions to answer does any one student have?

b) Answer the same question assuming that the exam rules state that everyone must answer the first seven questions.
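Both counts are combinations, since the order in which questions are chosen does not matter. A short check with Python's `math.comb` (available in Python 3.8 and later):

```python
from math import comb

# a) Any 10 of the 15 questions.
choices_a = comb(15, 10)
# b) The first 7 are forced, so choose the remaining 3 from the other 8.
choices_b = comb(8, 3)
print(choices_a, choices_b)
```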

5. (6 points) A container holds 3 blue, 5 red and 12 black marbles. Except for color, the marbles are physically identical; however, all the marbles are considered distinguishable. Four marbles are drawn at random without replacement. Compute and state the following probabilities for such a drawing.

a) All of the marbles drawn are black.

b) Exactly one of the marbles drawn is black.

c) More than one of the marbles drawn is black.

d) Exactly two of the marbles drawn are blue.

e) At least two of the marbles drawn are blue.

f) At most two of the marbles drawn are red.

g) At least one red marble is drawn.

h) At least one red and at least one black marble are drawn.

i) There is at least one marble of each color drawn.

j) At least one black marble but no red marbles are drawn.

k) At least one red marble but no black marbles are drawn.

l) At least one black marble or at least one red marble is drawn.
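Because the marbles are distinguishable and drawn without replacement, all C(20, 4) draws are equally likely, and the number of draws with b blue, r red and k black marbles is C(3, b)·C(5, r)·C(12, k). The sketch below sums that count over every color split that satisfies an event, using exact fractions; `prob` and the event lambdas are our own names.

```python
from fractions import Fraction
from math import comb

COUNTS = {"blue": 3, "red": 5, "black": 12}
DRAWN = 4
TOTAL = comb(sum(COUNTS.values()), DRAWN)  # C(20, 4) equally likely draws


def prob(event):
    """Exact probability that the color counts (b, r, k) of the draw satisfy event."""
    ways = 0
    for b in range(DRAWN + 1):
        for r in range(DRAWN + 1 - b):
            k = DRAWN - b - r
            if event(b, r, k):
                ways += comb(COUNTS["blue"], b) * comb(COUNTS["red"], r) * comb(COUNTS["black"], k)
    return Fraction(ways, TOTAL)


parts = {
    "a": lambda b, r, k: k == 4,
    "b": lambda b, r, k: k == 1,
    "c": lambda b, r, k: k > 1,
    "d": lambda b, r, k: b == 2,
    "e": lambda b, r, k: b >= 2,
    "f": lambda b, r, k: r <= 2,
    "g": lambda b, r, k: r >= 1,
    "h": lambda b, r, k: r >= 1 and k >= 1,
    "i": lambda b, r, k: b >= 1 and r >= 1 and k >= 1,
    "j": lambda b, r, k: k >= 1 and r == 0,
    "k": lambda b, r, k: r >= 1 and k == 0,
    "l": lambda b, r, k: k >= 1 or r >= 1,
}
for part, event in parts.items():
    p = prob(event)
    print(f"{part}) {p} = {float(p):.4f}")
```

`math.comb(n, k)` returns 0 when k > n, so impossible splits (for example, four blue marbles) contribute nothing.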

6. (3 points) With reference to the drawing in problem 5, compute the stated conditional probabilities for parts a) through d) below.

a) At least one red marble is drawn given that at least one black marble is drawn.

b) At least one black marble is drawn given that at least one red marble is drawn.

c) At least one red marble is drawn given that no black marbles were drawn.

d) At least one black marble is drawn given that no red marbles were drawn.

e) Is the event “no black marble is drawn” independent of the event “at least one red marble is drawn”? Explain.

f) Are the events “no black marble is drawn” and “no red marble is drawn” mutually exclusive?
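Conditional probabilities come from P(E | G) = P(E and G) / P(G); independence means P(E and G) = P(E)·P(G), and mutual exclusivity means P(E and G) = 0. A self-contained sketch using exact counts for the problem 5 drawing; `prob`, `cond` and the event names are our own.

```python
from fractions import Fraction
from math import comb

BLUE, RED, BLACK, DRAWN = 3, 5, 12, 4


def prob(event):
    """Exact probability that the color counts (b, r, k) of a 4-marble draw satisfy event."""
    ways = sum(
        comb(BLUE, b) * comb(RED, r) * comb(BLACK, DRAWN - b - r)
        for b in range(DRAWN + 1)
        for r in range(DRAWN + 1 - b)
        if event(b, r, DRAWN - b - r)
    )
    return Fraction(ways, comb(BLUE + RED + BLACK, DRAWN))


def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda b, r, k: event(b, r, k) and given(b, r, k)) / prob(given)


def some_red(b, r, k): return r >= 1
def some_black(b, r, k): return k >= 1
def no_red(b, r, k): return r == 0
def no_black(b, r, k): return k == 0


print("a)", cond(some_red, some_black))
print("b)", cond(some_black, some_red))
print("c)", cond(some_red, no_black))
print("d)", cond(some_black, no_red))

# e) Independence check: compare the joint probability with the product.
joint = prob(lambda b, r, k: no_black(b, r, k) and some_red(b, r, k))
print("e) independent?", joint == prob(no_black) * prob(some_red))
# f) Mutually exclusive when the events cannot occur together.
both = prob(lambda b, r, k: no_black(b, r, k) and no_red(b, r, k))
print("f) mutually exclusive?", both == 0)
```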

7. (5 points) Five cards (called a “hand”) are drawn without replacement from a standard 52-card deck consisting of four suits (diamonds, hearts, clubs and spades), each containing thirteen “rank” cards (2 – 10, jack, queen, king, and ace). The following names designate specific hands in terms of the number of matching rank cards. A single pair means that exactly two cards in the hand have the same rank. Three of a kind means that exactly three cards in the hand have the same rank. A full house means the hand contains both a pair and a three of a kind. Four of a kind means that the hand contains all four cards of the same rank. Compute and state the following probabilities.

a) The hand contains a single pair.

b) The hand contains two pairs but not four of a kind.

c) The hand contains three of a kind but not a full house.

d) The hand is a full house.

e) The hand contains four of a kind.
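Each count is a product of choices: which ranks appear, which suits of each rank, and the remaining cards. This sketch assumes a “single pair” hand has its other three cards of three different ranks, and that “two pairs” means two ranks appearing twice plus a fifth card of a third rank (so a full house is not counted as two pairs); if your interpretation differs, the formulas change accordingly.

```python
from fractions import Fraction
from math import comb

HANDS = comb(52, 5)  # equally likely 5-card hands

counts = {
    # a) one rank twice (2 of its 4 suits), three other distinct ranks, any suit each
    "a single pair": 13 * comb(4, 2) * comb(12, 3) * 4 ** 3,
    # b) two ranks twice each, fifth card from one of the 11 remaining ranks
    "b two pairs": comb(13, 2) * comb(4, 2) ** 2 * 11 * 4,
    # c) one rank three times, two other distinct ranks
    "c three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4 ** 2,
    # d) three of one rank and two of another
    "d full house": 13 * comb(4, 3) * 12 * comb(4, 2),
    # e) all four of one rank plus any one of the other 48 cards
    "e four of a kind": 13 * 48,
}
for label, ways in counts.items():
    p = Fraction(ways, HANDS)
    print(f"{label}: {ways} hands, P = {float(p):.6f}")
```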

8. (3 points) Five cards are drawn without replacement from a standard 52-card deck. Let A be the event that the first two cards drawn are a pair, and let B be the event that the hand contains two pairs but not four of a kind.

a) Compute and state the probability of A.

b) Compute and state the probability of B given A.

c) Compute and state the probability of B given not A.

d) Compute and state the probability of A given B.

e) Compute and state the probability of not A given not B.

f) Compute and state the probability of A or B.
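One way to link A and B: once the five cards are fixed, every drawing order is equally likely, so the first two cards are a random 2-card subset of the hand. A two-pair hand has exactly 2 matching subsets out of C(5, 2) = 10, giving P(A | B) = 1/5. The remaining parts follow from the multiplication rule, the complement rule and inclusion-exclusion. This sketch assumes, as in problem 7, that a full house does not count as two pairs.

```python
from fractions import Fraction
from math import comb

# A: the first two cards share a rank (the second card matches the first's rank).
p_a = Fraction(3, 51)

# B: the hand is two pair (ranks p, p, q, q, s with s a third rank).
p_b = Fraction(comb(13, 2) * comb(4, 2) ** 2 * 11 * 4, comb(52, 5))

# P(A | B): 2 matching 2-card subsets out of C(5, 2) in a two-pair hand.
p_a_given_b = Fraction(2, comb(5, 2))
p_a_and_b = p_a_given_b * p_b

print("a) P(A)             =", p_a)
print("b) P(B | A)         =", p_a_and_b / p_a)
print("c) P(B | not A)     =", (p_b - p_a_and_b) / (1 - p_a))
print("d) P(A | B)         =", p_a_given_b)
print("e) P(not A | not B) =", (1 - p_a - p_b + p_a_and_b) / (1 - p_b))
print("f) P(A or B)        =", p_a + p_b - p_a_and_b)
```

As an independent check of part b), given A the other three cards (from the remaining 50) must be a pair of a new rank plus a single of a third rank, which gives 12·C(4, 2)·11·4 favorable sets out of C(50, 3); the test below confirms both routes agree.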

9. (2 points)

a) A manufacturing process consists of four subassembly operations performed in series. If each step is 98% reliable, what is the overall reliability of the process?

b) A power supply is rated at 95% reliability and it has two separate backups each rated as 75% reliable. Assuming that the failure of a power supply is independent of the other power supplies, what is the probability of having power?
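Series components must all work, so their reliabilities multiply; parallel (backup) components fail only if every one fails, so the overall reliability is one minus the product of the failure probabilities. A two-line check, assuming independence as stated:

```python
# a) Four steps in series: the process works only if every step works.
series = 0.98 ** 4

# b) Main supply plus two independent backups: power is lost only if all three fail.
parallel = 1 - (1 - 0.95) * (1 - 0.75) * (1 - 0.75)

print(f"a) {series:.6f}")
print(f"b) {parallel:.6f}")
```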

10. (3 points)

Suppose that on any given flight of a particular kind of aircraft the chance of an aileron malfunction is 0.012%. Assume (unrealistically!) that this probability never changes and that having an aileron malfunction is a random process.

a) Calculate the probability that an aircraft of this kind has an aileron malfunction on its first flight.

b) Calculate the probability that an aircraft of this kind will have an aileron malfunction on its 1000th flight, given that it has already completed 999 flights without incident.

c) Calculate the probability that an aircraft of this kind makes 999 flights without incident and then has an aileron malfunction on its 1000th flight.

d) Calculate the probability that an aircraft of this kind makes 1000 flights and never has an aileron malfunction.

e) Calculate the probability that an aircraft of this kind has an aileron malfunction sometime before it completes its 1000th flight.
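With a constant per-flight probability p and independent flights, the waiting time to the first malfunction is geometric. A minimal sketch; for part e) it assumes “before it completes its 1000th flight” includes a malfunction during flight 1000, so e) is the complement of d).

```python
p = 0.012 / 100  # chance of a malfunction on any one flight
q = 1 - p        # chance of a clean flight

answers = {
    "a": p,               # first flight
    "b": p,               # independence: 999 clean flights do not change the next flight's chance
    "c": q ** 999 * p,    # 999 clean flights, then a malfunction on flight 1000
    "d": q ** 1000,       # 1000 clean flights
    "e": 1 - q ** 1000,   # at least one malfunction in flights 1 through 1000
}
for part, value in answers.items():
    print(f"{part}) {value:.6f}")
```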

11. (2 points)

Four integrated circuits (ICs) are to be sampled for testing from a lot of 42 experimental prototypes.

a) How many different samples are possible?

b) What is the probability that a particular group of 4 ICs out of the 42 produced would be the ones chosen for testing?

c) Suppose that five of the 42 ICs are defective. Let x be the number of defective ICs in the sample of four chosen for testing. Fill in the following probability distribution:

x / f(x)
0
1
2
3
4
Mean Value
Population Variance
Population Standard Deviation
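The number of defectives x in the sample follows a hypergeometric distribution, f(x) = C(5, x)·C(37, 4 − x) / C(42, 4). The sketch below fills in the table with exact fractions and computes the mean and population variance directly from their definitions as expectations.

```python
from fractions import Fraction
from math import comb, sqrt

N, K, n = 42, 5, 4                    # lot size, defectives in the lot, sample size
samples = comb(N, n)                  # a) number of possible samples
p_particular = Fraction(1, samples)   # b) one specific group of 4

# c) Hypergeometric distribution of x = number of defectives in the sample.
f = {x: Fraction(comb(K, x) * comb(N - K, n - x), samples) for x in range(n + 1)}
mean = sum(x * fx for x, fx in f.items())
variance = sum((x - mean) ** 2 * fx for x, fx in f.items())

for x, fx in f.items():
    print(x, f"{float(fx):.4f}")
print(f"mean {float(mean):.4f}  variance {float(variance):.4f}  sd {sqrt(variance):.4f}")
```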

II. (9 points)

Toss two six-sided dice 144 times. For each toss, record the sum of the two uppermost faces. From this data, construct the relative frequency distribution (i.e., the empirical probability distribution) to the nearest ten-thousandth. Using the assumption of a fair experiment (i.e., the “classical probability concept”), calculate the theoretical probabilities, also to the nearest ten-thousandth. The mean value of the probability distribution is the mathematical expectation (expectation value) of the sum of faces. The population variance is the expectation value of the squared deviation of the sum of faces from its mean value. The population standard deviation is the square root of the population variance. For the 144 dice tosses, the mean and population standard deviation are just the mean and population standard deviation of your 144 scores.
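Under the classical probability concept, each of the 36 (die 1, die 2) outcomes is equally likely, so the theoretical distribution of the sum comes from counting outcomes. This sketch covers only the theoretical column; your 144 recorded tosses still supply the empirical one.

```python
from collections import Counter
from fractions import Fraction

# Count the 36 equally likely (die 1, die 2) outcomes by their sum.
sums = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
dist = {s: Fraction(c, 36) for s, c in sorted(sums.items())}

mean = sum(s * p for s, p in dist.items())                   # expectation value
variance = sum((s - mean) ** 2 * p for s, p in dist.items())  # E[(S - mean)^2]

for s, p in dist.items():
    print(s, f"{float(p):.4f}")
print("mean", mean, "variance", variance, "sd", f"{float(variance) ** 0.5:.4f}")
```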