Math 135
Review for Test 1
Data Analysis and Summary
Part I: Short Answer
1. You are conducting a study at North Trust Bank to determine if there is a relationship between job classification and job satisfaction among the company's employees. Job satisfaction is measured by a survey score on a 100-point scale (a score of 100 points indicates that an employee is extremely satisfied with their job). The table below shows a random sample of employee survey scores selected from each job classification type:
Job Classification / Sample of job satisfaction survey scores / x-bar / s
Hourly - Craft / 32, 38, 45, 55, 65, 66, 68, 70
Hourly - Clerical / 44, 52, 55, 56, 65, 68, 76, 94
Salaried / 37, 48, 48, 65, 77, 78, 88, 89
a) What type of data relationship is being investigated here (e.g. Q-Q, C-C, C-Q)?
b) Construct an appropriate graph to help you determine if a clear relationship exists between job classification and job satisfaction.
c) Summarize your findings from your graph. Does a clear relationship exist? Defend your answer.
d) Calculate the mean, x-bar, and the standard deviation, s, of the job satisfaction scores for each job class. You can list these values in the table above. Does this information support your conclusion in part c? Explain.
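As a rough check on part (d), the sample mean and sample standard deviation can be computed with Python's standard library. This is only a sketch: the names `scores` and `summary` are illustrative, and `statistics.stdev` divides by n - 1, which is assumed to be the definition of s intended here.

```python
import statistics

# Survey scores copied from the table in question 1.
scores = {
    "Hourly - Craft": [32, 38, 45, 55, 65, 66, 68, 70],
    "Hourly - Clerical": [44, 52, 55, 56, 65, 68, 76, 94],
    "Salaried": [37, 48, 48, 65, 77, 78, 88, 89],
}

# x-bar and s (n - 1 divisor) for each job class.
summary = {
    job: (statistics.mean(values), statistics.stdev(values))
    for job, values in scores.items()
}

for job, (xbar, s) in summary.items():
    print(f"{job}: x-bar = {xbar:.2f}, s = {s:.2f}")
```

Comparing the three means against the size of the standard deviations gives a numerical counterpart to the graph from part (b).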
2.
a) Using the information from Question 1, construct an appropriate graph of just the hourly clerical job satisfaction survey scores. Describe the shape and report the appropriate numerical summaries for center and spread. Justify your description of the shape of the graph with a computation. Are there any outliers? Justify your answer with a computation.
b) The following table shows a count of employees from North Trust Bank by job classification. Make an appropriate graph/chart to display the information from the table. Summarize your findings.
GRAPH
Job Classification / # employees
Hourly - Craft / 85
Hourly - Clerical / 20
Salaried / 15
c) Based on the information in your graph from part (b) and the survey scores from Question # 1, what strategy would you think North Trust management might want to consider with regard to their employees and job satisfaction?
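For part (a) of question 2, the outlier computation can be sketched with the 1.5 x IQR rule. The quartiles below are taken as the medians of the lower and upper halves of the sorted data, which is an assumed convention; calculators and software may interpolate quartiles differently and give slightly different fences. Comparing the mean with the median is one possible computation for the shape.

```python
import statistics

# Hourly - Clerical scores from question 1.
clerical = sorted([44, 52, 55, 56, 65, 68, 76, 94])

# Quartiles as medians of the lower and upper halves (assumed convention).
half = len(clerical) // 2
q1 = statistics.median(clerical[:half])
q3 = statistics.median(clerical[-half:])
iqr = q3 - q1

# 1.5 x IQR rule: points beyond the fences are flagged as suspected outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in clerical if x < low_fence or x > high_fence]

# A mean above the median hints at right skew; below hints at left skew.
mean = statistics.mean(clerical)
median = statistics.median(clerical)

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({low_fence}, {high_fence}), suspected outliers = {outliers}")
print(f"mean = {mean}, median = {median}")
```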
3. North Trust Bank also wants to determine if a relationship exists between job tenure (# of years with the company) and salary (for the employees in the internal audit department). The following table displays a random sample of employees taken from the North Trust database:
Employee # / Job tenure (in years) / Annual Salary (current year only; in thousands) / Employee # / Job tenure (in years) / Annual Salary (current year only; in thousands)
1 / 15 / 50 / 32 / 12 / 66
7 / 5 / 35 / 45 / 8 / 48
13 / .7 / 38 / 47 / 38 / 47
24 / 25 / 75 / 83 / 22 / 52
a) What is the relationship type investigated in this study? Which variable would be the explanatory variable? Which is the response variable?
b) Make an appropriate graph to investigate whether a relationship seems to exist between these 2 variables. Describe the relationship shown in the graph in terms of form, direction and strength.
c) Fit a linear regression model based on the data and add the line to your graph. Report the regression line and appropriate numerical summaries. How strong is the linear relationship numerically? Describe in words what the slope of the regression line represents in this situation.
d) Are there any potential outliers in your plot? Are the outliers influential? Remove any suspected outliers (there should be one) and recalculate the regression line and correlation. Use the new regression line to predict the salary amount for someone who has worked for North Trust for 17 years.
e) Construct a plot of the residuals (using the new regression line) using Excel. Based on the residual plot, does the regression line provide a “good fit” for the data? Explain.
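Parts (c) through (e) can be cross-checked outside Excel with a short least-squares sketch. Several things here are assumptions rather than part of the question: the tenure printed as ".7" is read as 0.7 years, the suspected outlier is taken to be the employee with 38 years of tenure, and the helper `fit_line` is a hypothetical name introduced for this illustration.

```python
import math

# (tenure in years, salary in thousands) from the table in question 3.
# Assumption: the tenure printed as ".7" is read as 0.7 years.
data = [(15, 50), (5, 35), (0.7, 38), (25, 75),
        (12, 66), (8, 48), (38, 47), (22, 52)]


def fit_line(points):
    """Return (slope, intercept, r) for the least-squares line y = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope_all, intercept_all, r_all = fit_line(data)

# Assumption: the suspected outlier is the employee with 38 years of tenure.
trimmed = [p for p in data if p != (38, 47)]
slope, intercept, r = fit_line(trimmed)

# Prediction at 17 years and residuals (observed minus fitted) for part (e).
predicted_17 = intercept + slope * 17
residuals = [y - (intercept + slope * x) for x, y in trimmed]

print(f"all points: y = {intercept_all:.2f} + {slope_all:.2f}x, r = {r_all:.2f}")
print(f"trimmed:    y = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
print(f"predicted salary at 17 years: {predicted_17:.1f} thousand")
```

Plotting `residuals` against tenure gives the residual plot requested in part (e); a patternless scatter around zero would suggest the line is a reasonable fit.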
4. Is there a relationship between the treatment used for cocaine addiction and whether the patient has a relapse or not? A study was conducted and the following data were collected:
Cocaine relapse?
Treatment / Yes / No
Desipramine / 10 / 14
Lithium / 18 / 6
Placebo / 20 / 4
a) Identify the explanatory and response variables.
b) Calculate the conditional distribution of cocaine relapse given the treatment type and draw the appropriate bar graphs (i.e. draw a bar graph for each value of the explanatory variable).
c) Does there appear to be a relationship? Explain.
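The conditional distributions in part (b) are row percentages: each count divided by its treatment total. A minimal sketch follows, reading the table as Desipramine 10 yes / 14 no, Lithium 18 / 6, and Placebo 20 / 4; the variable names are illustrative.

```python
# Counts (relapse yes, relapse no) from the two-way table in question 4.
counts = {
    "Desipramine": (10, 14),
    "Lithium": (18, 6),
    "Placebo": (20, 4),
}

# Conditional distribution of relapse given treatment:
# each row is divided by its own total.
conditional = {}
for treatment, (yes, no) in counts.items():
    total = yes + no
    conditional[treatment] = {"Yes": yes / total, "No": no / total}

for treatment, dist in conditional.items():
    print(f"{treatment}: Yes = {dist['Yes']:.1%}, No = {dist['No']:.1%}")
```

Each treatment's pair of percentages is what one bar graph in part (b) would display.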
5. You purchase a shipment of 60-watt bulbs to be used in a variety of your products. You want to determine whether the mean wattage of the bulbs in the shipment differs from 60 watts. You measure the wattage of a random sample of 20 bulbs. Set up the appropriate null and alternative hypotheses for this scenario.
6. Classify each variable in the table below as either categorical or quantitative:
Patient Name / Illness type (1=Heart; 2=Lung) / Pain Level (1-10) / Pulse Rate (bpm) / Insurance
Creek, Martin / 1 / 9 / 76 / MVP
Dade, Susan / 2 / 3 / 68 / CDPHP
Kidman, Bart / 1 / 2 / 82 / MVP
7. Suppose you have a list of temperatures measured in degrees Celsius and you want to change the temperature values to be measured in Fahrenheit. What effect would be produced on the old mean and old standard deviation when this conversion is completed?
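The conversion F = 1.8C + 32 is a linear transformation, so its effect can be checked numerically. The Celsius readings below are hypothetical values invented only for illustration; they do not come from the question.

```python
import statistics

# Hypothetical Celsius readings, invented only to illustrate the effect.
celsius = [12.0, 15.5, 18.0, 21.5, 25.0]
fahrenheit = [1.8 * c + 32 for c in celsius]

mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

# The mean is multiplied by 1.8 and shifted by 32;
# the standard deviation is multiplied by 1.8 and is unaffected by the shift.
print(f"mean: {mean_f:.3f} vs 1.8 * old mean + 32 = {1.8 * mean_c + 32:.3f}")
print(f"sd:   {sd_f:.3f} vs 1.8 * old sd = {1.8 * sd_c:.3f}")
```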
8. According to Current Population Reports, self-employed individuals in the US work an average of 44.6 hours per week with a standard deviation of 14.5 hours. If this variable is approximately normally distributed,
a) What percent of the self-employed work more than 40 hours per week?
b) What percent of the self-employed work less than 50 hours per week?
c) What percent of the self-employed work between 50 and 60 hours per week?
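Answers found from a standard normal table can be checked with `statistics.NormalDist` from Python's standard library (available since Python 3.8). This sketch assumes the normal model stated in the question; table-based answers may differ slightly because z is rounded to two decimals.

```python
from statistics import NormalDist

# Weekly hours modeled as Normal(mean 44.6, standard deviation 14.5).
hours = NormalDist(mu=44.6, sigma=14.5)

p_more_than_40 = 1 - hours.cdf(40)                   # part (a)
p_less_than_50 = hours.cdf(50)                       # part (b)
p_between_50_and_60 = hours.cdf(60) - hours.cdf(50)  # part (c)

print(f"P(X > 40)      = {p_more_than_40:.4f}")
print(f"P(X < 50)      = {p_less_than_50:.4f}")
print(f"P(50 < X < 60) = {p_between_50_and_60:.4f}")
```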
Part II Multiple Choice:
1. High levels of glucose in the blood are indications of diabetes, which is becoming more prevalent in the United States. Diabetes can lead to many complications such as blindness and heart disease. A random sample of 180 individuals had their blood sugar level measured. The results are displayed in the graph.
The shape of the distribution of blood glucose levels is
a. Unimodal, left skewed.
b. Bimodal.
c. Unimodal, right skewed.
2. The 5-number summary of scores on a test is
35, 60, 65, 70, 90
Based on this information
a. There are no outliers.
b. There are low outliers.
c. There are both high and low outliers.
3. Too much cholesterol in the blood increases the risk of heart disease. The cholesterol levels of young women aged 20 to 34 vary approximately normally with mean 185 milligrams per deciliter (mg/dl) and standard deviation 39 mg/dl. About what percent of young women in this age group will have cholesterol levels less than 150 mg/dl?
a. 90%.
b. 18.5%.
c. 81.5%.
4. Too much cholesterol in the blood increases the risk of heart disease. The cholesterol levels of young women aged 20 to 34 vary approximately normally with mean 185 milligrams per deciliter (mg/dl) and standard deviation 39 mg/dl. Cholesterol levels for middle-aged men vary normally with mean 222 mg/dl and standard deviation 37 mg/dl. Sandy is a young woman with a cholesterol level of 220. Her father has a cholesterol level of 250. Who has relatively higher cholesterol?
a. Sandy.
b. Sandy's father.
c. Impossible to tell because of the scaling.
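Values from two different normal distributions can be compared by standardizing each one, z = (x - mean) / standard deviation. A small sketch of that computation, with illustrative variable names:

```python
# Standardize each cholesterol level against its own reference group.
z_sandy = (220 - 185) / 39    # young women: mean 185, sd 39
z_father = (250 - 222) / 37   # middle-aged men: mean 222, sd 37

print(f"Sandy:  z = {z_sandy:.2f}")
print(f"Father: z = {z_father:.2f}")
```

The larger z-score marks the person whose level is higher relative to their own group.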
5. The lifetime of a 2-volt non-rechargeable battery in constant use has a normal distribution with a mean of 516 hours and a standard deviation of 20 hours. The proportion of batteries with lifetimes exceeding 520 hours is approximately
a. 0.2000.
b. 0.5793.
c. 0.4207.
6. The lifetime of a 2-volt non-rechargeable battery in constant use has a normal distribution with a mean of 516 hours and a standard deviation of 20 hours. 90% of all batteries have a lifetime less than
a. 517.28 hours.
b. 536.00 hours.
c. 541.60 hours.
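Questions 5 and 6 use the same battery model in opposite directions: one goes from a value to a proportion, the other from a proportion back to a value. Both can be sketched with `statistics.NormalDist`, assuming the stated Normal(516, 20) model; the variable names are illustrative.

```python
from statistics import NormalDist

# Battery lifetime modeled as Normal(mean 516 hours, sd 20 hours).
lifetime = NormalDist(mu=516, sigma=20)

# Forward problem: proportion of batteries lasting longer than 520 hours.
p_exceeds_520 = 1 - lifetime.cdf(520)

# Inverse problem: lifetime below which 90% of batteries fall.
hours_90th = lifetime.inv_cdf(0.90)

print(f"P(X > 520)      = {p_exceeds_520:.4f}")
print(f"90th percentile = {hours_90th:.2f} hours")
```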
7. The most common intelligence quotient (IQ) scale is normally distributed with mean 100 and standard deviation 15. Many school districts across the country seek to identify "Gifted and Talented" children for special enrichment programs. Typically, these children must have IQ scores in the top 5%. What is the minimum score to qualify a child for these programs?
a. 130.
b. 125.
c. 115.
8. The scores on the Survey of Study Habits and Attitudes (SSHA) for a sample of 150 first-year college women produced the following boxplot and descriptive statistics using MINITAB.
The number of women with scores between 93.26 and 129.23 is
a. about 75.
b. about 50.
c. about 36.
9. A teacher gave a 25 question multiple choice test. After scoring the tests, she computed a mean and standard deviation of the scores. The standard deviation was 0. Based on this information
a. All the students had the same score.
b. She must have made a mistake.
c. About half the scores were above the mean.
10. A major study examined the relationship between cause of death (heart attack, cancer, stroke, accident, etc.) and age. A good way to graphically represent the relationship is with
a. side-by-side boxplots.
b. back-to-back stemplots.
c. a scatterplot.
11. At a large department store, the amount a shopper spent and the shopper's gender (male or female) were recorded. To determine if gender is useful in explaining the amount of money a shopper spends at the store we could
a. make side-by-side boxplots of the distribution of the amount spent by males and the distribution of the amount spent by females.
b. compute the correlation between the amount spent and gender.
c. compute the least-squares regression line of amount spent on gender.
12. The regression line to predict average exam grade from hours of study is y = 15 + 5.6*x. The slope of the regression line indicates
a. for any student, an extra hour of study increases the grade 5.6 points.
b. on average, an extra hour of study will increase the grade 5.6 points.
c. an extra hour of study will increase the grade 15 points.
13. A survey of 1000 adults ages 30 to 35 is conducted. The number of years of schooling and the annual salary for each person in the survey is recorded. The correlation between years of schooling and annual salary is found to be 0.27. Suppose instead the average salary of all individuals in the survey with the same number of years of schooling was calculated and the correlation between these averages and years of schooling was computed. This correlation would most likely be
a. equal to 0.27.
b. larger than 0.27.
c. less than 0.27.
14. High levels of glucose in the blood are indications of diabetes, which is becoming more prevalent in the United States. Diabetes can lead to many complications such as blindness and heart disease. A random sample of 180 individuals had their blood sugar level measured. The 5-number summary was
52, 79, 91, 119, 220
How many of the people in the sample had glucose levels above 119?
a. 25.
b. 135.
c. 45.
15. Below is a plot of the Olympic gold medal winning performance in the high jump (in inches) for the years 1900 to 1996.
From this plot, the correlation between the winning height and year of the jump is
a. about 0.95.
b. about 0.10.
c. about -0.50.