MDM4U

Visual Displays of Data

Statistics:

Population:

Census:

Sample:

Data:

Raw Data:

Variable:

o  Continuous:

o  Discrete:

Categorical Data:

Frequency:

Frequency Table:

Types of Graphs:

1) Bar Graph

2) Histograms

3) Pictographs

4) Stem and Leaf Diagram

The heights of the members of two high school classes were measured in centimeters. The results were as follows:

136 / 156 / 172 / 160 / 175 / 186 / 187 / 122 / 186 / 157
153 / 130 / 164 / 143 / 181 / 186 / 176 / 184 / 193 / 136
122 / 120 / 184 / 186 / 176 / 181 / 167 / 164 / 149 / 186
155 / 192 / 174 / 184 / 156 / 164 / 181 / 186 / 172 / 181
163 / 190 / 188 / 182 / 174 / 157 / 152 / 183 / 171 / 156

Solution:
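One way to check a hand-drawn stem-and-leaf diagram for these heights is the short Python sketch below. It is only an illustration, not the handout's official solution: the function name `stem_and_leaf` is hypothetical, and it assumes the stem is the hundreds and tens digits and the leaf is the units digit.

```python
from collections import defaultdict

# Heights in centimetres, copied row by row from the table above (50 values).
heights = [
    136, 156, 172, 160, 175, 186, 187, 122, 186, 157,
    153, 130, 164, 143, 181, 186, 176, 184, 193, 136,
    122, 120, 184, 186, 176, 181, 167, 164, 149, 186,
    155, 192, 174, 184, 156, 164, 181, 186, 172, 181,
    163, 190, 188, 182, 174, 157, 152, 183, 171, 156,
]


def stem_and_leaf(values):
    """Group values by stem (all digits except the last), with sorted leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 136 -> stem 13, leaf 6
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(heights)

# Print one row per stem, including any stem that has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

Each printed row reads as "stem | leaves", so the row for stem 12 lists the units digits of every height in the 120s.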

5) Pie Chart or Circle Graph

6) Line Graph

7) Scatter Plots


Conclusions and Issues

Establishing a strong correlation between variables is just the first step in determining whether one variable affects the other.

Causal Relationship

i) Cause-and-Effect Relationship

ii) Common-cause Factor

iii) Reverse Cause-and-Effect Relationship

iv) Accidental Relationship

v) Presumed Relationship

Example # 1:

Classify the relationships in the following situations.

a)  The rate of a chemical reaction increases with temperature.

b)  Leadership ability has a positive correlation with academic achievement.

c)  The prices of butter and motorcycles have a strong positive correlation over many years.

d)  Sales of cellular telephones had a strong negative correlation with ozone levels in the atmosphere over the past decade.

e)  Traffic congestion has a strong correlation with the number of urban expressways.

Extraneous Variables

Example # 2:

A medical researcher wants to test a new drug believed to help smokers overcome the addictive effects of nicotine. Fifty people who want to quit smoking volunteer for the study. The researcher carefully divides the volunteers into two groups, each with an equal number of moderate and heavy smokers. One group is given nicotine patches with the new drug, while the second group uses ordinary nicotine patches. Fourteen people in the first group quit smoking completely, as do nine people in the second group.

a)  Identify the experimental group, the control group, the independent variable, and the dependent variable.

b)  Can the researcher conclude that the new drug is effective?

c)  What further study should the researcher do?


Scatter Plots and Correlation Coefficients

Vocabulary

Scatter Plot-

Independent Variable-

Dependent Variable-

Trends-

Line of Best Fit-

Correlation-

Example # 1: Classifying Linear Correlations:

[Six scatter plots, labelled a) to f)]

The Correlation Coefficient:

Example # 2:

A farmer wants to determine whether there is a relationship between the mean temperature during the growing season and the size of his wheat crop. He assembles the following data for the last six crops.

Mean Temperature (°C) / Yield (tonnes/hectare)
4 / 1.6
8 / 2.4
10 / 2.0
9 / 2.6
11 / 2.1
6 / 2.2

a)  Does a scatter plot of these data indicate a linear correlation between the two variables?

b)  Compute the correlation coefficient.

Mean Temperature (°C), x / Yield (tonnes/hectare), y / x² / y² / xy
4 / 1.6
8 / 2.4
10 / 2.0
9 / 2.6
11 / 2.1
6 / 2.2

c)  What can the farmer conclude about the relationship between the mean temperature during the growing season and the wheat yields on his farm?
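The column headings x², y² and xy suggest the sums form of the correlation coefficient, r = [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]). The handout leaves the formula blank, so treat that form as an assumption. The Python sketch below applies it to the farmer's six crops; the function name `correlation_coefficient` is made up for this illustration.

```python
from math import sqrt

# Data from the table above: x = mean temperature (°C), y = yield (tonnes/hectare).
temperatures = [4, 8, 10, 9, 11, 6]
yields = [1.6, 2.4, 2.0, 2.6, 2.1, 2.2]


def correlation_coefficient(xs, ys):
    """Pearson r from the column sums (assumed to be the intended formula)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)             # total of the x² column
    sum_y2 = sum(y * y for y in ys)             # total of the y² column
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the xy column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient(temperatures, yields)
print(f"r = {r:.2f}")
```

A value of r near 0.5 would indicate only a moderate positive linear correlation, which is the kind of judgment part c) asks for.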

Linear Regression

Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear correlation, you can develop a mathematical model for the relationship by finding the line of best fit. You can then use the equation of this line to make predictions by interpolation (estimating between data points) and extrapolation (estimating beyond the range of the data).

Example # 3: Using a Graphing Calculator

The table below shows data for the full-time employees of a small company.

Age (years) / Annual Income ($000)
33 / 33
25 / 31
19 / 18
44 / 52
50 / 56
54 / 60
38 / 44
29 / 35

a) Use a scatter plot to classify the correlation between age and income.

b) Find the equation of the line of best fit.

c) Predict the income of a new employee who is 21 and of an employee who is 65.
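The example is meant to be done on a graphing calculator. For anyone who wants to check the calculator's output, here is a minimal Python sketch of the same least-squares calculation. The helper names `line_of_best_fit` and `predict_income` are hypothetical. The line is written in the y = ax + b form used later in the investigation.

```python
# Data from the table above: age in years, annual income in thousands of dollars.
ages = [33, 25, 19, 44, 50, 54, 38, 29]
incomes = [33, 31, 18, 52, 56, 60, 44, 35]


def line_of_best_fit(xs, ys):
    """Return (a, b) for the least-squares line y = ax + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b = (sum_y - a * sum_x) / n                                   # y-intercept
    return a, b


a, b = line_of_best_fit(ages, incomes)


def predict_income(age):
    """Income (in $000) predicted by the fitted line for a given age."""
    return a * age + b


print(f"y = {a:.2f}x + ({b:.2f})")
for age in (21, 65):
    inside = min(ages) <= age <= max(ages)
    kind = "interpolation" if inside else "extrapolation"
    print(f"age {age}: about ${predict_income(age):.1f} thousand ({kind})")
```

The ages in the table run from 19 to 54, so the prediction for age 21 is an interpolation and the prediction for age 65 is an extrapolation, which deserves more caution.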


Investigation - The Correlation Coefficient

You will complete this assignment individually, not in groups. Follow the instructions and record your answers to the exercises below.

Data Set A Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set B Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set C Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set D Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set E Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set F Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set G Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set H Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set I Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

Data Set J Sketch the scatter plot and the line of best fit.

Equation of the Line ( y = ax + b form )

Correlation Coefficient ( r )

BIAS IN DATA

EXAMPLE # 1:

CRITICAL ANALYSIS

EXAMPLE # 1:

A manager wants to know if a new aptitude test accurately predicts employee productivity. The manager has all 30 current employees write the test and then compares their scores to their productivity as measured in the most recent performance reviews. The data is ordered alphabetically by employee surname. In order to simplify the calculations, the manager selects a systematic sample using every seventh employee. Based on this sample, the manager concludes that the company should hire only applicants who do well on the aptitude test. Determine whether the manager’s analysis is valid.

Test Score / Productivity / Test Score / Productivity / Test Score / Productivity
98 / 78 / 50 / 66 / 59 / 65
57 / 81 / 75 / 90 / 83 / 47
82 / 83 / 71 / 48 / 75 / 91
76 / 44 / 89 / 80 / 66 / 77
65 / 62 / 82 / 83 / 48 / 63
72 / 89 / 95 / 72 / 61 / 58
91 / 85 / 56 / 72 / 78 / 55
87 / 71 / 71 / 90 / 70 / 73
81 / 76 / 68 / 74 / 68 / 75
39 / 71 / 77 / 51 / 64 / 69
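A quick way to test the manager's conclusion is to compare the correlation coefficient of the systematic sample with that of all 30 employees. The Python sketch below does this under two assumptions that the handout does not state: the alphabetical list is read down each Test Score / Productivity column pair in turn, and the sample starts with the seventh employee. The function name `pearson_r` is hypothetical.

```python
from math import sqrt

# (test score, productivity) pairs.
# ASSUMPTION: employees are listed down the first column pair, then the
# second, then the third, in alphabetical order.
employees = [
    (98, 78), (57, 81), (82, 83), (76, 44), (65, 62),
    (72, 89), (91, 85), (87, 71), (81, 76), (39, 71),
    (50, 66), (75, 90), (71, 48), (89, 80), (82, 83),
    (95, 72), (56, 72), (71, 90), (68, 74), (77, 51),
    (59, 65), (83, 47), (75, 91), (66, 77), (48, 63),
    (61, 58), (78, 55), (70, 73), (68, 75), (64, 69),
]


def pearson_r(pairs):
    """Correlation coefficient computed from deviations about the means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# ASSUMPTION: the systematic sample is employees 7, 14, 21 and 28.
sample = employees[6::7]

r_sample = pearson_r(sample)
r_all = pearson_r(employees)
print(f"sample of {len(sample)}: r = {r_sample:.2f}")
print(f"all {len(employees)} employees: r = {r_all:.2f}")
```

Under these assumptions the four sampled employees show a far stronger correlation than the full group does, which suggests the sample is too small to support the manager's hiring rule.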

EXAMPLE # 2:

An advertising blitz by SuperFast Computer Training Inc. features profiles of some of its young graduates. The number of months of training that these graduates took, their job titles and their income appear prominently in the advertisements.

a) Analyze the company’s data to determine the strength of the linear correlation between the amount of training the graduates took and their incomes. Classify the linear correlation and find the equation of the linear model for the data.

b) Use the model to predict the income of a student who graduates from the company’s two-year diploma program after 20 months of training. Does this prediction seem reasonable?

c) Does the linear correlation show that SuperFast’s training accounts for the graduates’ high incomes? Identify possible extraneous variables or any other problems with the sampling technique or data.

Graduate / Months of Training / Income ($000)
Software Development / 9 / 85
Programmer / 6 / 63
Systems Analyst / 8 / 72
Computer Technician / 5 / 52
Web-site Developer / 6 / 66
Network Administrator / 4 / 60
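Parts a) and b) can be checked with a short Python sketch. It is a rough check rather than a model answer, and the names `predict_income` and `is_extrapolation` are made up for this illustration. It fits the least-squares line to the six advertised graduates, then flags whether a 20-month prediction falls outside the range of the data.

```python
from math import sqrt

# Data from the advertisement table: months of training, income in $000.
months = [9, 6, 8, 5, 6, 4]
incomes = [85, 63, 72, 52, 66, 60]

n = len(months)
sum_x, sum_y = sum(months), sum(incomes)
sum_x2 = sum(x * x for x in months)
sum_y2 = sum(y * y for y in incomes)
sum_xy = sum(x * y for x, y in zip(months, incomes))

# Shared building blocks of the correlation and regression formulas.
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2

r = sxy / sqrt(sxx * syy)              # correlation coefficient
slope = sxy / sxx                      # a in y = ax + b
intercept = (sum_y - slope * sum_x) / n  # b in y = ax + b


def predict_income(training_months):
    """Income (in $000) predicted by the linear model."""
    return slope * training_months + intercept


target = 20
is_extrapolation = not (min(months) <= target <= max(months))

print(f"r = {r:.2f}, model: y = {slope:.2f}x + {intercept:.2f}")
print(f"predicted income after {target} months: ${predict_income(target):.0f} thousand")
print(f"outside the data range ({min(months)} to {max(months)} months): {is_extrapolation}")
```

Because 20 months lies well beyond the 4 to 9 months covered by the data, the prediction is an extrapolation from only six hand-picked graduates and should be treated with suspicion.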


CONCLUSIONS ABOUT CRITICAL ANALYSIS

✓  Is the sampling process free from intentional and unintentional bias?

✓  Could any outliers or extraneous variables influence the results?

✓  Are there any unusual patterns that suggest the presence of a hidden variable?

✓  Has causality been inferred with only correlation evidence?

MISLEADING OR NOT?

1. A toothpaste company boasts four out of five dentists recommend their product.

2. A drug company claims that 80% of the residents of Bruce Mines use their product.

3. A local high school claims that 75% of its graduates go on to obtain a university degree.

4. Fifty-three percent of Canadians want closer ties to the United States.

5. Canadian students ranked 21st in the latest international math test. The previous math ranking was 20th. This means our students are doing poorly in math.


Sec. 2.2 & 2.3 & 2.4

Characteristics of Data, Sampling Techniques, Creating Survey Questions

Population vs. Sample

✓  The group being studied is called the ________; a selection of individuals taken from the population is the ________.

✓  Data collected from the sample is called ________.

✓  A ________ is a collection of data.

Inference

✓  A ________ about the population using your ________.

Cross-Sectional Study vs. Longitudinal Study

✓  A ________ study considers individuals from different groups at the same time.

✓  A ________ study considers individuals over a long period of time.

Time Series Data

✓  Data that have accumulated over a ________ of time.

Qualitative vs. Quantitative

✓  All data can be categorized into one of these two categories.

✓  Quantitative variables are ________.

✓  Qualitative variables are ________.

Discrete Data vs. Continuous Data

✓  Data that can be considered using whole ________ is discrete.

✓  Data that can only be measured by ________ is continuous data resulting from the measure of a quantity.

1. Simple Random Sample

Pros:

Cons:

2. Systematic Random Sample

3. Stratified Random Sample

Pros:

Cons:

4. Cluster Sample

Pros:

Cons:

5. Voluntary-Response Sample

Pros:

Cons:

6. Convenience Sampling

Pros:

Cons:

7. Destructive Sampling
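The first three techniques can be contrasted with a small Python sketch. Everything in it is a made-up illustration rather than part of the handout: the population of 120 students, the two grade strata, and the function names are all hypothetical. The stratified version assumes proportional allocation, so each stratum contributes in proportion to its size.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has an equal chance of being chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th member of the list."""
    interval = len(population) // n
    start = rng.randrange(interval)
    return population[start::interval][:n]


def stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample


# Hypothetical population: 120 students, 80 in grade 11 and 40 in grade 12.
rng = random.Random(2024)  # fixed seed so the run is repeatable
students = [f"student_{i:03d}" for i in range(1, 121)]
grades = {"grade 11": students[:80], "grade 12": students[80:]}

simple = simple_random_sample(students, 12, rng)
systematic = systematic_sample(students, 12, rng)
stratified = stratified_sample(grades, 12, rng)

print("simple:    ", simple)
print("systematic:", systematic)
print("stratified:", stratified)
```

With these numbers the systematic sample takes every tenth student, and the stratified sample draws eight students from grade 11 and four from grade 12.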

Primary vs. Secondary Sources

Open vs. Closed Questions


Sec. 2.5 Bias in Surveys

The results of a survey can be accurate only if the sample is ________ and the measurements are objective. The methods used for choosing the sample and collecting the data must be ________.

Statistical bias is any factor that favours certain outcomes or responses and hence systematically skews the survey results. Such bias ________. A researcher may inadvertently use an unsuitable method or simply fail to recognize a factor that prevents a sample from being fully random. Regrettably, some people ________. For this reason, it is important to understand not only how to use statistics, but also how to recognize the misuse of statistics.

There are four major types of bias:

i) 

ii) 

iii) 

iv) 

Sampling Bias

Try This?

Example # 1:

Identify the bias in each of the following surveys and suggest how it could be avoided.

a)  A survey asked students at a high-school football game whether a fund for extra-curricular activities should be used to buy new equipment for the football team or instruments for the school band.

b)  An aid agency in a developing country wants to know what proportion of households have at least one personal computer. One of the agency’s staff members conducts a survey by calling households randomly selected from the telephone directory.

Non-response Bias

Example # 2:

A science class asks every fifth student entering the cafeteria to answer a survey on environmental issues. Less than half agree to complete the questionnaire. The completed questionnaires show that a high proportion of the respondents are concerned about the environment and well-informed about environmental issues. What bias could affect these results?

Measurement Bias

While random errors tend to cancel out, a consistent measurement error will skew the results of a survey. Often, measurement bias results from a data-collection process that affects the variable it is measuring.

Example # 3:

Identify the bias in each of the following surveys and suggest how it could be avoided.

a)  A highway engineer suggests that an economical way to survey traffic speeds on an expressway would be to have the police officers who patrol the highway record the speed of the traffic around them every half hour.

b)  As part of a survey of the “Greatest Hits of All Time”, a radio station asks its listeners: Which was the best song by the Beatles?