Topics for Today

Elements of a good summary

Summarizing Data

Frequencies

Graphical Presentation of Frequencies

Elements of a Good Summary

  1. Who?

-What ______ do the data describe?

Individuals may be people, animals, or things.

-How many individuals appear in the data?

  2. What?

-The ______ of variables available.

-Exact definitions of these variables.

-Units of measurement for each variable

Weights, for example, might be recorded in grams or kilograms. Costs might be recorded in $ or millions of $.

  3. Why?

-What ______ do the data have?

-What are specific ______?

-To draw conclusions about individuals other than the ones we actually have data for?

Summarizing Specific Variables

Different types of data are most effectively summarized in different ways:

Proportions / Frequencies:

  • ______
  • ______
  • ______

Means (& SD) / Medians (IQR):

  • ______ (counts or if there are many categories)
  • ______

But, interval and ratio data can be converted to ‘ordinal’ data and presented with proportions and frequencies.

Frequency and Frequency Distribution

Frequencies and frequency distributions are the most commonly used summary statistics

Frequency: number of _____ each unique value of a variable occurs in a data set

Frequency Distribution: listing of the frequency of unique ______ of a variable in a data set

Relative frequency:

(# of times each value occurs) / (# of obs. in data set)

Percentage Frequency:

Relative Freq. × 100%

Examples of Frequencies for Nominal Data

Example 1 (Nominal Data)

How students get their information on current affairs (in 1995)

Media / Freq / Rel. Freq. / % Freq
Television / 37 / 0.4625 / 46.25
Newspaper / 35 / 0.4375 / 43.75
Radio / 7 / 0.0875 / 8.75
Magazine / 1 / 0.0125 / 1.25
Total / 80 / 1.0000 / 100.00
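
The Example 1 table can be reproduced with a few lines of Python. This is only a minimal sketch and not part of the original lecture; the function name `frequency_table` is an assumption made for illustration.

```python
def frequency_table(counts):
    """Return {category: (freq, rel_freq, pct_freq)} for a dict of counts."""
    n = sum(counts.values())  # number of observations in the data set
    table = {}
    for category, freq in counts.items():
        rel = freq / n                            # relative frequency
        table[category] = (freq, rel, 100 * rel)  # percentage = rel. freq. x 100
    return table


# Counts from Example 1 (how students get information on current affairs).
media = {"Television": 37, "Newspaper": 35, "Radio": 7, "Magazine": 1}

for category, (freq, rel, pct) in frequency_table(media).items():
    print(f"{category:<12}{freq:>4}{rel:>9.4f}{pct:>8.2f}")
```

The relative frequencies should always sum to 1 and the percentages to 100, which is a quick check on any hand-built table.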

Example 2 (Nominal Data)

To summarize 2,439 complaints about the comfort-related characteristics of its airplanes, an airline’s customer service department issues the following table:

Nature of Complaint / Rel. Freq. / Freq
Inadequate leg room / .295 / 719
Uncomfortable seats / .375 / 914
Narrow aisles / .060 / 146
Insufficient carry-on space / ____ / ___
Insufficient rest rooms / .024 / 58
Miscellaneous / .157 / 384
Total / 1.000 / 2,439

Example 3 (Ordinal Data)

Seventy-five (75) students were interviewed regarding how often they eat breakfast:

Frequency of Breakfast Eating / % Freq / Rel. Freq. / Freq
Always / 20% / .2 / 15
Almost all the time / 14.7% / .147 / 11
Most of the time / 12% / .12 / 9
Seldom / 24% / .24 / 18
Never / _____ / ____ / __
Total / 100% / 1.000 / 75

Frequency Distributions of Interval and Ratio Data

You can convert _____ or ______ scaled data into ordinal data by grouping, to generate a frequency distribution.

For Ordinal and Nominal data, the categories are obvious; they are the ______ the variable takes.

For Ratio or Interval data, you have to construct the categories, or ______, by defining class boundaries and midpoints.

Defining Classes for Grouping Interval and Ratio Data

  • Class intervals should be nonoverlapping and ______defined.
  • In most circumstances, the intervals would be of the same width. (Open ended intervals are sometimes convenient.)
  • If there are no individuals in a particular interval, it should still be included to _____ a misleading impression of the data.

Example 4 (Ratio)

The following data are percentages of persons 65 years old or older in 40 large ______ in 2001. Set up a frequency distribution for these data and include the following:

(a) frequencies

(b) midpoints of the classes

(c) percentages

(d) cumulative frequencies

(e) cumulative percentages.

Percentages of Persons 65 Years old or Older in 40 large urban locations in 2000

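Location / Percent 65+ / Location / Percent 65+
1 / 13.1 / 21 / 12.5
2 / 8.1 / 22 / 11.9
3 / 12.9 / 23 / 11.1
4 / 10.7 / 24 / 6.9 (L)
5 / 7.8 / 25 / 11.5
6 / 17.2 (H) / 26 / 9.6
7 / 8.4 / 27 / 10.9
8 / 7.8 / 28 / 9.9
9 / 9.7 / 29 / 8.9
10 / 10.1 / 30 / 8.4
11 / 7.0 / 31 / 7.7
12 / 10.3 / 32 / 10.8
13 / 12.2 / 33 / 7.2
14 / 11.7 / 34 / 10.6
15 / 12.2 / 35 / 10.9
16 / 10.6 / 36 / 8.9
17 / 12.0 / 37 / 10.4
18 / 9.2 / 38 / 11.7
19 / 7.5 / 39 / 8.5
20 / 13.0 / 40 / 7.3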

We’ll eventually construct frequency distributions with software, but let’s do it by hand first so that we understand the construction.

First, what is the ______ being measured in this example?

How many individuals are measured?

What is the ______ being measured?

Now, let’s summarize the data for these individuals.

Step 1: Determine the ______ of classes or groups that we wish to construct.

A useful rule of thumb is to have the number of classes (k) equal to the square root of the number of individuals.

______

Step 2: Determine the range of values and the class size (width).

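Range = largest value - smallest value = 17.2 - 6.9 = 10.3. Dividing the range by the number of classes k gives the approximate class width. We want the width to be a ‘convenient’ number, so we’ll round this to 2.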
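
Steps 1 and 2 can be sketched in Python as follows. This is an illustrative sketch rather than the lecture's own procedure: rounding k to the nearest whole number and rounding the width up to the next whole number are assumptions about what counts as ‘convenient’.

```python
import math

n = 40                          # number of individuals in Example 4
smallest, largest = 6.9, 17.2   # values marked (L) and (H) in the data table

k = round(math.sqrt(n))         # rule of thumb: k is about the square root of n
data_range = largest - smallest
raw_width = data_range / k      # width needed to cover the range with k classes
width = math.ceil(raw_width)    # assumed convention: round up to a whole number

print(f"k = {k}, range = {data_range:.1f}, "
      f"raw width = {raw_width:.2f}, width = {width}")
```

Rounding the width up rather than down guarantees that k classes of that width still cover the whole range of the data.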

Step 3: Determine the classes and tally the data. Make sure that the smallest and largest data values are included in the tally.

Class midpoints / Class / Tally (frequency)
m1 = 7 / [6, 8) / f1 = 8
m2 = 9 / [8, 10) / f2 = 10
m3 = 11 / [10, 12) / f3 = 14
______ / ______ / ______
m5 = 15 / [14, 16) / f5 = 0
m6 = 17 / [16, 18) / f6 = 1

Since the endpoints of one interval are ‘adjacent’ to the endpoints of the next interval, these numbers are called the real limits or class limits.

Frequency, percent frequency, cumulative frequency and cumulative percent frequency can then be calculated.

Class / Midpoint (m) / Frequency (f) / Percentage (%) / Cum. Freq. (cf) / Cum. % (c%)
[6, 8) / 7 / 8 / 100(8/40) = 20.0 / f1 = 8 / 20
[8, 10) / 9 / 10 / 100(10/40) = 25.0 / f1+f2 = 18 / 45
[10, 12) / 11 / 14 / 100(14/40) = 35.0 / f1+f2+f3 = 32 / 80
______ / __ / _ / ______ / ______ / ____
[14, 16) / 15 / 0 / 100(0/40) = 0.0 / … = 39 / 97.5
[16, 18) / 17 / 1 / 100(1/40) = 2.5 / … = 40 / 100
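
The hand tally can be checked with a short Python sketch. The function name `frequency_distribution` and its parameters are assumptions made for illustration; the data are the 40 values from Example 4, and each class is treated as the half-open interval [lower, upper) used in the tables above.

```python
# Percent 65+ for the 40 locations in Example 4 (locations 1 to 40 in order).
data = [
    13.1, 8.1, 12.9, 10.7, 7.8, 17.2, 8.4, 7.8, 9.7, 10.1,
    7.0, 10.3, 12.2, 11.7, 12.2, 10.6, 12.0, 9.2, 7.5, 13.0,
    12.5, 11.9, 11.1, 6.9, 11.5, 9.6, 10.9, 9.9, 8.9, 8.4,
    7.7, 10.8, 7.2, 10.6, 10.9, 8.9, 10.4, 11.7, 8.5, 7.3,
]


def frequency_distribution(values, start, width, k):
    """Tally values into k classes [lower, lower + width), beginning at start."""
    n = len(values)
    rows, cum = [], 0
    for i in range(k):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        cum += freq  # running total gives the cumulative frequency
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "freq": freq,
            "pct": 100 * freq / n,
            "cum_freq": cum,
            "cum_pct": 100 * cum / n,
        })
    return rows


rows = frequency_distribution(data, start=6, width=2, k=6)
for r in rows:
    lo, hi = r["class"]
    print(f"[{lo}, {hi})  m={r['midpoint']:g}  f={r['freq']}  "
          f"%={r['pct']:.1f}  cf={r['cum_freq']}  c%={r['cum_pct']:.1f}")
```

The last cumulative frequency should equal the number of individuals, and the last cumulative percentage should be 100; if not, some value fell outside every class.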

Note: Percentage distributions are used extensively to compare samples with different sample sizes.

Advantages of Grouping

-it reduces the apparent complexity of the data by reducing the number of separate pieces of information.

-it helps to smooth out irregularities in the data.

Disadvantage of Grouping

-information is lost

Graphical Display of Data

Include title, labels, unit of measurement, etc. to describe the main features.

  1. Bar Charts
  2. Pie Charts
  3. Histograms

Bar Chart: series of rectangular bars where the ______ of the bars represent ______ of the respective quantities. Bars have equal width; label the axes and start the frequency axis at zero.

Pie Chart: circle divided into sectors in such a way that the area of each ______ is ______ to the quantity represented.

Pie charts emphasize the proportion of occurrences of each category. Bar charts focus attention on frequencies.

Example of a Pie Chart:

The following data give the breakdown of purposes for which a population makes six million trips on a normal working day.

Purpose / No. of Trips (millions) / Relative Frequency / Angle of Sector (degrees)
To and from work / 2.01 / 0.335 / 120.6
To and from school / 1.14 / 0.190 / 68.4
Social / 0.84 / 0.140 / 50.4
Personal business / 0.64 / 0.107 / 38.4
To and from shops / 0.60 / 0.100 / 36.0
Other / 0.77 / 0.128 / 46.2
Total / 6.00 / 1.000 / 360.0
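
Each sector angle in the table is the relative frequency multiplied by 360 degrees. The sketch below recomputes the angles from the trip counts; the helper name `sector_angles` is hypothetical and chosen only for this illustration.

```python
# Trips per purpose, in millions, from the pie chart example.
trips = {
    "To and from work": 2.01,
    "To and from school": 1.14,
    "Social": 0.84,
    "Personal business": 0.64,
    "To and from shops": 0.60,
    "Other": 0.77,
}


def sector_angles(quantities):
    """Return {category: angle in degrees}, proportional to each quantity."""
    total = sum(quantities.values())
    return {name: 360 * q / total for name, q in quantities.items()}


angles = sector_angles(trips)
total_trips = sum(trips.values())
for purpose, angle in angles.items():
    rel = trips[purpose] / total_trips  # relative frequency
    print(f"{purpose:<20}{rel:>7.3f}{angle:>8.1f}")
```

Because the angles are proportional to the quantities, they always add up to 360 degrees, whatever units the quantities are recorded in.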

Pie Chart

Bar Chart

How to lie with a bar graph

Histogram

A bar chart is used for plotting frequencies of nominal or ordinal variables.

______ are used for plotting frequencies of ______ interval or ratio data. The main difference is that there are no gaps between the bars.

Either the frequency or the relative frequency can be plotted. The two histograms have the same shape and differ only in the scale of the y-axis.

Frequency Histogram

Rel. Freq. Histogram

The centre of the base of each rectangle is placed at the class mark (midpoint).

Back to the seniors example earlier. Below is a frequency histogram of this data:

Frequency histogram of the percent of seniors in 40 locations

… and a relative frequency histogram of the same data:

Relative Frequency histogram of the percent of seniors in 40 locations
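
As a rough stand-in for the two plots, the sketch below draws text histograms of the seniors data using only the Python standard library. It is an illustration under stated assumptions: the class frequencies are taken from the grouped table (they sum to 40), and the function `text_histogram` and its `scale` parameter are hypothetical names.

```python
classes = [(6, 8), (8, 10), (10, 12), (12, 14), (14, 16), (16, 18)]
freqs = [8, 10, 14, 7, 0, 1]  # class frequencies; they sum to 40


def text_histogram(classes, heights, scale=1.0):
    """Return one text bar per class; bar length is round(height * scale)."""
    lines = []
    for (lower, upper), h in zip(classes, heights):
        bar = "#" * round(h * scale)
        lines.append(f"[{lower:>2}, {upper:>2}) {bar}")
    return lines


n = sum(freqs)
rel_freqs = [f / n for f in freqs]

freq_plot = text_histogram(classes, freqs)
# Rescale the relative frequencies by n so the bars are drawn at the same size.
rel_plot = text_histogram(classes, rel_freqs, scale=n)

print("\n".join(freq_plot))
```

After rescaling, the frequency and relative frequency versions are identical, which is the sense in which the two histograms differ only in their y-axis.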

Notes on Histograms

  • Look for the overall ______ and for striking deviations from that pattern
  • Describe the overall pattern of a histogram by its _____, centre and spread
  • Look for ______, individual values that fall outside the overall pattern.

Histogram Patterns

Symmetric: when reflected about a vertical axis through its centre, the histogram falls on its own image

Asymmetric: otherwise

Positively Skewed: Long tail to the right

Negatively Skewed: Long tail to the left

Modal Class: class with the largest number of observations

Unimodal: histogram with a single peak

Bimodal: histogram with 2 peaks, not necessarily equal in height

Bell-shaped

Today’s Topics

Elements of a good Summary

-Who, What, Why

Summarizing Data

-Different methods for different types of data

Frequencies

-Frequency, relative frequency, percent frequency, cumulative frequency

-Frequency Distribution

-Grouping Interval and Ratio data

Graphical Presentation of Frequencies

-Pie Charts

-Bar Charts

-Histograms

Reading for next lecture

Chapter 2

Stat302, Fall 2010 – Week 1, Lecture 2