Outline of Eco 251-Descriptive Stats

251descr1 11/09/06 (Open this document in 'Outline' view!)

ECONOMICS 251 COURSE OUTLINE

A. Introduction

1. Definitions

Define Statistics, Descriptive and Analytic Statistics, Induction and Deduction.

2. Uses of Statistics

B. Sources and Types of Data

1. Data

Define data sets, observation, unit of observation. Qualitative and quantitative data. Nominal, ordinal, interval and ratio data. Discrete vs. continuous data. Data is discrete if the number of values it can take is countable. Most typically this is a variable that can only be a whole number, like the number of times you can win the lottery. If two numbers are drawn from continuous data, any number between them could also be part of the same data set. Temperature, weight and most other things that we measure are continuous data.

a. Qualitative Data

(i) Nominal Data: There is no natural number scale - numbers are only used to define categories, so that no operations like addition or multiplication are valid.

(ii) Ordinal Data: Numbers are used only to order things (e.g. first, second, third). Differences between ranks do not always have the same meaning. Most mathematical operations are still not valid.

b. Quantitative Data

(i) Interval Data: Differences between numbers have a consistent meaning, but, like Celsius temperature, there is no obvious origin, so that, although addition and subtraction can be used, multiplication and division have no real meaning.

(ii) Ratio Data: there is a meaningful origin, so that multiplication and division are valid.

2. Sources

Define primary and secondary sources, internal and external data.

3. Cross Section and Time Series Data

a. Cross Section Data

b. Time Series Data.

i. Indices

ii. Real Values

iii. Rates of change

iv. Logarithms

C. Presentation of Data

1. Classification

Define collectively exhaustive and mutually exclusive classes. These are not the same thing. Collectively exhaustive means that every item you are considering has a place in a class. Mutually exclusive means that if an item belongs in any given class, it does not belong in another class as well.

2. Tables

Define parts of tables. See 251pttbl.

3. Charts and Graphs

Define parts of graphs

Line graph example http://www.epinet.org/issueguides/minwage/figure2.pdf

Pie chart example - National Priorities Project, "Where Do Your Tax Dollars Go?" (posted 2006). This publication shows how the federal government spent the average household's 2004 income taxes in each state and 193 cities, towns and counties.

Component part line chart example 251GDP_DPI

D. Frequency Distributions and Populations.

1. Definitions

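Meaning of Population, Frame, Census, Sample, Grouped Data, Frequency, Example of Frequency Distribution, Relative Frequency. Width of a class interval:

$\text{class interval} = \frac{\text{highest number} - \text{lowest number}}{\text{number of classes}}$

(Always round this result up!)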

Example: Let us assume that we have a sample consisting of numbers between 905 and 8756, and that we want to present the data in 5 classes. Our class interval will be at least $\frac{8756 - 905}{5} = \frac{7851}{5} = 1570.2$. We will at least round this up to 1571. If we want to use 1571, our first class will begin at 905, the next at 905 + 1571 = 2476, etc. In fact we might consider a class interval of 1600 and start our lowest group at 800. The classes would be 800 to under 2400, 2400 to under 4000, 4000 to under 5600, 5600 to under 7200 and 7200 to under 8800. Just make sure that the classes cover the data and that there are few empty classes.
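The class-interval arithmetic in this example can be checked with a minimal Python sketch. The function name `class_limits` is illustrative only, and the sketch assumes we take the smallest whole-number width and start the first class at the lowest number.

```python
import math

def class_limits(lowest, highest, num_classes):
    """Return (width, lower_limits) using the smallest whole-number width."""
    raw_width = (highest - lowest) / num_classes  # 7851 / 5 = 1570.2 in the example
    width = math.ceil(raw_width)                  # always round this result up
    lower_limits = [lowest + i * width for i in range(num_classes)]
    return width, lower_limits

width, limits = class_limits(905, 8756, 5)
print(width)   # 1571
print(limits)  # [905, 2476, 4047, 5618, 7189]
```

The last class runs from 7189 to under 8760, so the highest number, 8756, is covered.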

The most commonly observed rule for deciding on the number of classes is Sturges' rule. The formula can be written as number of classes $= 1 + 3.3 \log_{10} n$, where $\log_{10} n$ is the log base 10 of the number of observations. This rule should not be taken seriously. For more on this see http://cnx.org/content/m10160/latest/.
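As a rough illustration of Sturges' rule in its common form (1 plus 3.3 times the base-10 log of the number of observations), here is a small Python sketch. The sample size of 100 is a hypothetical value, not one taken from this outline.

```python
import math

def sturges_classes(n):
    """Suggested number of classes for n observations (common form of Sturges' rule)."""
    return 1 + 3.3 * math.log10(n)

# Hypothetical sample of 100 observations: 1 + 3.3(2) = 7.6, so about 8 classes.
suggested = sturges_classes(100)
print(round(suggested, 1))   # 7.6
print(math.ceil(suggested))  # 8
```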

2. Graphs of the Frequency Distribution. See http://cnx.org/content/m10927/latest/

a. The Histogram

b. The Frequency Polygon

c. The Cumulative Frequency Distribution (Ogive). See http://home.ched.coventry.ac.uk/Volume/vol0/ogive.htm

d. Relative Frequencies.

e. Smoothed Histograms

E. Sampling and Descriptive Statistics.

1. Sampling to Learn About a Population.

Infinite and finite populations, target and sampled populations, the Stability of Mass Data.

2. The Meaning of Random Sampling.

A simple random sample of $n$ items taken from a population of $N$ items must be selected in such a way that all combinations of $n$ items are equally likely.

3. Descriptive Statistics.

a. Measures of Central Tendency. (Where's the middle of the data?)

b. Measures of Dispersion. (How spread out are the data?)

c. Measures of Asymmetry etc. (What else can I say about the shape?)

F. Measures of Central Tendency.

1. The Arithmetic Mean of Ungrouped Data.

a. The Population Mean.

b. The Sample Mean.

Example: Consider the following data set.

Row        x
1      10000
2      17000
3      23000
4      30000
5      80000
Sum   160000

It makes no difference whether we call this a sample or a population, so let's say that this is a sample. We can write $\sum x = 160000$ and $n = 5$, so $\bar{x} = \frac{\sum x}{n} = \frac{160000}{5} = 32000$. The alert observer will note that the mean has been raised by the highest number so that it is actually above all the numbers but the highest one.
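A quick numerical check of this example, as a minimal Python sketch (the variable names are illustrative):

```python
data = [10000, 17000, 23000, 30000, 80000]

total = sum(data)         # sum of x
mean = total / len(data)  # sample mean: sum of x divided by n

# Count how many observations lie below the mean.
below_mean = sum(1 for x in data if x < mean)

print(total)       # 160000
print(mean)        # 32000.0
print(below_mean)  # 4
```

Four of the five numbers lie below the mean, which is the point made about the effect of the highest number.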

2. The Arithmetic Mean of Grouped Data.

To make an ungrouped data formula into a grouped data formula, substitute $\sum fx$ for $\sum x$. For $x$ substitute the midpoint of the group. This is defined for our purposes as the arithmetic mean of the lower limit of the group in question and the lower limit of the next group. In other words, if we have the group 10 to 10.99, followed by 11 to 11.99, the midpoint of the first group is 10.50, not 10.495. The formula for a population mean for grouped data is thus $\mu = \frac{\sum fx}{N}$. The sample mean formula and the population mean formula are essentially identical: $\bar{x} = \frac{\sum fx}{n}$.

Example: It makes no difference whether we call this a sample or a population, so let's say that this is a sample.

Row    x    f    fx
1     10    3    30
2     12    3    36
3     14    5    70
4     16    3    48
5     18    1    18
Sum        15   202

Note that there is no reason to sum $x$. We can write $\sum fx = 202$ and $n = \sum f = 15$, so $\bar{x} = \frac{\sum fx}{n} = \frac{202}{15} = 13.467$. Note also that if we use the relative frequency $\frac{f}{n}$ in place of $f$, we do not have to divide by $n$.
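The grouped-data mean in this example can be checked with a short Python sketch. It computes the mean both ways: from frequencies (dividing by n) and from relative frequencies (no division needed). The variable names are illustrative.

```python
midpoints = [10, 12, 14, 16, 18]  # x: midpoint of each group
freqs = [3, 3, 5, 3, 1]           # f: frequency of each group

n = sum(freqs)                                       # 15
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))  # 202
mean = sum_fx / n

# Same mean from relative frequencies f/n: just sum (f/n) * x.
rel_freqs = [f / n for f in freqs]
mean_from_rel = sum(r * x for r, x in zip(rel_freqs, midpoints))

print(n, sum_fx)                # 15 202
print(round(mean, 3))           # 13.467
print(round(mean_from_rel, 3))  # 13.467
```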

3. The Weighted Arithmetic Mean.

$\bar{x}_w = \frac{\sum wx}{\sum w}$. Example: We have three firms with profit rates of 10%, 12%, and 15%, which would average 12.33%. If we want a rate of return on capital we might want to know that the assets of the firms are respectively $2 billion, $1 billion and $1 billion. It is also common in a situation like this to use relative weights found by dividing the original weights by the sum of the weights, in this case 4.

Row    x    w    wx   w/Σw   (w/Σw)x
1     10    2    20    .50    5
2     12    1    12    .25    3
3     15    1    15    .25    3.75
Sum         4    47   1.00   11.75

So $\sum w = 4$, $\sum wx = 47$ and $\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{47}{4} = 11.75$. If we use relative weights, we can read the weighted mean as the sum of the last column.
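A minimal Python sketch of the weighted mean for the three firms, assuming the usual definition (sum of weight times value, divided by the sum of the weights). It also shows the relative-weight version; the variable names are illustrative.

```python
profit_rates = [10, 12, 15]  # x, in percent
assets = [2, 1, 1]           # w, in billions of dollars

sum_w = sum(assets)                                         # 4
sum_wx = sum(w * x for w, x in zip(assets, profit_rates))  # 47
weighted_mean = sum_wx / sum_w

# Relative weights sum to 1, so the weighted mean is just a column sum.
rel_weights = [w / sum_w for w in assets]
weighted_mean_rel = sum(r * x for r, x in zip(rel_weights, profit_rates))

unweighted_mean = sum(profit_rates) / len(profit_rates)

print(weighted_mean)              # 11.75
print(weighted_mean_rel)          # 11.75
print(round(unweighted_mean, 2))  # 12.33
```

The weighted mean is below the simple average because the largest firm has the lowest profit rate.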

4. The Median of Ungrouped Data.

Defined simply as the middle point when the data is in order. If there are two middle points, take their arithmetic mean. In continuous data half the points will be above the median and half below.

Consider the data set that we used for the mean.

Row        x
1      10000
2      17000
3      23000
4      30000
5      80000
Sum   160000

Note that the middle number is the third number and that $\frac{n+1}{2} = \frac{5+1}{2} = 3$. In general the index of the median is $\frac{n+1}{2}$. If this is a sample, we can write $x_{.50} = x_3 = 23000$. If this is a population, the median is also 23000. The alert observer will note that the median is not much affected by the highest number, so that it seems more typical than the mean. Now consider a second data set.

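Row        x
1      10000
2      17000
3      23000
4      27000
5      30000
6      80000
Sum   187000

Note that $\frac{n+1}{2} = \frac{6+1}{2} = 3.5$. This formula seems to be telling us that, since there is no one middle number, we have to average the third and fourth numbers. If this is a sample, we can write $x_{.50} = \frac{x_3 + x_4}{2} = \frac{23000 + 27000}{2} = 25000$. If this is a population, the median is also 25000.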
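Both median examples can be checked with a small Python sketch. It assumes the median sits at position (n + 1)/2 in the ordered data and that a position ending in .5 means averaging the two neighboring items. The function name is illustrative.

```python
import math

def median(values):
    """Median via the position (n + 1) / 2 in the ordered data."""
    xs = sorted(values)
    n = len(xs)
    position = (n + 1) / 2               # 3 for n = 5, 3.5 for n = 6
    lower = xs[math.floor(position) - 1]  # positions are 1-based
    upper = xs[math.ceil(position) - 1]
    return (lower + upper) / 2           # same item twice when n is odd

first = [10000, 17000, 23000, 30000, 80000]
second = [10000, 17000, 23000, 27000, 30000, 80000]

print(median(first))   # 23000.0
print(median(second))  # 25000.0
```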

5. The Median of Grouped Data.

This is a special case of the formulas for fractiles of grouped data below, where $p = 0.5$. For the formulas and the example used in class see 251median.

6. The Mode

The mode is simply the most common point, not very useful in discrete ungrouped data. For grouped data it is defined as the midpoint of the largest group. Consider again our example for grouped data, shown below with the frequency $f$, the cumulative frequency $F$ and the midpoint $x$ of each group.

Profit Rate    f    F    x
9-10.99%       3    3   10
11-12.99%      3    6   12
13-14.99%      5   11   14
15-16.99%      3   14   16
17-18.99%      1   15   18
Total         15

Since 13-14.99 is the largest group and its midpoint is 14, we can write $\text{mode} = 14$.
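A minimal Python sketch of the grouped-data mode for the profit-rate table. It takes the midpoint as halfway between a group's lower limit and the next group's lower limit, as defined earlier in this outline. If two groups tie for the largest frequency, this sketch simply reports the first one.

```python
lower_limits = [9, 11, 13, 15, 17]  # lower limit of each profit-rate group
freqs = [3, 3, 5, 3, 1]
width = 2                           # distance between successive lower limits

# Index of the group with the largest frequency (first one on ties).
largest = max(range(len(freqs)), key=lambda i: freqs[i])
mode = lower_limits[largest] + width / 2

print(lower_limits[largest])  # 13
print(mode)                   # 14.0
```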

Note that a distribution can have two modes, which would make it bimodal. If it has only one mode it is unimodal. Of all the measures of central tendency, the mode is the most resistant to a few very high numbers and the mean is least resistant.

Populations made up of data like wealth and income almost always have a few outliers to the right of most of the data. They tend to be cut off on the left by the fact that a minimum income is necessary to sustain life. We say that a population of this type is skewed to the right. Typically for such a population $\text{mode} < \text{median} < \text{mean}$. On the other hand, a population that is skewed to the left would have $\text{mean} < \text{median} < \text{mode}$. So what would you expect if a population is unimodal and symmetrical?

7. Other Means.

a. The Geometric Mean.

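Example 1: Find the geometric mean of 1, 2 and 3.

$\bar{x}_g = \sqrt[3]{1 \cdot 2 \cdot 3} = \sqrt[3]{6} = 1.8171$

Or, using natural logarithms, $\ln \bar{x}_g = \frac{1}{3}(\ln 1 + \ln 2 + \ln 3) = \frac{1}{3}(0 + 0.69315 + 1.09861) = 0.59725$. So that $\bar{x}_g = e^{0.59725} = 1.8171$.

Or, using logarithms to the base 10, $\log_{10} \bar{x}_g = \frac{1}{3}(0 + 0.30103 + 0.47712) = 0.25938$. So that $\bar{x}_g = 10^{0.25938} = 1.8171$.

Example 2: A stock's value grows at 50% in period 1 and 5% in period 2. Find the average growth rate.

Add 1 to the growth rates and take a geometric mean: $\bar{x}_g = \sqrt{(1.50)(1.05)} = \sqrt{1.575} = 1.255$. So the average growth rate is 25.5%.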
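Both geometric-mean examples can be checked with a short Python sketch. It uses the logarithm route (average the natural logs, then exponentiate), which is equivalent to taking the nth root of the product. The function name is illustrative.

```python
import math

def geometric_mean(values):
    """Geometric mean via logs: exp of the average natural log."""
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)

# Example 1: geometric mean of 1, 2 and 3.
g = geometric_mean([1, 2, 3])
print(round(g, 4))  # 1.8171

# Example 2: growth of 50% then 5%; add 1 to each rate first.
average_growth = geometric_mean([1.50, 1.05]) - 1
print(round(average_growth, 3))  # 0.255
```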

b. The Harmonic Mean.

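Example: Find the harmonic mean of 50 and 30. $\frac{1}{\bar{x}_h} = \frac{1}{n}\sum \frac{1}{x} = \frac{1}{2}\left(\frac{1}{50} + \frac{1}{30}\right) = \frac{1}{2}(0.02000 + 0.03333) = 0.026667$, so $\bar{x}_h = \frac{1}{0.026667} = 37.5$.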

c. The Root-Mean-Square.

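Example: Find the rms of 1, 2 and 3. $\bar{x}_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{3}(1 + 4 + 9) = 4.6667$, so $\bar{x}_{rms} = \sqrt{4.6667} = 2.1602$.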
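The harmonic mean and root-mean-square examples above can be checked with a minimal Python sketch. It assumes the standard definitions: the harmonic mean is the reciprocal of the average reciprocal, and the rms is the square root of the average square. The function names are illustrative.

```python
import math

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

def root_mean_square(values):
    """Square root of the arithmetic mean of the squares."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))

h = harmonic_mean([50, 30])
r = root_mean_square([1, 2, 3])

print(round(h, 4))  # 37.5
print(round(r, 4))  # 2.1602
```

For comparison, the arithmetic mean of 50 and 30 is 40 and that of 1, 2 and 3 is 2, so the harmonic mean falls below the arithmetic mean and the rms above it.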

d. What Formulas for Means Have in Common.

8. Measures of Position.

Percentiles, deciles, quintiles, quartiles and fractiles.

The two formulas below are two-step formulas. The first step is multiplying $N+1$ (or $n+1$)* by $p$ to find a position in the ordered data, as the examples below do. $p$ represents the fractile of the data wanted, measured from the bottom. For example, if we want the 91st percentile, $p$ is .91. Note that the number you have found is called $x_{.09}$ (i.e. 9% from the top!). If we want the third quartile, $x_{.25}$, $p$ is $\frac{3}{4}$ or 0.75. If we want the first quartile, $x_{.75}$, $p$ is $\frac{1}{4}$ or 0.25. Of course, for the median $p = 0.5$. $N$ or $n$ represents the number of items in the population or sample, not the number of groups or classes.

a. Finding a Fractile of Grouped Data.

To use this formula, we must first compute the cumulative distribution of the groups and determine in which group the desired fractile is located with the calculation $p(N+1)$*. Once we have found the group that this is in, let $f_p$ be the frequency of the chosen group, and let $F$ be the cumulative frequency up to but not including the chosen group. The formula here is $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$. In this formula, $w$ is the class interval (the interval between the lower limit of the chosen group and the lower limit of the next group) and $L_p$ is the lower limit of the chosen group.

Example: Suppose that in the example below we must find the first quartile. Since the first quartile is the .25 fractile, $p$ is .25. To locate the group use $p(n+1) = 0.25(16) = 4$.

Profit Rate    f    F
9-10.99%       3    3
11-12.99%      3    6
13-14.99%      5   11
15-16.99%      3   14
17-18.99%      1   15
Total         15

Using the cumulative distribution column, $F$, we find the fourth item in the sample. Since 4 is above 3 and below 6 in the $F$ column, we pick the group 11-12.99%. $n$ is 15, and for the group we have picked, $w = 13 - 11 = 2$, $f_p = 3$, $F = 3$, and $L_p = 11$. If we put these numbers into the formula, we find that $x_{.75} = 11 + \left[\frac{0.25(15) - 3}{3}\right] 2 = 11 + 0.25(2) = 11.5$.

Note: Sometimes $pN - F$ is negative. In this case choose the group before the one you would ordinarily have chosen. Example: If you want the 19th percentile of the data above, $p(n+1) = .19(16) = 3.04$, which would normally take us into 11-12.99. But $pn - F = .19(15) - 3 = -0.15$, so use the group 9-10.99 instead. But see c below.
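The two-step procedure for grouped data can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: it assumes the group is located with p(n + 1), that the interpolation uses pn minus the cumulative frequency below the group, and that we step back one group when that difference is negative. The function and variable names are illustrative, and every group is assumed to have a nonzero frequency.

```python
def grouped_fractile(p, lower_limits, freqs, width):
    """Fractile of grouped data: locate the group, then interpolate inside it."""
    n = sum(freqs)
    position = p * (n + 1)  # step 1: position used to locate the group

    # cum_before[i] is F: cumulative frequency up to but not including group i.
    cum_before = [0]
    for f in freqs[:-1]:
        cum_before.append(cum_before[-1] + f)

    # First group whose cumulative frequency reaches the position
    # (assumption: fall back to the last group if none does).
    chosen = len(freqs) - 1
    for i, f in enumerate(freqs):
        if position <= cum_before[i] + f:
            chosen = i
            break

    # If pn - F is negative, use the group before the one just chosen.
    if p * n - cum_before[chosen] < 0 and chosen > 0:
        chosen -= 1

    # Step 2: lower limit plus the interpolated share of the class interval.
    return lower_limits[chosen] + (p * n - cum_before[chosen]) / freqs[chosen] * width

lower_limits = [9, 11, 13, 15, 17]
freqs = [3, 3, 5, 3, 1]

first_quartile = grouped_fractile(0.25, lower_limits, freqs, 2)
percentile_19 = grouped_fractile(0.19, lower_limits, freqs, 2)

print(round(first_quartile, 3))  # 11.5
print(round(percentile_19, 3))   # 10.9
```

For the 19th percentile the sketch first lands in 11-12.99, finds .19(15) - 3 negative, and moves back to 9-10.99, as the note describes.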

b. Finding a Fractile of Ungrouped Data.

This time when we compute $p(n+1)$, we divide it into an integer part, $a$, and a fractional part, $.b$. We then use the formula $x_{1-p} = x_a + .b(x_{a+1} - x_a)$ to find the actual value.

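Example: Suppose our set of numbers has $n = 10$ items and we wish to find the first quartile. Then $p = 0.25$, so that $p(n+1) = 0.25(11) = 2.75$. Then $a = 2$ and $.b = 0.75$. Now find $x_a$ and $x_{a+1}$, in this case $x_2$ and $x_3$ (the second and third numbers in order), and use the formula $x_{.75} = x_2 + 0.75(x_3 - x_2)$.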
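The ungrouped procedure can be sketched in Python as below. The ten numbers in `sample` are hypothetical values made up for illustration, since any ordered set of ten items gives the same position of 2.75. The sketch assumes the position is p(n + 1); clamping positions that fall outside the data to the lowest or highest item is an added assumption, and the function name is illustrative.

```python
import math

def ungrouped_fractile(p, values):
    """Fractile of ungrouped data: split p(n + 1) into a and .b, then interpolate."""
    xs = sorted(values)
    n = len(xs)
    position = p * (n + 1)
    a = math.floor(position)  # integer part
    b = position - a          # fractional part, written .b in the text

    # Assumption: positions outside 1..n are clamped to the end items.
    if a < 1:
        return xs[0]
    if a >= n:
        return xs[-1]

    # xs is 0-based, so item number a is xs[a - 1].
    return xs[a - 1] + b * (xs[a] - xs[a - 1])

# Hypothetical data set with n = 10.
sample = [2, 4, 5, 7, 9, 11, 12, 15, 18, 20]
first_quartile = ungrouped_fractile(0.25, sample)
print(first_quartile)  # 4.75

# With p = 0.5 the same rule reproduces the median examples.
print(ungrouped_fractile(0.5, [10000, 17000, 23000, 30000, 80000]))         # 23000.0
print(ungrouped_fractile(0.5, [10000, 17000, 23000, 27000, 30000, 80000]))  # 25000.0
```

Here the position is 2.75, so the result lies three quarters of the way from the second number (4) to the third (5).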

c. Experimental formula (Don't read this unless you are really ready to ask questions!) See 251dscr_A.

Document continues in 251descr2.

* Experimentation indicates that a better formula is . This is compatible with the formula for the median and seems to work in more places.