Numerical Descriptions of Data

  • Central Tendency
  • Variation
  • Shape
  • Empirical Rule
  • Relationship
  • Excel

1. Central Tendency – Center of data

Purpose: To give symbols for the population mean and sample mean used in the second Building Block of the course ((sample mean − population mean) ≠ 0).

  • Median
  • Mode
  • Mean or Average

1.1 Quantitative Data

1.1.1 Sample Mean –

  • Average of sample
  • SymbolX

1.1.2 Population Mean –

  • Average of population
  • Symbol, μ

1.2 Qualitative Data

Consider a case where you have three successes and two failures. The proportion of successes is then 3 out of 5, or 0.60. Let successes be represented by the value 1 and failures by the value 0. The data then consist of the five numbers 1, 1, 1, 0, 0. Averaging those five numbers also gives 3/5, or 0.60. The proportion is therefore a special case of the average: it is what you get when you average 0’s and 1’s.
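A minimal Python sketch of this idea, using the five 0/1 values from the paragraph above. The variable names are illustrative only and are not part of the course materials.

```python
# Code successes as 1 and failures as 0: three successes, two failures.
outcomes = [1, 1, 1, 0, 0]

# The proportion of successes is a count divided by the number of values.
proportion = outcomes.count(1) / len(outcomes)

# The ordinary average of the same 0/1 values gives the identical number.
average = sum(outcomes) / len(outcomes)

print(proportion, average)
```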

1.2.1 Sample proportion –

  • Proportion of successes in the sample
  • Symbol, p̂

1.2.2 Population proportion –

  • Proportion of successes in the population
  • Symbol, p

2. Variation – Spread of data

  • Range
  • Variance

2.1 Quantitative Data

2.1.1 Population Variance

  • Average squared distance of the values from the center of the data
  • Symbol, σ²

2.1.2 Sample Variance

  • Estimate of population variance
  • Symbol, S²

Standard Deviation – Purpose: This is the measure of variation we will use in the third Building Block. (The standard error depends on two values: a measure of variation and a measure of knowledge.)

2.1.3 Population standard deviation

  • Square root of population variance
  • Symbol, σ

2.1.4 Sample standard deviation

  • Square root of sample variance
  • Symbol, S

2.1.5 Example of use:

2.1.6 Calculation:

a. Calculate the average of the values.

b. Subtract the average from each value to see how far each value is from the average.

c. Square each difference.

d. Sum all the squared differences.

e. i. For the population divide the sum by the number of values.

ii. For the sample, divide by the number of values minus one.

f. To find the standard deviation take the square root of the average in e.

Both the population and the sample calculations use steps a-d and f. The difference between them occurs at step e.
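The steps a through f can be sketched in Python as follows. This is an illustrative sketch, not the course's Excel worksheet: the function name `variance_and_sd` and the five data values are hypothetical.

```python
import math

def variance_and_sd(values, sample=False):
    """Follow steps a-f; sample=True divides by n - 1 in step e."""
    n = len(values)
    mean = sum(values) / n                     # step a: average of the values
    deviations = [v - mean for v in values]    # step b: distance from the average
    squared = [d ** 2 for d in deviations]     # step c: square each difference
    total = sum(squared)                       # step d: sum the squared differences
    divisor = n - 1 if sample else n           # step e: n (population) or n - 1 (sample)
    variance = total / divisor
    return variance, math.sqrt(variance)       # step f: square root gives the standard deviation

data = [2, 4, 4, 5, 10]  # hypothetical values

pop_var, pop_sd = variance_and_sd(data)
samp_var, samp_sd = variance_and_sd(data, sample=True)

print("population:", pop_var, pop_sd)
print("sample:    ", samp_var, samp_sd)
```

With these values the sum of squared differences is 36, so the population variance is 36/5 and the sample variance is 36/4.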

2.1.7 Example: Calculate the population and sample standard deviations for a set of five numbers.

Double click on the embedded Excel file below. Press the F9 function key to get new examples. (When finished, click anywhere outside the worksheet.)

If the above embedded Excel file does not work, then go to this link: Variance and Standard Deviation Calculations Examples

2.2 Qualitative Data: The symbols for standard deviation and variance for the sample and population are the same as for quantitative data.

2.2.1 Population Variance: You may use the same rule as for quantitative data, or you can use a shortcut formula for the population variance. The variance in a population of 0’s and 1’s can be shown to be

σ² = p(1 − p)

2.2.2 Sample Variance: Unlike the sample variance for quantitative data, where you change the divisor from n to n − 1, the traditional approach for proportions is to multiply the sample proportion of successes by the sample proportion of failures:

S² = p̂(1 − p̂)
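A short Python check of the shortcut, assuming the same hypothetical 0/1 data as in the central tendency example (three successes, two failures). It compares the shortcut with the step-by-step variance rule, first dividing by n and then by n − 1.

```python
outcomes = [1, 1, 1, 0, 0]  # hypothetical 0/1 data
n = len(outcomes)
p = sum(outcomes) / n       # proportion of successes

# Long way: sum of squared distances from the mean.
sum_squares = sum((x - p) ** 2 for x in outcomes)
divide_by_n = sum_squares / n
divide_by_n_minus_1 = sum_squares / (n - 1)

# Shortcut for 0/1 data: proportion of successes times proportion of failures.
shortcut = p * (1 - p)

print(divide_by_n, divide_by_n_minus_1, shortcut)
```

The shortcut agrees with the divide-by-n calculation, not with the divide-by-(n − 1) calculation, which is the distinction drawn above for the sample variance of a proportion.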

3. Shape – Distribution of values

  • Right skewed
  • Left skewed
  • Symmetric
  • Relationship to mean and median
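As a common rule of thumb (a tendency, not a guarantee), the mean is pulled toward the long tail: it tends to exceed the median in right-skewed data, fall below it in left-skewed data, and sit close to it in symmetric data. The sketch below illustrates this with three small hypothetical data sets.

```python
import statistics

# Hypothetical data sets chosen only to illustrate each shape.
right_skewed = [1, 2, 2, 3, 12]    # long tail of large values
left_skewed = [1, 10, 11, 11, 12]  # long tail of small values
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right skewed", right_skewed),
                   ("left skewed", left_skewed),
                   ("symmetric", symmetric)]:
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```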

4. Empirical rule – particular distribution

Purpose: We introduce how to determine probabilities by rule instead of by counting. Probability is used in the second note of the third Building Block. (To evaluate an error we will use probability.)

The empirical rule is an approximation to the bell-shaped curve. It is graphed as a histogram whose ranges are based on the mean and standard deviation. If the distribution of the data is a bell curve, then approximately the following percentages will result; the actual percentages will differ.

Range / Approximate percentage
μ − 3σ up to μ − 2σ / 2.50%
μ − 2σ up to μ − σ / 13.50%
μ − σ up to μ / 34.00%
μ up to μ + σ / 34.00%
μ + σ up to μ + 2σ / 13.50%
μ + 2σ up to μ + 3σ / 2.50%
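To see how close the rule-of-thumb percentages are to an exact bell curve, the sketch below computes the area of each band under the normal curve. It assumes the bell curve is the normal distribution and uses the standard identity for the normal cumulative probability in terms of the error function; the helper name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative probability of a standard normal variable up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Bands expressed as numbers of standard deviations from the mean.
bands = [(-3, -2), (-2, -1), (-1, 0), (0, 1), (1, 2), (2, 3)]
approximate = [2.5, 13.5, 34.0, 34.0, 13.5, 2.5]  # percentages from the table above

# Exact percentage of a normal distribution falling in each band.
exact = [100 * (normal_cdf(hi) - normal_cdf(lo)) for lo, hi in bands]

for (lo, hi), approx, ex in zip(bands, approximate, exact):
    print(f"{lo:+d} to {hi:+d} sd: rule {approx:5.1f}%   normal curve {ex:5.2f}%")
```

The inner bands match the rule closely; the outermost bands differ the most, which is consistent with the percentages being approximate.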

If the embedded Excel file does not work, then go to the following link

Exercise: Using Internet Explorer answer the questions on the following web page. If you use Internet Explorer you may attempt this more than once by Backing up. (The page keeps track of the number of attempts.) Print the page when successful and bring it to class next time.

5. Relationship. Purpose: To introduce applications of means and variances when relating two variables.

5.1 Definitions

  • Straight line – mathematical versus tendency
  • Slope – the average amount of change in Y with a one unit increase in X
  • Intercept – the average value of Y when X is zero.
  • Correlation – the strength of the linear association between Y and X.
  • Coefficient of Determination – the percent of sample variation in Y associated with variation in X
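The sample versions of these quantities can be sketched in Python as follows. The sketch assumes the line is fitted by ordinary least squares (the method behind Excel's linear trendline), which these notes do not state explicitly, and the X and Y values are hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical X values
y = [2, 4, 5, 4, 5]  # hypothetical Y values
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

slope = sxy / sxx                    # average change in Y per one-unit increase in X
intercept = y_bar - slope * x_bar    # average value of Y when X is zero
r = sxy / math.sqrt(sxx * syy)       # strength of the linear association
r_squared = r ** 2                   # share of sample variation in Y associated with X

print(slope, intercept, r, r_squared)
```

For these values the fitted line is roughly Y = 2.2 + 0.6 X, and about 60% of the sample variation in Y is associated with variation in X.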

Use of the standard deviation, the correlation, the empirical rule and the slope (called beta) with stock price risk:

Definition of coefficient of determination in finance terms:

5.2 Symbols:

Term / Population / Sample
Intercept / β₀ / b₀
Slope / β₁ / b₁
Correlation / ρ / r
Coefficient of Determination / ρ² / R²

5.3 Examples

For interpreting sample slope and intercept double click the embedded Excel file below

or go to

For interpreting R2 Double click below

Or go to

6. Excel

6.1 Center and Variation Cell Formulas.

In the examples below you have data in cells A1 to A5

Statistic / Population / Sample
Mean / =AVERAGE(A1:A5) / =AVERAGE(A1:A5)
Median / =MEDIAN(A1:A5) / =MEDIAN(A1:A5)
Variance / =VARP(A1:A5) / =VAR(A1:A5)
Standard Deviation / =STDEVP(A1:A5) / =STDEV(A1:A5)
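For readers who want to check a worksheet outside Excel, the same statistics are available in Python's standard `statistics` module. The sketch below pairs each call with the cell formula it corresponds to; the five data values are a hypothetical stand-in for cells A1 to A5.

```python
import statistics

data = [3, 5, 7, 7, 8]  # hypothetical stand-in for cells A1 to A5

mean = statistics.mean(data)           # =AVERAGE(A1:A5)
median = statistics.median(data)       # =MEDIAN(A1:A5)
pop_var = statistics.pvariance(data)   # =VARP(A1:A5), divides by n
samp_var = statistics.variance(data)   # =VAR(A1:A5), divides by n - 1
pop_sd = statistics.pstdev(data)       # =STDEVP(A1:A5)
samp_sd = statistics.stdev(data)       # =STDEV(A1:A5)

print(mean, median, pop_var, samp_var, pop_sd, samp_sd)
```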

6.2 Pivot tables can also be used for quantitative variables to calculate means, variances and standard deviations (but not mode or median):

  • Click on the quantitative variable
  • Insert a Pivot table
  • Click and drag the quantitative variable down to the “Values” section
  • Right click over the “Count of” section and select Value Field Settings
  • Choose the description you want (average, variance, or standard deviation)

Note: (1) If you want more descriptions, drag the quantitative variable to “Values” repeatedly, choosing a different description each time.

(2) If you want the descriptions within the levels of a qualitative variable, then after dragging the quantitative variable to the “Values” section, drag the qualitative variable to the row (or column) section.
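The grouping described in note (2) can be mimicked in Python with the standard library alone. This is a rough analogue of the pivot table, not a reproduction of Excel's behavior: the rows and level names are hypothetical, and the sketch uses the sample (n − 1) versions of variance and standard deviation.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (level of a qualitative variable, quantitative value).
rows = [("A", 10), ("A", 14), ("B", 20), ("B", 22), ("B", 27)]

# Collect the quantitative values within each level.
groups = defaultdict(list)
for level, value in rows:
    groups[level].append(value)

# One row of descriptions per level, like a pivot table with several Values fields.
summary = {
    level: {
        "average": statistics.mean(values),
        "variance": statistics.variance(values),  # sample version (n - 1)
        "std_dev": statistics.stdev(values),      # sample version (n - 1)
    }
    for level, values in groups.items()
}

for level, stats in summary.items():
    print(level, stats)
```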

6.3 Relationship

  • Click the Insert tab
  • Click the XY Scatter button

On the Series tab choose X Range and Y Range

  • Right click any point
  • Click Add Trendline

At the bottom of the resulting window, check Display Equation and Display R-squared