Dr. Nafez M. Barakat
Lecture no.1
Descriptive measures
· Measure of center
o Mean
Definition: the mean of a data set is the sum of the observations divided by the number of observations, symbolized by $\bar{x}$: $\bar{x} = \frac{\sum x_i}{n}$
Example 1 (exam scores): let x1 = 88, x2 = 75, x3 = 95, and x4 = 100.
Compute the mean.
Solution: $\bar{x} = \frac{88 + 75 + 95 + 100}{4} = \frac{358}{4} = 89.5$
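As a quick check of Example 1, the definition can be applied directly in a few lines of Python. This is only a minimal sketch; the variable names `scores` and `x_bar` are illustrative and not part of the lecture.

```python
from statistics import mean

scores = [88, 75, 95, 100]  # exam scores from Example 1

# Mean = sum of the observations / number of observations
x_bar = sum(scores) / len(scores)

print(x_bar)         # 89.5
print(mean(scores))  # the standard library gives the same value
```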
o Median
Definition: the median of a data set is found as follows.
Arrange the data in increasing order.
- If the number of observations is odd, the median is the observation exactly in the middle of the ordered list.
- If the number of observations is even, the median is the mean of the two middle observations in the ordered list.
Example 2: weekly salaries
Find the median of each of the two weekly salary data sets in Table 1.1.
Table 1.1
Data set I (n = 13, odd): 300 / 300 / 300 / 940 / 300 / 300 / 400 / 300 / 400 / 450 / 800 / 450 / 1050
Data set II (n = 10, even): 300 / 300 / 940 / 450 / 400 / 400 / 300 / 300 / 1050 / 300
Solution: the median for data set I is $400, and for data set II it is $350.
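The two-case rule for the median can be sketched in Python as below. The helper name `median_by_definition` is hypothetical, and the two lists assume that the flattened Table 1.1 splits into 13 values for data set I and 10 values for data set II, which is consistent with the stated sample sizes and medians.

```python
from statistics import median

# Weekly salaries (dollars), as read from Table 1.1 (assumed split of the table)
data_set_1 = [300, 300, 300, 940, 300, 300, 400, 300, 400, 450, 800, 450, 1050]
data_set_2 = [300, 300, 940, 450, 400, 400, 300, 300, 1050, 300]

def median_by_definition(values):
    """Median via the ordered-list rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # odd n: the middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle ones

print(median_by_definition(data_set_1))  # 400
print(median_by_definition(data_set_2))  # 350.0
```

The standard library function `statistics.median` follows the same rule and can be used to cross-check the hand calculation.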
o Mode
Definition: the mode is the value that occurs most frequently in a data set
Example: find the mode of data set I in Table 1.1.
Solution: the frequency distribution of the data is shown in Table 1.2.
Table 1.2
Salary / 300 / 400 / 450 / 800 / 940 / 1050
Frequency / 6 / 2 / 2 / 1 / 1 / 1
The value 300 has the highest frequency (6), so the mode is $300.
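A frequency distribution like Table 1.2 can be built with `collections.Counter`, and the mode read off as the most frequent value. This is a small sketch; the list repeats data set I as read from Table 1.1, and the names `freq`, `mode_value`, and `mode_count` are illustrative.

```python
from collections import Counter

# Data set I from Table 1.1 (weekly salaries in dollars)
data_set_1 = [300, 300, 300, 940, 300, 300, 400, 300, 400, 450, 800, 450, 1050]

freq = Counter(data_set_1)                       # salary -> frequency
mode_value, mode_count = freq.most_common(1)[0]  # most frequent salary and its count

for salary in sorted(freq):
    print(salary, freq[salary])
print("mode:", mode_value, "frequency:", mode_count)  # mode: 300 frequency: 6
```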
NOTE: the relative position of the mean and the median indicates the shape of the distribution:
Right-skewed: median < mean
Symmetric: median = mean
Left-skewed: mean < median
· Measure of variation (measure of spread)
o Range
Definition: the range of the data set is the difference between its maximum and minimum observations : Range = Max - Min
o Standard deviation
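Definition: the standard deviation, denoted by s, is the square root of the average of the squared deviations from the arithmetic mean. For a sample, the sum of squared deviations is divided by n - 1:
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
or, equivalently (computing formula),
$s = \sqrt{\frac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1}}$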
o Variance
Definition: the variance equals the square of the standard deviation, $s^2$.
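Example: the heights, in inches, of the five players on team II are 67, 72, 76, 76, and 84. Obtain the standard deviation of these heights.
Solution: the mean is $\bar{x} = 375 / 5 = 75$ inches.
$x$ / $x - \bar{x}$ / $(x - \bar{x})^2$
67 / -8 / 64
72 / -3 / 9
76 / 1 / 1
76 / 1 / 1
84 / 9 / 81
Total / / 156
$s = \sqrt{\frac{156}{5 - 1}} = \sqrt{39} \approx 6.2$ inches
Try to use the formula to compute the standard deviation.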
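The height example can be reproduced in Python using both the deviation form and the shortcut (computing) form of the formula. The sketch below assumes the sample standard deviation, that is, a divisor of n - 1; the variable names are illustrative.

```python
import math
from statistics import stdev

heights = [67, 72, 76, 76, 84]  # heights in inches, team II
n = len(heights)
x_bar = sum(heights) / n        # 75.0

# Deviation form: sum of squared deviations from the mean
ss = sum((x - x_bar) ** 2 for x in heights)
s_def = math.sqrt(ss / (n - 1))  # assumes the sample (n - 1) divisor

# Computing form: sum of squares minus (sum)^2 / n
ss_comp = sum(x * x for x in heights) - sum(heights) ** 2 / n
s_comp = math.sqrt(ss_comp / (n - 1))

print(ss, ss_comp)      # both equal 156.0
print(round(s_def, 1))  # 6.2
```

Both forms agree with `statistics.stdev`, which also uses the n - 1 divisor.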
o Interquartile range
Definition: the interquartile range (IQR) is the difference between the third and first quartiles, that is,
IQR = Q3 – Q1
Example: find the IQR for these data
25 / 41 / 27 / 32 / 43 / 66 / 35 / 31 / 15 / 5 / 34 / 26 / 32 / 38 / 16 / 30 / 38 / 30 / 20 / 21
Solution: Q1 = 23, Q3 = 36.5
IQR = 36.5 - 23 = 13.5
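The quartiles in this example are consistent with taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data. The sketch below assumes that method and assumes the data list has 20 values (with 5 and 34 as separate observations); the helper `quartiles_by_halves` is hypothetical, and for an odd number of observations it leaves the middle value out of both halves, which is one convention among several.

```python
from statistics import median

data = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
        34, 26, 32, 38, 16, 30, 38, 30, 20, 21]

def quartiles_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # bottom half
    upper = ordered[(n + 1) // 2:]   # top half (middle value skipped when n is odd)
    return median(lower), median(upper)

q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print(q1, q3, iqr)  # 23.0 36.5 13.5
```

Library routines such as `statistics.quantiles` use interpolation conventions that can give slightly different quartiles for the same data, so small differences from the hand calculation are expected.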
Hypothesis Test for One Population Mean
One sample t test for population Mean
Definition : The One-Sample T Test compares the mean score of a sample to a known value. Usually, the known value is a population mean.
Definition: null hypothesis and alternative hypothesis
Null hypothesis: a hypothesis to be tested. We use the symbol H0 to represent the null hypothesis.
Alternative hypothesis: a hypothesis to be considered as an alternative to the null hypothesis. We use the symbol Ha to represent the alternative hypothesis.
Hypotheses:
Null: there is no significant difference between the sample mean and the population mean.
Alternative: there is a significant difference between the sample mean and the population mean.
We present two step-by-step procedures for performing a one-sample t test. Procedure (I) covers the critical-value approach, and Procedure (II) covers the p-value approach.
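· One sample t test for population mean
(critical-value approach)
Assumptions
1. Normal population or large sample
2. $\sigma$ unknown
Step 1: the null hypothesis is $H_0: \mu = \mu_0$ and the alternative hypothesis is $H_a: \mu \ne \mu_0$ (two-tailed), $H_a: \mu < \mu_0$ (left-tailed), or $H_a: \mu > \mu_0$ (right-tailed).
Step 2: decide on the significance level, $\alpha$.
Step 3: compute the value of the test statistic
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Step 4: the critical value(s) are
$\pm t_{\alpha/2}$ (two-tailed)
or $-t_{\alpha}$ (left-tailed)
or $t_{\alpha}$ (right-tailed)
with degrees of freedom df = n - 1.
Step 5: if the value of the test statistic falls in the rejection region, reject H0; otherwise, fail to reject H0.
Step 6: interpret the results of the hypothesis test.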
Example: the table below shows the pH levels of 15 lakes. Test whether the mean pH of the lakes is greater than 6 at the 5% significance level (use the critical-value approach).
7.2 / 7.3 / 6.1 / 6.9 / 6.6 / 7.3 / 6.3 / 5.5 / 6.3 / 6.5 / 5.7 / 6.9 / 6.7 / 7.9 / 5.8
Solution:
Step 1: state the null and alternative hypotheses
$H_0: \mu = 6$ (mean pH level is not greater than 6)
$H_a: \mu > 6$ (mean pH level is greater than 6)
Step 2: decide on the significance level, $\alpha = 0.05$
Step 3: compute the value of the test statistic. With $\bar{x} = 6.6$, $s \approx 0.672$, and n = 15,
$t = \frac{6.6 - 6}{0.672 / \sqrt{15}} \approx 3.458$
Step 4: the critical value for a right-tailed test is $t_{0.05} = 1.761$ (from the t table) with df = 15 - 1 = 14
Step 5: the value of the test statistic found in step 3, t = 3.458, falls in the rejection region (t > 1.761). Consequently, we reject H0.
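The critical-value procedure for the lake example can be sketched in Python with the standard library only. The code assumes the 15 pH readings listed in the example and uses 1.761 as the tabulated right-tail critical value for df = 14 at the 5% level; that constant comes from a standard t table, not from the code itself.

```python
import math
from statistics import mean, stdev

# pH levels of the 15 lakes
ph = [7.2, 7.3, 6.1, 6.9, 6.6, 7.3, 6.3, 5.5,
      6.3, 6.5, 5.7, 6.9, 6.7, 7.9, 5.8]

mu0 = 6.0           # hypothesized mean under H0
t_critical = 1.761  # t table value for alpha = 0.05, df = 14 (assumed from a table)

n = len(ph)
x_bar = mean(ph)
s = stdev(ph)                             # sample standard deviation (n - 1 divisor)
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

reject_h0 = t_stat > t_critical           # right-tailed rejection region
print(round(t_stat, 3), reject_h0)
```

The computed statistic is close to 3.458, which exceeds the critical value, so the sketch reaches the same decision as the worked solution.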
· One sample t test for population mean
(p-value approach)
Assumptions
1. Normal population or large sample
2. $\sigma$ unknown
Step 1: the null hypothesis is $H_0: \mu = \mu_0$ and the alternative hypothesis is $H_a: \mu \ne \mu_0$ (two-tailed), $H_a: \mu < \mu_0$ (left-tailed), or $H_a: \mu > \mu_0$ (right-tailed).
Step 2: decide on the significance level, $\alpha$.
Step 3: compute the value of the test statistic
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Step 4: find the p-value by using the t table with degrees of freedom df = n - 1.
Step 5: if the p-value is less than or equal to $\alpha$ ($P \le \alpha$), reject H0; otherwise, fail to reject H0.
Step 6: interpret the results of the hypothesis test.
Example: using the pH levels of the 15 lakes from the previous example, test whether the mean pH of the lakes is greater than 6 at the 5% significance level (use the p-value approach).
Solution:
Step 1: state the null and alternative hypotheses
$H_0: \mu = 6$ (mean pH level is not greater than 6)
$H_a: \mu > 6$ (mean pH level is greater than 6)
Step 2: decide on the significance level, $\alpha = 0.05$
Step 3: compute the value of the test statistic, $t = \frac{6.6 - 6}{0.672 / \sqrt{15}} \approx 3.458$
Step 4: the p-value = P(t >= 3.458) = 0.00192 (with df = 15 - 1 = 14)
Step 5: the p-value (0.00192) is less than 0.05, so we reject H0.
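When no t table or statistics package is at hand, the right-tail p-value can be approximated by integrating the t density numerically. The sketch below uses Simpson's rule with the standard library only; the helper names `t_pdf` and `upper_tail_p` are hypothetical, and the result is a numerical approximation rather than an exact table value.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0: 0.5 minus Simpson's-rule integral on [0, t]."""
    h = t / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat, df, alpha = 3.458, 14, 0.05        # values from the lake example
p_value = upper_tail_p(t_stat, df)
print(p_value, p_value <= alpha)           # roughly 0.0019, so H0 is rejected
```

For a two-tailed test the same tail area would be doubled.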
Interval Estimation
Interval Estimation of a Population Mean: $\sigma$ Unknown
· Interval Estimate
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
where 1 - $\alpha$ = the confidence coefficient
$t_{\alpha/2}$ = the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 1 degrees of freedom
s = the sample standard deviation
n = sample size
Example:
Suppose that a sample of employees' salaries gives the following information: n = 10, mean = $550, standard deviation = $60. We want to construct a 95% confidence interval for the population mean, assuming the population is normally distributed.
Solution:
At 95% confidence, 1 - $\alpha$ = .95, $\alpha$ = .05, and $\alpha/2$ = .025.
$t_{.025}$ is based on n - 1 = 10 - 1 = 9 degrees of freedom.
In the t distribution table we see that $t_{.025}$ = 2.262.
Interval estimate of the population mean:
$\bar{x} \pm t_{.025} \frac{s}{\sqrt{n}} = 550 \pm 2.262 \cdot \frac{60}{\sqrt{10}} = 550 \pm 42.92$
or $507.08 to $592.92
We are 95% confident that the mean salary of the population is between $507.08 and $592.92.
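The interval arithmetic for the salary example can be verified with a short Python sketch. The t value 2.262 is taken from the t table as in the solution; the variable names are illustrative.

```python
import math

n = 10           # sample size
x_bar = 550.0    # sample mean salary (dollars)
s = 60.0         # sample standard deviation (dollars)
t_table = 2.262  # t value for alpha/2 = .025 with 9 degrees of freedom (from the table)

margin = t_table * s / math.sqrt(n)          # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))  # 42.92 507.08 592.92
```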
Using the SPSS program
Example 1: use SPSS to perform the hypothesis test in the previous example.
Step 1: enter the data as shown below
Step 3: the result is shown below
Example 2: use the SPSS file called training to test whether the mean training time equals 60 days; also find a 95% confidence interval for the population mean.
Solution:
Step 1: state the null and alternative hypotheses
$H_0: \mu = 60$ (mean training time equals 60 days)
$H_a: \mu \ne 60$ (mean training time does not equal 60 days)
Step 2: decide on the significance level, $\alpha = 0.05$
Step 3: compute the value of the test statistic; from the SPSS output, t = -3.482
Step 4: the p-value = 2 * P(t >= 3.482) = 0.004 (with df = 15 - 1 = 14)
Step 5: the value of the test statistic found in step 3, t = -3.482, falls in the rejection region, that is, outside the interval (-2.14, 2.14). Consequently, we reject H0.
Alternatively, the p-value = 0.004 < 0.05, so we reject H0.
SPSS output:
95% confidence interval for the population mean
SPSS output
95% Confidence Interval for Mean / Lower Bound / 50.09
95% Confidence Interval for Mean / Upper Bound / 57.65
= [50.09, 57.65]
NOTE that the test value 60 is not included in the confidence interval, so we reject the null hypothesis.
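The link between the two-tailed test and the confidence interval can be illustrated with the reported bounds alone. This sketch only uses the two numbers from the SPSS output; the midpoint and half-width are derived here for illustration and are not values quoted in the lecture.

```python
lower, upper = 50.09, 57.65  # 95% confidence bounds reported by SPSS
test_value = 60              # hypothesized mean training time (days)

sample_mean = (lower + upper) / 2  # the interval is centred on the sample mean
margin = (upper - lower) / 2       # half-width = t_{alpha/2} * s / sqrt(n)

# A two-tailed test at level alpha rejects H0 exactly when the test value
# lies outside the (1 - alpha) confidence interval.
reject_h0 = not (lower <= test_value <= upper)
print(reject_h0)  # True
```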
NONPARAMETRIC TEST
Use Sign Test (Binomial Test)