STATISTICS FOR SOCIAL & BEHAVIORAL SCIENCES

Recitation, week 7

Probabilities
Sampling Distribution

Part 1: Probabilities

Remember the following probability rules:

  1. For any event A: P(not A) = 1 – P(A)
  2. If the events A and B cannot happen at the same time, i.e. P(A and B) = 0, then P(A or B) = P(A) + P(B)
  3. For any two events A and B: P(A and B) = P(A) P(B|A)

To make our lives easier, we add one more probability rule (the law of total probability):

  4. P(A) = P(A|B) P(B) + P(A|not B) P(not B)

We read P(A|B) as “the probability of A given B” or “the probability of A conditional on B”.
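The rules above can be checked on a small two-way table. The sketch below uses made-up counts for 100 outcomes (the numbers are an assumption for illustration, not data from the course) and exact fractions, so each identity holds without rounding.

```python
from fractions import Fraction

# Hypothetical counts for 100 outcomes, cross-classified by A and B.
counts = {
    ("A", "B"): 10,
    ("A", "not B"): 20,
    ("not A", "B"): 30,
    ("not A", "not B"): 40,
}
total = sum(counts.values())

# Joint and marginal probabilities read off the table.
p_a_and_b = Fraction(counts[("A", "B")], total)
p_a_and_not_b = Fraction(counts[("A", "not B")], total)
p_a = p_a_and_b + p_a_and_not_b  # two non-overlapping events, so we add
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)
p_not_b = 1 - p_b  # complement rule

# Conditional probabilities: restrict the table to the conditioning event.
p_b_given_a = Fraction(counts[("A", "B")],
                       counts[("A", "B")] + counts[("A", "not B")])
p_a_given_b = Fraction(counts[("A", "B")],
                       counts[("A", "B")] + counts[("not A", "B")])
p_a_given_not_b = Fraction(counts[("A", "not B")],
                           counts[("A", "not B")] + counts[("not A", "not B")])

# Multiplication rule: P(A and B) = P(A) P(B|A)
multiplication_ok = p_a_and_b == p_a * p_b_given_a

# Added rule: P(A) = P(A|B) P(B) + P(A|not B) P(not B)
total_probability_ok = p_a == p_a_given_b * p_b + p_a_given_not_b * p_not_b

print("P(A) =", p_a)
print("multiplication rule holds:", multiplication_ok)
print("total probability rule holds:", total_probability_ok)
```

Changing the four counts to any other positive values should leave both checks true, since the rules hold for any events.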

Solve the following exercise from last week’s quiz:

A judge in Switzerland realizes that out of 100 individuals convicted, 20 are innocent, and out of 100 individuals not convicted, 50 are innocent.
The ministry of justice realizes that a judge can thus make two kinds of errors: convicting an innocent individual, or not convicting an individual who is not innocent.
The ministry is therefore interested in the probability of conviction given that an individual is not innocent. Out of 100 individuals facing trial, 40 are convicted.
1. What is the probability of conviction given that the individual is not innocent?

2. We want to quantify the probability that, colloquially, a judge makes mistakes. There are two kinds of mistakes: a judge can either convict an innocent individual or fail to convict a guilty one. What is the probability of conviction given innocence, and what is the probability of non-conviction given that the individual is guilty?
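One possible way to check your answers to both questions is sketched below. It reads the exercise as giving P(innocent | convicted) = 20/100, P(innocent | not convicted) = 50/100 and P(convicted) = 40/100, and it treats "guilty" as the same event as "not innocent" (an assumption about the wording). The conditional probabilities come from rearranging the multiplication rule into P(B|A) = P(A and B) / P(A).

```python
from fractions import Fraction

# Quantities given in the exercise.
p_conv = Fraction(40, 100)                 # P(convicted)
p_inn_given_conv = Fraction(20, 100)       # P(innocent | convicted)
p_inn_given_not_conv = Fraction(50, 100)   # P(innocent | not convicted)

# Complement rule.
p_not_conv = 1 - p_conv

# Total probability: P(innocent) summed over convicted / not convicted.
p_inn = p_inn_given_conv * p_conv + p_inn_given_not_conv * p_not_conv
p_guilty = 1 - p_inn  # "guilty" is taken to mean "not innocent"

# Question 1: P(convicted | guilty) = P(guilty and convicted) / P(guilty)
p_conv_given_guilty = (1 - p_inn_given_conv) * p_conv / p_guilty

# Question 2, first error: P(convicted | innocent)
p_conv_given_inn = p_inn_given_conv * p_conv / p_inn

# Question 2, second error: P(not convicted | guilty)
p_not_conv_given_guilty = 1 - p_conv_given_guilty

print("P(convicted | guilty)     =", p_conv_given_guilty, "=", round(float(p_conv_given_guilty), 3))
print("P(convicted | innocent)   =", p_conv_given_inn, "=", round(float(p_conv_given_inn), 3))
print("P(not convicted | guilty) =", p_not_conv_given_guilty, "=", round(float(p_not_conv_given_guilty), 3))
```

Working with exact fractions keeps the intermediate values (such as P(innocent) = 19/50) visible, which makes it easier to compare against a hand calculation.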

Part 2: Sampling Distribution, Central Limit Theorem

Exercise #1

The distribution of family size in a particular tribal society is skewed to the right, with µ = 5.2 and σ = 3.0. These values are unknown to an anthropologist, who samples families to estimate mean family size. For a random sample of 36 families, she gets a mean of 4.6 and a standard deviation of 3.2.

  • (a) Identify the population distribution. State its mean and standard deviation.
  • (b) Identify the sample data distribution. State its mean and standard deviation.
  • (c) Identify the sampling distribution of the sample mean ȳ. State its mean and standard error and explain what it describes.
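For part (c), a short simulation may help show what the sampling distribution describes. The sketch below computes the standard error σ/√n from the values in the exercise, then repeatedly draws samples of 36 from a right-skewed population with the same mean and standard deviation. The gamma shape of that population is purely an assumption for illustration (the exercise only says "skewed to the right", and real family sizes are whole numbers).

```python
import math
import random
import statistics

MU, SIGMA, N = 5.2, 3.0, 36  # population mean, population sd, sample size

# Theoretical standard error of the sample mean.
standard_error = SIGMA / math.sqrt(N)

# Assumed population: a gamma distribution matched to MU and SIGMA.
# For a gamma, mean = shape * scale and variance = shape * scale**2.
shape = (MU / SIGMA) ** 2
scale = SIGMA ** 2 / MU

rng = random.Random(7)  # fixed seed so the run is reproducible


def sample_mean(n):
    """Draw one random sample of size n and return its mean."""
    return statistics.fmean(rng.gammavariate(shape, scale) for _ in range(n))


# Repeat the anthropologist's study many times and record each sample mean.
sample_means = [sample_mean(N) for _ in range(5000)]

sim_mean = statistics.fmean(sample_means)
sim_se = statistics.stdev(sample_means)

print("theoretical standard error:", standard_error)
print("mean of simulated sample means:", round(sim_mean, 2))
print("sd of simulated sample means:", round(sim_se, 2))
```

The simulated sample means should centre near µ and vary with a standard deviation near σ/√n, even though the population itself is skewed. The anthropologist's single result of 4.6 is one draw from this distribution.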

Exercise #2

At a university, 60% of the 7400 students are female. The student newspaper reports results of a survey of a random sample of 50 students about various topics involving alcohol abuse, such as participation in binge drinking. They report that their sample contained 26 females.

(a) Explain how you can set up a variable y to represent gender.

(b) Identify the population distribution of gender at this university.

(c) Identify the sample data distribution of gender for this sample.

(d) The sampling distribution of the sample proportion of females in the sample is approximately a normal distribution with mean 0.60 and standard error 0.07. Explain what this means.
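The standard error quoted in part (d) can be reproduced with the usual formula for a proportion, √(p(1 − p)/n). The sketch below codes gender as y = 1 for female and y = 0 otherwise (one possible setup for part (a)) and ignores the fact that the sample is drawn without replacement from a finite population of 7400, which is an assumption that changes the result very little here.

```python
import math

P_FEMALE = 0.60        # population proportion of females
N = 50                 # sample size
FEMALES_IN_SAMPLE = 26

# Sample data distribution: 26 ones (female) and 24 zeros (not female).
sample = [1] * FEMALES_IN_SAMPLE + [0] * (N - FEMALES_IN_SAMPLE)
sample_proportion = sum(sample) / len(sample)

# Standard error of the sample proportion for samples of size N.
standard_error = math.sqrt(P_FEMALE * (1 - P_FEMALE) / N)

# How many standard errors the observed proportion lies from 0.60.
z = (sample_proportion - P_FEMALE) / standard_error

print("sample proportion:", sample_proportion)
print("standard error:", round(standard_error, 2))
print("z:", round(z, 2))
```

The observed proportion falls a little more than one standard error below 0.60, which is not unusual for a normal distribution.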

Exercise #3

According to the U.S. Census Bureau, in 2000 the number of people in a household had a mean of 2.6 and a standard deviation of 1.5. Suppose the Census Bureau instead had estimated this mean using a random sample of 225 homes, and that sample had a mean of 2.4 and standard deviation of 1.4.

  • (a) Identify the variable y.
  • (b) Describe the center and spread of the population distribution.
  • (c) Describe the center and spread of the sample data distribution.
  • (d) Describe the center and spread of the sampling distribution of the sample mean for 225 homes. What does that distribution describe?
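For part (d), the spread of the sampling distribution is the standard error σ/√n. A minimal sketch of that arithmetic, using only the values stated in the exercise, also shows where the hypothetical sample mean of 2.4 would fall:

```python
import math

POP_MEAN, POP_SD = 2.6, 1.5   # population mean and standard deviation
N = 225                       # number of homes in the sample
SAMPLE_MEAN = 2.4             # mean observed in the sample

# Spread of the sampling distribution of the sample mean.
standard_error = POP_SD / math.sqrt(N)

# Distance of the observed sample mean from the population mean,
# measured in standard errors.
z = (SAMPLE_MEAN - POP_MEAN) / standard_error

print("standard error:", round(standard_error, 3))
print("z:", round(z, 2))
```

The sampling distribution is centred at 2.6, the same as the population, but is far narrower than the population distribution because each value is an average over 225 homes.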