6.3 Sampling Distribution of the Sample Proportion

The Central Limit Theorem tells us that the distribution of the sample means follows a normal distribution under the right conditions. This allows us to answer probability questions about the sample mean [latex]\overline{x}[/latex]. Now we want to investigate the sampling distribution of another important statistic: the sample proportion. Once we know what distribution the sample proportions follow, we can answer probability questions about sample proportions.

A proportion is the percent, fraction, or ratio of a sample or population that has a characteristic of interest. The population proportion is denoted by [latex]p[/latex] and the sample proportion is denoted by [latex]\hat{p}[/latex].

If the random variable is discrete, such as for categorical data, then the parameter we wish to estimate is the population proportion. This is, of course, the probability of drawing a success in any one random draw. Because we are interested in the number of successes, we are dealing with the binomial distribution. The random variable [latex]X[/latex] is the number of successes and the parameter we wish to know is [latex]p[/latex], the probability of drawing a success, which is of course the proportion of successes in the population. What is the distribution of the sample proportion [latex]\hat{p}[/latex]?

THE CENTRAL LIMIT THEOREM FOR SAMPLE PROPORTIONS

Suppose all samples of size [latex]n[/latex] are taken from a population with proportion [latex]p[/latex]. The collection of sample proportions forms a probability distribution called the sampling distribution of the sample proportion.

The mean of the distribution of the sample proportions, denoted [latex]\mu_{\hat{p}}[/latex], equals the population proportion. [latex]\begin{eqnarray*} \mu_{\hat{p}} & = & p \end{eqnarray*}[/latex]
The standard deviation of the distribution of the sample proportions (called the standard error of the proportion), denoted [latex]\sigma_{\hat{p}}[/latex], is [latex]\begin{eqnarray*} \sigma_{\hat{p}} & = & \sqrt{\frac{p \times (1-p)}{n}} \end{eqnarray*}[/latex]
The distribution of the sample proportion is:
- Normal if [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex].
- Binomial if either [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex].
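
As a minimal sketch of the rules above, the following Python snippet computes the mean and standard error of the sample proportion and checks the two conditions. The function names `standard_error` and `distribution_type` are illustrative and not part of this text, and the values [latex]n=100[/latex] and [latex]p=0.4[/latex] are hypothetical.

```python
import math


def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def distribution_type(p, n):
    """Apply the rule of thumb from this section: normal when both
    n * p >= 5 and n * (1 - p) >= 5, otherwise binomial."""
    if n * p >= 5 and n * (1 - p) >= 5:
        return "normal"
    return "binomial"


# Hypothetical values chosen only for illustration.
n, p = 100, 0.4
mean = p  # the mean of the sample proportions equals p
se = standard_error(p, n)
print(distribution_type(p, n), mean, round(se, 4))
```

Keeping the condition check in its own function makes it easy to decide which probability calculation to use before doing any arithmetic.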

When [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], the central limit theorem states that the sampling distribution of the sample proportions follows a normal distribution. In this case the normal distribution can be used to answer probability questions about sample proportions and the [latex]z[/latex]-score for the sampling distribution of the sample proportions is

[latex]\begin{eqnarray*} z & = & \frac{\hat{p}-p}{\sqrt{\frac{p \times (1-p)}{n}}} \end{eqnarray*}[/latex]

where [latex]\hat{p}[/latex] is the sample proportion, [latex]p[/latex] is the population proportion, and [latex]n[/latex] is the sample size.
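
As a worked illustration, take the hypothetical values [latex]p=0.4[/latex], [latex]n=100[/latex], and [latex]\hat{p}=0.45[/latex]. These numbers are chosen only for this example and do not come from the text. Both conditions hold because [latex]100 \times 0.4=40 \geq 5[/latex] and [latex]100 \times (1-0.4)=60 \geq 5[/latex]. Subtracting the population proportion from the sample proportion and dividing by the standard error of the proportion gives: [latex]\begin{eqnarray*} z & = & \frac{0.45-0.4}{\sqrt{\frac{0.4 \times (1-0.4)}{100}}} \\ \\ & = & \frac{0.05}{0.0490} \\ \\ & \approx & 1.02 \end{eqnarray*}[/latex]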

CALCULATING PROBABILITIES ABOUT SAMPLE PROPORTIONS IN EXCEL (NORMAL)

When the distribution of the sample proportions follows a normal distribution (when [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex]), the norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],logic operator) function can be used to calculate probabilities associated with a sample proportion.

For x, enter the value for [latex]\hat{p}[/latex].
For [latex]\mu[/latex], enter the mean of the sample proportions [latex]p[/latex]. Because the mean of the sample proportions equals the proportion of the population the sample is taken from, we enter [latex]p[/latex], the population proportion.
For [latex]\sigma[/latex], enter the standard error of the proportion [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex].
For the logic operator, enter true. Note: Because we are calculating the area under the curve, we always enter true for the logic operator.
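
Outside Excel, the same cumulative calculation can be sketched in Python. This is an illustration rather than part of the Excel workflow described here: `statistics.NormalDist(mu, sigma).cdf(x)` from the Python standard library plays the role of norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],true). The helper name `prob_phat_below` and the values [latex]p=0.4[/latex] and [latex]n=100[/latex] are hypothetical.

```python
import math
from statistics import NormalDist


def prob_phat_below(phat, p, n):
    """P(sample proportion <= phat) under the normal approximation.

    Mirrors norm.dist(phat, p, sqrt(p*(1-p)/n), true).
    """
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return NormalDist(mu=p, sigma=se).cdf(phat)


# Hypothetical values: p = 0.4, n = 100.
below = prob_phat_below(0.45, 0.4, 100)  # P(phat <= 0.45)
above = 1 - below                        # P(phat >= 0.45)
print(round(below, 4), round(above, 4))
```

As with norm.dist, a "greater than" probability is obtained by subtracting the cumulative value from 1.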

NOTE

In this case, we want to calculate probabilities associated with a sample proportion. The sample proportions follow a normal distribution (under the right conditions), which allows us to use the norm.dist function to calculate probabilities. Because we are working with sample proportions, we must enter the mean and the standard deviation of the distribution of the sample proportions into the norm.dist function. The mean of the sample proportions equals the population proportion, so we enter the value of [latex]p[/latex] into the second field of the norm.dist function. The standard deviation of the sample proportions equals [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex], so we must enter this value into the third field of the norm.dist function.

We use the norm.dist function in the same way as we learned previously to calculate the probability a sample proportion is less than a given value, a sample proportion is greater than a given value, or a sample proportion is in between two given values.

An alternative approach in Excel is to use the norm.s.dist(z,true) function. In the norm.s.dist function, we enter the [latex]z[/latex]-score for the corresponding value of [latex]\hat{p}[/latex] (using the [latex]z[/latex]-score for sample proportions given above).
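
The two approaches give the same probability, which the following Python sketch checks numerically. It is an illustration only: `NormalDist()` with no arguments is the standard normal distribution and stands in for norm.s.dist, the function name `z_score` is made up for this example, and the values [latex]p=0.4[/latex], [latex]n=100[/latex], and [latex]\hat{p}=0.45[/latex] are hypothetical.

```python
import math
from statistics import NormalDist


def z_score(phat, p, n):
    """z-score of a sample proportion: (phat - p) / sqrt(p*(1-p)/n)."""
    return (phat - p) / math.sqrt(p * (1 - p) / n)


# Hypothetical values: p = 0.4, n = 100, phat = 0.45.
phat, p, n = 0.45, 0.4, 100

# Route 1: like norm.s.dist(z, true), using the standard normal.
via_z = NormalDist().cdf(z_score(phat, p, n))

# Route 2: like norm.dist(phat, p, se, true).
via_x = NormalDist(p, math.sqrt(p * (1 - p) / n)).cdf(phat)

print(math.isclose(via_z, via_x))
```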

EXAMPLE

A recent study asked working adults if they worked most of their time remotely. The study found that 30% of employees spend the majority of their time working remotely. Suppose a sample of 150 working adults is taken.

What is the distribution of the sample proportion? Explain.
What are the mean and standard deviation of the sample proportion?
What is the probability that at most 27% of the workers in the sample work remotely most of the time?
What is the probability that at least 51 of the workers in the sample work remotely most of the time?
What is the probability that between 32% and 35% of the workers in the sample work remotely most of the time?

Solution:

[latex]n=150[/latex] and [latex]p=0.3[/latex]. Checking [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex]: [latex]\begin{eqnarray*} n \times p & = & 150 \times 0.3=45 \geq 5 \\ \\ n \times (1-p) & = & 150 \times (1-0.3)=105 \geq 5 \end{eqnarray*}[/latex] Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], the distribution of the sample proportion is normal.
The mean of the distribution of the sample proportions is [latex]\mu_{\hat{p}}=0.3[/latex]. The standard deviation of the sample proportions is [latex]\displaystyle{\sigma_{\hat{p}}=\sqrt{\frac{p \times (1-p)}{n}}=\sqrt{\frac{0.3 \times (1-0.3)}{150}}=0.0374}[/latex].

Function	norm.dist	Answer
Field 1	0.27	0.2113
Field 2	0.3
Field 3	sqrt(0.3*(1-0.3)/150)
Field 4	true

Function	1-norm.dist	Answer
Field 1	0.34	0.1425
Field 2	0.3
Field 3	sqrt(0.3*(1-0.3)/150)
Field 4	true

Function	norm.dist	-norm.dist	Answer
Field 1	0.35	0.32	0.2058
Field 2	0.3	0.3
Field 3	sqrt(0.3*(1-0.3)/150)	sqrt(0.3*(1-0.3)/150)
Field 4	true	true
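
The three tables above can be cross-checked outside Excel. The sketch below is an illustration that uses Python's standard-library `statistics.NormalDist` in place of norm.dist. The variable names are made up for this example. Rounded to four decimal places, it should reproduce the answers 0.2113, 0.1425, and 0.2058.

```python
import math
from statistics import NormalDist

n, p = 150, 0.3
se = math.sqrt(p * (1 - p) / n)  # standard error, about 0.0374
phat_dist = NormalDist(mu=p, sigma=se)

# At most 27% of the sample: P(phat <= 0.27)
at_most_27pct = phat_dist.cdf(0.27)

# At least 51 workers: 51/150 = 0.34, so P(phat >= 0.34)
at_least_51 = 1 - phat_dist.cdf(51 / n)

# Between 32% and 35%: P(0.32 <= phat <= 0.35)
between = phat_dist.cdf(0.35) - phat_dist.cdf(0.32)

print(round(at_most_27pct, 4), round(at_least_51, 4), round(between, 4))
```

Note that the count of 51 workers is converted to the proportion 51/150 = 0.34 before it is passed to the normal distribution.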

TRY IT

According to a recent study, 17.5% of the adult population of Canada are smokers. Suppose a random sample of 200 adult Canadians is taken.

What is the distribution of the sample proportion? Explain.
What are the mean and standard deviation of the sample proportion?
What is the probability that fewer than 32 of the adults in the sample are smokers?
What is the probability that more than 20% of the adults in the sample are smokers?
What is the probability that between 34 and 44 of the adults in the sample are smokers?

Function	norm.dist	Answer
Field 1	0.16	0.2883
Field 2	0.175
Field 3	sqrt(0.175*(1-0.175)/200)
Field 4	true

Function	1-norm.dist	Answer
Field 1	0.2	0.1761
Field 2	0.175
Field 3	sqrt(0.175*(1-0.175)/200)
Field 4	true

Function	norm.dist	-norm.dist	Answer
Field 1	0.22	0.17	0.5268
Field 2	0.175	0.175
Field 3	sqrt(0.175*(1-0.175)/200)	sqrt(0.175*(1-0.175)/200)
Field 4	true	true

When one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex], the sampling distribution of the sample proportions follows a binomial distribution, and so we must use the binomial distribution to answer probability questions about sample proportions. In these cases, we are actually answering probability questions about the number of items with the characteristic of interest, [latex]x[/latex]. In other words, we are answering questions about the number of successes [latex]x[/latex] we get in [latex]n[/latex] trials (the sample size) where the probability of success is the population proportion [latex]p[/latex]. These are exactly the same type of questions we answered previously with the binomial distribution.

CALCULATING PROBABILITIES ABOUT SAMPLE PROPORTIONS IN EXCEL (BINOMIAL)

When the distribution of the sample proportions follows a binomial distribution (when one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex]), the binom.dist(x,n,p,logic operator) function can be used to calculate probabilities associated with a sample proportion.

For x, enter the number of items with the characteristic of interest [latex]x[/latex].
For n, enter the sample size [latex]n[/latex]. The sample size is the number of trials in the binomial experiment.
For p, enter the population proportion [latex]p[/latex]. The population proportion is the probability of success.
For the logic operator, enter true. Note: Because probabilities for sample proportions are generally inequalities ([latex]\lt, \leq, \gt, \geq[/latex]), we enter true for the logic operator. We would only enter false when we want the probability that the sample proportion exactly equals a given value.
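
For readers working outside Excel, the binomial calculation can be sketched in Python with only the standard library. This is an illustration, not the text's method: the helper names `binom_cdf` and `binom_pmf` are made up here to mirror binom.dist with the logic operator set to true and false respectively, and the values [latex]n=20[/latex] and [latex]p=0.1[/latex] are hypothetical.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); mirrors binom.dist(x, n, p, true)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def binom_pmf(x, n, p):
    """P(X = x); mirrors binom.dist(x, n, p, false)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


# Hypothetical values: n = 20 trials, success probability p = 0.1.
n, p = 20, 0.1
print(round(binom_cdf(2, n, p), 4))      # P(X <= 2)
print(round(1 - binom_cdf(2, n, p), 4))  # P(X > 2), the same as P(X >= 3)
print(round(binom_pmf(2, n, p), 4))      # P(X = 2)
```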

NOTE

We use the binom.dist function in the same way as we learned previously to calculate the probability a sample proportion is less than a given value, a sample proportion is at most a given value, a sample proportion is greater than a given value, or a sample proportion is at least a given value.

EXAMPLE

At the local humane society, 3% of the dogs have heartworm disease. Suppose a sample of 60 dogs at the humane society is taken.

What is the distribution of the sample proportion? Explain.
What is the probability that at most 5% of the dogs in the sample have heartworm disease?
What is the probability that fewer than 7 of the dogs in the sample have heartworm disease?
What is the probability that more than 8% of the dogs in the sample have heartworm disease?
What is the probability that at least 6 of the dogs in the sample have heartworm disease?

Solution:

Because [latex]n \times p=60 \times 0.03=1.8 \lt 5[/latex] the distribution of the sample proportions is binomial.
We want to find [latex]P(\hat{p} \leq 0.05)[/latex]. Because we are using the binomial distribution, we have to convert 5% into the number of items [latex]x[/latex] in the sample with the required characteristic: [latex]x=0.05 \times 60=3[/latex]. In terms of the binomial distribution, we need to find [latex]P(x \leq 3)[/latex].

Function	binom.dist	Answer
Field 1	3	0.8943
Field 2	60
Field 3	0.03
Field 4	true

Function	binom.dist	Answer
Field 1	6	0.9979
Field 2	60
Field 3	0.03
Field 4	true

Function	1-binom.dist	Answer
Field 1	4	0.0340
Field 2	60
Field 3	0.03
Field 4	true

Function	1-binom.dist	Answer
Field 1	5	0.0091
Field 2	60
Field 3	0.03
Field 4	true
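
The four tables above can be cross-checked with a short Python sketch. It is an illustration only: `binom_cdf` is a made-up helper that stands in for binom.dist with the logic operator set to true, and percentage thresholds are converted to counts with integer arithmetic, which is a choice made for this example. Rounded to four decimal places, the results should match the answers 0.8943, 0.9979, 0.0340, and 0.0091.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


n, p = 60, 0.03

# Convert percentage thresholds to whole-number counts using integer
# arithmetic (percent * n // 100) to avoid floating-point surprises.
at_most_5pct = 5 * n // 100        # phat <= 5%  means  x <= 3
more_than_8pct = 8 * n // 100 + 1  # phat > 8%   means  x >= 5

p_a = binom_cdf(at_most_5pct, n, p)            # P(x <= 3)
p_b = binom_cdf(6, n, p)                       # fewer than 7: P(x <= 6)
p_c = 1 - binom_cdf(more_than_8pct - 1, n, p)  # P(x >= 5) = 1 - P(x <= 4)
p_d = 1 - binom_cdf(5, n, p)                   # at least 6: 1 - P(x <= 5)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

The "more than 8%" case shows why the conversion matters: 8% of 60 is 4.8, which is not a whole number of dogs, so the event is the same as having at least 5 dogs with the disease.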

TRY IT

During the past tax season, 92% of tax returns were filed using an electronic filing system. Suppose a sample of 40 tax returns is selected.

What is the distribution of the sample proportions?
What is the probability that at most 35 of the tax returns in the sample were filed electronically?
What is the probability that less than 93% of the tax returns in the sample were filed electronically?
What is the probability that more than 36 of the tax returns in the sample were filed electronically?
What is the probability that at least 88% of the tax returns in the sample were filed electronically?

Function	binom.dist	Answer
Field 1	35	0.2132
Field 2	40
Field 3	0.92
Field 4	true

Function	binom.dist	Answer
Field 1	37	0.6306
Field 2	40
Field 3	0.92
Field 4	true

Function	1-binom.dist	Answer
Field 1	36	0.6007
Field 2	40
Field 3	0.92
Field 4	true

Function	1-binom.dist	Answer
Field 1	35	0.7868
Field 2	40
Field 3	0.92
Field 4	true

Concept Review

The distribution of the sample proportions follows a

normal distribution if both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex].
binomial distribution if either [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex].

The mean of the sample proportions [latex]\mu_{\hat{p}}[/latex] equals the population proportion [latex]p[/latex]. The standard deviation of the sample proportions [latex]\sigma_{\hat{p}}[/latex] is equal to [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex] where [latex]p[/latex] is the population proportion and [latex]n[/latex] is the sample size.