The Central Limit Theorem tells us that the distribution of the sample means follows a normal distribution under the right conditions. This allows us to answer probability questions about the sample mean [latex]\overline{x}[/latex]. Now we want to investigate the sampling distribution of another important statistic: the sample proportion. Once we know what distribution the sample proportions follow, we can answer probability questions about sample proportions.
A proportion is the percent, fraction, or ratio of a sample or population that has a characteristic of interest. The population proportion is denoted by [latex]p[/latex] and the sample proportion is denoted by [latex]\hat{p}[/latex].
If the random variable is discrete, such as for categorical data, then the parameter we wish to estimate is the population proportion. This is the probability of drawing a success in any one random draw. Because we are interested in the number of successes, we are dealing with the binomial distribution. The random variable [latex]X[/latex] is the number of successes and the parameter we wish to know is [latex]p[/latex], the probability of drawing a success, which is the proportion of successes in the population. What is the distribution of the sample proportion [latex]\hat{p}[/latex]?
Suppose all samples of size [latex]n[/latex] are taken from a population with proportion [latex]p[/latex]. The collection of sample proportions forms a probability distribution called the sampling distribution of the sample proportion.
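The mean of the distribution of the sample proportions, denoted [latex]\mu_{\hat{p}}[/latex], equals the population proportion. [latex]\begin{eqnarray*} \mu_{\hat{p}} & = & p \end{eqnarray*}[/latex]
The standard deviation of the distribution of the sample proportions, denoted [latex]\sigma_{\hat{p}}[/latex], is [latex]\begin{eqnarray*} \sigma_{\hat{p}} & = & \sqrt{\frac{p \times (1-p)}{n}} \end{eqnarray*}[/latex]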
When [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], the central limit theorem states that the sampling distribution of the sample proportions follows a normal distribution. In this case the normal distribution can be used to answer probability questions about sample proportions and the [latex]z[/latex]-score for the sampling distribution of the sample proportions is
[latex]\begin{eqnarray*} z & = & \frac{\hat{p}-p}{\sqrt{\frac{p \times (1-p)}{n}}} \end{eqnarray*}[/latex]
where [latex]\hat{p}[/latex] is the sample proportion, [latex]p[/latex] is the population proportion and [latex]n[/latex] is the sample size.
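The standard deviation and [latex]z[/latex]-score of a sample proportion can be sketched in a few lines of Python. This is only an illustration: the function names `sample_proportion_sd` and `z_score` are made up for this sketch, and the numbers are borrowed from the remote-work example in this section.

```python
import math

def sample_proportion_sd(p, n):
    """Standard deviation of the sample proportions: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def z_score(p_hat, p, n):
    """z-score of a sample proportion p_hat.

    Only meaningful when n * p >= 5 and n * (1 - p) >= 5.
    """
    return (p_hat - p) / sample_proportion_sd(p, n)

# Population proportion 0.3, sample size 150, observed sample proportion 0.27
sigma = sample_proportion_sd(0.3, 150)
z = z_score(0.27, 0.3, 150)
print(round(sigma, 4))  # 0.0374
print(round(z, 4))
```

A negative [latex]z[/latex]-score simply means the sample proportion lies below the population proportion.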
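When the distribution of the sample proportions follows a normal distribution (when [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex]), the norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],logic operator) function can be used to calculate probabilities associated with a sample proportion. Here x is the value of the sample proportion [latex]\hat{p}[/latex], [latex]\mu[/latex] is the mean of the sample proportions [latex]p[/latex], and [latex]\sigma[/latex] is the standard deviation of the sample proportions [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex].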
In this case, we want to calculate probabilities associated with a sample proportion. The sample proportions follow a normal distribution (under the right conditions), which allows us to use the norm.dist function to calculate probabilities. Because we are working with sample proportions, we must enter the mean and the standard deviation of the distribution of the sample proportions into the norm.dist function. The mean of the sample proportions equals the population proportion, so we enter the value of [latex]p[/latex] into the second field of the norm.dist function. The standard deviation of the sample proportions equals [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex], so we must enter this value into the third field of the norm.dist function.
We use the norm.dist function in the same way as we learned previously to calculate the probability a sample proportion is less than a given value, a sample proportion is greater than a given value, or a sample proportion is in between two given values.
An alternative approach in Excel is to use the norm.s.dist(z,true) function. In the norm.s.dist function, we enter the [latex]z[/latex]-score for the corresponding value of [latex]\hat{p}[/latex].
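For readers working outside Excel, the two approaches can be mirrored with Python's standard-library `statistics.NormalDist` class. The sketch below is an assumption-laden illustration, not part of the Excel workflow: `norm_dist` and `norm_s_dist` are hypothetical wrapper names chosen to echo the Excel functions, and the values of `p`, `n`, and `p_hat` are made up.

```python
from math import sqrt
from statistics import NormalDist

def norm_dist(x, mu, sigma):
    """Cumulative probability P(X <= x), like norm.dist(x, mu, sigma, true)."""
    return NormalDist(mu, sigma).cdf(x)

def norm_s_dist(z):
    """Cumulative standard normal probability, like norm.s.dist(z, true)."""
    return NormalDist().cdf(z)

# Hypothetical values chosen only for illustration
p, n, p_hat = 0.5, 100, 0.45
sigma = sqrt(p * (1 - p) / n)             # standard deviation of the sample proportions

direct = norm_dist(p_hat, p, sigma)       # enter p_hat, mean, and standard deviation
via_z = norm_s_dist((p_hat - p) / sigma)  # enter the z-score instead

print(round(direct, 4), round(via_z, 4))
```

Both routes give the same probability, because the [latex]z[/latex]-score just rescales [latex]\hat{p}[/latex] to the standard normal distribution.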
A recent study asked working adults if they worked most of their time remotely. The study found that 30% of employees spend the majority of their time working remotely. Suppose a sample of 150 working adults is taken.
Solution:
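Because [latex]n \times p=150 \times 0.3=45 \geq 5[/latex] and [latex]n \times (1-p)=150 \times 0.7=105 \geq 5[/latex], the sample proportions follow a normal distribution. The mean of the sample proportions is [latex]\mu_{\hat{p}}=p=0.3[/latex]. The standard deviation of the sample proportions is [latex]\displaystyle{\sigma_{\hat{p}}=\sqrt{\frac{p \times (1-p)}{n}}=\sqrt{\frac{0.3 \times (1-0.3)}{150}}=0.0374}[/latex].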
Function | norm.dist | Answer |
Field 1 | 0.27 | 0.2113 |
Field 2 | 0.3 | |
Field 3 | sqrt(0.3*(1-0.3)/150) | |
Field 4 | true |
Function | 1-norm.dist | Answer |
Field 1 | 0.34 | 0.1425 |
Field 2 | 0.3 | |
Field 3 | sqrt(0.3*(1-0.3)/150) | |
Field 4 | true |
Function | norm.dist | -norm.dist | Answer |
Field 1 | 0.35 | 0.32 | 0.2058 |
Field 2 | 0.3 | 0.3 | |
Field 3 | sqrt(0.3*(1-0.3)/150) | sqrt(0.3*(1-0.3)/150) | |
Field 4 | true | true |
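As a cross-check, the three norm.dist calculations for the remote-work example can be reproduced in Python. This is a minimal sketch that assumes the standard-library `NormalDist` class as a stand-in for Excel's norm.dist with the logic operator set to true; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 150
# Distribution of the sample proportions: mean p, sd sqrt(p(1-p)/n)
dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

below_27 = dist.cdf(0.27)                        # norm.dist(0.27, ...)
above_34 = 1 - dist.cdf(0.34)                    # 1 - norm.dist(0.34, ...)
between_32_35 = dist.cdf(0.35) - dist.cdf(0.32)  # norm.dist(0.35, ...) - norm.dist(0.32, ...)

print(round(below_27, 4), round(above_34, 4), round(between_32_35, 4))
```

The printed values should agree with the answers in the tables above to four decimal places.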
According to a recent study, 17.5% of the adult population of Canada are smokers. Suppose a random sample of 200 adult Canadians is taken.
Function | norm.dist | Answer |
Field 1 | 0.16 | 0.2883 |
Field 2 | 0.175 | |
Field 3 | sqrt(0.175*(1-0.175)/200) | |
Field 4 | true |
Function | 1-norm.dist | Answer |
Field 1 | 0.2 | 0.1761 |
Field 2 | 0.175 | |
Field 3 | sqrt(0.175*(1-0.175)/200) | |
Field 4 | true |
Function | norm.dist | -norm.dist | Answer |
Field 1 | 0.22 | 0.17 | 0.5268 |
Field 2 | 0.175 | 0.175 | |
Field 3 | sqrt(0.175*(1-0.175)/200) | sqrt(0.175*(1-0.175)/200) | |
Field 4 | true | true |
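The same kind of calculation can also be done through [latex]z[/latex]-scores, mirroring the norm.s.dist route. The sketch below applies that route to the smoking example; the helper `z` and the variable names are illustrative, and `NormalDist()` with no arguments is assumed as the Python counterpart of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.175, 200
sigma = sqrt(p * (1 - p) / n)  # standard deviation of the sample proportions
std_normal = NormalDist()      # standard normal: mean 0, standard deviation 1

def z(p_hat):
    """z-score of a sample proportion for this example."""
    return (p_hat - p) / sigma

below_16 = std_normal.cdf(z(0.16))
above_20 = 1 - std_normal.cdf(z(0.20))
between_17_22 = std_normal.cdf(z(0.22)) - std_normal.cdf(z(0.17))

print(round(below_16, 4), round(above_20, 4), round(between_17_22, 4))
```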
When one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex], the sampling distribution of the sample proportions follows a binomial distribution, and so we must use the binomial distribution to answer probability questions about sample proportions. In these cases, we are actually answering probability questions about the number of items with the characteristic of interest, [latex]x[/latex]. In other words, we are answering questions about the number of successes [latex]x[/latex] we get in [latex]n[/latex] trials (the sample size) where the probability of success is the population proportion [latex]p[/latex]. These are exactly the same type of questions we answered previously with the binomial distribution.
When the distribution of the sample proportions follows a binomial distribution (when one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex]), the binom.dist(x,n,p,logic operator) function can be used to calculate probabilities associated with a sample proportion.
We use the binom.dist function in the same way as we learned previously to calculate the probability a sample proportion is less than a given value, a sample proportion is at most a given value, a sample proportion is greater than a given value, or a sample proportion is at least a given value.
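A rough Python counterpart of binom.dist can be written with the standard-library `math.comb` function. This is a sketch under stated assumptions: the function name `binom_dist` and its `cumulative` flag are chosen here to echo the Excel function and its logic operator, and the values of `n` and `p` are made up.

```python
from math import comb

def binom_dist(x, n, p, cumulative=True):
    """P(X <= x) if cumulative, else P(X = x), for X ~ Binomial(n, p)."""
    def pmf(k):
        # Probability of exactly k successes in n trials
        return comb(n, k) * p**k * (1 - p)**(n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# Hypothetical values chosen only for illustration
n, p = 10, 0.5
at_most_2 = binom_dist(2, n, p)        # P(x <= 2)
less_than_2 = binom_dist(1, n, p)      # P(x < 2) = P(x <= 1)
more_than_2 = 1 - binom_dist(2, n, p)  # P(x > 2)
at_least_2 = 1 - binom_dist(1, n, p)   # P(x >= 2) = 1 - P(x <= 1)

print(at_most_2, less_than_2, more_than_2, at_least_2)
```

Because the binomial distribution is discrete, "less than" and "at least" questions shift the count by one before it is entered, exactly as in Excel.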
At the local humane society, 3% of the dogs have heartworm disease. Suppose a sample of 60 dogs at the humane society is taken.
Solution:
Because [latex]n \times p=60 \times 0.03=1.8 \lt 5[/latex], the sample proportions follow a binomial distribution. We want to find [latex]P(\hat{p} \leq 0.05)[/latex]. Because we are using the binomial distribution, we have to convert 5% into the number of items [latex]x[/latex] in the sample with the required characteristic: [latex]x=0.05 \times 60=3[/latex]. In terms of the binomial distribution, we need to find [latex]P(x \leq 3)[/latex].
Function | binom.dist | Answer |
Field 1 | 3 | 0.8943 |
Field 2 | 60 | |
Field 3 | 0.03 | |
Field 4 | true |
Function | binom.dist | Answer |
Field 1 | 6 | 0.9979 |
Field 2 | 60 | |
Field 3 | 0.03 | |
Field 4 | true |
Function | 1-binom.dist | Answer |
Field 1 | 4 | 0.0340 |
Field 2 | 60 | |
Field 3 | 0.03 | |
Field 4 | true |
Function | 1-binom.dist | Answer |
Field 1 | 5 | 0.0091 |
Field 2 | 60 | |
Field 3 | 0.03 | |
Field 4 | true |
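The four binom.dist calculations for the heartworm example can be checked with a short Python sketch. It assumes a hand-written cumulative binomial function, here called `binom_cdf` (a name invented for this illustration), in place of Excel's binom.dist with the logic operator set to true.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 60, 0.03

# Convert the sample proportion 5% into a count of dogs: 0.05 * 60 = 3
x = round(0.05 * n)

p_x_le_3 = binom_cdf(x, n, p)      # binom.dist(3, 60, 0.03, true)
p_x_le_6 = binom_cdf(6, n, p)      # binom.dist(6, 60, 0.03, true)
p_x_gt_4 = 1 - binom_cdf(4, n, p)  # 1 - binom.dist(4, 60, 0.03, true)
p_x_gt_5 = 1 - binom_cdf(5, n, p)  # 1 - binom.dist(5, 60, 0.03, true)

print(round(p_x_le_3, 4), round(p_x_le_6, 4), round(p_x_gt_4, 4), round(p_x_gt_5, 4))
```

The printed values should agree with the answers in the tables above to four decimal places.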
During the past tax season, 92% of tax returns were filed using an electronic filing system. Suppose a sample of 40 tax returns is selected.
Function | binom.dist | Answer |
Field 1 | 35 | 0.2132 |
Field 2 | 40 | |
Field 3 | 0.92 | |
Field 4 | true |
Function | binom.dist | Answer |
Field 1 | 37 | 0.6306 |
Field 2 | 40 | |
Field 3 | 0.92 | |
Field 4 | true |
Function | 1-binom.dist | Answer |
Field 1 | 36 | 0.6007 |
Field 2 | 40 | |
Field 3 | 0.92 | |
Field 4 | true |
Function | 1-binom.dist | Answer |
Field 1 | 33 | 0.9624 |
Field 2 | 40 | |
Field 3 | 0.92 | |
Field 4 | true |
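The distribution of the sample proportions follows a normal distribution when [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], and a binomial distribution when one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex].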
The mean of the sample proportions [latex]\mu_{\hat{p}}[/latex] equals the population proportion [latex]p[/latex]. The standard deviation of the sample proportions [latex]\sigma_{\hat{p}}[/latex] is equal to [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex] where [latex]p[/latex] is the population proportion and [latex]n[/latex] is the sample size.
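The choice between the two distributions can be summarized in one small Python function. This is a sketch only: the function name `prob_p_hat_at_most` is hypothetical, it handles only "at most" questions, and it uses the [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex] rule from this section to decide which distribution to use.

```python
from math import comb, floor, sqrt
from statistics import NormalDist

def prob_p_hat_at_most(p_hat, n, p):
    """Return (distribution used, P(sample proportion <= p_hat))."""
    if n * p >= 5 and n * (1 - p) >= 5:
        # Normal case: mean p, standard deviation sqrt(p(1-p)/n)
        sigma = sqrt(p * (1 - p) / n)
        return "normal", NormalDist(p, sigma).cdf(p_hat)
    # Binomial case: convert the proportion into a count of successes.
    # The small tolerance guards against floating-point round-off.
    x = floor(p_hat * n + 1e-9)
    prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))
    return "binomial", prob

# Remote-work example (normal) and heartworm example (binomial)
print(prob_p_hat_at_most(0.27, 150, 0.30))
print(prob_p_hat_at_most(0.05, 60, 0.03))
```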