Introduction
It is important to understand when to use the central limit theorem (CLT). If you are asked to find the probability of a mean, use the CLT for means. If you are asked to find the probability of a sum or total, use the CLT for sums. The same rule applies to percentiles for means and sums.
NOTE
If you are asked to find the probability of an individual value, do not use the CLT. Use the distribution of its random variable.
Examples of the Central Limit Theorem
Law of Large Numbers
The law of large numbers says that if you take samples of larger and larger sizes from any population, then the mean of the samples tends to get closer and closer to μ. From the central limit theorem, we know that as n gets larger and larger, the sample means follow an approximately normal distribution. The larger n gets, the smaller the standard deviation of the sample mean gets. (Remember that the standard deviation for $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$.) This means that the sample mean $\bar{x}$ must be close to the population mean μ. We can say that μ is the value that the sample means approach as n gets larger. The central limit theorem illustrates the law of large numbers.
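The shrinking standard deviation of the sample mean can be sketched numerically. The snippet below is a minimal illustration, not part of the original text: the population standard deviation of 10 and the helper name `standard_error` are assumptions chosen only to show how σ/√n falls as n grows.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)


# Hypothetical population standard deviation, chosen only for illustration.
sigma = 10.0

for n in (1, 25, 100, 10_000):
    print(f"n = {n:>6}: standard deviation of the sample mean = "
          f"{standard_error(sigma, n):.3f}")
```

Each hundredfold increase in n cuts the spread of the sample mean by a factor of ten, which is why the sample mean settles near μ.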
Central Limit Theorem for the Mean and Sum Examples
Example 7.8
A study involving stress is conducted among the students on a college campus. The stress scores follow a uniform distribution with the lowest stress score equal to one and the highest equal to five. Using a sample of 75 students, find:
- the probability that the mean stress score for the 75 students is less than 2
- the 90th percentile for the mean stress score for the 75 students
- the probability that the total of the 75 stress scores is less than 200
- the 90th percentile for the total stress score for the 75 students
Let X = one stress score.
Problems (a) and (b) ask you to find a probability or a percentile for a mean. Problems (c) and (d) ask you to find a probability or a percentile for a total or sum. The sample size, n, is equal to 75.
Because the individual stress scores follow a uniform distribution, X ~ U(1, 5) where a = 1 and b = 5 (see Continuous Random Variables for an explanation of a uniform distribution),
$\mu_X = \frac{a + b}{2} = \frac{1 + 5}{2} = 3$ and $\sigma_X = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(5 - 1)^2}{12}} = 1.15$.
In the formula for the standard deviation, the denominator is always 12, regardless of the endpoints of the uniform distribution.
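As a quick check, the mean and standard deviation of a uniform distribution on [a, b] can be computed from the standard formulas (a + b)/2 and √((b − a)²/12). This is a small sketch using the endpoints from this example; the variable names are illustrative.

```python
from math import sqrt

a, b = 1, 5  # lowest and highest stress scores in the example

mu = (a + b) / 2                 # mean of U(a, b)
sigma = sqrt((b - a) ** 2 / 12)  # standard deviation of U(a, b)

print(f"mean = {mu}, standard deviation = {sigma:.2f}")
```

The unrounded standard deviation is slightly larger than 1.15; the example works with the rounded value.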
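For problems (a) and (b), let $\bar{X}$ = the mean stress score for the 75 students. Then, $\bar{X} \sim N\left(3, \frac{1.15}{\sqrt{75}}\right)$.
a. Find $P(\bar{x} < 2)$. Draw the graph.
a. $P(\bar{x} < 2) = 0$
The probability that the mean stress score is less than 2 is about zero.
normalcdf(1, 2, 3, 1.15/√75) = 0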
Reminder
The smallest stress score is one.
b. Find the 90th percentile for the mean of 75 stress scores. Draw a graph.
b. Let k = the 90th percentile.
Find k, where $P(\bar{x} < k) = 0.90$.
The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90 percent of all the means of 75 stress scores are at most 3.2, and that 10 percent are at least 3.2.
invNorm(0.90, 3, 1.15/√75) = 3.2
For problems (c) and (d), let ΣX = the sum of the 75 stress scores. Then, $\Sigma X \sim N\left[(75)(3), (\sqrt{75})(1.15)\right]$.
c. Find $P(\Sigma x < 200)$. Draw the graph.
c. The mean of the sum of 75 stress scores is (75)(3) = 225.
The standard deviation of the sum of 75 stress scores is $(\sqrt{75})(1.15) = 9.96$.
The probability that the total of 75 scores is less than 200 is about zero (approximately 0.006).
normalcdf(75, 200, (75)(3), (√75)(1.15)) ≈ 0.006
Reminder
The smallest total of 75 stress scores is 75, because the smallest single score is one.
d. Find the 90th percentile for the total of 75 stress scores. Draw a graph.
d. Let k = the 90th percentile.
Find k where $P(\Sigma x < k) = 0.90$.
The 90th percentile for the sum of 75 scores is about 237.8. This tells us that 90 percent of all the sums of 75 scores are no more than 237.8 and 10 percent are no less than 237.8.
invNorm(0.90, (75)(3), (√75)(1.15)) = 237.8
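The four calculator steps in this example can also be reproduced in software. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for normalcdf and invNorm; the variable names are illustrative, and σ is taken as the rounded value 1.15 used in the text.

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 75, 3, 1.15  # sigma is rounded, as in the text

mean_dist = NormalDist(mu, sigma / sqrt(n))     # distribution of the sample mean
sum_dist = NormalDist(n * mu, sqrt(n) * sigma)  # distribution of the sum

# (a) P(mean < 2), with lower bound 1 as in normalcdf(1, 2, ...)
p_mean_below_2 = mean_dist.cdf(2) - mean_dist.cdf(1)
# (b) 90th percentile for the mean
mean_p90 = mean_dist.inv_cdf(0.90)
# (c) P(sum < 200), with lower bound 75 as in normalcdf(75, 200, ...)
p_sum_below_200 = sum_dist.cdf(200) - sum_dist.cdf(75)
# (d) 90th percentile for the sum
sum_p90 = sum_dist.inv_cdf(0.90)

print(f"(a) {p_mean_below_2:.4f}")
print(f"(b) {mean_p90:.1f}")
print(f"(c) {p_sum_below_200:.4f}")
print(f"(d) {sum_p90:.1f}")
```

The only difference between the mean and the sum is which normal distribution is built: the mean divides σ by √n, while the sum multiplies both μ and σ by the factors shown.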
Try It 7.8
Use the information in Example 7.8, but use a sample size of 55 to answer the following questions:
- Find $P(\bar{x} < 7)$.
- Find P(Σx > 170).
- Find the 80th percentile for the mean of 55 scores.
- Find the 85th percentile for the sum of 55 scores.
Example 7.9
Suppose that a market research analyst for a cell phone company conducts a study of their customers who exceed the time allowance included on their basic cell phone contract. The analyst finds that for those people who exceed the time included in their basic contract, the excess time used follows an exponential distribution with a mean of 22 minutes.
Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.
Let X = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance.
$X \sim Exp\left(\frac{1}{22}\right)$. From previous chapters, we know that μ = 22 and σ = 22.
Let $\bar{X}$ = the mean excess time used by a sample of n = 80 customers who exceed their contracted time allowance.
$\bar{X} \sim N\left(22, \frac{22}{\sqrt{80}}\right)$ by the central limit theorem for sample means.
- Find the probability that the mean excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find $P(\bar{x} > 20)$. Draw the graph.
- Suppose that one customer who exceeds the time limit for his cell phone contract is randomly selected. Find the probability that this individual customer's excess time is longer than 20 minutes. This is asking us to find P(x > 20).
- Explain why the probabilities in parts (a) and (b) are different.
a. Find $P(\bar{x} > 20)$.
$P(\bar{x} > 20) = 0.7919$ using normalcdf(20, 1E99, 22, 22/√80).
The probability is 0.7919 that the mean excess time used is more than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance.
Reminder: 1E99 = $10^{99}$ and –1E99 = $-10^{99}$. Press the EE key for E. Or just use 10^99 instead of 1E99.
b. Find P(x > 20). Remember to use the exponential distribution for an individual: $X \sim Exp\left(\frac{1}{22}\right)$.
$P(x > 20) = e^{-\left(\frac{1}{22}\right)(20)} = e^{-(0.04545)(20)} = 0.4029$
c. Compare the two results:
- P(x > 20) = 0.4029, but $P(\bar{x} > 20) = 0.7919$.
- The probabilities are not equal because we use different distributions to calculate the probability for individuals and for means.
- When asked to find the probability of an individual value, use the stated distribution of its random variable; do not use the CLT. Use the CLT with the normal distribution when you are asked to find the probability for a mean.
Using the CLT to find percentiles
Let k = the 95th percentile. Find k where $P(\bar{x} < k) = 0.95$.
k = 26.0 using invNorm(0.95, 22, 22/√80) = 26.0
The 95th percentile for the sample mean excess time used is about 26.0 minutes for random samples of 80 customers who exceed their contracted time allowance.
Ninety-five percent of such samples would have means under 26 minutes; only five percent of such samples would have means above 26 minutes.
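The contrast between the individual and the sample mean in this example can be sketched in a few lines. The code below is an illustration using Python's standard library, with `statistics.NormalDist` standing in for normalcdf and invNorm; the variable names are assumptions made for this sketch.

```python
from math import exp, sqrt
from statistics import NormalDist

mu = sigma = 22  # exponential distribution: mean and standard deviation are both 22
n = 80

# Sample mean: approximately normal by the central limit theorem.
mean_dist = NormalDist(mu, sigma / sqrt(n))
p_mean_over_20 = 1 - mean_dist.cdf(20)

# Individual customer: use the exponential distribution directly.
# For X ~ Exp(1/22), P(X > x) = e^(-x/22).
p_individual_over_20 = exp(-20 / mu)

# 95th percentile of the sample mean.
k95 = mean_dist.inv_cdf(0.95)

print(f"P(mean > 20)       = {p_mean_over_20:.4f}")
print(f"P(individual > 20) = {p_individual_over_20:.4f}")
print(f"95th percentile    = {k95:.1f}")
```

The two probabilities differ because the sample mean is far less variable than a single observation, so more of its distribution lies above 20.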
Try It 7.9
Use the information in Example 7.9, but change the sample size to 144.
- Find $P(20 < \bar{x} < 30)$.
- Find P(Σx is at least 3000).
- Find the 75th percentile for the sample mean excess time of 144 customers.
- Find the 85th percentile for the sum of 144 excess times used by customers.
Example 7.10
U.S. scientists studying a certain medical condition discovered that a new person is diagnosed every two minutes, on average. Suppose the standard deviation is 0.5 minutes and the sample size is 100.
- Find the median, the first quartile, and the third quartile for the sample mean time of diagnosis in the United States.
- Find the median, the first quartile, and the third quartile for the sum of sample times of diagnosis in the United States.
- Find the probability that a diagnosis occurs on average between 1.75 and 1.85 minutes.
- Find the value that is two standard deviations above the sample mean.
- Find the IQR for the sum of the sample times.
a. We have $\mu_{\bar{x}} = \mu = 2$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{10} = 0.05$. Therefore,
- 50th percentile = $\mu_{\bar{x}}$ = μ = 2,
- 25th percentile = invNorm(0.25,2,0.05) = 1.97, and
- 75th percentile = invNorm(0.75,2,0.05) = 2.03.
b. We have $\mu_{\Sigma x} = n(\mu_x) = 100(2) = 200$ and $\sigma_{\Sigma x} = \sqrt{n}(\sigma_x) = 10(0.5) = 5$. Therefore,
- 50th percentile = $\mu_{\Sigma x} = n(\mu_x) = 100(2) = 200$,
- 25th percentile = invNorm(0.25,200,5) = 196.63, and
- 75th percentile = invNorm(0.75,200,5) = 203.37.
c. $P(1.75 < \bar{x} < 1.85)$ = normalcdf(1.75,1.85,2,0.05) = 0.0013
d. Using the z-score equation, $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$, and solving for $\bar{x}$ with z = 2, we get $\bar{x}$ = 2(0.05) + 2 = 2.1.
e. The IQR is 75th percentile – 25th percentile = 203.37 – 196.63 = 6.74.
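The quartile and probability calculations in this example can be checked with a short script. This is a sketch using Python's `statistics.NormalDist` in place of the calculator functions; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2, 0.5, 100

mean_dist = NormalDist(mu, sigma / sqrt(n))     # N(2, 0.05)
sum_dist = NormalDist(n * mu, sqrt(n) * sigma)  # N(200, 5)

# First quartile, median, third quartile for the mean and for the sum.
mean_quartiles = [mean_dist.inv_cdf(p) for p in (0.25, 0.50, 0.75)]
sum_quartiles = [sum_dist.inv_cdf(p) for p in (0.25, 0.50, 0.75)]

# P(1.75 < sample mean < 1.85)
p_between = mean_dist.cdf(1.85) - mean_dist.cdf(1.75)

# Value two standard deviations above the mean of the sample means.
two_sd_above = mu + 2 * mean_dist.stdev

# Interquartile range for the sum.
iqr_sum = sum_quartiles[2] - sum_quartiles[0]

print([round(q, 2) for q in mean_quartiles])
print([round(q, 2) for q in sum_quartiles])
print(round(p_between, 4), round(two_sd_above, 2), round(iqr_sum, 2))
```

Because the normal distribution is symmetric, the median equals the mean in both cases, and the quartiles sit the same distance on either side of it.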
Try It 7.10
Based on data from the National Health Survey, women between the ages of 18 and 24 have an average systolic blood pressure (in mm Hg) of 114.8 with a standard deviation of 13.1. Systolic blood pressure for women between the ages of 18 and 24 follows a normal distribution.
- If one woman from this population is randomly selected, find the probability that her systolic blood pressure is greater than 120.
- If 40 women from this population are randomly selected, find the probability that their mean systolic blood pressure is greater than 120.
- If the sample was four women between the ages of 18–24 and we did not know the original distribution, could the central limit theorem be used?
Example 7.11
A study was done about a medical condition that affects a certain group of people. The age range of the people was 14–61. The mean age was 30.9 years with a standard deviation of nine years.
- In a sample of 25 people, what is the probability that the mean age of the people is less than 35?
- Is it likely that the mean age of the sample group could be more than 50 years? Interpret the results.
- In a sample of 49 people, what is the probability that the sum of the ages is no less than 1,600?
- Is it likely that the sum of the ages of the 49 people is at most 1,595? Interpret the results.
- Find the 95th percentile for the sample mean age of 65 people. Interpret the results.
- Find the 90th percentile for the sum of the ages of 65 people. Interpret the results.
a. $P(\bar{x} < 35)$ = normalcdf(–1E99,35,30.9,1.8) = 0.9886
b. $P(\bar{x} > 50)$ = normalcdf(50,1E99,30.9,1.8) ≈ 0. For this sample group, it is almost impossible for the group's average age to be more than 50. However, it is still possible for an individual in this group to have an age greater than 50.
c. P(Σx ≥ 1,600) = normalcdf(1600,1E99,1514.10,63) = 0.0864
d. P(Σx ≤ 1,595) = normalcdf(–1E99,1595,1514.10,63) = 0.9005. This means that there is a 90 percent chance that the sum of the ages for the sample group n = 49 is at most 1,595.
e. The 95th percentile = invNorm(0.95,30.9,1.1) = 32.7. This indicates that 95 percent of samples of 65 people have a mean age below 32.7 years.
f. The 90th percentile = invNorm(0.90,2008.5,72.56) = 2101.5. This indicates that 90 percent of samples of 65 people have a sum of ages less than 2,101.5 years.
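Each part of this example uses a different sample size, so the normal distribution has to be rebuilt for each one. The sketch below makes that explicit with two small helper functions, `mean_dist` and `sum_dist`, which are names invented for this illustration; `statistics.NormalDist` stands in for normalcdf and invNorm.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 30.9, 9  # population mean age and standard deviation


def mean_dist(n: int) -> NormalDist:
    """Approximate distribution of the sample mean for samples of size n."""
    return NormalDist(mu, sigma / sqrt(n))


def sum_dist(n: int) -> NormalDist:
    """Approximate distribution of the sum for samples of size n."""
    return NormalDist(n * mu, sqrt(n) * sigma)


part_a = mean_dist(25).cdf(35)        # P(mean < 35), n = 25
part_b = 1 - mean_dist(25).cdf(50)    # P(mean > 50), n = 25
part_c = 1 - sum_dist(49).cdf(1600)   # P(sum >= 1600), n = 49
part_d = sum_dist(49).cdf(1595)       # P(sum <= 1595), n = 49
part_e = mean_dist(65).inv_cdf(0.95)  # 95th percentile of the mean, n = 65
part_f = sum_dist(65).inv_cdf(0.90)   # 90th percentile of the sum, n = 65

for label, value in zip("abcdef", (part_a, part_b, part_c, part_d, part_e, part_f)):
    print(f"({label}) {value:.4f}")
```

Part (e) uses the unrounded standard deviation 9/√65 rather than the 1.1 used in the text, so the result agrees with 32.7 only after rounding to one decimal place.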
Try It 7.11
According to data from an aerospace company, the 757 airliner carries 200 passengers and has doors with a mean height of 72 inches. Assume for a certain population of men we have a mean of 69 inches and a standard deviation of 2.8 inches.
- What mean doorway height would allow 95 percent of men to enter the aircraft without bending?
- Assume that half of the 200 passengers are men. What mean doorway height satisfies the condition that there is a 0.95 probability that this height is greater than the mean height of 100 men?
- For engineers designing the 757, which result is more relevant: the height from part (a) or part (b)? Why?
HISTORICAL NOTE
Normal Approximation to the Binomial
Historically, being able to compute binomial probabilities was one of the most important applications of the central limit theorem. Binomial probabilities with a small value for n (say, 20) were displayed in a table in a book. To calculate the probabilities with large values of n, you had to use the binomial formula, which could be very complicated. Using the normal approximation to the binomial distribution simplified the process. To compute the normal approximation to the binomial distribution, take a simple random sample from a population. You must meet the following conditions for a binomial distribution:
- There are a certain number, n, of independent trials.
- The outcomes of any trial are success or failure.
- Each trial has the same probability of a success, p.
Recall that if X is the binomial random variable, then X ~ B(n, p). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5). The approximation is better if both are greater than or equal to 10, but the threshold of five is the generally accepted rule. Then the binomial can be approximated by the normal distribution with mean μ = np and standard deviation $\sigma = \sqrt{npq}$. Remember that q = 1 – p.
To get the best approximation, add 0.5 to x or subtract 0.5 from x (use x + 0.5 or x – 0.5). This is another accepted rule: for whatever value of x we are looking at (the number of successes), we add 0.5 if we are looking for the probability that X is less than or equal to that number, and we subtract 0.5 if we are looking for the probability that X is greater than or equal to that number. The number 0.5 is called the continuity correction factor and is used in the following example.
Example 7.12
Suppose in a local kindergarten through 12th grade (K–12) school district, 53 percent of the population favor a charter school for grades K through 5. A simple random sample of 300 is surveyed.
- Find the probability that at least 150 favor a charter school.
- Find the probability that at most 160 favor a charter school.
- Find the probability that more than 155 favor a charter school.
- Find the probability that fewer than 147 favor a charter school.
- Find the probability that exactly 175 favor a charter school.
Let X = the number that favor a charter school for grades K through 5. X ~ B(n, p) where n = 300 and p = 0.53. Because np > 5 and nq > 5, use the normal approximation to the binomial. The formulas for the mean and standard deviation are μ = np and $\sigma = \sqrt{npq}$. The mean is 159, and the standard deviation is 8.6447. The random variable for the normal distribution is Y. Y ~ N(159, 8.6447). See The Normal Distribution for help with calculator instructions.
For Part (a), you include 150 so P(X ≥ 150) has a normal approximation P(Y ≥ 149.5) = 0.8641.
normalcdf(149.5,10^99,159,8.6447) = 0.8641
For Part (b), you include 160 so P(X ≤ 160) has a normal approximation P(Y ≤ 160.5) = 0.5689.
normalcdf(0,160.5,159,8.6447) = 0.5689
For Part (c), you exclude 155 so P(X > 155) has a normal approximation P(Y > 155.5) = 0.6572.
normalcdf(155.5,10^99,159,8.6447) = 0.6572
For Part (d), you exclude 147 so P(X < 147) has a normal approximation P(Y < 146.5) = 0.0741.
normalcdf(0,146.5,159,8.6447) = 0.0741
For Part (e), P(X = 175) has a normal approximation P(174.5 < Y < 175.5) = 0.0083.
normalcdf(174.5,175.5,159,8.6447) = 0.0083
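The continuity correction in each part can be written out in code. The following is a minimal sketch using Python's `statistics.NormalDist` in place of normalcdf; the variable names are illustrative. Where the calculator steps use 0 as a lower bound, the sketch uses the full left tail, which makes no visible difference at this precision.

```python
from math import sqrt
from statistics import NormalDist

n, p = 300, 0.53
q = 1 - p

# Rule of thumb from the text: both np and nq must exceed five.
assert n * p > 5 and n * q > 5

Y = NormalDist(n * p, sqrt(n * p * q))  # N(159, 8.6447)

at_least_150 = 1 - Y.cdf(149.5)            # include 150: subtract 0.5
at_most_160 = Y.cdf(160.5)                 # include 160: add 0.5
more_than_155 = 1 - Y.cdf(155.5)           # exclude 155: start at 155.5
fewer_than_147 = Y.cdf(146.5)              # exclude 147: stop at 146.5
exactly_175 = Y.cdf(175.5) - Y.cdf(174.5)  # a one-unit interval around 175

for name, value in [
    ("P(X >= 150)", at_least_150),
    ("P(X <= 160)", at_most_160),
    ("P(X > 155)", more_than_155),
    ("P(X < 147)", fewer_than_147),
    ("P(X = 175)", exactly_175),
]:
    print(f"{name} is approximately {value:.4f}")
```

The half-unit shift always moves the boundary so that the whole bar for each included count falls inside the shaded region.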
Because calculators and computer software let you calculate binomial probabilities for large values of n easily, it is not necessary to use the normal approximation to the binomial distribution, provided that you have access to these technology tools. Most school labs have computer software that calculates binomial probabilities. Many students have access to calculators that calculate probabilities for the binomial distribution. If you search the internet for a binomial probability distribution calculator, you can find at least one online calculator for the binomial.
For Example 7.12, the probabilities are calculated using the following binomial distribution: (n = 300 and p = 0.53). Compare the binomial and normal distribution answers. See Discrete Random Variables for help with calculator instructions for the binomial.
P(X ≥ 150): 1 - binomialcdf(300,0.53,149) = 0.8641
P(X ≤ 160): binomialcdf(300,0.53,160) = 0.5684
P(X > 155): 1 - binomialcdf(300,0.53,155) = 0.6576
P(X < 147): binomialcdf(300,0.53,146) = 0.0742
P(X = 175): (You use the binomial pdf.) binomialpdf(300,0.53,175) = 0.0083
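The exact binomial probabilities can also be computed without a calculator. The sketch below defines two helper functions, `binom_pmf` and `binom_cdf`, which are hypothetical names written for this illustration using only Python's standard library; they play the role of binomialpdf and binomialcdf.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, p = 300, 0.53

exact = {
    "P(X >= 150)": 1 - binom_cdf(149, n, p),
    "P(X <= 160)": binom_cdf(160, n, p),
    "P(X > 155)": 1 - binom_cdf(155, n, p),
    "P(X < 147)": binom_cdf(146, n, p),
    "P(X = 175)": binom_pmf(175, n, p),
}

for name, value in exact.items():
    print(f"{name} = {value:.4f}")
```

Comparing these values with the normal approximation shows how little is lost by using the approximation with the continuity correction when np and nq are both large.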
Try It 7.12
In a city, 46 percent of the population favors the incumbent, Dawn Morgan, for mayor. A simple random sample of 500 is taken. Using the continuity correction factor, find the probability that at least 250 favor Dawn Morgan for mayor.