A hat on a Greek letter indicates an estimator. For example, \(\hat \mu\) denotes an estimator of the population mean \(\mu\), such as \(\bar{x}\).
The \(Z\) distribution is another name for the standard normal distribution, \(N(0, 1)\).
Tip: Useful Formulas
One Sample Categorical Data (Approximate)
Let \(\pi_0\) be the proportion under the null hypothesis, \(\hat \pi\) the sample proportion, and \(n\) the sample size.
To test: \(\large{Z = \frac{\hat \pi - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \sim N(0,1)}\)
For CI: \(\large{\hat \pi \pm z_{\alpha/2}\sqrt{\frac{\hat \pi(1-\hat \pi)}{n}}}\)
Caution: This approximation is accurate only when \(n\) is large and \(\pi_0\) is not close to 0 or 1. (For the CI, the same condition applies to \(\hat{\pi}\): it can’t be close to 0 or 1.)
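The test statistic and interval above can be computed directly in R. The following is a minimal sketch, assuming made-up counts (40 successes in 100 trials, null proportion 0.5) that are not from any dataset in this lab; the variable names are placeholders as well.

```r
# Illustrative values only (not from the lab data)
x  <- 40    # number of successes
n  <- 100   # sample size
p0 <- 0.5   # proportion under the null hypothesis

p_hat <- x / n

# Test statistic: the standard error uses the null proportion
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided

# 95% confidence interval: the standard error uses the sample proportion
ci <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

z
p_value
ci
```

Note that the test uses \(\pi_0\) in the standard error while the interval uses \(\hat \pi\), so the two procedures can occasionally disagree near the significance boundary.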
Normal Approximation (\(\mu\) and \(\sigma\) are known parameters)
Let \(\mu\) be the true population mean, \(\sigma\) be the population standard deviation.
The distribution of \(X\) is normal. (No confidence interval is needed here because the parameters are known. Instead, we find the middle 95% of the distribution directly.)
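As a sketch of what finding the middle 95% directly might look like, assume for illustration a normal population with \(\mu = 52\) and \(\sigma = 5\) (the values that appear in Problem 3). The variable names are placeholders.

```r
mu    <- 52  # population mean (known; illustrative)
sigma <- 5   # population standard deviation (known; illustrative)

# Middle 95% of the distribution: the 2.5th and 97.5th percentiles
middle_95 <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)
middle_95

# Equivalent form: unstandardize the z values by hand
by_hand <- mu + qnorm(c(0.025, 0.975)) * sigma
by_hand
```

Both forms give the same pair of cutoffs; the second mirrors the "find \(z\), then unstandardize" steps used in the problems.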
As \(n\) grows larger, the t distribution approximates the normal distribution more closely.
If \(n\) is small, the underlying distribution of \(X\) must be normal. Otherwise, the t approximation is questionable.
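To see how quickly the t distribution approaches the normal, one option is to compare their 97.5th percentiles for a few degrees of freedom. This is only an illustration; the particular degrees of freedom are arbitrary choices.

```r
dfs    <- c(5, 30, 123, 1000)    # degrees of freedom (n - 1), chosen arbitrarily
t_crit <- qt(0.975, df = dfs)    # t critical values for a 95% interval
z_crit <- qnorm(0.975)           # corresponding normal critical value

# The gap between the t and z critical values shrinks as df grows
data.frame(df = dfs, t_crit = t_crit, diff_from_z = t_crit - z_crit)
```

The t critical value is always larger than the normal one, which is why t-based intervals are wider for small samples.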
Practice Problems
As always, feel free to work in pairs or small groups to accomplish these exercises!
Problem 1
Suppose that the true average IQ is 95. Using the lead-IQ dataset as a sample, perform a test to see if the children have an average IQ that is different from the true average. Also, create a 95% confidence interval for the mean IQ based on this data.
Tip
We will be using a new distribution here: the t-distribution. Open the functions document to learn about the functions we will be using for t-distribution calculations.
Solution
Define our null hypothesis to be \(H_0: \mu = 95\)
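The test statistic is \(t = (\hat\mu - \mu_0)/(s/\sqrt{n})\), compared against a t distribution with \(n-1\) degrees of freedom. The sketch below wraps those steps in a hypothetical helper, `one_sample_t`, and runs it on made-up values so that it is self-contained. The numbers it prints are not the lab's results; those come from applying the same steps to `leadIQ$IQ`.

```r
# Hypothetical helper: one-sample, two-sided t-test computed by hand
one_sample_t <- function(x, mu0) {
  n      <- length(x)
  mu.hat <- mean(x)
  s      <- sd(x)
  t_stat  <- (mu.hat - mu0) / (s / sqrt(n))
  p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
  c(t = t_stat, df = n - 1, p.value = p_value)
}

# Made-up values for demonstration only (NOT the lead-IQ data)
demo <- c(88, 92, 101, 79, 95, 90, 85, 97)
res  <- one_sample_t(demo, mu0 = 95)
res

# With the lab data loaded, the call would be: one_sample_t(leadIQ$IQ, mu0 = 95)
```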
The result, \(p = 0.0029815\), indicates strong evidence that the true average IQ of the population these children represent is not 95.
ci <- mu.hat + qt(c(.025, .975), n - 1) * s / sqrt(n)
ci
[1] 88.52022 93.64107
This gives a confidence interval of (88.52, 93.64). We could also use the t.test function (as shown below) for this dataset, which provides both the p-value and the 95% confidence interval. This function works similarly to the binom.test function.
t.test(leadIQ$IQ, mu=95)
One Sample t-test
data: leadIQ$IQ
t = -3.03, df = 123, p-value = 0.002981
alternative hypothesis: true mean is not equal to 95
95 percent confidence interval:
88.52022 93.64107
sample estimates:
mean of x
91.08065
Problem 2
Suppose that the current commonly used screening test for breast cancer has a sensitivity of 68%. A new screening test was used to test 200 breast cancer patients, in which 147 patients tested positive.
Create a 95% confidence interval for the sensitivity of the new test. Use an approximation procedure.
Solution
p <- 147/200
p + c(-1, 1) * qnorm(0.975) * sqrt((p * (1 - p)) / 200)
[1] 0.6738355 0.7961645
Perform a hypothesis test to determine if there is a significant difference in the sensitivity of the old and new tests. Use the normal approximation.
Solution
Define \(H_0: \pi = 0.68\), where \(\pi\) is the sensitivity of the new test.
p_hat <- 147/200
p0 <- 0.68
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / 200)
2 * pnorm(abs(z), lower.tail = FALSE) # don't forget to make it two-sided!
[1] 0.09542845
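As a cross-check, base R's `prop.test` with the continuity correction turned off performs the same score test, so its p-value should agree with the hand calculation. This is offered as a sanity check rather than as the intended solution.

```r
# Hand calculation, repeated here so the block is self-contained
p_hat <- 147/200
p0    <- 0.68
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / 200)
p_by_hand <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Score test without continuity correction
res <- prop.test(x = 147, n = 200, p = 0.68, correct = FALSE)
res$p.value                  # agrees with p_by_hand
sqrt(unname(res$statistic))  # X-squared equals z^2, so this recovers |z|
```

One difference to keep in mind: the confidence interval that `prop.test` reports is a Wilson score interval, so it will not match the \(\hat \pi \pm z_{\alpha/2}\sqrt{\hat \pi(1-\hat \pi)/n}\) interval exactly.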
Using R, calculate the exact confidence interval and conduct a hypothesis test comparing the sensitivity of the new test to that of the current commonly used tests.
Hint: Use the binomial distribution where X = number of positive test results.
Solution
binom.test(x =147, n =200, p =0.68)
Exact binomial test
data: 147 and 200
number of successes = 147, number of trials = 200, p-value = 0.1111
alternative hypothesis: true probability of success is not equal to 0.68
95 percent confidence interval:
0.6681299 0.7947609
sample estimates:
probability of success
0.735
Problem 3
A patient recently diagnosed with Alzheimer’s disease takes a cognitive abilities test. The population mean of this test is \(\mu = 52\) and the population standard deviation is \(\sigma = 5\). Assume the cognitive abilities test scores are normally distributed. Find the answers to the following questions with the Z distribution table, your calculators, or in R. Remember the Z table gives you the left-tailed probability.
What percent of individuals scored between a 47 and a 56?
Solution
First, standardize the values of 47 and 56 into \(z\) values, then use the standard normal curve to find the area between these \(z\) values.
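The two steps just described can be sketched in R as follows; the variable names are placeholders.

```r
mu    <- 52
sigma <- 5

# Standardize the two scores
z_low  <- (47 - mu) / sigma   # -1
z_high <- (56 - mu) / sigma   # 0.8

# Area between the two z values under the standard normal curve
p_between <- pnorm(z_high) - pnorm(z_low)
p_between
```

This comes to roughly 0.629, so about 62.9% of individuals score between 47 and 56.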
Patients can be considered for an alternative treatment if they score below a 43 on this test. What percent of patients can be considered for this treatment?
Solution
First, standardize 43 into a \(z\) value, then use the standard normal curve to find the area below this \(z\) value.
mu <- 52
sigma <- sqrt(25)
z <- (43 - mu) / sigma
perc_alt_tx <- pnorm(z)
perc_alt_tx
[1] 0.03593032
Find the test score above which 27.1% of patients lie.
Solution
Find z value and “unstandardize” back to the scale of the data.
z <- qnorm(.271, lower.tail = FALSE)
52 + z * 5
[1] 55.04896
What is the probability that at least 2 patients of 25 sampled Alzheimer’s patients will be considered for the alternative treatment?
Solution
Earlier, we found the probability that a patient scores below 43 and so would be considered for the alternative treatment. We use this probability in the binomial distribution, Binom(\(n,\pi\)), where \(\pi = 0.0359303\) and \(n=25\).
Remember that with lower.tail=FALSE, pbinom returns the probability of values strictly greater than the one we supply. Supplying 1 therefore gives \(P(X > 1) = P(X \ge 2)\).
pbinom(1, 25, perc_alt_tx, # found earlier, saved to this variable
       lower.tail = FALSE) # shading to the right
[1] 0.2261473
Problem 4
Pumpkin weights at Wilson’s Orchard are known to follow a normal distribution with population mean \(\mu = 18\) lbs and standard deviation \(\sigma = 4\) lbs. Each year, Wilson’s Orchard randomly selects 4 pumpkins and measures their mean weight.
Using the sampling distribution of the sample mean, calculate the probability that this year’s sample mean weight is less than 16 lbs.
Solution
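Standardize 16 to find the \(z\) value, dividing by the standard error of the mean, \(\sigma/\sqrt{n} = 4/\sqrt{4} = 2\). Use pnorm to find the probability.
z <- (16 - 18) / (4 / sqrt(4)) # standard error = sigma / sqrt(n)
pnorm(z)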
[1] 0.1586553
What is the probability that this year’s sample mean weight is greater than 21 lbs?
Solution
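Standardize 21 to find the \(z\) value, dividing by the standard error of the mean, \(\sigma/\sqrt{n} = 2\). Use pnorm to find the probability. Remember to either subtract from 1 or use lower.tail=FALSE to find the shaded area to the right.
z <- (21 - 18) / (4 / sqrt(4)) # standard error = sigma / sqrt(n)
1 - pnorm(z)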
[1] 0.0668072
What is the probability that at least 2 of the next 5 years’ sample means are between 14 and 20 lbs?
Solution
First, standardize the values of 14 and 20 to be \(z\) values.
Second, use the normal distribution to find the probability of a sample mean being between 14 and 20.
Third, use this probability for \(\pi\) in the binomial(\(n, \pi\)) probability distribution with \(n = 5\). Make sure to shade correctly: \(P(X \ge 2) = 1-P(X<2) = 1 - P(X\le1)\).
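The three steps can be sketched in R as follows; the variable names are placeholders.

```r
mu    <- 18
sigma <- 4
n     <- 4
se    <- sigma / sqrt(n)   # standard error of the sample mean

# Steps 1 and 2: probability that one year's sample mean is between 14 and 20
z_low  <- (14 - mu) / se
z_high <- (20 - mu) / se
p_between <- pnorm(z_high) - pnorm(z_low)
p_between

# Step 3: P(X >= 2) = P(X > 1) for X ~ Binomial(5, p_between)
p_at_least_2 <- pbinom(1, size = 5, prob = p_between, lower.tail = FALSE)
p_at_least_2
```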
The probability of a sample mean being between 14 and 20 is 0.819. Using this in our binomial distribution, we find that the probability that at least 2 of the next 5 years’ sample means are between these values is 0.995.