Lab 9

Published

March 24, 2026

Objectives

Note: Notation
  • A hat on a Greek letter indicates an estimator. For example, \(\hat \mu\) denotes an estimator of the population mean \(\mu\), such as \(\bar{x}\).

  • The \(Z\) distribution is another name for the standard normal distribution, \(N(0, 1)\).

Tip: Useful Formulas

One Sample Categorical Data (Approximate)

  • Let \(\pi_0\) be the proportion under the null hypothesis, \(\hat \pi\) the sample proportion, \(n\) the sample size
  • To test: \(\large{Z = \frac{\hat \pi - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \sim N(0,1)}\)
  • For CI: \(\large{\hat \pi \pm z_{\alpha/2}\sqrt{\frac{\hat \pi(1-\hat \pi)}{n}}}\)
  • Caution: This approximation is reasonable only when \(n\) is large and \(\pi_0\) is not close to 0 or 1. (For the CI, \(\hat{\pi}\) must not be close to 0 or 1.)
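
The two formulas above can be wrapped in a small helper. This is only a sketch: the function name `prop_z` and the example counts (60 successes in 100 trials, tested against \(\pi_0 = 0.5\)) are made up for illustration and are not part of the lab.

```r
# Hypothetical helper: one-sample z test and Wald CI for a proportion
prop_z <- function(x, n, pi0, conf = 0.95) {
  pi_hat <- x / n
  # the test statistic uses the null proportion in the standard error
  z <- (pi_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
  p_value <- 2 * pnorm(-abs(z))               # two-sided p-value
  # the confidence interval uses the sample proportion in the standard error
  z_crit <- qnorm(1 - (1 - conf) / 2)
  ci <- pi_hat + c(-1, 1) * z_crit * sqrt(pi_hat * (1 - pi_hat) / n)
  list(pi_hat = pi_hat, z = z, p_value = p_value, ci = ci)
}

# Illustrative (made-up) data: 60 successes out of 100, H0: pi = 0.5
res <- prop_z(x = 60, n = 100, pi0 = 0.5)
res
```

Note that the test and the interval use different standard errors, so the two can occasionally disagree near the 5% boundary.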

Normal Approximation (\(\mu\) and \(\sigma\) are known parameters)

  • Let \(\mu\) be the true population mean, \(\sigma\) be the population standard deviation.
  • The distribution of X is normal. (This does not have a confidence interval. We would just find the middle 95% of the data directly.)
  • \(\large{Z = \frac{x - \mu}{\sigma} \sim N(0,1)}\)
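
As a minimal sketch of this standardization, the code below uses illustrative values \(\mu = 100\) and \(\sigma = 15\), which are assumptions and not part of this lab. It shows that standardizing by hand and passing `mean` and `sd` to `pnorm` give the same probability, and that the middle 95% of the data can be found directly with `qnorm`.

```r
mu <- 100     # illustrative population mean (assumed)
sigma <- 15   # illustrative population standard deviation (assumed)

# P(X < 130), computed two equivalent ways
z <- (130 - mu) / sigma
p_by_z <- pnorm(z)                             # standardize, then use N(0, 1)
p_direct <- pnorm(130, mean = mu, sd = sigma)  # let pnorm standardize for us

# Middle 95% of the data: un-standardize the 2.5th and 97.5th z quantiles
middle95 <- mu + qnorm(c(0.025, 0.975)) * sigma

p_by_z
p_direct
middle95
```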

One Sample Continuous Data (\(\mu\) specified by the null hypothesis, \(\sigma\) unknown)

  • Let \(s\) be the sample standard deviation
  • \(\large{T = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}}\)
  • \(\large{CI = \bar{x} \pm t_{\alpha/2, \space n-1}{\frac{s}{\sqrt{n}}}}\)
  • As \(n\) grows larger, the t distribution will approximate the normal distribution more closely
  • If \(n\) is small, the underlying distribution of \(X\) must be normal. Otherwise, the t approximation is questionable.
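
The t formulas above can be sketched as a small function and checked against R's built-in `t.test`. The function name `one_sample_t` and the data vector are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: one-sample t test and CI computed from the formulas
one_sample_t <- function(x, mu0, conf = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                        # s / sqrt(n)
  t_stat <- (mean(x) - mu0) / se
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)
  ci <- mean(x) + c(-1, 1) * t_crit * se
  list(t = t_stat, df = n - 1, p_value = p_value, ci = ci)
}

# Made-up sample of 8 scores, tested against a null mean of 95
x <- c(88, 92, 101, 79, 95, 90, 85, 99)
res <- one_sample_t(x, mu0 = 95)
check <- t.test(x, mu = 95)   # built-in version, for comparison

res$p_value
check$p.value
```

Using `-abs(t_stat)` inside `pt` keeps the two-sided p-value correct whether the sample mean falls below or above the null value.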

Practice Problems

As always, feel free to work in pairs or small groups to accomplish these exercises!

Problem 1

Suppose that the true average IQ is 95. Using the lead-IQ dataset as a sample, perform a test to see if the children have an average IQ that is different from the true average. Also, create a 95% confidence interval for the mean IQ based on this data.

We will be using a new distribution here: the t-distribution. Open the functions document to learn about the functions we will be using for t-distribution calculations.

Solution

Define our null hypothesis to be \(H_0: \mu = 95\)

leadIQ <- read.delim('https://raw.githubusercontent.com/IowaBiostat/data-sets/main/lead-iq/lead-iq.txt')

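mu0 <- 95                    # mean under the null hypothesis
mu.hat <- mean(leadIQ$IQ)    # sample mean
s <- sd(leadIQ$IQ)           # sample standard deviation
n <- length(leadIQ$IQ)
t.stat <- (mu.hat - mu0) / (s / sqrt(n))
# two-sided p-value; -abs() keeps this correct whether t.stat is negative or positive
pval <- 2 * pt(-abs(t.stat), df = n - 1)
pval
[1] 0.002981458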

The result, \(p = 0.00298\), indicates strong evidence that the mean IQ of the population these children were sampled from is not 95.

ci <- mu.hat + qt(c(.025,.975),n-1) * s/sqrt(n)
ci
[1] 88.52022 93.64107

This gives a 95% confidence interval of (88.52, 93.64), which excludes 95, in agreement with the test. We could also use the t.test function (as shown below) on this dataset; it reports both the p-value and the 95% confidence interval, and it works similarly to the ‘binom.test’ function.

t.test(leadIQ$IQ, mu=95)

    One Sample t-test

data:  leadIQ$IQ
t = -3.03, df = 123, p-value = 0.002981
alternative hypothesis: true mean is not equal to 95
95 percent confidence interval:
 88.52022 93.64107
sample estimates:
mean of x 
 91.08065 

Problem 2

Suppose that the current commonly used screening test for breast cancer has a sensitivity of 68%. A new screening test was used to test 200 breast cancer patients, in which 147 patients tested positive.

  a. Create a 95% confidence interval for the sensitivity of the new test. Use an approximation procedure.
Solution
p <- 147/200
p + c(-1, 1) * qnorm(0.975) * sqrt((p*(1-p))/200)
[1] 0.6738355 0.7961645
  b. Perform a hypothesis test to determine if there is a significant difference between the sensitivities of the old and new tests. Use the normal approximation.
Solution

Define \(H_0: \pi = 0.68\)

p_hat <- 147/200
p0 <- 0.68

z <- (p_hat - p0) / (sqrt((p0*(1-p0))/200))

2*pnorm(abs(z), lower.tail = FALSE) # two-sided: double the upper-tail area beyond |z|
[1] 0.09542845
  c. Using R, calculate the exact confidence interval and conduct a hypothesis test comparing the sensitivity of the new test to that of the current commonly used test.

Hint: Use the binomial distribution where X = number of positive test results.

Solution
binom.test(x = 147, n = 200, p = 0.68)

    Exact binomial test

data:  147 and 200
number of successes = 147, number of trials = 200, p-value = 0.1111
alternative hypothesis: true probability of success is not equal to 0.68
95 percent confidence interval:
 0.6681299 0.7947609
sample estimates:
probability of success 
                 0.735 
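
As a side-by-side check of the approximate and exact approaches, the sketch below uses `prop.test` with `correct = FALSE`, which reproduces the two-sided p-value of the normal-approximation test (its chi-squared statistic is \(z^2\)). Its confidence interval is a score interval rather than the \(\hat \pi \pm z\) interval from the formula sheet, so the interval endpoints are not expected to match the approximate interval exactly.

```r
# Normal-approximation test without continuity correction
approx <- prop.test(x = 147, n = 200, p = 0.68, correct = FALSE)
# Exact binomial test
exact <- binom.test(x = 147, n = 200, p = 0.68)

z_abs <- sqrt(unname(approx$statistic))  # |z| recovered from the chi-squared statistic
z_abs
approx$p.value   # matches the two-sided z test
exact$p.value    # exact p-value, slightly larger here
```

Both p-values are above 0.05, so the approximate and exact tests lead to the same conclusion for these data.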

Problem 3

A patient recently diagnosed with Alzheimer’s disease takes a cognitive abilities test. The population mean of this test is \(\mu = 52\) and the population standard deviation is \(\sigma = 5\). Assume the cognitive abilities test scores are normally distributed. Find the answers to the following questions with the Z distribution table, your calculators, or in R. Remember the Z table gives you the left-tailed probability.

  a. What percent of individuals scored between 47 and 56?
Solution

First, standardize 47 and 56 into \(z\) values, then use the standard normal curve to find the area between them.

mu <- 52
sigma <- 5  # population standard deviation (named sigma so R's sd() is not masked)
z1 <- (47 - mu) / sigma
z2 <- (56 - mu) / sigma

pnorm(z2) - pnorm(z1)
[1] 0.6294893
  b. Suppose we have a sample of 9 individuals. Calculate the probability that the sample mean test score is greater than 60.
Solution

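First, find the standard error of the sample mean, \(\sigma/\sqrt{n}\), and use it to standardize 60 into a \(z\) value. Then use the standard normal curve to find the area above this \(z\) value.

mu <- 52
sigma <- 5
n <- 9
z <- (60 - mu) / (sigma / sqrt(n))  # divide by the standard error, not sigma
pnorm(z, lower.tail=FALSE)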
[1] 7.933282e-07
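
The same probability can be found without standardizing by hand, by passing the mean and the standard error straight to `pnorm`. A minimal sketch using the values from this problem (the variable names are illustrative):

```r
mu <- 52
sigma <- 5
n <- 9
se <- sigma / sqrt(n)   # standard deviation of the sample mean

# P(sample mean > 60), computed two equivalent ways
p_direct <- pnorm(60, mean = mu, sd = se, lower.tail = FALSE)
p_by_z <- pnorm((60 - mu) / se, lower.tail = FALSE)

p_direct
p_by_z
```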
  c. Patients can be considered for an alternative treatment if they score below 43 on this test. What percent of patients can be considered for this treatment?
Solution

First, standardize 43 into a \(z\) value, then use the standard normal curve to find the area below it.

mu <- 52
sigma <- 5
z <- (43 - mu) / sigma       # a single patient, so divide by sigma
perc_alt_tx <- pnorm(z)      # proportion of patients scoring below 43
perc_alt_tx
[1] 0.03593032
  d. Find the test score above which 27.1% of patients lie.
Solution

Find z value and “unstandardize” back to the scale of the data.

z <- qnorm(.271, lower.tail=FALSE)
52 + z*5
[1] 55.04896
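
As a quick check, `qnorm` can also return the cutoff on the original scale when given the mean and standard deviation, and `pnorm` can confirm that 27.1% of scores lie above it. A short sketch (the name `cutoff` is illustrative):

```r
# Cutoff on the test-score scale, without un-standardizing by hand
cutoff <- qnorm(0.271, mean = 52, sd = 5, lower.tail = FALSE)
cutoff

# Round trip: the area above the cutoff should be 0.271
area_above <- pnorm(cutoff, mean = 52, sd = 5, lower.tail = FALSE)
area_above
```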
  e. What is the probability that at least 2 patients of 25 sampled Alzheimer’s patients will be considered for the alternative treatment?
Solution

From part c, we found the probability that a patient would be considered for alternative treatment. We use this probability in the binomial distribution, Binom(\(n,\pi\)) where \(\pi = 0.0359303\) and \(n=25\).

Remember that with lower.tail=FALSE, pbinom returns the probability of values strictly greater than the one supplied. Passing 1 therefore gives \(P(X > 1) = P(X \ge 2)\), the probability of at least 2.

pbinom(1, 
       25, 
       perc_alt_tx, # found in part c, saved to this variable
       lower.tail = FALSE) # shading to the right
[1] 0.2261473
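
To see why passing 1 gives "at least 2", the upper-tail call can be compared with the complement of the individual probabilities \(P(X = 0)\) and \(P(X = 1)\). This is a small sketch; the variable names are illustrative.

```r
# Probability a single patient scores below 43
pi_alt <- pnorm(43, mean = 52, sd = 5)

# P(X >= 2) for X ~ Binomial(25, pi_alt), computed two equivalent ways
p_tail <- pbinom(1, size = 25, prob = pi_alt, lower.tail = FALSE)  # P(X > 1)
p_comp <- 1 - sum(dbinom(0:1, size = 25, prob = pi_alt))           # 1 - P(X <= 1)

p_tail
p_comp
```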


Problem 4

The weights of pumpkins at Wilson’s orchard are known to follow a normal distribution with population mean \(\mu = 18\) lbs and standard deviation \(\sigma = 4\) lbs. Each year Wilson’s orchard randomly selects 4 pumpkins and measures the mean weight of the pumpkins.

  a. What distribution do the sample means follow?
Solution

\(\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) = N(18, 2)\), where the second parameter is the standard deviation of the sample mean, \(\frac{4}{\sqrt{4}} = 2\) lbs.
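
A quick simulation can make this sampling distribution concrete. The sketch below draws many samples of 4 pumpkins and looks at the resulting means; the seed and the number of replications are arbitrary choices, not part of the lab.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# 10,000 simulated "years": each year, weigh 4 pumpkins and record the mean
sim_means <- replicate(10000, mean(rnorm(4, mean = 18, sd = 4)))

mean(sim_means)   # should be close to mu = 18
sd(sim_means)     # should be close to sigma / sqrt(n) = 2
```

The simulated standard deviation lands near 2 rather than 4 because averaging 4 pumpkins halves the spread.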

  b. Using this distribution, calculate the probability that this year’s sample mean weight is less than 16 lbs.
Solution

Standardize 16 to find the \(z\) value. Use pnorm to find the probability.

se <- 4 / sqrt(4)  # standard error of the mean: sigma / sqrt(n)
z <- (16 - 18) / se
pnorm(z)
[1] 0.1586553
  c. What is the probability that this year’s sample mean weight is greater than 21 lbs?
Solution

Standardize 21 to find the \(z\) value. Use pnorm to find the probability. Remember to either subtract from 1 or use lower.tail=FALSE to find the shaded area to the right.

z <- (21 - 18) / (4 / sqrt(4))  # divide by sigma / sqrt(n)
1 - pnorm(z)
[1] 0.0668072
  d. What is the probability that at least 2 of the next 5 years’ sample means are between 14 and 20 lbs?
Solution

First, standardize the values of 14 and 20 to be \(z\) values.

Second, use the normal distribution to find the probability of a sample mean being between 14 and 20.

Third, use this probability as \(\pi\) in the Binomial(\(n, \pi\)) distribution with \(n = 5\). Make sure to shade above 1: \(P(X \ge 2) = 1-P(X<2) = 1 - P(X\le1)\).¹

z1 <- (14 - 18) / (4 / sqrt(4))  # divide by sigma / sqrt(n)
z2 <- (20 - 18) / (4 / sqrt(4))

(p <- pnorm(z2) - pnorm(z1))
[1] 0.8185946
(ge2 <- pbinom(1, 5, p, lower.tail = FALSE))
[1] 0.9953711
From the output above, we see that the probability of a sample mean being between 14 and 20 is 0.819. Using this in our binomial distribution, we find the probability that at least 2 of the next 5 years’ sample means are between these values is 0.995.

Handwritten solutions

Footnotes

  1. If this doesn’t make sense, ask your instructor about this mathematical statement.