Lab 15

Published

May 5, 2026

Objectives

  1. Survival Analysis in R
    • Kaplan-Meier Estimator
    • Log-rank test
  2. Final Review

Survival Analysis

Kaplan-Meier Estimator

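Define \(n(t)\) as the number of subjects at risk for the event in the study at time \(t\) and \(d(t)\) as the number of events that occur at time \(t\). The Kaplan-Meier estimate of the survival function at time \(t\) is the product, over all event times \(t_i\) up to and including \(t\), of the proportion of at-risk subjects who survive each event time:

\[ \hat S(t) = \prod_{i:\, t_i \le t} \frac{n(t_i) - d(t_i)}{n(t_i)} \]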
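
As a minimal sketch of how the formula works, the product can be computed by hand in base R. The six observations below are hypothetical toy data (not the anemia data), and the names `time`, `status`, `n_t`, `d_t`, and `S_hat` are chosen here for illustration only.

```r
# Hypothetical toy data: follow-up time and event indicator
# (1 = event observed, 0 = censored)
time   <- c(2, 3, 3, 5, 7, 8)
status <- c(1, 0, 1, 1, 0, 1)

# Distinct times at which at least one event occurred
event_times <- sort(unique(time[status == 1]))

# n(t_i): subjects still under observation just before t_i
n_t <- sapply(event_times, function(ti) sum(time >= ti))

# d(t_i): events occurring exactly at t_i
d_t <- sapply(event_times, function(ti) sum(time == ti & status == 1))

# Kaplan-Meier estimate: running product of [n(t_i) - d(t_i)] / n(t_i)
S_hat <- cumprod((n_t - d_t) / n_t)

data.frame(t_i = event_times, n_t, d_t, S_hat)
```

Note that the subject censored at time 3 still counts as at risk at time 3, but drops out of \(n(t)\) afterwards without contributing an event.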

Load Survival Package
# Install the survival package if it is not already on your machine,
# then load it. require() loads the package itself when it is found.
if (!require(survival)) {
  install.packages("survival")
  library(survival)
}

Model Fitting

First we will fit a model with no grouping variable to estimate the overall survival function. We do this by

  1. Calculating the response with the Surv function

  2. Fitting survfit(S ~ 1), where the 1 indicates that we want to fit without a grouping variable. This calculates the Kaplan-Meier survival curve that we learned how to compute in class.

anemia <- read.delim('https://raw.githubusercontent.com/IowaBiostat/data-sets/main/anemia/anemia.txt')

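# Response: follow-up time plus an event indicator (TRUE when Status is not 0)
S <- with(anemia, Surv(Time, Status != 0))
# Overall Kaplan-Meier fit with no grouping variable
fit <- survfit(S ~ 1)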
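
To see what these two steps produce without downloading anything, here is a small sketch on hypothetical toy data. The data frame `toy` and its status coding (0 = censored, any nonzero code = event) are assumptions made for illustration; they are not taken from the anemia data.

```r
library(survival)

# Hypothetical toy data: 0 = censored, any nonzero code = event
toy <- data.frame(
  Time   = c(2, 3, 3, 5, 7, 8),
  Status = c(1, 0, 2, 1, 0, 1)
)

# Step 1: Status != 0 turns the codes into a TRUE/FALSE event indicator
S_toy <- with(toy, Surv(Time, Status != 0))
print(S_toy)   # censored times are printed with a trailing "+"

# Step 2: ~ 1 means no grouping variable, so one overall curve is fitted
fit_toy <- survfit(S_toy ~ 1)

# summary() reports one row per distinct event time
km_toy <- summary(fit_toy)
data.frame(time = km_toy$time, n.risk = km_toy$n.risk,
           n.event = km_toy$n.event, surv = km_toy$surv)
```

The `n.risk` and `n.event` columns correspond to \(n(t)\) and \(d(t)\), and `surv` is the cumulative product from the Kaplan-Meier formula.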

Recall that if we know the event times and the number of subjects at risk at each one, we can calculate the survival probability. For example, here are the proportions surviving at each of the first five event times and the cumulative product used to estimate the survival curve:

time   n(t)   d(t)   [n(t)-d(t)]/n(t)   cumulative product
   3     46      1             0.9783               0.9783
  12     45      1             0.9778               0.9565
  25     44      1             0.9773               0.9348
  30     43      1             0.9767               0.9130
  44     42      1             0.9762               0.8913
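
The arithmetic in the table above can be checked in a few lines of base R. The times, at-risk counts, and event counts are copied from the table; the variable names are chosen here for illustration.

```r
# Values copied from the table: event time, n(t), and d(t)
time    <- c(3, 12, 25, 30, 44)
n_risk  <- c(46, 45, 44, 43, 42)
n_event <- c(1, 1, 1, 1, 1)

# Proportion of at-risk subjects surviving each event time
cond_surv <- (n_risk - n_event) / n_risk

# Kaplan-Meier estimate: running product of those proportions
km <- cumprod(cond_surv)

km_table <- data.frame(time, n_risk, n_event,
                       cond_surv = round(cond_surv, 4),
                       km = round(km, 4))
print(km_table)
```

Rounded to four decimals, the `cond_surv` and `km` columns match the last two columns of the table.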

To plot the entire estimated survival curve, use:

plot(fit, ylab = "Probability", xlab = "Time")

This is the Kaplan-Meier estimate of the survival function, ignoring the different treatment groups.

Now stratifying by group:

fit2 <- with(anemia, survfit(S ~ Trt))

plot(fit2, ylab = "Overall Survival", 
     xlab = "Time", 
     col = c("red","blue"))
legend("bottomleft", c("MTX","MTX + CSP"), 
       text.col = c("red","blue"), bty = "n")

Log-rank test

\(H_0:\) The survival curves are equal.

fit2 <- with(anemia, Surv(Time, Status != 0) ~ Trt)
survdiff(fit2)
Call:
survdiff(formula = fit2)

             N Observed Expected (O-E)^2/E (O-E)^2/V
Trt=MTX     24        9     6.45     1.007      2.01
Trt=MTX+CSP 22        4     6.55     0.992      2.01

 Chisq= 2  on 1 degrees of freedom, p= 0.2 
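
The printed p-value is rounded to one significant digit. As a rough sketch, it can be recomputed from the printed test statistic, which is compared to a chi-squared distribution with 1 degree of freedom. The statistic 2.01 is itself rounded, so the result is approximate, and the 0.05 significance level below is an assumed convention rather than something stated in the output.

```r
# Test statistic copied from the (O-E)^2/V column of the output (rounded)
chisq_stat <- 2.01

# Upper-tail probability of a chi-squared distribution with 1 df
p_value <- pchisq(chisq_stat, df = 1, lower.tail = FALSE)
p_value

# Compare to an assumed significance level of 0.05
alpha <- 0.05
p_value < alpha
```

Because the p-value is larger than 0.05, we fail to reject \(H_0\): the data do not provide convincing evidence that survival differs between the MTX and MTX+CSP groups. This is not the same as evidence that the curves are equal.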

Final Review Exercises

Important: Common Mistakes and General Suggestions

The following points of misunderstanding have come up frequently in the homework throughout the semester:

  • Probabilities should always be between 0 and 1.
  • Confounding is a major source of bias, but when we conduct a randomized, double-blind, placebo-controlled trial, confounding is not a concern, because randomization balances potential confounders across the groups on average.
  • When presented with a problem with categorical data and asked to conduct a test, it is not recommended that you use a test of two proportions unless you are already familiar with this method; try fitting your scenario into a setup for a chi-squared test.
  • Slope is a rate of change expressed in units (e.g., feet per minute, water buffalo per acre). Correlation measures the strength and direction of a linear relationship between two variables. Correlation is a standardized, unitless measure: it equals the slope when both variables are expressed in standard deviation units.
  • Please include context in your interpretations, otherwise they are incomplete
  • Understand the differences between approximate and exact tests.
  • Keep track of what your hypotheses are and what you are testing.
  • The type of data you have and the thing you want to know are the two biggest indicators of which test is most appropriate.
  • P-values are not necessary to determine significance [1]
  • We cannot say that we have evidence for the null [2]
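
The point about slope versus correlation can be illustrated with a short sketch. The data below are hypothetical (distance walked in feet, recorded once per minute) and are only meant to show that changing the units changes the slope but not the correlation.

```r
# Hypothetical data: distance walked (feet) recorded each minute
minutes <- 1:10
feet    <- c(3, 7, 8, 12, 16, 17, 22, 25, 27, 31)
inches  <- feet * 12   # same distances in a different unit

# Slope depends on the units: feet per minute vs. inches per minute
slope_feet   <- coef(lm(feet ~ minutes))[["minutes"]]
slope_inches <- coef(lm(inches ~ minutes))[["minutes"]]
c(slope_feet = slope_feet, slope_inches = slope_inches)

# Correlation is unitless, so it does not change
c(cor_feet = cor(minutes, feet), cor_inches = cor(minutes, inches))

# With both variables converted to z-scores, the slope equals the correlation
z_minutes <- (minutes - mean(minutes)) / sd(minutes)
z_feet    <- (feet - mean(feet)) / sd(feet)
slope_z   <- coef(lm(z_feet ~ z_minutes))[["z_minutes"]]
all.equal(slope_z, cor(minutes, feet))
```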

Which Test should I use?

Read the following case studies and indicate which statistical method might be best for each situation:

  • In a study of 16 overweight young adults in India, participants were given, in turns, a dose of an extract made from unroasted coffee beans and a placebo, three times a day over 22 weeks. Their diet throughout the study was unchanged, and they were physically active. Between trials, the participants were given a two-week break for their bodies to reset. Though a few participants given the extract only lost 7 pounds, others lost as much as 26 pounds. On average, the subjects lost 17.5 pounds each, and reduced their body weight by 10.5 percent. Body fat also declined by 16 percent, even though the participants were eating an average of 2,400 calories and burning roughly 400.

    Answer

    Paired t-test

  • Researchers from Penn State found that increasing the amount of spices in your diet may lower the level of potentially harmful fat in your bloodstream. The experiment compared two groups of healthy, overweight men. One group ate meals seasoned with the special spice blend; the other ate the same meals prepared without the spices. Men who ate the spicy food saw a decrease of one-third in the level of triglycerides (a type of fat linked to heart disease) in their bloodstreams, and 20 percent lower insulin levels overall — even when the meals were high in fat and made with heavy oils.

    Answer

    2 sample t-test

  • Researchers at Colgate wished to test the effectiveness of a new toothpaste. They collected a sample of 143 individuals and assigned them to either use the current Colgate toothpaste or the new toothpaste for 2 weeks. Participants waited one week and then switched to using the other toothpaste for two weeks. Based on plaque build-up, they determined that 77 participants did better on the new toothpaste than the old. (Note: This study is fictional)

    Answer

    Binomial Exact test or paired t-test

  • Exposure to cosmic radiation during deep-space missions may damage an astronaut’s heart, a new NASA-funded study suggests. Researchers at Florida State University compared the deaths of 35 astronauts who never traveled into space with those of 42 astronauts who ventured beyond Earth’s protective magnetic field, including seven Apollo veterans who flew to the moon between 1968 and 1972. The study found that lunar astronauts were five times more vulnerable to heart disease—43 percent of them died from cardiovascular ailments compared with only 9 percent of the astronauts that didn’t journey to the moon. A follow-up study involving mice reveals that radiation can trigger long-term changes in the lining of blood vessels associated with atherosclerosis, or “hardening of the arteries.”

    Answer

    Chi-sq or Fisher’s exact test

  • An investigator collected the annual earnings of 1642 Iowans and 1563 Nebraskans to compare income level by state. The Iowa group had a mean of $65,000, a median of $59,000, and a standard deviation of $12,000. The Nebraska group had a mean of $64,000, a median of $61,000, and a standard deviation of $12,000. (Note: This study is fictional)

    Answer

    Log transformed 2-sample t-test or Mann-Whitney/Wilcoxon Rank Sum test

  • Researchers at University College London surveyed nearly 8,000 participants over the age of 52 attempting to measure whether people read the instructions on medication bottles. Using a fake aspirin bottle complete with instructions as the testing instrument, researchers asked participants to answer four basic questions, including “What is the maximum number of days you may take this medicine?” and “List three situations for which you should consult a doctor.” All the answers could be found on the label. One third of the adults failed to correctly answer all four questions, and one in eight got two or more wrong. Researchers then monitored the volunteers’ health for five years. During that time, 621 of the participants died, and people who missed two or more questions were more than twice as likely to have died as those who got the answers correct.

    Answer

    Chi-squared or Fisher’s Exact

  • In a study published in Psychological Science, researchers had groups of participants ages 18 to 65 perform simple exercises, such as pressing a button when a letter appeared onscreen or tapping in time with their own breathing. The experts checked periodically to ask the volunteers whether their minds were on the task or they were thinking of something else. At the end, participants were tested on their ability to remember a series of letters while doing math problems; individuals who let their mind wander scored higher on the test.

    Answer

    2 sample t-test

  • According to the US Census Bureau, the national poverty rate is 11.5%. We wish to see if the poverty rate in Johnson County differs significantly from the national rate. We collect a random sample of 1000 individuals and record whether or not they fall into the “poverty” category based on their income.

    Answer

    Binomial Exact test
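
As a hedged sketch of how two of the answers above might be carried out in R, the exact binomial test is available as `binom.test()`. The toothpaste counts (77 of 143) come from the case study. The poverty case study gives no sample count, so the value of `poor_in_sample` below is purely hypothetical.

```r
# Toothpaste study: 77 of 143 participants did better on the new toothpaste.
# Under H0 (no difference), "better on the new toothpaste" is a coin flip.
toothpaste <- binom.test(x = 77, n = 143, p = 0.5)
toothpaste$p.value

# Poverty study: compare a sample proportion to the national rate of 11.5%.
poor_in_sample <- 130   # hypothetical count, for illustration only
poverty <- binom.test(x = poor_in_sample, n = 1000, p = 0.115)
poverty$p.value
poverty$conf.int        # exact 95% confidence interval for the county rate
```

In both cases the data are a count of "successes" out of a fixed number of independent trials, compared against a single hypothesized proportion, which is the setting the exact binomial test is designed for.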

Footnotes

  1. What’s the other way?

  2. This can also be expressed in various forms; just be careful that you don’t express this idea in your interpretations.