Survival totals by class:
Class  Died  Survived  Sum
1st    122   203       325
2nd    167   118       285
3rd    528   178       706
Crew   673   212       885
Lab 2
Objectives:
Cover general information about the labs and course data sets
Meet our neighbors
Practice controlling for confounding factors
General Information
Navigating Course Datasets: Navigate to the course website and follow along with your instructor.
Using Software: While the lab will be teaching the basics of analysis in R, we emphasize that the objective is software literacy. Feel free to use other software if you would like. If you are concerned about whether using your chosen software will complicate the final assessment, please reach out.
Meet Your Neighbors
Turn to the person on your left or right and take 30 seconds to get to know them.
Now turn to the person on your other side and take 30 seconds to get to know them.
Mortality on the Titanic
The RMS Titanic was a British luxury steamship that embarked on its maiden voyage from Southampton to New York City in April 1912. The vessel was widely considered a marvel of modern engineering and was famously labeled as practically unsinkable due to its advanced system of watertight compartments. However, the ship struck an iceberg in the North Atlantic and sank in less than three hours, leading to a massive loss of life and a total overhaul of international maritime safety regulations.
Suppose we want to investigate survival rates for each class aboard the Titanic. Since we believe that sex is a potential confounding factor, we will want to control for it in our final summary.
For the sake of practice, we will first work through the manual calculations and then run through the process using R to check our work.
Manual Calculations Controlling for Sex
Reference Tables
Survival totals by sex:
Sex     Died  Survived  Sum
Female  126   344       470
Male    1364  367       1731
Survivors and totals by class, Sex = Female:
Class  Survived  Total
1st    141       145
2nd    93        106
3rd    90        196
Crew   20        23
Survivors and totals by class, Sex = Male:
Class  Survived  Total
1st    62        180
2nd    25        179
3rd    88        510
Crew   192       862
A
Using the tables provided above, calculate the overall percentages of survival for each class.
B
For each class, calculate the percentage of passengers that survived for females and males, respectively.
C
Calculate the proportion (fraction) of female and male passengers on the ship.
D
Construct a weighted average of the percentage of passengers in each class who survived, controlling for the effect of sex (there will only be one number for each class).
Compare this answer to the proportions you calculated in part A. What insight do we gain from controlling for sex?
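As a check on the hand arithmetic, here is a minimal sketch of parts A through D for 1st class only, using R as a calculator. The counts are copied from the reference tables above; the variable names are illustrative and are not used elsewhere in the lab.

```r
# Counts copied from the reference tables (1st class only)
fem_surv_1st <- 141; fem_total_1st <- 145
mal_surv_1st <- 62;  mal_total_1st <- 180

# Part B: sex-specific survival proportions within 1st class
fem_rate_1st <- fem_surv_1st / fem_total_1st
mal_rate_1st <- mal_surv_1st / mal_total_1st

# Part C: share of everyone aboard who was female or male
w_female <- 470 / (470 + 1731)
w_male   <- 1731 / (470 + 1731)

# Part D: weighted average of the two sex-specific rates
adjusted_1st <- fem_rate_1st * w_female + mal_rate_1st * w_male

# Part A, for comparison: crude survival proportion in 1st class
crude_1st <- 203 / 325

round(c(crude = crude_1st, adjusted = adjusted_1st), 3)
```

The same four steps apply to the 2nd class, 3rd class, and crew rows; only the class-specific counts change, while the sex weights from part C stay fixed.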
Using R
# read in our dataset
# What information does each column contain?
titanic <- read.delim('https://raw.githubusercontent.com/IowaBiostat/data-sets/main/titanic/titanic.txt')
A
Calculate the overall percentages of survival for each class.
# create table of counts
tclass <- table(titanic$Class, titanic$Survived) |>
  addmargins() # adds a 'Sum' row and a 'Sum' column
tclass[,2] / tclass[,3] # divide count of 'survived' by the total
      1st       2nd       3rd      Crew       Sum
0.6246154 0.4140351 0.2521246 0.2395480 0.3230350
B
For each class, calculate the percentage of passengers that survived for females and males, respectively.
classtable <- table(titanic$Sex,titanic$Class,titanic$Survived)
classes <- prop.table(classtable, 1:2)[,,2]
t(classes)
Female Male
1st 0.9724138 0.3444444
2nd 0.8773585 0.1396648
3rd 0.4591837 0.1725490
Crew 0.8695652 0.2227378
C
Calculate the proportion (fraction) of female and male passengers on the ship.
weights <- with(titanic, table(Sex)) |>
  prop.table()
weights
Sex
   Female      Male
0.2135393 0.7864607
D
Construct a weighted average of the percentage of passengers in each class who survived, controlling for the effect of sex (there will only be one number for each class).
# first we create vectors that contain only survival proportions for females and males, respectively
fem_props <- t(classes)[1:4, 1]
mal_props <- t(classes)[1:4, 2]
# we can now weight those proportions by the percentage of passengers of each sex
(fem_props * weights[1]) + (mal_props * weights[2])
      1st       2nd       3rd      Crew
0.4785406 0.2971914 0.2337568 0.3608609
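To see the effect of controlling for sex, it can help to put the crude and adjusted proportions side by side. The sketch below is one way to do that, assuming the counts from the reference tables are typed in directly so that it runs without downloading the data; the object names (`surv`, `tot`, `sex_wts`, and so on) are illustrative only.

```r
# Counts from the reference tables: rows are classes, columns are sexes
class_lbl <- c("1st", "2nd", "3rd", "Crew")
surv <- matrix(c(141, 93, 90, 20,      # female survivors
                 62, 25, 88, 192),     # male survivors
               nrow = 4, dimnames = list(class_lbl, c("Female", "Male")))
tot  <- matrix(c(145, 106, 196, 23,    # female totals
                 180, 179, 510, 862),  # male totals
               nrow = 4, dimnames = list(class_lbl, c("Female", "Male")))

crude    <- rowSums(surv) / rowSums(tot)  # part A: ignores sex
rates    <- surv / tot                    # part B: class-by-sex survival
sex_wts  <- colSums(tot) / sum(tot)       # part C: share of each sex aboard
adjusted <- drop(rates %*% sex_wts)       # part D: weighted average per class

round(cbind(crude, adjusted, difference = adjusted - crude), 3)
```

With these counts the 1st class proportion falls from about 0.625 to about 0.479 once sex is held fixed, while the crew proportion rises from about 0.240 to about 0.361. One plausible reading is that part of the crude gap came from who was in each group: 862 of the 885 crew were male, whereas 1st class had a much larger share of women.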
Extra Practice
Now let’s say that we want to investigate the difference in survival by sex for the Titanic data set. Use direct standardization to calculate the percentage of passengers for each sex who survived, controlling for the effect of class.
A
Find the proportion of people who survived by sex.
B
Find the percentage of passengers who survived for each sex, broken down by class.
C
Find the percentage of all passengers in each class.
D
Construct a direct standardization of the percentage of passengers in each sex who survived, controlling for the effect of class (there will only be one number for each sex).
Solutions
Handwritten solutions can be found here
A
Proportion of people who survived by sex:
# Create a table of Sex × Survived counts
sextab <- table(titanic$Sex, titanic$Survived)
# Compute proportion survived for each sex
prop_survived_sex <- sextab[, "Survived"] / rowSums(sextab)
# Print the result
prop_survived_sex
   Female      Male
0.7319149 0.2120162
B
Percentage of passengers who survived for each sex, broken down by class:
# Create a 3-way table: Class × Sex × Survived
classtab <- table(titanic$Class, titanic$Sex, titanic$Survived)
# Compute survival proportion for each Class × Sex
prop_survived <- classtab[,, "Survived"] /
(classtab[,, "Survived"] + classtab[,, "Died"])
# Print only the final table
prop_survived
Female Male
1st 0.9724138 0.3444444
2nd 0.8773585 0.1396648
3rd 0.4591837 0.1725490
Crew 0.8695652 0.2227378
C
Percentage of all passengers in each class:
# Create a table of counts for each class (all sexes combined)
class_counts <- table(titanic$Class)
# Compute proportion of total passengers in each class
class_proportions <- class_counts / sum(class_counts)
# Print the result
class_proportions
1st 2nd 3rd Crew
0.1476602 0.1294866 0.3207633 0.4020900
D
Construct a direct standardization of the percentage of passengers in each sex who survived, controlling for the effect of class (there will only be one number for each sex).
# survival proportions by sex and class
fem_props <- t(classes)[1:4, 1] # female survival rates per class
mal_props <- t(classes)[1:4, 2] # male survival rates per class
# proportion of passengers in each class (class weights)
class_counts <- table(titanic$Class)
total_passengers <- sum(class_counts)
class_weights <- class_counts / total_passengers
# weighted average survival for each sex, controlling for class
weighted_survival_female <- sum(fem_props * class_weights)
weighted_survival_male <- sum(mal_props * class_weights)
# combine into a vector
weighted_survival_by_sex <- c(F = weighted_survival_female,
                              M = weighted_survival_male)
# print
weighted_survival_by_sex
        F         M
0.7541256 0.2138535
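The same standardization can be checked without the downloaded file. The sketch below rebuilds the three-way table from the counts in the reference tables (deaths are totals minus survivors) and then repeats parts A through D; `counts`, `tab`, and the other object names are illustrative and are not part of the lab's own code.

```r
# One row per Class x Sex x Survived cell, with counts from the reference tables
counts <- data.frame(
  Class    = rep(c("1st", "2nd", "3rd", "Crew"), times = 4),
  Sex      = rep(rep(c("Female", "Male"), each = 4), times = 2),
  Survived = rep(c("Died", "Survived"), each = 8),
  n = c(4, 13, 106, 3,   118, 154, 422, 670,  # died: female, then male
        141, 93, 90, 20, 62, 25, 88, 192)     # survived: female, then male
)
tab <- xtabs(n ~ Class + Sex + Survived, data = counts)

# Part A: crude survival proportion for each sex
crude_by_sex <- prop.table(margin.table(tab, c(2, 3)), 1)[, "Survived"]

# Part B: survival proportion within each Class x Sex cell
rates <- prop.table(tab, 1:2)[, , "Survived"]

# Part C: share of everyone aboard in each class
class_wts <- prop.table(margin.table(tab, 1))

# Part D: weight each column (sex) of rates by the class shares and sum
adjusted_by_sex <- colSums(rates * as.vector(class_wts))

round(rbind(crude = crude_by_sex, adjusted = adjusted_by_sex), 3)
```

Here the adjusted proportions (about 0.754 for females and 0.214 for males) stay close to the crude ones (about 0.732 and 0.212), which suggests that differences in class make-up account for little of the survival gap between the sexes.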