BIOS:4120 List of Functions

Date Modified

March 3, 2026

Please advise your TA if Date Modified is not current with the most recent lecture.

This document summarizes all the functions necessary to complete the homework assignments and, ultimately, the computational assessment at the end of the semester. You are discouraged from using AI tools to generate your code because they are likely to give you code that you don’t actually need for this class. There are hundreds of functions in R and many ways to accomplish the same task. For simplicity, try to use only the code provided here. Know that it is possible to complete any task that will be asked of you.

Good luck!

Utility Functions

by(dataset$var, dataset$group, mean) 
  • Applies a function (here, mean) to a variable “var” split by the levels of “group”

  • The “group” argument is required: it defines the groups, and the output is organized by the levels of the grouping variable
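
As a minimal sketch of the call above, assuming R's built-in mtcars dataset (used here purely for illustration; it is not part of this document), the following computes the mean miles per gallon within each cylinder group. The name mpg_by_cyl is hypothetical.

```r
# mtcars ships with R: mpg = miles per gallon, cyl = number of cylinders
mpg_by_cyl <- by(mtcars$mpg, mtcars$cyl, mean)

# Printing shows one mean per level of cyl
mpg_by_cyl
```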

with(dataset, mean(variable)) 
  • Evaluates an expression in an environment constructed from data

  • Allows you to specify your dataset only once instead of using the form dataset$variable every time

  • can be used with any other functions to accomplish a task
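
To illustrate the point about naming the dataset only once, here is a small sketch that computes the same correlation both ways. It assumes the built-in mtcars dataset as a stand-in; cor_long and cor_short are hypothetical names.

```r
# Without with(): the dataset name is repeated for every variable
cor_long <- cor(mtcars$mpg, mtcars$wt)

# With with(): name the dataset once, then refer to its columns directly
cor_short <- with(mtcars, cor(mpg, wt))

# Both calls give the same result
all.equal(cor_long, cor_short)
```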

lm(outcome_var ~ explanatory_var, data = dataset)
  • constructs a linear model of the form

\[ \overbrace{Y}^{\text{Outcome}} = \underbrace{\alpha + \beta \overbrace{X}^{\text{Explanatory var}}}_{\text{Linear Predictor}} \]

  • the order of the variables matters: the outcome goes to the left of ~ and the explanatory variable goes to the right
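
A short sketch of why the order matters, assuming the built-in mtcars dataset for illustration. coef() is used here only to display the estimates and is not one of the functions listed in this document; fit and fit_reversed are hypothetical names.

```r
# Outcome = mpg, explanatory variable = wt
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # estimated intercept (alpha) and slope (beta)

# Swapping the two sides fits a different model (outcome = wt)
fit_reversed <- lm(wt ~ mpg, data = mtcars)
coef(fit_reversed)
```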
choose(n, r)
  • binomial coefficient given by the formula

\[ \binom{n}{r} = \frac{n!}{r!(n-r)!} \]

  • A “combination” is a selection of items from a set where order is unimportant (Binomial lecture)

  • n is the total number of items in the set

  • r is the number of items selected from the set
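
As a quick check of the formula, the sketch below compares choose() with the factorial expression written out by hand. The values of n and r are arbitrary illustrations.

```r
n <- 5  # total number of items (illustrative value)
r <- 2  # number of items selected (illustrative value)

# Binomial coefficient from the built-in function
from_choose <- choose(n, r)

# Same quantity computed directly from n! / (r! (n - r)!)
from_formula <- factorial(n) / (factorial(r) * factorial(n - r))

from_choose
from_formula
```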

sum(x)
  • sums a numeric variable x

For Distributions

dbinom(x, n, prob) 
  • binomial density function

    • x = number of successes

    • n = number of trials

    • prob = probability of success

  • calculates the probability of exactly x successes given parameters \(n\) and \(\pi\) (supplied as prob)

  • x can be a vector of outcomes, in which case one probability is returned per outcome
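
A minimal sketch of both uses, with a single outcome and with a vector of outcomes. The values n = 10 and prob = 0.5 are arbitrary, and the variable names are hypothetical.

```r
n <- 10      # number of trials (illustrative value)
prob <- 0.5  # probability of success (illustrative value)

# Probability of exactly 3 successes
p_three <- dbinom(3, n, prob)

# Vector of outcomes: probabilities of 0, 1, and 2 successes
p_zero_to_two <- dbinom(0:2, n, prob)

# Probabilities over every possible outcome 0..n add up to 1
total <- sum(dbinom(0:n, n, prob))
```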

pbinom(x, n, prob)
  • sums the probabilities from the lower (left) tail up to and including x, i.e. \(P(X \le x)\)
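
The sketch below shows that pbinom() matches a sum of dbinom() values over the lower tail, and how an upper-tail probability follows from it. The parameter values and variable names are illustrative assumptions.

```r
n <- 10      # number of trials (illustrative value)
prob <- 0.5  # probability of success (illustrative value)

# Lower tail: probability of 2 or fewer successes
lower <- pbinom(2, n, prob)

# The same probability, summing the individual outcomes by hand
by_hand <- sum(dbinom(0:2, n, prob))

# Upper tail: probability of 3 or more successes
upper <- 1 - pbinom(2, n, prob)
```
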
binom.test(x, n)
binom.test(x, n)$conf.int
binom.test(x, n)$p.value
  • calculates the test statistic and p-value for a test of whether a proportion differs from a null value (0.5 by default; set the p argument to change it)

  • appending $conf.int at the end will return only the confidence interval

  • appending $p.value will return only the p-value
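
A brief sketch of the three forms, assuming 7 successes in 10 trials as made-up illustrative data. The last call shows one way to change the null proportion through the p argument.

```r
x <- 7   # number of successes (illustrative value)
n <- 10  # number of trials (illustrative value)

# Full test output, null proportion 0.5 by default
result <- binom.test(x, n)
result

# Only the p-value, and only the confidence interval
p_val <- binom.test(x, n)$p.value
ci <- binom.test(x, n)$conf.int

# Same data tested against a null proportion of 0.3 instead
result_alt <- binom.test(x, n, p = 0.3)
```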

pnorm(q)
  • where q is the “quantile”, or number of standard deviations away from 0

  • calculates the area to the left of q under a standard normal density curve (mean = 0, sd = 1)

  • to find the area to the right of q, we can use the complement rule: 1 - pnorm(q)
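
As a small illustration of the left and right areas, using q = 1.96 as an arbitrary example value:

```r
q <- 1.96  # illustrative quantile

# Area to the left of q under the standard normal curve
left_area <- pnorm(q)

# Area to the right of q, by the complement rule
right_area <- 1 - pnorm(q)

left_area
right_area
```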

qnorm(p)
  • finds the quantile (z-value) that has a proportion p of the standard normal distribution to its left

    • i.e., to find the 60th percentile, we input p = 0.60 (a probability between 0 and 1, not 60) and R returns the value on the standard normal curve with 60% of the area to its left
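
A minimal sketch of the 60th percentile case, which also shows that qnorm() undoes pnorm(). The variable names are hypothetical.

```r
# 60th percentile of the standard normal: p is given as 0.60, not 60
z60 <- qnorm(0.60)
z60

# Feeding the result back into pnorm() recovers the probability
back_to_p <- pnorm(z60)
back_to_p
```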

Summary Statistics

For Categorical Variables

(Descriptive Statistics Slide 7)

xtabs(~ x + y, data = df)

# three-way table
xtabs(~ x + y + z, data = df) 
  • ~ x + y creates a two-way contingency table of counts (2x2 when x and y each have two levels)

  • ~ x + y + z creates a three-way contingency table for three categorical variables

proportions(table) 
  • Converts a table of counts into a table of proportions

  • Can be used in conjunction with xtabs
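
The sketch below builds count tables with xtabs() and converts one to proportions. It assumes the built-in mtcars dataset for illustration (cyl has three levels, am has two), so the two-way table here is 3x2 rather than 2x2. proportions() is the newer name for prop.table(), so very old R installations may only have the latter.

```r
# Two-way table of counts: cylinders by transmission type
tab2 <- xtabs(~ cyl + am, data = mtcars)
tab2

# Three-way table: one cyl-by-am table for each level of gear
tab3 <- xtabs(~ cyl + am + gear, data = mtcars)

# Convert the two-way counts to proportions of the grand total
props <- proportions(tab2)
props
```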

For Continuous variables

median(x) 
  • Calculates the middle value of a numeric variable
mean(x) 
  • Calculates the arithmetic average of a numeric variable
sd(x)
  • Calculates the sample standard deviation of a numeric variable
quantile(x, probs = c(q1, q2, q3, ...)) 
  • Produces sample quantiles corresponding to given probabilities
cor(x, y) 
  • Computes the correlation coefficient (Pearson by default) between two numeric variables

  • See Correlation Slide 13 for an example

summary(dataset$variable) 
  • gives the five-number summary (minimum, first quartile, median, third quartile, maximum) plus the mean of a continuous variable
min(dataset$variable) 
  • gives minimum of continuous variable
max(dataset$variable) 
  • gives maximum of continuous variable
sort(dataset$variable) 
  • sorts variable in ascending order by default

  • can be continuous or categorical
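
A compact sketch running the continuous-variable functions above on one column. It assumes the built-in mtcars dataset as a stand-in for your own data; x, q, s, and sorted are hypothetical names.

```r
x <- mtcars$mpg  # a numeric variable (illustrative choice)

median(x)  # middle value
mean(x)    # arithmetic average
sd(x)      # sample standard deviation

# Quartiles: probs are given as proportions between 0 and 1
q <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Minimum, quartiles, median, mean, and maximum in one call
s <- summary(x)

min(x)
max(x)

# Ascending order by default
sorted <- sort(x)
```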

ggplot2

library(ggplot2) # always load the library
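# Start from ggplot() and add ONE geom with +; the geoms below are alternatives, not a single chain
ggplot(data, aes(x, y)) + geom_point()     # scatterplot (x and y)
ggplot(data, aes(x, y)) + geom_boxplot()   # box and whisker plot
ggplot(data, aes(x)) + geom_bar()          # bar chart (x only; counts are computed for you)
ggplot(data, aes(x)) + geom_histogram()    # histogram (x only)
# Optionally add a facet layer to any of the plots above:
#   + facet_wrap(~var)   # one panel per level of var, wrapped into rows and columns
#   + facet_grid(~var)   # panels in a strict grid; also accepts rows ~ cols for two variables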
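
To make the layering concrete, here is a small sketch that builds two plots. It assumes the ggplot2 package is installed and uses the built-in mtcars dataset for illustration; scatter and histo are hypothetical names, and bins = 10 is an arbitrary choice.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# Scatterplot of mpg against weight, one panel per number of cylinders
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl)

# Histogram of a single continuous variable (only x is mapped)
histo <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Printing a plot object draws it
scatter
```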