BIOS:4120 List of Functions
Please let your TA know if the Date Modified is not current with the most recent lecture.
This document summarizes all the functions you need to complete the homework assignments and, ultimately, the computational assessment at the end of the semester. You are discouraged from using AI tools to generate your code, because they are likely to give you code that you don't actually need for this class. R has hundreds of functions and many ways to accomplish the same task; for simplicity, try to use only the code provided here. Every task you will be asked to complete can be done with these functions.
Good luck!
Utility Functions
by(dataset$var, dataset$group, mean) - Applies a function (here, mean) to a variable “var” split by the levels of “group”
The “group” argument is required; the output is organized by the levels of the grouping variable. To apply a function to the whole variable without grouping, call the function directly, e.g. mean(dataset$var)
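A minimal sketch of by(), assuming R's built-in mtcars data set as a stand-in for "dataset" (mpg plays the role of "var" and cyl, the number of cylinders, plays the role of "group"):

```r
# mtcars ships with R: mpg = miles per gallon, cyl = number of cylinders (4, 6, or 8)
by(mtcars$mpg, mtcars$cyl, mean)   # prints the mean mpg for each cylinder group

# Save the result if you want to reuse the group means as plain numbers
group_means <- by(mtcars$mpg, mtcars$cyl, mean)
as.numeric(group_means)            # group means in the order 4, 6, 8
```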
with(dataset, mean(variable)) - Evaluates an expression in an environment constructed from the data
Allows you to name your dataset only once instead of writing dataset$variable every time
Can be used with any of the other functions to accomplish a task
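A short comparison, again assuming the built-in mtcars data set as the example "dataset": both forms give the same answers, but with() lets you write the column names without repeating mtcars$.

```r
# Without with(): the data set name is repeated for every variable
mean(mtcars$mpg)
cor(mtcars$mpg, mtcars$wt)

# With with(): name the data set once; columns are then found by name
with(mtcars, mean(mpg))
with(mtcars, cor(mpg, wt))
```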
lm(outcome_var ~ explanatory_var, data = dataset) - constructs a linear model of the form
\[ \overbrace{Y}^{\text{Outcome}} = \underbrace{\alpha + \beta \overbrace{X}^{\text{Explanatory var}}}_{\text{Linear Predictor}} \]
- the order of the variables matters! The outcome goes to the left of ~ and the explanatory variable to the right; swapping them fits a different model
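A minimal sketch, assuming mtcars with mpg as the outcome and car weight (wt) as the explanatory variable. It also shows why the order matters: regressing wt on mpg is a different model, not just the first one turned around.

```r
# Outcome (mpg) on the LEFT of ~, explanatory variable (wt) on the RIGHT
fit <- lm(mpg ~ wt, data = mtcars)
fit             # prints the intercept (alpha) and slope (beta)
summary(fit)    # estimates, standard errors, p-values, R-squared

# Swapping the variables fits a different model (wt predicted from mpg);
# its slope is NOT simply the reciprocal of the slope above
fit_swapped <- lm(wt ~ mpg, data = mtcars)
fit_swapped
```

The two slopes multiply to the squared correlation between the variables rather than to 1, which is one way to see that the models answer different questions.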
choose(n, r) - binomial coefficient given by the formula
\[ \binom{n}{r} = \frac{n!}{r!(n-r)!} \]
Counts the number of “combinations”: selections of r items from a set of n items where order does not matter (Binomial lecture)
n is the total number of items in the set; r is the number of items selected
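A quick check with hypothetical values n = 5 and r = 2, comparing choose() with the formula written out using factorial():

```r
# Number of ways to choose r = 2 items from n = 5 when order does not matter
choose(5, 2)                                        # 10

# Same value from the formula n! / (r! (n - r)!)
factorial(5) / (factorial(2) * factorial(5 - 2))    # 10
```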
sum(x) - sums the values of a numeric variable x
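A small sketch with hypothetical vectors; the second part shows what happens when the variable contains a missing value (NA):

```r
x <- c(2, 5, 1, 8)        # hypothetical data
sum(x)                    # 16

# With a missing value, sum() returns NA unless told to drop it
y <- c(2, NA, 1)
sum(y)                    # NA
sum(y, na.rm = TRUE)      # 3
```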
For Distributions
dbinom(x, n, prob) - binomial density function
x = number of successes; n = number of trials (R names this argument size); prob = probability of success
Calculates the probability of a particular outcome, P(X = x), given parameters \(n\) and \(\pi\) (prob)
Can be supplied a vector of outcomes (e.g. x = 0:3), in which case it returns one probability per outcome
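A minimal sketch with hypothetical parameters n = 10 and \(\pi\) = 0.5, showing a single outcome, a vector of outcomes, and the fact that the probabilities of all possible outcomes add up to 1:

```r
# P(exactly 3 successes in n = 10 trials with probability of success 0.5)
dbinom(3, 10, 0.5)             # 0.1171875

# A vector of outcomes returns one probability each: P(X = 0), ..., P(X = 3)
dbinom(0:3, 10, 0.5)

# Probabilities over all possible outcomes 0..10 add up to 1
sum(dbinom(0:10, 10, 0.5))
```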
pbinom(x, n, prob) - cumulative probability P(X ≤ x): sums the probabilities of 0 through x successes, starting from the lower (left) tail
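A sketch with the same hypothetical n = 10 and \(\pi\) = 0.5, confirming that pbinom() equals the sum of the matching dbinom() values and using the complement rule for the upper tail:

```r
# P(X <= 2) for n = 10 trials, probability of success 0.5
pbinom(2, 10, 0.5)             # 0.0546875

# Same value by adding P(X = 0) + P(X = 1) + P(X = 2)
sum(dbinom(0:2, 10, 0.5))

# Upper tail P(X >= 3) with the complement rule
1 - pbinom(2, 10, 0.5)
```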
binom.test(x, n)
binom.test(x, n)$conf.int
binom.test(x, n)$p.value
Performs an exact binomial test of whether the proportion of successes differs from 0.5 (the default; change it with the p argument) and reports the test statistic, p-value, and confidence interval
Appending $conf.int at the end returns only the confidence interval
Appending $p.value returns only the p-value
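A sketch with hypothetical data (7 successes in 10 trials). The last line shows the p argument, which changes the null proportion from its default of 0.5:

```r
# Test H0: proportion = 0.5 (the default) with 7 successes in 10 trials
binom.test(7, 10)

binom.test(7, 10)$p.value      # two-sided p-value only
binom.test(7, 10)$conf.int     # 95% confidence interval only

# Testing against a different null proportion
binom.test(7, 10, p = 0.3)$p.value
```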
pnorm(q) - where q is the “quantile”, or number of standard deviations away from 0
Calculates the area to the left of q under a standard normal density curve (mean = 0, sd = 1)
To find the area to the right of q, use the complement rule: 1 - pnorm(q)
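A few illustrative values (the choice of 1.96 is just an example):

```r
pnorm(0)           # 0.5: half the area lies below the mean
pnorm(1.96)        # area to the LEFT of 1.96, about 0.975
1 - pnorm(1.96)    # area to the RIGHT of 1.96, about 0.025
```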
qnorm(p) - finds the percentile given a probability: the value q under the standard normal curve that has area p to its left
i.e. to find the 60th percentile, input p = 0.6 to tell R that the probability is 60% and we want the associated percentile under the standard normal curve; p must be a proportion between 0 and 1, not a percentage like 60
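A sketch of qnorm() with the probability written as a proportion, plus a check that qnorm() undoes pnorm():

```r
qnorm(0.6)          # 60th percentile of the standard normal, about 0.253
qnorm(0.975)        # about 1.96

# qnorm() is the inverse of pnorm(): 60% of the area lies left of qnorm(0.6)
pnorm(qnorm(0.6))   # 0.6

# qnorm(60) would return NaN (with a warning) because 60 is not a probability
```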
Summary Statistics
For Categorical Variables
(Descriptive Statistics Slide 7)
# two-way table
xtabs(~ x + y, data = df)
# three-way table
xtabs(~ x + y + z, data = df)
~ x + y creates a two-way contingency table of counts (2x2 when both variables have two levels)
~ x + y + z creates a three-way table for three categorical variables: an x-by-y table for each level of z
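A sketch assuming mtcars, where cyl (4, 6, 8) and gear (3, 4, 5) each have three levels and am (0 = automatic, 1 = manual) has two. It shows that a two-way table is only 2x2 when both variables have two levels, and that a three-way table has one x-by-y slice per level of z:

```r
# cyl and gear each have 3 levels, so this two-way table is 3 x 3
tab2 <- xtabs(~ cyl + gear, data = mtcars)
tab2

# Adding am gives a cyl-by-gear table for each level of am
tab3 <- xtabs(~ cyl + gear + am, data = mtcars)
tab3
dim(tab3)    # 3 3 2
```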
proportions(table) - Converts a table of counts into a table of proportions
Can be used in conjunction with xtabs, e.g. proportions(xtabs(~ x + y, data = df))
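A sketch assuming mtcars (am and vs are both 0/1 variables, so the table is 2x2). The margin argument is an addition not covered above: margin = 1 gives row proportions and margin = 2 gives column proportions. proportions() needs R 4.0.0 or later.

```r
tab <- xtabs(~ am + vs, data = mtcars)

proportions(tab)               # each cell divided by the grand total (all cells sum to 1)
proportions(tab, margin = 1)   # row proportions: each row sums to 1
proportions(tab, margin = 2)   # column proportions: each column sums to 1
```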
For Continuous Variables
median(x) - Calculates the middle value of a numeric variable
mean(x) - Calculates the arithmetic average of a numeric variable
sd(x) - Calculates the sample standard deviation of a numeric variable
quantile(x, probs = c(q1, q2, q3, ...)) - Produces sample quantiles corresponding to given probabilities
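A sketch on a small hypothetical vector, so the results can be checked by hand:

```r
x <- c(2, 4, 4, 5, 7, 9)                  # hypothetical data

median(x)                                 # 4.5, the average of the two middle values (4 and 5)
mean(x)                                   # 31 / 6, about 5.17
sd(x)                                     # sample SD (divides by n - 1)
quantile(x, probs = c(0.25, 0.5, 0.75))   # 25th, 50th, and 75th percentiles
```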
cor(x, y) - Computes the correlation coefficient (Pearson, by default) between two numeric variables
See Correlation Slide 13 for an example
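A sketch assuming mtcars, correlating miles per gallon with car weight:

```r
cor(mtcars$mpg, mtcars$wt)   # negative: heavier cars tend to get fewer miles per gallon
cor(mtcars$wt, mtcars$mpg)   # same value; swapping the two variables does not change r
```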
summary(dataset$variable) - gives the five-number summary (minimum, first quartile, median, third quartile, maximum) plus the mean of a continuous variable
min(dataset$variable) - gives minimum of continuous variable
max(dataset$variable) - gives maximum of continuous variable
sort(dataset$variable) - sorts a variable in ascending order by default (use decreasing = TRUE for descending order)
Can be continuous or categorical
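A sketch on hypothetical vectors showing what summary(), min(), max(), and sort() return:

```r
x <- c(7, 2, 9, 4)            # hypothetical data

summary(x)                    # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
min(x)                        # 2
max(x)                        # 9
sort(x)                       # 2 4 7 9
sort(x, decreasing = TRUE)    # 9 7 4 2
sort(c("b", "c", "a"))        # categorical values sort alphabetically: "a" "b" "c"
```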
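ggplot2
library(ggplot2) # always load the library first
# Template: one ggplot() call, then add ONE geom (and optionally one facet) with +
ggplot(data, aes(x, y)) +
  geom_point() # swap in the geom that fits your variables
# Geoms (pick one):
# geom_bar()        bar chart; one categorical variable, aes(x) only
# geom_histogram()  histogram; one numeric variable, aes(x) only
# geom_point()      scatterplot; two numeric variables, aes(x, y)
# geom_boxplot()    box and whisker plot; numeric y, optionally grouped by a categorical x
# Facets (optional, added with +):
# facet_wrap(~ var)  splits the plot into a multi-panel layout by a variable, wrapping panels into rows
# facet_grid(~ var)  also splits by a variable, but lays panels out in a fixed grid (rows ~ columns)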
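A runnable sketch of the template, assuming mtcars as the data: a scatterplot split into one panel per number of cylinders, followed by a one-variable histogram. The bins = 10 setting is an arbitrary illustrative choice.

```r
library(ggplot2)

# Scatterplot of mpg against weight, one panel per number of cylinders
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)
p

# A one-variable plot uses only x in aes(), e.g. a histogram of mpg
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)
```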