This document contains the worked answers to the exercises.

We will be using the tidyverse, so let's load it first and set a ggplot theme.

library(tidyverse)
theme_set(theme_bw(base_size = 14))

Exercise 1: Power

Activated T cells

Providing the observed difference between WT and KO cells is of scientific interest, what sample size is needed to achieve 80% power?

power.prop.test(n = NULL, p1 = 10/41, p2 = 14/35, 
                sig.level = 0.05, power = 0.8, 
                alternative="two.sided")
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 139.4512
##              p1 = 0.2439024
##              p2 = 0.4
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

To achieve a power of 80%, you will need a total sample of 280 cells, with 140 in each group.
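Note that power.prop.test returns a fractional n per group, which we round up. As a quick arithmetic check:

ceiling(139.4512)      # 140 cells per group
2 * ceiling(139.4512)  # 280 cells in total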

Mice weight

What sample size is needed to be able to spot a 10% difference with 80% power?

First, let's calculate the mean, SD, and a 10% difference.

mouse_mean <- mean(c(27.2,  25.5,   26, 29.1))
mouse_sd <- sd(c(27.2, 25.5,26,29.1))
mouse_mean2 <- mouse_mean*0.9
mouse_mean
## [1] 26.95
mouse_mean2
## [1] 24.255
mouse_sd
## [1] 1.601041

Then do the power analysis.

power.t.test(delta = mouse_mean-mouse_mean2,
          sd = mouse_sd, 
          sig.level = 0.05, 
          power = 0.8, 
          type = "two.sample", 
          alternative = "two.sided")
## 
##      Two-sample t test power calculation 
## 
##               n = 6.652643
##           delta = 2.695
##              sd = 1.601041
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

To be able to detect a 10% difference in mouse weight at 80% power, assuming the KO group has the same variance as the WT group, you will need 7 mice per group.
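As an optional sanity check (a quick sketch using the values calculated above), we can turn the question around and ask what power 7 mice per group actually gives us:

power.t.test(n = 7, delta = mouse_mean - mouse_mean2,
             sd = mouse_sd,
             sig.level = 0.05,
             type = "two.sample",
             alternative = "two.sided")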

Arachnophobes

Use the data to calculate the values for a power calculation

First, let's read in the data (spider.data.csv).

spider <- read_csv("Datasets to use/spider.data.csv")
## Rows: 10 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Group
## dbl (2): Participant, Scores
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spider

Now let's calculate what we need for the power analysis.

spider %>%
  group_by(Group) %>%
    summarise(mean=mean(Scores), sd=sd(Scores))

Finally, let's run the power analysis.

power.t.test(delta = 52 - 39,
          sd = 9.75, 
          sig.level = 0.05, 
          power = 0.8, 
          type = "two.sample", 
          alternative = "two.sided")
## 
##      Two-sample t test power calculation 
## 
##               n = 9.889068
##           delta = 13
##              sd = 9.75
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Provided the preliminary results are to be trusted, to pick up such a difference between the two groups with a power of 80% and a significance level of 5%, we will need at least 10 arachnophobes in each group.

How many arachnophobes would you need to achieve a power of 80%, if based on a paired design? For this we want to look at the differences between the two scores, so let's calculate them.

spider %>%
  pivot_wider(names_from = Group, values_from = Scores) %>%
  mutate(difference = Picture - Real) %>%
  summarise(mean = mean(difference), sd = sd(difference))

And run the paired power analysis.

power.t.test(delta = -13,
          sd = 5.7, 
          sig.level = 0.05, 
          power = 0.8, 
          type = "paired", 
          alternative = "two.sided")
## 
##      Paired t test power calculation 
## 
##               n = 3.782171
##           delta = 13
##              sd = 5.7
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

To achieve a power of 80%, based on a paired design, we would need 4 arachnophobes.
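The paired design needs far fewer participants because the standard deviation of the paired differences (5.7) is much smaller than the between-subject standard deviation (9.75). A quick arithmetic comparison of the standardised effect sizes makes this clear:

13 / 9.75  # two-sample design: effect size of roughly 1.3
13 / 5.7   # paired design: effect size of roughly 2.3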

Exercise 2: t-test

Set up the packages we need.

library(tidyverse)
library(rstatix)
library(ggpubr)
library(ggsignif)
theme_set(theme_bw(base_size = 14))

Working memory

Is there an effect of treatment on the monkeys’ performance?

First, read in the data and calculate the differences

working.memory <- read_csv("Datasets to use/working.memory.csv")
working.memory
working.memory <- working.memory %>%
  mutate(difference = DA.depletion - placebo) 
working.memory

Let's have a look at our data and check assumptions by producing some plots of the differences.

working.memory %>%
  ggplot(aes("DA.Depletion", difference))+
  geom_jitter(height=0, width=0.05, size=4, colour="chartreuse3")+
  stat_summary(geom="errorbar",fun="mean", fun.min="mean", fun.max="mean",  width=0.3, linewidth=1)+
  stat_summary(geom="errorbar", fun.data=mean_cl_normal, width=0.15)+
  scale_y_continuous(breaks=-16:0, limits=c(-16, 0))+
  xlab(NULL)+
  ylab("Mean difference +/- 95% CI")

working.memory %>%
  ggplot(aes("DA.Depletion", difference))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(height=0, width=0.05, size=4, colour="chartreuse3")+
  scale_y_continuous(breaks=-16:0, limits=c(-16, 0))+
  xlab(NULL)+
  ylab("Mean difference +/- 95% CI")

The differences look fairly normal, but let's run a formal test and have a look at the QQ plot.

#library(rstatix)
working.memory %>%
  shapiro_test(difference)
working.memory %>%
  ggplot(aes(sample = difference)) +
  stat_qq()+
  stat_qq_line()

#library(ggpubr)
ggqqplot(working.memory, "difference")

Our data pass the normality test and the QQ plot looks good, so let's run a paired t-test. There are two equivalent ways of doing this: the first is a one-sample t-test on the differences:

working.memory %>%
  t_test(difference ~ 1, mu=0)
working.memory %>%
  t_test(difference ~ 1, mu=0, detailed = TRUE)

Or by doing a paired t-test on a long format of the data:

NOTE: if you do this, make sure to sort by your ID column (using arrange(ID_col), in this example arrange(Subject)), as the paired t_test assumes the first data point in your first group (placebo) corresponds to the first data point in the second group (DA.depletion). If your row order is randomised, it will silently mismatch your pairs and the results will be incorrect.

working.memory.long <- working.memory %>%
  pivot_longer(cols = c(placebo, DA.depletion), names_to = "treatment", values_to = "score")
working.memory.long %>%
  arrange(Subject) %>%
  t_test(score ~ treatment, paired = TRUE, detailed = TRUE)

Both options give the same result: the injection of a dopamine-depleting agent significantly affects working memory in rhesus monkeys (t=-8.62, df=14, p=5.715e-7).
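To see why the arrange(Subject) step matters, here is a quick illustration (a sketch using the data above): shuffling the row order before pairing silently changes which placebo score is matched with which DA.depletion score, so the test result is no longer correct.

working.memory.long %>%
  slice_sample(prop = 1) %>%                # randomise the row order
  t_test(score ~ treatment, paired = TRUE)  # pairs are now mismatched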

Now we can produce nice plots of our data to present the results - there are many ways of doing this, but below are a few examples.

working.memory.long %>%
  arrange(Subject) %>%
  ggplot(aes(x=treatment, y=score, group=Subject))+
  geom_line(linewidth=1, colour = "grey")+
  geom_point(colour= "black", size = 2) +
  scale_y_continuous(breaks=seq(from =0, by=5, to=60), 
                     limits = c(0,60)) +
  geom_signif(comparisons = list(c("placebo", "DA.depletion")), 
              test = "t.test", test.args = list(paired=TRUE),
              map_signif_level = TRUE)

working.memory.long %>%
  arrange(Subject) %>%
  ggline(x = "treatment", y = "score", group = "Subject", 
         color = "grey", point.color = "treatment", palette = "Dark2", 
         point.size = 1.5, line.size = 0.4, legend = "none")+ 
  scale_y_continuous(breaks=seq(from =0, by=5, to=60), 
                     limits = c(0,60))+  
  geom_signif(comparisons = list(c("placebo", "DA.depletion")), 
              test = "t.test", test.args = list(paired=TRUE),
              map_signif_level = TRUE)

working.memory.long %>% 
  arrange(Subject) %>%
  t_test(score ~ treatment, paired = TRUE) -> stat.test
working.memory.long %>%
  ggpaired(x = "treatment", y = "score", color = "treatment", id = "Subject", 
           palette = "Dark2", line.color = "gray", line.size = 0.4, legend = "none")+
  scale_y_continuous(breaks=seq(from =0, by=5, to=60), 
                     limits = c(0,60))+
  stat_pvalue_manual(stat.test, label=" p = {p}", y.position = 55)

Coffee

Which beans should you choose for your coffee?

Start by reading in and plotting the data.

coffee <- read_csv("Datasets to use/coffee.bean.species.csv")
coffee
coffee %>%
  ggplot(aes(Species,Scores.Total, colour = Species)) +
  geom_jitter(height=0, width=0.2, size = 2, show.legend = FALSE) +
  stat_summary(geom="crossbar", fun=mean, width=0.6, linewidth=0.3, show.legend = FALSE) +
  ylab("Score")

coffee %>%
  ggplot(aes(x=Species, y=Scores.Total)) +
  geom_boxplot()

coffee %>%
  ggplot(aes(x=Species, y=Scores.Total))+
  geom_violin(trim=TRUE, linewidth=1, show.legend=FALSE)+
  geom_jitter(height=0, width=0.2, size = 2, show.legend = FALSE) +
  ylab("Score")+
  scale_fill_brewer(palette="Dark2")

Let's have a look at the normality tests.

ggqqplot(coffee, x = "Scores.Total", facet.by = "Species") +
  theme_bw()

model  <- lm(Scores.Total ~ Species, data = coffee)
# Create a QQ plot of residuals
ggqqplot(residuals(model)) +
  theme_bw() 

shapiro_test(residuals(model))
coffee %>%
  group_by(Species) %>%
  shapiro_test(Scores.Total)
coffee %>%
  levene_test(Scores.Total ~ Species)

The data do not look perfect: there is one point outside the confidence bands of the residuals QQ plot. However, the groups individually look OK and pass the other tests, so we will carry on with a t-test for now.

coffee %>%
  t_test(Scores.Total~Species,  var.equal = TRUE, detailed = TRUE) 

There is a significant difference in taste score between the two coffee beans (t=4.47, df=34, p = 8.36e-05). You should choose the Arabica beans as these get a higher score.
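As an optional robustness check (not part of the original answer), Welch's t-test does not assume equal variances and should lead to the same conclusion here:

coffee %>%
  t_test(Scores.Total ~ Species, var.equal = FALSE, detailed = TRUE)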

Let's produce a final plot of our data.

coffee %>%
  t_test(Scores.Total ~ Species, var.equal = TRUE) -> t_results

coffee %>%
  ggplot(aes(Species, Scores.Total)) +
  stat_boxplot(geom="errorbar", width=0.2)+ 
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(height=0, width=0.1, size = 2, alpha = 0.5, colour="red")+
  ylab("Score")+
  geom_signif(comparisons = list(c("Arabica", "Robusta")), annotations =  t_results$p)

Exercise 3: ANOVA

Set up the packages we need.

library(tidyverse)
library(rstatix)
library(ggpubr)
library(ggsignif)
theme_set(theme_bw(base_size = 14))

Crop yield

Read in the data and do some initial plots. We have to force the fertiliser type to be a factor or character, as it gets automatically detected as a “double” (numeric), when it is actually a categorical variable.

crop <- read_csv("Datasets to use/crop.yield.csv")
crop <- crop %>%
  mutate(fertilizer = as.factor(fertilizer))
crop
crop %>%
  ggplot(aes(x=fertilizer, y=yield, colour=fertilizer))+
  geom_jitter(height=0, width=0.3, size=3)+
  stat_summary(geom="crossbar", fun=mean, width=0.6, linewidth=0.3, show.legend = FALSE)

crop %>%
  ggplot(aes(x=fertilizer, y=yield, fill=fertilizer))+
  geom_boxplot(show.legend = FALSE)

crop %>%
  ggplot(aes(x=fertilizer, y=yield, colour=fertilizer))+
  geom_violin(show.legend = FALSE)+
  geom_jitter(height=0, width=0.25, alpha=0.5, size=3, show.legend = FALSE)

Let's have a look at the QQ plot and normality tests.

model  <- aov(yield ~ fertilizer, data = crop)
ggqqplot(residuals(model)) +
  theme_bw()

shapiro_test(residuals(model))
#Or on each group
crop %>%
  group_by(fertilizer) %>% 
  shapiro_test(yield)
## Second assumption with Levene
crop %>%
  levene_test(yield ~ fertilizer)

We appear to meet all the assumptions required for the ANOVA, so we will carry out the test.

# omnibus step #
crop %>%
  anova_test(yield~fertilizer, detailed=TRUE)
# Tukey #
crop %>%
  tukey_hsd(yield~fertilizer)

After running the ANOVA, we can see there is a significant difference between the three fertilisers from the omnibus test (F=7.86, p=0.0007), so we can reject the null hypothesis that there is no difference. From the multiple comparisons test, we can see that fertiliser 3 is significantly different from both 1 and 2 (p=0.0006 and p=0.0209, respectively), but there is no significant difference between fertilisers 1 and 2 (p=0.495). Therefore, we would want to use fertiliser 3 to maximise yield.

Finally, let's plot our results.

crop %>%
  tukey_hsd(yield~fertilizer)%>%
  mutate(comparison = paste(group1, sep="-", group2)) -> tukey.conf

tukey.conf %>%
  ggplot(aes(x=comparison, y=estimate, ymin=conf.low, ymax=conf.high)) +
  geom_errorbar(colour="black", linewidth=1)+
  geom_point(size=3, colour="darkred")+
  geom_hline(yintercept=0, linetype="dashed", color = "red")+
  coord_flip()

sig.comp <- tukey.conf %>%
  filter(p.adj<0.05)

crop %>%
  ggplot(aes(x=fertilizer, y=yield, colour=fertilizer)) +
  geom_boxplot(show.legend = FALSE)+    
  geom_jitter(height=0, width=0.1, alpha=0.5, size=5, show.legend = FALSE)+
  geom_signif(comparisons = list(c("1","3"),
                                 c("2","3")),
              annotations = sig.comp$p.adj,
              y_position = c(179.3, 179.7),
              colour = "black",
              show.legend = FALSE)

crop %>%
  ggplot(aes(x=fertilizer, y=yield, colour=fertilizer)) +
  geom_boxplot(show.legend = FALSE)+    
  geom_jitter(height=0, width=0.1, alpha=0.5, size=5, show.legend = FALSE)+
  stat_pwc(method = "tukey_hsd", label = "p.adj.signif", hide.ns = TRUE, show.legend = FALSE)

Neutrophils

Is there a difference between KO with/without treatment and WT?

Start by reading in the data and converting Condition to a factor, specifying the levels so that WT is the first group.

neutrophils.long <-read_csv("Datasets to use/neutrophils.long.csv")

neutrophils.long %>%
  mutate(Condition=factor(Condition, levels = c("WT", "KO", "KO+T1", "KO+T2"))) -> neutrophils.long
neutrophils.long

Then plot the data, ideally showing the matching by experiment, so we can see how this looks across the different conditions.

neutrophils.long %>%
  ggplot(aes(x=Condition, y=Values, group=Experiment, colour=Experiment, fill=Experiment))+
  geom_line(linewidth=2)+
  geom_point(size=3, shape = 21, colour= "black", stroke=2)

neutrophils.long %>%
  ggplot(aes(x=Condition, y=Values))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(height = 0, width = 0.2, size=3)

neutrophils.long %>%
  ggplot(aes(x=Condition, y=Values))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(aes(x=Condition, y=Values, colour=Experiment), height = 0, width = 0.2, size=4, alpha=0.7)

Now let's look at the normality tests and QQ plots.

neutrophils.long %>%
  group_by(Condition) %>%
  shapiro_test(Values)
ggqqplot(neutrophils.long, "Values", facet.by = "Condition")

We can also build a model to test the whole dataset together, looking at the residuals. However, this requires some additions to the formula given to aov(), and we then need to pick out the correct part of the resulting model. Specifically, we include an error term for Experiment in the formula, as below. The model produced contains two strata: "Stratum 1" holds the contribution of the experiment, and "Stratum 2" holds the differences between the conditions "Within" the experiments, which is what we are interested in.

This is getting towards more advanced model building, which is beyond the scope of this course, so do not worry too much about this part. If in doubt, use the above approach to look at each group individually.

model <- aov(Values~Condition+Error(factor(Experiment)), data = neutrophils.long)
model
## 
## Call:
## aov(formula = Values ~ Condition + Error(factor(Experiment)), 
##     data = neutrophils.long)
## 
## Grand Mean: 63.55
## 
## Stratum 1: factor(Experiment)
## 
## Terms:
##                 Residuals
## Sum of Squares     4208.7
## Deg. of Freedom         4
## 
## Residual standard error: 32.43725
## 
## Stratum 2: Within
## 
## Terms:
##                 Condition Residuals
## Sum of Squares   10947.75   1532.50
## Deg. of Freedom         3        12
## 
## Residual standard error: 11.30081
## Estimated effects may be unbalanced
ggqqplot(residuals(model$Within)) +
  theme_bw() 

So overall the data don't look amazing on these tests, but each group passes the normality test and its QQ plot looks OK, if slightly borderline, so we will continue with our ANOVA.

Do the omnibus and post-hoc tests. For the omnibus test, we need to include the experiment as part of the analysis, so we use a slightly different way of specifying the model to anova_test(), as below.

We again need to specify that there is pairing across the conditions when doing the post-hoc tests (remembering to sort by our ID so the correct pairings are inferred), and in this case we are interested in comparisons against the WT group, so we set it as the ref.group.

neutrophils.long %>%
  anova_test(dv = Values, wid = Experiment, within = Condition) -> res.aov
res.aov
## ANOVA Table (type III tests)
## 
## $ANOVA
##      Effect DFn DFd      F        p p<.05   ges
## 1 Condition   3  12 28.575 9.51e-06     * 0.656
## 
## $`Mauchly's Test for Sphericity`
##      Effect     W    p p<.05
## 1 Condition 0.121 0.36      
## 
## $`Sphericity Corrections`
##      Effect   GGe    DF[GG]    p[GG] p[GG]<.05  HFe     DF[HF]    p[HF]
## 1 Condition 0.692 2.07, 8.3 0.000179         * 1.45 4.35, 17.4 9.51e-06
##   p[HF]<.05
## 1         *
get_anova_table(res.aov)
# post-hoc test: bonferroni #
neutrophils.long %>%
  arrange(Experiment) %>%
  pairwise_t_test(Values~Condition, paired=TRUE, ref.group = "WT", p.adjust.method = "bonferroni")
# post-hoc test: holm #
neutrophils.long %>%
  arrange(Experiment) %>%
  pairwise_t_test(Values~Condition, paired=TRUE, ref.group = "WT", p.adjust.method = "holm") 

Put it together into a final plot - we can look at the differences from WT and/or show the linkage between experiments.

neutrophils.long %>%
  group_by(Experiment) %>%
  mutate(Difference=Values-Values[Condition=="WT"]) %>%
  ungroup() -> neutrophils.long

neutrophils.long %>%
  filter(Condition !="WT") %>%
  select(-Values) -> neutrophils.differences

neutrophils.differences %>%
  ggplot(aes(x=Condition, y=Difference, fill=Condition)) +
  geom_bar(stat = "summary", fun="mean", colour="black", show.legend = FALSE)+
  stat_summary(geom="errorbar", fun.data=mean_cl_normal, width=0.15)+
  geom_jitter(aes(x=Condition, y=Difference, colour=Experiment), height=0, width=0.1, size=4, show.legend=FALSE)+
  ylab("Mean difference from WT with 95% CI")+
  scale_y_continuous(breaks=seq(from=-40, by=10, to=80))+
  scale_fill_brewer(palette = "PuOr")

neutrophils.long %>%
  arrange(Experiment) %>%
  pairwise_t_test(Values~Condition, paired=TRUE, ref.group = "WT", p.adjust.method = "holm") %>%
  add_xy_position(x="Condition") -> neutro.results

neutrophils.long %>%
  ggplot(aes(x=Condition, y=Values))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(aes(x=Condition, y=Values, colour=Experiment), height = 0, width = 0.2, size=4, alpha=0.5)+
  stat_pvalue_manual(neutro.results, label = "p = {p.adj}", label.size=5)+
  scale_y_continuous(breaks=seq(from=0, by=20, to=180))

results.sig <- neutro.results %>%
  filter(p.adj<0.05)

neutrophils.long %>%
  ggplot(aes(x=Condition, y=Values, group=Experiment, colour=Experiment, fill=Experiment))+
  geom_line(linewidth=2)+
  geom_point(size=3, shape = 21, colour= "black", stroke=2) +
  geom_signif(comparisons = list(c("WT","KO"),
                                 c("WT","KO+T2")),
              annotations = results.sig$p.adj,
              y_position = c(132, 140),
              colour = "black",
              show.legend = FALSE)

Exercise 4: Correlation

Set up the packages we need.

library(tidyverse)
library(rstatix)
library(gglm)
theme_set(theme_bw(base_size = 14))

Roe deer

Is there a relationship between parasite burden and body mass in roe deer?

Read in the data and plot.

deer <- read_csv("Datasets to use/Roe.deer.csv")
deer
deer %>%
  ggplot(aes(PL, BM, colour = sex)) + 
  geom_point(size=3) +
  xlab("Parasite buden") +
  ylab("Body mass")

Let's create a line of best fit for each sex and add them to the graph.

deer.female <- deer %>%
  filter(sex == "F")
lm(BM ~ PL, data=deer.female)-> fit.deer.female
coefficients(fit.deer.female) -> coef.deer.female
coef.deer.female
## (Intercept)          PL 
##   25.041029   -1.889378
deer.male <- deer %>%
  filter(sex == "M")
lm(BM ~ PL, data=deer.male)-> fit.deer.male
coefficients(fit.deer.male) -> coef.deer.male
coef.deer.male
## (Intercept)          PL 
##   30.204398   -4.621496
deer %>%
  ggplot(aes(PL, BM, colour = sex)) + 
  geom_point(size=3) +
  geom_abline(intercept = coef.deer.female[1], slope = coef.deer.female[2], colour = "orange2", linewidth=1) +
  geom_abline(intercept=coef.deer.male[1],slope=coef.deer.male[2], colour = "purple2", linewidth=1) +
  scale_colour_manual(values=c("orange2", "purple2"))+
  xlab("Parasite buden") +
  ylab("Body mass")

How good are our models? Let's calculate Cook's distance and the standardised residuals, and produce the diagnostic plots.

deer.female$cook <- cooks.distance(fit.deer.female)
deer.female$residual <- rstandard(fit.deer.female)
deer.female %>%
  arrange(desc(cook)) %>%
  head(5)
deer.female %>%
  arrange(desc(abs(residual))) %>%
  head(5)
par(mfrow=c(2,2))
plot(fit.deer.female)

deer.male$cook <- cooks.distance(fit.deer.male)
deer.male$residual <- rstandard(fit.deer.male)
deer.male %>%
  arrange(desc(cook)) %>%
  head(5)
deer.male %>%
  arrange(desc(abs(residual))) %>%
  head(5)
par(mfrow=c(2,2))
plot(fit.deer.male)

There aren't any points that cause concern based on Cook's distance or the residuals, and the QQ plots look OK, so we will carry on with our correlation.

deer %>%
  group_by(sex) %>%
  cor_test(PL, BM)

There is a negative correlation between parasite load and body mass, but this relationship is only significant for the males (r = -0.75, p = 0.0049 vs. females: r = -0.30, p = 0.2940).

Exam anxiety

Read in and plot the data

read_csv("Datasets to use/exam.anxiety.csv") -> exam.anxiety
exam.anxiety
exam.anxiety %>%
  ggplot(aes(x=Revise, y=Anxiety, colour=Gender)) + geom_point(size=3)

Create fits for males and females separately and plot.

exam.anxiety %>%
  filter(Gender=="Female") -> exam.anxiety.female
lm(Anxiety~Revise, data=exam.anxiety.female) -> fit.female
coefficients(fit.female) -> cf.fit.female

exam.anxiety %>%
  filter(Gender=="Male") -> exam.anxiety.male
lm(Anxiety~Revise, data=exam.anxiety.male) -> fit.male
coefficients(fit.male) -> cf.fit.male

exam.anxiety %>%
  ggplot(aes(x=Revise, y=Anxiety, colour=Gender, label=Code))+
  geom_point(size=2)+
  geom_abline(intercept=cf.fit.male[1], slope=cf.fit.male[2], colour="purple", linewidth=1)+
  geom_abline(intercept=cf.fit.female[1], slope=cf.fit.female[2], colour="orange", linewidth=1)+
  scale_colour_manual(values=c("orange", "purple"))

It looks like there might be a couple of outliers in the dataset, so let's add labels to see which ones they are.

exam.anxiety %>%
  ggplot(aes(x=Revise, y=Anxiety, colour=Gender, label=Code))+
  geom_point(size=2)+
  geom_abline(intercept=cf.fit.male[1], slope=cf.fit.male[2], colour="purple", linewidth=1)+
  geom_abline(intercept=cf.fit.female[1], slope=cf.fit.female[2], colour="orange", linewidth=1)+
  scale_colour_manual(values=c("orange", "purple"))+
  geom_text(nudge_x = 3, size=4, show.legend = FALSE)

Let's have a look at the diagnostic plots.

par(mfrow=c(2,2))
plot(fit.female)

plot(fit.male)

As suspected, there are some problematic-looking points. In females, there are two potential outliers (43 and 11 on the diagnostic plots), and in males there is one outlier that sits very far from the rest of the data (40).

The numbers in the diagnostic plots are the row numbers of the points in the input data, so we can find the corresponding IDs in the original data by extracting those rows.

exam.anxiety.female[c(11, 43),]
exam.anxiety.male[40,]

So in females these correspond to 87 and 24 in the original dataset, and in the males it corresponds to 78.

Alternatively, if we set the row names of the input data to the ID (Code), the values shown on the diagnostic plots will match the IDs in the original data:

exam.anxiety %>%
  filter(Gender=="Female") %>%
  column_to_rownames("Code") -> exam.anxiety.id.female
lm(Anxiety~Revise, data=exam.anxiety.id.female) -> fit.id.female

exam.anxiety %>%
  filter(Gender=="Male") %>%
  column_to_rownames("Code") -> exam.anxiety.id.male
lm(Anxiety~Revise, data=exam.anxiety.id.male) -> fit.id.male

par(mfrow=c(2,2))
plot(fit.id.female)

plot(fit.id.male)

So now we can directly see that the outlying points are 87 and 24 in females and 78 in males. This also works if you have text labels for your points.

Now we have identified these points, let's look at how good the existing models are and consider whether to remove them.

summary(fit.male)
## 
## Call:
## lm(formula = Anxiety ~ Revise, data = exam.anxiety.male)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -73.124  -2.900   2.221   6.750  16.600 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  84.1941     2.6213  32.119  < 2e-16 ***
## Revise       -0.5353     0.1016  -5.267 2.94e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.3 on 50 degrees of freedom
## Multiple R-squared:  0.3568, Adjusted R-squared:  0.344 
## F-statistic: 27.74 on 1 and 50 DF,  p-value: 2.937e-06
summary(fit.female)
## 
## Call:
## lm(formula = Anxiety ~ Revise, data = exam.anxiety.female)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.687  -6.263  -1.204   4.197  38.628 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 91.94181    2.27858   40.35  < 2e-16 ***
## Revise      -0.82380    0.08173  -10.08 1.54e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.42 on 49 degrees of freedom
## Multiple R-squared:  0.6746, Adjusted R-squared:  0.668 
## F-statistic: 101.6 on 1 and 49 DF,  p-value: 1.544e-13
exam.anxiety %>%
  group_by(Gender) %>%
    cor_test(Revise, Anxiety)

We can see that, based on all the data including outliers, we do not get a very good fit for the males (r=-0.6, r2=0.3568), and we could potentially also improve the fit for the females (r2=0.6746). Let's have a look at removing our influential outliers.

First, focus on the males: calculate the standardised residuals and Cook's distances so we can filter on them.

cooks.distance(fit.male) -> cook.m
rstandard(fit.male) -> st.resid.m

exam.anxiety.male %>%
  add_column(st.resid.m) %>%
  add_column(cook.m) -> exam.anxiety.male
exam.anxiety.male %>%
  arrange(desc(abs(st.resid.m)))
exam.anxiety.male %>%
  filter(abs(st.resid.m)<3) -> exam.anxiety.male.78

78 has a standardised residual below -3, so we remove it. Let's fit another model without 78 and see if this improves things.

lm(Anxiety~Revise, data=exam.anxiety.male.78) -> fit.male.78

summary(fit.male.78)
## 
## Call:
## lm(formula = Anxiety ~ Revise, data = exam.anxiety.male.78)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.0296  -3.8704   0.5626   6.0786  14.2525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 86.97461    1.64755  52.790  < 2e-16 ***
## Revise      -0.60752    0.06326  -9.603 7.59e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.213 on 49 degrees of freedom
## Multiple R-squared:  0.653,  Adjusted R-squared:  0.6459 
## F-statistic: 92.22 on 1 and 49 DF,  p-value: 7.591e-13
exam.anxiety.male.78 %>%
  cor_test(Revise, Anxiety)
par(mfrow=c(2,2))
plot(fit.male.78)

The new fit is a big improvement after removal of 78, now r=-0.81 and r2=0.653 (up from r=-0.6, r2=0.3568). The diagnostic plots look a lot more reasonable now too.

Let's move on to the female data. First we will try filtering on the standardised residuals, as we did for the males.

cooks.distance(fit.female) -> cook.f
rstandard(fit.female) -> st.resid.f

exam.anxiety.female %>%
  add_column(st.resid.f) %>%
  add_column(cook.f) -> exam.anxiety.female
exam.anxiety.female %>%
  arrange(desc(abs(st.resid.f)))
exam.anxiety.female %>%
  filter(abs(st.resid.f)<3) -> exam.anxiety.female.87

This removes 87. Let's have a look at the fit now.

lm(Anxiety~Revise, data=exam.anxiety.female.87) -> fit.female.87

summary(fit.female.87)
## 
## Call:
## lm(formula = Anxiety ~ Revise, data = exam.anxiety.female.87)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.7518  -5.7069  -0.7782   3.2117  18.5538 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 92.24536    1.93591   47.65   <2e-16 ***
## Revise      -0.87504    0.07033  -12.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.849 on 48 degrees of freedom
## Multiple R-squared:  0.7633, Adjusted R-squared:  0.7584 
## F-statistic: 154.8 on 1 and 48 DF,  p-value: < 2.2e-16
exam.anxiety.female.87 %>%
  cor_test(Revise, Anxiety)
par(mfrow=c(2,2))
plot(fit.female.87)

This has slightly improved the model: r=-0.87, r2=0.7633 (from r=-0.82, r2=0.6746). However, from the diagnostic plots we can see that 11 (24 in the original dataset) has a Cook's distance greater than 1, so let's have a look at removing this point too.

exam.anxiety.female.87 %>%
  arrange(desc(cook.f))
exam.anxiety.female.87 %>%
  filter(cook.f<1) -> exam.anxiety.female.87.24

lm(Anxiety~Revise, data=exam.anxiety.female.87.24) -> fit.female.87.24

summary(fit.female.87.24)
## 
## Call:
## lm(formula = Anxiety ~ Revise, data = exam.anxiety.female.87.24)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.4485  -5.9633  -0.6506   4.4159  17.3276 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 90.61640    1.93047   46.94  < 2e-16 ***
## Revise      -0.77307    0.07697  -10.04 2.78e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.354 on 47 degrees of freedom
## Multiple R-squared:  0.6822, Adjusted R-squared:  0.6754 
## F-statistic: 100.9 on 1 and 47 DF,  p-value: 2.78e-13
exam.anxiety.female.87.24 %>%
  cor_test(Revise, Anxiety)
par(mfrow=c(2,2))
plot(fit.female.87.24)

Removing 24 has actually made the model slightly worse, with r=-0.83, r2=0.6822, down from r=-0.87, r2=0.7633. So, if we want to select the model that is the best fit for the data, we will stick with removal of 87 but keep 24.

We can check our models another way using the Shapiro normality test.

shapiro_test(st.resid.m)
shapiro_test(st.resid.f)
exam.anxiety.male.78 %>%
  shapiro_test(st.resid.m)
exam.anxiety.female.87 %>%
  shapiro_test(st.resid.f)

Before removal of the outliers, our data were failing normality tests, and this improves after outlier removal. That said, the male data still fail this test, which is something to keep in mind.

Another thing to note is that deciding how to deal with outliers can be tricky, especially when it is not your own data and you do not have the context behind it. You should decide how you will handle outliers before seeing your data, and be consistent across experiments (i.e. do not delete them when you get a significant result without them, but leave them in when you get a significant result with them). How to deal with them also depends on your end goal: if you want to build the best model for predicting future data, you might remove outliers to improve the fit for the bulk of the data; if you are interested in the correlation/relationship between your variables, however, outliers might be real data points that should not simply be ignored. In this case, we will carry on with the rest of the analysis with the two outliers removed.

Finally, we can join together our males and females and see if there are differences between them, plus plot a final graph of the results.

bind_rows(exam.anxiety.female.87, exam.anxiety.male.78) -> exam.anxiety.87.78

lm(Anxiety ~ Revise + Gender + Revise*Gender, data=exam.anxiety.87.78) -> fit.genders

summary(fit.genders)
## 
## Call:
## lm(formula = Anxiety ~ Revise + Gender + Revise * Gender, data = exam.anxiety.87.78)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.0296  -5.6022  -0.3294   5.6091  18.5538 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       92.24536    1.86694  49.410  < 2e-16 ***
## Revise            -0.87504    0.06783 -12.901  < 2e-16 ***
## GenderMale        -5.27075    2.53296  -2.081  0.04008 *  
## Revise:GenderMale  0.26752    0.09445   2.832  0.00562 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.534 on 97 degrees of freedom
## Multiple R-squared:  0.7228, Adjusted R-squared:  0.7142 
## F-statistic: 84.32 on 3 and 97 DF,  p-value: < 2.2e-16

We can conclude that among both male and female students there is a significant negative relationship between time spent revising for an exam and anxiety (males: r=-0.81, p=7.59e-13; females: r=-0.87, p<2e-16). Furthermore, the relationship is significantly stronger in females than in males (interaction p=0.0056).
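As a quick sanity check on the interaction model (simple arithmetic on the coefficients above), the slope for males is the female slope plus the interaction term, which recovers exactly the slope we obtained when fitting the males on their own:

-0.87504 + 0.26752  # = -0.60752, the Revise slope from fit.male.78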

Finally, let's produce a plot of our results.

coefficients(fit.male.78) -> cf.fit.male.78
coefficients(fit.female.87) -> cf.fit.female.87

exam.anxiety.87.78 %>%
  ggplot(aes(Revise, Anxiety, colour=Gender))+geom_point(size=4)+
  geom_abline(aes(intercept=cf.fit.male.78[1], slope=cf.fit.male.78[2]), colour="purple", linewidth=1)+
  geom_abline(aes(intercept=cf.fit.female.87[1], slope=cf.fit.female.87[2]), colour="orange", linewidth=1)+
  scale_colour_manual(values = c("orange", "purple"))+
  scale_x_continuous(breaks=seq(from=0, by=10, to=100))+
  scale_y_continuous(breaks=seq(from=0, by=10, to=100))+
  annotate("text", label = "Females: r = -0.87", x = 50, y = 30, size = 6, colour = "orange")+
  annotate("text", label = "Males: r = -0.81", x = 80, y = 50, size = 6, colour = "purple")+
  annotate("text", label = "Interaction: Revise*Gender: p=0.00562", x = 40, y = 0, size = 6, colour = "black")+
  ggtitle("Relationship between time spent revising and anxiety in male and female students")

Exercise 5: Non-parametric

Set up the packages we need.

library(tidyverse)
library(ggpubr)
library(rstatix)
library(ggsignif)
theme_set(theme_bw(base_size = 14))

Oxbridge rivalry: T-shirts [Mann-Whitney]

Can Cambridge students tell the difference between T-shirts from Oxford or Cambridge?

Read in and plot the data.

read_csv("Datasets to use/smelly.teeshirt.csv") -> smelly.teeshirt

smelly.teeshirt %>%
  ggplot(aes(x=university, y=smell, fill=university))+
  geom_boxplot(alpha=0.2, show.legend = FALSE, outlier.shape = NA)+
  geom_jitter(height=0, width=0.1, size=4, colour="firebrick4", alpha = 0.8, show.legend = FALSE)+
  scale_fill_brewer(palette="Dark2")+
  scale_y_continuous(breaks=seq(from=0, by=1, to=8), limits = c(0, 8))

As we are using a scoring system, this fails the assumption of linearity (or interval data) - that the distance between points of the scale should be equal at all parts along the scale - so we will use a non-parametric test for our analysis.

Since we are using a non-parametric test, our data do not need to meet any parametric assumptions, so there is no need to run tests to check them. This means we can go straight to the test after having a look at our data.

smelly.teeshirt %>%
  wilcox_test(smell~university)

We conclude that Cambridge students can tell the difference, by smell, between T-shirts from Oxford and Cambridge students (U = 5, p = 0.00479).

However, there is a significant flaw in the design of the experiment: the T-shirts had logos on, meaning the study is not blinded and is subject to bias. It therefore seems likely that the students rated the Oxford T-shirts worse because they had the logo on, rather than because of the smell. A better-designed study would have used the same T-shirts for both Oxford and Cambridge students, and could have used a paired design to better account for variability in scoring between individuals.

Nevertheless, let's plot our results so we get a final figure.

smelly.teeshirt %>%
  wilcox_test(smell~university) %>%
  add_xy_position()-> smelly.test

smelly.teeshirt %>%
  ggplot(aes(x=university, y=smell))+
  geom_boxplot(aes(fill=university), alpha=0.2, show.legend = FALSE)+
  geom_jitter(height=0, width=0.1, size=4, colour="firebrick4", alpha = 0.8, show.legend = FALSE)+
  scale_fill_brewer(palette="Dark2")+
  scale_y_continuous(breaks=seq(from=0, by=1, to=8), limits = c(0, 8))+
  stat_pvalue_manual(smelly.test, label="Mann-Whitney test, p = {p}", size=5)

smelly.teeshirt %>%
  ggplot(aes(x=university, y=smell))+
  geom_boxplot(aes(fill=university), alpha=0.2, show.legend = FALSE)+
  geom_jitter(height=0, width=0.1, size=4, colour="firebrick4", alpha = 0.8, show.legend = FALSE)+
  scale_fill_brewer(palette="Dark2")+
  scale_y_continuous(breaks=seq(from=0, by=1, to=8), limits = c(0, 8))+
  geom_signif(comparisons = list(c("Cambridge", "Oxford")), test = "wilcox.test", 
              map_signif_level = TRUE, textsize = 8)

Botulinum [Wilcoxon paired]

Do botulinum toxin injections reduce muscle spasticity levels?

Read in and have an initial look at the data and the differences.

read_csv("Datasets to use/botulinum.long.csv") -> botulinum

botulinum %>%
  mutate(treatment = factor(treatment, levels = c("before", "after"))) -> botulinum

botulinum %>%
  ggplot(aes(x=treatment, y=scores))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(height=0, width=0.1) +
  scale_x_discrete(limits =c("before", "after"))

botulinum %>%
  group_by(id) %>%
  mutate(difference = scores - scores[treatment == 'before']) %>%
  filter(treatment == "after") %>%
  ggplot(aes("Difference", difference))+
  geom_jitter(height = 0, width=0.1, size=5, alpha=0.5)+
  stat_summary(geom="errorbar", fun=median,fun.min=median,fun.max=median, width = 0.5)+
  scale_y_continuous(breaks=seq(from=-7, by=1, to=0), limits = c(-7, 0))+
  xlab(NULL)

botulinum %>%
  ggpaired(x = "treatment", 
           y = "scores", 
           color = "treatment",
           id = "id",
           palette = "Dark2", 
           line.color = "gray", 
           line.size = 0.4,
           xlab = "Treatment",
           ylab = "Scores", legend = "none")+
  scale_y_continuous(breaks=seq(from =0, by=1, to=10), limits = c(0,10))

Again, this is based on a scoring system and so does not meet the assumption of linearity, meaning we will go straight to our non-parametric test.

Run the Wilcoxon paired (signed-rank) test.

botulinum %>%
  arrange(id) %>%
  wilcox_test(scores~treatment, paired = TRUE)

Answer: There is a significant difference in muscle spasticity ratings pre- and post-treatment (W = 45, p = 0.00826).

Produce the final figure

botulinum %>%
  arrange(id) %>%
  wilcox_test(scores~treatment, paired = TRUE) -> bot.test

botulinum %>%
  ggpaired(x = "treatment", 
           y = "scores", 
           color = "treatment",
           id = "id",
           palette = "Dark2", 
           line.color = "gray", 
           line.size = 0.4,
           xlab = "Treatment",
           ylab = "Scores", legend = "none")+
  scale_y_continuous(breaks=seq(from =0, by=1, to=12), limits = c(0,12))+
  stat_pvalue_manual(bot.test, label="Wilcoxon's signed-rank test
                     p = {p}", y.position = 11)

Creatine [Kruskal-Wallis]

Does the average weight gain depend on the creatine group to which people were assigned?

Read in and plot the data

read_csv("Datasets to use/creatine.csv")  -> creatine

creatine %>%
  ggplot(aes(creatine, gain, colour=creatine))+
  geom_jitter(height = 0, width=0.1, size=4, show.legend = FALSE)

creatine %>%
  ggplot(aes(creatine, gain, fill=creatine))+
  geom_boxplot(show.legend = FALSE)+
  scale_fill_brewer(palette="Oranges")

The data do not look very normally distributed, and there look to be large differences in the variability across the groups - let's have a look at a QQ plot, normality tests, and Levene's test.

model  <- aov(gain~creatine, data = creatine)
ggqqplot(residuals(model)) +
  theme_bw()

ggqqplot(creatine, x = "gain", facet.by = "creatine")

shapiro_test(residuals(model))
creatine %>%
  group_by(creatine) %>% 
  shapiro_test(gain)
creatine %>%
  levene_test(gain ~ creatine)

The Shapiro test is highly significant and there are a couple of points outside the QQ plot confidence bands.

We may want to test for outliers in this dataset, to see which points are identified.

creatine %>%
  group_by(creatine) %>%
  identify_outliers(gain)

We can see that two values in the "once" group are identified as extreme outliers, as well as one value in the "no" group. We could look at removing these, but that is a relatively high proportion of our data to exclude without a very good reason (20% of the data). It therefore seems more reasonable to leave them in but use a non-parametric approach.
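As a quick check of that proportion (a sketch using the objects above):

n.outliers <- creatine %>%
  group_by(creatine) %>%
  identify_outliers(gain) %>%
  nrow()
n.outliers / nrow(creatine)  # fraction of the dataset flagged as an outlier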

Run the Kruskal-Wallis and post-hoc Dunn's tests.

creatine %>%
  kruskal_test(gain~creatine)
creatine %>%
  dunn_test(gain~creatine) 

Answer: This study did not demonstrate any effect of creatine on weight gain (H = 3.87, p = 0.145).

It may be that the very high point in the "once" group is an implausible amount of gain, suggesting a data error. Assuming this is the case, let's repeat our analysis after removing this point.

creatine.outlier <- creatine %>%
  filter(gain<4000)

creatine.outlier %>%
  ggplot(aes(creatine, gain, fill=creatine))+
  geom_boxplot(outlier.shape = NA, show.legend = FALSE)+
  geom_jitter(height = 0, width=0.1, size=4, alpha=0.5, show.legend = FALSE)+
  scale_fill_brewer(palette="Oranges")

model  <- aov(gain~creatine, data = creatine.outlier)
ggqqplot(residuals(model)) +
  theme_bw()

ggqqplot(creatine.outlier, x = "gain", facet.by = "creatine")

shapiro_test(residuals(model))
creatine.outlier %>%
  group_by(creatine) %>% 
  shapiro_test(gain)
creatine.outlier %>%
  levene_test(gain ~ creatine)

Even after removal of the extreme outlier in the "once" group, the data still do not pass normality tests and also fail the assumption of homogeneity of variance, so we will stick with a non-parametric approach.

creatine.outlier %>%
  kruskal_test(gain~creatine)
creatine.outlier %>%
  dunn_test(gain~creatine) 

After removing the outlier we get a significant result for the omnibus part of the test (H=6.0, p=0.0498), but none of the pairwise comparisons reach significance. This can happen because the post-hoc p-values are adjusted for multiple comparisons, which costs power, and the omnibus result itself is only just below the 0.05 threshold.

Violin [Friedman]

Which violin is the best according to the 10 violinists?

Read in and plot the data

read_csv("Datasets to use/violin.csv")  -> violin

violin %>%
  ggplot(aes(Violin, Score)) +
  geom_violin()+
  geom_point(aes(colour=Rater)) +
  geom_line(aes(group = Rater, colour=Rater)) 

This is based on a scoring system, so we will use a non-parametric approach, meaning we can go straight to the test.

violin %>%
  friedman_test(Score ~ Violin|Rater)
violin %>%
  arrange(Rater) %>%
  wilcox_test(Score ~ Violin, paired = TRUE, p.adjust.method = "bonferroni")

There is a significant difference between the three violins (p=0.0053), with violin A being ranked higher than violin C (p=0.017).

violin %>%
  arrange(Rater) %>%
  wilcox_test(Score ~ Violin, paired = TRUE, p.adjust.method = "bonferroni") %>%
  add_xy_position(step.increase = 0.4) %>%
  mutate(y.position = y.position + 2)-> violin.test

violin %>%
  ggplot(aes(Violin, Score)) +
  geom_violin(trim=FALSE, aes(fill=Violin), alpha = 0.5, show.legend = FALSE)+
  geom_point(aes(colour=Rater), size=4) +
  geom_line(aes(group = Rater, colour=Rater), linewidth=1) +
  scale_y_continuous(breaks=seq(from =0, by=1, to=14), limits = c(0,14))+
  scale_fill_brewer(palette="Pastel1")+
  stat_pvalue_manual(violin.test, label = "p = {p.adj}")

results.sig <- violin.test %>%
  filter(p.adj<0.05)

violin %>%
  ggplot(aes(Violin, Score)) +
  geom_violin(aes(fill=Violin), alpha = 0.5, show.legend = FALSE)+
  geom_line(aes(group = Rater), linewidth=0.8, colour = "grey") +
  geom_point(aes(colour=Rater), size=4) +
  scale_y_continuous(breaks=seq(from =2, by=1, to=11), limits = c(2,11))+
  scale_fill_brewer(palette="Accent")+
  geom_signif(comparisons = list(c("Violin A","Violin C")),
              annotations = results.sig$p.adj,
              y_position = c(10, 11),
              colour = "black",
              show.legend = FALSE)

Dominance [Spearman Rank]

Is social dominance associated with parasitism?

Read in and plot the data

read_csv("Datasets to use/dominance.csv") -> dominance

dominance %>%
  ggplot(aes(rank, eggs.per.gram))+
  geom_col(fill="midnightblue", colour="black", size=1)+
  scale_x_continuous(breaks=seq(1:6))+
  scale_y_continuous(breaks = seq(0, 6000, 1000))

dominance %>%
  ggplot(aes(rank, eggs.per.gram))+
  geom_line()+
  scale_x_continuous(breaks=seq(1:6))+
  scale_y_continuous(breaks = seq(0, 6000, 1000))

It looks like there probably is a relationship, so let's run a Spearman rank correlation test.

dominance %>%
  cor_test(rank,eggs.per.gram, method = "spearman")

Answer: the relationship between dominance and parasitism is significant (ρ = -0.94, p = 0.0167), with high-ranking males harbouring a heavier burden.

dominance %>%
  ggplot(aes(rank, eggs.per.gram))+
  geom_col(fill="midnightblue", colour="black", size=1)+
  scale_x_continuous(breaks=seq(1:6))+
  scale_y_continuous(breaks = seq(0, 6000, 1000))+
  annotate("text", label = "rho=-0.94, p=0.017", x=5, y=5500, size=8)+
  ggtitle("Association between dominance and parasitism")

Exercise 6: Categorical/qualitative

Set up the packages we need.

library(tidyverse)
library(rstatix)
library(ggpubr)
theme_set(theme_bw(base_size = 14))

Cats & dogs

Is there an association between reward and dancing in dogs?

Read in and have a look at the data.

read_csv("Datasets to use/dogs.csv") -> dogs
dogs

The dogs data are already in contingency-table format, so we do not need to reformat them for the test. However, to plot the data we need to restructure them so that we have proportions to plot.

dogs %>%
  mutate(Total = No+Yes) %>%
  pivot_longer(cols= 2:4, names_to = "Dance", values_to = "count" ) -> dogs.long 

dogs.long %>%
  group_by(Training) %>%
  mutate(fraction = count/count[Dance =="Total"]) %>%
  filter(Dance != "Total") %>%
  ungroup() -> dogs.long
  
dogs.long %>%
  ggplot(aes(x=Training, y=fraction, fill=Dance)) +
  geom_col(colour="black")+
  scale_fill_brewer(palette = 1)

We have a 2x2 table so will use Fisher’s Exact test.

dogs %>%
  select(No,Yes) %>%
  fisher_test(detailed = TRUE)

For dogs, there is no significant association between dancing and reward. This is in contrast to cats, who are more likely to dance having received food.

Cane toads

Is the proportion of cane toads infected by intestinal parasites the same in 3 different areas of Queensland?

Read in and plot - we again need to restructure in order to do this.

cane.toad <- read_csv("Datasets to use/cane.toad.csv")

cane.toad %>%
  pivot_longer(cols= 2:4, names_to = "area", values_to = "count" ) -> cane.toad.long

cane.toad.long %>%
  group_by(area) %>%
  mutate(fraction = count/count[Infection =="Total"]) %>%
  filter(Infection != "Total") %>%
  ungroup() -> cane.toad.long

cane.toad.long %>%
  ggplot(aes(x=area, y=fraction, fill=Infection)) +
  geom_col(colour="black")+
  scale_fill_brewer(palette = 2)

First, look at the overall differences

cane.toad %>%
  filter(Infection != "Total") %>%
  select(-Infection) %>%
  chisq_test()

There is a significant association between infection and area (p=0.00154), so lets do some pairwise comparisons to see which areas have significantly different proportions.

For this, if we convert our “Infection” column to rownames, then these will be used in the analysis output, making it easier to read and interpret.

cane.toad %>%
  filter(Infection != "Total") %>%
  column_to_rownames("Infection") %>%
  pairwise_fisher_test(p.adjust.method = "holm")->toad.test
toad.test

The proportion of cane toads infected by intestinal parasites is significantly lower in Bowden than Mackay and Rockhampton (adjusted p=0.0037 and 0.045, respectively).

Let's put it on a graph.

cane.toad.long %>%
  ggplot(aes(x=area, y=fraction)) +
  geom_col(colour="black", aes(fill=Infection))+
  scale_fill_brewer(palette = 2)+
  scale_y_continuous(breaks=seq(from =0, by=0.2, to=1.25), limits = c(0,1.25))+
  stat_pvalue_manual(toad.test,  y.position = c(1.05, 1.13, 1.17), label = "p = {p.adj}")+
  annotate("text", label="Fisher's exact tests with Holm correction", x=2, y=1.25, size=5)

Exercise 7: Mixed

Set up the packages we need.

library(tidyverse)
library(rstatix)
library(ggpubr)
library(ggsignif)
library(gglm)
theme_set(theme_bw(base_size = 14))

Iris flowers

iris.species

Does the mean petal length of iris flowers differ according to their species?

Start by reading in and plotting the data.

iris.species <- read_csv("Datasets to use/iris.species.csv")
iris.species
iris.species %>%
  ggplot(aes(Species,Petal.Length, colour = Species)) +
  geom_jitter(height=0, width=0.2, size = 2, show.legend = FALSE) +
  stat_summary(geom="crossbar", fun=mean, width=0.6, linewidth=0.3, show.legend = FALSE)

iris.species %>%
  ggplot(aes(x=Species, y=Petal.Length)) +
  geom_boxplot()

iris.species %>%
  ggplot(aes(x=Species, y=Petal.Length))+
  geom_violin(trim=TRUE, linewidth=1, show.legend=FALSE)+
  geom_jitter(height=0, width=0.2, size = 2, show.legend = FALSE) +
  ylab("Length (cm)")

This looks very clear-cut, so you hardly need to do stats at all! However, the data look like they might deviate a little from normality, with the Virginica species looking slightly skewed, and the groups look to have potentially different variability.

Let's have a look at the formal tests.

ggqqplot(iris.species, x = "Petal.Length", facet.by = "Species") +
  theme_bw()

model  <- lm(Petal.Length ~ Species, data = iris.species)
ggqqplot(residuals(model)) +
  theme_bw() 

shapiro_test(residuals(model))
iris.species %>%
  group_by(Species) %>%
  shapiro_test(Petal.Length)
iris.species %>%
  levene_test(Petal.Length ~ Species)

As suspected, there are points outside the confidence bands on the QQ plot, and the Virginica species fails the Shapiro-Wilk test. The Levene’s test is also significant, indicating there is not homogeneity of variance.

Therefore, let's run the non-parametric Mann-Whitney test.

iris.species %>%
  wilcox_test(Petal.Length~Species) 

Answer: the Virginica species of iris has significantly longer petals than the Setosa species (p<0.0001).

Let's put the result on a final graph.

iris.species %>%
  wilcox_test(Petal.Length~Species) %>%
  add_xy_position() -> iris.test

iris.species %>%
  ggplot(aes(x=Species, y=Petal.Length, colour=Species))+
  geom_jitter(height=0, width=0.1, size=3, alpha = 0.8, show.legend = FALSE)+
  scale_color_brewer(palette="Dark2")+
  stat_summary(geom="crossbar", fun=median, width=0.6, linewidth=0.3, show.legend = FALSE) +
  scale_y_continuous(breaks=seq(from=0, by=1, to=8), limits = c(0, 8))+
  stat_pvalue_manual(iris.test, label="Mann-Whitney test, p = {p}", size=5)

iris.sepal

You want to know whether the mean petal length of iris flowers correlates with the mean sepal length in the virginica species.

read_csv("Datasets to use/iris.sepal.csv") -> iris.sepal
iris.sepal
iris.sepal %>%
  ggplot(aes(Sepal.Length, Petal.Length)) + 
  geom_point(size=3)

There looks to be a strong outlier, so let's fit a line of best fit and look at the diagnostics.

lm(Petal.Length ~ Sepal.Length, data=iris.sepal)-> fit.sepal
coefficients(fit.sepal) -> coef.sepal

summary(fit.sepal)
## 
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris.sepal)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3240 -0.2041  0.1031  0.3069  0.6280 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.8660     1.0940  -1.706    0.101    
## Sepal.Length   1.0961     0.1645   6.662 6.85e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5956 on 24 degrees of freedom
## Multiple R-squared:  0.649,  Adjusted R-squared:  0.6344 
## F-statistic: 44.38 on 1 and 24 DF,  p-value: 6.849e-07
iris.sepal$cook <- cooks.distance(fit.sepal)
iris.sepal$residual <- rstandard(fit.sepal)
iris.sepal
par(mfrow=c(2,2))
plot(fit.sepal)

The outlier looks like it is having an impact on our model. We can go back to the raw data (in iris.data) and see that this point actually comes from another iris species (setosa) and was incorrectly copied across, so we can definitely remove this.

Let's remove it and fit a new model.

iris.sepal %>%
  filter(cook<1) -> iris.sepal.outlier
lm(Petal.Length ~ Sepal.Length, data=iris.sepal.outlier)-> fit.sepal.outlier
coefficients(fit.sepal.outlier) -> coef.sepal.outlier

summary(fit.sepal.outlier)
## 
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris.sepal.outlier)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61228 -0.20779 -0.03223  0.19445  0.38323 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.49857    0.54875   0.909    0.373    
## Sepal.Length  0.75561    0.08185   9.231 3.38e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2681 on 23 degrees of freedom
## Multiple R-squared:  0.7875, Adjusted R-squared:  0.7782 
## F-statistic: 85.22 on 1 and 23 DF,  p-value: 3.381e-09
par(mfrow=c(2,2))
plot(fit.sepal.outlier)

The fit looks a lot more reasonable now - let's run a Pearson's correlation test and plot this on a graph.

iris.sepal.outlier %>%
  cor_test(Sepal.Length, Petal.Length)
iris.sepal.outlier %>%
  ggplot(aes(Sepal.Length, Petal.Length)) +
  geom_point(size=3) +
  geom_abline(intercept = coef.sepal.outlier[1], slope = coef.sepal.outlier[2])+
  annotate("text", label = "r = 0.89, p = 3.38e-09, r2 = 78.75%", x=6.3, y=6.4, size=6)

There is a significant positive relationship between petal length and sepal length (r=0.89, r2=0.7875, p=3.38e-09).

Recycling

Which intervention should the city use to maximise household recycling?

Read in and have a look at the data.

read_csv("Datasets to use/recycling.csv") -> recycling
recycling

Let's calculate proportions, so the data can be plotted on a graph.

recycling %>%
  mutate(Total = Recycle+Not.recycle) %>%
  pivot_longer(cols= 2:4, names_to = "Recycles", values_to = "count" ) -> recycling.long 

recycling.long %>%
  group_by(Intervention) %>%
  mutate(fraction = count/count[Recycles =="Total"]) %>%
  filter(Recycles != "Total") %>%
  ungroup() -> recycling.long
  
recycling.long %>%
  ggplot(aes(x=Intervention, y=fraction, fill=Recycles)) +
  geom_col(colour="black")+
  scale_fill_brewer(palette = 4)

Do a Chi-squared test, and pairwise comparisons between the interventions.

recycling %>%
  select(Recycle,Not.recycle) %>%
  chisq_test()
recycling %>%
  column_to_rownames("Intervention") %>%
  pairwise_fisher_test(p.adjust.method = "holm")->recycling.test
recycling.test

There is a significant association between intervention and recycling uptake (p=0.0075). Both interventions are significantly better than the control group (adjusted p=0.036 for both), but there is no significant difference between the use of a flyer or a phone call. Therefore, the choice of intervention may come down to other considerations, including cost, resources, and time.

Let's put it on a graph.

recycling.long %>%
  ggplot(aes(x=Intervention, y=fraction)) +
  geom_col(colour="black", aes(fill=Recycles))+
  scale_fill_brewer(palette = 4)+
  stat_pvalue_manual(recycling.test,  y.position = c(1.05, 1.10, 1.15), label = "p = {p.adj}")+
  annotate("text", label="Fisher's exact tests with Holm correction", x=2, y=1.25, size=5)

Income data

Generate a linear model describing the relationship between income and happiness. How happy would you expect someone earning 50k to be, based on your model?

read_csv("Datasets to use/income.data.csv") -> income
income
income %>%
  ggplot(aes(income, happiness)) + 
  geom_point(size=2)

Let's fit a line of best fit and look at the diagnostic tests.

lm(happiness ~ income, data=income)-> fit.income
coefficients(fit.income) -> coef.income

summary(fit.income)
## 
## Call:
## lm(formula = happiness ~ income, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.02479 -0.48526  0.04078  0.45898  2.37805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20427    0.08884   2.299   0.0219 *  
## income       0.71383    0.01854  38.505   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7181 on 496 degrees of freedom
## Multiple R-squared:  0.7493, Adjusted R-squared:  0.7488 
## F-statistic:  1483 on 1 and 496 DF,  p-value: < 2.2e-16
income$cook <- cooks.distance(fit.income)
income$residual <- rstandard(fit.income)
income
par(mfrow=c(2,2))
plot(fit.income)

There do not seem to be any outliers or other issues with our data, so let's plot it.

income %>%
  ggplot(aes(income, happiness)) +
  geom_point(size=2) +
  geom_abline(intercept = coef.income[1], slope = coef.income[2], 
              colour = "blue", linewidth = 1)+
  xlab("Income (x10^-4)")

We now want to calculate how happy someone earning 50k would be, based on the model. We can get the information on the line of best fit from the model summary.

summary(fit.income)
## 
## Call:
## lm(formula = happiness ~ income, data = income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.02479 -0.48526  0.04078  0.45898  2.37805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20427    0.08884   2.299   0.0219 *  
## income       0.71383    0.01854  38.505   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7181 on 496 degrees of freedom
## Multiple R-squared:  0.7493, Adjusted R-squared:  0.7488 
## F-statistic:  1483 on 1 and 496 DF,  p-value: < 2.2e-16

Using the model intercept and slope (the income coefficient), we can predict the happiness score we would expect for someone earning £50k. Note that income in this dataset is recorded as ×10^-4, so £50k corresponds to income = 5.

happy.50 <- 0.204 + 0.714*5
happy.50
## [1] 3.774

Answer: from the linear regression, happiness = 0.204 + 0.714 x income(x10^-4) [y-intercept + slope x income], so for someone earning 50k, happiness = 0.204 + 0.714*5 = 3.774.
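Equivalently (an optional alternative to copying the coefficients by hand), R's predict() function returns the same value directly from the fitted model:

predict(fit.income, newdata = data.frame(income = 5))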

Let's also add the line of best fit formula to the graph. We could also show our predicted point, if we wanted.

income %>%
  ggplot(aes(income, happiness)) +
  geom_point(size=2) +
  geom_abline(intercept = coef.income[1], slope = coef.income[2], 
              colour = "blue", linewidth = 1)+
  xlab("Income (x10^-4)") + 
  annotate("text", label = "Happiness=0.204+0.714*income(x10^-4)", x=3.5, y=6.5, size=6)

income %>%
  ggplot(aes(income, happiness)) +
  geom_point(size=2) +
  geom_abline(intercept = coef.income[1], slope = coef.income[2], 
              colour = "blue", linewidth = 1)+
  annotate("point", x=5, y=happy.50, colour = "red", size = 4)+
  xlab("Income (x10^-4)") + 
  annotate("text", label = "Happiness=0.204+0.714*income(x10^-4)", x=3.5, y=6.5, size=6)

Coffee - part 2

Which beans should you buy?

coffee <- read_csv("Datasets to use/coffee.country.csv")
coffee
coffee %>%
  ggplot(aes(x=Country, y=Score, colour=Country))+
  geom_jitter(height=0, width=0.3, size=3)+
  stat_summary(geom="crossbar", fun=mean, width=0.6, linewidth=0.3, show.legend = FALSE)

coffee %>%
  ggplot(aes(x=Country, y=Score, colour=Country))+
  geom_boxplot(show.legend = FALSE)

coffee %>%
  ggplot(aes(x=Country, y=Score, colour=Country))+
  geom_violin(show.legend = FALSE)+
  geom_jitter(height=0, width=0.25, alpha=0.5, size=3, show.legend = FALSE)

It looks like there might be a couple of outliers, particularly the low point in the Costa Rica group - let's have a look at the QQ plot and normality tests.

model  <- aov(Score ~ Country, data = coffee)
ggqqplot(residuals(model)) +
  theme_bw()

shapiro_test(residuals(model))
coffee %>%
  group_by(Country) %>% 
  shapiro_test(Score)
coffee %>%
  levene_test(Score ~ Country)

We can also test for outliers.

coffee %>%
  group_by(Country) %>%
  identify_outliers(Score)

This dataset again raises the question of whether or not to remove outliers. Only one extreme outlier is identified, and it seems likely that the outliers are real results, as they are not implausible values. Therefore, keeping the outliers in the analysis is probably preferable, unless there is a reason to exclude them or a decision was made before the analysis to exclude any identified outliers.

Including the potential outliers, the data fail normality tests and there are multiple points outside the confidence bands on the QQ plot. Also, given that the measure is a scoring system, where non-parametric tests are generally used, we will use a Kruskal-Wallis test.

coffee %>%
  kruskal_test(Score ~ Country)
coffee %>%
  dunn_test(Score ~ Country) 

Answer: Overall, there is an association between the country the beans are from and their score (H=23.65, p=2.95e-05). Coffee beans from Ethiopia receive significantly higher scores than those from China (p=0.0164), Costa Rica (p=8.64e-05), and Indonesia (p=3.55e-04).

Finally, let's put our results on a plot.

coffee %>%
  dunn_test(Score ~ Country) %>%
  filter(p.adj<0.05) -> coffee.sig

coffee %>%
  ggplot(aes(Country, Score, colour=Country)) +
  geom_jitter(height=0, width=0.1, alpha=0.5, size=4, show.legend = FALSE)+
  stat_summary(geom="crossbar", fun=median, width=0.6, linewidth=0.3, show.legend = FALSE) +
  geom_signif(comparisons = list(c("China","Ethiopia"),
                                 c("Costa Rica","Ethiopia"),
                                 c("Ethiopia","Indonesia")),
              annotations = signif(coffee.sig$p.adj, digits=3),
              y_position = c(91, 92.5,93.5),
              colour = "black",
              show.legend = FALSE)

coffee %>%
  ggplot(aes(Country, Score, colour=Country)) +
  geom_jitter(height=0, width=0.1, alpha=0.5, size=4, show.legend = FALSE)+
  stat_summary(geom="crossbar", fun=median, width=0.6, linewidth=0.3, show.legend = FALSE) +
  stat_pwc(method = "dunn_test", label = "p.adj.signif", hide.ns = TRUE, show.legend = FALSE,
           )