Introduction

This is an example of loading a search result from MaxQuant and analysing it in MSstats. We could also do this with the MSstats Shiny application, but working in R directly gives us more control over the process and lets us examine the intermediate steps.

The data we are using comes from PRIDE accession PXD043985. It’s a yeast dataset that contains both whole-proteome samples and affinity pull-downs using the eIF2a protein. We want to find the proteins that are enriched in the pull-down compared to the whole proteome.

Setup

We’re going to base this analysis on the MSstats package, but will supplement this with the standard tidyverse packages. We’ll use pheatmap for drawing heatmaps of the hits.

library(MSstats)
library(MSstatsConvert)
library(tidyverse)
library(pheatmap)
theme_set(theme_bw())

Loading Data

We are going to import two files from the MaxQuant output. These are:

  1. evidence.txt: the main quantitation file, at the peptide level
  2. proteinGroups.txt: protein-level quantitation, showing how the peptides were combined into protein groups

We’ll read in the evidence file first to look at some of the properties of the data, and then get MSstats to convert it into its standard format. We’re importing data from MaxQuant here, but the same approach works for other search platforms, since MSstats provides converters for several of them.

Read in evidence file

We read in the file using readr’s `read_delim`, but with the same style of column-name repair that `read.delim` would apply (e.g. "Raw file" becomes Raw.file), because MSstats expects the column names in this format.

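read_delim(
  "evidence.txt",
  delim = "\t",              # MaxQuant writes tab-separated text files
  name_repair = "universal"  # syntactic names, e.g. "Raw file" -> Raw.file
) -> evidence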

head(evidence)
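To see what the name repair actually does, the sketch below applies the "universal" repair directly through `vctrs::vec_as_names()` (the vctrs function that implements this option) to a few MaxQuant-style headers and compares the result with base R’s `make.names()`, which is what `read.delim(check.names = TRUE)` uses. The header list is illustrative only, so check `names(evidence)` for the real set.

```r
library(vctrs)

# Illustrative MaxQuant-style headers (not the full evidence.txt header row)
maxquant_headers <- c("Raw file", "Retention time", "m/z",
                      "MS/MS count", "Potential contaminant")

# The repair used by read_delim(name_repair = "universal")
universal_names <- vec_as_names(maxquant_headers, repair = "universal",
                                quiet = TRUE)

# The repair applied by read.delim(check.names = TRUE)
base_names <- make.names(maxquant_headers, unique = TRUE)

universal_names
identical(universal_names, base_names)
```

For headers like these the two schemes agree. They can diverge for edge cases such as duplicated names or names that start with a digit, so it is worth a quick look at the imported column names.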

Read in protein file

We can also load the protein level information.

read_delim(
  "proteinGroups.txt",
  delim = "\t",              # MaxQuant writes tab-separated text files
  name_repair = "universal"  # same name repair as the evidence file
) -> protein_groups

head(protein_groups)
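proteinGroups.txt records which evidence rows were combined into each protein group: its Protein IDs and Evidence IDs columns hold semicolon-separated lists, and the evidence IDs refer to the id column of evidence.txt. Below is a minimal sketch of unpacking that link with `tidyr::separate_rows()`. It uses a hypothetical two-row toy table (`toy_groups`, with made-up protein names), not the real data.

```r
library(tidyverse)

# Hypothetical toy rows shaped like proteinGroups.txt after name repair
toy_groups <- tibble(
  Protein.IDs  = c("protA;protB", "protC"),
  Evidence.IDs = c("0;1;2", "3;4")
)

# One row per (protein group, evidence id) pair; convert = TRUE makes the ids integers
toy_groups %>%
  separate_rows(Evidence.IDs, sep = ";", convert = TRUE) -> group_to_evidence

# Number of evidence rows combined into each protein group
group_to_evidence %>%
  count(Protein.IDs, name = "n_evidence") -> evidence_per_group

evidence_per_group
```

With the real tables, the unpacked ids could be joined back to `evidence` on its id column to see the peptide-level rows behind any protein group.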

Create annotation file

For the downstream analysis we also need a tibble of annotations saying which group each raw file belongs to. There are only two groups here: the full proteomes and the affinity-tag pull-downs. We’ll build the annotation from the raw file names in the evidence file.

This isn’t a mass-tagged (labelled) experiment, so we set the label type of every sample to light (L), i.e. normal masses.

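evidence %>%
  distinct(Raw.file) %>%
  # Derive the condition from the file name: drop everything up to the last "-",
  # rename "TAP_Prot" to "Prot", then strip "_Rep" plus the single character after it
  mutate(Condition = str_replace(Raw.file, "^.*-", "")) %>%
  mutate(Condition = str_replace(Condition, "TAP_Prot", "Prot")) %>%
  mutate(Condition = str_replace(Condition, "_Rep.", "")) %>%
  arrange(Raw.file) %>%
  # Number the replicates within each condition
  group_by(Condition) %>%
  mutate(BioReplicate = row_number()) %>%
  ungroup() %>%
  # Label-free data, so every run is "light"
  add_column(IsotypeLabelType = "L") -> annotation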

annotation
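Because the conditions are parsed out of the file names, it can help to run the same steps on a few file names whose answer you know. The sketch below does that with hypothetical names (`run01-TAP_Prot_Rep1` and so on are made up, not the real PXD043985 files) and also shows one limitation of the `_Rep.` pattern.

```r
library(tidyverse)

# Hypothetical raw file names; the real ones come from evidence$Raw.file
toy_files <- tibble(Raw.file = c("run01-TAP_Prot_Rep1", "run02-TAP_Prot_Rep2",
                                 "run03-TAP_Rep1", "run04-TAP_Rep2"))

toy_files %>%
  mutate(Condition = str_replace(Raw.file, "^.*-", "")) %>%        # drop up to the last "-"
  mutate(Condition = str_replace(Condition, "TAP_Prot", "Prot")) %>%
  mutate(Condition = str_replace(Condition, "_Rep.", "")) %>%      # "." is exactly one character
  arrange(Raw.file) %>%
  group_by(Condition) %>%
  mutate(BioReplicate = row_number()) %>%
  ungroup() %>%
  add_column(IsotypeLabelType = "L") -> toy_annotation

toy_annotation

# "_Rep." only removes one character after "Rep", so two-digit replicates leave a stray digit
str_replace("TAP_Rep10", "_Rep.", "")       # "TAP0"
str_replace("TAP_Rep10", "_Rep[0-9]+", "")  # "TAP"
```

This numbering reuses BioReplicate values 1, 2, ... within each condition. How MSstats interprets a BioReplicate label shared across conditions (independent subjects or a paired design) is worth checking in the MSstats documentation before running the comparison.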

Properties of input data

We’ve already looked at the QC of this data using PTXQC, but we can also look directly at the evidence and protein data to see what we’re working with. MSstats will do some filtering for us when the data is converted, but the exact filters applied depend on which search program produced the data.
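For MaxQuant, the rows that get filtered typically include decoy hits and potential contaminants, which MaxQuant marks with a "+" in the Reverse and Potential contaminant columns (Potential.contaminant after name repair; older MaxQuant versions name it Contaminant). The exact converter behaviour should be checked in the `MaxQtoMSstatsFormat` help page. This sketch counts those flags on a hypothetical toy table (`toy_evidence`, with made-up protein names) to show the kind of check you could run on `evidence` itself.

```r
library(tidyverse)

# Hypothetical toy rows shaped like evidence.txt after name repair.
# Flagged rows carry "+"; empty cells are read in as NA.
toy_evidence <- tibble(
  Proteins              = c("protA", "REV__protB", "CON__protC", "protD"),
  Reverse               = c(NA, "+", NA, NA),
  Potential.contaminant = c(NA, NA, "+", NA)
)

toy_evidence %>%
  summarise(
    n_rows        = n(),
    n_reverse     = sum(Reverse == "+", na.rm = TRUE),
    n_contaminant = sum(Potential.contaminant == "+", na.rm = TRUE),
    n_unflagged   = sum(is.na(Reverse) & is.na(Potential.contaminant))
  ) -> flag_summary

flag_summary
```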

Retention time

It’s good to see that peptides are eluting in a fairly even spread across the whole retention-time range. We can see this visually with a density plot of retention times for each raw file.

evidence %>%
  ggplot(aes(x=Retention.time, colour=Raw.file)) +
  geom_density()
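The density plot can be backed up with a numeric summary. If peptides are spread evenly over the gradient, the 10%, 50% and 90% quantiles of retention time should sit roughly at 10%, 50% and 90% of the run length, and a run where they bunch up stands out. A sketch on hypothetical toy retention times (`toy_rt`, two made-up runs, in minutes as MaxQuant reports them):

```r
library(tidyverse)

# Hypothetical retention times (minutes) for two runs; use `evidence` for the real data
toy_rt <- tibble(
  Raw.file       = rep(c("run_A", "run_B"), each = 8),
  Retention.time = c(10, 20, 35, 45, 60, 70, 85, 95,   # spread across the run
                     5, 10, 12, 15, 18, 20, 22, 90)    # bunched early
)

# Per-run quantiles of retention time
toy_rt %>%
  group_by(Raw.file) %>%
  summarise(
    n_features = n(),
    rt_q10     = quantile(Retention.time, 0.10, names = FALSE),
    rt_median  = median(Retention.time),
    rt_q90     = quantile(Retention.time, 0.90, names = FALSE)
  ) -> rt_summary

rt_summary
```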