Introduction

Many sequencing platforms rely on chemistry that requires specific adapter sequences to be added to the ends of the fragments to be sequenced.  These adapters fulfil roles such as allowing for specific amplification and priming to support the underlying sequencing chemistry. Ideally, a sequencing library should contain only valid adapter+insert constructs, and manufacturers use a variety of techniques and modifications to try to ensure that these are highly enriched in the library.

However, these precautions sometimes fail and a set of adapter dimers – a pair of ligated adapters with no insert sequence – ends up in the library.  These can still be sequenced because they contain all of the relevant parts of the sequencing template, but they produce no useful sequence.  If these constructs make up a high proportion of the library, they can soak up a significant amount of the sequencing capacity in a lane and trigger a number of QC metrics.

The Symptoms

There are a couple of different ways to spot that this type of contamination has occurred.  Because adapter dimers always produce exactly the same sequence, they superimpose that sequence on the per-cycle base content plot.  The example below is obviously extreme, but a reduced version of this pattern would appear in more modestly contaminated libraries.

per_base_dimers
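
To make the effect on the base content plot concrete, here is a minimal Python sketch of how per-cycle base fractions can be computed. The function name and the toy reads are made up for illustration and are not taken from any particular QC tool. When most reads are the same dimer sequence, the dominant base at each cycle simply spells out that sequence.

```python
from collections import Counter

def per_position_base_fractions(reads):
    """Return, for each cycle, the fraction of reads showing each base."""
    length = max(len(r) for r in reads)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for pos, base in enumerate(read):
            counts[pos][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: c[b] / total for b in "ACGT"})
    return fractions

# Toy library (hypothetical): 3 of 4 reads are the same dimer sequence.
dimer = "AGATCGGAAG"
reads = [dimer, dimer, dimer, "TTGCAAGCTA"]

fractions = per_position_base_fractions(reads)
# The most frequent base at each cycle reconstructs the dimer sequence.
dominant = "".join(max(f, key=f.get) for f in fractions)
print(dominant)
```

In a real library the dimer fraction is usually much lower, so the same pattern shows up as a fainter but still fixed sequence on top of the normal base composition.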

The introduction of a number of identical sequences will also show up as a sharp spike in the overall GC profile for the run.

adapter_dimer_gc_profile
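
The GC spike can be illustrated in the same way. The sketch below bins reads by their rounded GC percentage; the reads are invented for the example, and the helper names are hypothetical. Because every dimer read has exactly the same GC content, all of them fall into a single bin.

```python
from collections import Counter

def gc_percent(seq):
    """GC content of one read, rounded to the nearest whole percent."""
    gc = sum(1 for base in seq if base in "GC")
    return round(100 * gc / len(seq))

def gc_histogram(reads):
    """Count how many reads fall into each GC percentage bin."""
    return Counter(gc_percent(r) for r in reads)

# Toy data (hypothetical): a few varied reads plus many identical dimer reads.
background = ["ACGGTCATTA", "TTTTACGGAA", "GGGCCCATAT", "ATATATATGC"]
dimer = "AGATCGGAAG"
reads = background + [dimer] * 20

hist = gc_histogram(reads)
spike_bin, spike_count = hist.most_common(1)[0]
print(spike_bin, spike_count)  # the bin holding all of the dimer reads
```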

If you are monitoring overrepresented sequences, this screen will also turn up this type of contamination, since the sequences generated will be exactly the same each time and will be present many thousands of times in the library.

Diagnosis

The easiest way to spot this is in the overrepresented sequences screen, where the exact sequence will be shown and should match the adapter sequence used.  If the contamination is too low to trigger this module then you can normally see it as a fixed sequence superimposed on the per-base sequence content plot.
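
As a rough sketch of that check, the Python below counts identical reads and tests whether the frequent ones contain an adapter. Several things here are assumptions for illustration: the `ADAPTER` string is the start of a commonly used Illumina adapter and should be replaced with whatever your kit actually uses, the `min_fraction` threshold is arbitrary, and the function names are hypothetical rather than part of any QC package.

```python
from collections import Counter
from itertools import product

# Assumption: replace with the adapter sequence used in your own library prep.
ADAPTER = "AGATCGGAAGAGC"

def overrepresented(reads, min_fraction=0.001):
    """Return (sequence, fraction) pairs for reads above min_fraction of the total."""
    counts = Counter(reads)
    total = len(reads)
    return [(seq, n / total) for seq, n in counts.most_common()
            if n / total >= min_fraction]

def looks_like_adapter(seq, adapter=ADAPTER):
    """Exact substring match only; real tools also tolerate mismatches."""
    return adapter in seq

# Toy library (hypothetical): 50 identical dimer reads plus 64 distinct reads.
dimer_read = ADAPTER + "ACGT"
unique_reads = ["TTGCA" + "".join(p) for p in product("ACGT", repeat=3)]
reads = [dimer_read] * 50 + unique_reads

hits = overrepresented(reads, min_fraction=0.1)
for seq, frac in hits:
    print(seq, round(frac, 3), looks_like_adapter(seq))
```

Only the dimer read passes the threshold in this toy data, and it matches the adapter, which is the pattern you would expect to see in a contaminated library.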

Mitigation

Adapter trimmers will generally remove this sort of data, but even without trimming the dimers do not normally map to a reference genome, so they don’t cause any further downstream disruption.
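
The reason trimming removes dimers can be shown with a deliberately simplified sketch. It assumes that a dimer read begins directly with adapter sequence, so nothing is left once the adapter and everything after it are cut away. The `ADAPTER` value, the `min_length` cutoff and the function names are all assumptions for illustration; real trimmers also handle sequencing errors and partial adapters at the end of a read, which this exact-match version does not.

```python
# Assumption: replace with the adapter sequence used in your own library prep.
ADAPTER = "AGATCGGAAGAGC"

def trim_adapter(read, adapter=ADAPTER):
    """Cut the read at the first exact adapter match, if there is one."""
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]

def filter_reads(reads, min_length=20):
    """Trim every read and keep only those still at least min_length long."""
    kept = []
    for read in reads:
        trimmed = trim_adapter(read)
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept

# Toy reads (hypothetical): a dimer, a short insert with read-through, a clean read.
insert = "ACGTTGCAAGGCTTAACCGGTTAA"
reads = [ADAPTER + "ATCTCGTATG", insert + ADAPTER, insert]

kept = filter_reads(reads)
print(len(kept))  # the dimer read is dropped, the other two survive
```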

Prevention

This type of contamination must be prevented at the library preparation stage.  Being careful about the amount of adapter added to the ligation mix and being stringent in the size selection step for your library are the most important steps for avoiding the formation and selection of dimers.

February 1, 2016

2 thoughts on "Contamination with adapter dimers"

  • Mel N

    Thank you very much! But what about other cases where you also see a hump in the GC content plot, together with anomalies in the overrepresented sequences, but that is not as pronounced and is not caused by adapter dimer contamination? (for example some of these humps are marked as red in MultiQC, but are not as extreme as the one in this article’s plot). It would be very helpful if you have some examples and made an article about that. Thank you, again, very much for all this helpful information! A lot of QC reports are less clear about what’s going on, and that makes it hard to make decisions such as filtering/trimming.

  • Mel N

    Sorry, I forgot to give more information. The data I’m working on would be an example of this. It’s ChIP-seq data of human H4K12. Sequencing: Illumina 1.9, 50 bp sequence length. The quality plots are all in green, very nice quality, but there’s that anomaly only in one of my samples (one out of 12). Other two samples have the same bump in the same place but less pronounced, and they appear in yellow. The rest of the samples don’t have this problem. The per tile sequence quality also has some green spots in some positions.
