
Illumina 2 colour chemistry can overcall high confidence G bases

With the introduction of the NextSeq system, Illumina changed the way image data is acquired so that only 2 images are captured per cycle instead of 4. This speeds up image acquisition significantly, but it also introduces a problem: high-quality calls for G bases can be made where there is actually no signal on the flowcell.
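As a rough illustration of how such overcalled G bases might be spotted, the sketch below measures the run of G at the 3′ end of a read and removes it when it exceeds a threshold. The function names and the `min_run` default of 10 are assumptions made for this example; they are not taken from any particular tool, and a real trimmer would normally also consider the quality scores.

```python
def trailing_g_run(seq: str) -> int:
    """Return the length of the uninterrupted run of G at the 3' end of seq."""
    return len(seq) - len(seq.rstrip("G"))


def trim_poly_g(seq: str, qual: str, min_run: int = 10):
    """Remove a 3' poly-G tail of at least min_run bases (hypothetical threshold).

    The quality string is cut at the same position so the two stay in sync.
    """
    run = trailing_g_run(seq)
    if run >= min_run:
        keep = len(seq) - run
        return seq[:keep], qual[:keep]
    return seq, qual


if __name__ == "__main__":
    read = "ACGT" + "G" * 12
    print(trim_poly_g(read, "I" * len(read)))
```

A short run of G at the end of a read is expected by chance, which is why the sketch only trims above a minimum run length.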

May 4, 2016 Simon Andrews NextSeq, All Applications, Cutadapt, FastQC

MAPQ values are really useful but their implementation is a mess

One of the standard fields in the SAM/BAM file format is the mapping quality (MAPQ) value. This value can be very useful to help filter mapped reads before doing downstream analysis – unfortunately the implementation of this value is in no way consistent between different aligners so it takes a fair bit of research to know how to use it appropriately. Mis-applying the filter could cause reads to be inappropriately excluded from an analysis.
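To make the filtering step concrete, here is a minimal sketch that keeps SAM records whose MAPQ (the fifth tab-separated column) meets a cutoff. The function name and the default cutoff of 20 are assumptions for illustration only; because aligners assign MAPQ differently, the appropriate threshold has to be chosen per aligner.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield SAM header lines and alignments with MAPQ >= min_mapq.

    min_mapq=20 is an arbitrary example value, not a recommendation.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ and are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
        if mapq >= min_mapq:
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    for kept in filter_by_mapq(sam):
        print(kept, end="")
```

Note that a MAPQ of 255 means the value is unavailable in the SAM specification, so a simple numeric cutoff like this one would treat such reads as high confidence.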

March 17, 2016 Simon Andrews All Technologies, All Applications, BamQC, SeqMonk

Biased sequence composition can lead to poor quality data on Illumina sequencers

In some experimental designs a large proportion of the sequences in a library can have identical sequence at their 5′ end. These types of library can cause problems for the data collection and base calling on Illumina sequencers, leading to the generation of poor quality data.

March 15, 2016 Simon Andrews Illumina, All Applications, FastQC

Mispriming in PBAT libraries causes methylation bias and poor mapping efficiencies

Random priming in PBAT libraries introduces drastic biases in the base composition and methylation levels especially at the 5′ end of all reads. As a result, affected bases should be removed from the libraries before the alignment step.

March 11, 2016 Felix Krueger Illumina, Methylation, PBAT, BamQC, Bismark, FastQC, Trim Galore!

Read-through adapters can appear at the ends of sequencing reads

Many sequencing platforms require the addition of specific adapter sequences to the end of the fragments to be sequenced. For an individual fragment, if the length of the sequencing read is longer than the fragment to be sequenced then the read will continue into the adapter sequence on the end. Unless it is removed this adapter sequence will cause problems for downstream mapping, assembly or other analysis.
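The idea of removing read-through adapter can be sketched as follows: look for the full adapter inside the read, and failing that, for a prefix of the adapter at the very end of the read. This is a simplified, exact-match illustration only; the function name and the `min_overlap` default are assumptions, and dedicated trimmers additionally tolerate mismatches and use quality information.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from read using exact matching only.

    min_overlap (hypothetical default) is the shortest adapter prefix
    at the end of the read that will be treated as adapter.
    """
    # Case 1: the whole adapter is present; cut from its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends part-way through the adapter.
    # Try the longest possible partial match first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"
    print(trim_adapter("ACGTACGT" + adapter + "TTT", adapter))
    print(trim_adapter("ACGTACGTAGATC", adapter))
```

A small `min_overlap` removes more true adapter but also trims a few genuine bases that happen to match by chance, so the value is a trade-off.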

February 7, 2016 Simon Andrews Cutadapt, FastQC, Skewer, Trim Galore!

Libraries can contain technical duplication

The assumption when analysing sequence datasets is that every sequence comes from a different biological fragment in the original sample. Many library preparation techniques, though, include one or more PCR steps, which introduce the possibility that the same original fragment is observed multiple times, biasing the results produced. In some cases this type of duplication can be extreme and have a serious effect on the ability to analyse the data correctly.
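One crude way to get a feel for duplication is to count how many reads are exact copies of a sequence already seen. The sketch below does this with a hypothetical `duplication_fraction` helper. It is an assumption-laden simplification: identical sequences can also arise biologically (for example from highly expressed transcripts), and it is not meant to reproduce how any particular QC tool estimates duplication.

```python
from collections import Counter


def duplication_fraction(sequences) -> float:
    """Return the fraction of reads that are repeat copies of an earlier read.

    0.0 means every read is distinct; values near 1.0 mean almost all
    reads are copies of a small number of sequences.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    distinct = len(counts)
    return 1 - distinct / total


if __name__ == "__main__":
    reads = ["AAA", "AAA", "CCC", "GGG"]
    print(duplication_fraction(reads))
```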

February 6, 2016 Simon Andrews FastQC, MultiQC, Preseq, SeqMonk

Contamination with a different species you can guess

One of the biggest problems with sequencing libraries is that the material might be contaminated with something unexpected. One of the simplest forms of contamination is where you have material from a different species than expected. In many cases the rogue material will come from a species you can guess, based on the other species commonly used in your lab. Screening for this type of contamination will help spot when you have contaminated samples, and can also help when you have completely switched samples.

February 1, 2016 Simon Andrews FastQ Screen

Contamination with adapter dimers

The construction of sequencing libraries on many platforms requires the addition of specific adapter sequences to the ends of the fragments to be sequenced. Although there are steps in place to ensure that only valid adapter+insert combinations make it onto the sequencer, it is possible for adapter dimers with no valid insert to make it through the sequencing process.

February 1, 2016 Simon Andrews Illumina, FastQC

Positional sequence bias in random primed libraries

In a randomly primed library there is no reason to expect that a specific sequencing cycle should contain more of one specific base than any other cycle. It is commonly observed though that this type of library does contain a cycle-specific sequence bias, most frequently in the initial bases of the run.
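A cycle-specific bias of this kind shows up when the proportion of each base is tabulated per read position. The following is a minimal sketch of that tabulation; the function name is an assumption for this example, and it handles reads of unequal length by only counting positions each read actually covers.

```python
from collections import Counter


def per_cycle_composition(reads):
    """Return one dict per read position mapping base -> fraction of reads."""
    counts = []
    for read in reads:
        for i, base in enumerate(read):
            if i == len(counts):
                # First time a read reaches this position.
                counts.append(Counter())
            counts[i][base] += 1
    return [
        {base: n / sum(c.values()) for base, n in c.items()}
        for c in counts
    ]


if __name__ == "__main__":
    for cycle, comp in enumerate(per_cycle_composition(["ACGT", "AGGT", "ATGA"]), 1):
        print(cycle, comp)
```

In an unbiased library the fractions at each position should be roughly the same from cycle to cycle; a position where they diverge sharply from the rest of the read is the kind of bias described above.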

January 31, 2016 Simon Andrews mRNA-Seq, FastQC