A common failure mode for sequencing experiments is poor alignment efficiency when the sequences are mapped to the expected reference. There are many potential reasons for this, but one of the most common is that a significant proportion (or possibly all) of the library comes from a species other than the expected one. When material is from the wrong species, or is contaminated with a second species, the contaminating species is often one that is commonly used in the lab which created the library. It can therefore be useful to screen libraries against a range of commonly used species to determine how well the library matches what is expected.
Often the first symptom observed is a loss of mapping efficiency, but sometimes even the wrong species can map with reasonable efficiency, and problems are not revealed until later in the analysis, so some pre-emptive QC on the libraries is useful.
In a simple case you should see the vast majority of the sequences mapping to the expected species.
This plot also illustrates that it can be useful to distinguish uniquely mapped reads from multi-mapped reads. In this case the library is supposed to be mouse, but around 20% maps to rat due to the overall similarity between the two species. Separating uniquely and multiply mapped sequences allows us to see quickly that the rat-mapped sequence is all multi-mapped and probably also maps to mouse. In an extreme case you could see what looks like reasonable mapping to the wrong species if the library is composed of sequences which are very easy to map (low complexity or short sequences, for example).
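The distinction between reads that hit only one species and reads that are shared between species can be sketched in a few lines of Python. This is a minimal illustration, not how any particular screening tool is implemented: the function names, the category labels and the per-read dictionaries of alignment counts are all assumptions made up for the example.

```python
from collections import Counter

def classify_read(hits):
    """Classify one read from a dict of species name -> number of alignments.

    The input structure and category names are hypothetical, for illustration only.
    """
    mapped = {sp: n for sp, n in hits.items() if n > 0}
    if not mapped:
        return ("unmapped", ())
    species = tuple(sorted(mapped))
    if len(species) == 1:
        # The read hits a single species, either once or several times.
        kind = "unique" if mapped[species[0]] == 1 else "multi_one_genome"
        return (kind, species)
    # The read aligns to more than one species, so it cannot tell them apart.
    return ("multi_genome", species)

def summarise(reads):
    """Count, per species, reads specific to it versus reads shared with other species."""
    summary = {}
    for hits in reads:
        kind, species = classify_read(hits)
        bucket = "multiple_genomes" if kind == "multi_genome" else "one_genome"
        for sp in species:
            summary.setdefault(sp, Counter())[bucket] += 1
    return summary

# Made-up alignment counts for five reads screened against two species.
reads = [
    {"Mouse": 1, "Rat": 0},
    {"Mouse": 1, "Rat": 1},
    {"Mouse": 3, "Rat": 0},
    {"Mouse": 0, "Rat": 0},
    {"Mouse": 1, "Rat": 2},
]
summary = summarise(reads)
```

In this toy input every read that maps to rat also maps to mouse, so rat has no reads in the `one_genome` bucket, which is the pattern described above for a genuine mouse library.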
Where there is significant contamination or a sample switch, this type of screening will not only identify the problem, but will also provide a clue as to the source of the contamination. It will also indicate whether the library is entirely the wrong species, suggesting a sample switch, or only partially contaminated. The example below was supposed to be a mouse library, but can clearly be seen to be mostly E. coli, with some human sequence as well.
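The reasoning used to read such a result can be written down as a rough rule of thumb. The sketch below is an assumption-laden illustration: the function name, the verdict strings and the 5% floor are arbitrary choices made for the example, and the percentages are invented numbers meant to represent reads that map only to each species.

```python
def interpret_screen(unique_pct, expected, floor=5.0):
    """Give a rough verdict from species-specific mapping percentages.

    unique_pct: dict of species -> % of reads mapping only to that species.
    floor: hypothetical cut-off below which a species is treated as absent.
    """
    # Unexpected species above the floor, strongest signal first.
    sources = sorted(
        (sp for sp, pct in unique_pct.items() if sp != expected and pct >= floor),
        key=lambda sp: unique_pct[sp],
        reverse=True,
    )
    expected_present = unique_pct.get(expected, 0.0) >= floor
    if not sources:
        verdict = "as expected" if expected_present else "low mapping, source unknown"
    elif expected_present:
        verdict = "possible partial contamination"
    else:
        verdict = "possible sample switch"
    return verdict, sources

# Invented percentages for a library that was supposed to be mouse.
verdict, sources = interpret_screen(
    {"Mouse": 2.0, "E. coli": 70.0, "Human": 12.0}, expected="Mouse"
)
```

Any real cut-off would need to be chosen for the library type and the set of species being screened. A verdict like this is only a prompt to look at the full result, not a replacement for doing so.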
It’s always worth investigating the whole content of your library. Even with partial contamination, some of the contaminant's sequences will cross-map to the species you expect, and this can be difficult to spot if this type of screening is not run.
The plots shown in this article were created with FastQ Screen, a program which screens a proportion of your library against a set of species that you select and can customise to match the environment you’re working in.