Illumina sequencing technology is based on sequencing by synthesis. Rather than sequencing the entirety of one sequence before moving on to another, all of the sequences in a run are sequenced simultaneously, one base at a time. Sequencing progresses by running a chemistry cycle, which adds a tagged base to the end of each sequence cluster (each cluster generates a single output sequence), followed by an imaging step in which the newly added base is read.
It is commonly observed that as the number of sequencing cycles increases, the average quality of the base calls, as reported by the Phred scores produced by the sequencer, falls. The rate at which this fall happens varies according to the type of sequencer used, the version of the sequencing chemistry, and the nature of the library being sequenced.
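To make the scale of a falling Phred score concrete, the sketch below converts between a Phred score Q and the corresponding probability that a base call is wrong, using the standard definition Q = -10 * log10(P). The function names are illustrative and do not come from any sequencer software.

```python
import math


def phred_to_error_prob(q):
    """Return the error probability implied by Phred score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Return the Phred score implied by error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability for a few common Phred scores.
    for q in (40, 30, 20, 10):
        print(q, phred_to_error_prob(q))
```

Because the scale is logarithmic, a drop of 10 Phred units over the course of a read corresponds to a tenfold increase in the expected error rate.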
This degradation in base call quality is most often observed in a plot of Phred score against chemistry cycle. This plot is produced internally by the Illumina sequencing software, but is also easily reproduced by many of the common QC packages.
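The numbers behind such a plot are simple to compute. The following is a minimal sketch, not the method used by any particular QC package: it assumes a four-line-per-record FASTQ file with Phred+33 quality encoding, and the function name and the offset parameter are hypothetical.

```python
from itertools import islice


def mean_quality_per_cycle(fastq_lines, offset=33):
    """Return the mean Phred score at each cycle (read position).

    fastq_lines: iterable of FASTQ lines, four lines per record.
    offset: ASCII offset of the quality encoding (assumed Phred+33).
    """
    totals = []  # sum of Phred scores seen at each position
    counts = []  # number of reads covering each position
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input, or a truncated final record
        quality_string = record[3].rstrip("\r\n")
        for i, ch in enumerate(quality_string):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Two made-up reads, with quality falling towards the end of the second.
    example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "II5+"]
    print(mean_quality_per_cycle(example))
```

Plotting the returned list against its index gives the mean-quality-per-cycle curve. QC packages typically show the full distribution of scores at each position rather than just the mean.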
This loss of quality is an expected side effect of the way the Illumina platform works. In the Illumina sequencing system you do not sequence a single molecule, but rather an ensemble of identical molecules called a cluster. Clusters are necessary in order to generate enough signal to be seen by the imaging system, but they also introduce the possibility of generating mixed signals.
When you run a chemistry cycle the assumption is that every molecule on the flowcell is extended by one base, but this isn’t strictly true. Although most molecules will be extended, a very small proportion will fail to extend and will remain on the previous base. This means that after a few cycles of chemistry the signal coming from a cluster will be a mix of signal from the current base and some signal from the previous few bases. Illumina have a system in place to detect this effect, called phasing, and to correct for it, but the correction can never be perfect. As more cycles of sequencing are performed, the signal from a cluster becomes more mixed and the ability to determine the correct base call diminishes, eventually to the point where the true signal is effectively lost.
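The cumulative nature of this effect can be illustrated with a toy model. The sketch below assumes that every molecule independently fails to extend with the same probability p at every cycle, and that no software correction is applied. The value of p used is purely illustrative and is not a published Illumina figure, and the function names are hypothetical. Under these assumptions the fraction of a cluster still in phase after n cycles is (1 - p)^n, and the number of bases a molecule lags behind follows a binomial distribution.

```python
from math import comb


def in_phase_fraction(p_fail, cycles):
    """Fraction of molecules in a cluster that have extended at every cycle."""
    return (1 - p_fail) ** cycles


def lag_distribution(p_fail, cycles, max_lag):
    """Probability that a molecule is exactly k bases behind, for k = 0..max_lag.

    Under the toy model the lag after `cycles` cycles is Binomial(cycles, p_fail).
    """
    return [
        comb(cycles, k) * p_fail ** k * (1 - p_fail) ** (cycles - k)
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    p = 0.005  # illustrative per-cycle failure rate, an assumption
    for n in (1, 50, 100, 200, 300):
        print(n, round(in_phase_fraction(p, n), 3))
```

Even with a small per-cycle failure rate, the in-phase fraction decays geometrically, which is why the mixing of signal matters far more for late cycles than for early ones.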
Mitigation and Prevention
There isn’t really any workaround for this. The Illumina base calling software already has code in place to try to correct for this effect, and improvements in the sequencing chemistry have aimed to increase the fidelity of the extensions, but further progress will need to come from Illumina themselves.
This effect can easily be detected with the FastQC "Per base sequence quality" plot.