Generating plasmid clones of DNA sequences is one of the most common and important methods in biology research. The method is indispensable because, with modest effort, propagating a single DNA plasmid that was transformed into E. coli can produce millions of exact copies, or clones, of that DNA. However, it has also been shown that the fidelity of DNA replication in E. coli is not perfect. Secondary structure and damage to the DNA through oxidation and other sources can lead to errors in replication. When these errors occur early during colony propagation, they can make up a significant proportion of the plasmid sequence population by the time colonies are selected for sequence purification.
Because Sanger sequencing relies on a consensus signal generated from a population of molecules, small variations within the population can be obscured within the signal. In contrast, next generation sequencing (NGS) uses massively parallel analysis of sequence reads from thousands of individual molecules. This makes it possible to detect and quantify minor variants within populations of molecules.
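The difference between a consensus signal and per-molecule counting can be illustrated with a minimal Python sketch. It assumes reads that are already aligned and of equal length, and it uses a toy sequence with a hypothetical single-base substitution in 7 of 100 reads. The function names and the `min_fraction` threshold are illustrative assumptions, not part of any sequencing software.

```python
from collections import Counter

def column_counts(reads):
    """Count bases at each position across equal-length, pre-aligned reads."""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")
    return [Counter(r[i] for r in reads) for i in range(length)]

def consensus(reads):
    """Majority base at each position, loosely analogous to a consensus call."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(reads))

def minor_fractions(reads, min_fraction=0.01):
    """Map position -> fraction of reads that disagree with the majority base."""
    n = len(reads)
    result = {}
    for pos, counts in enumerate(column_counts(reads)):
        majority_count = counts.most_common(1)[0][1]
        fraction = (n - majority_count) / n
        if fraction >= min_fraction:
            result[pos] = fraction
    return result

# Toy data (hypothetical): 93 reads of the expected sequence and
# 7 reads carrying a T->A substitution at index 7.
expected = "ACGTACGTAC"
variant = expected[:7] + "A" + expected[8:]
reads = [expected] * 93 + [variant] * 7

consensus_sequence = consensus(reads)
variant_positions = minor_fractions(reads)
```

In this toy case `consensus_sequence` equals the expected sequence, so the minority reads leave no trace in the majority call, while `variant_positions` reports a fraction of 0.07 at index 7. Real NGS analysis additionally involves alignment, base-quality filtering, and sequencing-error modeling, all of which are omitted here.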
Examples of sequencing results from samples with contaminating subpopulations can be seen in Figures 1 and 2. In Figure 1, the Sanger trace data for a synthetic gene allows detection of a mixed population of plasmid sequences, visible as strong overlapping peaks of different bases (Figure 1B). A quantitative analysis of the same plasmid preparation using NGS shows that this subpopulation is actually ~17% of the total population (Figure 1A). Figure 2 shows data from a cloned and purified plasmid that contains a subpopulation with one or more sequence mutations, comprising ~7% of the total sequence population. While detectable by NGS analysis (Figure 2A), this contamination is not visible by Sanger sequencing of the same DNA preparation (Figure 2B). Although many subpopulations are identifiable by Sanger sequencing, surprisingly large subpopulations can go undetected. Minor subpopulations, however, are easily identified in NGS data.
Figure 1. Clonally purified plasmid with a contaminating subpopulation visible in both Sanger sequencing and NGS data. A) A gene sequenced by NGS reveals a subpopulation of approximately 17% incorrect clones that has a 14 bp deletion between bases 431 and 445. B) Bidirectional Sanger sequencing data of the same sample also shows two populations of DNA in the traces. The bottom Sanger trace shows mixed populations beginning at base 431 and continuing in the reverse direction while the top Sanger trace shows the populations are identical from base 445 onwards. The expected sequence is AACAGCAACAACTG.
Figure 2. Clonally purified plasmid with a contaminating subpopulation visible in NGS data but not in the Sanger sequencing trace. A) A gene sequenced by NGS shows a ~7% error between bp 400 and 500. B) Bidirectional Sanger sequencing of the same sample shows no error within the same region. The expected sequence is GTTAACGGTATGCACGCGCCGGGTCTGTGCG.
NGS is more sensitive for detection of subpopulations, but Sanger is still useful
While Sanger sequencing has been the standard for sequence confirmation for decades, these examples illustrate the superior ability of NGS analysis to detect subpopulations within cloned plasmid preparations that Sanger sequencing can miss.
It is important to note that Sanger sequencing continues to be a useful tool for many labs and for molecular cloning applications. However, as with any method, it is necessary to understand the limitations of the technology and to verify important results through replication and/or with other methods.