Widespread adoption of next generation sequencing (NGS) has led to an exponential increase in cataloged sequence data. One consequence has been a dramatic increase in the number of identified single nucleotide polymorphisms (SNPs; see sidebar, SNPs defined, below). As of Nov 7, 2016, Build 149 of the NCBI dbSNP reference database listed 558 million submitted SNPs (subSNP) for Homo sapiens, of which 154 million were referenced (refSNP). This represents a >19X increase in the number of subSNPs over 10 years, up from 28 million subSNPs in 2006 (Build 126), and a ~13X increase in the number of refSNPs (Figure 1).
Figure 1. Dramatic increase in the number of human SNPs over the past 5 and 10 years.
* NCBI dbSNP Build 149 (Nov 7, 2016); www.ncbi.nlm.nih.gov/dbvar/content/org_summary/ (accessed Dec 19, 2016).
† www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi (accessed Dec 19, 2016).
‡ refSNP, or reference SNP cluster, is defined as a SNP or group of SNPs that map to a specific genomic sequence region. The SNPs of an existing build are all refSNPs. In creating a new build, the refSNPs from the prior build and new subSNPs are both compared to updated genome sequence data to minimize duplications among refSNPs and subSNPs. This process will assign subSNPs to existing refSNP clusters or new refSNPs.
§ subSNP stands for “submitted SNP” and is defined as a SNP submitted since the last build that was found to be distinct from refSNPs after multiple cycles of BLAST analyses.
Based on the number of refSNPs in Build 149 and a genome size of 3.4 x 10^9 bp, the human genome should contain a SNP approximately once every 22 bases. Other common model systems show a similarly high frequency of SNPs (Table 1).
|Species||NCBI dbSNP build*||subSNP† (million)||refSNP‡ (million)||Genome size (bp)§||SNP frequency (1 SNP per N bases)|
|Homo sapiens (human)||Build 149 (Nov 7, 2016)||557.9||154.2||3.40 x 10^9||1 in 22|
|Bos taurus (cow)||Build 148 (Jun 24, 2016)||293.8||100.2||3.62 x 10^9||1 in 36|
|Mus musculus (mouse)||Build 146 (Nov 24, 2015)||135.7||80.4||3.23 x 10^9||1 in 40|
|Sus scrofa (pig)||Build 145 (Jul 31, 2015)||135.5||60.4||3.13 x 10^9||1 in 52|
|Drosophila melanogaster (fruit fly)||Build 148 (Jun 24, 2016)||5.2||5.2||0.176 x 10^9||1 in 34|
* Taken from NCBI dbSNP; www.ncbi.nlm.nih.gov/dbvar/content/org_summary/ (accessed Dec 19, 2016).
† subSNP stands for “submitted SNP” and is defined as a SNP submitted since the last build that was found to be distinct from refSNPs after multiple cycles of BLAST analyses.
‡ refSNP, or reference SNP cluster, is defined as a SNP or group of SNPs that map to a specific genomic sequence region. The SNPs of an existing build are all refSNPs. In creating a new build the refSNPs from the prior build and new subSNPs are both compared to updated genome sequence data to minimize duplications among refSNPs and subSNPs. This process will assign subSNPs to existing refSNP clusters or new refSNPs.
§ Gregory, T.R. (2005). Animal Genome Size Database; www.genomesize.com (accessed Dec 19, 2016); where genome size (bp) = (0.978 x 10^9) x DNA content (pg).
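The "1 in N" column of Table 1 is simply genome size divided by the refSNP count. The following minimal Python sketch reproduces that arithmetic from the values in the table; the function names and the dictionary layout are illustrative choices, not part of any NCBI or IDT tool.

```python
# refSNP counts and genome sizes (bp) copied from Table 1.
TABLE_1 = {
    "Homo sapiens": (154.2e6, 3.40e9),
    "Bos taurus": (100.2e6, 3.62e9),
    "Mus musculus": (80.4e6, 3.23e9),
    "Sus scrofa": (60.4e6, 3.13e9),
    "Drosophila melanogaster": (5.2e6, 0.176e9),
}


def genome_size_bp(dna_content_pg):
    """Convert DNA content (pg) to base pairs, per the Table 1 footnote."""
    return 0.978e9 * dna_content_pg


def bases_per_snp(ref_snp_count, genome_bp):
    """Average number of bases per refSNP, rounded to the nearest base."""
    return round(genome_bp / ref_snp_count)


for species, (ref_snps, genome_bp) in TABLE_1.items():
    print(f"{species}: 1 SNP in {bases_per_snp(ref_snps, genome_bp)} bases")
```

This is an average over the whole genome; it says nothing about how SNPs are distributed across any particular amplicon.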
Taking SNPs into account when designing PCR/qPCR assays
Given the high frequency of SNP occurrence, it is unrealistic to try to avoid SNPs altogether when designing PCR/qPCR assays. However, it is important to consider where a SNP falls within a primer or probe sequence. Depending on that position, a SNP under a primer or probe can dramatically affect a reaction or have little to no effect at all. Specifically, the position of a SNP can influence primer and probe Tm, efficiency of polymerase extension, and even target specificity. To obtain the most accurate data, you need to know where your assay designs overlie SNPs and manage this positioning.
Positional effects. SNPs that occur in primer and probe binding sites can destabilize oligonucleotide binding and reduce target specificity. Mismatches can affect the hybridization of oligos, reducing the Tm of an oligonucleotide by as much as 5–18°C (Figure 2). The degree of effect on Tm depends on the mismatch position, the type of mismatch (e.g., A/A, A/C, G/T), and the surrounding sequence context.
When probes hybridize, the destabilizing effects are highest for mismatches located in the interior of the duplex [5,6,7]. Mismatches at the terminus or penultimate position (1 or 2 base pairs from the terminus) are less discriminatory [5,8]. Use the free, online IDT OligoAnalyzer tool (available at www.idtdna.com/scitools) to make such predictions.
Figure 2. Significant decrease in probe or primer melting temperature from a single mismatch. The example shows how a single mismatch can alter probe or primer melting temperature, affecting the efficiency of the PCR and, ultimately, the interpretation of experimental results. These particular mismatches create non-standard base pairing that should not disrupt the helix. However, a single mismatch can substantially decrease melting temperature—by over 8°C (compare the Tm values highlighted with green and red arrows). The screen shots show output from the free, online OligoAnalyzer® Tool.
Figure 3. Mismatches at the 3’ end of primers reduce qPCR performance. The data show the difference in Cq (ΔCq) between perfect match and mismatch primers as a function of the position of a single mismatch, using 5 different master mixes (A, B, C, D, and E). p values were calculated using one-way analysis of variance (ANOVA). Depending on the master mix used, the shift due to a SNP at the 3’ end of a primer is as large as 7 Cq, which would be read as a 128-fold change in gene expression. (Data adapted from Lefever et al., with permission of the publisher.)
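The 128-fold figure in the Figure 3 caption follows from treating each cycle as a doubling of product, so a shift of 7 Cq corresponds to 2^7. A minimal sketch of that conversion is shown below; the `efficiency` parameter is an added assumption that lets you see how the estimate changes when amplification is less than perfect, and it is not taken from the Lefever data.

```python
def fold_change(delta_cq, efficiency=1.0):
    """Apparent fold difference implied by a Cq shift.

    efficiency=1.0 assumes perfect doubling each cycle (base 2);
    efficiency=0.9 would mean a 1.9-fold increase per cycle.
    """
    return (1.0 + efficiency) ** delta_cq


print(fold_change(7))  # 128.0, the value quoted in the Figure 3 caption
```

Because the relationship is exponential, even a 1-cycle shift caused by a mismatch would be read as a 2-fold difference in template amount.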
Base composition effects. Lefever and colleagues also showed that reactions containing purine/purine and pyrimidine/pyrimidine mismatches at the 3’ terminal position in the primer produced larger ΔCq values (mismatch vs. perfect match) and reduced end-point fluorescence values, with A/G and C/C showing the largest Cq differences compared to perfect matches.
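To sort the mismatches in your own designs into the classes described above, a small helper such as the following can be used. It is only a sketch: the function name is hypothetical, and it assumes that a mismatch written as "A/G" means the primer base A sits opposite the template base G.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def mismatch_class(primer_base, template_base):
    """Classify the pairing between a primer base and the opposing template base."""
    pair = (primer_base.upper(), template_base.upper())
    if not set(pair) <= PURINES | PYRIMIDINES:
        raise ValueError(f"expected A, C, G, or T, got {pair}")
    if pair in WATSON_CRICK:
        return "match"
    if set(pair) <= PURINES:
        return "purine/purine"
    if set(pair) <= PYRIMIDINES:
        return "pyrimidine/pyrimidine"
    return "purine/pyrimidine"


for primer_base, template_base in [("A", "G"), ("C", "C"), ("G", "T"), ("A", "T")]:
    print(f"{primer_base}/{template_base}: {mismatch_class(primer_base, template_base)}")
```

The classification only names the mismatch type; it does not predict the size of the Cq shift, which also depends on position and master mix.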
Their data demonstrated that the shift in Cq between a perfect-matched oligo/target and an oligo/target with a single mismatch decreased with increasing distance of the mismatch from the 3’ end. Single mismatches located more than 5 nucleotides from the 3’ end could still have a moderate effect on qPCR amplification. Further experiments by this group showed that the reduction in Tm and shift in Cq were exacerbated when SNPs occur in both primers (forward and reverse) or when more than one mismatch occurs within a given primer.
The free, online OligoAnalyzer tool allows researchers to set mismatches and then calculate Tm. Use this tool to also examine potential hairpin and dimer formation. The DECODED article, Determining the physical characteristics of your oligo—the OligoAnalyzer program, provides guidance on how to identify these characteristics.
Effect on qPCR amplification. In many cases, a single SNP may not prevent amplification, but can cause inefficient annealing and amplification. This can lead to a delayed Cq and an underestimation of gene expression, or even an apparent copy number loss, in SNP-containing sequences.
Using a modified single-base extension assay, Wu and colleagues investigated how the type and position of a mismatch affected extension efficiency during the initial PCR cycle. They concluded that mismatches within the last 3–4 bases of the 3’ end of the primer blocked primer extension. Wu et al. attributed the low extension efficiency to reduced binding of the DNA polymerase. While other research groups have contested this finding, describing a similar affinity of DNA polymerase for correctly paired and mispaired duplexes, the results of Lefever and colleagues confirm and extend those of Wu et al.
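Since the effect of a mismatch depends so strongly on its distance from the 3’ end, it can help to compute that distance directly from coordinates. The sketch below assumes 1-based, inclusive coordinates on the reference (forward) strand and uses hypothetical function names; the default window of 4 bases is an illustrative cutoff taken from the 3–4 base range described above, not a validated threshold.

```python
def snp_distance_from_3prime(primer_start, primer_end, snp_pos, strand="+"):
    """Bases between a SNP and the primer's 3' terminal base (0 = terminal base).

    Coordinates are 1-based and inclusive on the reference forward strand.
    A "+" primer has its 3' end at primer_end; a "-" (reverse) primer has
    its 3' end at primer_start. Returns None if the SNP is outside the primer.
    """
    if not primer_start <= snp_pos <= primer_end:
        return None
    if strand == "+":
        return primer_end - snp_pos
    return snp_pos - primer_start


def flag_primer_snp(distance, critical_window=4):
    """Rough triage label; critical_window is an illustrative assumption."""
    if distance is None:
        return "no overlap"
    if distance < critical_window:
        return "near 3' end: consider redesign"
    return "toward 5' end: check Tm"


# Example: a 20-base forward primer at positions 101-120 with a SNP at 119.
d = snp_distance_from_3prime(101, 120, 119, strand="+")
print(d, flag_primer_snp(d))
```

The same SNP under a reverse primer covering the same positions would sit 18 bases from the 3’ end, so strand must be taken into account when checking legacy designs.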
Safeguard your experiments
Researchers often adopt primer and probe sequences identified in prior publications, and it can be tempting to reuse legacy published or “lab-validated” RT-PCR assay designs. However, given the continual addition of new sequence information, it is important to reevaluate and understand the location of SNPs relative to primer and probe sequences in your PCR/qPCR assays. The following are tips for managing SNP impact on your assay results:
- To obtain an up-to-date list of possible SNPs in your sequence, scroll down to the Alignments section of your BLAST search results page, and click on Graphics at the top left. At the top right of the sequence graphic, click on Tracks and select the Variation tab. From there you can select the type of SNPs for which you want information.
- If the “rs” number—the Reference SNP cluster ID (accession number) that refers to a specific SNP—is known, check SNP information in NCBI dbSNP (www.ncbi.nlm.nih.gov/snp).
- If a SNP is identified, check whether the frequency of the SNP (minor allele frequency, or MAF) is relevant in your population.
- When you cannot avoid a SNP underlying your probe sequence, use the free, online IDT OligoAnalyzer Tool to predict the Tm of mismatched probe sequences.
- In cases where a SNP underlies a primer sequence, minimize or eliminate SNP effects by positioning the SNP towards the 5’ end of the primer. For help with such designs, contact our technical support group at firstname.lastname@example.org, or by phone, using the local phone number on our Contact page.
- For genotyping experiments where relevant SNPs occur adjacent to your SNP of interest, avoid allele dropout by using mixed bases (Ns) or inosines in the primer or probe to cover the adjacent site(s).
- Since genomic information is constantly in flux, it is important to recheck previously used primer and probe sequences for underlying SNPs.
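The mixed-base tip above can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical function name and a made-up example sequence; it assumes you already know the 0-based offsets (counted from the 5’ end of the oligo) of the adjacent SNP sites, and it only edits the sequence string. It does not predict how the substitution changes Tm.

```python
def mask_adjacent_snps(oligo, snp_offsets, placeholder="N"):
    """Return the oligo with the base at each 0-based offset replaced.

    placeholder defaults to the mixed base "N"; pass another symbol if your
    ordering format represents inosine or a specific mixed base differently.
    """
    bases = list(oligo.upper())
    for offset in snp_offsets:
        if not 0 <= offset < len(bases):
            raise IndexError(f"offset {offset} is outside the oligo")
        bases[offset] = placeholder
    return "".join(bases)


# Made-up 10-base sequence with one adjacent SNP at offset 3.
print(mask_adjacent_snps("ACGTACGTAC", [3]))  # ACGNACGTAC
```

Offsets are counted along the oligo as written 5’ to 3’, so for a reverse primer the genomic coordinate of the SNP must first be converted to an offset on the reverse-complement sequence.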
Adopting a new paradigm in assay design
SNPs are now a regular occurrence, with more discovered every day. It is no longer practical, or even possible, to avoid them when designing PCR/qPCR assays. This means we must adjust our thinking about experimental design, and design our PCR/qPCR assays intelligently, with SNPs in mind.