Genes with multiple transcripts cause confusion
Different transcripts of the same gene may differ from one another not only by the addition or deletion of exons, but also by the presence of alternative internal splice sites within an exon. Researchers sometimes ask us why the exon location data on our PrimeTime® qPCR Assay ordering page differ from what they see in other tools, such as the NCBI Gene database (www.ncbi.nlm.nih.gov/gene).
Until recently, NCBI numbered the exons of each transcript individually, with no consistent gene-based system for identifying exons across transcripts. As a result, two transcript variants could each be annotated with the same number of exons, yet, because of alternative splicing or different transcription start and stop sites, exons with the same number would not necessarily have the same size or location.
Simplifying exon numbering
To improve on this situation, IDT has been developing a consensus exon-numbering system that is meaningful across these inconsistently annotated transcripts. This approach provides consistent naming for identifying the appropriate exons and the IDT PrimeTime qPCR Assays that detect them.
In the example shown in Figure 1, six splice variants of human HMGA1 have been described. NCBI numbers the exons sequentially within each individual RefSeq entry (Figure 1A), so exon 3 of NM_145899.2 is equivalent to exon 2 of NM_145901.3.
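The per-transcript scheme can be illustrated with a short Python sketch. This is not NCBI's code: it assumes a plus-strand gene, represents each exon as a hypothetical (start, end) pair in shared genomic coordinates, and uses made-up toy coordinates rather than real HMGA1 data. Numbering each transcript's exons 1..n from 5′ to 3′ means the same genomic exon can carry a different number in each transcript.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: an exon is a (start, end) pair in shared
# genomic coordinates. A plus-strand gene is assumed, so 5'->3' order is
# simply ascending start position.
Exon = Tuple[int, int]


def number_exons(exons: List[Exon]) -> Dict[int, Exon]:
    """Per-transcript scheme: number this transcript's exons 1..n, 5' to 3'."""
    return {i: exon for i, exon in enumerate(sorted(exons), start=1)}


def equivalent_exon(exon_no: int, src: List[Exon], dst: List[Exon]) -> Optional[int]:
    """Return the number, in transcript dst, of the exon with exactly the same
    coordinates as exon `exon_no` of transcript src (None if dst lacks it)."""
    target = number_exons(src)[exon_no]
    for number, exon in number_exons(dst).items():
        if exon == target:
            return number
    return None


# Toy transcripts (made-up coordinates): variant B skips variant A's exon 2.
variant_a = [(100, 200), (300, 350), (500, 600), (800, 900)]
variant_b = [(100, 200), (500, 600), (800, 900)]

print(equivalent_exon(3, variant_a, variant_b))  # 2
print(equivalent_exon(2, variant_a, variant_b))  # None
```

Exact coordinate matching is the simplest possible notion of "the same exon"; it deliberately ignores exons that only partially overlap, which is exactly the case the consolidated scheme has to deal with.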
When designing PrimeTime qPCR Assays targeting these sequences, IDT design tools consolidate the exon data for the different variants into a single numbering system, as shown in Figure 1B. This is the numbering system displayed on the Results page when identifying an amplicon region in the IDT PrimeTime qPCR Assay Library. The exon numbering scheme used by NCBI (based on specific transcripts) is still retained under the RefSeq # tab for each assay ID.
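One way to build a gene-level numbering like the one in Figure 1B is sketched below. This is a hypothetical illustration of the idea, not the logic of IDT's design tools: it pools the distinct exon intervals from all transcripts, merges overlapping intervals into one numbered exon position (5′ to 3′, plus strand assumed), and adds letter suffixes (a, b, …) when transcripts use different boundaries at the same position. The transcript names and coordinates are made-up toy values.

```python
import string
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end), inclusive genomic coordinates


def consolidate(transcripts: Dict[str, List[Exon]]) -> Dict[Exon, str]:
    """Assign one gene-level label to every distinct exon interval.

    Overlapping intervals share a number; if a position has more than one
    distinct interval (e.g., an alternative internal splice site), each
    interval gets a letter suffix in 5'->3' order. Plus strand assumed, and
    at most 26 alternatives per position.
    """
    intervals = sorted({exon for exons in transcripts.values() for exon in exons})

    # Group overlapping intervals into clusters (one cluster = one exon position).
    clusters: List[List[Exon]] = []
    cluster_end = -1
    for start, end in intervals:
        if clusters and start <= cluster_end:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end

    labels: Dict[Exon, str] = {}
    for number, members in enumerate(clusters, start=1):
        if len(members) == 1:
            labels[members[0]] = str(number)
        else:
            for letter, exon in zip(string.ascii_lowercase, members):
                labels[exon] = f"{number}{letter}"
    return labels


def transcript_labels(transcripts: Dict[str, List[Exon]],
                      labels: Dict[Exon, str]) -> Dict[str, List[str]]:
    """Express each transcript's exons (5'->3') in the consolidated labels."""
    return {name: [labels[exon] for exon in sorted(exons)]
            for name, exons in transcripts.items()}


# Toy variants: variant_2 skips position 2 and ends its third exon early.
variants = {
    "variant_1": [(100, 200), (300, 350), (500, 600), (800, 900)],
    "variant_2": [(100, 200), (500, 560), (800, 900)],
    "variant_3": [(100, 200), (300, 350), (500, 600)],
}

labels = consolidate(variants)
for name, names in transcript_labels(variants, labels).items():
    print(name, names)
# variant_1 ['1', '2', '3b', '4']
# variant_2 ['1', '3a', '4']
# variant_3 ['1', '2', '3b']
```

In per-transcript numbering, variant_2's last exon would be "exon 3", whereas here it keeps the label 4 in every transcript. Merging by simple overlap is a simplifying assumption: an interval that overlaps two otherwise separate exons would collapse them into a single position.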
The results of a search for the human HMGA1 gene are shown in Figure 2. Assay ID Hs.PT.58.38699366 spans exons 4–6 in IDT's consolidated numbering system, as shown in the Exon Location column (blue) on the Results page of the assay selection tool. As Figure 1 shows, these are the exons nearest the 3′ end of the gene. Note that the Transcript Locations pop-up window shows different exon numbers for this region across the transcripts.
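Reporting an assay's location as a range such as 4–6 amounts to finding which consolidated exons contain the two ends of the amplicon. The sketch below assumes a hypothetical table of consolidated exon labels with toy plus-strand coordinates and hypothetical amplicon positions; it is not the assay selection tool's code.

```python
from typing import Dict, Optional, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end), inclusive genomic coordinates


def exon_containing(pos: int, exons: Dict[str, Exon]) -> Optional[str]:
    """Return the label of the first exon whose interval contains pos, if any."""
    for label, (start, end) in exons.items():
        if start <= pos <= end:
            return label
    return None


def exon_location(amp_start: int, amp_end: int, exons: Dict[str, Exon]) -> Optional[str]:
    """Label an amplicon by the exons holding its 5' and 3' ends, e.g. '4-6'."""
    first = exon_containing(amp_start, exons)
    last = exon_containing(amp_end, exons)
    if first is None or last is None:
        return None  # an end falls outside every annotated exon
    return first if first == last else f"{first}-{last}"


# Toy consolidated exons for a plus-strand gene (made-up coordinates).
consolidated = {"1": (100, 200), "2": (300, 350), "3": (500, 600),
                "4": (800, 900), "5": (1000, 1050), "6": (1200, 1400)}

print(exon_location(850, 1250, consolidated))  # 4-6
print(exon_location(520, 580, consolidated))   # 3
```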
When consolidation cannot be performed
Sometimes it is not possible to assign a single, uniform consolidated exon location. In these cases, IDT reverts to transcript-specific exon numbers (following the NCBI system) and marks them with a superscript “1”. The superscript indicates that the exon numbers for that gene conflict among the different NCBI databases, so the researcher should review the assay to confirm that it recognizes the desired sequence or location.
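The fallback rule can be expressed as a small formatting function. This is a hypothetical sketch of the decision only: it assumes the tool has, for the amplicon, the exon range reported by each source record (the `ranges_by_source` argument and its keys are invented for illustration) plus the transcript-specific range to fall back on.

```python
from typing import Dict, Tuple

ExonRange = Tuple[str, str]  # (first exon, last exon), e.g. ("4", "6")

SUPERSCRIPT_ONE = "\u00b9"  # superscript "1" that flags a location for review


def format_location(ranges_by_source: Dict[str, ExonRange],
                    transcript_range: ExonRange) -> str:
    """Return one consolidated range if every source agrees; otherwise (no
    sources, or conflicting ones) fall back to the transcript-specific exon
    numbers, flagged with a superscript 1 so the assay gets reviewed."""
    distinct = set(ranges_by_source.values())
    if len(distinct) == 1:
        first, last = distinct.pop()
        flag = ""
    else:
        first, last = transcript_range
        flag = SUPERSCRIPT_ONE
    return f"{first}-{last}{flag}"


# Hypothetical sources that agree, then sources that conflict.
print(format_location({"source_a": ("4", "6"), "source_b": ("4", "6")}, ("3", "5")))  # 4-6
print(format_location({"source_a": ("1", "3"), "source_b": ("1", "2")}, ("1", "3")))  # 1-3¹
```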
Figure 3 shows an example of an assay with this notation (Hs.PT.58.4968362). This assay amplifies two splice variants, NM_002131.3 and NM_145899.3. The forward primer binds in the first exon and the reverse primer binds in the third exon of each transcript. Figure 1B shows that although these transcripts have identical first exons, their third exons differ slightly; in our consolidated numbering system, these are numbered 3a and 3b.
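Whether an assay detects a given splice variant can be checked by asking whether both primer-binding sites fall inside that variant's exons. The sketch below models each primer site as a single hypothetical genomic position and uses toy plus-strand coordinates in which two variants share their first exons but end their third exon at different splice sites (the "3a"/"3b" situation). It ignores primers that straddle an exon–exon junction, which a real check would have to handle.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end), inclusive genomic coordinates


def exon_number_at(pos: int, exons: List[Exon]) -> Optional[int]:
    """Transcript-specific (1-based, 5'->3') number of the exon containing pos."""
    for number, (start, end) in enumerate(sorted(exons), start=1):
        if start <= pos <= end:
            return number
    return None


def detected_variants(fwd: int, rev: int,
                      transcripts: Dict[str, List[Exon]]) -> Dict[str, Tuple[int, int]]:
    """Map each transcript in which both primer sites lie in exons to the
    (forward exon, reverse exon) pair, in that transcript's own numbering."""
    hits = {}
    for name, exons in transcripts.items():
        f, r = exon_number_at(fwd, exons), exon_number_at(rev, exons)
        if f is not None and r is not None:
            hits[name] = (f, r)
    return hits


# Toy variants: identical exons 1 and 2; third exons share a start but end
# at different (alternative) splice sites. variant_z lacks that region.
variants = {
    "variant_x": [(100, 200), (300, 350), (500, 560)],
    "variant_y": [(100, 200), (300, 350), (500, 600)],
    "variant_z": [(100, 200), (800, 900)],
}

print(detected_variants(150, 530, variants))
# {'variant_x': (1, 3), 'variant_y': (1, 3)}
```

A reverse primer placed at position 580 instead would fall only in variant_y's longer third exon, so the assay would no longer detect both variants.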
Occasionally, when we download RefSeq transcript data, we find that a record contains no description of exon locations. For assays that amplify these transcripts, we designate the exon numbers as 0-0 to indicate the lack of data. In addition, because we cannot verify exon boundaries in these transcripts, we designate all assays associated with them as ".g" (i.e., not genomically protected).
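The missing-data rule reduces to a simple check. In this sketch, the function, the input mapping, and the identifiers ASSAY_1, TX_A, and TX_B are hypothetical, and appending ".g" to the assay ID string is only an illustrative way to record the designation; the "0-0" label and the ".g" meaning come from the text above.

```python
from typing import Dict, Optional, Tuple


def annotate(assay_id: str,
             locations: Dict[str, Optional[str]]) -> Tuple[str, Dict[str, str]]:
    """Apply the missing-exon-data rule to one assay.

    `locations` maps each amplified transcript (hypothetical IDs) to its exon
    location, or to None when its RefSeq record has no exon features.
    Returns the possibly suffixed assay ID and the locations to display.
    """
    shown = {tx: ("0-0" if loc is None else loc) for tx, loc in locations.items()}
    if any(loc is None for loc in locations.values()):
        # Exon boundaries cannot be verified, so the assay is not
        # genomically protected and gets the ".g" designation.
        assay_id += ".g"
    return assay_id, shown


print(annotate("ASSAY_1", {"TX_A": "2-3", "TX_B": None}))
# ('ASSAY_1.g', {'TX_A': '2-3', 'TX_B': '0-0'})
```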
NCBI is currently generating a consolidated exon numbering system for each gene in the human genome. At the time of writing (July 2013), about 20% of genes had been covered. Details of NCBI exon numbering can be found in the GenBank files for RefSeqGene records (those with accession numbers beginning with “NG_”). IDT will likely adopt the NCBI system once it is complete.
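Because GenBank records list exons as features with a /number qualifier, those numbers can be read with a short parser. This stdlib-only sketch is an assumption-laden illustration: it handles only simple exon locations of the form `start..end` (no `join`, `complement`, or partial `<`/`>` markers), ignores multi-line qualifier values, and runs on a made-up record fragment rather than a real NG_ file. For real records, a full GenBank parser such as Biopython's `SeqIO` is the safer choice.

```python
import re
from typing import List, Optional, Tuple

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")        # key at column 6
QUALIFIER_RE = re.compile(r"^ {21}/(\w+)(?:=(.*))?$")   # qualifier at column 22
SIMPLE_RANGE_RE = re.compile(r"^(\d+)\.\.(\d+)$")


def exon_numbers(genbank_text: str) -> List[Tuple[Optional[str], int, int]]:
    """Return (number, start, end) for each exon feature with a simple range."""
    exons = []
    current = None  # the exon feature whose qualifiers are being read
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line.startswith(" "):
            break  # a new top-level section (e.g. ORIGIN) ends the feature table
        feature = FEATURE_RE.match(line)
        if feature:
            key, location = feature.groups()
            span = SIMPLE_RANGE_RE.match(location.strip())
            current = None
            if key == "exon" and span:
                current = {"number": None,
                           "start": int(span.group(1)), "end": int(span.group(2))}
                exons.append(current)
            continue
        qualifier = QUALIFIER_RE.match(line)
        if qualifier and current is not None and qualifier.group(1) == "number":
            current["number"] = qualifier.group(2)
    return [(e["number"], e["start"], e["end"]) for e in exons]


# Made-up record fragment in GenBank feature-table layout.
Q = " " * 21  # qualifier indentation
SAMPLE = "\n".join([
    "LOCUS       TOY_RECORD",
    "FEATURES             Location/Qualifiers",
    "     source          1..9000",
    Q + '/organism="Homo sapiens"',
    "     exon            5001..5300",
    Q + '/gene="GENE1"',
    Q + "/number=1",
    "     exon            7001..7150",
    Q + '/gene="GENE1"',
    Q + "/number=2",
    "ORIGIN",
])

print(exon_numbers(SAMPLE))  # [('1', 5001, 5300), ('2', 7001, 7150)]
```

An empty result for a transcript record is the "no description of exon locations" case described earlier.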
PrimeTime qPCR Assays can be ordered at www.idtdna.com/primetime.
The NCBI Gene database can be found at www.ncbi.nlm.nih.gov/gene.
For further assistance, please contact IDT.