Although the decreasing cost of DNA sequencing has enabled more mainstream research applications [1], the upfront cost of sequencing large numbers of samples is still prohibitive for many research labs. This leads researchers to enrich subsets of the genome (target enrichment by hybrid capture) before sequencing (Figure 1), which reduces cost and allows them to focus sequencing efforts on genomic regions relevant to their study. In addition, focusing on specific genomic regions enables multiplexing, the sequencing of many samples simultaneously. Several applications benefit from target enrichment, including genotyping, identification of splice variants and indels, and profiling of genomic recombination and of viral and transposon integration sites.
Figure 1. Target capture using xGen® Lockdown® Probes.
Target Enrichment increases throughput
Genotyping many targets in many samples
A typical genotyping experiment of 100 samples performed by PCR requires many individually optimized PCR assays. Developing these assays is time-consuming and prone to error because of the high number of individual reactions. Furthermore, an important discovery in the course of the experiment can necessitate expanding the sample size, which multiplies the number of reactions again. Finally, genotyping with PCR can fail if mutations exist in the priming sites. Typically, primers are optimized using a few samples and then applied to a larger sample set. Unfortunately, this practice assumes that priming sites are constant across all samples. This might be true for a small sample set, or for regions that are critical for function, but as the number of assays multiplied by the number of samples increases, the probability that a variant disrupts at least one assay also increases (Figure 2A).
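The scaling argument above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes every assay-sample combination is independently affected by a priming-site variant with the same probability, and the value 0.001 is a hypothetical placeholder, not a measured rate.

```python
def prob_any_assay_affected(p_variant: float, n_assays: int, n_samples: int) -> float:
    """Probability that at least one assay-sample combination is hit by a
    priming-site variant, assuming independent combinations (a simplification)."""
    if not 0.0 <= p_variant <= 1.0:
        raise ValueError("p_variant must be between 0 and 1")
    if n_assays < 0 or n_samples < 0:
        raise ValueError("counts must be non-negative")
    combinations = n_assays * n_samples
    # Complement of "no combination is affected".
    return 1.0 - (1.0 - p_variant) ** combinations


if __name__ == "__main__":
    P_VARIANT = 0.001  # hypothetical per-combination probability
    for assays, samples in [(1, 10), (10, 100), (50, 100)]:
        risk = prob_any_assay_affected(P_VARIANT, assays, samples)
        print(f"{assays} assays x {samples} samples: {risk:.3f}")
```

Under these assumptions the risk depends only on the product of assays and samples, which is why a primer set validated on a handful of samples can still fail once the experiment is scaled up.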
As an alternative to PCR, targeted next generation sequencing (NGS) consolidates all the genotyping reactions for different genes and mutations into a single, focused sequencing run using the design shown in Figure 2. Hybrid capture of SNP sequences uses probes centered on the SNP sequence and is often not affected by additional mutations that result in sequence mismatch with the target capture probe. In fact, IDT xGen® Lockdown® Probes can tolerate as many as 7 mutations within the target region without affecting hybrid capture. The enriched DNA is then sequenced in a single run, which makes the laborious setup of individual PCRs and analysis gels unnecessary. Finally, it is easy to expand an experiment with additional probes targeting other SNPs without having to re-optimize the assay.
A. Target enrichment via PCR. B. Target enrichment by hybrid capture.
Figure 2. Hybrid capture simplifies genotyping analysis. (A) PCR primers are very sensitive to mismatches in their binding sequence. A single mismatch within a PCR primer binding site can shift the melting temperature by up to 7°C (ΔTm), which can dramatically reduce PCR efficiency or cause PCR failure. (B) Target capture probes are centered on the mutation under investigation. IDT xGen® Lockdown® Probes used for this purpose tolerate as many as 7 mutations within the target region without affecting hybrid capture.
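As a rough way to screen sample sequences against the tolerance quoted above, the mismatches between a reference target region and a sample sequence can simply be counted. This is a minimal sketch, not a model of hybridization: it handles substitutions only, ignores mismatch position, GC content, and reaction conditions, and the function names and example sequences are hypothetical.

```python
MAX_TOLERATED_MISMATCHES = 7  # figure quoted in the text for the target region


def count_mismatches(reference: str, sample: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length (no indel handling)")
    return sum(a != b for a, b in zip(reference.upper(), sample.upper()))


def within_tolerance(reference: str, sample: str,
                     max_mismatches: int = MAX_TOLERATED_MISMATCHES) -> bool:
    """True if the sample differs from the reference at no more than
    max_mismatches positions (a crude stand-in for 'capture unaffected')."""
    return count_mismatches(reference, sample) <= max_mismatches


if __name__ == "__main__":
    reference = "ACGT" * 10                 # made-up 40-nt target region
    sample = "TTTT" + reference[4:]         # differs at 3 positions
    print(count_mismatches(reference, sample), within_tolerance(reference, sample))
```

A real assessment would rely on empirical capture data or thermodynamic modeling; the count is only a quick first filter.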
Identifying viral and transposon integration sites
Experiments studying gene function often involve ectopic expression of the gene and its subsequent knockdown using RNA interference (RNAi). A retrovirus is used to introduce the exogenous DNA into the genome of the organism under study. The DNA fragment may be integrated into an innocuous intergenic region or into a transcribed region [2], which can alter gene function and cause spurious, unrelated phenotypes. To screen for problematic integration sites across multiple samples, targeted sequencing can be performed with capture probes that correspond to regions of the known exogenous DNA sequence, which identifies the flanking integration sites (Figure 3). Samples that have exogenous DNA incorporated within acceptable genomic regions can then be selected for further study.
Figure 3. Probe design for identification of integration sites. Capture probes are designed to enrich for regions of exogenous, known DNA (viral/transposon). These fragments can then be used to identify flanking integration site sequences by next generation sequencing.
The method is also important for identifying integration sites of transposons and infection-causing viruses. These insights may improve understanding of the associated diseases, especially by providing information about the molecular mechanisms of integration, which may be leveraged for use in other applications. Identification of viral DNA in a host organism can also be used to diagnose a disease state [3]. Targeted sequencing of human DNA that has been enriched using probes designed against viral genomes can identify viral sequences within the human genome and, therefore, confirm infection by the virus.
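The idea behind Figure 3, reading from known exogenous sequence into the unknown flanking host sequence, can be sketched in a few lines. This is a toy illustration under strong assumptions: it uses exact string matching, one orientation, and only the downstream flank, whereas real analyses align reads to both the host genome and the exogenous sequence. The function name and sequences are hypothetical.

```python
from typing import Optional


def downstream_flank(read: str, exogenous_end: str, flank_len: int = 20) -> Optional[str]:
    """Return up to flank_len bases that follow the known exogenous
    sequence in a read, or None if the read carries no usable flank."""
    position = read.find(exogenous_end)
    if position == -1:
        return None  # read does not contain the known sequence
    start = position + len(exogenous_end)
    flank = read[start:start + flank_len]
    return flank or None  # empty string means the read ended at the junction


if __name__ == "__main__":
    viral_end = "GGATCCTTAG"                    # made-up end of the exogenous DNA
    read = "TTTT" + viral_end + "ACGTACGTAC"    # made-up captured read
    print(downstream_flank(read, viral_end))
```

The returned flank would then be mapped to the host genome to locate the integration site.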
Alternative splicing occurs in ~95% of human genes that have more than 1 exon, and erroneous splicing is implicated in many diseases [4]. Some of these diseases can be diagnosed using an assay that detects alternatively spliced transcripts. RNA sequencing (RNA-seq) is a suitable NGS technique for this purpose. RNA-seq provides a snapshot of the quantity of RNA present at a specific moment in time; therefore, it is sensitive to differential gene expression. To perform RNA-seq, RNA is extracted from the sample, converted to DNA, and processed for sequencing according to standard DNA sequencing procedures. To ensure that low levels of disease-related splice variants are easily detected, particularly when multiplexing different samples, probes specific to the transcripts under investigation can be used to enrich for those transcripts and facilitate their detection (Figure 4). Additionally, if different protein isoforms are detected in a proportion of samples, the remaining samples can be screened more easily for alternatively spliced transcripts than for the different proteins by western blot.
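One way to picture a junction-spanning probe (Figure 4A) is as the end of one exon joined to the start of the next. The sketch below assumes the two exon sequences are already known and splits the probe evenly across the junction; the function name, the even split, and the probe length are illustrative assumptions, not a description of how any commercial probe is designed.

```python
def junction_probe(upstream_exon: str, downstream_exon: str, probe_len: int) -> str:
    """Build a probe sequence centered on the junction between two exons."""
    if probe_len < 2:
        raise ValueError("probe_len must be at least 2 to span a junction")
    upstream_part = probe_len // 2
    downstream_part = probe_len - upstream_part
    if len(upstream_exon) < upstream_part or len(downstream_exon) < downstream_part:
        raise ValueError("exons are too short for the requested probe length")
    # Last bases of the upstream exon followed by first bases of the downstream exon.
    return upstream_exon[-upstream_part:] + downstream_exon[:downstream_part]


if __name__ == "__main__":
    exon_a = "AAAAACCCCC"   # made-up exon sequences
    exon_b = "GGGGGTTTTT"
    print(junction_probe(exon_a, exon_b, probe_len=10))
```

Because the probe carries sequence from both exons, it matches in full only transcripts in which those two exons are directly joined.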
Figure 4. Multiple methods for examining alternative splicing. (A) To capture splice variants, probes can be designed to span exon junctions, comprising sequence information from both exons of interest. (B) Unknown regions that have been incorporated into a transcript can be identified by designing probes that target the known transcript. Subsequent sequencing will extend the read into the unknown region, helping to identify the source. (C) Recombination can be identified by designing probes that target different exons and examining the relative positions of the sequence reads.
Hybrid capture improves flexibility
Targeted NGS using hybrid capture offers the flexibility to combine applications (i.e., using the same probe set to answer different questions simultaneously) and to easily increase the number of targeted sites. For example, if relevant SNP data for a new genomic region becomes available, additional probes can be ordered to detect these regions. These probes can also be added to the existing capture panel for use in subsequent genotyping experiments. Generally, adding probes does not diminish the performance of the existing probes, and it allows researchers to extract more data per sequencing experiment. This flexibility enables researchers to respond quickly to the rapid rate of NGS publications, reducing the wait time for making new discoveries.
Can your workflow benefit from target capture?
Although target capture has many applications, it may not be the most appropriate solution for all researchers. For example, the percentage of sequence reads obtained for a given region of interest increases with increasing probe number. Larger probe sets are usually more efficient and cost-effective because users can make greater use of their sequencing capacity. Smaller target capture panels are useful for retroviral and transposon applications; however, researchers should be aware that results may vary with panel size and application (e.g., repeat regions may result in higher on-target rates because they exist in higher numbers within the genome), and must be able to interpret them. Some experiments are better served by other methods; e.g., identification of a single target in thousands of samples may be best achieved through PCR-based methods rather than in-solution hybridization for target enrichment. xGen® Lockdown® Probes are fully customizable probe sets that can be used to create capture panels of any size (see the Product focus sidebar below). If you have questions about your application and how to fit hybrid capture with Lockdown Probes into your workflow, contact us at email@example.com.
Product focus: xGen® Lockdown® Probes
xGen® Lockdown® Probes are individually synthesized probes for target enrichment by hybrid capture. They have been specifically developed for next generation sequencing. xGen Lockdown Probes can be used alone to create custom panels that can be optimized, expanded, and combined with other panels as necessary. They can also be used to supplement existing capture panels to rescue poorly represented regions, such as areas of high GC content. Find out more about xGen Lockdown Probes.
xGen Acute Myeloid Leukemia Cancer Panel
The xGen Acute Myeloid Leukemia Cancer Panel comprises 11,743 xGen Lockdown Probes that target over 260 genes implicated in acute myeloid leukemia (AML). This panel was developed by the Genome Institute at Washington University in St Louis (St Louis, MO, USA) in collaboration with The Cancer Genome Atlas (TCGA) initiative, and is based on genes identified in their previous study [5]. Researchers can use the AML panel as is to study this subset of genes. Alternatively, the panel can be customized by supplementing it with additional probes. Find out more about the xGen Acute Myeloid Leukemia Cancer Panel.
1. Hayden EC. (2013) Gene sequencing leaves the laboratory. Nature 494(7437):290–291.
2. Ambrosi A, Cattoglio C, Di Serio C. (2008) Retroviral integration process in the human genome: is it really non-random? A new statistical approach. PLoS Comput Biol 4(8):e1000144.
3. Depledge DP, Palser AL, et al. (2011) Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS ONE 6(11):e27805.
4. Matlin AJ, Clark F, Smith CW. (2005) Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol 6(5):386–398.
5. Cancer Genome Atlas Research Network. (2013) Genomic and epigenomic landscapes of adult _de novo_ acute myeloid leukemia. N Engl J Med 368(22):2059–2074.
Author: Rami Zahr, MS, is an NGS Field Application Specialist at IDT.