In talking with scientists who regularly use target capture for next generation sequencing (NGS) about the performance of their current or desired enrichment panel, I have noticed how widely the number and interpretation of metrics considered important can vary. To help reduce confusion from the vast number of metrics circulating in the NGS field, I have decided to discuss those that scientists at IDT consider important for our analysis of data from short-read sequencers. IDT uses the following measurements to evaluate the performance of an enrichment panel:
- Unique vs. duplicate reads
- Percentage of reads mapping on-target
- Coverage depth
- Uniformity of coverage
Unique vs. duplicate reads
For sequencing that uses hybridization capture, duplicate reads (sequenced DNA fragments), especially in paired-end sequencing, are assumed to be the result of reading 2 or more PCR copies of the same original DNA fragment. When sequencing randomly fragmented, PCR-amplified DNA, some amount of duplication is unavoidable. The goal is to have sufficient diversity persisting within the library even after enrichment so that random sampling of reads will rarely detect the same fragment multiple times. Most sequencing analysis pipelines remove PCR duplicates; therefore, using protocols that maintain a low frequency of duplicate DNA fragments results in a greater amount of usable sequencing data at the end. However, for applications requiring higher sensitivity—such as rare allele detection—the unique vs. duplicate reads metric is important for monitoring diversity and for more accurate measurement of copy number variation.
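As a rough illustration of the unique vs. duplicate tally described above, the following minimal sketch counts fragments that share identical alignment coordinates. The tuple layout `(chrom, start, end, strand)` and the function name `duplicate_stats` are assumptions made for this example; production pipelines rely on dedicated duplicate-marking tools that also account for details such as soft clipping and read pairing.

```python
from collections import Counter

def duplicate_stats(fragments):
    """Return (unique_count, duplicate_count, duplicate_rate).

    Each fragment is a hypothetical (chrom, start, end, strand) tuple.
    Fragments sharing all four values are treated as PCR copies of the
    same original molecule.
    """
    counts = Counter(fragments)
    total = sum(counts.values())
    unique = len(counts)           # one representative kept per coordinate set
    duplicates = total - unique    # every extra copy counts as a duplicate
    rate = duplicates / total if total else 0.0
    return unique, duplicates, rate

# Made-up fragments for illustration only.
fragments = [
    ("chr1", 100, 350, "+"),
    ("chr1", 100, 350, "+"),   # same coordinates as the first fragment
    ("chr1", 120, 360, "+"),
    ("chr2", 500, 740, "-"),
]
unique, duplicates, rate = duplicate_stats(fragments)
print(f"{unique} unique, {duplicates} duplicate, {rate:.0%} duplicate rate")
```

With these four made-up fragments, one of the four is a repeat, so the duplicate rate is 25%.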
The figure above demonstrates protocol optimization which led to a 3-fold reduction in “duplicate” reads. This optimization was primarily accomplished by controlling PCR parameters before and after targeted enrichment. The xGen® AML Panel was used for duplicate optimization.
Percentage of reads mapping on-target
The measurement of on-target bases or reads is typically represented as the ratio of number of bases within a target region to total number of bases output by the sequencer, expressed as a percentage. Usually we calculate these values after duplicate reads are removed from the read pool.
Method #1: % On-target (by reads) = (On-target reads / Total aligned reads) × 100
Method #2: % On-target (by bases) = (On-target bases / Total aligned bases) × 100
A base within a read is considered on target if it is aligned with a targeted region. A read is considered on target if a single base within a read aligns to a targeted region. In the example above, we would say that this particular result was 75% on-target if we calculate by reads (reads 1, 2, and 4 are on target; read 3 is not), but approximately 50% on-target if we calculate by bases (only half of the bases within the reads are aligned with a targeted region). IDT measures reads on-target because that more accurately depicts reliable pull down of target fragments regardless of variables such as shear size.
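The difference between the two methods can be sketched in a few lines. The coordinates below are hypothetical, chosen so that three of four reads touch the target while only half of the sequenced bases fall inside it. The sketch assumes half-open `(start, end)` intervals on a single chromosome and targets that do not overlap one another.

```python
def overlap(read, target):
    """Number of bases shared by two half-open (start, end) intervals."""
    return max(0, min(read[1], target[1]) - max(read[0], target[0]))

def on_target_fractions(reads, targets):
    """Return (fraction on-target by reads, fraction on-target by bases).

    Assumes all intervals lie on one chromosome and that the targets
    do not overlap each other.
    """
    on_reads = on_bases = total_bases = 0
    for read in reads:
        covered = sum(overlap(read, t) for t in targets)
        on_bases += covered
        total_bases += read[1] - read[0]
        if covered > 0:            # a single aligned base makes the read on-target
            on_reads += 1
    return on_reads / len(reads), on_bases / total_bases

# Hypothetical layout: one 100-base target and four 100-base reads.
targets = [(100, 200)]
reads = [(50, 150), (100, 200), (300, 400), (150, 250)]
by_reads, by_bases = on_target_fractions(reads, targets)
print(f"by reads: {by_reads:.0%}, by bases: {by_bases:.0%}")
```

Here reads 1, 2, and 4 each touch the target, giving 75% by reads, while only 200 of the 400 sequenced bases lie inside it, giving 50% by bases.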
This figure demonstrates a ~50% increase in on-target reads achieved through protocol optimization. This optimization was primarily accomplished by adjusting the temperatures used during hybridization and wash steps. On-target refers to “On-target reads” as defined in the earlier section. The flank includes the target region +100 bases. Note that the amount of a given read mapping in the region flanking a target will change with shear size. Larger shear sizes correspond to more reads mapping in the flank region.
Coverage depth
Coverage represents the number of times a sequenced DNA fragment (i.e., a read) maps to a genomic target. The deeper the coverage of a target region (i.e., the more times the region is sequenced), the greater the reliability and sensitivity of the sequencing assay. Typically, the minimum depth of coverage required for genomic resequencing of diploid organisms, such as human, mouse, or rat, is 20–30X. However, different applications, labs, or bioinformatics groups may require lower or higher minimum coverage depth. I have met researchers who find coverage as low as 1–2X sufficient, while at the other end of the scale, some researchers require 500–1000X coverage of target regions; higher coverage depth allows for higher detection sensitivity of genomic sequence variations. A good method for estimating the required depth of coverage for a particular application is to begin with 20X and divide by the expected allele frequency; e.g., for detecting mutations with 5% (0.05) allele frequency, you would need 400X coverage depth.
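The rule of thumb at the end of that paragraph is simple arithmetic and can be written as a small helper. The function name `required_depth` and its `base_depth` parameter are labels chosen for this sketch; the result is a starting estimate, not a guarantee of detection.

```python
def required_depth(allele_frequency, base_depth=20):
    """Estimate coverage depth as base_depth divided by expected allele frequency.

    allele_frequency is a fraction between 0 (exclusive) and 1 (inclusive),
    e.g. 0.05 for a 5% variant.
    """
    if not 0 < allele_frequency <= 1:
        raise ValueError("allele_frequency must be in the range (0, 1]")
    return base_depth / allele_frequency

for freq in (0.5, 0.05, 0.01):
    print(f"{freq:.0%} allele frequency -> ~{required_depth(freq):.0f}X")
```

For the 5% case from the text, this gives the same 400X figure.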
To assess how well targets are covered, we plot % of Targets > X Coverage on the Y axis against coverage on the X axis. This data has been normalized to 1 million mapped reads, making it easier to calculate and compare the depth of coverage achieved for different platforms and levels of multiplexing. The Illumina MiSeq platform can support up to 30M reads. Protocol v2 clearly demonstrates deeper coverage across a larger range of targets, with ~93% of targets covered at 20X compared to ~86% with Protocol v1.
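One way such a curve could be computed is sketched below: per-target depths are first scaled to 1 million mapped reads, then the percentage of targets at or above each coverage threshold is tallied. The function names and the example numbers are assumptions for illustration, and "covered at 20X" is interpreted here as a normalized depth of at least 20.

```python
def normalize_depth(depths, mapped_reads, per=1_000_000):
    """Scale per-target depths to what would be seen with `per` mapped reads."""
    scale = per / mapped_reads
    return [d * scale for d in depths]

def percent_targets_at_or_above(depths, thresholds):
    """Map each threshold X to the percentage of targets with depth >= X."""
    n = len(depths)
    return {x: 100 * sum(d >= x for d in depths) / n for x in thresholds}

# Made-up mean depths for four targets from a run with 2 million mapped reads.
raw_depths = [10, 40, 60, 90]
normalized = normalize_depth(raw_depths, mapped_reads=2_000_000)
curve = percent_targets_at_or_above(normalized, thresholds=[1, 20, 40])
print(normalized)
print(curve)
```

Plotting the thresholds on the X axis against the resulting percentages on the Y axis gives a curve of the kind described above.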
Uniformity of coverage
Uniformity can be expressed in various ways. IDT uses different methods to calculate coverage uniformity. The primary method, which is applicable to the widest range of applications, is to calculate the proportions of sequences that have greater than 0.2, 0.5, and 1.0 times the mean coverage. We find this method useful for helping researchers to understand the lower coverage limits—certainly, the drawbacks of under-sequencing are greater than those of over-sequencing.
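The primary uniformity calculation can be sketched as follows, assuming one mean depth value per target. The function name and the example depths are hypothetical.

```python
from statistics import mean

def fraction_above_mean_multiples(depths, multiples=(0.2, 0.5, 1.0)):
    """For each multiple m, return the fraction of targets with depth > m * mean."""
    avg = mean(depths)
    n = len(depths)
    return {m: sum(d > m * avg for d in depths) / n for m in multiples}

# Made-up per-target depths with a mean of 60X.
depths = [5, 20, 50, 100, 125]
print(fraction_above_mean_multiples(depths))
```

In this made-up set, the one target at 5X falls below 0.2 times the mean, which is the kind of under-covered region this metric is meant to expose.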
The other methods used at IDT for calculating uniformity of coverage are more useful for assessment of copy number variation (CNV). One method is to calculate the coefficient of variation (CV), which is the standard deviation divided by the mean. Lower numbers indicate better uniformity. This can be made more granular by calculating CV for targets grouped by GC content. We typically observe wider distributions at the extreme ends of the GC spectrum.
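A minimal sketch of the CV calculation, overall and grouped by GC content, is shown below. The choice of population standard deviation, the 10-percentage-point bin width, and the `(gc_percent, depth)` input layout are all assumptions for this example; the text does not specify them.

```python
from collections import defaultdict
from statistics import mean, pstdev

def coefficient_of_variation(depths):
    """Standard deviation divided by the mean (population SD assumed here)."""
    return pstdev(depths) / mean(depths)

def cv_by_gc_bin(targets, bin_width=10):
    """Group (gc_percent, depth) pairs into GC bins and return the CV per bin.

    Keys are the lower edge of each bin, e.g. 20 for targets with 20-29% GC.
    """
    bins = defaultdict(list)
    for gc_percent, depth in targets:
        bins[int(gc_percent // bin_width) * bin_width].append(depth)
    return {edge: coefficient_of_variation(d) for edge, d in sorted(bins.items())}

# Made-up targets: (GC percent, mean depth).
targets = [(25, 40), (28, 120), (45, 95), (48, 105), (72, 50), (75, 150)]
print(round(coefficient_of_variation([d for _, d in targets]), 3))
print(cv_by_gc_bin(targets))
```

In this invented data set the low-GC and high-GC bins show a larger CV than the mid-GC bin, mirroring the wider distributions described for the extremes of the GC spectrum.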
This figure shows uniformity statistics between the first and second versions of the xGen® Rapid Capture Protocol. Although protocol v1 provides slightly higher uniformity (which may be more important for CNV applications), protocol v2 compensates by providing deeper overall coverage.
Product focus: Target capture reagents
xGen® Lockdown® Probes—target capture probe pools for NGS
xGen Lockdown Probes are pools of individually synthesized, quality controlled, and normalized hybridization probes. Use them to generate custom capture panels for targeted sequencing or to enhance the performance of existing panels. xGen Predesigned Gene Capture Probe Pools are available for any human RefSeq coding gene. Select from predesigned and custom probes that offer:
- Sensitive detection of SNPs, indels, CNV, LOH, and translocations
- GMP compliance for clinical and diagnostics research
- Flexibility to augment existing panels or create completely custom panels
- Quick delivery
Discover more about xGen Lockdown Probes.
xGen Lockdown Panels
xGen Lockdown Panels are preconfigured, validated, and stocked pools of xGen Lockdown Probes for targeted next generation sequencing of defined gene families:
- xGen Exome Research Panel
- xGen Acute Myeloid Leukemia Panel
- xGen Pan-Cancer Panel
- xGen Inherited Diseases Panel
- xGen Human ID Research Panel
- xGen Human mtDNA Research Panel
Discover more about xGen Lockdown Panels.
xGen Lockdown Reagents—hybridization and wash kit
xGen Lockdown Reagents have been optimized to deliver deep, even coverage of targets captured using xGen Lockdown Probes and Panels. Achieve uniform coverage with hybridization and wash buffers that are optimized for target enrichment using xGen Lockdown Probes and Panels. A short, 4-hour hybridization protocol generates results quickly.
Discover more about xGen Lockdown Reagents.
xGen Blocking Oligos
xGen Universal Blocking Oligos for single- or dual-index adapters used with common sequencing platforms improve on-target performance for multiplexed samples by reducing adapter participation in hybridization enrichment. Custom adapters can be manufactured for other barcodes or to meet the needs of customers who require specific modifications or services to improve performance in unique applications.
Discover more about xGen Blocking Oligos.
Read an overview article about target enrichment:
Target enrichment facilitates focused next generation sequencing—Understand the rationale and benefits of enriching subsets of the genome (target enrichment by hybrid capture) prior to sequencing. Use this strategy for genotyping, identifying splice variants and indels, and profiling genomic recombination events as well as viral and transposon integration sites.
Read other articles demonstrating use of target capture for focused NGS:
NGS target capture recommendations for FFPE samples—Webinar review: Learn about NGS library preparation and target capture from formalin-fixed, paraffin-embedded samples.
Advantages of high quality, probe-based gene capture panels—Target enrichment by hybrid capture lets you focus your genomic analysis on specific regions of interest, increasing depth of coverage of targeted sequences and improving the detection of rare genomic events. You can create custom human gene capture panels quickly and cost-effectively using IDT preconfigured pools of probes targeting the coding sequences (CDS) of human protein-coding genes, or with predesigned disease or exome panels.
Target Enrichment Identifies Mutations that Confer Fitness Effects—Researchers at the University of Texas use target enrichment with xGen® Lockdown® Probes and NGS to track frequency of mutations in evolving bacterial populations over a given time course and to gauge their importance based on their fitness effect.
Next Generation Sequencing in the Clinic: A Perspective from Dr Elaine Mardis, and Current Research at The Genome Institute—A discussion about the future of NGS in the clinic, and how the Genome Institute is using xGen® Lockdown® Probes for Targeted Sequencing.
Review other DECODED Online newsletter articles on NGS applications.
You can also browse our DECODED Online newsletter for additional application reviews, lab tips, and citation summaries to facilitate your research.
Author: Ibrahim Jivanjee is the product manager for NGS at IDT.
© 2014, 2015, 2017 Integrated DNA Technologies. All rights reserved. Trademarks contained herein are the property of Integrated DNA Technologies, Inc. or their respective owners. For specific trademark and licensing information, see www.idtdna.com/trademarks.