Next-generation sequencing (NGS) is widely applied, including to determine and characterize unknown sequences, detect changes and variability in known sequences, and quantify gene expression. One of its greatest advantages is the ability to sequence and analyze large numbers of samples simultaneously.
During library prep, the DNA extracted from each sample is first fragmented, either mechanically or enzymatically, and then ligated to known index sequences. Several sample libraries, each tagged with unique index sequences, can then be pooled (multiplexed) for sequencing. As many as 20,000 indexes have been designed for high-accuracy multiplexing.
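To illustrate how pooled reads are traced back to their samples, the Python sketch below assigns each read to a sample by comparing its observed index with the expected indexes, tolerating a small number of mismatches. The index sequences, sample names, and the one-mismatch tolerance are hypothetical. Demultiplexing software applies its own rules for this step.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed_index: str, index_to_sample: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index is within max_mismatches, else None.

    Reads matching no index, or more than one index, are left unassigned
    rather than guessed, because a wrong guess becomes a misassigned read.
    """
    hits = [sample for index, sample in index_to_sample.items()
            if hamming(observed_index, index) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8-nt indexes for two pooled libraries.
INDEXES = {"ACGTACGT": "sample_A", "TGCATGCA": "sample_B"}

print(assign_sample("ACGTACGA", INDEXES))  # one mismatch from sample_A's index
print(assign_sample("GGGGGGGG", INDEXES))  # no index within one mismatch
```

Indexes are designed to differ at many positions, which is what makes a small mismatch tolerance safe. If two indexes were only one or two mismatches apart, a single sequencing error could move a read from one sample to another.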
Yet, sample processing on a high-throughput scale carries cross-contamination and sample-swapping risks that can result in data analysis errors and misassigned reads [2, 3]. Plasmids and synthetic oligonucleotide fragments are recommended as spike-in controls for NGS applications to identify swaps and contamination, and to measure and validate assay parameters [2–7].
gBlocks Gene Fragments
gBlocks Gene Fragments are linear, double-stranded DNA fragments that can be custom-designed as spike-in controls for NGS. These in-process controls—subject to the same experimental conditions as the sample—can be used to (1) measure and quantify technical bias, (2) track samples, and (3) measure assay parameters for quantification, quality control analysis, and validation (Figure 1).
gBlocks fragments are 125–3000 bp in length. Their sequence can be designed to be completely non-homologous to the target DNA sequence, to resemble the target in library insert length and GC content, or both. These controls can also contain specific sequences, such as primer binding sites or other regions of interest present within the target sequence.
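A quick in-silico screen can catch obvious design problems before ordering a control. The sketch below compares GC content and counts k-mers that a candidate control shares with either strand of the target. The choice of k = 20 and the example sequences are assumptions for illustration, not vendor design rules. A full homology search, for example with BLAST, is more thorough.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (other characters are kept)."""
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def kmer_set(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of a sequence (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(control: str, target: str, k: int = 20) -> int:
    """Number of distinct control k-mers found on either strand of the target."""
    target_kmers = kmer_set(target, k) | kmer_set(reverse_complement(target), k)
    return len(kmer_set(control, k) & target_kmers)


# Hypothetical sequences, shortened for readability.
target = "ATGGCGTACGTTAGCCGATCGATCGGCTAAGCTTACGGATCCA"
control = "TTGACCAGTCAGGTACTCGATGCAACGTGTCAGATCGTTGCAT"
print(f"GC content: control {gc_fraction(control):.2f}, target {gc_fraction(target):.2f}")
print("shared 20-mers:", shared_kmers(control, target))
```

For a non-homologous control, the shared k-mer count should be zero. A control meant to resemble the target would also show a GC fraction close to the target's.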
When used as spike-in controls, gBlocks fragments are resuspended to a specific copy number (or known concentration), then added to samples before DNA extraction, after fragmentation, or after library prep. It is essential to know the concentration (or copy number) of the control added to each sample to make sure that the gBlocks fragments do not overwhelm the target sequencing data.
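Converting a mass concentration into copies is simple arithmetic. The sketch below assumes an average of about 650 g/mol per double-stranded base pair, which is a common approximation; some calculators use 660 g/mol or a sequence-specific molecular weight. The fragment length and concentration in the example are hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
G_PER_MOL_PER_BP = 650.0   # approximate mass of one dsDNA base pair (assumption)


def copies_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Copies per microliter of a double-stranded fragment at a given ng/uL."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (length_bp * G_PER_MOL_PER_BP)
    return moles_per_ul * AVOGADRO


def volume_for_copies(copies_wanted: float, conc_ng_per_ul: float, length_bp: int) -> float:
    """Microliters of stock needed to deliver a chosen number of copies."""
    return copies_wanted / copies_per_ul(conc_ng_per_ul, length_bp)


def spike_fraction(control_copies: float, target_copies: float) -> float:
    """Control copies as a fraction of all template copies in the sample."""
    return control_copies / (control_copies + target_copies)


stock = copies_per_ul(10.0, 500)   # hypothetical 500 bp fragment at 10 ng/uL
print(f"{stock:.3g} copies/uL")    # about 1.85e+10
```

Delivering 10,000 copies directly from this stock would require roughly 5.4e-7 µL, which is why a dilution series is needed before spiking in. `spike_fraction` gives a quick check that the control remains a small share of the total template.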
Below are some ways you can incorporate gBlocks controls into your whole-genome and targeted sequencing applications.
Cross-contamination or sample swaps during processing and library prep can result in misassigned reads [2–6]. To mitigate this risk, a unique gBlocks control fragment is spiked into each sample and used during data analysis to track the presence of its associated target DNA, so that any sample swap is readily identified.
Sample-tracking controls can be added either before DNA fragmentation (so that the gBlocks fragment is subject to the same fragmentation conditions as the target DNA) or after fragmentation, provided that the gBlocks fragments are similar in length to the mean fragment size.
- gBlocks controls should bear no homology to the target sequence.
- When using multiple gBlocks controls to track several samples, make sure that there is no homology between the control sequences themselves.
- If spiking in before fragmentation, consider the fragmentation technique (enzymatic or mechanical) when designing a gBlocks control. Sequence complexities such as GC extremes can influence enzymatic fragmentation.
- If samples are derived from cell-free DNA, gBlocks controls can be designed to match target DNA lengths (125–250 bp) without the need for fragmentation.
- For targeted sequencing using hybridization capture, a probe designed against the gBlocks control sequences should be spiked into the panel.
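Once reads are assigned, the presence of each control can be tallied per sample. The sketch below flags two conditions. The first is a sample whose most abundant control is not the one that was spiked in, which suggests a possible swap. The second is a sample whose expected control makes up too small a share of control reads, which suggests possible cross-contamination. The sample and control names, read counts, and 90% threshold are hypothetical. An appropriate threshold depends on the assay.

```python
def check_tracking(control_counts: dict[str, dict[str, int]],
                   expected: dict[str, str],
                   min_fraction: float = 0.9) -> dict[str, str]:
    """Return {sample: reason} for samples whose control reads look wrong."""
    flags = {}
    for sample, counts in control_counts.items():
        total = sum(counts.values())
        if total == 0:
            flags[sample] = "no control reads"
            continue
        dominant = max(counts, key=counts.get)
        share = counts[dominant] / total
        if dominant != expected[sample]:
            flags[sample] = f"possible swap: dominant control is {dominant}"
        elif share < min_fraction:
            flags[sample] = (f"possible contamination: {dominant} is only "
                             f"{share:.0%} of control reads")
    return flags


# Hypothetical spike-in plan and per-sample read counts for each control.
EXPECTED = {"sample_1": "ctrl_A", "sample_2": "ctrl_B", "sample_3": "ctrl_C"}
COUNTS = {
    "sample_1": {"ctrl_A": 9800, "ctrl_B": 12},
    "sample_2": {"ctrl_C": 5000, "ctrl_B": 30},
    "sample_3": {"ctrl_C": 700, "ctrl_A": 300},
}

for sample, reason in check_tracking(COUNTS, EXPECTED).items():
    print(sample, "->", reason)
```

Keeping the control sequences non-homologous to each other, as recommended above, is what allows each read to be attributed to exactly one control in the first place.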
SNP detection in amplicon sequencing
Sample tracking can also be used when detecting single nucleotide polymorphisms (SNPs) by amplicon sequencing, for example with the rhAmpSeq™ system. In these applications, gBlocks controls can be designed to encode the region flanking the SNP, with a known barcode or mutant sequence incorporated in place of the SNP as a marker. This barcode or mutant sequence distinguishes control reads from experimental reads, making sample tracking more efficient.
- Primer binding sites for amplifying the SNP region can be represented in the gBlocks fragments so that the same primer set amplifies both the target and the control.
- The length and GC profile of gBlocks controls should resemble those of the target amplicon sequences.
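Because the control and target share the same flanking sequence, reads can be sorted by what appears at the SNP position. The sketch below checks the bases immediately after a known left flank. It calls a read a control if the marker appears there, and otherwise records the target allele. The flank and marker sequences are hypothetical, and the approach assumes reads contain the flank exactly. A real pipeline would typically align reads first.

```python
from collections import Counter

LEFT_FLANK = "GATTACAGGT"  # hypothetical sequence just upstream of the SNP
MARKER = "CCTAGG"          # hypothetical barcode replacing the SNP in the control


def classify_read(read: str) -> str:
    """Label a read as 'control', 'target:<allele>', or 'unassigned'."""
    start = read.find(LEFT_FLANK)
    if start == -1:
        return "unassigned"
    snp_pos = start + len(LEFT_FLANK)
    if read.startswith(MARKER, snp_pos):
        return "control"
    if snp_pos < len(read):
        return "target:" + read[snp_pos]
    return "unassigned"  # read ends at the flank, so the SNP is not covered


reads = [
    "AAGATTACAGGTCCTAGGTTTT",  # marker follows the flank
    "AAGATTACAGGTATTTT",       # target read carrying allele A
    "AAGATTACAGGTGTTTT",       # target read carrying allele G
    "CCCCCCCC",                # flank not found
]
print(Counter(classify_read(r) for r in reads))
```

The marker should be chosen so that no target allele followed by its natural downstream sequence can spell it out. Otherwise, target reads could be counted as control reads.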
For hybridization capture-based targeted sequencing approaches, such as those employing xGen™ Lockdown™ Probe Pools, gBlocks controls can be used to test capture efficiency and detect capture bias. These synthetic controls, carrying the sequence of interest, can be spiked into samples at known concentrations (or copy numbers) before library prep and quantified by qPCR after capture to compare capture efficiencies for target vs. control sequences. Controls can also be designed to represent different experimental conditions (e.g., a range of GC content: 20%, 40%, 60%, and 80%) to measure capture efficiencies across that range.
- If the control is highly homologous to the target sequence, differing by only a few mutations, a separate probe against the control does not need to be spiked into the panel/probe pool.
- If the controls represent different experimental conditions and do not resemble the target sequence, then probes complementary to the controls must be spiked into the panel/probe pool.
- The gBlocks controls and complementary probes spiked into the sample and panel, respectively, should not affect capture of the target sequences.
- Make sure that the relative copy numbers of the controls and their complementary probes are in the same range as the targets and their complementary probes.
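One way to turn pre- and post-capture qPCR data into a relative capture efficiency is sketched below, under stated assumptions. The sketch assumes each qPCR assay's quantity scales as an amplification factor raised to the power of minus Ct, with the factor set to 2.0 for 100% PCR efficiency. Under that assumption, the post/pre ratio for one assay does not depend on the assay's own calibration constant. Dilution differences between the pre- and post-capture libraries affect target and control equally, so they cancel in the target/control ratio. All Ct values are hypothetical.

```python
def fold_recovery(ct_pre: float, ct_post: float, amp_factor: float = 2.0) -> float:
    """Post-capture / pre-capture quantity of one sequence, from one qPCR assay.

    Quantity is taken to scale as amp_factor ** -Ct, so the assay-specific
    constant cancels in the ratio. amp_factor = 2.0 assumes 100% efficiency.
    """
    return amp_factor ** (ct_pre - ct_post)


def relative_capture(target_cts: tuple[float, float],
                     control_cts: tuple[float, float],
                     amp_factor: float = 2.0) -> float:
    """Target recovery divided by control recovery; > 1 favors the target."""
    return (fold_recovery(*target_cts, amp_factor)
            / fold_recovery(*control_cts, amp_factor))


# Hypothetical (pre-capture Ct, post-capture Ct) pairs.
target = (28.0, 18.0)
gc_series = {"gc20": (27.5, 19.5), "gc40": (27.0, 18.0),
             "gc60": (27.0, 17.5), "gc80": (27.5, 21.5)}

for name, cts in gc_series.items():
    print(f"target vs {name}: {relative_capture(target, cts):.2f}")
```

If an assay's measured efficiency is well below 100%, passing its actual per-cycle amplification factor (for example, 1.9) gives a more faithful estimate than the default.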
Controls used in amplicon sequencing can reveal differences in amplification rates across targets of different lengths and GC content. Each gBlocks control is spiked in at a known copy number and amplified alongside the target sequence. Controls can also be quantified to normalize inputs across multiple targets before sequencing.
- The gBlocks control sequence should bear no homology to the target amplicon sequence.
- Primer binding sites in the target sequence should be included in the associated gBlocks sequence.
- Control sequences should be about the same length and GC content as target amplicon sequences.
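Because each control goes in at a known copy number, reads per input copy give a direct readout of amplification bias. The sketch below scales this value so that the controls average 1.0. It then uses the factor from a length- and GC-matched control to adjust a target's read count. The control names, counts, and the matched-control pairing are hypothetical. The sketch works from sequencing reads, whereas input normalization before sequencing would use qPCR quantities instead.

```python
def relative_amplification(read_counts: dict[str, int],
                           input_copies: dict[str, float]) -> dict[str, float]:
    """Reads per input copy for each control, scaled so the controls average 1.0.

    Values above 1.0 mark controls that amplified more efficiently than
    average; values below 1.0 mark the opposite.
    """
    per_copy = {name: read_counts[name] / input_copies[name] for name in read_counts}
    mean = sum(per_copy.values()) / len(per_copy)
    return {name: value / mean for name, value in per_copy.items()}


def corrected_reads(target_reads: int, matched_control: str,
                    factors: dict[str, float]) -> float:
    """Divide a target's read count by the bias factor of its matched control."""
    return target_reads / factors[matched_control]


# Hypothetical controls, each spiked in at 10,000 copies.
reads = {"ctrl_150bp_40gc": 30000, "ctrl_300bp_70gc": 10000}
copies = {"ctrl_150bp_40gc": 10000, "ctrl_300bp_70gc": 10000}
factors = relative_amplification(reads, copies)
print(factors)
print(corrected_reads(12000, "ctrl_300bp_70gc", factors))
```

This correction is only as good as the match between each control and its target, which is why the design guidance above asks for similar length and GC content.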
Controls also provide a quick check of the error rate introduced during library amplification. Because the gBlocks control sequence is known, errors or mismatches in control reads can be compared with potential errors in target reads. Here, the control sequence should not be homologous to the target, but it should be representative in length and GC content.
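Since the control's sequence is known exactly, every mismatch in its reads can be counted as an error. The sketch below computes a substitution rate for reads whose position on the control reference is already known. It assumes gapless alignment, so only substitutions are counted and insertions or deletions are ignored. The reference and reads are hypothetical.

```python
def substitution_rate(aligned_reads: list[tuple[str, int]], reference: str) -> float:
    """Mismatches per compared base for reads placed at known reference offsets.

    Assumes gapless alignment (substitutions only); read bases that extend
    past the end of the reference are not compared.
    """
    mismatches = 0
    compared = 0
    for read, offset in aligned_reads:
        for read_base, ref_base in zip(read, reference[offset:offset + len(read)]):
            compared += 1
            mismatches += read_base != ref_base
    return mismatches / compared if compared else 0.0


CONTROL_REF = "ACGTACGTAC"                 # hypothetical control sequence
control_reads = [("ACGT", 0), ("ACTT", 4)]  # second read has one substitution
print(substitution_rate(control_reads, CONTROL_REF))
```

A target error rate close to the control's rate is consistent with mismatches arising from amplification. A clearly higher target rate points to something beyond library amplification.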
Controls can also help assess adapter ligation efficiency by qPCR before sequencing. The gBlocks controls can be designed to fall within the same size range as the mean fragment size of the target DNA after fragmentation. Control fragments can be spiked into the fragmented target DNA sample before the end-repair and A-tailing steps of library prep, for use with TA-ligation adapters such as TruSeq™-Compatible Full-length Adapters.
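The text does not specify the qPCR setup, so the following is one possible design, stated as an assumption. One assay amplifies only adapter-ligated control molecules, for example with one primer in the adapter and one in the control. A second assay uses primers internal to the control and counts all control molecules. Each assay is read through its own standard curve, and their ratio estimates the ligated fraction. The standard-curve parameters and Ct values below are hypothetical.

```python
def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert a qPCR standard curve of the form Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def ligation_efficiency(ligated_ct: float, total_ct: float,
                        ligated_curve: tuple[float, float],
                        total_curve: tuple[float, float]) -> float:
    """Estimated fraction of control molecules that carry adapters.

    ligated_ct: Ct from an assay that amplifies only adapter-ligated control
    total_ct: Ct from an assay internal to the control (ligated or not)
    *_curve: (slope, intercept) of each assay's standard curve
    """
    ligated = quantity_from_ct(ligated_ct, *ligated_curve)
    total = quantity_from_ct(total_ct, *total_curve)
    return ligated / total


# Hypothetical standard curve (slope about -3.32 corresponds to ~100% efficiency).
curve = (-3.32, 38.0)
print(f"{ligation_efficiency(21.4, 18.08, curve, curve):.2f}")
```

Separate curves are kept for the two assays because different primer pairs rarely share the same slope and intercept.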
From platform adapters to our full portfolio of xGen products for targeted sequencing, IDT delivers innovative, high-performing NGS products to enable you to flex your discovery power. Consider using gBlocks Gene Fragments for sample tracking, quality checks, or as controls to test different experimental conditions in your next NGS assay. Learn more about customizing the design of gBlocks fragments and calculating copy number by visiting this product page.