Tips for designing spike-in controls for next generation sequencing analysis

Every experiment needs controls to show that its results can be trusted. You can track the performance of your next generation sequencing experiments using synthetic DNA fragments. Read about the applications of gBlocks Gene Fragments as sequencing controls.

Background

Next generation sequencing (NGS) has found wide applicability, including determining and characterizing unknown sequences, detecting changes and variability in known sequences, and quantifying gene expression. One of the greatest advantages of NGS is the ability to sequence and analyze large numbers of samples simultaneously.

During library prep, the DNA extracted from each sample is first fragmented, either mechanically or enzymatically, and then ligated with known index sequences. Several sample libraries, each tagged with unique index sequences, can then be pooled or multiplexed for sequencing. As many as 20,000 indexes can be designed for high-accuracy multiplexing [1].

Yet sample processing on a high-throughput scale presents risks of cross-contamination and sample swapping, which result in misassigned reads and data analysis errors [2, 3]. The use of plasmids and synthetic oligonucleotide fragments as spike-in controls has been recommended for NGS applications, both to identify swaps and contamination and to measure and validate assay parameters [2–6].

gBlocks Gene Fragments

gBlocks Gene Fragments are linear, double-stranded DNA fragments that can be custom-designed as spike-in controls for NGS. These controls, subject to the same experimental conditions as the target sequence, can be used in sample tracking, measuring assay parameters, quality control analysis, and validation.

The sequence of gBlocks fragments, which are 125–3000 bp in length, can be designed to be completely non-homologous to the target DNA sequence or resemble the target in terms of library insert length and GC content. These controls can also contain specific sequences such as primer binding sites or other regions of interest present within the target sequence.

When used as spike-in controls, gBlocks fragments can be resuspended to a specific copy number (or a known concentration) and added to samples before DNA extraction, after fragmentation, or after library prep. It is essential to measure the concentration or copy number of the control added to each sample, so that reads from the gBlocks fragments do not overwhelm the target sequencing data.
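
As a rough illustration of the copy number arithmetic, the sketch below converts a mass of double-stranded DNA into an estimated number of molecules. It assumes an average mass of about 650 g/mol per base pair, which is a common approximation and not a value taken from this article; a sequence-specific molecular weight will be more accurate where one is available. The function names are hypothetical.

```python
AVOGADRO = 6.022e23             # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0   # ASSUMPTION: average mass of one dsDNA base pair


def copies_from_mass(mass_ng, length_bp):
    """Estimate how many dsDNA molecules are in mass_ng nanograms."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    moles = (mass_ng * 1e-9) / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


def volume_for_copies(target_copies, stock_copies_per_ul):
    """Volume of stock (in microliters) that contains target_copies molecules."""
    if stock_copies_per_ul <= 0:
        raise ValueError("stock_copies_per_ul must be positive")
    return target_copies / stock_copies_per_ul


if __name__ == "__main__":
    # Example: 1 ng of a 500 bp control fragment
    print(f"{copies_from_mass(1.0, 500):.3e} copies")
```

Under these assumptions, 1 ng of a 500 bp fragment corresponds to roughly 1.9 × 10^9 copies, so stocks usually need serial dilution before they are spiked into a sample.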

This article discusses some ways that you can incorporate gBlocks controls in your whole-genome and targeted sequencing applications.

Sample tracking

Cross-contamination or sample swaps during processing and library prep could result in misassigned reads [2–6]. To mitigate this risk, a unique gBlocks control fragment can be spiked into each sample and used to track the presence of its associated target DNA sequence during data analysis to identify any cross-contamination events.

Controls for sample tracking can be added before DNA fragmentation, so that the gBlocks fragment is subject to the same fragmentation conditions as the target DNA. They can also be added after fragmentation, as long as the gBlocks lengths are similar to the mean fragment size. Alternatively, gBlocks controls can be spiked in after library prep (but before pooling) by designing indexes and flow-cell attachment sites into the gBlocks sequences themselves.
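
One way to use these controls during data analysis is sketched below: for each sample, count the reads assigned to every control sequence and compare them with the control that was expected in that sample. The function name, the status labels, and the 1% threshold for foreign control reads are illustrative assumptions, not recommendations from this article; a suitable threshold would need to be validated for each assay.

```python
def check_sample(control_counts, expected_control, max_foreign_fraction=0.01):
    """Classify one sample from its per-control read counts.

    control_counts: dict mapping control ID -> number of reads in this sample.
    expected_control: ID of the control that was spiked into this sample.
    max_foreign_fraction: ASSUMED tolerance for reads from other controls.
    """
    total = sum(control_counts.values())
    if total == 0:
        return "no_control_reads"

    # If another control dominates, the sample was probably swapped.
    dominant = max(control_counts, key=control_counts.get)
    if dominant != expected_control:
        return "possible_swap"

    # Otherwise look for a minority of reads from other samples' controls.
    foreign = total - control_counts.get(expected_control, 0)
    if foreign / total > max_foreign_fraction:
        return "possible_contamination"
    return "ok"


if __name__ == "__main__":
    print(check_sample({"C1": 990, "C2": 2}, "C1"))
    print(check_sample({"C1": 900, "C2": 100}, "C1"))
    print(check_sample({"C1": 5, "C2": 800}, "C1"))
```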

Design considerations:

  • gBlocks controls should bear no homology to the target sequence.
  • When using multiple gBlocks controls to track several samples, ensure that there is no homology between the control sequences themselves.
  • If spiking in before fragmentation, consider the fragmentation technique (enzymatic or mechanical) when designing a gBlocks control. Sequence complexities, such as GC extremes, could influence enzymatic fragmentation.
  • If samples are derived from cell-free DNA, gBlocks controls can be designed to match target DNA lengths (125–250 bp) without the need for fragmentation.
  • For targeted sequencing using hybridization capture, a probe designed against the gBlocks control sequences should be spiked into the panel.
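
The homology considerations above can be screened crudely by looking for exact k-mers shared between two sequences on either strand. The sketch below is a quick first pass only, under the assumption that a shared 12-mer is worth flagging; it does not replace an alignment-based search against the target genome, and the function names and default k are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a, b, k=12):
    """k-mers of a that also occur in b or in the reverse complement of b."""
    return kmers(a, k) & (kmers(b, k) | kmers(revcomp(b), k))


def screen_controls(controls, k=12):
    """Return pairs of control IDs that share at least one k-mer."""
    ids = sorted(controls)
    return [
        (x, y)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if shared_kmers(controls[x], controls[y], k)
    ]
```

Calling `screen_controls` on a dictionary of control ID to sequence returns the pairs that need redesign; an empty list means no exact k-mer overlap was found at the chosen k.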

SNP detection in amplicon sequencing

Sample tracking can also be used when detecting single nucleotide polymorphisms (SNPs) by amplicon sequencing techniques, such as the rhAmpSeq system. Here, gBlocks controls can be designed to encode the region flanking the SNP, while a known barcode or mutant sequence is incorporated in place of the SNP as a marker. This barcode or mutant sequence differentiates control reads from those of the target.

Design considerations:

  • Primer binding sites for amplifying the SNP region can be represented in the gBlocks fragments such that the same set of primers can be used to amplify both the target and the control.
  • The length and GC profile of gBlocks controls should resemble those of the target amplicon sequences.
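
To illustrate how a marker at the SNP position can separate control reads from target reads, the sketch below locates the flanking sequences in a read and inspects whatever lies between them. It assumes exact matches to both flanks and makes no allowance for sequencing errors, so it is a simplification of what a real pipeline would do; the function name and the example sequences are hypothetical.

```python
def classify_read(read, left_flank, right_flank, control_marker, target_alleles):
    """Label a read as 'control', 'target', or 'unassigned'.

    The bases between left_flank and right_flank are compared with the
    control marker (barcode or mutant sequence) and the known target alleles.
    """
    start = read.find(left_flank)
    if start == -1:
        return "unassigned"
    site = start + len(left_flank)

    end = read.find(right_flank, site)
    if end == -1:
        return "unassigned"

    allele = read[site:end]
    if allele == control_marker:
        return "control"
    if allele in target_alleles:
        return "target"
    return "unassigned"


if __name__ == "__main__":
    left, right = "ACGTAC", "TTGCA"
    print(classify_read("CCACGTACATTGCAGG", left, right, "GATC", {"A", "G"}))
    print(classify_read("CCACGTACGATCTTGCAGG", left, right, "GATC", {"A", "G"}))
```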

Hybridization capture

For hybridization capture-based targeted sequencing approaches, such as those employing xGen Lockdown Probe Pools, gBlocks controls can be used to test capture efficiency and detect any capture bias. These synthetic controls, which contain the sequence of interest, can be spiked into samples before library prep at known concentrations or copy numbers and quantified post-capture by qPCR to measure relative capture efficiencies for target vs. control sequences. Controls can also be designed to represent different experimental conditions (e.g., a range of GC content: 20%, 40%, 60%, and 80%) to measure capture efficiencies across that range.

Design considerations:

  • If the control is sufficiently homologous to the target sequence with only a few mutations, then a separate probe against the control need not be spiked into the panel/probe pool.
  • If the controls represent different experimental conditions and do not resemble the target sequence, then probes complementary to the controls must be spiked into the panel/probe pool.
  • The gBlocks controls spiked into the sample, and the complementary probes spiked into the panel, should not affect the capture of the target sequences.
  • Ensure that the relative copy numbers of the controls and their complementary probes are in the same range as the targets and their complementary probes.
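
The relative capture efficiency described in this section can be expressed as a simple ratio, sketched below. The sketch assumes that qPCR results have already been converted to absolute copy numbers before and after capture (for example, with a standard curve); working directly from Cq values would require additional assumptions about amplification efficiency. The function names are hypothetical.

```python
def capture_efficiency(copies_in, copies_out):
    """Fraction of input copies recovered after capture."""
    if copies_in <= 0:
        raise ValueError("copies_in must be positive")
    return copies_out / copies_in


def relative_efficiency(control_in, control_out, target_in, target_out):
    """Capture efficiency of the control relative to the target.

    A value near 1 suggests the control and target are captured similarly;
    values well below or above 1 suggest a capture bias.
    """
    target_eff = capture_efficiency(target_in, target_out)
    if target_eff == 0:
        raise ValueError("target was not recovered; ratio is undefined")
    return capture_efficiency(control_in, control_out) / target_eff


if __name__ == "__main__":
    # Hypothetical copy numbers for controls spanning a GC range
    gc_controls = {20: (1e6, 2.0e5), 40: (1e6, 4.5e5), 60: (1e6, 5.0e5), 80: (1e6, 1.5e5)}
    for gc, (copies_in, copies_out) in sorted(gc_controls.items()):
        print(f"GC {gc}%: efficiency {capture_efficiency(copies_in, copies_out):.2f}")
```

The copy numbers in the usage example are invented for illustration and are not measured data.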

Amplicon sequencing

Controls used in amplicon sequencing can indicate differences in amplification rates across targets of different lengths and GC content. Each gBlocks control is spiked in at a known copy number and amplifies simultaneously with the target sequence. Controls can also be quantified to normalize inputs across multiple targets before sequencing.

Design considerations:

  • The gBlocks control sequence should bear no homology to the target amplicon sequence.
  • Primer binding sites in the target sequence should be included in the associated gBlocks sequence.
  • Control sequences should be about the same length and GC content as target amplicon sequences.
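
A candidate control can be checked against its target amplicon for length and GC content before ordering. The sketch below is one simple way to do that; the tolerances (10% of target length, 5 percentage points of GC) are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_matched(control, target, max_len_diff=0.10, max_gc_diff=0.05):
    """True if control is close to target in both length and GC content.

    max_len_diff: ASSUMED tolerance, as a fraction of the target length.
    max_gc_diff: ASSUMED tolerance, as an absolute GC fraction.
    """
    len_diff = abs(len(control) - len(target)) / len(target)
    gc_diff = abs(gc_content(control) - gc_content(target))
    return len_diff <= max_len_diff and gc_diff <= max_gc_diff


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(is_matched("GGCCAATT", "GCGCATAT"))
```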

Library preparation

Controls can also be used as a quick check of the rate of errors that may be introduced during library amplification. Because the sequence of a gBlocks control is known, errors or mismatches in control reads can be counted directly and compared to potential errors introduced in target reads. Here, the control sequence should not be homologous to the target, but it should be representative in length and GC content.
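
A minimal sketch of such an error check is shown below. It assumes the control reads have already been aligned to the known control sequence without gaps, so that each read is described by its start offset and bases; insertions and deletions are not handled. The observed rate will also include sequencing errors, so it is an upper bound on amplification errors and not a direct measurement. The function name is hypothetical.

```python
def mismatch_rate(reference, aligned_reads):
    """Per-base mismatch rate of reads against a known control sequence.

    aligned_reads: iterable of (offset, read_sequence) pairs, where offset is
    the 0-based position of the read's first base on the reference.
    Bases called as 'N' are skipped.
    """
    mismatches = 0
    bases = 0
    for offset, read in aligned_reads:
        ref_segment = reference[offset:offset + len(read)]
        for ref_base, read_base in zip(ref_segment, read):
            if read_base == "N":
                continue
            bases += 1
            if ref_base != read_base:
                mismatches += 1
    return mismatches / bases if bases else 0.0


if __name__ == "__main__":
    control = "ACGTACGTAC"
    reads = [(0, "ACGT"), (4, "ACGA"), (6, "GTNC")]
    print(f"{mismatch_rate(control, reads):.4f}")
```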

In addition, controls can also help in assessing adapter ligation efficiencies by qPCR prior to sequencing. The gBlocks controls can be designed to be in the same size range as the mean fragment size of the target DNA post-fragmentation. Control fragments can be spiked into the fragmented target DNA sample before the end-repair and A-tailing steps of library prep, for use with TA-ligation adapters such as TruSeq™-Compatible Full-length Adapters.

From platform adapters to our portfolio of xGen products for targeted sequencing, IDT delivers innovative, high-performing NGS products to enable you to increase your discovery power. Consider using gBlocks Gene Fragments for sample tracking, quality checks, or as controls to test different experimental conditions in your next NGS assay. Learn more about customizing the design of gBlocks fragments and calculating copy number by visiting the product page.

References

  1. Costea PI, Lundeberg J, et al. (2013) TagGD: fast and accurate software for DNA tag generation and demultiplexing. PLoS One 8(3):e57521.
  2. Tourlousse DM, Ohashi A, et al. (2018) Sample tracking in microbiome community profiling assays using synthetic 16S rRNA gene spike-in controls. Sci Rep. 8(1):9095.
  3. Quail MA, Smith M, et al. (2014) SASI-Seq: sample assurance spike-ins, and highly differentiating 384 barcoding for Illumina sequencing. BMC Genomics 15:10.
  4. Jennings LJ, Arcila ME, et al. (2017) Guidelines for validation of next-generation sequencing-based oncology panels: a joint consensus recommendation of the Association for Molecular Pathology and College of American Pathologists. J Mol Diagn. 19(3):341–365.
  5. Sims DJ, Harrington RD, et al. (2016) Plasmid-based materials as multiplex quality controls and calibrators for clinical next-generation sequencing assays. J Mol Diagn. 18(3):336–349.
  6. Kim J, Park WY, et al. (2017) Good laboratory standards for clinical next-generation sequencing cancer panel tests. J Pathol Transl Med. 51(3):191–204.

Published Aug 5, 2019

TruSeq is a registered trademark of Illumina, Inc., used with permission. All rights reserved.