Depending on your sample type and experimental goals, you can use the UMIs for error correction or ignore them altogether. The xGen Prism DNA Library Prep Kit Analysis Guidelines walk you through the recommended analysis pipeline, which uses open-source tools and runs from FASTQ files through variant calling.
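In overview, fixed UMI sequences, such as those used in the xGen Prism DNA Library Prep Kit, enable identification and correction of sequencing or PCR errors, including errors that occur within the UMI sequence itself, because an observed UMI that contains an error can be matched back to the closest sequence in the known UMI set.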
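The idea of correcting errors inside a fixed UMI can be illustrated with a minimal sketch. This is not the pipeline's actual implementation: the `KNOWN_UMIS` list is a hypothetical placeholder (the kit's real UMI sequences and lengths are not given here), and snapping an observed UMI to the single closest known sequence by Hamming distance, while rejecting ambiguous ties, is an assumed approach chosen for illustration.

```python
from typing import Optional, Sequence

# Hypothetical fixed UMI set for illustration only; the real kit's UMI
# sequences and lengths are not listed in this guide.
KNOWN_UMIS = ["ACGTAC", "TTGCAG", "GACTTC", "CAGGTA"]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_umi(observed: str, known: Sequence[str], max_mismatches: int = 1) -> Optional[str]:
    """Snap an observed UMI to the single closest known UMI.

    Returns None when no known UMI is within max_mismatches, or when the
    best match is tied (ambiguous), so the read can be flagged or dropped.
    """
    if not known:
        return None
    # Sort (distance, umi) pairs so the closest known UMI comes first.
    distances = sorted((hamming(observed, umi), umi) for umi in known)
    best_dist, best_umi = distances[0]
    if best_dist > max_mismatches:
        return None
    # Two known UMIs equally close means the correction is ambiguous.
    if len(distances) > 1 and distances[1][0] == best_dist:
        return None
    return best_umi
```

With these placeholder sequences, an observed `ACGTAA` (one mismatch) is corrected to `ACGTAC`, while `GGGGGG` is at least four mismatches from every known UMI and returns `None`. A fixed, well-separated UMI set is what makes this kind of nearest-match correction possible; with random UMIs there is no known set to match against.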
Combined read families analysis: The xGen Prism DNA Library Prep Kit also enables a more stringent method of error correction. During Ligation 1, UMIs are added to both the top and bottom strands by single-stranded ligation, and gap filling during Ligation 2 then adds each UMI to the opposite strand. As a result, both strands can be traced back to the same original molecule. This approach uses the start and stop positions together with a combination of the single read families that originate from the same original molecule (Figure 2D). Again, rather than simply choosing the highest-quality read, this method uses all reads from both single read families to determine the most likely base at each position along the read. The result is a collapsed combined read family that can be used for variant calling and that greatly reduces the chance of false positives. Tools such as GroupReadsByUmi followed by CallDuplexConsensusReads (both from fgbio) can be used for this analysis.
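A simplified sketch of how combined read families could be formed and collapsed is shown here. It assumes reads are already aligned, trimmed to equal length, and in the same orientation, and that each read carries its UMI pair and a top/bottom strand label; the `Read` record and the function names are hypothetical. It uses a plain per-strand majority vote and keeps a base only where both strands agree, which is much simpler than the quality-aware consensus model used by tools such as fgbio's CallDuplexConsensusReads.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Read:
    start: int                 # alignment start of the fragment
    stop: int                  # alignment stop of the fragment
    umi_pair: Tuple[str, str]  # UMIs read from the two adapter ends
    strand: str                # "top" or "bottom" strand of the original molecule
    seq: str                   # bases, assumed aligned, equal length, same orientation


def molecule_key(read: Read) -> Tuple[int, int, Tuple[str, ...]]:
    # Sorting the UMI pair makes top-strand (A, B) and bottom-strand (B, A)
    # reads share one key, i.e. one original double-stranded molecule.
    return (read.start, read.stop, tuple(sorted(read.umi_pair)))


def strand_consensus(seqs: List[str], min_fraction: float = 0.5) -> str:
    """Majority vote per position; 'N' when no base exceeds min_fraction."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) > min_fraction else "N")
    return "".join(out)


def duplex_consensus(reads: List[Read]) -> Dict[Tuple, str]:
    """Collapse each combined read family into one consensus sequence."""
    # molecule key -> strand label -> list of read sequences (a single read family)
    families: Dict[Tuple, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for read in reads:
        families[molecule_key(read)][read.strand].append(read.seq)

    consensus = {}
    for key, by_strand in families.items():
        # Require a single read family from each strand of the molecule.
        if not by_strand["top"] or not by_strand["bottom"]:
            continue
        top = strand_consensus(by_strand["top"])
        bottom = strand_consensus(by_strand["bottom"])
        # Keep a base only where both single-strand consensuses agree.
        consensus[key] = "".join(t if t == b else "N" for t, b in zip(top, bottom))
    return consensus
```

For example, three top-strand reads `ACGTT`, `ACGTT`, `ACCTT` and two bottom-strand reads `ACGTA` sharing the same start, stop, and UMI pair collapse to `ACGTN`: the `G` at position 3 survives because most top-strand reads support it, while the last base is masked because the two strands disagree. A molecule seen on only one strand produces no combined consensus in this sketch.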
Note: Using UMIs for error correction analysis usually requires significantly deeper sequencing and may not be appropriate for damaged samples like low-quality formalin-fixed paraffin-embedded (FFPE) samples.
Figure 1. Schematic of error correction methods with UMIs.