With rapid advances in next generation sequencing (NGS) technology and its increased use in clinical settings, the dawn of widespread personalized medicine appears to be upon us. Dr Elaine Mardis, Co-director and Director of Technology Development at The Genome Institute, Washington University in St Louis, Missouri, USA, shares her views on the current uses of NGS, the challenges that NGS technologies face, and what can be expected in the future.
How were you introduced to science and what piqued your interest in next generation sequencing in particular?
I knew from the age of four that I wanted to be a scientist. My dad was a chemistry professor, who taught at the high school and junior college levels for over 36 years. So I was immersed in a scientific environment by virtue of knowing what he did and by visits to his laboratory. Growing up in Nebraska, I also received a very good public education that was rich in the sciences, in particular, chemistry and biology. As early as the high school level, I was provided with many opportunities to do research and explore areas of chemistry, including organic chemistry synthesis.
As an undergraduate zoology major at the University of Oklahoma, I earned the basic chemistry, biology, and mathematics backgrounds that I needed to move forward into PhD studies, which I also did at the University of Oklahoma. My PhD mentor was a man named Bruce Roe, who had learned to do Sanger sequencing in the lab of Fred Sanger himself while Roe was on sabbatical in the UK in 1980.
So, essentially, I jumped into an environment rich in DNA sequencing know-how, and sequencing and sequencing technology development are what I have done throughout my entire career. Being one of Bruce’s trainees gave me immediate exposure to the science of DNA sequencing, and, in the late 1980s, with the advent of the earliest fluorescent sequencing systems from Applied Biosystems, we started trying to make sequencing reactions that would work well on those instruments. It was a fun environment to be a part of because in those days we had so much flexibility in terms of what we could do and experiments we could try. Things are a lot more locked down nowadays in terms of the reagents and kits that are available, so it is not as flexible as it used to be.
Next generation sequencing (NGS) is just a natural evolution from those early methods of fluorescent slab gel and fluorescent capillary electrophoresis that were used to sequence the human and mouse genomes, among others. Since 2005, my group has focused on NGS and application development around these sequencers, really to facilitate the work that we do here, primarily with funding from the National Human Genome Research Institute.
Your research is focused on NGS in relation to cancer care. In your opinion, what is the significance of NGS to cancer care?
The significance is huge and growing. NGS gives us this great advantage of generating data that is very comprehensive, yet available to us in a very short time frame. This combination is key for relevance to cancer patients and their oncologists, providing information in a timely manner to allow them to make decisions on how best to treat the disease.
Perhaps the timely analysis of the vast amounts of data from NGS instruments is the biggest and most significant hurdle we have to face to make an impact on cancer care. NGS has these advantages, but also brings with it incredible demands on the computing that is required to tease out the analysis and medical interpretation of the data.
There is also a role for NGS in monitoring patients. Therefore, its use in cancer care is not a one-time application—patients will require monitoring as they are treated with the therapies indicated by the genomics of their cancer. They will require secondary and tertiary sampling of new, recurrent cancers to understand how the genome has changed, and what new therapeutics might be applicable to treating their disease. If we are successful in our aims, NGS is going to become a focus of disease characterization and disease monitoring in modern molecular pathology.
How do you see disease monitoring influencing patient care?
Monitoring allows clinicians to look at markers in circulation in the blood, often referred to as liquid biopsy. Many, but not all, cancer types actively shed cells into the circulation. A lot of these circulating cells are in the process of apoptosis, so they are actually releasing DNA from the tumor cells into the circulation. The idea of monitoring is to determine if the circulating blood can be used to observe the impact of therapy. So, for example, you sequence a tumor that has been removed, you identify some key markers that are unique to the tumor and absent from the individual’s germ line, and then as the patient goes through therapy following surgery, you periodically sample the blood and evaluate it with NGS. Because NGS is digital, if your therapy is effective you should be able to see a decrease over time in the tumor DNA content in the blood.
These have to be very sensitive approaches because there are not huge amounts of tumor DNA in the blood, but it is clearly present for most cancer types.
The other aspect of monitoring is that if treatment is unsuccessful, an increase, or at least a stasis, of the level of tumor DNA in the blood will be observed. This gives clinicians a much more accurate way of monitoring the response of the patient to the identified therapy.
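The monitoring logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a clinical method: the `BloodSample` fields, the read counts, and the 20% `tolerance` threshold are all hypothetical, and the tumor DNA content is approximated as the fraction of reads that carry a tumor-specific marker.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BloodSample:
    day: int            # days since surgery (hypothetical field)
    variant_reads: int  # reads carrying the tumor-specific marker
    total_reads: int    # all reads covering the marker position


def tumor_fraction(sample: BloodSample) -> float:
    """Fraction of reads at the marker position that carry the tumor allele."""
    if sample.total_reads == 0:
        raise ValueError("no reads cover the marker position")
    return sample.variant_reads / sample.total_reads


def classify_trend(samples: List[BloodSample], tolerance: float = 0.2) -> str:
    """Compare the latest sample with the earliest one.

    Returns "decreasing", "increasing", or "stable". The tolerance is an
    assumed relative-change cutoff chosen for illustration only.
    """
    if len(samples) < 2:
        raise ValueError("need at least two time points")
    ordered = sorted(samples, key=lambda s: s.day)
    first = tumor_fraction(ordered[0])
    last = tumor_fraction(ordered[-1])
    if first == 0:
        return "increasing" if last > 0 else "stable"
    change = (last - first) / first
    if change < -tolerance:
        return "decreasing"
    if change > tolerance:
        return "increasing"
    return "stable"


# Made-up read counts for three blood draws after surgery.
samples = [
    BloodSample(day=0, variant_reads=50, total_reads=10000),
    BloodSample(day=30, variant_reads=20, total_reads=10000),
    BloodSample(day=90, variant_reads=5, total_reads=10000),
]
print(classify_trend(samples))
```

In this toy series the marker fraction falls between the first and last draw, which is the pattern expected when therapy is effective; a "stable" or "increasing" result corresponds to the unsuccessful-treatment case.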
What new NGS technologies should we expect and what hurdles do they need to overcome to be adopted in clinical applications?
A critical need that NGS has not yet met for the human genome is very long read lengths. Most of the technologies we are currently using to sequence whole human genomes are short-read technologies. What I am hoping is next on the horizon (and this could come from a number of different devices) is longer read technologies. These would give us the ability to assemble human chromosomes at the first pass, rather than just aligning the reads to the reference genome; we know the latter is a limited approach at best.
The other hurdle will continue to be accuracy and coverage of the genome, because the biggest worry, especially in the clinical realm, is the notion of false negatives; i.e., “What have you missed?” and “Could that missing information also be important?” Certainly, false positives are a worry, but we always have the ability to validate or take a second look using a different technology or a different approach to confirm that the same sites are being altered, which gives us confidence that our NGS analyses are actually correct. But you cannot do that with false negatives because you can’t validate what you missed in the first place. So I think that is going to continue to be an issue. As the methods improve, the read lengths get longer, and accuracy increases, we should be able to better address some of these concerns about false negatives.
How soon do you see NGS being routinely used in clinical settings, or is it already in routine use?
I would not call its current use routine. In the specific realm of cancer care, I think within the next two years we will see NGS use becoming widespread because its application to cancer care is very clear. There have been some successes when looking at families, especially those with children who have abnormalities. Again, NGS use is not routine, but increasing, especially for those with abnormalities that do not lend themselves to a traditional diagnostic approach.
Probably within the next 10 years, as we understand the genome better and the methods and the sequencers improve, genome sequencing will become part of the routine workup of the child, much like the Apgar score and the heel stick for blood are now, and with the exception of children whose parents do not consent, every child will have a genome sequence. This would provide a baseline for identifying their disease susceptibilities and predicting their health as adults.
What benefits, if any, will be gained by having our genomes sequenced multiple times in a typical lifetime; e.g., at birth, in our 20s, 50s, and again at 75 years old; and also by comparing diseased and normal states?
I think the only benefit would be in comparing diseased and normal states. I do not know that sequencing the genomes of healthy individuals necessarily provides much information for healthy individuals, especially with our current understanding of the genome. However, I really liked the recent report from Mike Snyder’s lab (Chen et al., 2012) where Mike had his own baseline genome sequenced (he’s not a newborn, obviously) and identified a higher than average susceptibility to Type II diabetes. During the course of the year that he was studying himself, he had some upper respiratory infections and his genome was evaluated throughout the course of those infections. His baseline data and the identified susceptibility to Type II diabetes drove him to make a point of monitoring his blood sugar.
There is a well-known correlation between certain viral infections and the onset of Type II diabetes. Most people get past it and some do not even realize they have developed symptoms of diabetes. Because of the baseline information that Mike had about his susceptibility, and because he was monitoring his blood sugar, he was able to immediately pinpoint that he was developing elevated blood sugar. So he made changes to his diet and exercise routine and lost some weight. Approximately 6 months later, he now has a normal blood sugar level and a healthier diet and exercise routine. That, I think, is the right context in which to think about ad hoc sequencing. However, I do not think it is worth having your genome sequenced multiple times. The only exception to that would be if you developed a cancer. Then it is worth having your cancer genome sequenced, and might be cheaper if you have already obtained a baseline germline sequence to serve as a comparator.
The other thing that Mike’s study pointed out nicely for me, and hopefully for others, is that genome sequence information does not stand alone. It interplays with all of the other traditional medical measurements to help manage healthcare. I think that was the point of the exercise. It is not that everything becomes about NGS or your genome sequence; it is just a piece of the puzzle available for evidence-based medicine to include, should that become necessary in your healthcare.
What will be the influence of ENCODE [see sidebar, The Encyclopedia of DNA Elements (ENCODE) Consortium] on NGS efforts and what does it mean for exome sequencing in cancer research?
I am a fan of whole genome sequencing in cancer research because it gives the whole picture, including sites now better defined by ENCODE. While exome sequencing of cancers is a cost-savings measure, I find it very short-sighted in terms of the complexity of cancer genomes that can really only be examined through sequencing whole genomes.
I find the ENCODE results phenomenally exciting because very early on in the sequencing of cancer genomes we came up with a way to annotate the genome based on what was known. We broke our annotation down into 4 tiers (1–4): Tier 1 corresponded to all the known genes; Tier 2 corresponded to the so-called known regulatory sequences, those that were highly conserved throughout evolutionary time, which invokes the notion that they are probably important for something; Tier 3 was everything else that was not annotated but did not fall into the repetitive category; and Tier 4 was everything that was annotated as a repetitive element in the genome.
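As a rough illustration of the four-tier scheme just described, the sketch below assigns a tier to a genomic position from three annotation tracks. It is a minimal example under stated assumptions, not the institute's pipeline: the interval lists are toy data, and the precedence used when annotations overlap (genes first, then conserved regulatory sequence, then repeats) is an assumption that the description above does not spell out.

```python
from typing import List, Tuple

# (chromosome, start, end) with a half-open end; coordinates are made up.
Interval = Tuple[str, int, int]


def overlaps(chrom: str, pos: int, intervals: List[Interval]) -> bool:
    """True if the position falls inside any interval on the same chromosome."""
    return any(c == chrom and start <= pos < end for c, start, end in intervals)


def assign_tier(chrom: str, pos: int,
                genes: List[Interval],
                regulatory: List[Interval],
                repeats: List[Interval]) -> int:
    """Return the annotation tier (1 to 4) for a single position.

    The order of the checks is an assumed precedence for overlapping
    annotations.
    """
    if overlaps(chrom, pos, genes):
        return 1  # known genes
    if overlaps(chrom, pos, regulatory):
        return 2  # conserved, putatively regulatory sequence
    if overlaps(chrom, pos, repeats):
        return 4  # annotated repetitive elements
    return 3      # everything else: unannotated and non-repetitive


# Toy annotation tracks (hypothetical coordinates).
genes = [("chr1", 100, 200)]
regulatory = [("chr1", 300, 350)]
repeats = [("chr1", 500, 600)]

for position in (150, 320, 400, 550):
    print(position, assign_tier("chr1", position, genes, regulatory, repeats))
```

A linear scan over intervals is enough for a toy example; with genome-scale annotation an interval tree or sorted-interval lookup would be the more practical choice.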
Now that we have whole genome sequences from approximately 1000 cancer patients (tumor and normal, so nearly 2000 whole genome sequences in all) sequenced at our institute, we are excited about layering the ENCODE data onto the genome annotation to determine how it enhances our knowledge of what is in Tiers 2 and 3. Having sequenced that many whole genomes, we can already identify recurrent mutations in Tier 2 and 3 regions of the genome. However, we have had no context by which to interpret those recurrent mutations until now.
In cancer sequencing, recurrence is an important measure of whether a region might be involved in the development of the disease. Gene involvement can be interpreted rather easily, but for regions that have little annotation in terms of their function, interpretation is almost impossible. ENCODE just enriches our understanding of Tiers 2 and 3, reinforces how important those regions are in the genome in terms of the biology of the cell in which they occur, especially if it is a cancer cell, and gives us the ability to interpret our data across those regions much better than we have been able to in the past, and I think that should continue.
Hopefully, there will be additional ENCODE-like efforts that go on. These can now be done in individual labs, of course, because of the reduced cost and the genome coverage from NGS methods. It is tremendously exciting and gives us a better understanding of the genome overall, which will be important for medical applications as well.
Given the findings of the ENCODE consortium, how soon will we understand what is happening in the entire genome?
That will take longer, obviously. This initial data set is a good start and is the reason for doing big science projects, because it takes a big science mechanism to generate the data. The functional aspects of understanding these regions can now get parsed out to different labs using different model animal systems, human cell lines, etc. Fortunately, there are many small labs that do functional biology, whereas there are precious few labs that have the capability and scale to do big science projects.
The Encyclopedia of DNA Elements (ENCODE) Consortium
The ENCODE Consortium is an international collaboration of research groups, funded by the National Human Genome Research Institute (NHGRI) and set up to identify all functional elements in the human genome. Their initial findings were described in a set of 30 papers published on September 5, 2012. More information about the ENCODE project is available at www.encodeproject.org/ENCODE.
Your labs tend to perform primarily exome sequencing and target capture. What technologies do you use and how has IDT facilitated your research?
We do exome capture using commercially available exome reagents. IDT probes (see sidebar, IDT Product Focus) have played a great role in what I generically call sub-exome capture; in particular, sets of genes, most notably in cancer, that we repeatedly query across multitudes of samples. When Vince Magrini (see sidebar, Current Research at The Genome Institute) first came to me and said, “I was talking to John Havens [Business Development Manager at IDT] and he mentioned these new IDT probes that are long oligos biotinylated on one end,” I laughed at him and said, “You want to try those?” He replied, “Yeah,” and then I said, “You know that is not going to work, right?” because I just didn’t feel that one biotin would be enough. What we were using at the time were multiple biotins incorporated, typically, by PCR. So he said, “I guess we won’t know until we try them.” I replied, “You try them, and then you can come back and tell me that I was right.”
Of course, I was wrong—that actually happens more often than I would like to admit—but it was a great experiment to have done because one of the ways this has facilitated our research is that we have probes on hand for these genes that we routinely query. So we can easily put together a custom set of probes for specific gene sets using these long, biotinylated IDT oligos.
The other important area, especially in terms of sub-exome screening for mutations, is that these [xGen® Lockdown®] probes work phenomenally well with formalin-fixed samples, which are the norm in pathology. That is typically how tissue is preserved because most pathology in the past has revolved around protein and protein-based assays such as immunohistochemistry, and dipping the tissue in formalin and then fixing it in paraffin is just the way that things get done. It does horrible things to DNA and RNA—we know that—but those samples seem to still work really, really well with the IDT probes.
“We continue to surprise ourselves with how well the IDT probes perform, even across regions that historically performed very poorly. With an exome capture reagent we are at the mercy of different probe efficiencies across a very large number of probes and across the exome and some probes just won’t work as well as others, notably in high GC regions, which unfortunately, occur in many of the first exons of genes that we care about. In such cases we always observe even and reproducible representation of those regions when we use the IDT probes for capture.”—Dr Elaine Mardis
Do you use the IDT probes in combination with your regular exome capture reagents or as stand-alone probes?
We have used them in combination and we use them often as a stand-alone capture reagent, so the answer to your question is “both.” Most recently IDT synthesized some probes that correspond to a variety of long noncoding RNAs with which we supplemented a conventional exome reagent for target capture. That is another strength of the IDT probes: you can spike them into an exome capture reagent to augment the exome space and that works beautifully.
IDT Product Focus
xGen® Lockdown® Probes are individually synthesized probes for target enrichment by hybrid capture, specifically developed for next generation sequencing. These probes enhance the performance of existing capture panels, rescuing poorly represented regions such as areas of high GC content. xGen Lockdown Probes are suitable for creating custom capture panels that can be optimized, expanded, and combined with other panels as necessary.
More information about xGen Lockdown Probes is available at www.idtdna.com.
What do you do to relax or wind down when you’re not busy performing or supervising NGS?
Probably the best source of relaxation for me is doing Taekwondo, which I do early in the mornings. It puts me in the right frame of mind for the rest of the day. Other than that, I really don’t have time to do much else, which is a little embarrassing. If I am in a nice location and time permits, I also play golf. I have had the pleasure of golfing in a number of nice places this year; the number being two (Hawai’i and Kiawah Island, SC), but it’s better than zero, I guess. I like that in particular just because it is very relaxing for me to be outdoors and in a nice location, so that is a really great way to unwind.
Current Research at The Genome Institute
We spoke to Dr Vincent Magrini and Robert “Bob” Fulton, scientists at The Genome Institute and members of Dr Mardis’ team, about some of the research being performed at the institute and how the use of IDT xGen® Lockdown® Probes has facilitated this research.
Vince Magrini, Senior Group Leader, Technology Development, obtained his PhD in Microbiology at the University of Idaho, Moscow, Idaho, USA. His first postdoctoral fellowship was at Washington University, studying the dimorphism genes of the fungus Histoplasma capsulatum. He has been at The Genome Institute for 10 years.
Bob Fulton, Group Leader for Sequence Improvement, obtained his master’s degree in molecular biology at Southern Illinois University in Edwardsville, Illinois, USA. When he first joined Washington University, he worked on the human chromosome X and 7 mapping projects under Eric Green, current director of the National Human Genome Research Institute (NHGRI). He later joined the institute, where he has been for over 18 years, and oversees targeted sequencing and production activities at the institute.
Much of the sequencing currently being performed at the institute is human genome resequencing, with a focus on cancer genomes. The institute has collaborations with several consortia and institutions, including The Cancer Genome Atlas project (TCGA); St Jude Children’s Research Hospital in Memphis, Tennessee, USA; and physicians and researchers at Washington University and other research organizations.
xGen® Lockdown® Probes Augment Target Capture
As part of TCGA, the researchers are involved in cancer sample resequencing. They have recently used IDT biotinylated Ultramer® Oligonucleotide probes (the precursor to xGen® Lockdown® Probes; see sidebar, IDT Product Focus) to develop a probe set for human papillomavirus (HPV), a known major risk factor for cervical cancer. These probes have been successfully spiked into the researchers’ current exome kits, enabling them to more easily identify patient samples in which the virus has integrated. Using exome kits alone, the scientists were unable to identify the virus unless it was specifically integrated into an exon and easily pulled out for identification; however, with the xGen Lockdown Probes they are now able to identify specific integration sites very clearly for all of those samples.
The technology of xGen Lockdown Probes enables the scientists to sequence very specific, targeted regions across a single patient sample or multiple samples from a patient, and pull out the same regions from multiple libraries. This is advantageous because they can sequence a small targeted region in a large number of individuals and look at recurrent, common, or unique mutations. Thus, as they identify significantly mutated genes from a given cancer type, the scientists are able to easily obtain a specific probe set targeted at those genes and pull out these regions from many individuals.
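The idea of looking for recurrent mutations across many individuals can be illustrated with a small counting sketch. This is a hypothetical example, not the researchers' analysis: the sample IDs and gene names are placeholders, and `min_samples` is an assumed cutoff for calling a gene recurrently mutated.

```python
from collections import Counter
from typing import Dict, List


def recurrent_genes(mutations_by_sample: Dict[str, List[str]],
                    min_samples: int = 2) -> Dict[str, int]:
    """Count in how many samples each gene is mutated.

    Each gene is counted at most once per sample, so several mutations in
    the same gene within one individual do not inflate the recurrence.
    """
    counts: Counter = Counter()
    for genes in mutations_by_sample.values():
        counts.update(set(genes))
    return {gene: n for gene, n in counts.items() if n >= min_samples}


# Placeholder data: mutated genes observed in the captured regions per sample.
mutations_by_sample = {
    "sample_1": ["GENE_A", "GENE_B", "GENE_A"],
    "sample_2": ["GENE_A", "GENE_C"],
    "sample_3": ["GENE_A", "GENE_B"],
}
print(recurrent_genes(mutations_by_sample))
```

Genes that pass the cutoff are the kind of candidates for which a targeted probe set could then be assembled and applied to additional individuals.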
“The beauty of the IDT probe set [xGen® Lockdown® Probes] is that you can mix and match, pull specific probes out, and add others in without the need to order whole, large sets. They are fully customizable. If we get a ‘jackpot’ probe that captures too much repetitive sequence, for instance, we could pull it out, redesign, and add in the new probe. That flexibility is very nice.”—Bob Fulton
Identifying Clonal Populations
Another good application for xGen Lockdown Probes, according to the researchers, is the identification of clonal populations in cancer genomes. They are able to monitor the prevalence of putative disease-associated variants that are identified by whole-genome shotgun sequencing. Thus, the variant allele frequencies can be monitored at different time points or treatment stages to determine the effect of a specific regimen on a given clonal population. A good overview of clonal evolution can be found in an article by Welch et al. (2012).
Documenting Multiple Genomes for a Given Cancer Type
The Genome Institute obtains most of the samples it sequences through its involvement in large consortia. The institute is one of the large-scale sequencing centers involved in the TCGA project, and is assigned a particular cancer type, for which it sequences a given ratio of whole genomes and exomes to better understand that cancer type. The number of samples required for each cancer type varies with the complexity of the cancer; e.g., breast cancer has multiple subtypes, so the researchers sequence more breast cancer samples than colon cancer samples. To date, the institute has sequenced over 800 breast cancer genomes through the TCGA project alone. The samples are fed to the institute through a complex system of sample acquisition and biorepositories that form part of a broad National Cancer Institute (NCI) initiative.
For the St Jude Children’s Research Hospital collaboration, the goal is to sequence 600 pediatric cancer cases over 3 years. The institute receives specific cancer types from St Jude, performs the sequencing, feeds back the data, and collaborates on the analyses. In addition, the institute is involved in several open studies at Washington University, where physicians feed samples through the sequencing and analysis pipelines.
Comparing Cohorts for Allelic Differences
Another major project for the institute is cohort sequencing. For this, the researchers are sequencing 7000 samples from two different cohorts, the North Finland Birth Cohort 66 (individuals born in northern Finland in 1966, who are well-characterized medically and phenotyped) and the FUSION cohort, another north Finland sample set. They sequence target regions associated with metabolic syndrome and diabetes, for which they have allelic spectra. The institute also works with a cleft lip consortium, sequencing the genomes of affected and unaffected family members. So there are many different mechanisms by which they obtain samples for sequencing.
Reporting Genomic Involvement in Disease
Being a research facility, The Genome Institute has no direct involvement with patients. All of the samples sequenced at the institute are anonymized, so the researchers never know the identity of the patients. Communication of any findings to patients is through protocols that enable the treating physician (who banked the sample) to feed back to the patients. The researchers analyze the data and have access to information such as phenotype and pathology reports, but nothing that links back to an individual, which is protected health information. Information obtained from the sequencing data determines the direction of the project. Dysregulated genes may be part of a known pathway and have existing targeted drugs; may be used for a clinical trial; or may be followed up by functional studies in the lab.
This fact, combined with the multitude of projects that interface with human health questions, means that researchers at The Genome Institute are carrying out the mission of their Institute, to advance human health and the understanding of how the genome impacts disease development.
- Chen R, Mias G, et al. (2012) Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell, 148(6):1293–1307.
- Welch JS, Ley TJ, et al. (2012) The origin and evolution of mutations in acute myeloid leukemia. Cell, 150(2):264–278.
Author: Nicola Brookman-Amissah, PhD, is a Scientific Writer at IDT.