How were you introduced to science and what piqued your interest in next generation sequencing in particular?
I knew from the age of four that I wanted to be a scientist. My dad was a chemistry professor who taught at the high school and junior college levels for over 36 years. So I was immersed in a scientific environment by virtue of knowing what he did and by visits to his laboratory. Growing up in Nebraska, I also received a very good public education that was rich in the sciences, in particular, chemistry and biology. As early as high school, I was provided with many opportunities to do research and explore areas of chemistry, including organic chemistry synthesis.
As an undergraduate zoology major at the University of Oklahoma, I gained the basic chemistry, biology, and mathematics background that I needed to move forward into PhD studies, which I also did at the University of Oklahoma. My PhD mentor was Bruce Roe, who had learned to do Sanger sequencing in the lab of Fred Sanger himself while on sabbatical in the UK in 1980.
So, essentially, I jumped into an environment rich in DNA sequencing know-how, and sequencing and sequencing technology development are what I have done throughout my entire career. Being one of Bruce’s trainees gave me immediate exposure to the science of DNA sequencing, and, in the late 1980s, with the advent of the earliest fluorescent sequencing systems from Applied Biosystems, we started trying to make sequencing reactions that would work well on those instruments. It was a fun environment to be a part of because in those days we had so much flexibility in terms of what we could do and the experiments we could try. Things are a lot more locked down nowadays in terms of the reagents and kits that are available, so it is not as flexible as it used to be.
Next generation sequencing (NGS) is just a natural evolution from those early methods of fluorescent slab gel and fluorescent capillary electrophoresis that were used to sequence the human and mouse genomes, among others. Since 2005, my group has focused on NGS and application development around these sequencers, really to facilitate the work that we do here, primarily with funding from the National Human Genome Research Institute.
Your research is focused on NGS in relation to cancer care. In your opinion, what is the significance of NGS to cancer care?
The significance is huge and growing. NGS gives us this great advantage of generating data that is very comprehensive, yet available to us in a very short time frame. This combination is key for relevance to cancer patients and their oncologists, providing information in a timely manner to allow them to make decisions on how best to treat the disease.
Perhaps the timely analysis of the vast amounts of data from NGS instruments is the biggest and most significant hurdle we have to face to make an impact on cancer care. NGS has these advantages, but also brings with it incredible demands on the computing that is required to tease out the analysis and medical interpretation of the data.
There is also a role for NGS in monitoring patients. Therefore, its use in cancer care is not a one-time application—patients will require monitoring as they are treated with the therapies indicated by the genomics of their cancer. They will require secondary and tertiary sampling of new, recurrent cancers to understand how the genome has changed, and what new therapeutics might be applicable to treating their disease. If we are successful in our aims, NGS is going to become a focus of disease characterization and disease monitoring in modern molecular pathology.
How do you see disease monitoring influencing patient care?
Monitoring allows clinicians to look at markers in circulation in the blood, often referred to as liquid biopsy. Many, but not all, cancer types actively shed cells into the circulation. A lot of these circulating cells are in the process of apoptosis, so they are actually releasing tumor DNA into the circulation. The idea of monitoring is to determine whether the circulating blood can be used to observe the impact of therapy. So, for example, you sequence a tumor that has been removed, you identify some key markers that are unique to the tumor and absent from the individual’s germ line, and then, as the patient goes through therapy following surgery, you periodically sample the blood and evaluate it with NGS. Because NGS is digital, if your therapy is effective you should be able to see a decrease over time in the tumor DNA content of the blood. These have to be very sensitive approaches because there are not huge amounts of tumor DNA in the blood, but it is clearly present for most cancer types.
The other aspect of monitoring is that if treatment is unsuccessful, an increase, or at least stasis, in the level of tumor DNA in the blood will be observed. This gives clinicians a much more accurate way of monitoring the patient’s response to the identified therapy.
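The digital counting described above can be sketched in a few lines. This is a simplified illustration, not a clinical pipeline: the marker site, read counts, and timepoints below are invented, and real liquid biopsy assays require error correction and far deeper sequencing to call such low fractions reliably.

```python
# Hypothetical sketch: tracking tumor DNA in blood via a tumor-unique marker.
# Because NGS is digital, the variant allele fraction (VAF) at a marker site
# is simply mutant reads / total reads. All numbers below are invented.

def variant_allele_fraction(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the tumor-specific variant."""
    if total_reads == 0:
        return 0.0
    return mutant_reads / total_reads

# (timepoint, mutant reads, total reads) at one tumor-unique marker site
timepoints = [
    ("pre-surgery",  412, 10_000),
    ("post-surgery",  58, 10_000),
    ("3 months",      12, 10_000),
    ("6 months",       2, 10_000),
]

for label, mut, total in timepoints:
    vaf = variant_allele_fraction(mut, total)
    print(f"{label:>12}: VAF = {vaf:.4f}")
```

A falling VAF across timepoints is the signature of effective therapy; a rising or flat VAF suggests residual or recurrent disease, as described above.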
What new NGS technologies should we expect and what hurdles do they need to overcome to be adopted in clinical applications?
A critical need that has not yet been met by NGS for the human genome is very long read lengths. Most of the technologies we are currently using to sequence whole human genomes are short-read technologies. What I am hoping is next on the horizon—and this could come from a number of different devices—is longer read technologies. These would give us the ability to assemble human chromosomes at the first pass, rather than just aligning the reads to the reference genome; we know the latter is a limited approach at best.
The other hurdle will continue to be accuracy and coverage of the genome, because the biggest worry, especially in the clinical realm, is the notion of false negatives; i.e., “What have you missed?” and “Could that missing information also be important?” Certainly, false positives are a worry, but we always have the ability to validate or take a second look using a different technology or a different approach to confirm that the same sites are being altered, which gives us confidence that our NGS analyses are actually correct. But you cannot do that with false negatives because you can’t validate what you missed in the first place. So I think that is going to continue to be an issue. As the methods improve, the read lengths get longer, and accuracy increases, we should be able to better address some of these concerns about false negatives.
How soon do you see NGS being routinely used in clinical settings, or is it already in routine use?
I would not call its current use routine. In the specific realm of cancer care, I think within the next two years we will see NGS use becoming widespread because its application to cancer care is very clear. There have been some successes when looking at families, especially those with children who have abnormalities. Again, NGS use is not routine, but increasing, especially for those with abnormalities that do not yield to a traditional diagnostic approach.
Probably within the next 10 years, as we understand the genome better and the methods and the sequencers improve, genome sequencing will become part of the routine workup of the child, much like the Apgar score and the heel stick for blood are now, and with the exception of children whose parents do not consent, every child will have a genome sequence. This would provide a baseline for identifying their disease susceptibilities and predicting their health as adults.
What benefits, if any, will be gained by having our genomes sequenced multiple times in a typical lifetime; e.g., at birth, in our 20s, 50s, and again at 75 years old; and also by comparing diseased and normal states?
I think the only benefit would be in comparing diseased and normal states. I do not know that sequencing the genomes of healthy individuals necessarily provides much information, especially with our current understanding of the genome. However, I really liked the recent report from Mike Snyder’s lab, where Mike had his own baseline genome sequenced—he’s not a newborn, obviously—and identified a higher than average susceptibility to Type II diabetes. During the course of the year that he was studying himself, he had some upper respiratory infections and his genome was evaluated throughout the course of those infections. His baseline data and the identified susceptibility to Type II diabetes drove him to make a point of monitoring his blood sugar.
There is a well-known correlation between certain viral infections and the onset of Type II diabetes. Most people get past it, and some do not even realize they have developed symptoms of diabetes. Because of the baseline information that Mike had about his susceptibility, and because he was monitoring his blood sugar, he was able to immediately pinpoint that he was developing elevated blood sugar. So he made changes to his diet and exercise routine and lost some weight. Approximately 6 months later, he had a normal blood sugar level and a healthier diet and exercise routine. That, I think, is the right context in which to think about ad hoc sequencing. However, I do not think it is worth having your genome sequenced multiple times. The only exception to that would be if you developed a cancer. Then it is worth having your cancer genome sequenced, and it might be cheaper if you have already obtained a baseline germline sequence to serve as a comparison.
The other thing that Mike’s study pointed out nicely for me, and hopefully for others, is that genome sequence information does not stand alone. It interplays with all of the other traditional medical measurements to help manage healthcare. I think that was the point of the exercise. It is not that everything becomes about NGS or your genome sequence; it is just a piece of the puzzle available for evidence-based medicine to include, should that become necessary in your healthcare.
What will be the influence of ENCODE [see sidebar, The Encyclopedia of DNA Elements (ENCODE) Consortium] on NGS efforts and what does it mean for exome sequencing in cancer research?
I am a fan of whole genome sequencing in cancer research because it gives the whole picture, including sites now better defined by ENCODE. While exome sequencing of cancers is a cost-savings measure, I find it very short-sighted in terms of the complexity of cancer genomes that can really only be examined through sequencing whole genomes.
I find the ENCODE results phenomenally exciting because very early on in the sequencing of cancer genomes we came up with a way to annotate the genome based on what was known. We broke our annotation down into four tiers: Tier 1 corresponded to all the known genes; Tier 2 corresponded to the so-called known regulatory sequences—those that were highly conserved throughout evolutionary time, which invokes the notion that they are probably important for something; Tier 3 was everything else that was not annotated but did not fall into the repetitive category; and Tier 4 was everything that was annotated as a repetitive element in the genome.
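The tiering scheme described above amounts to a simple precedence rule over a region's annotations. The sketch below is an illustration of that logic only; the boolean inputs stand in for lookups against real genome annotation resources, which are not shown.

```python
# Hypothetical sketch of the four-tier cancer genome annotation scheme.
# In practice, each flag would come from querying genome annotation
# databases; here they are passed in directly for illustration.

def assign_tier(is_known_gene: bool,
                is_conserved_regulatory: bool,
                is_repetitive: bool) -> int:
    """Map a genomic region's annotations to Tiers 1-4."""
    if is_known_gene:
        return 1  # Tier 1: known genes
    if is_conserved_regulatory:
        return 2  # Tier 2: conserved, putative regulatory sequences
    if is_repetitive:
        return 4  # Tier 4: annotated repetitive elements
    return 3      # Tier 3: unannotated, non-repetitive sequence
```

For example, a region that is neither a known gene nor conserved, and not repetitive, falls into Tier 3, the bucket that ENCODE data now helps interpret.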
Now that we have approximately 1000 whole genome sequences from cancer patients—tumor and normal, so nearly 2000 whole genome sequences in all—sequenced at our institute, we are excited about layering on the ENCODE data to the genome annotation to determine how it enhances our knowledge of what is in Tiers 2 and 3. Having sequenced that many whole genomes, we can already identify recurrent mutations in Tier 2 and 3 regions of the genome. However, we have had no context by which to interpret those recurrent mutations until now.
In cancer sequencing, recurrence is an important measure of whether a region might be involved in the development of the disease. Gene involvement can be interpreted rather easily, but for regions that have little annotation in terms of their function, interpretation is almost impossible. ENCODE just enriches our understanding of Tiers 2 and 3, reinforces how important those regions are in the genome in terms of the biology of the cell in which they occur, especially if it is a cancer cell, and gives us the ability to interpret our data across those regions much better than we have been able to in the past, and I think that should continue.
Hopefully, there will be additional ENCODE-like efforts that go on. These can now be done in individual labs, of course, because of the reduced cost and the genome coverage from NGS methods. It is tremendously exciting and gives us a better understanding of the genome overall, which will be important for medical applications as well.
Given the findings of the ENCODE consortium, how soon will we understand what is happening in the entire genome?
That will take longer, obviously. This initial data set is a good start and is the reason for doing big science projects, because it takes a big science mechanism to generate the data. The functional aspects of understanding these regions can now get parsed out to different labs using different model animal systems, human cell lines, etc. Fortunately, there are many small labs that do functional biology, whereas there are precious few labs capable of doing big science projects at the necessary scale.
Your labs tend to perform primarily exome sequencing and target capture. What technologies do you use and how has IDT facilitated your research?
We do exome capture using commercially available exome reagents. IDT probes (see sidebar, Product focus—Target capture reagents from IDT) have played a great role in what I generically call sub-exome capture; in particular, sets of genes, most notably in cancer, that we repeatedly query across multitudes of samples. When Vince Magrini (see sidebar, Current Research at The Genome Institute) first came to me, he said, “I was talking to John Havens [Business Development Manager at IDT] and he mentioned these new IDT probes that are long oligos biotinylated on one end.” I laughed at him and said, “You want to try those?” He replied, “Yeah,” and then I said, “You know that is not going to work, right?” because I just didn’t feel that one biotin would be enough. What we were using at the time were probes with multiple biotins, typically incorporated by PCR. So he said, “I guess we won’t know until we try them.” I replied, “You try them, and then you can come back and tell me that I was right.”
Of course, I was wrong—that actually happens more often than I would like to admit—but it was a great experiment to have done because one of the ways this has facilitated our research is that we have probes on hand for these genes that we routinely query. So we can easily put together a custom set of probes for specific gene sets using these long, biotinylated IDT oligos.
The other important area, especially in terms of sub-exome screening for mutations, is that these [xGen® Lockdown®] probes work phenomenally well with formalin-fixed samples, which are the norm in pathology. That is typically how tissue is preserved because most pathology in the past has revolved around protein and protein-based assays such as immunohistochemistry, and dipping the tissue in formalin and then fixing it in paraffin is just the way that things get done. It does horrible things to DNA and RNA—we know that—but those samples seem to still work really, really well with the IDT probes.
“We continue to surprise ourselves with how well the IDT probes perform, even across regions that historically performed very poorly. With an exome capture reagent, we are at the mercy of different probe efficiencies across a very large number of probes and across the exome; some probes just won’t work as well as others, notably in high-GC regions, which, unfortunately, occur in many of the first exons of genes that we care about. In such cases, we always observe even and reproducible representation of those regions when we use the IDT probes for capture.”
—Dr Elaine Mardis
Do you use the IDT probes in combination with your regular exome capture reagents or as stand-alone probes?
We have used them in combination and we use them often as a stand-alone capture reagent, so the answer to your question is “both.” Most recently, IDT synthesized some probes corresponding to a variety of long noncoding RNAs, with which we supplemented a conventional exome reagent for target capture. That is another strength of the IDT probes—you can spike them into an exome capture reagent to augment the exome space, and that works beautifully.
What do you do to relax or wind down when you’re not busy performing or supervising NGS?
Probably the best source of relaxation for me is doing Taekwondo, which I do early in the mornings. It puts me in the right frame of mind for the rest of the day. Other than that, I really don’t have time to do much else, which is a little embarrassing. If I am in a nice location and time permits, I also play golf. I have had the pleasure of golfing in a number of nice places this year; the number being two (Hawai’i and Kiawah Island, SC), but it’s better than zero, I guess. I like golf in particular because it is very relaxing for me to be outdoors in a nice location, so that is a really great way to unwind.