The regulation of gene expression during development and differentiation is orchestrated by complex networks of intergenic and intragenic cis-regulatory elements that turn specific genes on and off, ensuring that these genes are expressed at the correct levels in the correct cells. Understanding gene regulation is fundamental to biology and also has clinical relevance. Cancers, for example, are usually characterized by inappropriate gene expression.
However, the regulation of gene expression can be very complicated: it often involves multiple regulatory elements, some of which lie tens to hundreds of kilobases away from their targets and can even be located within the introns of an unrelated gene. As a result, these distal cis-regulatory sequences (enhancers) and their interactions with the promoters of specific genes are very hard to identify.
Identifying distal regulatory elements
Dr Jim Hughes studies such distal regulatory elements. At the Weatherall Institute of Molecular Medicine (Oxford University, United Kingdom), his laboratory investigates cis-regulatory elements in the erythroid system. Because distal regulatory elements must come into physical contact with the promoter of a gene to regulate it, scientists have had to devise techniques to identify sequence interactions that occur over large distances across the genome.
The Hughes lab has adapted Chromosome Conformation Capture (3C) technologies to create a unique method, Next Generation Capture-C or NG Capture-C, to isolate sequence interactions in three-dimensional space. By combining Capture-C experiments with next-generation sequencing (NGS), the Hughes lab can now interrogate the regulatory landscapes of hundreds of genes in a single experiment (Figure 1). They then combine data from their Capture-C experiments with existing genomics and transcriptomics data to link genes and regulatory elements en masse.
Capturing three-dimensional sequence interactions
Genomic conformation within cells is captured by crosslinking the cells with formaldehyde, which creates protein:protein and protein:DNA links. These crosslinks temporarily hold interacting genomic sequences (for example, a distal regulatory element touching a promoter) together. To join the interacting sequences covalently so that they can be analyzed, DNA in the crosslinked complexes is digested with a frequently cutting restriction enzyme (e.g., one with a 4-base recognition sequence). DNA fragments that were touching are then ligated together, so that a single ligation product covalently joins genomic regions that may lie hundreds of kilobases apart in the linear sequence. The long, concatemerized ligation products are sonicated to a specific size (e.g., 200 bp), sequencing adapters are attached, and the sequences are amplified and purified (Figure 1).
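The digestion step can be mimicked in silico to see which restriction fragments a sequence would yield. The following is a minimal sketch, not the Hughes lab's pipeline: the function name `digest` is hypothetical, and the default site GATC (the recognition sequence of DpnII, which cuts immediately before the G) is an assumption chosen only as an example of a 4-base cutter.

```python
import re
from typing import List


def digest(sequence: str, site: str = "GATC") -> List[str]:
    """Split a DNA sequence at every occurrence of a recognition site.

    Assumption: the enzyme cuts immediately before the site, so every
    fragment after the first begins with the site itself.
    """
    sequence = sequence.upper()
    # A zero-width lookahead finds each site without consuming it.
    pattern = "(?=" + re.escape(site) + ")"
    cut_positions = [m.start() for m in re.finditer(pattern, sequence)]
    # A site at position 0 produces no cut, because nothing precedes it.
    bounds = [0] + [p for p in cut_positions if p > 0] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy sequence with two GATC sites.
fragments = digest("AAGATCTTGATCGG")
print(fragments)
```

Joining the returned fragments in order reproduces the input sequence, which is a convenient sanity check when adapting the sketch to a different enzyme.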
Enriching for enhancers and their target promoters
The resulting high-resolution 3C library contains all possible sequence interactions for all regions of the genome. It is therefore so complex that, even with current high-throughput sequencing methods, it cannot be sequenced at sufficient depth to interpret in full: billions of reads would be required for informative analysis at high resolution. Because the ligation fragments represent all interactions within the genome, the interactions involving regions of interest need to be selected from the total pool of fragments.
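A back-of-the-envelope calculation illustrates why the unenriched library is so hard to sequence deeply. The sketch below rests on stated assumptions that are not from the article: a genome of roughly 3 billion base pairs, and a random-sequence model in which a 4-base site occurs about once every 4^4 = 256 bases. Real genomes are not random and most fragment pairs are never ligated, so the result is only a rough upper bound on library complexity.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumption: approximate human genome size
SITE_LENGTH = 4                 # 4-base recognition sequence, as in the text


def expected_fragments(genome_size_bp: int, site_length: int) -> int:
    """Expected fragment count if the site occurs once every 4**k bases."""
    return genome_size_bp // (4 ** site_length)


def possible_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n fragments."""
    return n * (n - 1) // 2


n_fragments = expected_fragments(GENOME_SIZE_BP, SITE_LENGTH)
n_pairs = possible_pairs(n_fragments)
print(f"{n_fragments:,} fragments, {n_pairs:.2e} possible pairwise junctions")
```

Under these assumptions there are about 12 million fragments and on the order of 10^13 possible pairwise junctions, far more than a sequencing run can sample evenly, which is why enrichment for regions of interest is needed.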
While enrichment steps have been part of prior 3C methods, Dr Hughes’ group employed a novel library enrichment step, oligonucleotide target capture hybridization, and now uses IDT xGen® Lockdown® Probes for this purpose. “We chose this method,” recounted Dr Hughes, “because it is very precise and not dependent on anything other than sequence: we can target the oligos to sequences anywhere in the genome and can choose and alter the target location at will. More importantly, the biotinylated capture oligos can be multiplexed, targeting multiple locations simultaneously.” Having to analyze elements such as gene promoters one by one to identify their regulatory elements has been a major limitation of prior versions of 3C technology. Using pools of probes targeting sets of enhancers and promoters, these scientists can now interrogate hundreds of genes in a single assay.
Initially, the Hughes group adapted the Agilent SureSelect system, which was originally designed to enrich for exons, to enrich for enhancer:promoter interactions. While this work led to a paper published in Nature Genetics in 2014, the Agilent SureSelect system had some major limitations: it was inflexible, and it was expensive because of the number of probes that had to be ordered. The SureSelect system required a minimum order of 40,000 probes. So if researchers wanted to analyze only 10 genes, they had to keep redesigning the oligos until they reached 40,000 probes, or include probes to targets not of interest. The enrichment step was therefore not very scalable.
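The effect of multiplexed capture can be pictured with a toy model: from a pool of ligation junctions, keep only those that include a targeted fragment, and tally the partners of each target. This is an illustrative sketch rather than the laboratory or analysis procedure; the function name `capture` and the fragment labels are hypothetical.

```python
from collections import Counter, defaultdict


def capture(junctions, viewpoints):
    """Count interaction partners for each targeted fragment (viewpoint).

    junctions: iterable of (fragment_a, fragment_b) ligation pairs.
    viewpoints: set of fragment labels targeted by capture probes.
    Junctions containing no targeted fragment are discarded.
    """
    profiles = defaultdict(Counter)
    for frag_a, frag_b in junctions:
        if frag_a == frag_b:
            continue  # self-ligation carries no interaction information
        if frag_a in viewpoints:
            profiles[frag_a][frag_b] += 1
        if frag_b in viewpoints:
            profiles[frag_b][frag_a] += 1
    return profiles


# Hypothetical junctions; two promoters are targeted in the same "assay".
toy_junctions = [
    ("promoterA", "enhancer1"),
    ("enhancer1", "promoterA"),
    ("promoterB", "enhancer2"),
    ("fragmentX", "fragmentY"),  # untargeted, so it is dropped
    ("promoterA", "promoterB"),
]
toy_profiles = capture(toy_junctions, {"promoterA", "promoterB"})
for viewpoint, partners in sorted(toy_profiles.items()):
    print(viewpoint, dict(partners))
```

Adding a target in this model only means adding a label to the viewpoint set, which mirrors the point made above that multiplexed probes let many promoters be examined in one experiment.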
In addition, the more probe pools used, the more sequences were captured, requiring deeper, or more extensive, sequencing. The group didn’t want to include additional, uninformative sequences when they were, for example, just interested in 3 genomic regions to address a particular biological question.
“We needed a manufacturing partner who could provide the scalability and flexibility we required,” emphasized Dr Hughes. “We wanted to buy a small number of individual target capture probe pools to just our specific genomic regions of interest. And we wanted to have those probe pools provided in individual tubes or plate wells so that we could mix and match them, creating our own panels of probes. We also wanted to be able to augment the probe sets by adding in additional probes as research questions changed and clarified. IDT could provide all that.”
Dr Hughes went on to note, “It was really nice to work with IDT where we could design a set of xGen Lockdown Probes and have them come back as just the probe pools we ordered, and provided individually. Clearly, this is exactly the kind of flexibility desired by others in the field: a method that is generally applicable, but also flexible.”
"We wanted to buy individual target capture probes to create our own probe panel that we could mix and match, and also augment, to build different designs, so that we could analyze one gene or a 1000 genes, and everything in between." – Dr Jim Hughes
By incorporating oligonucleotide capture technology into 3C technology and adding high throughput sequencing, the NG Capture-C technology enables researchers to interrogate cis interactions at hundreds of selected loci at high resolution in a single assay.
After publishing the initial working NG Capture-C procedure, the Hughes lab has continued to optimize it. “Incorporation of the IDT xGen Lockdown Probes for specific target capture has greatly improved the method. IDT provides a reagent of sufficient quality that not only works, but works robustly,” notes Dr Hughes. The researchers are currently writing up the optimized protocol as a methods paper and are providing the updated procedure to collaborators.
Refining “The Looping Model” for enhancer:promoter interaction
To look at general trends and properties of enhancer elements and their interactions with promoters, the Hughes lab examines a series of different regulatory circuits in erythroid cells. Erythroid cells are an ideal model system for these experiments. In addition to being easily expandable, they can be differentiated through a series of interim cell types and eventually into mature red blood cells, just as they would be in the body, suggesting that the expression of genes and proteins is regulated properly.
One of the outcomes of these studies addresses the mechanism by which enhancers and promoters associate. Enhancers are able to interact with multiple specific promoters despite lying at large distances from them. Initial results from the Hughes lab suggest that the accepted “looping model,” in which a large loop in the DNA brings enhancer and promoter into proximity and so facilitates their interaction, is not accurate.
Instead, the researchers have observed that regions in and around these elements interact much more with one another than would be postulated by a looping mechanism. Dr Hughes explains, “It’s more like they are scanning a specific area to find one another. It’s as if the nucleus has put all of the molecular components that need to interact into a small, what we’ve called, ‘a compartment’, to facilitate that interaction.”
Disease caused by mutations in regulatory elements far from disease genes
The Hughes lab also has a large, ongoing collaboration with the genome-wide association study (GWAS) community. GWAS looks within the human population for people who have an increased chance of developing a certain disease (for example, heart disease, bowel cancer, type 2 diabetes, or multiple sclerosis) and endeavors to associate sequence changes (mutations) with predisposition to the disease.
Unexpectedly, the GWAS field has found that such changes often do not occur in the disease genes themselves, but at a distance from these genes, exactly the sort of places where one would expect regulatory elements to exist. It is now becoming clear that many of the tools and approaches developed to understand gene regulation, such as NG Capture-C, will have a prominent role in determining how sequence changes in our genomes may ultimately lead to the diseases we develop in our lifetime.
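The way interaction data could connect a distant variant to a candidate gene can be sketched as a simple interval lookup: a variant is linked to a gene if it falls inside a fragment that was found to interact with that gene's promoter. This is a minimal illustration under stated assumptions, not a published analysis. The names `Interaction`, `Variant`, and `link_variants` are hypothetical, as are all gene names and coordinates, and intervals are assumed to be 0-based and half-open.

```python
from typing import Dict, List, NamedTuple


class Interaction(NamedTuple):
    gene: str    # gene whose promoter was the capture target
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive


class Variant(NamedTuple):
    name: str
    chrom: str
    pos: int     # 0-based


def link_variants(variants: List[Variant],
                  interactions: List[Interaction]) -> Dict[str, List[str]]:
    """Map each variant to the genes whose interacting fragments contain it."""
    links = {}
    for v in variants:
        genes = {i.gene for i in interactions
                 if i.chrom == v.chrom and i.start <= v.pos < i.end}
        links[v.name] = sorted(genes)
    return links


# Hypothetical interacting fragments and variants, for illustration only.
toy_interactions = [
    Interaction("GENE_A", "chr1", 1000, 1500),
    Interaction("GENE_B", "chr1", 1400, 1800),
    Interaction("GENE_A", "chr2", 1000, 1500),
]
toy_variants = [
    Variant("var1", "chr1", 1450),
    Variant("var2", "chr1", 2000),
    Variant("var3", "chr2", 1499),
]
print(link_variants(toy_variants, toy_interactions))
```

A linear scan is enough for a toy example; with genome-scale data an interval index would be the more practical choice.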
Expanding the scope of gene regulation research
Look for the Hughes group’s upcoming manuscript that describes their recent improvements to the target capture step of the NG Capture-C method, and how it provides a completely scalable, sensitive, multiplexed assay for chromatin conformation capture analysis. The group is also designing online tools to help collaborators and other researchers interested in applying the method to their own research questions.
With a large proportion of all cis-regulatory elements within the genomes of hundreds of cell types already identified, the NG Capture-C high-throughput approach to analyzing multiple specific cis interactions at high resolution in a single experiment will facilitate linking these elements to the genes they control and determining how their variants alter gene expression. This method should also help identify functional DNA variants underlying complex diseases.