Until recently, many of these complex regions could be compared to the far side of the moon: known to exist, but unseen.
When the Human Genome Project first launched in 1990, technological limitations made it impossible to fully uncover repetitive regions in the genome. Available sequencing technology could only read about 500 nucleotides at a time, and these short fragments had to overlap one another in order to recreate the full sequence. Researchers used these overlapping segments to identify the next nucleotides in the sequence, incrementally extending the genome assembly one fragment at a time.
Assembling these repetitive regions was like putting together a 1,000-piece puzzle of an overcast sky: When every piece looks the same, how do you know where one cloud starts and another ends? With near-identical overlapping stretches in many spots, fully sequencing the genome piece by piece became unfeasible. Millions of nucleotides remained hidden in the first iteration of the human genome.
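To make the puzzle problem concrete, here is a minimal Python sketch of extending an assembly by overlap. It is a toy illustration, not the software the Human Genome Project used: the function names, the made-up sequences and the minimum overlap lengths are all assumptions chosen for the example.

```python
def overlap(assembled: str, read: str, min_len: int) -> int:
    """Length of the longest end of `assembled` that matches the start of `read`.

    Returns 0 if no match of at least `min_len` letters exists.
    """
    for k in range(min(len(assembled), len(read)), min_len - 1, -1):
        if assembled.endswith(read[:k]):
            return k
    return 0


def next_extensions(assembled: str, reads: list[str], min_len: int) -> set[str]:
    """Collect every distinct stretch of new letters the reads could add."""
    extensions = set()
    for read in reads:
        k = overlap(assembled, read, min_len)
        if 0 < k < len(read):
            extensions.add(read[k:])
    return extensions


# Unique sequence: only one read overlaps the end, so the next step is clear.
print(sorted(next_extensions("ATGCGT", ["GCGTAC", "TACCTG"], min_len=3)))
# → ['AC']

# Repetitive sequence: two reads overlap the same "GAGA" ending but disagree
# about what comes next, so the assembler cannot tell which one to follow.
print(sorted(next_extensions("TTCGAGA", ["GAGAGA", "GAGACC"], min_len=4)))
# → ['CC', 'GA']
```

In the second case, both reads fit equally well, which is the puzzle-of-an-overcast-sky situation: the overlap alone cannot say whether the repeat continues or ends.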
Since then, sequence patches have gradually filled in gaps of the human genome. And in 2021, the Telomere-to-Telomere (T2T) Consortium, an international group of scientists working to complete a human genome assembly from end to end, announced that all remaining gaps were finally filled.
This was made possible by improved sequencing technology capable of reading longer sequences thousands of nucleotides in length. With more information to situate repetitive sequences within a larger picture, it became easier to identify their proper place in the genome. Like simplifying a 1,000-piece puzzle to a 100-piece puzzle, long-read sequences made it possible to assemble large repetitive regions for the first time.
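A small Python sketch can show why read length matters. It counts how many places a read could fit in a made-up genome that contains two copies of a repeat. The genome string, the read lengths and the function name `placements` are hypothetical, and real long-read assemblers must also cope with sequencing errors, which this toy ignores.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start position where `read` matches `genome` exactly."""
    return [
        i
        for i in range(len(genome) - len(read) + 1)
        if genome.startswith(read, i)
    ]


# A toy genome: two copies of the same repeat, separated by unique sequence.
repeat = "GAGAGAGAGA"
genome = "ACGT" + repeat + "TTCA" + repeat + "CCTG"

# A short read taken from inside the repeat fits in six different places.
short_read = "GAGAGA"
print(placements(genome, short_read))
# → [4, 6, 8, 18, 20, 22]

# A long read that spans the first repeat plus unique sequence on both
# sides fits in exactly one place.
long_read = genome[2:18]
print(placements(genome, long_read))
# → [2]
```

The long read carries the unique letters on either side of the repeat along with it, so there is only one spot where it can go.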
With the increasing power of long-read DNA sequencing technology, geneticists are positioned to explore a new era of genomics, untangling complex repetitive sequences across populations and species for the first time. And a complete, gap-free human genome provides an invaluable resource for researchers to investigate repetitive regions that shape genetic structure and variation, species evolution and human health.
But one complete genome doesn't capture it all. Efforts continue to create diverse genomic references that fully represent the human population and life on Earth. With more complete, "telomere-to-telomere" genome references, scientists' understanding of the repetitive dark matter of DNA will become clearer.
Gabrielle Hartley is a Ph.D. candidate in molecular and cell biology at the University of Connecticut. She receives funding from the National Science Foundation.
This article is republished from The Conversation under a Creative Commons license.