The development of a reference genome was critical for progress in human genomics and central to the sequencing revolution, serving as a foundational tool for sequence alignment and genome assembly methods. The initial draft of the human genome and all subsequent patch updates have covered only the euchromatic regions, which comprise roughly 92% of the genome. Addressing the remaining 8%, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair sequence of a human genome, representing the largest improvement to the human reference genome since its initial release. Improvements to the reference genome have tremendous downstream impact on research and engineering in genomics: because the reference is such a foundational coordinate system, any change to it affects everything that relies on it. In particular, new sequencing data can be mapped more accurately against a complete reference.

2023-05-13: We keep “completing” the human genome, now with more pangenome.
Reference genomes are crucial coordinate systems for genomic analyses. However, the references that scientists currently work from when studying humans (the draft human genome and its complete, gap-free successor, dubbed T2T-CHM13) are both based mostly on single individual genomes. A linear genome sequence of this type cannot adequately represent genetic diversity within our species. Instead, such diversity is more accurately described using a graph-based system of branching and merging paths, known as a pangenome; this work presents the first human reference pangenome. Using the pangenome for read mapping and variant calling resulted in 34% fewer errors in calling small variants (those shorter than 50 bases) than did using a linear reference. The difference was particularly pronounced in challenging repetitive DNA regions. Impressively, the pangenome identified twice as many large genomic alterations, called structural variants, per person as is possible using a linear reference. However, challenges remain. Alignment of sequences against highly variable repetitive regions in the pangenome could be improved by more accurate assemblies or new algorithms. More samples from diverse groups are also needed. Finally, widespread adoption of the pangenome by scientists could take time, because new methods supporting pangenome analysis are still being developed, and scientists will often require training to use them.
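
To make "branching and merging paths" concrete, here is a minimal toy sketch of a sequence graph in Python. Everything in it is an assumption for illustration: the node IDs, the sequences, and the path names (`linear_ref`, `hap_A`, `hap_B`) are made up, and real pangenome tools use far richer data structures. Nodes hold short DNA segments, and each haplotype is a walk through the nodes; places where walks diverge and rejoin correspond to variants.

```python
from collections import defaultdict

# Toy sequence graph (hypothetical data): each node holds a DNA segment.
nodes = {
    1: "ACGT",
    2: "G",     # one allele at a single-base variant site
    3: "T",     # the other allele at the same site
    4: "CCA",
    5: "TTTT",  # inserted segment carried by only some haplotypes
    6: "GA",
}

# Each haplotype is a path (ordered list of node IDs) through the graph.
paths = {
    "linear_ref": [1, 2, 4, 6],
    "hap_A": [1, 3, 4, 6],
    "hap_B": [1, 2, 4, 5, 6],
}


def spell(path):
    """Concatenate node sequences along a path to recover the haplotype."""
    return "".join(nodes[n] for n in path)


def successors(paths):
    """Map each node to the set of nodes that follow it on any path."""
    succ = defaultdict(set)
    for p in paths.values():
        for a, b in zip(p, p[1:]):
            succ[a].add(b)
    return succ


def branch_points(paths):
    """Nodes with more than one successor, i.e. where paths diverge."""
    return {n: s for n, s in successors(paths).items() if len(s) > 1}


if __name__ == "__main__":
    for name, path in paths.items():
        print(name, spell(path))
    print("branch points:", branch_points(paths))
```

A linear reference corresponds to keeping only the `linear_ref` path: reads from `hap_A` or `hap_B` would then show up as mismatches or unaligned insertions, whereas in the graph they follow an existing path exactly.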

2023-08-31: Even more complete, now with more Y chromosome
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure, which includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence, and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.