Tag: dna

Genome Tectonics

Researchers tracked changes in chromosomes that occurred as much as 800 million years ago. They identified 29 big blocks of genes that remained recognizable as they passed into 3 of the earliest subdivisions of multicellular animal life. Using those blocks as markers, the scientists deduced how the chromosomes fused and recombined as those early groups of animals became distinct. The researchers call this approach “genome tectonics.” Researchers can trace the evolution of entire chromosomes back to their origin. They can then use that information to make statistical predictions and rigorously test hypotheses about how groups of organisms are related.

But what would cause blocks of genes to stay linked together? One explanation for this phenomenon, which is called synteny, relates to gene function. It may be more efficient for genes that work together to also be physically located together; that way, when a cell needs to transcribe genes, it doesn’t have to coordinate transcription from multiple locations on different chromosomes. Unless a chromosome rearrangement confers a big functional advantage, it’s inherently hard for the rearrangement to spread. And rearrangements are typically not advantageous: During meiosis and the formation of gametes, all chromosomes need to pair up with a matching partner. Without a partner, an odd-sized chromosome won’t become part of a viable gamete, so it is unlikely to make it into the next generation. Small mutations that reshuffle the gene order within chromosomes can still occur.
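
To make the "blocks as markers" idea concrete, here is a toy sketch in Python. It is not the researchers' method: the gene names, the block labels and the `min_genes` cutoff are all made up for illustration. The only idea it borrows is that if one chromosome carries many genes from 2 different ancestral blocks, a fusion is a plausible explanation.

```python
from collections import Counter

# Hypothetical mapping from gene to ancestral block (the real study used 29 blocks).
gene_to_block = {
    "g1": "A", "g2": "A", "g3": "A",
    "g4": "B", "g5": "B", "g6": "B",
}

# Hypothetical genomes: chromosome name -> genes in order along the chromosome.
species_x = {"chr1": ["g1", "g2", "g3"], "chr2": ["g4", "g5", "g6"]}
species_y = {"chr1": ["g1", "g4", "g2", "g5", "g3", "g6"]}


def block_composition(chromosome_genes, gene_to_block):
    """Count how many genes on one chromosome come from each ancestral block."""
    return Counter(
        gene_to_block[gene] for gene in chromosome_genes if gene in gene_to_block
    )


def likely_fusions(genome, gene_to_block, min_genes=2):
    """Return chromosomes that carry at least `min_genes` genes from 2+ blocks."""
    fused = {}
    for chrom, genes in genome.items():
        counts = block_composition(genes, gene_to_block)
        blocks = sorted(b for b, n in counts.items() if n >= min_genes)
        if len(blocks) > 1:
            fused[chrom] = blocks
    return fused


fusions_in_x = likely_fusions(species_x, gene_to_block)
fusions_in_y = likely_fusions(species_y, gene_to_block)
```

In this toy data, species X keeps blocks A and B on separate chromosomes, while species Y carries both on `chr1`, so only Y is flagged. The sketch ignores gene order, so it cannot tell a simple end-to-end fusion from one where the genes have since intermixed.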

Universal Tick Vaccine

Over 10 diseases can be transmitted by tick bites. The most well-known is Lyme disease, caused by a bacterium called Borrelia burgdorferi. In the past, vaccines have successfully been developed to specifically target this Lyme disease bacterium. However, this new vaccine candidate takes a different approach, using mRNA technology to target the tick itself. This particular vaccine directs cells to produce a number of proteins found in the saliva of the black-legged tick Ixodes scapularis. This vaccine is unique in the way it targets a carrier of a pathogen rather than the pathogen itself. This means it should offer a broad-based protection from all kinds of tick-induced disease and not just a single pathogen. “When you feel a mosquito bite, you swat it. With the vaccine, there is redness and likely an itch so you can recognize that you have been bitten and can pull the tick off quickly, before it has the ability to transmit B. burgdorferi.”

2022-02-24: A gene drive might be an alternative:

This approach is already being applied to malaria-transmitting mosquitoes, but scientists have run into a wall trying to use CRISPR to prevent tick-borne diseases — or, more accurately, a hard shell. The problem is that scientists need to be able to insert their CRISPR system into ticks when they’re at the embryo stage. But ticks grow in eggs coated in a hard wax, which can literally shatter the glass needles used for injections. “Previously, no lab has demonstrated genome modification is possible in ticks. Some considered this too technically difficult to accomplish.” They have now demonstrated 2 different techniques that make gene editing a viable option for fighting tick-borne diseases. So far, all we know is that it’s possible to get a CRISPR system into ticks — we still don’t know what edits, if any, can prevent the spread of tick-borne diseases.

RNAi pesticides

If you could introduce dsRNA into a pesky pathogen (a particularly irritating fungus, for example), you could instruct that pathogen’s cells to destroy its own mRNA and stop it from making crucial proteins. In essence, you could switch off genes within pathogens at will. RNA crop sprays could have some major advantages over the current toolbox of chemical-based pesticides. Microbes break down RNA in the soil within a couple of days, which lessens the problem of environmental buildup. And because RNA sprays would target genes specific to individual species, there is, at least theoretically, a much lower chance that other organisms would get caught in the crossfire. Even 2 very similar species have enough genetic differences that it’s possible to make RNA sprays that target one bug while leaving the other one alone. Resistance is always a concern. “It’s unavoidable. But we will do everything we can to make sure that growers use the products the way we believe minimizes that risk.” Growers might be directed to use dsRNA only at certain times of the year, and because RNA breaks down so quickly in the environment, pests are less likely to be exposed long enough to develop resistance. RNA sprays will likely be mixed with existing pesticides, attacking pests from several angles rather than taking a single one-spray-to-kill-them-all approach. “It’s [reducing] the number of ag chemicals that are used, but not full replacement of them”.

Alignment-free Sequencing

Single-cell RNA sequencing (scRNA-seq) largely relies on a reference genome to which new sequencing reads can be aligned. Unfortunately, that rules out 99.9% of organisms! “Single-cell transcriptomics for the 99.9% of species without reference genomes” proposes a new computational pipeline called Kmermaid that relies on k-mers to obviate the need for a reference genome when using scRNA-seq. The first step processes the reads into amino acid translation frames, because “protein sequences are more evolutionarily conserved than the underlying DNA.” The last step uses these k-mer representations to search a database of expression profiles for common cell types to make the final prediction. This constitutes an exciting new paradigm for alignment-free cross-species prediction of cell types that throws out far less data.
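
A minimal Python sketch of why translating reads before taking k-mers helps. This is not Kmermaid's code. The function names, the choice of k=5, the decision to split frames at stop codons, and the use of Jaccard similarity are all assumptions made for illustration. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(read):
    """Translate a read in all 3 offsets on both strands."""
    return [
        translate(strand[offset:])
        for strand in (read, reverse_complement(read))
        for offset in range(3)
    ]


def protein_kmers(read, k=5):
    """Collect amino acid k-mers from all 6 frames, never spanning a stop codon."""
    kmers = set()
    for frame in six_frames(read):
        for segment in frame.split("*"):
            for i in range(len(segment) - k + 1):
                kmers.add(segment[i:i + k])
    return kmers


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


# Two hypothetical reads that differ at several synonymous positions
# but encode the same peptide.
read_a = "ATGGCTAAAGGTGAACTGCTG"
read_b = "ATGGCCAAGGGAGAGTTACTT"
shared = protein_kmers(read_a) & protein_kmers(read_b)
```

Both reads translate to the same peptide in the first frame, so they share amino acid k-mers even though their DNA differs at 6 positions. That is the property the quoted sentence about conserved protein sequences is leaning on.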

Completed Human Genome

The development of a reference genome was absolutely critical for progress in human genomics, and was of central importance in the sequencing revolution, serving as a foundational tool for sequence alignment methods as well as genome assembly methods. The initial draft of the human genome and all following patch updates have consisted of the euchromatic regions, which comprise roughly 92% of the genome. Addressing the remaining 8%, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055b base pair sequence of a human genome, representing the largest improvement to the human reference genome since its initial release. Improvements to the reference genome have a tremendous downstream impact on research and engineering in genomics. Because it is such a foundational coordinate system, it affects everything that relies on it. This means that all new sequencing data can be more accurately mapped with a complete reference.

2023-05-13: We keep “completing” the human genome, now with more pangenome.

Reference genomes are crucial coordinate systems for genomic analyses. However, the references that scientists currently work from when studying humans (the draft human genome and its complete, gap-free successor, dubbed T2T-CHM13) are both based mostly on single individual genomes. A linear genome sequence of this type cannot adequately represent genetic diversity within our species. Instead, such diversity is more accurately described using a graph-based system of branching and merging paths, the first human reference pangenome. Using the pangenome for read mapping and variant calling resulted in 34% fewer errors in calling small variants (those shorter than 50 bases) than did using a linear reference. The difference was particularly pronounced in challenging repetitive DNA regions. Impressively, the pangenome identified 2x as many large genomic alterations, called structural variants, per person than is possible using a linear reference. However, challenges remain. Alignment of sequences against highly variable repetitive regions in the pangenome could be improved by more-accurate assemblies or new algorithms. More samples from diverse groups are also needed. Finally, widespread adoption of the pangenome by scientists could take time, because new methods supporting pangenome analysis are continually being developed, and scientists will often require training to use them.
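
A toy Python sketch of what "branching and merging paths" means. The node sequences, node ids and the read are invented, and real pangenome tools use far more elaborate graph aligners. Substring matching here is only a stand-in for read mapping.

```python
# Hypothetical variation graph: node id -> sequence, and node id -> successors.
# Nodes 2 and 3 are alternative alleles (A vs G) at the same position.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}

# A linear reference can only hold one of the two alleles.
linear_reference = "ACGTATTCA"


def spell_paths(nodes, edges, start, end):
    """Yield the sequence spelled by every path from `start` to `end`."""
    stack = [(start, nodes[start])]
    while stack:
        node, seq = stack.pop()
        if node == end:
            yield seq
            continue
        for nxt in edges[node]:
            stack.append((nxt, seq + nodes[nxt]))


graph_sequences = set(spell_paths(nodes, edges, start=1, end=4))

# A hypothetical read from a person carrying the G allele.
read = "GTGTT"
maps_to_linear = read in linear_reference
maps_to_graph = any(read in seq for seq in graph_sequences)
```

The read matches a path through the graph exactly but has no exact match in the linear reference, where it would show up as a mismatch. Enumerating every path only works for toy graphs, since the number of paths grows quickly with each added bubble.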


2023-08-31: Even more complete, now with more Y chromosome

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Cell Size Regulation

“It’s been a profound mystery for many, many decades in biology, how cells are able to accomplish this task of almost magically knowing what their size is”. Shape and size regulation are important because they are closely tied to how a cell functions: Too large and it can be difficult for the cell to quickly retrieve information contained in its own DNA; too small and the cell doesn’t have enough space to split properly, causing errors in division and growth that could lead to disease. The secret to cell size regulation lies in the concentration of KRP4 in each new cell. Though the daughter cells inherit an equal amount of KRP4, because they might be different sizes, the concentration of this protein in each cell isn’t necessarily the same. Smaller cells started with a higher concentration of KRP4 and spent more time growing. For bigger cells, the concentration was diluted, so they grew less. Overall, this balanced out any asymmetries in cell size.
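
The dilution argument can be written as a small Python model. This is a sketch under assumptions the text does not state: that a cell keeps growing until its KRP4 concentration falls to some fixed threshold, and that growth is exponential. The parameter values are arbitrary and unitless.

```python
import math


def krp4_concentration(size, krp4_amount=1.0):
    """Concentration of a fixed inherited amount of KRP4 in a cell of given size."""
    return krp4_amount / size


def size_when_growth_stops(birth_size, krp4_amount=1.0, threshold=0.5):
    """Size at which concentration has been diluted down to the assumed threshold."""
    target = krp4_amount / threshold
    return max(birth_size, target)


def growth_time(birth_size, krp4_amount=1.0, threshold=0.5, growth_rate=0.1):
    """Time of exponential growth needed to dilute KRP4 down to the threshold."""
    target = krp4_amount / threshold
    if birth_size >= target:
        return 0.0
    return math.log(target / birth_size) / growth_rate


# Two daughters of an asymmetric division: same KRP4 amount, different sizes.
small, large = 0.8, 1.2
small_time = growth_time(small)
large_time = growth_time(large)
```

With these made-up numbers the smaller daughter starts at a higher concentration (1.25 vs about 0.83), grows for longer, and both end up at the same size of 2.0. That is the balancing-out the paragraph describes.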

Retron Library Recombineering

RLR generates up to millions of mutations simultaneously, and “barcodes” mutant cells so that the entire pool can be screened at once, enabling massive amounts of data to be easily generated and analyzed. “RLR enabled us to do something that’s impossible to do with CRISPR: we randomly chopped up a bacterial genome, turned those genetic fragments into single-stranded DNA in situ, and used them to screen millions of sequences simultaneously. RLR is a simpler, more flexible gene editing tool that can be used for highly multiplexed experiments, which eliminates the toxicity often observed with CRISPR and improves researchers’ ability to explore mutations at the genome level.”

City Microbiomes

Researchers took 4700 samples from mass transit systems in 60 cities across the world, swabbing common touch points like turnstiles and railings in bustling subways and bus stations. Using metagenomic sequencing, they created a global atlas of the urban microbial ecosystem, the first systematic catalog of its kind. The results suggest that no 2 cities are alike, with each major metropolis studied so far revealing a unique molecular echo of the microbial species that inhabit it, distinct from populations found in other urban environments.

2023-10-12: Dark matter

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 27k metagenomes and identify 1.17b protein sequences longer than 35 amino acids with no similarity to any sequences from reference genomes. Using massively parallel graph-based clustering, we group these proteins into 106k novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
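
A rough Python sketch of the general shape of graph-based clustering: proteins are nodes, similarity hits are edges, and clusters above a size cutoff are kept. The abstract does not say which clustering algorithm was used, so this swaps in plain connected components via union-find as a stand-in. The protein ids, the similarity pairs and the `min_members` parameter are hypothetical.

```python
from collections import defaultdict


def cluster_proteins(similar_pairs, min_members=100):
    """Group proteins into connected components of the similarity graph and
    keep only clusters with more than `min_members` members."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return [members for members in groups.values() if len(members) > min_members]


# Hypothetical similarity hits between 5 proteins.
pairs = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
clusters = cluster_proteins(pairs, min_members=2)
```

With the toy cutoff of 2, only the 3-member group survives and the pair p4/p5 is dropped, which mirrors the "more than 100 members" filter at a much smaller scale. Connected components will happily chain loosely related proteins together, which is one reason real pipelines use more careful clustering.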

Transposons

Scientists have long known that transposons can fuse with established genes because they have seen the unique genetic signatures of transposons in a handful of them, but the precise mechanism behind these unlikely fusion events has largely been unknown. By analyzing genes with transposon signatures from nearly 600 tetrapods, the researchers found 106 distinct genes that may have fused with a transposon. The human genome carries 44 genes likely to have been born this way.

The structure of genes in eukaryotes is complicated, because their blueprints for making proteins are broken up by introns. These noncoding sequences are transcribed, but they get snipped out of the messenger RNA transcripts before translation into protein occurs. A transposon can occasionally hop into an intron and change what gets translated. In some of these cases, the protein made by the fusion gene is a mashup of the original product and the transposon’s splicing enzyme (transposase).

Once the fusion protein is created, “it has a ready-made set of potential binding sites scattered all over the genome”, because its transposase part is still drawn to transposons. The more potential binding sites for the fusion protein, the higher the likelihood that it changes gene expression in the cell, potentially giving rise to new functions. “These aren’t just new genes, but entire new architectures for proteins”.

2023-03-30: Introns might be parasitic

If introners (transposable elements that create new introns where they insert) find their way into hosts primarily through horizontal gene transfers in aquatic environments, that could explain the irregular patterns of big intron gains in eukaryotes. Terrestrial organisms aren’t likely to have the same bursts of introns, since horizontal transfer occurs far less often among them. The transferred introns could persist in genomes for many millions of years as permanent souvenirs from an ancestral life in the sea and a fateful brush with a deft genomic parasite.

Introners acting as foreign, invasive elements in genomes could also be the explanation for why they would insert introns so suddenly and explosively. Defense mechanisms that a genome might use to suppress its inherited burden of transposons might not work on an unfamiliar genetic element arriving by horizontal transfer.