Tag: google

Lead Gen Spam

When you dial any of the listings you are put through to a sophisticated automated call tree that asks the same qualifying questions each time: whether you already have insurance, whether you have been continuously insured for 12 months, what your zip code is, and which insurance carrier you currently have. Based on the answers, the system routes you to an insurance agent who can sell you a competing product. The idea is that if you already have Allstate and are calling for “cheap car insurance”, you must want a different brand.
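The routing logic itself is trivial. A minimal sketch of what such a qualifier might look like (the names and carrier list are hypothetical, just to make the flow concrete):

```python
# Toy sketch of a lead-gen call tree: ask fixed qualifying questions,
# then route the caller to an agent selling a brand they don't have.
# All names here are hypothetical illustrations, not a real system.

CARRIERS = ["Allstate", "Geico", "Progressive", "State Farm"]

def qualify(has_insurance: bool, insured_12_months: bool,
            zip_code: str, current_carrier: str | None) -> str:
    if not has_insurance:
        return "route: any carrier's agent"  # uninsured leads go anywhere
    # Insured callers are routed toward a *competing* product.
    competitors = [c for c in CARRIERS if c != current_carrier]
    tier = "preferred" if insured_12_months else "standard"
    return f"route: {competitors[0]} agent ({tier} lead, zip {zip_code})"

print(qualify(True, True, "94105", "Allstate"))
# route: Geico agent (preferred lead, zip 94105)
```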

Solving protein structures

This explains a lot about why pharma companies are so terrible at coming up with new drugs.

There is perhaps no better example of this than protein structure prediction, a problem that is very close to these companies’ core interest (along with docking), but on which they have spent virtually no resources. The little research on these problems done at pharmas is almost never methodological in nature, instead being narrowly focused on individual drug discovery programs. While the latter is important and obviously contributes to their bottom line, much like similar research done at tech companies, the lack of broadly minded basic research may have robbed biology of decades of progress, and contributed to the ossification of these companies’ software and machine learning expertise.

2020-11-30: Nature perspective on AlphaFold 1

DeepMind has made a gargantuan leap in solving one of biology’s grandest challenges — determining a protein’s 3D shape from its amino-acid sequence. “This is a big deal. In some sense the problem is solved.”

Perspective by someone in the field

Which brings me to what I think is the most exciting opportunity of all: the prospect of building a structural systems biology. In almost all forms of systems biology practiced today, from the careful and quantitative modeling of the dynamics of a small cohort of proteins to the quasi-qualitative systems-wide models that rely on highly simplified representations, structure rarely plays a role. This is unfortunate because structure is the common currency through which everything in biology gets integrated, both in terms of macromolecular chemistries, i.e., proteins, nucleic acids, lipids, etc, but also in terms of the cell’s functional domains, i.e., its information processing circuitry, its morphology, and its motility. A structural systems biology would take this seriously, deriving the rate constants of enzymatic and metabolic reactions, protein-protein binding affinities, and protein-DNA interactions all from structural models. We don’t yet know how much easier, if at all, it will be to predict these types of quantities from structure than from sequence—we need to put the dogma of “structure determines function” to the test. Even if the dogma were to fail in some instances, which it almost certainly will, partial success will open up new avenues.
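As a concrete (and deliberately crude) illustration of what deriving such quantities from structure could mean: empirical scoring functions estimate binding free energy from geometric features of a modeled complex, such as the hydrophobic surface area buried on binding. A toy sketch, assuming the classic rule of thumb of roughly 25 cal/mol per Å² of buried hydrophobic surface; the real task is vastly harder than this:

```python
import math

# Toy structural-systems-biology calculation: turn a purely geometric
# feature of a modeled complex (buried hydrophobic interface area) into
# a crude binding-affinity estimate. The 25 cal/mol/A^2 figure is a
# classic rule of thumb for the hydrophobic effect, not a production
# scoring function; real affinity prediction is far harder.

RT = 0.593  # kcal/mol at ~298 K

def toy_kd_from_structure(buried_area_A2: float, hydrophobic_frac: float) -> float:
    """Order-of-magnitude Kd (molar) from buried interface area."""
    dG = -0.025 * buried_area_A2 * hydrophobic_frac  # kcal/mol
    return math.exp(dG / RT)                          # Kd = exp(dG/RT)

# A typical protein-protein interface buries ~1600 A^2 in total,
# of which perhaps a quarter is hydrophobic.
print(f"Kd ~ {toy_kd_from_structure(1600, 0.25):.1e} M")  # ~5e-08 M
```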

2021-07-23: AlphaFold 2

DeepMind has used its AI to predict the shapes of nearly every protein in the human body, as well as the shapes of hundreds of thousands of other proteins found in 20 of the most widely studied organisms, including yeast, fruit flies, and mice. So far the trove consists of 350k newly predicted protein structures. DeepMind says it will predict and release the structures for more than 100m more in the next few months—more or less all proteins known to science. In the new version of AlphaFold, predictions come with a confidence score that the tool uses to flag how close it thinks each predicted shape is to the real thing. Using this measure, DeepMind found that AlphaFold predicted shapes for 36% of human proteins with an accuracy that is correct down to the level of individual atoms. Previously, after decades of work, only 17% of the proteins in the human body had had their structures determined in the lab.

Drug discovery is all about those biological effects – what else could it be concerned with? And these are higher-order things than just the naked protein structure, as valuable as that can be. Remember, our failure rate in the clinic is around 90% overall, and none of those failures were due to lack of a good protein structure. They were caused by much harder problems: what those proteins actually do in a living cell, how those functions differ in health and disease, how they differ between different sorts of human patients and between humans in general and the animal models that were used to develop the compounds, what other protein targets the drug candidate might have hit and the downstream effects (usually undesirable) that those kicked off, and on and on. So structural biology has been greatly advanced by these new tools. But it has not been outmoded, replaced, or rendered irrelevant. It’s more relevant than ever, and now we can get down to even bigger questions with it.
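The per-residue confidence score mentioned above (pLDDT) is stored in the B-factor column of the PDB files AlphaFold produces, so filtering predictions by confidence needs nothing more than a PDB parse. A minimal sketch (the file name is a hypothetical example; the B-factor convention is AlphaFold’s documented one):

```python
# Minimal sketch: read per-residue confidence (pLDDT) out of an
# AlphaFold-predicted PDB file. AlphaFold stores pLDDT (0-100) in the
# B-factor column of ATOM records; the file name below is hypothetical.

def mean_plddt(pdb_path: str) -> float:
    scores = []
    with open(pdb_path) as fh:
        for line in fh:
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                scores.append(float(line[60:66]))  # B-factor, columns 61-66
    return sum(scores) / len(scores)

score = mean_plddt("AF-P69905-F1-model_v4.pdb")
print(f"mean pLDDT: {score:.1f}",
      "(>90 is considered high accuracy)" if score > 90 else "")
```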

2022-04-12: Protein complexes

ColabFold later incorporated the ability to predict complexes. And in October 2021, DeepMind released an update called AlphaFold-Multimer that was specifically trained on protein complexes, unlike its predecessor. It predicted around 70% of the known protein–protein interactions.
Elofsson’s team used AlphaFold to predict the structures of 65k human protein pairs that were suspected to interact on the basis of experimental data. And a team led by Baker used AlphaFold and RoseTTAFold to model interactions between nearly every pair of proteins encoded by yeast, identifying more than 100 previously unknown complexes. Such screens are just starting points. They do a good job of predicting some protein pairings, particularly those that are stable, but struggle to identify more transient interactions. “Because it looks nice doesn’t mean it is correct. You need some experimental data that show you’re right.”
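Mechanically, these interaction screens are just an all-vs-all loop over candidate pairs, ranked by the model’s interface confidence. A sketch of the shape of such a screen; `predict_complex` and its ipTM-style score are hypothetical stand-ins for whatever multimer pipeline is used:

```python
from itertools import combinations

def predict_complex(seq_a: str, seq_b: str) -> float:
    """Hypothetical stand-in for an AlphaFold-Multimer/ColabFold run.

    A real implementation would fold the pair and return an interface
    confidence (ipTM-like, 0-1); this stub just makes the sketch runnable.
    """
    return 0.5

def screen(proteomes: dict[str, str], threshold: float = 0.75) -> list:
    """All-vs-all interaction screen over a set of protein sequences."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(proteomes.items(), 2):
        confidence = predict_complex(seq_a, seq_b)  # the expensive step
        if confidence >= threshold:
            hits.append((name_a, name_b, confidence))
    # High-scoring pairs are candidates only; as the quote says, you
    # still need experimental data to show you're right.
    return sorted(hits, key=lambda h: -h[2])
```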
Attempts to apply AlphaFold to various mutations that disrupt a protein’s natural structure, including one linked to early-onset breast cancer, have confirmed that the software is not equipped to predict the consequences of new mutations in proteins, since there are no evolutionarily-related sequences to examine.
The AlphaFold team is now thinking about how a neural network could be designed to deal with new mutations. This would require the network to better predict how a protein goes from its unfolded to its folded state. That would probably need software that relies only on what it has learnt about protein physics to predict structures. “One thing we are interested in is making predictions from single sequences without using evolutionary information. That’s a key problem that does remain open.”
AlphaFold-inspired tools could be used to model not just individual proteins and complexes, but entire organelles or even cells down to the level of individual protein molecules. “This is the dream we will follow for the next decades.”

2022-07-28: AlphaFold goes from 350k to 214m predictions.

Researchers have used AlphaFold to predict the structures of 214m proteins from 1m species, covering nearly every known protein on the planet. According to EMBL-EBI, around 35% of the 214m predictions are deemed highly accurate, which means they are as good as experimentally determined structures. Another 45% were deemed confident enough to rely on for many applications. DeepMind has committed to supporting the database for the long haul, with updates expected to occur roughly annually.
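All 214m predictions are served from the AlphaFold database at EMBL-EBI, keyed by UniProt accession. A minimal sketch of pulling one down (the URL pattern and “model_v4” suffix reflect the database layout at the time of writing and may change):

```python
import urllib.request

# Fetch one predicted structure from the AlphaFold DB by UniProt
# accession. The URL pattern and "model_v4" suffix are assumptions based
# on the database layout at the time of writing; check the AlphaFold DB
# docs if this 404s.

def fetch_prediction(uniprot_id: str, out_path: str) -> None:
    url = f"https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v4.pdb"
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        out.write(resp.read())

fetch_prediction("P69905", "hemoglobin_alpha.pdb")  # human hemoglobin alpha
```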

2022-08-03: AlphaFold is open source with no commercial restrictions. What is the end game for DeepMind?

DeepMind has made policy decisions that have played a significant part in the transformation in structural biology. This includes its decision last July to make the code underlying AlphaFold open source, so that anyone can use the tool. Earlier this year, the company went further and lifted a restriction that hampered some commercial uses of the program. It has also helped to establish, and is financially supporting, the AlphaFold database maintained with EMBL-EBI. DeepMind deserves to be commended for this commitment to open science.

2022-11-02: Meta enters the fold with a large language model. The amazing generality of language models continues.

ESMFold isn’t quite as accurate as AlphaFold, but it is 60x faster at predicting structures. “What this means is that we can scale structure prediction to much larger databases.”

As a test case, they applied their model to a database of bulk-sequenced ‘metagenomic’ DNA from environmental sources including soil, seawater, the human gut, skin and other microbial habitats. The vast majority of the DNA entries — which encode potential proteins — come from organisms that have never been cultured and are unknown to science. The team predicted the structures of 617m proteins. Of these 617m predictions, the model deemed 33% to be high quality. Millions of these structures are entirely novel, and unlike anything in databases of protein structures determined experimentally or in the AlphaFold database of predictions from known organisms. A good chunk of the AlphaFold database is made of structures that are nearly identical to each other, and ‘metagenomic’ databases “should cover a large part of the previously unseen protein universe”.
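ESMFold’s speed comes from replacing AlphaFold’s multiple-sequence-alignment search with a protein language model, so prediction is a single forward pass from one sequence. A sketch following the published facebookresearch/esm README (GPU assumed; the sequence here is an arbitrary example):

```python
import torch
import esm

# Single-sequence structure prediction with ESMFold: no MSA search,
# just one forward pass through a protein language model. Usage follows
# the facebookresearch/esm README; the sequence is an arbitrary example.

model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"

with torch.no_grad():
    pdb_str = model.infer_pdb(sequence)  # returns a PDB-format string

with open("prediction.pdb", "w") as fh:
    fh.write(pdb_str)
```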

In terms of what % of protein space has been covered by these models, estimates vary widely. But it’s possible that life itself has explored all of protein space. If we take a median estimate of 10^30 proteins, and 10^8 with structure, we have a long way to go.

To examine how much of sequence space could have been explored, it is simplest to make upper and lower limit estimates for the number of unique amino acid sequences produced since the origin of life. Considering the upper limit, it is clear that bacteria dominate the planet in terms of the product of the number of cells (10^30) multiplied by the number of genes in each genome (10^4). Let us assume that every single gene in this total of 10^34 is unique and that evolution has been working on these genes for 4 Ga completely changing each gene to some other unique, new gene every single year. This gives an extreme upper limit of 4×10^43 different amino acid sequences explored since the origin of life. The contribution to this number of sequences by viral and eukaryotic genomes is difficult to estimate but it is very unlikely to be orders of magnitude greater than the 4×10^43 sequences from bacteria. If their contribution is similar or smaller, then it can be ignored in our rough calculation. A lower limit to the number of sequences explored is more difficult to estimate but it has been estimated that there are 10^9 different bacterial species on Earth. If we assume that each species has a unique complement of 10^3 sequences (an underestimate) and that only 1 sequence has changed per species per generation (a reasonable estimate based upon analysis of mutation rates in bacteria), and that the generation time is 1 year (a considerable underestimate for many modern bacteria, but perhaps reasonable for an ancient organism or one growing slowly in a poor environment), then we arrive at a figure of 4×10^21 different protein sequences tested since the origin of life.

Although the oft-quoted 10^130 size of sequence space is far above these limits, the other more plausible estimates for the size of sequence space, particularly with limited amino acid diversity or reduced length, are near to or within these 2 limits. Considering the upper limit, all sequences containing 20, 8 and 3 types of amino acids have been explored if the chains are 33, 50 and 100 amino acids in length, respectively. Considering the lower limit, then virtually all chains of length 33 and 50 amino acids containing 5 or 3 types of amino acid, respectively, could have been explored. (The exploration of longer chains of 100 amino acids with only 2 types of residue is obviously much less complete but it is not a negligible fraction of the total.) Therefore it is entirely feasible that for all practical (i.e. functional and structural) purposes, protein sequence space has been fully explored during the course of evolution of life on Earth (perhaps even before the appearance of eukaryotes).
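The headline numbers above are easy to sanity-check. A quick order-of-magnitude pass over the arithmetic, using the values as given in the quoted estimate:

```python
import math

# Order-of-magnitude check of the quoted estimates.

cells = 1e30            # bacterial cells on Earth
genes_per_genome = 1e4
years = 4e9             # ~4 Ga, one full turnover of every gene per year

upper_limit = cells * genes_per_genome * years
print(f"upper limit: {upper_limit:.0e} sequences")   # 4e+43

# Full sequence space for a 100-residue protein over 20 amino acids:
print(f"20^100 ~ 10^{100 * math.log10(20):.0f}")     # ~10^130

# Space for 33-residue chains over the full 20-letter alphabet fits
# under the upper limit:
print(f"20^33 ~ {20**33:.1e} <= {upper_limit:.0e}")  # 8.6e+42 <= 4e+43
```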


2022-11-26: An open source reimplementation of AlphaFold does even better.

OpenFold is trained from scratch. Compared to AlphaFold2, OpenFold runs on proteins that are 1.7x larger, runs 2x as fast on short proteins, and is slightly more accurate. As more people can help drive this technology, we’ll get more and better discoveries.

2023-07-03: Foldseek

Sequence searches are fast, like searching a hard drive for a file name. But they often miss good matches because proteins with similar shapes can have vastly different sequences. Structure-based search methods look for shapes instead of sequences, but this can take thousands of times longer, because it’s computationally difficult to compare complex 3D objects. With Foldseek, researchers got the best of both worlds: the software represents a protein’s shape as a string of letters — a ‘structural alphabet’ — thereby offering the sensitivity of shape-based searches but at the speed of sequence-based ones. Foldseek outperformed 2 popular structure-based search tools, TM-align and Dali — performing 24% and 8% better, respectively, while running 35k and 20k times faster. Compared with a structural-alphabet-based tool called CLE-SW, Foldseek was 23% better, and 11x as fast.
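The core trick is the discretization step: collapse each residue’s local 3D geometry to one letter from a small alphabet, after which ordinary fast string comparison stands in for expensive 3D superposition. A toy illustration of the idea (the two-angle descriptor and four-letter alphabet here are invented for brevity; Foldseek’s actual alphabet is learned from structure data and richer):

```python
from difflib import SequenceMatcher

# Toy version of a 'structural alphabet': map each residue's local
# backbone geometry (here, a fake 2D angle descriptor per residue) to
# the nearest of a few centroid states, then compare proteins as
# strings. Centroids and alphabet are invented for illustration.

CENTROIDS = {"A": (-60, -45), "B": (-120, 130), "C": (60, 45), "D": (-90, 0)}

def encode(residue_angles: list[tuple[float, float]]) -> str:
    def nearest(pt):
        return min(CENTROIDS, key=lambda k: sum((a - b) ** 2
                   for a, b in zip(CENTROIDS[k], pt)))
    return "".join(nearest(pt) for pt in residue_angles)

helix = [(-60, -45)] * 8 + [(-120, 130)] * 4   # helix then strand
other = [(-58, -47)] * 8 + [(-118, 128)] * 4   # same shape, noisy angles

s1, s2 = encode(helix), encode(other)
print(s1, s2, f"similarity={SequenceMatcher(None, s1, s2).ratio():.2f}")
# AAAAAAAABBBB AAAAAAAABBBB similarity=1.00
```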

2023-10-12: Create vaccines for predicted mutations

EVEscape is an impressive SARS-CoV-2 soothsayer. 50% of the mutations the model predicted in a region of the cell-invading spike protein most prone to change have already been observed in real-world SARS-CoV-2 variants, a figure that should grow as the virus continues to evolve. The team used the model to create a set of potential sequences for the SARS-CoV-2 spike protein, some containing as many as 46 mutations from the ancestral strain, with the hope of anticipating the virus’s future evolution and contributing to the development of experimental vaccines.

The model isn’t limited to SARS-CoV-2. It could also predict the evolution of HIV, influenza, Nipah and the virus that causes Lassa haemorrhagic fever. When a new virus with pandemic potential pops up, the team hopes to be ready with predictions for its evolution — and perhaps even vaccines based on those predictions.

AlphaZero

Amazing progress.

We are delighted to introduce the full evaluation of AlphaZero, published in the journal Science, which confirms and updates those preliminary results. It describes how AlphaZero quickly learns each game to become the strongest player in history for each, despite starting its training from random play, with no in-built domain knowledge but the basic rules of the game.

Jeff & Sanjay

“We’ve been doing it since before Google. But I don’t know why we decided it was better to do it in front of 1 computer instead of 2. I would walk from my D.E.C. research lab 2 blocks away to his D.E.C. research lab. There was a gelato store in the middle.” “So it’s the gelato store!” Sanjay said, delighted.

Eric Schmidt

COWEN: So you receive an offer to run Google. Why were you so skeptical about Google at first?

SCHMIDT: Well, I assumed that search wasn’t very important, and I assumed the ads didn’t work. I was so concerned about the ads that, after I accepted the offer — because it just seemed like it was interesting, and a lot of luck comes from doing things that are interesting, and sort of creating your own luck — I hauled the then–sales executive, whose name was Tim Armstrong, who you all know well, and I said, “Tim, prove to me that these ads work.”

Battle for the Home

While the home may be the current battleground in consumer technology, is it actually a distinct product area — a new epoch, if you will? When it came to mobile, it didn’t matter who had won in PCs; Microsoft ended up being an also-ran. The fortunes of Apple, in particular, depend on whether or not this is the case. If it is a truly new paradigm, then it is hard to see Apple succeeding. It has a very nice speaker, but everything else about its product is worse. On the other hand, the HomePod’s close connection to the iPhone and Apple’s overall ecosystem may be its saving grace: perhaps the smartphone is still what matters. More broadly, it may be the case that we are entering an era where there are new battles, the scale of which is closer to skirmishes than all-out wars a la smartphones. What made the smartphone more important than the PC was the fact that it was with you all the time. Sure, we spend a lot of time at home, but we also spend time outside (AR?), entertaining ourselves (TV and VR), or on the go (self-driving cars); the one constant is the smartphone, and we may never see anything on the scale of the smartphone wars again.

Data Factories

Facebook quite clearly isn’t an industrial site (although it operates multiple data centers with lots of buildings and machinery), but it most certainly processes data from its raw form to something uniquely valuable both to Facebook’s products (and by extension its users and content suppliers) and also advertisers (and again, all of this analysis applies to Google as well):

  • Users are better able to connect with others, find content they are interested in, form groups and manage events, etc., thanks to Facebook’s data.
  • Content providers are able to reach far more readers than they would on their own, most of whom would not even be aware said content provider exists, much less visit of their own volition.
  • Advertisers are able to maximize the return on their advertising dollars by only showing ads to individuals they believe are predisposed to like their product, making it more viable than ever before to target niches (to the benefit of their customers as well).

Against predictable travel

David Perell writes about the difficulty of finding serendipity, diversity, and “real” experiences while traveling. Google and the like have made travel destinations and experiences increasingly predictable and homogeneous.

Call me old-fashioned, but the more I travel, the less I depend on algorithms. In a world obsessed with efficiency, I find myself adding friction to my travel experience. I’ve shifted away from digital recommendations, and towards human ones. For all the buzz about landmarks and sightseeing, I find that immersive, local experiences reveal the surprising, culturally-specific ways of living and thinking that make travel educational. We overrate the importance of visiting the best places and underrate the importance of connecting with the best people. If you want to learn about a culture, nothing beats personalized time with a passionate local who can share the magic of their culture with you. There’s one problem with this strategy: this kind of travel doesn’t scale. It’s inefficient and doesn’t conform to the 80/20 rule. It’s unpredictable and things could go wrong. Travel — when done right — is challenging. Like all face-to-face interaction, it’s inefficient. The fact that an experience can’t be found in a guidebook is precisely what makes it so special. Sure, a little tip helps — go here, go there; eat here, eat there; stay here, stay there — but at the end of the day, the great pleasures of travel are precisely what you can’t find on Google. Algorithms are great at giving you something you like, but terrible at giving you something you love. Worse, by promoting familiarity, algorithms punish culture.

Google Cloud

Having invested $30b over the last 3 years in its infrastructure from hardware to submarine cables, Google has bought itself a seat at the adults’ table. The question at Next wasn’t, then, whether Google belongs in a conversation with the likes of AWS, Azure and, increasingly, Alibaba. The question is where Google is choosing to invest that capital, and how those investments are paying off. To explore that question, here are a few brief takeaways from Google Next.

  • Google Goes on Premises
  • Diverse Assets
  • Enterprise Ready
  • Serverless