Banging through it all, though, to come up with a model that fit the data, tweaking and prodding and adjusting and starting all over when it didn’t work – which is what the evolutionary algorithms did – takes something else: inhuman patience and focus. That’s what computers are really good at, relentless grinding. I can’t call it intelligence, and I can call it artificial intelligence only in the sense that an inflatable palm is an artificial tree. I realize that we do have to call it something, but the term “artificial intelligence” probably confuses more than it illuminates.
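To make the "relentless grinding" concrete, here is a minimal sketch of the mutate, evaluate, select loop that an evolutionary algorithm runs. It is a toy (1+λ) evolution strategy fitting a hypothetical straight-line model to hypothetical noise-free data; the model, data, step size, and generation counts are all illustrative assumptions, and it omits the restarts and richer model spaces that real evolutionary modeling work would use.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]  # (slope a, intercept b) of a hypothetical line model

# Hypothetical synthetic data drawn from the line y = 2x + 1 (no noise).
XS = [float(x) for x in range(10)]
YS = [2.0 * x + 1.0 for x in XS]


def mse(params: Params, xs: List[float], ys: List[float]) -> float:
    """Mean squared error of the model y = a*x + b against the data."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evolve(xs: List[float], ys: List[float], generations: int = 2000,
           offspring: int = 20, sigma: float = 0.1, seed: int = 0) -> Params:
    """(1+lambda) evolution strategy with a fixed mutation step size."""
    rng = random.Random(seed)
    best: Params = (0.0, 0.0)
    best_err = mse(best, xs, ys)
    for _ in range(generations):
        # Tweak: each child is the current best plus Gaussian noise.
        children = [(best[0] + rng.gauss(0.0, sigma),
                     best[1] + rng.gauss(0.0, sigma))
                    for _ in range(offspring)]
        # Evaluate: keep the best-fitting child of this generation.
        child = min(children, key=lambda p: mse(p, xs, ys))
        err = mse(child, xs, ys)
        # Select: the child replaces the parent only if it fits better.
        if err < best_err:
            best, best_err = child, err
    return best


if __name__ == "__main__":
    a, b = evolve(XS, YS)
    print(f"a={a:.3f}, b={b:.3f}")
```

Nothing in the loop understands lines or data; it just keeps proposing small random changes and discarding the ones that fit worse, thousands of times over.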
2022-12-09: Some new hopes for paper mining, but see this caveat.
SciHub has 88m papers. If we extrapolate the Semantic Scholar dataset statistics (2600 words per article) and allow for some loss from old or faulty PDFs, it seems reasonable to expect around 200b tokens of scientific knowledge, roughly 10x the Minerva training set of arXiv papers (21b tokens). That would be a 10x boost to the technical knowledge inside current LLMs.
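The estimate is simple multiplication, so here is a minimal sketch of it in Python. The paper count, words per paper, and Minerva token count come from the text; the tokens-per-word ratio of 1.3 is an assumed rule of thumb for English subword tokenizers, not a figure from the source, and the function names are hypothetical.

```python
# Back-of-envelope token estimate for a scientific paper corpus.
# TOKENS_PER_WORD is an assumption (rough ratio for English subword
# tokenizers); the usable fraction stands in for papers lost to old
# or faulty PDFs.

SCIHUB_PAPERS = 88_000_000      # papers in SciHub (from the text)
WORDS_PER_PAPER = 2_600         # Semantic Scholar average (from the text)
MINERVA_ARXIV_TOKENS = 21e9     # Minerva arXiv training tokens (from the text)
TOKENS_PER_WORD = 1.3           # hypothetical tokenizer ratio


def corpus_tokens(papers: int, words_per_paper: int,
                  tokens_per_word: float, usable_fraction: float) -> float:
    """Estimated tokens = papers * words/paper * tokens/word * usable fraction."""
    return papers * words_per_paper * tokens_per_word * usable_fraction


def implied_usable_fraction(target_tokens: float) -> float:
    """Fraction of papers that must survive extraction to reach target_tokens."""
    full = corpus_tokens(SCIHUB_PAPERS, WORDS_PER_PAPER, TOKENS_PER_WORD, 1.0)
    return target_tokens / full


if __name__ == "__main__":
    raw_words = SCIHUB_PAPERS * WORDS_PER_PAPER
    print(f"raw words: {raw_words / 1e9:.1f}b")
    full = corpus_tokens(SCIHUB_PAPERS, WORDS_PER_PAPER, TOKENS_PER_WORD, 1.0)
    print(f"tokens with no loss: {full / 1e9:.0f}b")
    print(f"usable fraction implied by 200b tokens: "
          f"{implied_usable_fraction(200e9):.2f}")
    print(f"ratio to Minerva arXiv set: {200e9 / MINERVA_ARXIV_TOKENS:.1f}x")
```

Under that tokenizer assumption, 88m papers at 2600 words each is about 229b words, or roughly 297b tokens with no loss, so the 200b figure corresponds to keeping about two-thirds of the papers. And 200b / 21b is about 9.5, consistent with the "10x" claim.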
There will be a universal language of physical science work that does not speak directly to humans. Monolithic cloud labs alone may not be the optimal deployment of automated biology in the future. Projects like PyHamilton demonstrate growing open-source communities for benchtop automation, and the SayCan collaboration between Google and Everyday Robots is a reminder of how steadily multifunctional robots (and ultralight indoor drones) are progressing. As the cost curve falls and natural-language programmability rises, there may be a crossover point at which it is easier to convert an existing lab environment or protocol into an automated one than to outsource the work to a physically separate facility. Alternatively, there may be a steady state in which some tasks are best suited to large automated warehouses and others to smaller, distributed edge labs. If the future includes multiple providers of robotic lab work, interoperability will become a bottleneck, and that will motivate a universal formalization of life science work.