Tag: ai

Interpretable models

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

2023-03-01: Mathematical approaches can help with interpretation

Let’s take the set of all cat images and the set of all images that aren’t cats. We’re going to view them as topological shapes, or manifolds: one is the manifold of cats and the other is the manifold of non-cats. These are going to be intertwined in some complicated way. Why? Because certain things look very much like cats but are not cats: mountain lions sometimes get mistaken for cats, and so do replicas. The big thing is that the 2 manifolds are intertwined in some very complex manner.
I measure the shape of the manifolds as they pass through the layers of a neural network. Ultimately, I can show that they reduce to the simplest possible form. You can view a neural network as a device for simplifying the topology of the manifolds under study.
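
A minimal sketch of this picture (my illustration, not the quoted author's code): train a small MLP on 2 intertwined spirals, then fit a linear probe on each layer's activations to ask how close each representation is to linearly separable. If the network really is simplifying the topology, probe accuracy should rise with depth. The spiral generator, layer sizes, and probe are all arbitrary choices.

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.linear_model import LogisticRegression

    def spirals(n=500, noise=0.1, seed=0):
        # Two interleaved spirals: a classic pair of intertwined 1-D manifolds.
        rng = np.random.default_rng(seed)
        t = np.linspace(0.25, 3 * np.pi, n)
        x0 = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
        x1 = -x0  # second spiral, rotated 180 degrees
        X = np.concatenate([x0, x1]) + noise * rng.standard_normal((2 * n, 2))
        y = np.concatenate([np.zeros(n), np.ones(n)])
        return X.astype(np.float32), y.astype(np.float32)

    X, y = spirals()
    Xt, yt = torch.from_numpy(X), torch.from_numpy(y)

    layers = nn.ModuleList([nn.Linear(2, 16), nn.Linear(16, 16), nn.Linear(16, 16)])
    head = nn.Linear(16, 1)
    opt = torch.optim.Adam(list(layers.parameters()) + list(head.parameters()), lr=1e-2)

    def forward_all(x):
        # Return every layer's activations so we can probe them afterwards.
        acts = []
        for layer in layers:
            x = torch.tanh(layer(x))
            acts.append(x)
        return acts, head(x).squeeze(1)

    for step in range(2000):
        acts, logits = forward_all(Xt)
        loss = nn.functional.binary_cross_entropy_with_logits(logits, yt)
        opt.zero_grad(); loss.backward(); opt.step()

    # If the network untangles the 2 manifolds, deeper representations
    # should be closer to linearly separable.
    with torch.no_grad():
        acts, _ = forward_all(Xt)
    for i, a in enumerate(acts):
        acc = LogisticRegression(max_iter=1000).fit(a.numpy(), y).score(a.numpy(), y)
        print(f"layer {i + 1}: linear-probe accuracy = {acc:.3f}")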

AI New Yorker

Can a machine learn to write for The New Yorker?

On first reading this passage, my brain ignored what AI researchers call “world-modelling failures”—the tiny cow and the puddle of red gravy. Because I had never encountered a prose-writing machine even remotely this fluent before, my brain made an assumption—any human capable of writing this well would know that cows aren’t tiny and red gravy doesn’t puddle in people’s yards. And because GPT-2 was an inspired mimic, expertly capturing The New Yorker’s cadences and narrative rhythms, it sounded like a familiar, trusted voice that I was inclined to believe. In fact, it sounded sort of like my voice.

ImageNet Roulette

The ImageNet researchers attribute the inclusion of offensive and insensitive categories to the overall size of the task, which ultimately involved 50K workers who evaluated 160M candidate images. They also point out that only a fraction of the “person” images were actually used in practice. That’s because references to ImageNet typically mean a smaller version of the data set, the one used in the ImageNet Challenge.

Fine-Tuning GPT-2

We’ve demonstrated reward learning from human preferences on 2 kinds of natural language tasks: stylistic continuation and summarization. Our results are mixed: for continuation we achieve good results with very few samples, but our summarization models are only “smart copiers”: they copy from the input text but skip over irrelevant preamble. The advantage of smart copying is truthfulness; by contrast, the zero-shot and supervised models produce natural, plausible-looking summaries that are often lies. We believe the limiting factor in our experiments is data quality, exacerbated by the online data collection setting, and we plan to use batched data collection in the future.
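
The preference-learning step can be made concrete with a small sketch. This is the generic pairwise Bradley-Terry setup, not the paper's exact code (the paper compares several samples at a time, and its reward head sits on top of the language model itself); the RewardModel class and all shapes below are illustrative.

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        # Stand-in scorer: embeds token ids and mean-pools them. In the
        # real system this head would sit on top of a language model.
        def __init__(self, vocab_size=1000, dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.score = nn.Linear(dim, 1)

        def forward(self, tokens):            # tokens: (batch, seq_len)
            h = self.embed(tokens).mean(dim=1)
            return self.score(h).squeeze(-1)  # scalar reward per sequence

    model = RewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy preference data: each pair is (preferred, rejected) token sequences.
    preferred = torch.randint(0, 1000, (64, 20))
    rejected = torch.randint(0, 1000, (64, 20))

    for step in range(100):
        r_pref, r_rej = model(preferred), model(rejected)
        # Maximize the probability that the human-preferred sample scores higher.
        loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # The trained reward model then serves as the reward signal for
    # fine-tuning the policy (e.g. with PPO), with a KL penalty keeping
    # the fine-tuned model close to the original language model.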

Earthquake Prediction

The finding had big potential implications. For decades, would-be earthquake prognosticators had keyed in on foreshocks and other isolated seismic events. The Los Alamos result suggested that everyone had been looking in the wrong place — that the key to prediction lay instead in the more subtle information broadcast during the relatively calm periods between the big seismic events.
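
As I understand the Los Alamos group's published laboratory work, the recipe is roughly: slide a window over the continuous signal, compute simple statistics of each window, and regress time-to-next-event. Here is a sketch on synthetic data; the window size, features, and regressor are all illustrative choices, not the group's actual pipeline.

    import numpy as np
    from scipy.stats import kurtosis
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 200_000
    time_to_failure = np.linspace(8.0, 0.0, n)   # stand-in label
    # Synthetic stand-in: amplitude grows as failure approaches, so the
    # "calm period" statistics really do carry predictive information.
    signal = rng.standard_normal(n) * (1.0 + (8.0 - time_to_failure))

    win = 1000
    starts = range(0, n - win, win)
    X = np.array([[signal[s:s + win].std(),
                   kurtosis(signal[s:s + win]),
                   np.abs(signal[s:s + win]).max()] for s in starts])
    y = np.array([time_to_failure[s + win - 1] for s in starts])

    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr)
    print("held-out R^2:", model.score(Xte, yte))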

AI-generating algorithms

Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which means AI that is as smart as or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces required for intelligence, with the implicit assumption that some future group will complete the Herculean task of figuring out how to combine all of those pieces into a complex thinking machine. I call this the “manual AI approach”. This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend in machine learning that hand-designed solutions are eventually replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which automatically learns how to produce general AI. 3 Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. I argue that either approach could produce general AI first, and both are scientifically worthwhile irrespective of which is the fastest path. Because both are promising, yet the ML community is currently committed to the manual approach, I argue that our community should increase its research investment in the AI-GA approach. To encourage such research, I describe promising work in each of the 3 Pillars. I also discuss AI-GA-specific safety and ethical considerations. Because it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.
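
A toy flavor of the proposal (my illustration, not from the paper): an outer evolutionary loop meta-learns a parameter of the learning algorithm itself (Pillar 2, here just SGD's learning rate) against a distribution of generated environments (Pillar 3, here random regression tasks). Everything is scaled down to the point of caricature, but the two nested loops are the structural point.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_env():
        # "Environment generator": a random linear-regression task.
        w = rng.standard_normal(5)
        X = rng.standard_normal((50, 5))
        return X, X @ w

    def inner_loop(log_lr, env, steps=20):
        # The learner: plain SGD whose learning rate is meta-learned.
        X, y = env
        w = np.zeros(5)
        lr = np.exp(log_lr)
        for _ in range(steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        return np.mean((X @ w - y) ** 2)  # final loss = inner fitness

    # Outer loop: a simple evolution strategy over log-learning-rate.
    pop = rng.normal(-3.0, 1.0, size=20)   # population of candidate log-lrs
    best = None
    for gen in range(30):
        envs = [make_env() for _ in range(8)]
        fitness = np.array([np.mean([inner_loop(p, e) for e in envs]) for p in pop])
        elite = pop[np.argsort(fitness)[:5]]  # keep the best learners
        best = elite[0]
        pop = np.concatenate([elite, elite.mean() + 0.3 * rng.standard_normal(15)])
    print("meta-learned learning rate:", np.exp(best))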

GPT-2 6-Month Follow-Up

People find GPT-2 synthetic text samples almost as convincing (72% of one cohort judged the articles to be credible) as real articles from the New York Times (83%). We expect detectors to need to catch a significant fraction of generations with very few false positives. Malicious actors may use a variety of sampling techniques (including rejection sampling) or fine-tune models to evade detection methods. A deployed system likely needs to be highly accurate (99.9%–99.99%) on a variety of generations. Our research suggests that current ML-based methods only achieve low to mid-90s accuracy, and that fine-tuning the language models decreases accuracy further.
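
To make "ML-based methods" concrete, here is a generic detection baseline (my sketch, not OpenAI's detector): character n-gram TF-IDF features with logistic regression, scored on accuracy and, crucially, false positive rate. The toy texts are placeholders; a real study needs large corpora of human-written and model-written text across many sampling strategies.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, confusion_matrix

    # Toy stand-ins for the two corpora.
    human = [f"meeting notes {i}: budget approved, follow up next week" for i in range(200)]
    machine = [f"generated passage {i}: the horizon shimmered with meaning" for i in range(200)]
    texts = human + machine
    labels = np.array([0] * len(human) + [1] * len(machine))  # 1 = machine-written

    Xtr, Xte, ytr, yte = train_test_split(
        texts, labels, test_size=0.25, random_state=0, stratify=labels)

    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(Xtr), ytr)
    pred = clf.predict(vec.transform(Xte))

    tn, fp, fn, tp = confusion_matrix(yte, pred).ravel()
    print("accuracy:", accuracy_score(yte, pred))
    # The quoted requirement is the hard part: a deployed detector needs a
    # false positive rate near zero while still catching most generations.
    print("false positive rate:", fp / (fp + tn))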