Tag: ai

Multi-Agent Games

AI for multi-agent games like Go, Poker, and Dota has seen great strides in recent years. Yet none of these games address the real-life challenge of cooperation in the presence of unknown and uncertain teammates. This challenge is a key game mechanism in hidden role games. Here we develop the DeepRole algorithm, a multi-agent reinforcement learning agent that we test on The Resistance: Avalon, the most popular hidden role game. DeepRole combines counterfactual regret minimization (CFR) with deep value networks trained through self-play. Our algorithm integrates deductive reasoning into vector-form CFR to reason about joint beliefs and deduce partially observable actions. We augment deep value networks with constraints that yield interpretable representations of win probabilities. These innovations enable DeepRole to scale to the full Avalon game. Empirical game-theoretic methods show that DeepRole outperforms other hand-crafted and learned agents in 5-player Avalon. DeepRole also played with and against human players on the web in hybrid human-agent teams. We find that DeepRole outperforms human players as both a cooperator and a competitor.
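The core update inside CFR is regret matching: cumulative positive regrets for each action are normalized into a strategy, and if no action has positive regret, play is uniform. A minimal sketch in plain Python (this illustrates the general technique only, not DeepRole's vector-form implementation):

```python
# Regret matching: the per-information-set update used in
# counterfactual regret minimization (CFR). Positive cumulative
# regrets are normalized into a mixed strategy; if no regret is
# positive, fall back to the uniform strategy.
def regret_matching(cumulative_regrets):
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(cumulative_regrets)
    if total <= 0.0:
        return [1.0 / n] * n  # uniform over actions
    return [p / total for p in positive]
```

For example, with cumulative regrets `[2.0, 1.0, -3.0]` the strategy puts probability 2/3 and 1/3 on the first two actions and zero on the third.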

2022-11-28: Diplomacy has fallen (with big caveats)

When people say the AI ‘solved’ Diplomacy, it really, really didn’t. What it did, which is still impressive, is get a handle on the basics of Diplomacy in a particular context: bots cannot be identified and are in the minority, and message detail is sufficiently limited that it can use an LLM to communicate with humans reasonably and not be identified.

If this program entered the world championships, with full length turns, I would not expect it to do well in its current form, although I would not be shocked if further efforts could fix this (or if they proved surprisingly tricky).

Interestingly, this AI is programmed not to mislead the player on purpose, although it will absolutely go back on its word if it feels like it. This is closer to correct play than most players realize, but it is a huge weakness in key moments, and it is highly exploitable by anyone who knows this and is willing and able to ‘check in’ every turn.

2022-12-02: Stratego has fallen

Stratego, the classic board game that’s more complex than chess and Go, and craftier than poker, has been mastered. We present DeepNash, an AI agent that learned the game from scratch to a human expert level by playing against itself.
DeepNash uses a novel approach, based on game theory and model-free deep reinforcement learning. Its play style converges to a Nash equilibrium, which means its play is very hard for an opponent to exploit. So hard, in fact, that DeepNash has reached an all-time top-three ranking among human experts on the world’s biggest online Stratego platform, Gravon. The machine learning approaches that work so well on perfect information games, such as DeepMind’s AlphaZero, are not easily transferred to Stratego. The need to make decisions with imperfect information, and the potential to bluff, makes Stratego more akin to Texas hold’em poker and requires a human-like capacity once noted by the American writer Jack London: “Life is not always a matter of holding good cards, but sometimes, playing a poor hand well.”

Tensor Considered Harmful

Despite its ubiquity in deep learning, Tensor is broken. It forces bad habits such as exposing private dimensions, broadcasting based on absolute position, and keeping type information in documentation. This post presents a proof-of-concept of an alternative approach, named tensors, with named dimensions. This change eliminates the need for indexing, dim arguments, einsum-style unpacking, and documentation-based coding. The prototype PyTorch library accompanying this blog post is available as namedtensor.
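The key idea is that operations refer to dimensions by name rather than by position, so code stops depending on a memorized axis order. A toy pure-Python sketch of that idea (the class and method names here are invented for illustration and are not the namedtensor library's actual API):

```python
# Toy illustration of named dimensions: reductions are requested
# by dimension name ("batch", "feature") instead of a positional
# dim argument, so the call site documents itself.
class NamedMatrix:
    def __init__(self, data, names):
        self.data = data    # list of rows
        self.names = names  # e.g. ("batch", "feature")

    def sum_over(self, name):
        """Reduce over the dimension identified by name."""
        axis = self.names.index(name)
        if axis == 0:  # collapse rows together
            reduced = [sum(col) for col in zip(*self.data)]
        else:          # collapse within each row
            reduced = [sum(row) for row in self.data]
        remaining = tuple(n for n in self.names if n != name)
        return reduced, remaining
```

So `NamedMatrix([[1, 2], [3, 4]], ("batch", "feature")).sum_over("batch")` yields `([4, 6], ("feature",))`, and the reader never has to remember whether batch was axis 0 or 1.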

change.org parody

A neural net trained on change.org tries to write its own petitions, e.g. “Help Bring Climate Change to the Philippines!” and “Donald Trump: Change the name of the National Anthem to be called the ‘Fiery Gator’”

Stories about sexual selection (like “women are naturally attracted to dominant-looking men, because throughout evolution they were better able to provide”) are meaningless, because for most of human history women did not choose their own mates, and so women are unlikely to have strong biologically-ingrained mate preferences.

Totally not robots

If Only Turing Was Alive To See This

There’s a silly subreddit called r/totallynotrobots where people pretend to be badly-disguised robots. They post cat pictures with captions like “SINCE I AM A HUMAN, THIS SMALL FELINE GENERATES POSITIVE EMOTIONS IN MY CARBON-BASED BRAIN” or something like that. There’s another subreddit called r/SubSimulatorGPT2, that trains GPT-2 on various subreddits to create imitations of their output. Now r/SubSimulatorGPT2 has gotten to r/totallynotrobots, which means we get to see a robot pretending to be a human pretending to be a robot pretending to be a human.

DNN parallelization strategies

Traditional approaches to training exploit data parallelism (dividing up the training samples), model parallelism (dividing up the model parameters), or expert-designed hybrids for particular situations. FlexFlow encompasses both of these in its sample (data parallelism) and parameter (model parallelism) dimensions, and also adds an operator dimension (more model parallelism) describing how operators within a DNN should be parallelized, and an attribute dimension which defines how different attributes within a sample should be partitioned (e.g. height and width of an image).
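The sample and parameter dimensions can be illustrated with a single linear layer y = W·x in plain Python. This is a sketch of the two splitting strategies only; the function names and worker model are invented, not FlexFlow's API:

```python
# Sketch of two parallelization dimensions for a linear layer.

def matvec(W, x):
    """Plain matrix-vector product over Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Sample (data) parallelism: split the batch across workers;
# every worker holds a full copy of W.
def data_parallel(W, batch, n_workers=2):
    k = (len(batch) + n_workers - 1) // n_workers
    shards = [batch[i * k:(i + 1) * k] for i in range(n_workers)]
    # Each "worker" processes its shard independently.
    return [matvec(W, x) for shard in shards for x in shard]

# Parameter (model) parallelism: split the rows of W across
# workers; each worker computes only a slice of every output.
def model_parallel(W, x, n_workers=2):
    shards = [W[i::n_workers] for i in range(n_workers)]
    partials = [matvec(shard, x) for shard in shards]
    # Interleave partial outputs back into the original row order.
    out = [None] * len(W)
    for i, part in enumerate(partials):
        for j, val in enumerate(part):
            out[i + j * n_workers] = val
    return out
```

Both paths reproduce the serial result; the operator and attribute dimensions generalize the same idea to per-operator choices and to splitting within a single sample (e.g. image height/width).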

ML Data validation

Anecdotal evidence from some teams suggests a mental shift towards a data-centric view of ML, where the schema is not solely used for data validation but also provides a way to document new features that are used in the pipeline and thus disseminate information across the members of the team.
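A minimal sketch of what such a dual-purpose schema looks like in practice: the same declaration both validates incoming examples and documents the expected features. The field names and rules below are invented for illustration:

```python
# A schema that documents features and validates examples.
# Each entry records the expected type and (optionally) a range,
# so new team members can read expectations straight off it.
SCHEMA = {
    "age":     {"type": int,   "min": 0,   "max": 130},
    "country": {"type": str},
    "score":   {"type": float, "min": 0.0, "max": 1.0},
}

def validate(example, schema=SCHEMA):
    """Return a list of human-readable anomalies (empty if clean)."""
    anomalies = []
    for field, rule in schema.items():
        if field not in example:
            anomalies.append(f"missing feature: {field}")
            continue
        value = example[field]
        if not isinstance(value, rule["type"]):
            anomalies.append(f"{field}: expected {rule['type'].__name__}")
        elif "min" in rule and not (rule["min"] <= value <= rule["max"]):
            anomalies.append(f"{field}: out of range")
    return anomalies
```

Running `validate` over each batch before training catches drift (missing features, type changes, out-of-range values) early, and the schema itself doubles as living documentation of the pipeline's inputs.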

Solving Quake

We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a 3D multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents is trained concurrently on thousands of parallel matches in randomly generated environments. Each agent learns its own internal reward signal and a rich representation of the world. These results indicate the great potential of multi-agent reinforcement learning for artificial intelligence research.

Low energy classification

The goal is to produce a low-energy hardware classifier for embedded applications doing local processing of sensor data. To get there, the authors question a whole bunch of received wisdom, beginning with this: do we really need to convert the analog sensor data into a digital signal?! Here’s another fun one: what if instead of being something you worked hard to avoid, you had to build your whole application based on the outcomes of data races??!

The AI Does Not Hate You

This is a book about AI and AI risk. But it’s also more importantly about a community of people who are trying to think rationally about intelligence, and the places that these thoughts are taking them, and what insight they can and can’t give us about the future of the human race over the next few years. It explains why these people are worried, why they might be right, and why they might be wrong. It is a book about the cutting edge of our thinking on intelligence and rationality right now by the people who stay up all night worrying about it.