Today, we published “Open-Ended Learning Leads to Generally Capable Agents,” a preprint detailing our first steps to train an agent capable of playing many different games without needing human interaction data. We created a vast game environment we call XLand, which includes many multiplayer games within consistent, human-relatable 3D worlds. This environment makes it possible to formulate new learning algorithms, which dynamically control how an agent trains and the games on which it trains. The agent’s capabilities improve iteratively as a response to the challenges that arise in training, with the learning process continually refining the training tasks so the agent never stops learning. The result is an agent with the ability to succeed at a wide spectrum of tasks — from simple object-finding problems to complex games like hide and seek and capture the flag, which were not encountered during training. We find the agent exhibits general, heuristic behaviors such as experimentation, behaviors that are widely applicable to many tasks rather than specialized to an individual task. This new approach marks an important step toward creating more general agents with the flexibility to adapt rapidly within constantly changing environments.
Tag: google
Starline
Let’s hope this goes somewhere, unlike Glass. Perhaps this can be bootstrapped via C-level status-symbol purchases, like Cisco TelePresence (which was 10 years ahead of its time).
Spectre web exploit
In this post, we will share the results of Google Security Team’s research on the exploitability of Spectre against web users, and present a fast, versatile proof-of-concept (PoC) written in JavaScript which can leak information from the browser’s memory. We’ve confirmed that this proof-of-concept, or its variants, function across a variety of operating systems, processor architectures, and hardware generations.
Improving urban GPS
Combine raw GPS measurements with building outlines, plus some linear algebra, to reduce GPS errors in dense urban environments by 75%.
Easy WiFi roaming
Orion Wifi: this looks superior to Boingo etc. because it’s transparent for users. Hopefully this scales fast. This would have been technically possible for nearly 20 years, though to be fair the backbone was far less mature. I remember Fon in 2003, among others. It’s disappointing how long “obvious” ideas take sometimes.
Waymo Superhuman Safety
The gauntlet is down. If Cruise, Zoox, Argo, Tesla and others want to say they are in the game, they need to show the same data. If they won’t show it, we should presume they are afraid of releasing it for a reason. No proprietary secrets are disclosed. A few useful lessons are revealed but everybody should be sharing those lessons anyway, for the good of the industry.
Will Waymo be as bold in deploying as suggested above? Probably not. It’s a quirk of humanity that “people don’t like being killed by robots.” We would rather be killed by drunks. We expect perfection from machines that they can’t deliver, and which we don’t expect from other human drivers. The risks to the public of Waymo deploying today are not just low; they are much lower than the risk which will be created by the people who drive themselves rather than taking a ride in a Waymo. And I don’t just mean the people today. If we assume that Waymo grows in a similar way depending on when they launch, and that if they launch a month later they get big a month later, then the math tells us the risk they prevent is actually all the people who didn’t ride with Waymo in that whole period before they got big, and it’s equal to all the people who ride in a month when they do get big. If Waymo can grow in the distant future to be 10% of trips in the USA, that means delaying a month causes 80,000 accidents and over 250 deaths due to people who drove themselves rather than rode in a Waymo. For each and every month of delay.
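The "math tells us" step is easy to check numerically. Here is a minimal sketch, assuming a made-up logistic growth curve (the peak, midpoint and steepness values are hypothetical, not Waymo figures): shifting the launch by a delay loses exactly the rides that would have happened in the final months of the horizon, when the service is already big.

```python
import math


def monthly_rides(months_since_launch, peak=1000.0, midpoint=60.0, steepness=0.1):
    """Hypothetical logistic growth curve; no rides before launch."""
    if months_since_launch < 0:
        return 0.0
    return peak / (1.0 + math.exp(-steepness * (months_since_launch - midpoint)))


def total_rides(horizon, launch_month):
    """Total rides delivered in months 0..horizon-1 if launch happens at launch_month."""
    return sum(monthly_rides(m - launch_month) for m in range(horizon))


def rides_lost_to_delay(horizon, delay):
    """Rides that never happen because launch slipped by `delay` months."""
    return total_rides(horizon, 0) - total_rides(horizon, delay)


def rides_in_final_months(horizon, delay):
    """Rides delivered in the last `delay` months of an on-time launch."""
    return sum(monthly_rides(m) for m in range(horizon - delay, horizon))


if __name__ == "__main__":
    horizon = 240  # months, arbitrary
    for delay in (1, 6):
        lost = rides_lost_to_delay(horizon, delay)
        final = rides_in_final_months(horizon, delay)
        print(delay, round(lost, 3), round(final, 3))
```

The two columns printed for each delay agree, and that holds for any growth curve as long as the delayed curve is just the on-time curve shifted later. Turning lost rides into accidents and deaths needs real per-trip risk numbers, which this sketch does not attempt.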
Meena
Towards a human-like open-domain chatbot. Meena is the new ELIZA, and is getting close to human levels of performance.
MuZero
MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games – the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled – our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
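To make "applied iteratively" concrete, here is a toy sketch of the shape of such a model, not DeepMind's implementation. As I understand the paper, there are three learned functions (representation, dynamics, prediction); the names `represent`, `dynamics`, `predict` and `unroll`, the dimensions, and the random untrained linear weights below are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, STATE_DIM, NUM_ACTIONS = 8, 4, 3  # arbitrary toy sizes

# Random, untrained weights standing in for learned networks.
W_repr = rng.normal(size=(STATE_DIM, OBS_DIM))
W_dyn = rng.normal(size=(STATE_DIM, STATE_DIM + NUM_ACTIONS))
w_reward = rng.normal(size=STATE_DIM + NUM_ACTIONS)
W_policy = rng.normal(size=(NUM_ACTIONS, STATE_DIM))
w_value = rng.normal(size=STATE_DIM)


def represent(observation):
    """Encode a real observation into a hidden state."""
    return np.tanh(W_repr @ observation)


def dynamics(state, action):
    """Advance the hidden state by one imagined action; also predict the reward."""
    x = np.concatenate([state, np.eye(NUM_ACTIONS)[action]])
    return np.tanh(W_dyn @ x), float(w_reward @ x)


def predict(state):
    """Predict the action-selection policy and the value from a hidden state."""
    logits = W_policy @ state
    policy = np.exp(logits - logits.max())  # numerically stable softmax
    policy /= policy.sum()
    return policy, float(w_value @ state)


def unroll(observation, actions):
    """Apply the model iteratively along a sequence of imagined actions."""
    state = represent(observation)
    outputs = []
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = predict(state)
        outputs.append((reward, policy, value))
    return outputs


if __name__ == "__main__":
    for reward, policy, value in unroll(np.ones(OBS_DIM), [0, 2, 1]):
        print(reward, policy, value)
```

Note that nothing in `unroll` calls an environment or knows any game rules: after the first encoding, every step runs on the hidden state alone, which is what lets a planner search ahead without a simulator.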
More With Less
There’s still a lot of potential to build more efficient and larger scale computing systems, particularly ones tailored for machine learning. And I think the basic research that has been done in the last 5 or 6 years still has a lot of room to be applied in all the ways that it should be. We’ll collaborate with our Google product colleagues to get a lot of these things out into real-world uses.
But we also are looking at what are the next major problems on the horizon, given what we can do today and what we can’t do. We want to build systems that can generalize to a new task. Being able to do things with much less data and with much less computation is going to be interesting and important.
Google & Face Recognition
The company’s new facial-recognition service comes with limitations to prevent abuse, which sometimes lets competitors take the lead.