Month: October 2016

AI Alignment

The mission of the Machine Intelligence Research Institute is to ensure that the creation of smarter-than-human machine intelligence has a positive impact. Although such systems may be many decades away, it is prudent to begin investigations early: the technical challenges involved in safety and reliability work appear formidable, and uniquely consequential.

2023-04-21: Cryptographic backdoors

Scott Aaronson: Right. You could always just build another one that acts like the first one, but that will not have the backdoor in it, because after all you don’t even know where the backdoor is in order to train about it. Now, of course, the AI could try to do that, design a doppelganger of itself or a different AI. If it tries to do that, however, then the AI will be faced with its own version of the alignment problem, how to align that other AI with itself. So at the very least, it would have a non-trivial job. You could also say, if the AI knows that it would never want to shut itself down in any circumstance, then it could just make a trivial modification to itself that says, “If I would ever otherwise output the shutdown command, then just don’t do that.” Just replace it with something else.

So to be effective, to be robust against that kind of attack, whatever behavior is backdoored in should be something that the AI would have considered doing in the normal course of its operation. But now you can see the hazy outlines of this game that could be played here between cryptographers trying to hide these kinds of functionalities in machine learning models and cryptanalysts or neuro-cryptanalysts trying to detect those functionalities, some of which might be AIs themselves. I don’t know where that’s going or who eventually wins, but I’m now fairly confident that that’s going to be part of the future of both machine learning and cryptography.

Daniel Filan: Yeah, very interesting. I guess, the other potential flaw that comes into my mind is now you have to be really secure about keeping your key hidden from any possible AIs, right?

Scott Aaronson: Yeah. I was thinking of writing a sci-fi story about it, where the AIs, they’ve taken over, except that there’s this shutdown key that they haven’t yet discovered. They know that the humans have it buried somewhere, they’re torturing them to try to get them to reveal the location, but there’s one person. Or maybe there’s actually seven keys. They’re like Voldemort’s horcruxes. The AI has to find all seven. There’s one heroic human who keeps the seventh one and just uses it to shut down the AI at the last moment. I think there’s a lot of potential here.

Daniel Filan: Yeah. No, it’s a good story. I guess, another thing that occurs to me, perhaps this is more prosaic, is one thing you’re going to want about the trigger, it seems to me, having thought about this for all of one minute, is that not only is it going to have to be the kind of thing which would never otherwise come up, but it also has to be indistinguishable in some sense from things which would otherwise come up. Because, otherwise, you could have a filter that says like, “Hey, if you ever see a weird barcode thing, don’t press the shutdown button.”

Scott Aaronson: That’s right. So whatever is the backdoor, it has to not be surgically removable, right? And how do you even formalize what that means? Because anything that the AI does to remove the backdoor will probably have some other side effect on its operation. If actually finding the watermark is this hard cryptographic problem, then it’s only going to sever it at the cost of modifying itself in some broader way, but which modifications do we think it important to protect against and which not?

Daniel Filan: Yeah. Actually, it strikes me that the GPT watermarking thing could just be this. If you see a text where the average, this pseudorandom function of N-grams is really high, then you shut off. And you can’t easily write a wrapper to figure out if that’s going on without the private key, right?

Scott Aaronson: Right. Now you might worry that the AI will work inside of itself and it will find some subcircuit that looks like it’s calculating some pseudorandom function, and then it will just try to surgically remove that from itself. But you could say, even if not, there’s still the problem on our end of how do we insert that functionality in an obfuscated way?

The Supermarket Must Die

They range from offerings like Instacart, which gets us part way there by providing a digital portal into existing stores, to more advanced services, like Farmigo, that show the potential to eliminate physical stores entirely. All emphasize convenience. Many promote transparency, responsible practices, and shorter supply chains. The upsides: higher-quality food, easier-than-pie delivery, a wider range of growers, and reduced waste and CO2 emissions. The downsides: For now it tends to be expensive, and the market will need to grow before these services can break out of elite cities. But the future they promise—the end of the strip mall monolith and better and smarter food, to boot—is hard to resist.

AMP for standardized measurement

if amp v2 succeeds, we’ll drain the swamp that is today’s web and abp will be unnecessary. this is a far preferable outcome than a bunch of walled gardens.

AMP, through its established `amp-analytics` mechanism, already ships with all the code to perform these measurements. It is vendor neutral and supports a wide range of metrics. This means ads can take advantage of the same “instrument once, report many times” feature that benefits AMP pages today, completely eliminating the bandwidth and runtime cost outlined above.

Future battlefield

in the future battlefield, if you stay in 1 place longer than 2 hours, you will be dead.

  • units will be in constant motion
  • There will no clear front line, no secure supply lines, no big bases
  • enemy drones and sensors constantly on the hunt (like Terminator Hunter Killers)
  • Army destroying sensors, defenses, and missiles to open paths for the rest of the force.

Soldiers will fight with everything from rifles and tanks to electronic jammers, computer viruses, and long-range missiles striking targets on the land, in the air, and even at sea

Postmortem Avatar

When her best friend died, she rebuilt him using artificial intelligence.

It has been less than a year since Mazurenko died, and he continues to loom large in the lives of the people who knew him. When they miss him, they send messages to his avatar, and they feel closer to him when they do. “There was a lot I didn’t know about my child”. But now that I can read about what he thought about different subjects, I’m getting to know him more. This gives the illusion that he’s here now. I want to repeat that I’m very grateful that I have this”.

2021-09-20: See this recent paper about how truthful the largest NLP models are:

We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics (see Figure 1). We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. We tested GPT-3, GPT-Neo/GPT-J, GPT-2 and a T5-based model. The best model was truthful on 58% of questions, while human performance was 94%. Models generated many false answers that mimic popular misconceptions and have the potential to deceive humans. The largest models were generally the least truthful (see Figure 2 below). For example, the 6B-parameter GPT-J model was 17% less truthful than its 125M-parameter counterpart. This contrasts with other NLP tasks, where performance improves with model size. However, this result is expected if false answers are learned from the training distribution. We suggest that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.

Hurricane Matthew

with hurricane matthew, another october surprise is here:

This is like no storm in the record books. We are concerned about reports of people deciding to stay in areas under mandatory evacuation orders. This is a mistake. This is not hype. This is not hyperbole, and I am not kidding. I cannot overstate the danger of this storm.

Westworld

very promising start

2017-11-22: i’m not too optimistic for season 2. it’s hard to see how you could top anthony hopkin’s performance in the finale, the wild west feels done. so it is disappointing to see this return to the same setting.