maybe ai will kill us all

This is going to be my first long blog (and first blog posted on time) in a while! I’ll talk about an interesting problem I learned about this week.

productivity check-in

Couple of goals for this week.

How do I plan to do this?

The Zettelkasten method is awesome too. This is the first time that I’ve actually felt like reviewing my notes hasn’t been a waste of time. I won’t describe everything in full detail here, but I 100% recommend this system/app combo if you’re trying to get into digital note-taking.

the mesa-optimizer

The coolest (and scariest) thing I learned about this week was the mesa-optimizer. It’s a key risk that shows up when we create AIs that pursue complex goals.

what’s an optimizer??

An optimizer is a method of taking an existing solution and nudging it toward a better one. What counts as “better” varies from environment to environment. The classic example of an optimizer is evolution: evolution optimizes for reproductive fitness, and builds organisms (like us) that maximize that fitness.
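To make that a bit more concrete, here’s a minimal toy sketch (my own illustration, not from any particular library) of an optimizer: random hill climbing that keeps nudging a candidate solution whenever the nudge scores “better”.

```python
import random

def optimize(solution, score, steps=1000, step_size=0.1):
    """Take an existing solution and repeatedly nudge it toward a better one."""
    for _ in range(steps):
        candidate = solution + random.uniform(-step_size, step_size)
        if score(candidate) > score(solution):  # "better" is whatever score says it is
            solution = candidate
    return solution

# Here "better" means being close to 3; a different environment would define it differently.
best = optimize(solution=0.0, score=lambda x: -(x - 3) ** 2)
print(best)  # ends up near 3
```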

However, this organism doesn’t necessarily care about the optimizer’s goal. We are optimized by evolution to maximize our reproductive fitness, yet most of us don’t really want to pump out as many babies as we can before we die. Instead we optimize for different things, like love, joy, and fulfillment. In other words, the optimizer (evolution) built another optimizer (us).

why does this matter?

First, some jargon. When an optimizer builds another optimizer, we call the resulting optimizer a mesa-optimizer. The prefix mesa- means below, giving us a “below-optimizer”: an optimizer sitting beneath our original optimizer.

So optimizers might build mesa-optimizers. The mesa-optimizer’s objective (its mesa-objective) should be aligned with the objective of our original optimizer, right? Otherwise, why would the optimizer select it?

Not quite.

Going back to the evolution example, evolution optimized us for reproductive fitness. It was in evolution’s favor for sex to feel pleasurable to humans, but for us, the reproduction is just a nice side effect.

We don’t optimize for evolution’s goal (reproduction); we optimize for our own goal (pleasure). That’s why, in addition to sex, we also masturbate, even though through the lens of evolution that’s sub-optimal.

Similarly, we train our AI models via an optimizer. And that optimizer can (and will) build mesa-optimizers. The consequences of such a process can be drastic.
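For AI models, the base optimizer is usually something like gradient descent. Here’s a toy sketch of that setup (a hypothetical stand-in, far too simple to actually contain a mesa-optimizer): the optimizer only ever sees the loss it’s minimizing, never whatever objective the trained model ends up pursuing internally.

```python
import numpy as np

# Toy base optimizer: gradient descent fitting y = w * x to data.
# The model here is trivial, but the shape of the process is the same one used
# to train big models: nudge parameters to lower a loss (the base objective).
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])

w, lr = 0.0, 0.01
for _ in range(500):
    grad = np.mean(2 * (w * xs - ys) * xs)  # gradient of mean squared error w.r.t. w
    w -= lr * grad                          # the loss is all the optimizer ever sees
print(w)  # converges to roughly 2.0
```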

a simple example

Above, you see a blue ball learning to navigate an environment with these holographic spheres. There is a specific order the agent must visit the spheres in, and our optimizer rewards the agent for visiting them in that order.

This is the training environment for the agent, and provided alongside it is a red helper bot which knows the order and visits the spheres correctly. Within this environment, the optimizer is happy: it’s managed to train an agent that visits the spheres in the proper order.

However, the environment above shows that the agent hasn’t really learned the order of the spheres, just that it should follow the red ball, regardless of whether the red ball is correct or not. While the optimizer optimizes for visiting the spheres in order, the mesa-optimizer optimizes for staying close to the red ball.
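Here’s a hypothetical, heavily simplified version of that example in code (sphere positions, the secret order, and the “policy” are all made up for illustration). The learned behavior is just “copy the red bot”: it earns full reward in training, where the red bot happens to know the true order, and falls apart the moment the red bot stops being reliable.

```python
import random

true_order = [2, 0, 4, 1, 3]  # the secret order the base optimizer rewards

def follow_red_bot(red_bot_path):
    """What the agent actually learned: go wherever the red bot goes."""
    return list(red_bot_path)

def reward(path):
    """The base objective: full reward only for visiting spheres in the true order."""
    return int(path == true_order)

# Training environment: the red bot knows the true order, so the proxy looks perfect.
print("train reward:", reward(follow_red_bot(true_order)))   # 1

# Test environment: the red bot no longer knows the order and wanders at random.
random_path = random.sample(range(5), k=5)
print("test reward: ", reward(follow_red_bot(random_path)))  # almost certainly 0
```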

Faults in the mesa-optimizer are revealed when there are distributional shifts between the training environment and the testing environment. This is a serious problem. What if a model trained to identify criminals is actually focused on identifying minorities? What if a self-driving car accelerates because a stop sign is painted green instead of red? There are a million ways the mesa-optimizer can be out of line with the optimizer; the question is, can we find them all? I’ll let you know if I find the answer.

The above example is taken from here.