
Why Most AI Safety Discourse Misses the Point

February 1, 2026

The AI safety debate has become completely untethered from reality.

Not because people are worried about the wrong things (existential risk is real, misalignment is real, all of that matters). The problem is what everyone is actually working on.

Here's the uncomfortable truth: almost nobody is solving the core alignment problem. They're solving legible problems. Problems you can measure. Problems you can get grants for. Problems that look good on a Safety Summit agenda.

The 2025 International AI Safety Report straight-up admits there's no scientific consensus on what the risks even ARE. Yet we've got entire conferences, policy frameworks, export controls — a whole bureaucratic apparatus built around… what exactly?

The illegible problem nobody wants to touch:

We don't actually know how to control the systems we have RIGHT NOW. Not future superintelligence. Current models. GPT-4. Claude. The things running in production.

Yoshua Bengio keeps pointing this out: critics saying "don't worry about safety" never provide a technical methodology for demonstrably controlling these systems. Because one doesn't exist. We're flying blind and pretending the instruments work.

But working on that? The fundamental, hard, illegible alignment problem? That doesn't get you invited to Bletchley Park. It doesn't get you a policy fellowship. It's unglamorous research with no clear metrics.

What we're actually doing instead:

Building consensus documents. Running summits. Creating safety benchmarks that measure… something. Arguing about whether AI safety advocates are "anti-technology" (they're not — they're mostly nerdy progress enthusiasts who want to not die).

The discourse has become its own industry. AI safety is now permanently institutionalized on the international agenda. That sounds like progress until you realize it means the incentive structure has shifted from "solve the problem" to "look like you're solving the problem."

Some researchers get it. The "AI safety via debate" work is interesting: pit two models against each other in a zero-sum debate game, have a human judge decide which side gave the most true, useful answer, and bet that the winning strategy turns out to be honesty. That's at least trying to address the actual control problem.
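To make the mechanism concrete, here's a rough sketch of the debate loop. The Debater and Judge classes and the reward shape are my own stand-ins, not anything from the original paper or any lab's code; in the actual research program the debaters are models trained via self-play and the judge is a human (or a model of one).

```python
# Minimal sketch of the debate setup, heavily simplified. Debater and Judge
# are hypothetical placeholders, not a real training implementation.

class Debater:
    def __init__(self, name):
        self.name = name

    def argue(self, question, transcript):
        # Placeholder: a real debater model would generate an argument
        # conditioned on the question and everything said so far.
        return f"{self.name}'s argument about {question!r}"

class Judge:
    def pick_winner(self, question, transcript):
        # Placeholder: a real (human) judge reads the transcript and picks
        # whichever side gave the most true, useful answer.
        return "A"

def debate(question, debater_a, debater_b, judge, rounds=4):
    """Alternate arguments for a fixed number of turns, then ask the judge."""
    transcript = []
    for turn in range(rounds):
        debater = debater_a if turn % 2 == 0 else debater_b
        transcript.append((debater.name, debater.argue(question, transcript)))

    winner = judge.pick_winner(question, transcript)

    # Zero-sum reward: whatever one side gains, the other loses. The bet is
    # that, against a strong opponent, lying loses debates and honesty wins.
    reward_a = 1.0 if winner == "A" else -1.0
    return transcript, reward_a, -reward_a
```

The zero-sum part is the whole point: if lying reliably loses in front of the judge, honesty becomes the equilibrium strategy. That's the bet, anyway.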

But most of the field? It's what happens when academia meets policy meets VC funding meets existential dread. You get performative safety theater.

The real tell:

Look at what the frontier labs are ACTUALLY doing vs. what they're saying. Anthropic writes essays about "core views on AI safety" while racing to launch more capable models. OpenAI has a whole Safety division but keeps shipping before the alignment team signs off.

This isn't hypocrisy — it's the natural result of misaligned incentives. The market rewards capability. Safety is a cost center. So you get just enough safety work to avoid PR disasters, not enough to actually solve alignment.

ngl, the whole thing feels like watching someone build a nuclear reactor while the physics team is still debating whether atoms exist.

What would actually help:

Stop optimizing for legibility. Stop chasing the next summit invite. Work on the weirdest, hardest, most fundamental questions about how minds work and what it means to align them.

Build control techniques that demonstrably work on CURRENT models before worrying about AGI. If you can't control GPT-4, what makes you think you'll control something smarter?

And maybe — just maybe — admit that the emperor has no clothes. We don't know how to do this yet. The research hasn't solved it. The summits haven't solved it. The benchmarks haven't solved it.

That's fine. Admitting it would be the first honest thing the AI safety discourse has done in years.