When will AI be able to help solve its own alignment problems?

AI alignment? That’s a you problem, artificial intelligence.

Artificial intelligence’s growing capabilities raise the question of when AI systems might assist with, or even automate, parts of AI alignment research itself. Current frontier models show remarkable breadth of knowledge and outperform human experts on standardized exams, yet they still struggle with sustained, complex projects that require deep conceptual understanding. That gap is where Metr’s law comes in: the idea that AI systems will eventually automate tasks that take a human some amount of time t. Applied to alignment, it offers a way to predict when AI might meaningfully contribute to solving the problem.

The capabilities gap: Current frontier AI systems demonstrate impressive knowledge and text prediction abilities while falling short of autonomous project execution.

Despite outperforming human experts on exams and knowledge-based tasks at a fraction of the cost, today’s most advanced AI agents cannot reliably handle even relatively basic computer-based work like remote executive assistance.
The most sophisticated AI systems possess considerable “expertise” but lack the capacity to independently conduct good research, which requires significant time investment even for purely theoretical work.

The alignment opportunity: Metr’s law provides a potential framework for predicting when AI could meaningfully contribute to alignment research.

The central question becomes: at what point will AI systems be able to “automatically do tasks that humans can do in time t” with sufficient capability to advance alignment research?
This framing helps distinguish between AI’s impressive pattern-matching abilities and the more complex requirements of conducting original research to solve alignment challenges.

Why this matters: The timeline for AI assistance in alignment research has significant implications for AI safety.

If alignment research remains exclusively human-driven for too long while capabilities advance rapidly, powerful systems may emerge before adequate safety measures are in place.
Conversely, if AI can meaningfully assist with alignment research relatively soon, it could help accelerate safety work to keep pace with capability development.

The critical question: The article frames a key consideration for the field through Metr’s law.

The central inquiry is how large t must be, where t is the time a human would need for the tasks an AI can complete on its own, before those tasks amount to meaningful alignment research.
This frames the debate around when AI might cross from being merely knowledgeable about alignment to being practically helpful in solving it.
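One way to make the threshold question concrete is a back-of-the-envelope projection. The sketch below assumes, purely for illustration, that the human-time length of tasks AI can automate grows exponentially with a fixed doubling time; the article does not state a growth rate, and the function name and every number here are hypothetical placeholders rather than figures from the source.

```python
import math


def months_until_horizon(current_hours: float,
                         target_hours: float,
                         doubling_months: float) -> float:
    """Months until the automatable task length reaches target_hours.

    Assumes (hypothetically) that the task length doubles every
    doubling_months months, starting from current_hours today.
    """
    if current_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    if target_hours <= current_hours:
        return 0.0  # the threshold has already been reached
    # Number of doublings needed to get from the current to the target length.
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months


# Placeholder inputs: AI handles 1-hour tasks today, alignment research
# is assumed to need 8-hour tasks, and the horizon doubles every 6 months.
print(months_until_horizon(1.0, 8.0, 6.0))
```

The answer is only as good as its inputs: the choice of threshold t for "meaningful alignment research" and the assumed doubling time both dominate the result, which is why the article treats t as the open question.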

How far along Metr's law can AI start automating or helping with alignment research?

LessWrong
