back
Get SIGNAL/NOISE in your inbox daily
I will argue that a large class of reward functions, which I call “behaviorist”, and which includes almost every reward function in the RL and LLM literature, are all doomed to eventually lead to AI that will “scheme”—i.e., pretend to be docile and cooperative while secretly looking for opportunities to behave in egregiously bad ways such as world takeover (cf. “treacherous turn”)…
Recent Stories
Jan 19, 2026
App Store apps are exposing data from millions of users
An effort led by security research lab CovertLabs is actively uncovering troves of (mostly) AI-related apps that leak and expose user data.
Jan 19, 2026Stop ignoring AI risks in finance, MPs tell BoE and FCA
Treasury committee urges regulators and Treasury to take more ‘proactive’ approach
Jan 19, 2026OpenAI CFO Friar: 2026 is year for ‘practical adoption’ of AI
OpenAI CFO Sarah Friar said the company is focused on "practical adoption" in 2026, especially in health, science, and enterprise.