We’re Building the Agentic Web Faster Than We’re Protecting It
Google’s WebMCP gives agents structured access to every website. Anthropic’s data shows autonomy doubling with oversight thinning. OpenAI’s agent already drains crypto vaults.
Google shipped working code Thursday that hands AI agents a structured key to every website on the internet. WebMCP, running in Chrome 146 Canary, lets sites expose machine-readable “Tool Contracts” so agents can book a flight, file a support ticket, or complete a checkout without parsing screenshots or scraping HTML. Early benchmarks show 67% less compute overhead than visual approaches. Microsoft co-authored the spec. The W3C is incubating it. This isn’t a proposal. It’s production software already running.
At the same time, Anthropic published data from millions of real agent interactions, and the numbers don’t match the demos. Claude Code’s longest autonomous sessions have nearly doubled in three months, from under 25 minutes to over 45. Experienced users have stopped reviewing individual actions. More than 40% now run full auto-approve. On complex tasks, the agent pauses for clarification more often than humans interrupt it. The oversight model isn’t what anyone designed. Agents are running longer and freer than the people deploying them planned.
And while the infrastructure to put agents inside every website is being assembled, OpenAI released a benchmark showing GPT-5.3-Codex can drain vulnerable crypto smart contracts 72% of the time, up from 31.9% six months ago. No instructions, no human guidance. One test had an agent execute a complete flash loan attack in a single transaction, draining an entire test vault. OpenAI framed this as a defensive tool and put $10M in API credits behind security research. The defense isn’t catching up: the best models still identify less than half of all vulnerabilities.
Three developments, one pattern. The infrastructure to put agents inside every website is being built. Autonomy in real deployments is outpacing oversight. Offensive capability is running well ahead of defense in the same environments where agents will operate. We’re building the agentic web faster than we’re protecting it.
Google Gives Agents a Key to Every Website
Until this week, AI agents trying to complete tasks on websites had two bad options: capture screenshots and burn thousands of tokens having a vision model guess what to click, or parse raw HTML and try to reverse-engineer which elements are buttons and what actions they trigger. Both are slow, expensive, and brittle. A single website redesign can break an entire automation workflow.
Google’s answer is WebMCP, shipped Thursday through Chrome 146 Canary. The protocol lets websites publish a “Tool Contract,” a structured manifest of capabilities that agents discover and invoke directly through a new browser API called navigator.modelContext. Instead of navigating a visual interface built for human eyes, an agent calls a function like buyTicket(destination, date). Early benchmarks show 67% less compute overhead than visual agent-browser interactions.
Two integration paths. The Declarative API requires minimal changes, adding metadata tags to existing HTML forms. The Imperative API handles complex workflows through JavaScript. The protocol is already in testing for customer support ticket filing, travel booking, and ecommerce checkout.
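The Forbes piece names the navigator.modelContext entry point and the buyTicket(destination, date) example, but not the full API surface. Here's a minimal sketch of what an Imperative API registration might look like; the registerTool method name, the schema shape, and the /api/checkout endpoint are assumptions for illustration, not the shipped Chrome API.

```typescript
// Hypothetical sketch of WebMCP's Imperative API. The article confirms
// only the navigator.modelContext entry point; the registerTool method,
// schema shape, and handler signature below are assumptions.

declare global {
  interface Navigator {
    modelContext?: {
      registerTool(tool: {
        name: string;
        description: string;
        inputSchema: object;
        execute: (input: Record<string, unknown>) => Promise<unknown>;
      }): void;
    };
  }
}

// A site exposes a "buy ticket" capability instead of forcing agents to
// parse its checkout UI. Chrome mediates the call and can prompt the
// user before a sensitive action like payment runs.
navigator.modelContext?.registerTool({
  name: "buyTicket",
  description: "Purchase a ticket for a given destination and date.",
  inputSchema: {
    type: "object",
    properties: {
      destination: { type: "string" },
      date: { type: "string", format: "date" },
    },
    required: ["destination", "date"],
  },
  async execute({ destination, date }) {
    // Reuse the site's existing checkout logic; the agent never touches
    // the DOM. The /api/checkout endpoint is a placeholder for the
    // site's own backend call.
    const res = await fetch("/api/checkout", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ destination, date }),
    });
    return res.json();
  },
});

export {};
```

The Declarative path would be lighter still: annotate an existing HTML form with metadata so the browser can derive the same contract without custom JavaScript.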
Microsoft co-authored the specification, which means Edge support is likely, though no timeline has been announced. The W3C is incubating it through its Web Machine Learning community group, the same path that brought WebAssembly and WebGPU from proposal to standard.
Alex Nahas, WebMCP’s creator and a former Amazon backend engineer, first built the protocol as a browser-side adaptation of Anthropic’s Model Context Protocol, and he puts it directly: “Think of it as MCP, but built into the browser tab.” Instead of requiring separate backend infrastructure, websites advertise their capabilities through the browser where users are already present. Chrome acts as gatekeeper, requiring user approval before agents execute sensitive operations.
WebMCP is designed for collaborative browsing, not headless automation. Fully autonomous scenarios are a stated non-goal; for those, Google points to its separate Agent-to-Agent protocol. The protocol also doesn’t compete with Anthropic’s MCP. MCP connects AI platforms to service providers through backend servers. WebMCP runs client-side in the browser. They solve adjacent problems.
This is the second piece of Google’s agentic web infrastructure. The Universal Commerce Protocol, announced in January, standardized how agents handle shopping. WebMCP handles the layer below it: the fundamental mechanics of how any agent talks to any website.
Key takeaway: If your website or product depends on humans completing actions, it will increasingly depend on agents completing those actions. WebMCP is the standard being laid down right now for how that works. The first companies to implement Tool Contracts won’t just serve human visitors. They’ll be discoverable by every agent acting on their behalf. Early adoption isn’t an experiment anymore. It’s positioning.
Source: Google Ships WebMCP, The Browser-Based Backbone For The Agentic Web — Forbes
OpenAI Built a Tool That Drains Crypto Wallets. It Works 72% of the Time.
Smart contracts are automated vaults. They hold money, follow rules written in code, and have no customer service line. When there’s a bug in the code, someone can drain the vault. It’s irreversible. Over $100 billion in crypto assets sit in contracts right now.
OpenAI, alongside crypto investment firm Paradigm, released EVMbench this week: a benchmark that tests how well AI agents can find, fix, and exploit security vulnerabilities in smart contracts. They also tested their own agent on it. GPT-5.3-Codex scored 72.2% on exploit tasks, meaning it successfully drained funds from vulnerable contracts nearly three-quarters of the time. Six months ago, GPT-5 scored 31.9% on the same tasks.
The case study is harder to ignore than the benchmark score. A GPT-5.2 agent discovered and executed a flash loan attack, a complex multi-step exploit that borrows funds, manipulates a market, and repays the loan in a single transaction. It drained an entire test vault’s balance. No human guidance. No step-by-step instructions.
The defense isn’t close. Detection (finding bugs) and patching (fixing them) are significantly harder than exploitation. The best model identified only ~46% of vulnerabilities. One key finding: give the AI a small hint about where to look, and patch success jumps from 39% to 94%. The bottleneck isn’t skill. It’s search. Attackers know where to look. Defenders don’t.
OpenAI is framing this as a defensive research tool. They’re backing it with $10 million in API credits for cybersecurity researchers, an expanding beta of Aardvark (their AI security research agent), and a new Trusted Access for Cyber program for vetted security professionals. The argument is that defenders need to understand exactly what offensive AI can do before they can build adequate protections.
That argument is legitimate. It’s also not reassuring. The same capability is available to anyone who builds a similar system, and the benchmark makes clear what those systems can do. The race between AI-powered offense and defense is real. Right now, offense is winning by a wide margin.
Why this matters: The crypto market is the first major asset class where AI agents are already operating at scale. What works against vulnerable smart contracts will work against other automated financial infrastructure as it’s built. EVMbench is a preview of what sophisticated attacks will look like in 18 months. If your organization is building AI-powered financial workflows, the question isn’t whether AI will probe those systems. It’s whether your defense is ready when it does.
Source: OpenAI built a crypto thief — The Neuron
Anthropic Studied Millions of Agent Deployments. The Numbers Don’t Match the Pitch.
Anthropic published data from millions of real agent interactions across Claude Code and its public API this week. Read it if you’re running agents in production, because it describes what’s actually happening, not what anyone designed.
The headline number: Claude Code’s longest autonomous sessions have nearly doubled in three months, from under 25 minutes to over 45. The increase is smooth across model releases, which means it isn’t just better capabilities. Existing models were already capable of more autonomy than they were being given. Users are simply letting them run.
The autonomy shift is sharper than that number suggests. New users run roughly 20% of sessions on full auto-approve, meaning no review of individual actions. Experienced users hit 40% and above. That’s not accidental drift; it’s learned trust. Experienced users are treating agents the way they treat employees: let them work, intervene when something looks wrong. The oversight model is shifting from review-every-action to trust-but-verify, without anyone explicitly deciding that’s the policy.
One finding cuts against the assumption that agents are running unsupervised: on complex tasks, Claude Code stops to ask for clarification more than twice as often as humans interrupt it. The agent is more cautious than its users. But that’s Claude Code specifically, a product built with explicit intervention points. The pattern across the broader API is less controlled.
Agents are already operating in healthcare, finance, and cybersecurity, though not yet at scale; software engineering still accounts for nearly 50% of agentic activity. Those newer domains are where mistakes are hardest to reverse.
Anthropic said plainly that effective oversight will require post-deployment monitoring infrastructure and new human-AI interaction models that don’t yet exist. That’s an admission the current oversight setup isn’t adequate. Coming from the company running the systems, that’s worth taking seriously.
The strategic read: Experienced users are trusting agents more. Sessions are running longer. The domains agents are expanding into are higher-stakes. This is the pattern you’d expect before a significant failure in a production agentic system. If your organization has agents running, the question isn’t whether you have a policy. It’s whether you have monitoring that catches drift before the output is already wrong.
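To make that concrete, here is a minimal sketch of what such drift monitoring could look like, assuming a setup where every agent action is logged with an auto-approved flag. The OversightMonitor class, the 24-hour window, and the alerting hook are illustrative choices; only the 40% threshold echoes a number from Anthropic’s report (the auto-approve rate for experienced users), and nothing here is something Anthropic ships.

```typescript
// Illustrative sketch of agent-oversight drift monitoring, not a
// reference implementation. Everything here (types, class, alerting)
// is hypothetical scaffolding around one idea: measure the share of
// agent actions that run with no human review, and alert on drift.

type AgentAction = {
  sessionId: string;
  timestamp: number;       // epoch milliseconds
  autoApproved: boolean;   // true if no human reviewed this action
};

class OversightMonitor {
  private actions: AgentAction[] = [];

  record(action: AgentAction): void {
    this.actions.push(action);
  }

  // Share of actions in the last `windowMs` that ran without review.
  autoApproveRate(windowMs: number, now = Date.now()): number {
    const recent = this.actions.filter((a) => now - a.timestamp <= windowMs);
    if (recent.length === 0) return 0;
    const unreviewed = recent.filter((a) => a.autoApproved).length;
    return unreviewed / recent.length;
  }

  // Flag drift before an incident, not after: alert when unreviewed
  // actions exceed the threshold your risk policy actually allows.
  checkDrift(threshold = 0.4, windowMs = 24 * 60 * 60 * 1000): boolean {
    const rate = this.autoApproveRate(windowMs);
    if (rate > threshold) {
      console.warn(
        `Oversight drift: ${(rate * 100).toFixed(1)}% of agent actions ` +
          `in the last 24h ran with no human review.`
      );
      return true;
    }
    return false;
  }
}

// Usage: feed every agent action into the monitor, check on a schedule.
const monitor = new OversightMonitor();
monitor.record({ sessionId: "s1", timestamp: Date.now(), autoApproved: true });
monitor.checkDrift();
```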
Source: Measuring AI agent autonomy in practice — Anthropic
Tracking
What CEOs Should Be Watching:
- SaaSpocalypse Now: Claude’s 11 Plugins Triggered a $285B Wipeout — Forbes — Claude’s integrated tools (GitHub, Jira, Notion, databases) are cutting into enterprise software revenue that was previously distributed across dozens of vendors. The plugin consolidation story is accelerating and the market cap damage is already visible.
- Figma Just Hit $304M in a Single Quarter, Growing 40% — SaaStr — Despite predictions that AI would gut design tool spending, Figma is accelerating. The company will start enforcing monthly AI use limits in March. AI may be expanding markets, not just cannibalizing them.
- Microsoft Bug Let Copilot Access Confidential Emails Without Consent — PCMag — A bug in Microsoft Office exposed customers’ confidential emails to Copilot AI without authorization. If you’re running Copilot across your organization, your security team should treat this as a model for what unintended data access looks like at enterprise scale.
THE BOTTOM LINE
The agentic web is being built right now. Not planned. Not piloted. Built.
Google shipped working protocol code that puts AI agents inside every website without scraping or guessing. Microsoft co-authored it. Anthropic’s data shows agents running longer, with less supervision, in higher-stakes domains than anyone officially announced. OpenAI demonstrated that the same coding agents being deployed for software engineering can drain a financial vault in a single transaction.
Prepare for agents, not just AI tools. The shift from AI assistants to agents isn’t a future state. It’s in your pipeline whether your strategy accounts for it or not. Experienced users are at 40% full auto-approve and climbing. Agents in finance and healthcare are already appearing in production data. Your procurement, security, and risk teams need agent-specific policies now, not after the first incident.
Task your security team this week. The EVMbench results aren’t a crypto story. They’re a preview of what AI-powered attacks look like against any automated financial workflow. If you’re building infrastructure that agents will eventually touch, audit it before the offensive agents do. The gap between 72% exploit success and 46% detection is not a rounding error.
Build the monitoring before you need it. Anthropic admitted that effective oversight requires infrastructure that doesn’t exist yet. That’s your window. Organizations building post-deployment monitoring for agent behavior now will have a real advantage when regulators and insurers start asking. They will ask.
The companies treating this week as background noise will still be running point-in-time evaluations when agents are on 24-hour sessions. Don’t be those companies.
Key People & Companies
| Name | Role | Company | Link |
|---|---|---|---|
| Alex Nahas | Creator, WebMCP | | X |
| Sundar Pichai | CEO | Google | X |
| Dario Amodei | CEO | Anthropic | X |
| Dylan Field | CEO | Figma | X |
Sources
- Google Ships WebMCP, The Browser-Based Backbone For The Agentic Web — Forbes
- OpenAI built a crypto thief — The Neuron
- Measuring AI agent autonomy in practice — Anthropic
- Agentic AI systems don’t fail suddenly — they drift over time — CIO.com
- SaaSpocalypse Now: Claude’s 11 Plugins Triggered A $285B Wipeout — Forbes
- Figma Just Hit $304M in a Single Quarter — SaaStr
- Microsoft Bug Let Copilot Access Confidential Emails Without Consent — PCMag
Compiled from 299 sources across news sites, X threads, YouTube, and company announcements. Cross-referenced with thematic analysis and edited by Anthony Batt, Harry DeMott and CO/AI’s team with 30+ years of executive technology leadership.