
AI News — June 3, 2026

#models #agents

Anthropic scales Project Glasswing and Claude Mythos to ~150 more organizations, mostly critical-infrastructure operators, while Microsoft Build's second day puts names to its homegrown MAI model family and ships a standalone GitHub Copilot desktop app.

Model releases

  • [2026-06-02] Anthropic — expanded Project Glasswing, extending Claude Mythos Preview access to ~150 new organizations across 15+ countries, most of them critical-infrastructure operators in newly added sectors (power, water, healthcare, communications, hardware). The first ~50 partners have already surfaced 10,000+ high- or critical-severity flaws: Cloudflare reports ~2,000 bugs (400 high/critical) across critical-path systems, and Mozilla found 271 vulnerabilities in Firefox 150, more than 10× what a prior run found. Anthropic also points to Claude Security, a new product that uses publicly available frontier models such as Opus 4.8 to scan codebases and suggest patches. (official, source, source)

    This is the concrete follow-through on the “Mythos to all customers in the coming weeks” intent flagged earlier, but it’s a controlled expansion to vetted infrastructure operators, not general availability. The practical read: a frontier model with strong security capabilities is now finding real bugs at scale in software you depend on, and its productized spin-off, Claude Security, is what most teams will actually get their hands on first.

  • [2026-06-02] Microsoft — Build’s second day put names to yesterday’s homegrown-model push: a family of seven self-developed MAI models, led by MAI-Thinking-1 (a reasoning model) and MAI-Code-1 (an inference-efficient coding model tuned for GitHub and VS Code, available now in Copilot and VS Code). It fleshes out the “Project Polaris” Copilot-engine strategy reported yesterday and is Microsoft’s clearest statement yet that it intends to run its flagship dev tools on its own models. (official, source, source)

    The caveat: Microsoft has named MAI-Code-1 and shipped it in Copilot and VS Code, but it hasn’t published detailed benchmarks for the seven-model lineup, so treat the “tuned for GitHub” framing as a positioning claim until third parties test it on real repositories.

    For Software Developers: Because MAI-Code-1 is already in Copilot and VS Code, you can switch your completion model to it and trial it on a real branch today, a low-stakes way to see how it handles your stack before Microsoft’s own models become the Copilot default.

Coding agents

  • [2026-06-02] GitHub — a standalone GitHub Copilot desktop app (preview) brings agentic development out of the editor: start from an idea, issue, or PR, orchestrate multiple agent sessions in parallel, and keep changes moving through review, CI, and merge from one native surface. It’s the clearest sign yet of Copilot shifting from in-editor assistant toward a place you supervise several long-running agents at once. (official, source)

    For Software Developers: Instead of babysitting one Copilot session in VS Code, you can kick off several agents — say one fixing a flaky test suite, another drafting a migration, a third triaging an issue — and review their PRs as they land, treating the app as a control panel rather than a chat box.

  • [2026-06-02] GitHub — a Copilot CLI refresh (v1.0.58/1.0.59) shipped at Build with three features, all GA today: a /rubber-duck command, now on by default, that hands the agent’s current plan, design, or tests to an independent critic agent for adversarial feedback; prompt scheduling via /every and /after, which runs a prompt or skill on a schedule within a session; and voice input. (official, source)

    The /rubber-duck critic is the interesting one: a built-in second agent that hunts for blind spots and design flaws in the main agent’s work is a cheap, in-loop way to catch the kind of confidently wrong output that the Sonar survey’s “verification gap” warned about.

    For Software Developers: With prompt scheduling now GA, you can have the CLI run a prompt or skill on a schedule inside a session, for example /every to re-run your test-and-lint step at a regular interval, or /after to queue a follow-up for when a long build should be done, instead of re-issuing it by hand each time.
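The /rubber-duck idea, a second agent reviewing the first agent's plan before work proceeds, follows a generic generate, critique, revise loop. The sketch below shows that loop with hypothetical stand-in functions (`main_agent_plan`, `critic_agent`, `revise`) that are deterministic placeholders for model calls; it illustrates the pattern only and is not how Copilot CLI implements the command.

```python
from dataclasses import dataclass, field

NO_TESTS = "plan never runs the test suite"


@dataclass
class Review:
    approved: bool
    issues: list[str] = field(default_factory=list)


def main_agent_plan(task: str) -> list[str]:
    # Hypothetical stand-in for the primary agent's first draft.
    return [f"edit code for: {task}", "open PR"]


def critic_agent(plan: list[str]) -> Review:
    # Hypothetical independent critic: it sees only the plan, not the main
    # agent's reasoning, and flags any plan that has no test step.
    issues = [] if any("run tests" in step for step in plan) else [NO_TESTS]
    return Review(approved=not issues, issues=issues)


def revise(plan: list[str], review: Review) -> list[str]:
    # Hypothetical revision step: address each issue the critic flagged.
    revised = list(plan)
    if NO_TESTS in review.issues:
        # Insert the test step just before the final "open PR" step.
        revised.insert(len(revised) - 1, "run tests and fix failures")
    return revised


def plan_with_critic(task: str, max_rounds: int = 3) -> list[str]:
    """Draft a plan, then revise it until the critic approves or rounds run out."""
    plan = main_agent_plan(task)
    for _ in range(max_rounds):
        review = critic_agent(plan)
        if review.approved:
            break
        plan = revise(plan, review)
    return plan


if __name__ == "__main__":
    print(plan_with_critic("migrate billing schema"))
```

Calling `plan_with_critic("migrate billing schema")` returns the draft plan with a test step inserted before the PR step. The `max_rounds` cap is the design choice worth copying: a critic that never approves should not be able to stall the main agent indefinitely.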

AI-assisted SDLC

  • [2026-06-02] Microsoft — the Surface RTX Spark Dev Box targets sustained local AI workloads: up to 1 petaflop of AI compute and 128GB of unified memory, enough to run 120B-parameter models locally within a 100W power envelope. It is pitched at long-running training jobs, agentic AI pipelines, and local fine-tuning, and arrives later this year in the US. (source, source)

    For teams whose data-residency or cost constraints make cloud inference awkward, a desk-side box that runs 120B models locally changes the calculus on where agentic pipelines and fine-tuning runs can live — though “later this year” and US-only availability mean it’s a plan-ahead item, not something to provision against now.

  • [2026-06-02] Microsoft — Windows AI APIs expand beyond NPUs to GPUs and CPUs, starting with GPU support for Phi Silica and CPU support for video super-resolution and live captions, pushing local AI inference across a much wider range of Windows 11 hardware rather than only Copilot+ PCs. (source, source)

    This matters for fleet rollout: features gated to NPU-equipped PCs land unevenly across a mixed fleet, so opening the same APIs to GPUs and CPUs lets apps built on Windows AI degrade gracefully on more machines instead of failing outright where there is no NPU.
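The Surface item's claim that 128GB of unified memory is enough for a 120B-parameter model can be sanity-checked with weight-only arithmetic. This sketch assumes a dense model and decimal gigabytes, and it ignores the KV cache, activations, and runtime overhead, all of which need extra headroom; the source does not say which precision the 120B figure assumes.

```python
# Weight-only memory estimate for a dense model. Assumptions: decimal GB
# (1 GB = 1e9 bytes); KV cache, activations, and runtime overhead ignored.
PARAMS = 120e9            # 120B parameters, from the Surface claim
UNIFIED_MEMORY_GB = 128   # advertised unified memory

BYTES_PER_PARAM = {
    "16-bit": 2.0,   # fp16 / bf16
    "8-bit": 1.0,    # int8 / fp8
    "4-bit": 0.5,    # 4-bit quantization
}


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Return the storage needed for the weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = weights_gb(PARAMS, nbytes)
        verdict = "fits" if gb < UNIFIED_MEMORY_GB else "does not fit"
        print(f"{precision}: {gb:.1f} GB of weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

The numbers come out to 240 GB for 16-bit weights (does not fit), 120 GB for 8-bit (fits, with only about 8 GB left for everything else), and 60 GB for 4-bit. So the 120B claim presumably relies on a quantized model, most plausibly around 4-bit, rather than full 16-bit weights.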

Watch list

  • MAI model cards & independent benchmarks: Microsoft named the seven-model MAI family but hasn’t published detailed benchmarks; watch for those, and for how the August “Project Polaris → GitHub Copilot default” migration actually lands. Vendor coding numbers, once they appear, rarely survive contact with real codebases.

    Third-party benchmarks of MAI-Code-1 on real repositories will show whether the “tuned for GitHub” framing holds up, and the August default-engine swap is when those quality questions stop being academic for Copilot users.

  • Surface RTX Spark Dev Box availability: “later this year in the US” via microsoft.com — watch for a firm date, price, and whether availability widens beyond the US.

    Worth watching because a firm price and date are what turn a desk-side 120B-capable box from a keynote spec into something a team can actually budget against as an alternative to cloud inference — and whether it ever reaches beyond the US decides who gets that option.

  • Anthropic developer billing split (June 15): Claude Agent SDK, claude -p, and GitHub Actions usage reportedly move off subscriptions onto a separate monthly credit pool (~$20–$200); Anthropic still hasn’t detailed it on an official page. (source) (unconfirmed)

    Should this land as described, it would change the cost model for anyone running Claude in CI or automation — a subscription seat would no longer cover programmatic usage — so it’s worth pricing out ahead of June 15 even while it stays unconfirmed.

  • MCP spec 2026-07-28 goes final: the stateless core, Tasks, and MCP Apps move to stable, and server operators will need to migrate off sticky sessions.

    For operators, the significance is the migration work: anything that relies on sticky sessions will need reworking once the stateless core is stable, so it’s worth auditing your MCP deployments before the date.
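Why sticky sessions stop being necessary can be shown with a generic load-balancing sketch: a server that keeps per-client history in its own process memory only works if every request from that client reaches the same instance, while a server that holds no state of its own (here it reads and writes a shared store) can be reached through any instance. The classes and the dict standing in for an external store are hypothetical illustrations, not the MCP SDK API, and the sketch does not describe how the 2026-07-28 spec defines its stateless core.

```python
from dataclasses import dataclass, field


@dataclass
class StickyServer:
    """Hypothetical server that keeps per-client history in process memory."""
    sessions: dict[str, list[str]] = field(default_factory=dict)

    def handle(self, session_id: str, message: str) -> int:
        # Correct only if the load balancer pins session_id to this instance.
        history = self.sessions.setdefault(session_id, [])
        history.append(message)
        return len(history)


@dataclass
class StatelessServer:
    """Hypothetical server that holds no per-client state of its own."""
    store: dict[str, list[str]]  # stands in for an external store such as a database

    def handle(self, session_id: str, message: str) -> int:
        # Any instance can serve any request because state lives in `store`.
        history = self.store.setdefault(session_id, [])
        history.append(message)
        return len(history)


if __name__ == "__main__":
    a, b = StickyServer(), StickyServer()
    # Second request lands on a different instance: its context is missing.
    print(a.handle("client-1", "hello"), b.handle("client-1", "continue"))

    shared: dict[str, list[str]] = {}
    s1, s2 = StatelessServer(shared), StatelessServer(shared)
    # Same routing, but both instances see the same history.
    print(s1.handle("client-1", "hello"), s2.handle("client-1", "continue"))
```

With two `StickyServer` instances, the client's second request reports a history length of 1 again because its context stayed behind on the first instance; two `StatelessServer` instances sharing one store report 1 and then 2. Per-client state held only inside a server process is exactly what an audit for sticky-session dependence should be looking for.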