todai — Saturday, 2 May

REDDIT SIGNAL

Benchmark — Gemma 4 31B wins a one-shot local Pac-Man gamedev contest over Qwen 3.6 27B on M5 Max: shorter output (6K vs 34K tokens), 3m51s vs 18m04s, stronger game logic despite slower token rate — the efficiency vs verbosity question for local code tasks in one experiment. — 683 upvotes, 114 comments | r/LocalLLaMA | https://www.reddit.com/r/LocalLLaMA/comments/1t0epei/

TODAY'S ITEMS

1. Claude Code v2.1.126: Project Purge Command + Container OAuth + 30+ Fixes

claude project purge [path] wipes all Claude Code state for a project (transcripts, tasks, file history, config entry) with dry-run, interactive, and --all flags — the state management command that was always missing. Auth login now accepts pasted OAuth codes in WSL2, SSH tunnels, and devcontainers where browser callbacks can't reach localhost; a fixed race condition was also clearing valid OAuth refresh tokens on concurrent writes.
Source: Claude Code GitHub Releases
Why it matters: Run claude project purge --dry-run on any stale project today to see what state has accumulated — the --interactive flag makes cleanup surgical without nuking your entire config.
Verified

2. Uber Burnt Its Entire 2026 AI Budget on Claude Code in Four Months

Uber's CTO Praveen Neppalli Naga revealed the company burned through its full annual AI allocation by April, with 95% of engineers now using AI tools monthly and individual Claude Code costs running $500–$2,000/person/month. The HN thread (302pts, 314 comments) surfaced three cost tiers: beginners keeping long-lived context sessions, intermediate users over-spawning subagents, and power users running 10+ parallel worktrees — with the top fix being context compaction discipline (cache expires in 5 minutes of inactivity on /loop tasks).
Source: The Information / HN
Why it matters: If you're running Claude Code at team scale, the HN thread on this story has the most practical token-spend framework published this week — skim the top 10 comments before your next sprint planning.
Emerging

3. Apple Shipped CLAUDE.md Files in a Public App Update

A user spotted Claude.md files accidentally bundled in today's Apple app update — visible in the extracted IPA — confirming Apple is using Claude Code internally for app development. The files include project conventions, architecture context, and style rules in the standard Claude Code format.
Source: r/ClaudeCode
Why it matters: Apple's use confirms that the Claude.md format has crossed from developer tool to big-tech engineering standard — if you haven't formalised yours yet, now is the time.
Emerging

LANDSCAPE NOTES

Grok 4.3 launched — xAI's latest model, locked behind a $300/month SuperGrok tier, improved agentic benchmarks. HN 348pts, 458 comments — mostly cost/access frustration. https://news.ycombinator.com/item?id=47972447
Opus 4.7 privacy story — "Opus 4.7 knows the real Kelsey" (HN 443pts, 239 comments): essay on how Anthropic's memory and personalisation layers mean Claude recognises returning users across sessions, raising questions about the boundary between helpfulness and surveillance. Worth reading for the policy angle if you're building products on top of Claude's memory APIs. https://www.theargumentmag.com/p/i-can-never-talk-to-an-ai-anonymously

PROMPT OF THE DAY

You are a cost-aware Claude Code session planner.
Before starting any multi-step task, output a session plan
in this format:

## Task
[one sentence: what we're building or fixing]

## Model selection
- Use claude-opus-4-7 if: I need to be present and quality
  matters most
- Use claude-sonnet-4-5 if: this is batch/automated/polling work

## Context strategy
- Fresh session or existing? [recommend one with reason]
- Context compaction trigger: compact when context >60% full
- Loop stop condition (if using /loop): [explicit condition]

## Subagent budget
- Max parallel subagents: [number with reason]
- Each subagent scope: [what each one handles]

## Estimated token spend
- Low / Medium / High — [one-sentence reason]

Now apply this to the following task: [paste your task here]

Pre-task cost planning before any automated or multi-step Claude Code session — especially useful before /loop commands or multi-worktree runs. Adapted from the HN discussion on the Uber AI budget story: https://news.ycombinator.com/item?id=47976415

TRY THIS WEEKEND

Item 1 (required): Set up mattpocock's skills pack in a real project. Run /setup-matt-pocock-skills inside Claude Code — it walks you through issue tracker selection (GitHub, Linear, or local files) and configures 17 workflow skills (PRD writing, TDD, issue triage, codebase architecture, git guardrails). Takes ~30 minutes, and the triage + refactor skills alone will change how you handle Monday's backlog.

Repo: https://github.com/mattpocock/skills

TOOL OF THE WEEK

browserbase/skills — Claude Agent SDK with web browsing built in. Wraps Playwright under the hood so Claude can visit URLs, click, extract, and act on live web content inside a skill. Install: npm install @browserbasehq/sdk | https://github.com/browserbase/skills