TODAY'S ITEMS
1. Anthropic Publishes Recursive Self-Improvement Data via New Institute
- The newly established Anthropic Institute published "When AI Builds Itself," revealing that Mythos Preview achieved a ~52× speedup on a code-training benchmark where skilled humans take 4-8 hours — up from ~3× with Claude Opus 4 in May 2025. On research judgment calls, Mythos Preview improved on human researchers 64% of the time versus 51% with Claude Opus 4.5 in November 2025.
- Source: Anthropic Institute
- Why it matters: First hard numbers Anthropic has published on AI self-improvement velocity — the new Institute signals they consider this trajectory significant enough to warrant its own research body.
- Verified
2. NVIDIA Drops Nemotron 3 Ultra: 550B Open-Weights Model With 1M Context
- NVIDIA released Nemotron 3 Ultra, a 550B parameter model with 55B active parameters (the rest are dormant experts activated on demand), supporting 1M token contexts at 300+ tokens/sec. Available now on HuggingFace under a commercial-OK open licence, but requires 8× H200 GPUs minimum.
- Source: NVIDIA / r/LocalLLaMA (227 upvotes)
- Why it matters: Tops US open-weights intelligence rankings — if your team evaluates self-hosted alternatives to Claude for agentic work, this is the new ceiling.
- Verified
3. Berkeley CS Failure Rates Triple as Students Lean on AI
- UC Berkeley's Daily Californian reports 35.3% of CS 10 students received F's in Spring 2026, up from under 10% in 2024 and 2025. CS 61A saw 10.6% F's. Instructors point to increased AI reliance and declining mathematical preparedness.
- Source: Daily Californian via HN (624 points)
- Why it matters: The gap between "Claude can do this" and "I understand why" is where debugging fails — a reminder that AI-assisted work still needs underlying skills.
- Verified
YOUR STACK — UPDATES
- CC v2.1.162: explicit WebFetch deny/allow rules now override preapproved-domain auto-allow (security fix), MCP timeout values under 1 second no longer abort every tool call,
claude agents --jsonincludeswaitingForfield,/effortpersists as default for new sessions, Windsurf renamed to Devin Desktop | https://github.com/anthropics/claude-code/releases/tag/v2.1.162
REDDIT SIGNAL
- Tips — 9 practical lessons from 8 months of daily Claude use: editing beats generating, long context needs pre-filtering, and "what's the strongest argument against this" gets sharper pushback than "is this right?" — 226 upvotes, 22 comments | r/ClaudeAI | https://old.reddit.com/r/ClaudeAI/comments/1twkht5/
- Workflow — Run Claude Code in Docker with your existing subscription: mount your project + ~/.claude login, one shell alias, full isolation with Node/Python/Go/Rust in the container — 56 upvotes, 19 comments | r/ClaudeCode | https://old.reddit.com/r/ClaudeCode/comments/1tw01pp/
GITHUB TRENDING
- stop-slop — 3,103★ this week (8,673★ total) — Claude Code skill file that strips AI tells from prose — catches filler phrases, hedging, and corporate speak, then rewrites to sound human | Clone and add as a skill | https://github.com/hardikpandya/stop-slop
- open-notebook — 482★ today (24,802★ total) — Open-source NotebookLM alternative: upload docs, generate AI podcast episodes, search across notebooks |
docker compose up| https://github.com/lfnovo/open-notebook
NEW TOOL / PRODUCT SPOTLIGHT
- last30days — AI agent skill that searches Reddit, X, YouTube, HN, Polymarket, and the web in parallel, then synthesises a grounded summary scored by real engagement data. Zero config for Reddit/HN/GitHub; X and YouTube unlock in 30 seconds. 27,457★ | Claude Code:
/plugin marketplace add mvanhorn/last30days-skill| Others:npx skills add mvanhorn/last30days-skill -g| https://github.com/mvanhorn/last30days-skill
PROMPT OF THE DAY
Before responding, identify the strongest argument against what I'm about to propose.
Then tell me:
1. The counterargument in 2-3 sentences
2. What evidence would change your mind
3. Your recommendation given both sides
Here's what I'm thinking: [paste your idea, plan, or decision]
Forces Claude to push back instead of agreeing — turns "yes and" into genuine evaluation. Use in Chat or Code when making architecture, hiring, or strategy decisions. Source: r/ClaudeAI https://old.reddit.com/r/ClaudeAI/comments/1twkht5/
LANDSCAPE NOTES
- Google employees internally share memes about their AI quality issues — Google then revised a press statement to 404 Media, removing "it's critical that we maintain humans in the loop." https://simonwillison.net/2026/Jun/4/a-slightly-different-version/
- Researcher spent $1,500 testing LLMs on a vulnerable app: Claude scored lower than competitors because its guardrails blocked exploitation — not a capability gap, a safety design choice. HN 351 points. https://news.ycombinator.com/item?id=48392343