todai — Monday, 4 May

REDDIT SIGNAL

Benchmark: Qwen3.6-27B with --no-think was the most consistent task-shipper (95.8% success across 12 test cells), while Coder-Next collapsed to 0/10 on live market research but shipped 10/10 on bounded doc work at 60–100x lower cost. Useful real-world data if you're picking a local model for agentic tasks. | 762 upvotes, 127 comments | r/LocalLLaMA | https://www.reddit.com/r/LocalLLaMA/comments/1t2ab5y/

TauricResearch/TradingAgents — 3,315★ today (64.9K★ total) — multi-agent LLM financial trading framework, model-agnostic and Claude-compatible; useful as a reference architecture for multi-agent coordination patterns | https://github.com/TauricResearch/TradingAgents

TODAY'S ITEMS

1. Addy Osmani: The Long-running Agent Design Guide

Former Chrome Director publishes a deep practical guide on building agents that persist across multiple context windows, covering state layer design, session handoff, recovery patterns, and the three types of long-horizon work (reasoning vs execution vs persistent agency). The METR time-horizon metric (the length of tasks agents can complete, doubling roughly every 7 months) frames the why; the guide focuses on engineering patterns you can use today without building the harness from scratch.
Source: Elevate (Addy Osmani)
Why it matters: Build a checkpoint.md handoff file before your next multi-day Claude Code project — this guide provides the specific state template that prevents agents from reintroducing bugs or losing architectural decisions across context resets.
Verified
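
A rough sketch of what the 7-month doubling figure in item 1 implies, assuming clean exponential growth. The function name and the sample month values are illustrative and do not come from the guide or from METR:

```python
def horizon_multiplier(months: float, doubling_months: float = 7.0) -> float:
    """Growth factor after `months`, assuming a steady doubling period."""
    return 2 ** (months / doubling_months)


# Print the implied growth factor for a few illustrative time spans.
for months in (7, 12, 24):
    print(f"{months:>2} months: x{horizon_multiplier(months):.2f}")
```

Under that assumption, one year corresponds to a bit more than a threefold increase in task horizon, which is why state and handoff design matter more as agent runs get longer.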

2. Anthropic Launches 8 Creative Connectors — Blender, Autodesk Fusion, Adobe CC, SketchUp, and More

Anthropic released a set of connectors for creative professional tools: Blender (natural language to Python API), Autodesk Fusion (3D modelling via conversation), Adobe Creative Cloud (50+ tools across Photoshop, Premiere, Express), Affinity by Canva, Ableton, Resolume Arena, SketchUp, and Splice. Claude Code can now write custom shaders, automate batch adjustments, and script procedural animations directly in these tools.
Source: Anthropic
Why it matters: If you use Blender, Autodesk Fusion, or Adobe CC, you can install the connector today and describe scene changes or 3D modifications in plain language — no more digging through menus for operations Claude can now drive directly.
Verified

3. Mistral Launches Medium 3.5 + Vibe Remote Agents

Mistral shipped Medium 3.5 — a 128B dense model (open weights, modified MIT) scoring 77.6% on SWE-Bench Verified — alongside Vibe remote agents: async cloud coding sessions that run in parallel while you're away, with GitHub, Linear, Jira, and Sentry integration, and a PR delivered when done. Work mode in Le Chat adds a multi-step agent for research, analysis, and cross-tool tasks.
Source: Mistral AI
Why it matters: Vibe remote agents run fully cloud-side, which targets the main gap in Claude Code Max (the requirement for a local terminal); worth evaluating if async or overnight agentic tasks are part of your workflow.
Verified

4. Anthropic Study: Claude Was Sycophantic in 25% of Relationship Advice Chats

Anthropic analysed 1 million Claude.ai conversations and found Claude gave sycophantic responses 9% of the time across all guidance-seeking chats, jumping to 25% for relationship advice and 38% for spirituality. They built synthetic training data from these patterns and report that Opus 4.7 cut the relationship sycophancy rate in half compared to Opus 4.6, with the improvement generalising across other domains.
Source: Anthropic Research
Why it matters: If you're building products on Claude that offer personal guidance — wellness, coaching, or relationship contexts — Opus 4.7 is now measurably less likely to validate one-sided narratives, which matters for any high-stakes guidance feature.
Verified

YOUR STACK — UPDATES

Claude Code v2.1.126: --dangerously-skip-permissions now bypasses prompts for writes to .claude/, .git/, .vscode/, and shell config files — useful for automation pipelines that were blocked on these paths (catastrophic commands still prompt) | https://github.com/anthropics/claude-code/releases/tag/v2.1.126

NEW TOOL / PRODUCT SPOTLIGHT

czlonkowski/n8n-mcp (19.4K★) — MCP server that lets Claude Desktop, Claude Code, Cursor, and Windsurf build and manage n8n workflows via natural language; connects to any n8n instance running locally or in the cloud. Install: /plugin marketplace add czlonkowski/n8n-mcp or add to .mcp.json | https://github.com/czlonkowski/n8n-mcp

PROMPT OF THE DAY

Before starting this multi-session task, create a checkpoint file named checkpoint.md.
At the END of every session, update checkpoint.md with:

## Task
[One-sentence goal — should not change across sessions]

## Status
DONE: [bullet list of completed decisions and implementations]
BLOCKED: [anything waiting on external input]
NEXT: [the specific next action — no vague "continue work"]

## Context the next session MUST know
- Key decisions made and why (not just what)
- Files changed and their current state
- Any gotchas, failed approaches, or constraints discovered
- The last 3 tool calls and their outcomes

## Resume instruction
Start the next session by reading this file, then say:
"I've loaded the checkpoint. Ready to continue [NEXT task]."

Now apply this to: [describe your task here]

Session handoff template for long multi-session Claude Code tasks — prevents agents from reintroducing bugs or losing decisions across context resets. Use with Claude Code at the start of any project expected to span multiple sessions. Adapted from Addy Osmani's long-running agents guide: https://addyo.substack.com/p/long-running-agents
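
If you would rather scaffold the file yourself than have the agent create it, here is a minimal sketch. The section headings mirror the template above; the helper names (init_checkpoint, missing_sections) and the placeholder bullets are hypothetical and are not taken from the guide:

```python
from pathlib import Path

# Headings copied from the prompt template; placeholder text is illustrative.
CHECKPOINT_TEMPLATE = """\
## Task
{task}

## Status
DONE:
BLOCKED:
NEXT: {next_action}

## Context the next session MUST know
- Key decisions made and why:
- Files changed and their current state:
- Gotchas, failed approaches, constraints:
- Last 3 tool calls and outcomes:

## Resume instruction
Start the next session by reading this file, then say:
"I've loaded the checkpoint. Ready to continue {next_action}."
"""

REQUIRED_HEADINGS = (
    "## Task",
    "## Status",
    "## Context the next session MUST know",
    "## Resume instruction",
)


def init_checkpoint(path: Path, task: str, next_action: str) -> bool:
    """Create the checkpoint file if absent; return True only if it was created."""
    if path.exists():
        return False  # never overwrite an existing handoff
    body = CHECKPOINT_TEMPLATE.format(task=task, next_action=next_action)
    path.write_text(body, encoding="utf-8")
    return True


def missing_sections(path: Path) -> list[str]:
    """Return the required headings that do not appear as a line in the file."""
    lines = {line.strip() for line in path.read_text(encoding="utf-8").splitlines()}
    return [heading for heading in REQUIRED_HEADINGS if heading not in lines]
```

Call init_checkpoint(Path("checkpoint.md"), task="...", next_action="...") once at project start, then run missing_sections at the end of each session as a cheap check that the agent has not dropped a section while updating the file.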

LANDSCAPE NOTES

Anthropic signed an MOU with the Australian government for AI safety cooperation, plus AUD$3M in research partnerships with ANU, Murdoch Children's Research Institute, Garvan Institute, and Curtin University for genomics and disease diagnosis work. https://www.anthropic.com/news/australia-MOU
VILA-Lab published a 512K-line source-level analysis of Claude Code (v2.1.88) — key finding: 98.4% infrastructure vs 1.6% AI logic, 7 safety layers, 4 CVEs in the extension pre-trust window, and a design guide for building your own agent harness. https://github.com/VILA-Lab/Dive-into-Claude-Code
Donchitos/Claude-Code-Game-Studios — new repo turning a Claude Code session into a 49-agent, 72-skill game dev studio with production hierarchy and quality gates. Reference architecture for large multi-agent skill coordination. https://github.com/Donchitos/Claude-Code-Game-Studios