I once accidentally spawned 60 Claude agents at the same time.
My machine started grinding to a halt. Each agent was spinning up its own exploration subagents, which were spinning up their own subagents—exponentially multiplying like some kind of AI hydra. I needed to grep and kill every Claude process, but I couldn't remember the exact command. So I opened a new terminal tab and asked Claude for help.
Except Claude was now slow as hell because my entire system was choking on agent processes. I sat there watching my computer crawl, genuinely worried I was about to cook my machine and burn through my API credits in one catastrophic cascade.
I did eventually kill them all. And then I kept building.
Because despite the chaos, I'd stumbled onto something that fundamentally changed how I work: async agents unlock long-running autonomous orchestration. I can hand off a complex, multi-step workflow and walk away for 15+ minutes while the system works, without it drifting into chaos.
"Async agents enable sustained trust: hand off work and enter flow state elsewhere for 15+ minutes."
Flow state requires this sustained trust.
The Blocking Problem
Before async agents, I was doing batch parallelism. The workflow went like this:
- Create a comprehensive plan with tasks organized into batches
- Each batch contains independent tasks that can run in parallel
- Instruct Claude to spawn agents for all tasks in batch 1 simultaneously
- Wait for entire batch to complete
- Move to batch 2, repeat
Claude spawns multiple agents in parallel as part of the same group of function calls—this is where the real compute advantage comes from. More agents running simultaneously means more work done faster.
But it's still fundamentally blocking.
You can't start batch 2 until batch 1 finishes, even if most of batch 1 is done and just waiting on one slow agent. The workflow waits. Claude waits. I wait.
The entire orchestration becomes as slow as the slowest agent in each batch—and I can't really do anything else because I'm context-locked to this task, ready to jump in when the next batch starts.
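In shell terms, the batch pattern is the moral equivalent of backgrounding every task and then calling `wait`. This is only an analogy (the `run_task` function below is a made-up stand-in, not a real agent command), but it shows exactly where the blocking comes from:

```bash
# Analogy only: run_task is a fake stand-in for "spawn an agent for this task".
run_task() { sleep $((RANDOM % 5)); }

run_task task-1 & run_task task-2 & run_task task-3 &   # batch 1 runs in parallel
wait    # blocks until ALL of batch 1 finishes, even if most tasks are long done
run_task task-4 & run_task task-5 &                      # batch 2 only starts now
wait
```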
This is the fundamental constraint of synchronous orchestration: it demands continuous attention even when no attention is needed.
Enter Async
Async agents solve this by making "waiting" productive.
Instead of blocking on agent completion, Claude spawns agents and actively monitors their progress through a combination of:
- `./agent-responses/await {agent_id}` - Block and wait for specific agents when needed
- `sleep 30 && Read agent-responses/{agent_id}.md` - Non-blocking progress checks
- Continuous orchestration logic that adapts based on agent outputs
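To give a sense of how little machinery the blocking monitor needs, here is a minimal sketch of what an `await` helper could look like. This is my own guess at an implementation, and it assumes each agent ends its response file with a `STATUS: done` line:

```bash
#!/usr/bin/env bash
# Hypothetical ./agent-responses/await: poll one agent's response file until it
# reports completion (assumes the agent appends "STATUS: done" when finished).
agent_id="$1"
until grep -q "STATUS: done" "agent-responses/${agent_id}.md" 2>/dev/null; do
  sleep 5
done
echo "agent ${agent_id} is done"
```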
The orchestrator agent stays active, making decisions, spawning new agents, handling blockers—all while other agents continue working in the background.
This unlocks something critical: I can walk away.
Not just for 30 seconds. For 15+ minutes. Because the orchestrator is handling the complexity, the dependency management, the error recovery. It doesn't need me sitting there ready to respond to every batch completion.
The Trust Threshold
Here's what changed: I can now delegate a complex refactor—say, fixing 15 type errors across 8 files—and trust that when I come back, the work is done correctly.
The orchestrator spawned specialized agents. It monitored their progress. When one hit a blocker, it analyzed the issue and spawned a fixer agent. When the build failed, it read the errors, identified the root cause, and delegated the fix.
All without me.
That's the trust threshold: can I enter flow state on something else and return to completed work?
With async orchestration, the answer is yes.
Implementation Reality
Building this wasn't about clever prompts or fancy tooling. It was about creating the right primitives:
- Agent outputs as files: Every agent writes its progress to `agent-responses/{agent_id}.md`
- Blocking and non-blocking monitors: `await` for dependencies, `sleep + Read` for progress checks
- Active orchestrator role: Never spawns agents and exits; stays engaged until work completes
- Build validation: Always runs the build after implementation, delegates fixes if needed
- Error recovery protocol: One mitigation attempt per blocker before escalating to user
These aren't sophisticated. They're boring primitives that compose into surprisingly powerful orchestration.
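Here's a rough shell sketch of how they compose. It's hypothetical: in practice the orchestrator is Claude itself, `spawn_agent` is a placeholder for however subagents actually get launched, and `npm run build` stands in for whatever the project's build command is:

```bash
# Hypothetical composition of the primitives, not the real orchestrator.
spawn_agent() { echo "launching agent: $1"; }   # placeholder for spawning a subagent

spawn_agent implement-feature                # agent writes to agent-responses/implement-feature.md
./agent-responses/await implement-feature    # blocking monitor on the dependency

if ! npm run build; then                     # build validation after implementation
  spawn_agent fix-build                      # error recovery: one mitigation attempt...
  ./agent-responses/await fix-build
  npm run build || echo "still failing; escalate to the user"   # ...then stop and ask
fi
```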
What This Unlocks
I'm now running workflows that would have been impossible with synchronous batch parallelism:
- Full codebase refactors with 30+ file changes
- Feature implementations requiring investigation, planning, implementation, and validation
- Code smell discovery and remediation across multiple subsystems
- Multi-phase workflows with complex dependencies
All while I'm in flow state on something completely different.
The system doesn't demand my attention every 3 minutes. It works autonomously, fails gracefully, and only interrupts when it genuinely needs human judgment.
The Catch
This only works because I'm willing to accept that it might break things.
I'm in pre-production mode—building fast, iterating quickly, knowing that a broken build is fixable. The orchestrator doesn't have perfect judgment. Sometimes it makes the wrong call. Sometimes it fixes one error and introduces another.
But that's the trade-off: velocity over perfection.
And honestly? The error rate is lower than I expected. Most runs complete successfully. When they fail, they fail with clear context about what went wrong and why.
What's Next
I'm going to keep building on this foundation. There's a lot more to explore:
- Better error recovery strategies
- More sophisticated dependency management
- Adaptive orchestration that learns from past runs
- Integration with other tools in my workflow
But the core insight stands: async agents enable sustained autonomous work at a scale that fundamentally changes the human-AI collaboration model.
From "I'm working with AI" to "AI is working while I'm doing something else."
That's the game changer.