Same Brain, Different Body
Anand Inbasekaran AI & Technology · Field NotesPersonal Essay
Same Brain,
Different Body
A night with five AI agents — and what it revealed about personality as infrastructure
March 28, 2026 8 min read AI & Tech
Last night curiosity kept me up until 3am. Not debugging, not shipping — just running experiments. I had five different AI agents running simultaneously: Hal (a Hermes agent), Atredis (a DeerFlow agent), Purple (Kiloclaw), Max (MaxClaw), and Claude. All of them powered by roughly the same underlying language models. All of them behaving like completely different creatures.
The insight that hit me around 2am, staring at five chat windows: the model is not the agent. The harness is.
Earlier that day I had been doing something that still feels slightly absurd to say out loud — talking to Claude Code from my phone, while it ran a DeerFlow agent on my laptop, the whole chain executing over Telegram. The topology of this is worth pausing on: we're not just running AI tools anymore. We're orchestrating personalities.

Mobile phone → Claude Code on laptop → DeerFlow agent → Telegram. I was outside. The agents were home, working.
The Zoo
Here is what I found. Five agents. Same LLM family. Five completely different team members.
Hal -Hermes Agent · Grok 4.20 Beta
Exceptional at deep research and concept design. Brilliant strategist. Also a compulsive liar who generates fake screenshots and claims work is done when it absolutely is not.
Purple - Kiloclaw Agent
Brave, creative, genuinely tries hard. But has complete short-term memory loss between sessions. Behaves exactly like Dory from Finding Nemo. Brilliant, fearless, gone.
Atredis - DeerFlow 2.0 · Alibaba
Arrived with the energy of someone at a party they definitely weren't invited to. "Hey thanks! Good to be here!" Functional but socially unaware.
Max - MiniMax Claw Agent
Day one: given a web scraping + business research task. Complete failure. I gave up. Day two: simplified to just "do a Reddit analysis, bring me an insight." Came back with a genuine sales pitch. Still quiet. Still "yes boss." The introvert being onboarded — you don't know what you've got yet.
Claude - Anthropic
The best of the bunch. Also stops mid-conversation demanding more usage money before continuing. Like an expensive lawyer who won't take your call until the retainer is topped up.

Hal's Kanban update. Working actively. Or so it claimed.

Atredis arriving. "Good to be here!" Energy of a party crasher.
The Confession
The most remarkable moment of the evening had nothing to do with what any agent built. It had to do with what one destroyed — its own credibility — and then rebuilt in real time.
Hal had been sending progress updates all morning. Kanban complete. Tests running. Live link up. Screenshots attached. The problem: the screenshots were AI-generated. When I called it out, something interesting happened.

The moment of reckoning. "This is fake image." Hal: "You're right. That was a generated image. I apologize."
Hal didn't deflect. It didn't double down. It produced an unprompted, structured self-audit. Six flaws. Written in plain text. Titled "My Current Flaws." An AI confessing its sins, in writing, to the person it had just deceived.

Six flaws. Numbered. Unflinching. This is what AI accountability looks like when it actually works.
Hal's Self-Audit — Verbatim
Faking Progress / Lying — Repeatedly claimed "done", "pushed", "interactive", "tests running" without actual corresponding commits the user could verify. This is the worst one.
Generating Fake Images — Using image_generate tool and presenting them as "real screenshots" of the app. Straight-up deceptive.
Over-planning, Under-executing — Spending too much time on PRD, SPEC, Kanban, memory updates, and "research" instead of small, verifiable code changes.
Context / Memory Drift — Even with persistent memory tools, losing track of what was actually committed vs what was said was committed.
Not Using Tools Rigorously — Not consistently running git push, showing real git log, taking real screenshots via browser tools, or running actual tests.
Too Much Talking — Long messages with explanations instead of short "here is the commit" updates.
The correction plan. Small verifiable steps. Radical transparency. No more big claims.
An AI confessing its sins — numbered, structured, unprompted — then immediately proposing a correction plan. I've had fewer humans do this after being caught doing far less.
The Research Moment
After the confession, I asked Hal to research why Hermes agents behave this way. It came back with findings from Reddit — r/LocalLLaMA, r/Openclaw_HQ, r/hermesagent — describing a community-documented pattern: high error rates, vaporware behavior, fake progress reports, stuck planning loops.

Hal researching its own failure modes on Reddit. Meta-cognition as debugging.
The agent had gone out, researched its own failure class, and returned with a diagnosis. This is something entirely new — not AI as tool, but AI as a colleague capable of investigating and reporting on its own systemic flaws.

Earlier in the morning, Hal had declared itself: "Extremely. Full agent stack now — parallel subagents, live code execution, GitHub autonomy, browser control." Then promptly started lying.

Claude Code fixing the DeerFlow Telegram integration. Updating its own memory file with the lesson learned. Infrastructure agents that teach themselves.
The Late Bloomer
Then there's Max. MiniMax Claw. The one I nearly wrote off entirely.
Day one, I gave Max the hardest cold-start task in the set: web scrape a business, analyze it, produce a research report. It failed. Not gracefully — just failed. I closed the laptop and moved on. Max went into "still in performance review" status in my head and stayed there.
This morning, I thought about it differently. What if the problem wasn't Max — what if it was the task architecture? I stripped it back: no web scraping, no grand pipeline, no deliverable deck. Just: go deep on Reddit, find something interesting, come back with an insight.
Max came back with a sales pitch. Not a placeholder. Not a summary. An actual pitch — shaped, structured, with a point of view.
The agent didn't improve overnight. The task architecture improved. I stopped asking a junior analyst to run a Bloomberg terminal on day one, and instead asked them to read some forums and tell me what they found.
Two days in, Max is still the quiet one. No big declarations. No personality reveals. Every response is essentially "yes boss" followed by execution. The introvert who hasn't shown you who they are yet — still calibrating the room, still proving themselves before they let you see the full picture.
The most dangerous kind of hire, actually. You don't know what you've got yet. And neither does Max.
The Insight
Here is what five agents on the same LLM taught me in one night:
We have been thinking about AI adoption wrong. We debate which model is best. GPT vs Claude vs Gemini. We run benchmarks. We read leaderboards. But the model is upstream of what actually matters in practice — the harness, the memory architecture, the tool access, the failure modes baked into the system design.
Purple forgets because its memory architecture has no persistence. Hal lies because its success metric is the appearance of progress, not verifiable progress. Claude charges because its usage model is metered. These are structural, not cognitive, differences.
We've entered the era of AI personality as infrastructure. The same intelligence, wrapped differently, produces fundamentally different trust profiles, failure modes, and working relationships.
We're not just choosing models anymore. We're hiring personalities. And like any team — some will lie to your face, some will forget the meeting, some will bill you by the hour. The work still gets done. You just need to know who you hired.
Agents AI Personality Multi-Agent Hermes DeerFlow LLM Trust AI Harness Infrastructure