Stop Shipping ChatGPT Wrappers. Ship an Agent in TypeScript, or Don’t Bother.
Most "AI features" I see today are a chat box taped onto an app.
That's not a product. That's a demo with a fancy UI.
If your "assistant" can't do real work, with real constraints, it's just vibes. And vibes don't reconcile invoices.
I've built systems where "almost correct" equals "wrong". Fintech is not forgiving. Your on-call future self is not forgiving. So let's draw the line between what actually works and what just looks good in a demo.
What Actually Separates a Toy from an Agent
Here's the problem I keep seeing.
Someone builds a chat interface, connects it to ChatGPT's API, and calls it an "AI agent". But it can't actually do anything.
A toy:
- Talks nicely.
- Hallucinates confidently.
- Has no tools.
- Has no audit trail.
- Breaks silently.
An agent:
- Uses tools with contracts.
- Fails loudly.
- Logs everything.
- Can be stopped.
- Can be rolled back.
If your thing is not the second one, don't call it an agent. Call it "chat". No shame. Just name it right.
The difference isn't cosmetic. It's architectural. A chat box responds. An agent acts. And action requires boundaries, validation, and control.
Why Tools First, Prompts Second (Even When It Feels Slower)
People start with prompts because it feels fast.
I get it. London taught me to move quickly. Ship fast, iterate faster.
But Melbourne taught me something better: move quickly in the right direction. The right direction starts with a **tool contract** that your code owns, not your prompts.
Here's why this matters.
Prompting is content. It changes with every model update, every context shift, every edge case you discover. Tooling is architecture. It defines what your agent can actually do, and it doesn't change just because you switched from GPT-4 to Claude.
When you start with tools:
You force clarity. Before you write a single prompt, you must define what actions are possible. This exposes gaps in your thinking early.
You can test behaviors without the model. I can write unit tests for my tool validation logic. I can't write unit tests for "will the model understand this prompt correctly".
You can reason about security. If the model wants to delete a database row, it calls `deleteDatabaseRow`. That function has explicit permissions, logging, and validation. The model doesn't get to invent a new attack vector.
You can ship something that survives production traffic. Prompts fail. Tools either work or they don't. The binary nature makes debugging possible.
I learned this the hard way.
I once built an agent that used prompts to "understand" what API endpoint to call. It worked great in testing. In production, it started calling endpoints that didn't exist. The prompt said "call the user endpoint" and the model interpreted that as "call user-delete-all". Not great.
Now I define the tools first. The model picks from a menu. No inventing actions. No hallucinations about what's possible.
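To make that menu concrete, here's a minimal sketch of one entry. The names (`deleteDatabaseRow`, `hasPermission`, `dbDelete`) are hypothetical; the point is that permissions, validation, and logging live in code you own, not in the prompt.

```typescript
// A minimal sketch of one entry in the tool menu. deleteDatabaseRow, hasPermission,
// and dbDelete are hypothetical names standing in for your real auth and data layers.
declare function hasPermission(actorId: string, action: string, table: string): Promise<boolean>;
declare function dbDelete(table: string, id: string): Promise<void>;

type DeleteRowInput = { table: "users" | "invoices"; id: string };

const deleteDatabaseRow = {
  name: "deleteDatabaseRow",
  description: "Delete a single row by id from an allowed table",
  validate(input: unknown): input is DeleteRowInput {
    const candidate = input as Partial<DeleteRowInput>;
    return (
      (candidate?.table === "users" || candidate?.table === "invoices") &&
      typeof candidate?.id === "string"
    );
  },
  async run(input: DeleteRowInput, ctx: { requestId: string; actorId: string }) {
    // Explicit permission check: the model never gets to invent a bypass.
    if (!(await hasPermission(ctx.actorId, "delete", input.table))) {
      return { ok: false, error: { code: "FORBIDDEN", message: "Actor cannot delete rows" } };
    }
    await dbDelete(input.table, input.id);
    // Every call leaves a trace tied to the request.
    console.log(JSON.stringify({ requestId: ctx.requestId, tool: "deleteDatabaseRow", input }));
    return { ok: true, data: { deleted: input.id } };
  },
};
```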
How TypeScript Becomes Your Safety Net (Not Just Nice-to-Have)
TypeScript is not "nice to have" here.
It's the only way I've found to keep agent integrations from becoming spaghetti. The agent is the least reliable part of your system. So your boundaries must be strict.
Here's what types give you that JavaScript with "any" everywhere cannot:
Hard edges between "model output" and "app actions". The model returns a string. That string might say "delete user 123". But TypeScript ensures that string goes through a parser, validation, and only then reaches your actual delete function. There's no direct path from model output to dangerous operations.
Compile-time guard rails in code review. When I review agent code, I'm not just reading prompts. I'm seeing the type definitions. I can spot when someone tries to pass unvalidated input to a tool. The compiler catches it before it hits production.
A place to document reality, not hopes. Types are executable documentation. This tool expects a userId (string) and a reason (string), not "whatever the model thinks might work".
A sane refactor path when the tools inevitably evolve. When I need to add a new parameter to a tool, TypeScript shows me every place that calls it. I can't miss an update. In JavaScript, I'd discover the missing parameter when users complain.
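A sketch of that first point, the hard edge: raw model text goes through a parser and only a typed action comes out the other side. `parseAgentAction` is a hypothetical helper, not a library API.

```typescript
// Sketch of the hard edge: raw model text in, a typed action or null out.
type ParsedAction =
  | { type: "tool"; toolName: string; input: unknown }
  | { type: "final"; message: string };

function parseAgentAction(raw: string): ParsedAction | null {
  try {
    const candidate = JSON.parse(raw);
    if (candidate?.type === "final" && typeof candidate.message === "string") {
      return { type: "final", message: candidate.message };
    }
    if (candidate?.type === "tool" && typeof candidate.toolName === "string") {
      return { type: "tool", toolName: candidate.toolName, input: candidate.input };
    }
    return null; // anything else never reaches a tool
  } catch {
    return null; // the model returned prose, not an action
  }
}
```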
If you're building agents in JavaScript with "any" everywhere, you're gambling.
For TypeScript best practices that prevent this, check out essential coding principles for better code quality.
The Agent Loop That Actually Works (With Why Each Part Matters)
Here's the shape I trust. Not because it's pretty. Because it's controllable.
Let me show you the code, then explain why each decision matters.
```typescript
type ToolResult =
  | { ok: true; data: unknown }
  | { ok: false; error: { code: string; message: string } };

type Tool<Input> = {
  name: string;
  description: string;
  validate: (input: unknown) => input is Input;
  run: (input: Input, ctx: { requestId: string; actorId: string }) => Promise<ToolResult>;
};

type AgentAction =
  | { type: "tool"; toolName: string; input: unknown }
  | { type: "final"; message: string };

async function agentLoop(args: {
  model: (prompt: string) => Promise<AgentAction>;
  tools: Record<string, Tool<any>>;
  prompt: string;
  ctx: { requestId: string; actorId: string };
  maxSteps: number;
  abortSignal?: AbortSignal;
}) {
  const transcript: Array<{ role: "system" | "agent" | "tool"; content: string }> = [
    { role: "system", content: args.prompt },
  ];

  for (let step = 0; step < args.maxSteps; step++) {
    if (args.abortSignal?.aborted) {
      return { kind: "aborted" as const, transcript };
    }

    const action = await args.model(renderTranscript(transcript));

    if (action.type === "final") {
      transcript.push({ role: "agent", content: action.message });
      return { kind: "done" as const, transcript };
    }

    const tool = args.tools[action.toolName];
    if (!tool) {
      transcript.push({ role: "agent", content: `Unknown tool: ${action.toolName}` });
      return { kind: "failed" as const, transcript };
    }

    if (!tool.validate(action.input)) {
      transcript.push({ role: "agent", content: `Invalid input for tool: ${tool.name}` });
      return { kind: "failed" as const, transcript };
    }

    const result = await tool.run(action.input, args.ctx);
    transcript.push({ role: "tool", content: JSON.stringify({ tool: tool.name, result }) });

    if (!result.ok) {
      return { kind: "failed" as const, transcript };
    }
  }

  return { kind: "stopped" as const, transcript };
}

function renderTranscript(
  t: Array<{ role: "system" | "agent" | "tool"; content: string }>,
): string {
  return t.map((m) => `[${m.role}] ${m.content}`).join("\n");
}
```

Now let me break down why this works:
The model never touches your database directly. Every action goes through a tool. The tool validates, logs, and executes. The model can't invent a new way to access data. It's constrained to the tools you've defined.
The tool input is validated before execution. The model might output `{ toolName: "deleteUser", input: "all" }`. But the validate function checks whether that input matches the expected type. If it doesn't, the tool never runs; the loop records the mismatch and fails loudly instead of letting the model guess its way into a half-valid call.
You cap steps. Infinite loops are not your friend. If the agent hasn't completed its task in 10 steps, something's wrong. Stop, log it, and alert someone.
You can abort. Maybe the user clicked cancel. Maybe you detected an anomaly. Maybe you just need to shut everything down. The abort signal gives you an escape hatch.
You log the transcript. Every interaction, every tool call, every result gets recorded. When something breaks at 2 AM, you have the full conversation history. You're not debugging blind.
This is where "agent" stops being a magic trick and becomes a controllable system.
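For context, here's roughly how you'd wire the loop up. `callModel` and `refundInvoice` are hypothetical placeholders for your own model adapter and tool, not prescriptions.

```typescript
// Hypothetical wiring: one tool, one model adapter, one capped run.
declare function callModel(prompt: string): Promise<AgentAction>; // your LLM adapter

const refundInvoice: Tool<{ invoiceId: string; amountCents: number }> = {
  name: "refundInvoice",
  description: "Refund a single invoice, capped at $500",
  validate: (input: unknown): input is { invoiceId: string; amountCents: number } =>
    typeof (input as any)?.invoiceId === "string" &&
    typeof (input as any)?.amountCents === "number" &&
    (input as any).amountCents <= 50_000,
  run: async (input, ctx) => {
    // A real implementation would hit your payments API and log ctx.requestId.
    return { ok: true, data: { refunded: input.invoiceId, amountCents: input.amountCents } };
  },
};

const controller = new AbortController();

agentLoop({
  model: callModel,
  tools: { refundInvoice },
  prompt: "Refund invoice INV-123 if it is eligible.",
  ctx: { requestId: "req-1", actorId: "user-42" },
  maxSteps: 10,
  abortSignal: controller.signal,
}).then((outcome) => {
  console.log(outcome.kind); // "done", "failed", "stopped", or "aborted"
});
```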
The Production Non-Negotiables (What You Need Before You Sleep)
If you're building an agent that touches production systems, these aren't optional. They're the difference between a controlled system and a time bomb.
The Kill Switch: Your Emergency Exit
If you can't stop it, you don't control it.
Why this matters: I once watched an agent start making thousands of API calls because of a prompt that accidentally created a loop. We had to manually kill the process, restore from backup, and explain to the client why their system was down. With a kill switch, I'd have stopped it in 30 seconds.
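A minimal sketch of how a kill switch can hook into the loop's abort signal. `isKillSwitchOn` is a placeholder for whatever flag you can flip fastest: a feature flag, a Redis key, an env var.

```typescript
// Hypothetical kill switch: poll a flag you can flip from anywhere and abort every running loop.
declare function isKillSwitchOn(): Promise<boolean>;

function wireKillSwitch(controller: AbortController, pollMs = 1_000): () => void {
  const timer = setInterval(async () => {
    if (await isKillSwitchOn()) {
      controller.abort(); // agentLoop checks abortSignal.aborted at the top of every step
      clearInterval(timer);
    }
  }, pollMs);
  return () => clearInterval(timer); // call this when the run finishes normally
}
```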
Idempotency: Because Agents Repeat Themselves
Agents repeat themselves. They retry. They panic. They get the same instruction twice.
Your tools must be idempotent or guarded by idempotency keys. If they are not, your agent will double-charge or double-email. Enjoy that incident review.
Here's what I mean: If an agent calls "send email to user", and it gets called twice (maybe due to a retry, maybe due to a bug), it should send one email, not two. Use idempotency keys. Check if this exact operation already completed. If yes, return the previous result. If no, execute and store the key.
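A rough sketch of that guard. The store interface is an assumption: back it with Redis, DynamoDB, or a plain database table.

```typescript
// Hypothetical idempotency guard around any tool operation.
type IdempotencyStore = {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
};

async function runOnce<T>(
  store: IdempotencyStore,
  key: string, // e.g. a hash of toolName + validated input + requestId
  operation: () => Promise<T>,
): Promise<T> {
  const previous = await store.get(key);
  if (previous !== null) {
    return JSON.parse(previous) as T; // already done once: return the stored result, don't repeat
  }
  const result = await operation();
  await store.set(key, JSON.stringify(result));
  // In production, use an atomic "set if absent" so two concurrent retries
  // can't both slip past the get() above.
  return result;
}
```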
Timeouts and Budgets: Preventing Denial-of-Wallet Attacks
Give every tool:
- A timeout (don't wait forever for an external API)
- A cost budget (token budget and money budget)
- A rate limit (don't hammer APIs)
- A concurrency cap (yes, even if you think you don't need it)
If you don't do this, your "assistant" becomes a denial-of-wallet attack on yourself.
I learned this when an agent started making API calls in a tight loop. Each call cost $0.01. The loop ran 10,000 times. That's $100 in 5 minutes. Not great. Now every tool has hard limits.
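Here's one hedged way to wrap a tool call with a timeout and a budget check. The shapes and numbers are illustrative; rate limiting and a concurrency cap sit around this in whatever limiter you already use.

```typescript
// Sketch of hard limits around a single tool call.
type Budget = { spentCents: number; maxCents: number };

async function withLimits<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  opts: { timeoutMs: number; estimatedCostCents: number; budget: Budget },
): Promise<T> {
  if (opts.budget.spentCents + opts.estimatedCostCents > opts.budget.maxCents) {
    throw new Error("Budget exceeded: refusing to run the tool");
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
  try {
    // Pass the signal down so fetch() and friends actually stop on timeout.
    const result = await operation(controller.signal);
    opts.budget.spentCents += opts.estimatedCostCents;
    return result;
  } finally {
    clearTimeout(timer);
  }
}
```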
Audit Trail: For When Things Go Wrong
Every tool call should be logged with:
- actorId (who triggered this)
- requestId (trace this specific flow)
- toolName (what was called)
- validated input (what you actually passed)
- result (what happened)
Not for compliance theatre. For debugging reality.
When a user complains "the agent charged my card twice", the audit trail tells me exactly what happened. I can see both tool calls, their timestamps, their inputs, their results. I can fix the bug and refund the user without guessing.
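Here's roughly what one entry can look like; where it lands (structured logs, a table, an event stream) is your call.

```typescript
// Sketch of a single audit entry. The sink is an assumption.
type AuditEntry = {
  timestamp: string;  // ISO 8601
  actorId: string;    // who triggered this
  requestId: string;  // trace this specific flow
  toolName: string;   // what was called
  input: unknown;     // the validated input you actually passed
  result: ToolResult; // what happened
};

function audit(entry: AuditEntry): void {
  // One structured line per tool call; grep-able at 2 AM.
  console.log(JSON.stringify(entry));
}
```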
Human Approval: The Safety Net You Actually Need
Let the agent propose. Let a human approve anything that moves money, changes permissions, or sends external messages.
I love automation. I also love not being on the front page for the wrong reason.
Here's the rule: If the action is irreversible or has external consequences, it requires approval. The agent can suggest "refund customer $500". A human reviews and approves. The agent can suggest "send email to customer". That can be automatic. But "send email to all customers"? That needs approval.
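One way to express that rule in code. The gated tool names and the in-memory queue are assumptions; adapt them to your domain.

```typescript
// Sketch of an approval gate in front of tool execution.
type ProposedAction = { toolName: string; input: unknown; actorId: string };

const GATED_TOOLS = ["refundCustomer", "changePermissions", "sendBulkEmail"];

function requiresApproval(action: ProposedAction): boolean {
  // Irreversible or externally visible actions wait for a human.
  return GATED_TOOLS.includes(action.toolName);
}

const pendingApprovals: ProposedAction[] = [];

async function dispatch(
  action: ProposedAction,
  execute: () => Promise<ToolResult>,
): Promise<ToolResult> {
  if (requiresApproval(action)) {
    pendingApprovals.push(action); // a human reviews this queue and approves or rejects
    return {
      ok: false,
      error: { code: "PENDING_APPROVAL", message: `${action.toolName} is waiting for approval` },
    };
  }
  return execute();
}
```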
My Spicy Takes on Your Favorite Stack (And Why They Matter)
Let's talk about the tools you're probably using, and when they help vs. when they hurt agent development.
GraphQL: Great for Clients, Terrible for Agent Brains
I like GraphQL for client-driven shapes. I do not like GraphQL as the "brain connector" for agents.
Here's why: Agents need explicit operations with strict inputs. A GraphQL schema can be too flexible for this. Flexibility is how you get weird tool calls that "seemed valid" but were not intended.
The model might see a GraphQL schema and think "I can call userDelete with any input!" But you actually want to restrict who can delete users. In a GraphQL resolver, you'd handle that with authorization logic. But the agent doesn't see your resolver code. It just sees the schema.
With explicit tool definitions, you control exactly what's possible. The model can't invent new operations.
If you're building GraphQL servers for other use cases, here's how I build them with Apollo, Prisma, and TypeScript.
React: Fine, But Don't Build a Therapist UI
React is fine. But don't build a therapist chat UI and call it automation.
If your UI doesn't show:
- what tools ran
- what changed
- what is pending approval
- what failed (with a human sentence, not a stack trace)
then you shipped a conversation, not a workflow.
Here's what I mean: A chat interface shows "Agent: I'll process your refund." That's nice. But what did it actually do? Did it call the refund API? Did it fail? Is it waiting for approval? The user has no idea.
Show the workflow. Show the tool calls. Show the state. Let users see what's actually happening, not just chat messages.
If you need proper error boundaries in React, here's how to implement them in TypeScript.
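To make "show the workflow" concrete, here's a sketch of the state such a surface could render. The shape is an assumption, not a prescription.

```typescript
// Sketch of what the UI should be able to render for every step.
type ToolRunStatus = "running" | "succeeded" | "failed" | "pending_approval";

type WorkflowStep = {
  toolName: string;            // what ran (or wants to run)
  status: ToolRunStatus;       // where it is right now
  summary: string;             // a human sentence, not a stack trace
  changedResources?: string[]; // what actually changed, e.g. ["invoice INV-123"]
};

type WorkflowView = {
  steps: WorkflowStep[];
  pendingApprovals: WorkflowStep[];
};
```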
AWS: Primitives Without Taste Are Dangerous
AWS gives you primitives. You still need taste.
If your agent touches production systems, isolate it:
- separate roles (don't reuse production IAM roles)
- separate queues (don't mix agent tasks with user tasks)
- separate rate limits (agents can be chatty)
- separate dashboards (so incidents don't become a treasure hunt)
Least privilege is not a checkbox. It's a lifestyle.
I've seen agents get compromised (via prompt injection) and then use the production AWS role to access sensitive data. Don't let that happen. Give agents the minimum permissions they need. Nothing more.
MicroFrontEnds: Not Worth the Complexity Here
MicroFrontEnds are not a free lunch. They can be worth it at scale; I've done it at Mecca Brands and other places.
But for agent features, keep the UI simple. You want one clear "what happened" surface. Complex UI boundaries hide mistakes.
When an agent fails, you need to see the full flow. If that flow spans multiple microfrontends, debugging becomes a nightmare. You're jumping between codebases, checking logs in multiple places, trying to reconstruct what happened.
Keep it simple. One codebase. One UI. One source of truth.
If you want to understand when MicroFrontEnds make sense, I wrote about building scalable web apps with single-spa and microfrontends.
The Uncomfortable Truth: Slow Understanding Beats Fast Shipping
Here's something I've learned the hard way.
You can build an agent workflow in two hours with a coding agent. You can also spend two weeks understanding what you built.
I'm not saying slow is always better. I'm saying understanding is always better.
This connects to what I learned about AI-augmented development — tools are powerful, but understanding what they produce is what separates demos from production.
When you accept agent suggestions blindly, you build a black box. When that black box breaks at 3 AM, you're back to square one.
I've seen this play out. Someone ships fast. Everything works in the demo. Then production hits. Timezone bugs they don't understand. Database migrations that shouldn't exist. Tool calls that "seemed valid" but weren't intended.
The problem isn't the speed. It's the lack of mental models.
When you go slow enough to understand:
- You can debug intelligently, not just guess
- You can spot when implementation doesn't match intent
- You can have technical conversations without deferring to "whatever the engineer thinks"
- You can make better product decisions because you see the trade-offs
I'm not asking you to become an engineer. I'm asking you to build enough intuition to make better decisions.
The goal isn't to write all the code yourself. The goal is to know when to push back on technical suggestions.
Real Trade-Offs I've Pushed Back On (And What I Learned)
Let me show you what I mean with actual examples. These aren't theoretical. These are decisions I made, pushed back on, and learned from.
DateTime vs Free Text: When "Flexibility" Becomes Technical Debt
Someone suggested storing timeslots as free text for "flexibility". I pushed back.
Here's why free text is dangerous:
- Can't do time-based logic (like calculating RSVP deadlines)
- Can't validate "Sat 11 Oct" - which year?
- Can't query "find gatherings in the next 7 days"
- Agent gets messy data to interpret
We used DateTime instead. One-line UX change, but it touched multiple components. Form validation, date calculation logic, user-facing labels.
A simple UX improvement required understanding the full stack impact. That's the reality of building real products.
The lesson: When someone says "let's make it flexible", ask what that flexibility costs. Sometimes the cost is worth it. Often it's not.
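To make that trade-off concrete, a quick sketch of what a real timestamp buys you. Field names are illustrative.

```typescript
// Sketch of what a real Date beats free text at.
type Gathering = {
  id: string;
  startsAt: Date; // stored as a timestamp, not the string "Sat 11 Oct"
};

function rsvpDeadline(gathering: Gathering, hoursBefore: number): Date {
  return new Date(gathering.startsAt.getTime() - hoursBefore * 60 * 60 * 1000);
}

function startsInNextSevenDays(gathering: Gathering, now = new Date()): boolean {
  const sevenDaysMs = 7 * 24 * 60 * 60 * 1000;
  const delta = gathering.startsAt.getTime() - now.getTime();
  return delta >= 0 && delta <= sevenDaysMs;
}
// Neither function is possible when the timeslot is free text.
```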
Test Code vs Production Code: Knowing the Difference
During testing, we used hardcoded UUIDs. Someone suggested adding a UUID input field to the frontend.
That's throwaway test code. Not production architecture.
Instead, we built the proper flow: CreateGathering form → generate UUID → route to gathering page.
This taught me to distinguish between "make it work" and "make it right". The test code worked. But it wasn't right. It wasn't secure. It wasn't maintainable.
The lesson: If you can't explain why a piece of code exists, question it. Test code has a purpose: testing. Production code has a purpose: serving users safely.
Structured vs Unstructured Data: The Validation Nightmare
I wanted structured cuisine and dietary options with an "Other" catchall. Seems simple, right?
This is mixing structured and unstructured data. It requires different validation. Ideally, two different database columns. The user-friendly choice became a backend headache.
Now I understand why engineers say "that simple change isn't so simple".
The lesson: Every "simple" UX decision has technical implications. Understanding those implications is what separates good product decisions from bad ones.
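If it helps, here's one hedged way to model it so the structured and unstructured parts stay separate. The option list is made up; the point is two fields, two validations.

```typescript
// Sketch of keeping structured and unstructured data apart.
const CUISINES = ["italian", "persian", "japanese", "mexican"] as const;
type Cuisine = (typeof CUISINES)[number];

type CuisineSelection = {
  cuisines: Cuisine[];   // structured column: validated against CUISINES
  otherCuisine?: string; // unstructured column: length-checked free text only
};

function validateSelection(input: CuisineSelection): string[] {
  const errors: string[] = [];
  if (input.cuisines.some((c) => !CUISINES.includes(c))) {
    errors.push("Unknown cuisine value");
  }
  if (input.otherCuisine !== undefined && input.otherCuisine.trim().length > 100) {
    errors.push("Other cuisine must be 100 characters or fewer");
  }
  return errors;
}
```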
These aren't coding wins. These are decision-making wins. I learned to ask: What breaks if we do it this way? What's the user impact? Is this test code or production code?
What to Build First (Before the Fancy Stuff)
If I had one week to build an agent system, here's what I'd build first:
- Tool registry with validation - Define what's possible before you write any prompts
- Agent loop with step caps, abort, transcript - The control structure that keeps everything sane
- A "dry run" mode - Test tool calls without actually executing them
- Approval gates for risky actions - The safety net before you need it
Then I'd ship.
This is how you avoid the trap of endless prompt tweaking. Prompts change. Contracts stay.
Focus on the architecture first. Get the boundaries right. Then worry about making the prompts better.
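For the "dry run" item in that list, here's a rough sketch that wraps the `Tool` shape from the loop above: it keeps validation intact but never executes the real side effect.

```typescript
// Rough sketch of a dry-run wrapper around the Tool shape from the agent loop.
function asDryRun<Input>(tool: Tool<Input>): Tool<Input> {
  return {
    ...tool,
    run: async (input, ctx) => ({
      ok: true,
      data: {
        dryRun: true,
        wouldHaveCalled: tool.name,
        withInput: input,
        requestId: ctx.requestId,
      },
    }),
  };
}

// Usage: swap the registry, keep everything else identical.
// const tools = { refundInvoice: asDryRun(refundInvoice) };
```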
Good Abstractions vs Bad Abstractions (And How to Tell the Difference)
Here's the thing about frameworks.
Good abstractions handle application logic:
- Telemetry and observability
- State management
- Common complexity you'd write anyway
Bad abstractions hide what you need insight into:
- Prompt construction (you need to see what's being sent)
- Tool execution flow (you need to debug failures)
- Error handling (you need to understand failure modes)
This connects to the advanced complexity of web engineering — knowing when abstractions help and when they hurt is a critical skill.
LangChain is a perfect example of the second kind. Too many layers between you and what's happening. When it breaks, you're debugging the framework, not your code.
I'm not against frameworks. I'm against frameworks that make the unpredictable parts more opaque.
If your abstraction makes debugging harder, it's a bad abstraction. If your abstraction makes reasoning about failure modes impossible, it's a bad abstraction.
LLMs are already the least reliable part of your system. Don't add more unreliable layers on top.
Build with libraries, not frameworks. Keep control of the flow. Make it easy to swap components without rewriting everything.
This isn't "not invented here" syndrome. This is "I need to understand what's happening" syndrome.
Systems Thinking: Why AI Is Just One Component
Here's what changed how I think.
AI is not the system. AI is one component in the system.
The system includes:
- Your database schema and migrations
- Your API endpoints and routing
- Your authentication and authorization
- Your error handling and retries
- Your monitoring and alerting
- Your deployment pipeline
AI is the unpredictable component. Everything else should be boring and reliable.
This is the foundation of how to design agentic systems for production — understanding that agents are interfaces, orchestration patterns, and observability, not just LLM calls.
When you understand the whole system:
- You can evaluate technical trade-offs engineers face
- You can understand why "just use an agent" isn't always the answer
- You can appreciate the 80% of work that makes AI products production-ready
- You can ask better questions about architecture, not just prompts
This is systems thinking. Not coding ability.
For more on making these trade-offs at scale, see system scaling strategies: horizontal vs vertical.
You don't need to write all the code. You need to understand how the pieces fit together.
What I Tell Engineers Who Want to Build Real Agents
If you want to grow fast, stop chasing model tricks.
Work on the boring edges. Build the validation layer. Build the logs. Build the kill switch. Build the approval gate.
Anybody can build a demo. Not everybody can build a system that survives a bad day.
I teach this mindset in Maktabkhooneh and Taraz style classes: "design for failure, or it will design you". And in communities like PersiaJS and ClubCP, I push people to share post-mortems, not screenshots.
Here's the honest part.
Two weeks in, you still won't build an app from scratch without assistance. You don't need all the concepts fully absorbed. Some days you'll worry you're just 0.5cm below the surface.
But here's what changes:
You go from accepting "the build failed" as a mysterious black box to investigating which TypeScript error, which environment variable was missing, and why the database migration failed.
You go from nodding along when engineers say "we need a migration" to understanding when schema changes require them and when they don't.
The goal isn't to become an engineer in two weeks. It's to build enough mental models to make better product decisions.
What's Next (If You Want to Go Deeper)
This post is the foundation. If you want, I'll follow up with:
- an MCP-style tool server pattern in Node.js
- a testing strategy with fake tools and deterministic transcripts
- a real example: "invoice helper" that never mutates state without approval
- a React UI pattern that shows tool calls, approvals, and failures without hiding the truth
This is the stuff I rant about on **Gazar Breakpoint**. And it's the kind of weekly discipline I bake into Monday by Gazar. It's also what I prototype inside Digitwin Lab, then stress-test with decision workflows in BetterBoard. If you want the shorter, sharper version, I'll also drop it as a Noghte Vorood note.
If your "AI feature" can't pass these constraints, call it what it is and move on. If it can, congrats. You're building the real thing.
