An AI agent is just a model in a loop with tools and a goal. If you can describe that loop precisely, you can build it. Here is how to build an AI agent the practical way, without the hype.
Step 1: Pick a model that fits the job
The model is the reasoning engine, not the whole agent. Pick based on three things: tool-calling quality, latency, and cost per task, not on benchmark leaderboards.
- Reasoning-heavy work (multi-step planning, code, ambiguous instructions): reach for a frontier model like Claude Opus, GPT-5, or Gemini 2.5 Pro.
- High-volume, well-scoped tasks (classification, extraction, routing): a smaller model like Claude Haiku or GPT-5 mini is cheaper and faster, and the quality gap often disappears once the prompt is tight.
- Local or private: Llama or Qwen variants run on your own hardware when data cannot leave the building.
A useful pattern is a two-tier setup: a cheap model handles routing and simple turns, and you escalate to the expensive model only when a turn actually needs it. Start with one strong model, get it working, then optimize for cost.
Step 2: Give it tools
Tools are what separate an agent from a chatbot. A tool is a function the model can call: search a database, hit an API, run code, send an email. You expose each one with a name, a description, and a typed schema for its arguments, and the model decides when to call it.
Most providers implement this as function calling or tool use. The contract is the same everywhere: you describe the tool, the model returns a structured call, your code runs it, and you feed the result back. The Model Context Protocol (MCP) standardizes this so the same tool server works across clients.
Three rules that save you pain:
- Write descriptions for the model, not for humans. "Returns the current order status for a given order_id" beats "order service endpoint." The description is the only thing the model sees.
- Keep tools narrow. Ten focused tools beat one tool with a mode flag. Narrow tools mean fewer wrong calls.
- Return clean, structured results. If a tool fails, return a clear error string the model can reason about, not a raw stack trace.
Step 3: Add memory
The model itself is stateless. Every call starts cold, so memory is something you build around it. There are two kinds and you usually need both.
Short-term memory
This is the conversation so far: the running list of messages, tool calls, and tool results in the context window. The hard part is that context is finite. Once a session grows long, you compact it: summarize older turns into a short recap, keep the last few turns verbatim, and drop redundant tool output. Done well, the agent stays coherent across a long task without blowing the token budget.
Long-term memory
This persists across sessions. The common approach is retrieval: store facts, documents, or past decisions in a vector store like pgvector, Pinecone, or Weaviate, then fetch the few most relevant chunks and inject them into context at the start of a turn. For durable facts like user preferences or account details, a plain database row is simpler and more reliable than embeddings. Use retrieval for fuzzy recall, use a database for facts you must get exactly right.
Step 4: Close the loop
This is the core of how to build an AI agent. A chatbot answers once. An agent runs a loop until the goal is met:
- Send the model the goal, the conversation, and the available tools.
- The model either responds with a final answer or requests a tool call.
- If it requests a tool, run it, append the result to the conversation, and loop again.
- Repeat until the model returns a final answer or you hit a stopping condition.
This pattern is often called ReAct: the model reasons, acts via a tool, observes the result, and reasons again. Frameworks like LangGraph, the OpenAI Agents SDK, and Claude's agent tooling implement it for you, but you can write the loop in about forty lines yourself, and doing so once teaches you more than any framework.
The detail people skip: always set a maximum iteration count. Without it, a confused agent will loop forever, burning tokens. Cap it at something like ten turns and surface a clear failure when it hits the ceiling.
Step 5: Add guardrails
An agent with tool access can do real damage. Guardrails are non-negotiable before anything touches production.
- Validate every tool input. Never let the model pass an unbounded value straight into a SQL query, a shell command, or a payment API. Check types, ranges, and allowlists in your code, not in the prompt.
- Gate destructive actions. Deleting data, sending money, emailing customers: require human approval or a confirmation step. The model proposes, a human or a strict rule disposes.
- Scope permissions tightly. Give the agent read-only credentials unless it genuinely needs to write. Run untrusted code in a sandbox.
- Log everything. Record each tool call and result so you can replay what happened when an agent misbehaves, which it will.
- Set hard budgets. Cap tokens, tool calls, and spend per task so a runaway loop fails loudly instead of quietly draining your account.
Treat the model as an untrusted user that happens to be clever. Prompt injection is real: a malicious document or web page can hijack an agent's instructions, so the safety has to live in your code, never only in the system prompt.
Putting it together
Start tiny. One model, one tool, a loop with a hard iteration cap, and input validation on that single tool. Get that running end to end before you add a second tool or any memory. Most failed agent projects tried to build all five layers at once. The ones that ship grow a working loop one tool at a time.