Building AI Agents with LangChain and LangGraph: From Prototype to Production
LangChain and LangGraph have become the standard toolkit for building AI agents: autonomous systems that can reason, plan, and execute multi-step tasks. Unlike simple prompt-response chatbots, agents use LLMs as reasoning engines that decide which tools to call, how to process results, and when to ask for human input. This guide covers the complete journey from basic ReAct agents to sophisticated multi-agent systems with human-in-the-loop controls, persistent memory, and production deployment patterns.
The agent paradigm represents a fundamental shift from programmatic workflows to AI-driven orchestration. In traditional software, developers write explicit code paths for every scenario. With agents, the LLM dynamically decides the execution plan based on the user’s request and available tools. Moreover, agents can handle ambiguous requests, recover from errors, and adapt their approach when initial strategies fail. However, this flexibility comes with challenges — agents can hallucinate tool calls, enter infinite loops, or make expensive mistakes without proper guardrails.
Agent Architectures: ReAct, Plan-and-Execute, and Multi-Agent
The ReAct (Reasoning + Acting) pattern is the simplest and most common agent architecture. The LLM receives a prompt with available tools, thinks about what to do, calls a tool, observes the result, and repeats until the task is complete. ReAct agents work well for straightforward tasks requiring two to five tool calls, but they struggle with complex multi-step tasks because they plan only one step ahead.
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

# Define tools with clear descriptions (the LLM uses these to decide)
@tool
def search_database(query: str) -> str:
    """Search the product database for items matching the query.
    Returns product name, price, and availability."""
    # Real implementation would query your database
    results = db.search(query, limit=5)
    return format_results(results)

@tool
def check_inventory(product_id: str) -> dict:
    """Check real-time inventory levels for a specific product.
    Returns current stock, warehouse location, and restock date."""
    return inventory_service.check(product_id)

@tool
def create_order(product_id: str, quantity: int,
                 customer_id: str) -> dict:
    """Create a new order for a customer. Validates inventory
    before placing the order. Returns order confirmation."""
    return order_service.create(product_id, quantity, customer_id)

@tool
def calculate_shipping(origin: str, destination: str,
                       weight_kg: float) -> dict:
    """Calculate shipping options and costs between two locations.
    Returns available carriers, estimated delivery dates, and prices."""
    return shipping_service.calculate(origin, destination, weight_kg)

# Create the ReAct agent
model = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)
memory = MemorySaver()
agent = create_react_agent(
    model=model,
    tools=[search_database, check_inventory, create_order,
           calculate_shipping],
    checkpointer=memory,
)

# Run the agent
config = {"configurable": {"thread_id": "customer-123"}}
response = agent.invoke(
    {"messages": [("user", "Find the cheapest laptop under $1000 "
                   "and order it to New York")]},
    config=config,
)
```

Building AI Agents: LangGraph State Machines
LangGraph provides a graph-based framework for building agents with explicit control flow. Unlike the simple ReAct loop, LangGraph lets you define nodes (processing steps) and edges (transitions) that create structured agent workflows. This approach gives you fine-grained control over the agent’s behavior while still leveraging LLM reasoning for decisions within each node.
```python
import json
from operator import add
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    # The `add` reducer makes these lists append-only across node updates
    messages: Annotated[list, add]
    current_plan: list[str]
    completed_steps: Annotated[list[str], add]
    needs_human_approval: bool
    final_answer: str | None

def planner(state: AgentState) -> AgentState:
    """LLM creates an execution plan based on the user request."""
    messages = state["messages"]
    plan_prompt = f"""Based on the user's request, create a step-by-step
    plan. Available tools: search_database, check_inventory,
    create_order, calculate_shipping.
    User request: {messages[-1].content}
    Return a JSON list of steps."""
    response = model.invoke(plan_prompt)
    plan = json.loads(response.content)
    return {"current_plan": plan}

def executor(state: AgentState) -> AgentState:
    """Execute the next step in the plan using appropriate tools."""
    plan = state["current_plan"]
    completed = state["completed_steps"]
    next_step = plan[len(completed)]
    # LLM decides which tool to call for this step
    # (agent_executor is any tool-calling runnable, e.g. the ReAct agent above)
    result = agent_executor.invoke({"task": next_step})
    # Appended rather than overwritten, thanks to the `add` reducer
    return {"completed_steps": [next_step + ": " + str(result)]}

def should_continue(state: AgentState) -> str:
    """Determine if all plan steps are completed."""
    if len(state["completed_steps"]) >= len(state["current_plan"]):
        return "summarize"
    if state.get("needs_human_approval"):
        return "human_review"
    return "execute"

def summarizer(state: AgentState) -> AgentState:
    """Synthesize results from all completed steps."""
    summary_prompt = f"""Summarize the results of these completed steps
    into a clear answer for the user:
    {state['completed_steps']}"""
    response = model.invoke(summary_prompt)
    return {"final_answer": response.content}

# Build the graph (human_review_node is an approval pause point,
# built with the interrupt pattern shown later in this guide)
graph = StateGraph(AgentState)
graph.add_node("plan", planner)
graph.add_node("execute", executor)
graph.add_node("summarize", summarizer)
graph.add_node("human_review", human_review_node)
graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges("execute", should_continue)
graph.add_edge("human_review", "execute")
graph.add_edge("summarize", END)
agent = graph.compile(checkpointer=memory)
```

Tool Design Patterns for Reliable Agents
The quality of your tools determines the quality of your agent. Well-designed tools have clear, descriptive names and docstrings that tell the LLM exactly what the tool does, what inputs it needs, and what it returns. Furthermore, tools should handle errors gracefully and return informative error messages that help the LLM recover. Consequently, investing time in tool design pays dividends in agent reliability.
```python
from pydantic import BaseModel, Field

class DatabaseSearchInput(BaseModel):
    """Input schema for database search — provides validation
    and clear documentation for the LLM."""
    query: str = Field(description="Natural language search query")
    category: str | None = Field(
        default=None,
        description="Optional category filter: electronics, "
                    "clothing, books, or home"
    )
    max_price: float | None = Field(
        default=None,
        description="Maximum price in USD. Leave empty for no limit."
    )
    limit: int = Field(
        default=5,
        description="Maximum number of results to return (1-20)"
    )

@tool(args_schema=DatabaseSearchInput)
def search_database(query: str, category: str | None = None,
                    max_price: float | None = None,
                    limit: int = 5) -> str:
    """Search the product database for items matching the query.
    Returns a formatted list with: product name, price, rating,
    availability status, and product ID.
    Use this tool when the user wants to find, browse, or compare
    products. Do NOT use this for checking inventory of a specific
    product — use check_inventory instead.
    Example: search_database("wireless headphones", category="electronics",
                             max_price=200, limit=3)
    """
    try:
        results = db.search(
            query=query, category=category,
            max_price=max_price, limit=min(limit, 20)
        )
        if not results:
            return (f"No products found matching '{query}'"
                    + (f" in category '{category}'" if category else "")
                    + (f" under ${max_price}" if max_price else "")
                    + ". Try broadening your search.")
        return format_product_results(results)
    except DatabaseError as e:
        return f"Database search failed: {e}. Please try again."
```

Memory Systems: Short-Term, Long-Term, and Semantic
Production agents need memory systems that go beyond simple conversation history. LangGraph’s checkpointer provides short-term memory (conversation context), but agents also need long-term memory (user preferences, past interactions) and semantic memory (learned facts and relationships). Additionally, memory management must handle context window limits gracefully — summarizing older messages instead of dropping them.
```python
import json
from datetime import datetime

from langgraph.store.memory import InMemoryStore
from langchain_core.messages import SystemMessage

# Semantic memory store for user preferences
store = InMemoryStore()

async def update_user_memory(state: AgentState, store) -> AgentState:
    """Extract and store user preferences from conversation."""
    user_id = state["user_id"]  # the graph state carries the user id
    messages = state["messages"]
    # Ask LLM to extract preferences
    extraction_prompt = """Analyze this conversation and extract any
    user preferences (preferred brands, budget range, shipping
    preferences, etc.) as JSON key-value pairs."""
    preferences = await model.ainvoke(extraction_prompt + str(messages))
    parsed = json.loads(preferences.content)
    # Store in long-term memory
    for key, value in parsed.items():
        store.put(
            ("users", user_id, "preferences"),
            key,
            {"preference": value, "updated_at": datetime.now().isoformat()},
        )
    return state

async def load_user_context(state: AgentState, store) -> AgentState:
    """Load the user's historical preferences into agent context."""
    user_id = state["user_id"]
    memories = store.search(("users", user_id, "preferences"))
    if memories:
        pref_text = "Known user preferences:\n"
        for mem in memories:
            pref_text += f"- {mem.key}: {mem.value['preference']}\n"
        return {"messages": [SystemMessage(content=pref_text)]}
    return state
```

Human-in-the-Loop: Approval Gates and Intervention
Production agents must include human oversight for high-stakes actions. LangGraph’s interrupt mechanism lets you pause agent execution at critical decision points, present the proposed action to a human, and resume or redirect based on their input. This pattern is essential for agents that create orders, send communications, or modify data. Moreover, the checkpointing system preserves the full agent state during interruption, allowing humans to review context before approving.
```python
from langgraph.types import interrupt, Command

def execute_with_approval(state: AgentState) -> AgentState:
    """Execute actions, pausing for human approval on high-risk ops."""
    action = state["next_action"]
    # High-risk actions require approval
    high_risk = ["create_order", "send_email", "delete_record",
                 "update_price"]
    if action["tool"] in high_risk:
        # interrupt() pauses execution and surfaces this payload to the
        # caller; the run resumes with Command(resume=<human response>)
        human_response = interrupt({
            "action": action,
            "reason": f"Agent wants to {action['tool']} with "
                      f"args: {action['args']}",
            "options": ["approve", "reject", "modify"],
        })
        if human_response == "reject":
            return {"messages": [("system", "Action rejected by human. "
                                  "Find an alternative.")]}
        elif human_response.startswith("modify:"):
            action["args"] = json.loads(
                human_response.split("modify:", 1)[1])
    # Execute the approved action
    result = execute_tool(action["tool"], action["args"])
    return {"messages": [("tool", str(result))]}
```

Multi-Agent Systems
Complex tasks benefit from multiple specialized agents working together. A supervisor agent routes tasks to specialized sub-agents — a research agent for information gathering, an analysis agent for data processing, and an action agent for executing operations. Furthermore, each sub-agent can have its own tools and system prompt optimized for its role. This separation of concerns improves both reliability and maintainability.
```python
from langgraph.graph import StateGraph, START, END

def create_multi_agent_system():
    # Specialized agents
    researcher = create_react_agent(
        model=model,
        tools=[web_search, document_search, arxiv_search],
        state_modifier="You are a research specialist. Find accurate, "
                       "up-to-date information. Always cite sources.",
    )
    analyst = create_react_agent(
        model=model,
        tools=[data_analyzer, chart_generator, statistics_tool],
        state_modifier="You are a data analyst. Analyze data objectively "
                       "and provide clear insights with visualizations.",
    )
    writer = create_react_agent(
        model=model,
        tools=[document_writer, email_drafter, report_generator],
        state_modifier="You are a professional writer. Create clear, "
                       "concise documents based on research and analysis.",
    )

    def supervisor(state):
        """Route tasks to the appropriate specialist agent."""
        response = model.invoke(
            "Based on the current state, which agent should handle "
            "the next step? Options: researcher, analyst, writer, done. "
            f"Current progress: {state['completed_steps']}"
        )
        return {"next_agent": response.content.strip().lower()}

    def route(state) -> str:
        return state["next_agent"]

    graph = StateGraph(MultiAgentState)
    graph.add_node("supervisor", supervisor)
    graph.add_node("researcher", researcher)
    graph.add_node("analyst", analyst)
    graph.add_node("writer", writer)
    graph.add_edge(START, "supervisor")
    # Map "done" to END so the supervisor can terminate the run
    graph.add_conditional_edges("supervisor", route,
                                {"researcher": "researcher",
                                 "analyst": "analyst",
                                 "writer": "writer",
                                 "done": END})
    graph.add_edge("researcher", "supervisor")
    graph.add_edge("analyst", "supervisor")
    graph.add_edge("writer", "supervisor")
    return graph.compile(checkpointer=memory)
```

Error Handling and Recovery
Production agents must handle failures gracefully. Tool calls can fail, APIs can be unavailable, and LLMs can produce invalid outputs. Implement retry logic with exponential backoff, fallback tools, and maximum iteration limits. Additionally, log every agent decision and tool call for debugging. Furthermore, set a hard timeout for agent execution to prevent runaway costs.
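The retry-with-backoff idea above can be sketched as a small decorator. This is a minimal sketch: the tool name `flaky_inventory_lookup` and its failure mode are hypothetical stand-ins, not part of any library API.

```python
import functools
import time

def with_retries(max_attempts: int = 3, base_delay: float = 0.5):
    """Retry a flaky tool with exponential backoff: the wait doubles
    after each failed attempt (base_delay, 2x, 4x, ...)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_attempts - 1:
                        # Return a readable message so the LLM can adapt
                        # instead of the whole agent run crashing
                        return f"Tool failed after {max_attempts} attempts: {exc}"
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

calls = {"count": 0}

@with_retries(max_attempts=3, base_delay=0.01)
def flaky_inventory_lookup(product_id: str) -> str:
    """Hypothetical tool that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("inventory service unavailable")
    return f"stock for {product_id}: 42"
```

For the iteration cap, LangGraph itself enforces a per-run limit: pass `config={"recursion_limit": 25}` to `invoke` and the run raises an error once the graph has taken that many steps, which stops infinite loops cold.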
Production Deployment Patterns
Deploy agents as async services behind a queue (Redis, SQS) rather than synchronous API endpoints. Agent execution can take seconds to minutes, making synchronous HTTP unsuitable. Additionally, use LangSmith or similar observability platforms to trace agent execution, identify bottlenecks, and debug failures. Moreover, implement rate limiting per user and global cost caps to prevent budget overruns from agent loops.
Key Takeaways
Building AI agents with LangChain and LangGraph requires careful architecture decisions, robust tool design, and production-grade guardrails. Start with simple ReAct agents for straightforward tasks, graduate to LangGraph state machines for complex workflows, and use multi-agent systems for tasks requiring diverse expertise. Always include human-in-the-loop controls for high-stakes actions, implement comprehensive memory systems, and deploy with full observability. The agents that succeed in production are those with clear boundaries, proper error handling, and human oversight at critical decision points.
Related Reading:
- RAG Architecture Patterns for Production
- Vector Databases for AI Search Comparison
- AI Code Generation Tools Comparison