Building AI Agents with LangChain and LangGraph: From Prototype to Production
LangChain and LangGraph have become the standard toolkit for building AI agents: autonomous systems that can reason, plan, and execute multi-step tasks. Unlike simple prompt-response chatbots, agents use LLMs as reasoning engines that decide which tools to call, how to process results, and when to ask for human input. This guide covers the complete journey from basic ReAct agents to sophisticated multi-agent systems with human-in-the-loop controls, persistent memory, and production deployment patterns.
The agent paradigm represents a fundamental shift from programmatic workflows to AI-driven orchestration. In traditional software, developers write explicit code paths for every scenario. With agents, the LLM dynamically decides the execution plan based on the user’s request and available tools. Moreover, agents can handle ambiguous requests, recover from errors, and adapt their approach when initial strategies fail. However, this flexibility comes with challenges — agents can hallucinate tool calls, enter infinite loops, or make expensive mistakes without proper guardrails.
Agent Architectures: ReAct, Plan-and-Execute, and Multi-Agent
The ReAct (Reasoning + Acting) pattern is the simplest and most common agent architecture. The LLM receives a prompt with available tools, thinks about what to do, calls a tool, observes the result, and repeats until the task is complete. ReAct agents work well for straightforward tasks requiring two to five tool calls, but they struggle with complex multi-step tasks because they plan only one step ahead.
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

# Define tools with clear descriptions (the LLM uses these to decide)
@tool
def search_database(query: str) -> str:
    """Search the product database for items matching the query.
    Returns product name, price, and availability."""
    # Real implementation would query your database
    results = db.search(query, limit=5)
    return format_results(results)

@tool
def check_inventory(product_id: str) -> dict:
    """Check real-time inventory levels for a specific product.
    Returns current stock, warehouse location, and restock date."""
    return inventory_service.check(product_id)

@tool
def create_order(product_id: str, quantity: int,
                 customer_id: str) -> dict:
    """Create a new order for a customer. Validates inventory
    before placing the order. Returns order confirmation."""
    return order_service.create(product_id, quantity, customer_id)

@tool
def calculate_shipping(origin: str, destination: str,
                       weight_kg: float) -> dict:
    """Calculate shipping options and costs between two locations.
    Returns available carriers, estimated delivery dates, and prices."""
    return shipping_service.calculate(origin, destination, weight_kg)

# Create the ReAct agent
model = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)
memory = MemorySaver()
agent = create_react_agent(
    model=model,
    tools=[search_database, check_inventory, create_order,
           calculate_shipping],
    checkpointer=memory,
)

# Run the agent
config = {"configurable": {"thread_id": "customer-123"}}
response = agent.invoke(
    {"messages": [("user", "Find the cheapest laptop under $1000 "
                   "and order it to New York")]},
    config=config,
)
```

Building AI Agents: LangGraph State Machines
LangGraph provides a graph-based framework for building agents with explicit control flow. Unlike the simple ReAct loop, LangGraph lets you define nodes (processing steps) and edges (transitions) that create structured agent workflows. This approach gives you fine-grained control over the agent’s behavior while still leveraging LLM reasoning for decisions within each node.
```python
import json
from operator import add
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    # The `add` reducer makes these lists append-only across node updates
    messages: Annotated[list, add]
    current_plan: list[str]
    completed_steps: Annotated[list[str], add]
    needs_human_approval: bool
    final_answer: str | None

def planner(state: AgentState) -> AgentState:
    """LLM creates an execution plan based on the user request."""
    messages = state["messages"]
    plan_prompt = f"""Based on the user's request, create a step-by-step
    plan. Available tools: search_database, check_inventory,
    create_order, calculate_shipping.
    User request: {messages[-1].content}
    Return a JSON list of steps."""
    response = model.invoke(plan_prompt)
    plan = json.loads(response.content)
    return {"current_plan": plan}

def executor(state: AgentState) -> AgentState:
    """Execute the next step in the plan using appropriate tools."""
    plan = state["current_plan"]
    completed = state["completed_steps"]
    next_step = plan[len(completed)]
    # LLM decides which tool to call for this step
    # (agent_executor is any tool-calling runnable, e.g. the ReAct agent above)
    result = agent_executor.invoke({"task": next_step})
    # Appended rather than overwritten, thanks to the `add` reducer
    return {"completed_steps": [next_step + ": " + str(result)]}

def should_continue(state: AgentState) -> str:
    """Determine if all plan steps are completed."""
    if len(state["completed_steps"]) >= len(state["current_plan"]):
        return "summarize"
    if state.get("needs_human_approval"):
        return "human_review"
    return "execute"

def summarizer(state: AgentState) -> AgentState:
    """Synthesize results from all completed steps."""
    summary_prompt = f"""Summarize the results of these completed steps
    into a clear answer for the user:
    {state['completed_steps']}"""
    response = model.invoke(summary_prompt)
    return {"final_answer": response.content}

# Build the graph (human_review_node is an approval pause point,
# built with the interrupt pattern shown later in this guide)
graph = StateGraph(AgentState)
graph.add_node("plan", planner)
graph.add_node("execute", executor)
graph.add_node("summarize", summarizer)
graph.add_node("human_review", human_review_node)
graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges("execute", should_continue)
graph.add_edge("human_review", "execute")
graph.add_edge("summarize", END)
agent = graph.compile(checkpointer=memory)
```

Tool Design Patterns for Reliable Agents
The quality of your tools determines the quality of your agent. Well-designed tools have clear, descriptive names and docstrings that tell the LLM exactly what the tool does, what inputs it needs, and what it returns. Furthermore, tools should handle errors gracefully and return informative error messages that help the LLM recover. Consequently, investing time in tool design pays dividends in agent reliability.
```python
from pydantic import BaseModel, Field

class DatabaseSearchInput(BaseModel):
    """Input schema for database search — provides validation
    and clear documentation for the LLM."""
    query: str = Field(description="Natural language search query")
    category: str | None = Field(
        default=None,
        description="Optional category filter: electronics, "
                    "clothing, books, or home"
    )
    max_price: float | None = Field(
        default=None,
        description="Maximum price in USD. Leave empty for no limit."
    )
    limit: int = Field(
        default=5,
        description="Maximum number of results to return (1-20)"
    )

@tool(args_schema=DatabaseSearchInput)
def search_database(query: str, category: str | None = None,
                    max_price: float | None = None,
                    limit: int = 5) -> str:
    """Search the product database for items matching the query.
    Returns a formatted list with: product name, price, rating,
    availability status, and product ID.
    Use this tool when the user wants to find, browse, or compare
    products. Do NOT use this for checking inventory of a specific
    product — use check_inventory instead.
    Example: search_database("wireless headphones", category="electronics",
                             max_price=200, limit=3)
    """
    try:
        results = db.search(
            query=query, category=category,
            max_price=max_price, limit=min(limit, 20)
        )
        if not results:
            return (f"No products found matching '{query}'"
                    + (f" in category '{category}'" if category else "")
                    + (f" under ${max_price}" if max_price else "")
                    + ". Try broadening your search.")
        return format_product_results(results)
    except DatabaseError as e:
        return f"Database search failed: {e}. Please try again."
```

Memory Systems: Short-Term, Long-Term, and Semantic
Production agents need memory systems that go beyond simple conversation history. LangGraph’s checkpointer provides short-term memory (conversation context), but agents also need long-term memory (user preferences, past interactions) and semantic memory (learned facts and relationships). Additionally, memory management must handle context window limits gracefully — summarizing older messages instead of dropping them.
```python
import json
from datetime import datetime

from langgraph.store.memory import InMemoryStore
from langchain_core.messages import SystemMessage

# Semantic memory store for user preferences
store = InMemoryStore()

async def update_user_memory(state: AgentState, store) -> AgentState:
    """Extract and store user preferences from conversation."""
    user_id = state["user_id"]  # the graph state carries the user id
    messages = state["messages"]
    # Ask LLM to extract preferences
    extraction_prompt = """Analyze this conversation and extract any
    user preferences (preferred brands, budget range, shipping
    preferences, etc.) as JSON key-value pairs."""
    preferences = await model.ainvoke(extraction_prompt + str(messages))
    parsed = json.loads(preferences.content)
    # Store in long-term memory
    for key, value in parsed.items():
        store.put(
            ("users", user_id, "preferences"),
            key,
            {"preference": value, "updated_at": datetime.now().isoformat()},
        )
    return state

async def load_user_context(state: AgentState, store) -> AgentState:
    """Load the user's historical preferences into agent context."""
    user_id = state["user_id"]
    memories = store.search(("users", user_id, "preferences"))
    if memories:
        pref_text = "Known user preferences:\n"
        for mem in memories:
            pref_text += f"- {mem.key}: {mem.value['preference']}\n"
        return {"messages": [SystemMessage(content=pref_text)]}
    return state
```

Human-in-the-Loop: Approval Gates and Intervention
Production agents must include human oversight for high-stakes actions. LangGraph’s interrupt mechanism lets you pause agent execution at critical decision points, present the proposed action to a human, and resume or redirect based on their input. This pattern is essential for agents that create orders, send communications, or modify data. Moreover, the checkpointing system preserves the full agent state during interruption, allowing humans to review context before approving.
```python
from langgraph.types import interrupt, Command

def execute_with_approval(state: AgentState) -> AgentState:
    """Execute actions, pausing for human approval on high-risk ops."""
    action = state["next_action"]
    # High-risk actions require approval
    high_risk = ["create_order", "send_email", "delete_record",
                 "update_price"]
    if action["tool"] in high_risk:
        # interrupt() pauses execution and surfaces this payload to the
        # caller; the run resumes with Command(resume=<human response>)
        human_response = interrupt({
            "action": action,
            "reason": f"Agent wants to {action['tool']} with "
                      f"args: {action['args']}",
            "options": ["approve", "reject", "modify"],
        })
        if human_response == "reject":
            return {"messages": [("system", "Action rejected by human. "
                                  "Find an alternative.")]}
        elif human_response.startswith("modify:"):
            action["args"] = json.loads(
                human_response.split("modify:", 1)[1])
    # Execute the approved action
    result = execute_tool(action["tool"], action["args"])
    return {"messages": [("tool", str(result))]}
```

Multi-Agent Systems
Complex tasks benefit from multiple specialized agents working together. A supervisor agent routes tasks to specialized sub-agents — a research agent for information gathering, an analysis agent for data processing, and an action agent for executing operations. Furthermore, each sub-agent can have its own tools and system prompt optimized for its role. This separation of concerns improves both reliability and maintainability.
```python
from langgraph.graph import StateGraph, START, END

def create_multi_agent_system():
    # Specialized agents
    researcher = create_react_agent(
        model=model,
        tools=[web_search, document_search, arxiv_search],
        state_modifier="You are a research specialist. Find accurate, "
                       "up-to-date information. Always cite sources.",
    )
    analyst = create_react_agent(
        model=model,
        tools=[data_analyzer, chart_generator, statistics_tool],
        state_modifier="You are a data analyst. Analyze data objectively "
                       "and provide clear insights with visualizations.",
    )
    writer = create_react_agent(
        model=model,
        tools=[document_writer, email_drafter, report_generator],
        state_modifier="You are a professional writer. Create clear, "
                       "concise documents based on research and analysis.",
    )

    def supervisor(state):
        """Route tasks to the appropriate specialist agent."""
        response = model.invoke(
            "Based on the current state, which agent should handle "
            "the next step? Options: researcher, analyst, writer, done. "
            f"Current progress: {state['completed_steps']}"
        )
        return {"next_agent": response.content.strip().lower()}

    def route(state) -> str:
        return state["next_agent"]

    graph = StateGraph(MultiAgentState)
    graph.add_node("supervisor", supervisor)
    graph.add_node("researcher", researcher)
    graph.add_node("analyst", analyst)
    graph.add_node("writer", writer)
    graph.add_edge(START, "supervisor")
    # Map "done" to END so the supervisor can terminate the run
    graph.add_conditional_edges("supervisor", route,
                                {"researcher": "researcher",
                                 "analyst": "analyst",
                                 "writer": "writer",
                                 "done": END})
    graph.add_edge("researcher", "supervisor")
    graph.add_edge("analyst", "supervisor")
    graph.add_edge("writer", "supervisor")
    return graph.compile(checkpointer=memory)
```

Error Handling and Recovery
Production agents must handle failures gracefully. Tool calls can fail, APIs can be unavailable, and LLMs can produce invalid outputs. Implement retry logic with exponential backoff, fallback tools, and maximum iteration limits. Additionally, log every agent decision and tool call for debugging. Furthermore, set a hard timeout for agent execution to prevent runaway costs.
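The retry-with-backoff idea above can be sketched as a small decorator. This is a minimal sketch: the tool name `flaky_inventory_lookup` and its failure mode are hypothetical stand-ins, not part of any library API.

```python
import functools
import time

def with_retries(max_attempts: int = 3, base_delay: float = 0.5):
    """Retry a flaky tool with exponential backoff: the wait doubles
    after each failed attempt (base_delay, 2x, 4x, ...)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_attempts - 1:
                        # Return a readable message so the LLM can adapt
                        # instead of the whole agent run crashing
                        return f"Tool failed after {max_attempts} attempts: {exc}"
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

calls = {"count": 0}

@with_retries(max_attempts=3, base_delay=0.01)
def flaky_inventory_lookup(product_id: str) -> str:
    """Hypothetical tool that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("inventory service unavailable")
    return f"stock for {product_id}: 42"
```

For the iteration cap, LangGraph itself enforces a per-run limit: pass `config={"recursion_limit": 25}` to `invoke` and the run raises an error once the graph has taken that many steps, which stops infinite loops cold.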
Production Deployment Patterns
Deploy agents as async services behind a queue (Redis, SQS) rather than synchronous API endpoints. Agent execution can take seconds to minutes, making synchronous HTTP unsuitable. Additionally, use LangSmith or similar observability platforms to trace agent execution, identify bottlenecks, and debug failures. Moreover, implement rate limiting per user and global cost caps to prevent budget overruns from agent loops.
Key Takeaways
Building AI agents with LangChain and LangGraph requires careful architecture decisions, robust tool design, and production-grade guardrails. Start with simple ReAct agents for straightforward tasks, graduate to LangGraph state machines for complex workflows, and use multi-agent systems for tasks requiring diverse expertise. Always include human-in-the-loop controls for high-stakes actions, implement comprehensive memory systems, and deploy with full observability. The agents that succeed in production are those with clear boundaries, proper error handling, and human oversight at critical decision points.
Related Reading:
- RAG Architecture Patterns for Production
- Vector Databases for AI Search Comparison
- AI Code Generation Tools Comparison