LangChain / LangGraph
LangChain and LangGraph are the most widely adopted frameworks for building applications powered by large language models. LangChain provides the composable building blocks - prompt templates, chains, retrievers, tool integrations - that most LLM workflows require. LangGraph, built on top of LangChain, adds stateful graph-based orchestration: the ability to model workflows with conditional logic, loops, checkpointing, and human-in-the-loop interrupts.
Axevate uses both frameworks across client engagements, from RAG pipelines for document Q&A to production agentic systems that take action in external systems. Our experience includes instrumenting these systems with LangSmith, optimizing token costs, designing for production resilience, and debugging the non-obvious failure modes that only appear under real-world conditions.
LangChain: Core Abstractions
Chains are LangChain's fundamental building block - a sequence of calls to components that produces a structured output. The most basic LLMChain combines a PromptTemplate, an LLM, and an OutputParser. More complex chains (SequentialChain, RouterChain) compose multiple steps with conditional routing. Chains are appropriate for predictable, fixed-path workflows. When your process needs to adapt based on intermediate results, LangGraph is the right choice.
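The pipe-composition pattern behind chains can be sketched in plain Python. This is not the real LangChain Runnable API, just a minimal stand-in showing how a prompt step, a model step, and a parser step compose into one invokable pipeline:

```python
# Minimal sketch of pipe composition (hypothetical Step class, not the
# real LangChain Runnable API; the LLM call is a stub).
class Step:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose two steps, mirroring the `prompt | llm | parser` idiom.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Stand-ins for a PromptTemplate, an LLM, and an OutputParser.
prompt = Step(lambda d: f"Summarize: {d['text']}")
llm = Step(lambda p: f"LLM_RESPONSE({p})")  # stub model call
parser = Step(lambda r: r.removeprefix("LLM_RESPONSE(").removesuffix(")"))

chain = prompt | llm | parser
print(chain.invoke({"text": "quarterly report"}))
# → Summarize: quarterly report
```

In real LangChain code, `ChatPromptTemplate`, a chat model, and an output parser compose the same way with the `|` operator.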
Agents replace fixed chains with dynamic decision-making. The agent uses an LLM to decide which tool to call next, evaluates the result, and continues until the task is complete. The quality of agent behavior depends almost entirely on the quality of tool definitions - specific descriptions, precise parameter schemas, and graceful error handling are what separate reliable agents from chaotic ones.
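The agent loop described above can be sketched with a stub policy standing in for the LLM. Tool names and the policy logic here are invented for illustration; the shape of the loop (pick a tool, observe the result, stop on a final answer, cap the step count) is the point:

```python
# Hedged sketch of an agent loop. `fake_policy` stands in for an LLM
# choosing the next tool; tool names are illustrative.
TOOLS = {
    "lookup_order": lambda arg: {"order": arg, "status": "shipped"},
    "final_answer": lambda arg: arg,
}

def fake_policy(observations):
    # A real agent would prompt an LLM with tool descriptions here.
    if not observations:
        return ("lookup_order", "A-1001")
    return ("final_answer", f"Order status: {observations[-1]['status']}")

def run_agent(max_steps=5):
    observations = []
    for _ in range(max_steps):  # cap steps to avoid runaway loops
        tool, arg = fake_policy(observations)
        result = TOOLS[tool](arg)
        if tool == "final_answer":
            return result
        observations.append(result)
    raise RuntimeError("agent exceeded step budget")

print(run_agent())  # → Order status: shipped
```

Note the explicit step budget: unbounded agent loops are a common production failure mode.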
Retrievers are the core of RAG (Retrieval-Augmented Generation) systems. They accept a query and return relevant documents from a vector store. LangChain supports all major vector databases: Pinecone, Weaviate, Qdrant, pgvector, Chroma, and FAISS. The retriever abstraction makes it straightforward to swap backends without rewriting retrieval logic.
LangGraph: Stateful Agent Workflows
LangGraph models workflows as directed graphs. Nodes are Python functions that read from and write to a shared State object. Edges connect nodes - either unconditionally or conditionally (routing to different nodes based on the current state). This architecture enables complex agent behaviors that LangChain chains cannot express: retry logic, multi-step tool use, parallel branches, and human approval checkpoints.
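The node/edge model can be sketched without the library. Node names here are illustrative; the mechanics shown are the essentials: nodes mutate shared state, and a conditional edge reads that state to pick the next node:

```python
# Plain-Python sketch of a graph with a conditional edge (not the real
# langgraph API; node names are illustrative).
def triage(state):
    state["route"] = "refund" if "refund" in state["query"] else "faq"
    return state

def refund(state):
    state["answer"] = "Refund initiated"
    return state

def faq(state):
    state["answer"] = "See our FAQ"
    return state

NODES = {"triage": triage, "refund": refund, "faq": faq}

def router(state):  # conditional edge: next node chosen from state
    return state["route"]

EDGES = {"triage": router, "refund": lambda s: None, "faq": lambda s: None}

def run(state, entry="triage"):
    node = entry
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node](state)
    return state

print(run({"query": "I want a refund"})["answer"])  # → Refund initiated
```

In LangGraph proper, `StateGraph.add_conditional_edges` plays the role of the `router` function above.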
State is the central design concept. It's a TypedDict or Pydantic model that persists across graph steps. Getting state design right - minimal, well-typed, using reducer functions correctly - is the foundation of a reliable LangGraph application. State should contain what's needed to resume execution after any step, not the full history of what happened.
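The reducer idea can be made concrete with a small sketch: each node emits a partial update, and a per-key reducer decides how it merges (append for message lists, last-write-wins for scalars). This mirrors LangGraph's annotated-reducer pattern without importing it:

```python
# Sketch of reducer-based state merging (illustrative, not the real
# langgraph Annotated-reducer API).
from typing import TypedDict

class State(TypedDict):
    messages: list  # append-only via reducer
    step: int       # last-write-wins

def add_messages(old, new):  # reducer: concatenate instead of replacing
    return old + new

REDUCERS = {"messages": add_messages, "step": lambda old, new: new}

def apply_update(state, update):
    merged = dict(state)
    for key, value in update.items():
        merged[key] = REDUCERS[key](state[key], value)
    return merged

s = {"messages": [], "step": 0}
s = apply_update(s, {"messages": ["hi"], "step": 1})
s = apply_update(s, {"messages": ["there"], "step": 2})
print(s)  # → {'messages': ['hi', 'there'], 'step': 2}
```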
Checkpointers enable persistence: graph state is saved to storage after each node execution. InMemorySaver works for development; PostgresSaver or RedisSaver are required for production. Checkpointing enables recovery from failures, time-travel debugging (inspecting state at any point in history), and human-in-the-loop interrupts where execution pauses for review before continuing.
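The save-after-every-node contract is simple to sketch. An in-memory list stands in for Postgres or Redis; the `state_at` accessor is the essence of time-travel debugging:

```python
# Sketch of node-level checkpointing (illustrative class, not the real
# langgraph checkpointer interface).
import copy

class InMemoryCheckpointer:
    def __init__(self):
        self.history = []

    def save(self, node_name, state):
        # Deep-copy so later mutations don't rewrite history.
        self.history.append((node_name, copy.deepcopy(state)))

    def state_at(self, step):  # time-travel: state after step N
        return self.history[step][1]

def run_with_checkpoints(nodes, state, cp):
    for name, fn in nodes:
        state = fn(state)
        cp.save(name, state)  # persist after every node execution
    return state

cp = InMemoryCheckpointer()
nodes = [("fetch", lambda s: {**s, "doc": "text"}),
         ("summarize", lambda s: {**s, "summary": "short"})]
final = run_with_checkpoints(nodes, {}, cp)
print(cp.state_at(0))  # → {'doc': 'text'}
```

Resuming after a crash means loading the latest checkpoint and re-entering the graph at the next node, which is exactly what the production savers automate.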
RAG Implementation Best Practices
Chunking strategy is the most underestimated decision in RAG systems. Splitting documents at fixed character counts breaks semantic context, degrading both embedding quality and retrieval relevance. Semantic chunking that respects document structure - preserving heading hierarchies, keeping related content together, embedding contextual headers into each chunk - consistently outperforms character-based splitting. Start with 1,000-token chunks and 200-token overlap, then tune based on measured retrieval quality.
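A header-aware chunker with overlap can be sketched in a few lines. Word counts approximate tokens here, and the section format is assumed to be (heading, body) pairs; the key moves are prefixing every chunk with its heading and stepping the window back by the overlap:

```python
# Sketch of header-aware chunking with overlap (word counts stand in
# for token counts; input format is assumed).
def chunk(sections, size=1000, overlap=200):
    chunks = []
    for heading, text in sections:
        words = text.split()
        start = 0
        while start < len(words):
            window = words[start:start + size]
            # Embed the contextual header into every chunk.
            chunks.append(f"{heading}\n" + " ".join(window))
            if start + size >= len(words):
                break
            start += size - overlap  # step back by the overlap
    return chunks

sections = [("# Returns Policy", "word " * 1500)]
out = chunk(sections)
print(len(out))  # → 2
```

A production version would count real tokens (e.g. with a tokenizer) and split on sentence boundaries rather than raw word windows.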
Hybrid retrieval (combining dense vector search with sparse keyword search, then reranking results) outperforms pure vector similarity for most enterprise knowledge bases. Dense search finds semantically similar content; sparse search finds exact term matches. Combining them captures both. Implement a reranker (Cohere Rerank, BGE-Reranker, or a cross-encoder) to merge and re-score the combined result set before passing to the LLM.
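One common way to merge the two ranked lists before reranking is Reciprocal Rank Fusion (RRF), a swap-in for the merge step described above: each list contributes 1/(k + rank) per document, and documents are re-scored by the sum. Document IDs here are illustrative:

```python
# Reciprocal Rank Fusion over a dense and a sparse ranking.
def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # vector-similarity ranking
sparse = ["doc_b", "doc_d", "doc_a"]  # keyword (BM25-style) ranking
merged = rrf([dense, sparse])
print(merged)  # → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

A cross-encoder reranker would then re-score the top of the fused list against the query before it reaches the LLM.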
Evaluation is non-negotiable before production deployment. RAG systems produce confident-sounding wrong answers when retrieval fails. Build a test set of representative questions with expected answers before deployment. Measure answer correctness, context relevance, and hallucination rate. Continue monitoring in production - retrieval quality drifts as documents are added or updated and embedding models improve.
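A minimal evaluation harness looks like this sketch. The `rag_system` stub and the substring-based grounding check are placeholders; real pipelines would use LLM-as-judge or semantic similarity scoring, but the harness shape is the same:

```python
# Sketch of a RAG eval harness over a fixed test set. `rag_system` is
# a stub; the grounding check is deliberately crude.
def rag_system(question):
    # Stand-in returning (answer, retrieved_context).
    return ("Paris", "The capital of France is Paris.")

test_set = [
    {"question": "Capital of France?", "expected": "Paris"},
    {"question": "Capital of Spain?", "expected": "Madrid"},
]

def evaluate(system, cases):
    correct = grounded = 0
    for case in cases:
        answer, context = system(case["question"])
        correct += case["expected"].lower() in answer.lower()
        # Crude hallucination proxy: is the answer present in context?
        grounded += answer.lower() in context.lower()
    n = len(cases)
    return {"accuracy": correct / n, "grounding_rate": grounded / n}

metrics = evaluate(rag_system, test_set)
print(metrics)  # → {'accuracy': 0.5, 'grounding_rate': 1.0}
```

Running this same harness on a schedule against production snapshots is what catches retrieval drift.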
How We Use It in Practice
Real architectural problems across industries — and how we approach them.
Support Agent Across SFCC, Salesforce Service Cloud, and LangGraph
A luxury retailer needed an AI support agent with access to live order data from SFCC, the ability to open returns in Service Cloud, and escalation logic that respects business hours, customer tier, and sentiment. Fixed-step chains couldn't handle the branching — sometimes the agent needed two tool calls, sometimes seven, with human review required for high-value customers.
Our approach
LangGraph stateful workflow with a triage node (Claude Haiku classifies intent and customer tier), specialized tool nodes (SCAPI for order lookup, Service Cloud REST for case creation), and a HITL interrupt before any write operation exceeding $500 order value. PostgresSaver checkpointer so conversations resume across sessions. Sentiment guardrail running in parallel — if frustration score exceeds threshold, escalation path is taken regardless of other routing rules.
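The routing rule at the heart of this design can be sketched as a single conditional-edge function. Thresholds and field names below are illustrative, not the client's actual values:

```python
# Sketch of the guardrail-aware routing described above: the sentiment
# guardrail overrides everything, and high-value writes pause for human
# approval. Thresholds and state keys are illustrative.
def route(state, value_threshold=500, frustration_threshold=0.7):
    if state.get("frustration", 0.0) > frustration_threshold:
        return "escalate_to_human"  # guardrail wins regardless of routing
    if state["action"] == "write" and state["order_value"] > value_threshold:
        return "await_approval"     # HITL interrupt before the write
    return state["action"]

print(route({"action": "write", "order_value": 1200, "frustration": 0.2}))
# → await_approval
print(route({"action": "read", "order_value": 50, "frustration": 0.9}))
# → escalate_to_human
```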
Contract Analysis Pipeline: 10,000 Documents, Two-Stage LLM Cost Control
A law firm needed to analyze contract documents for non-standard clauses, compare against counterparty templates, and flag risk — at a volume where sending every document to Claude Opus was economically unworkable. RAG alone missed cross-document relationships that required reasoning across sections.
Our approach
Two-stage LangGraph pipeline: Claude Haiku runs a fast extraction pass, identifying which sections are relevant and tagging risk indicators into a structured Pydantic schema. Only flagged sections proceed to a Claude Sonnet 4 analysis pass that reasons across the extracted context. Full Opus pass reserved for contested clauses above a risk threshold. Extracted clause data lands in PostgreSQL for cross-document comparison queries. Human review HITL node fires when risk score exceeds the client's threshold — the reviewer sees a structured summary, not the raw document.
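The cost-control structure of the cascade can be sketched with stub model calls. The risk scoring and section contents are invented; what matters is that only flagged sections ever reach the expensive pass:

```python
# Sketch of a tiered-model cascade: cheap extraction filters, expensive
# analysis runs only on flagged sections. Both passes are stubs.
def cheap_extract(section):
    # Stand-in for a Haiku-class pass returning a risk score.
    return {"text": section, "risk": 0.9 if "indemnify" in section else 0.1}

def expensive_analyze(flagged):
    # Stand-in for a Sonnet/Opus-class pass over flagged sections only.
    return [f"ANALYZED: {s['text']}" for s in flagged]

def pipeline(sections, risk_threshold=0.5):
    extracted = [cheap_extract(s) for s in sections]
    flagged = [e for e in extracted if e["risk"] > risk_threshold]
    return expensive_analyze(flagged), len(flagged) / len(sections)

results, expensive_fraction = pipeline(
    ["standard payment terms", "party shall indemnify", "governing law"])
print(results)  # → ['ANALYZED: party shall indemnify']
```

Here only one section in three incurs the expensive call, which is the entire economic argument for the cascade.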
Multi-Agent Inventory and Reorder Workflow with Audit Trail
An operator running 5 Shopify stores wanted to automate daily inventory reconciliation, reorder suggestions, and supplier email drafts — but needed the process auditable and all write operations human-approved. Previous automation scripts had triggered erroneous bulk orders when data was stale.
Our approach
Supervisor LangGraph architecture: a coordinator agent dispatches to specialized sub-agents (inventory reader via Shopify Admin API, reorder calculator, email drafter) but no write operations execute without a human approval node. Per-run token budget enforced in RunContext — runs that exceed budget are suspended, not silently aborted. Every agent decision written to a structured audit table in PostgreSQL before any external action. Cost per run tracked and alerted; the system has processed 600+ runs without an erroneous write.
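The budget-enforcement behavior (suspend loudly, never truncate silently) can be sketched as a small accumulator. Class and field names are illustrative, not the actual RunContext implementation:

```python
# Sketch of per-run token budget enforcement: usage accumulates across
# steps, and exceeding the budget raises a typed error so the run is
# suspended rather than silently degraded. Names are illustrative.
class BudgetExceeded(Exception):
    pass

class RunBudget:
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"{self.used}/{self.max_tokens} tokens")

budget = RunBudget(max_tokens=1000)
budget.charge(600)  # first LLM call: fine
try:
    budget.charge(600)  # would exceed: suspend the run
except BudgetExceeded as e:
    print("suspended:", e)  # → suspended: 1200/1000 tokens
```

A suspended run keeps its checkpointed state, so a human can raise the budget and resume instead of losing the work.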
FAQ
When should we use LangChain, LangGraph, or the raw model SDK?
For simple single-step LLM calls, use the raw SDK (OpenAI, Anthropic). For multi-step chains without dynamic branching, LangChain LCEL (LangChain Expression Language) works well. For anything with agents, conditional logic, loops, or human-in-the-loop requirements, use LangGraph. LangGraph is the production-grade choice for most real applications.
What observability tooling works with LangChain and LangGraph?
LangSmith is the native observability platform and the most complete option - it provides full trace visibility, prompt versioning, evaluation datasets, and cost tracking. Phoenix (Arize) and Galileo are strong alternatives if you need vendor-neutral tooling. For multi-agent systems, AgentOps provides specialized debugging features for agent-specific failure patterns.
How do we control token costs in production?
Implement cost tracking per node and per workflow using LangSmith callbacks or custom LLM callbacks that log token counts. Set alert thresholds for cost anomalies. Use the cheapest model that meets quality requirements for each task - route simple classification to mini/haiku models, complex reasoning to frontier models. Implement context window management so long conversations don't accumulate unnecessary token overhead.
Is the LangChain ecosystem mature enough for production?
LangChain is actively maintained with regular releases. LangGraph is the active development focus for agent orchestration. The LangChain ecosystem (LangSmith, LangServe) is production-ready and used by a large number of enterprise applications. The main risk is API churn - validate against the current documentation and pin versions in production.