feat(closes OPEN-11315): modernize the LangChain/LangGraph callback handler for v1 #654

Open
viniciusdsmello wants to merge 1 commit into main from vini/open-11315-python-improve-langchain-callback-handler-retriever-capture

viniciusdsmello commented Jun 16, 2026

Contributor

Summary

Modernizes the sync and async Openlayer LangChain/LangGraph callback handler (langchain_callback.py) for OPEN-11315. The callbacks API stayed backwards-compatible across LangChain v1, so these are gap-fills + modernizations rather than a migration.

🔴 High-priority fixes

  • Sync handler now wires retriever callbacks. on_retriever_start/end/error existed only on AsyncOpenlayerHandler; the sync handler now has them too, so synchronous RAG pipelines produce a RETRIEVER step and populate the context column. Also fixes context being dropped when a sync root-level retriever has no external trace context (external trace still wins).
  • Tool calls no longer discarded. _extract_output falls back to serializing AIMessage.tool_calls when the generation text is empty (tool-only agent turns previously recorded ""), and _message_to_dict preserves tool_calls. Backwards-compatible for messages without tool calls.
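The tool-call fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in langchain_callback.py: `FakeAIMessage` and `extract_output` are hypothetical stand-ins (the real handler works with langchain_core's AIMessage and a private `_extract_output`), and JSON is only an assumed serialization format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for langchain_core's AIMessage."""

    content: str = ""
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


def extract_output(text: str, message: Optional[FakeAIMessage]) -> str:
    """Return the generation text, or serialized tool calls when the text is empty."""
    if text:
        # Ordinary turn: the generation text is the output.
        return text
    if message is not None and message.tool_calls:
        # Tool-only turn: record the tool calls instead of an empty string.
        # JSON is an assumption here; the real handler may serialize differently.
        return json.dumps(message.tool_calls, sort_keys=True)
    # No text and no tool calls: same empty output as before the fix.
    return ""
```

A message without tool calls goes through the first or last branch only, which is why the change stays backwards-compatible for such messages.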

🟡 / 🟢 Modernization

  • Tokens (v1 standard). _extract_token_info reads AIMessage.usage_metadata first, then falls back to llm_output / generation_info; captures input_token_details / output_token_details (cache_read, cache_creation, reasoning) under step.metadata["token_details"].
  • Provider. metadata["ls_provider"] is now the primary provider source (mapped to Openlayer's canonical names), with the _type map and LiteLLM model prefixes as fallbacks.
  • LangGraph metadata.
    • Chain step naming: a chain step is named from the runnable's own name, falling back to metadata["langgraph_node"], then the serialized id (name → langgraph_node → id), matching the TypeScript handler. langgraph_node is inherited by every run nested inside a node, so preferring it over name would relabel all of a node's internal LCEL runs (RunnableSequence / Prompt / …), and even nested sub-graphs, with the parent node's name, collapsing the tree. LangGraph already sets name to the node name at node boundaries, so nodes stay identifiable while the inner structure keeps its real names.
    • Sessions: metadata["thread_id"] auto-maps to the trace session_id (opt out with map_thread_id_to_session=False), and never clobbers an explicitly provided session. post_process_trace already promotes session_id to its column, so it works end-to-end.
  • v1 content blocks. _message_to_dict normalizes list / content-block content (text joined into content, non-text blocks preserved under content_blocks).
  • Cleanup. Removed the dead langchain.schema / langchain.callbacks.base import fallback (removed in v1); imports from langchain_core only, and the ImportError now suggests pip install langchain-core.
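As an illustration of the name → langgraph_node → id precedence for chain steps, here is a minimal sketch. The function name `resolve_chain_step_name`, the choice of the last element of the serialized id, and the `"Chain"` default are assumptions made for the example; they are not taken from the handler.

```python
from typing import Any, Dict, Optional


def resolve_chain_step_name(
    name: Optional[str],
    metadata: Optional[Dict[str, Any]],
    serialized: Optional[Dict[str, Any]],
) -> str:
    """Pick a step name using the precedence name -> langgraph_node -> id."""
    if name:
        # The runnable's own name wins, so nested LCEL runs keep their real names.
        return name
    node = (metadata or {}).get("langgraph_node")
    if node:
        # Inherited by every run inside a node, hence only a fallback.
        return str(node)
    id_parts = (serialized or {}).get("id") or []
    if id_parts:
        # Assumption: the serialized id is a path list and its last part is the class name.
        return str(id_parts[-1])
    # Hypothetical default for runs that expose none of the three sources.
    return "Chain"
```

With `name="RunnableSequence"` and `metadata={"langgraph_node": "callAgent"}`, the sketch keeps `RunnableSequence`, which is the behavior that stops a node's internals from being relabeled with the node's name.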

Tests

New offline suite tests/lib/integrations/test_langchain_callback.py drives the handler with real langchain_core objects (publishing disabled) — retriever capture, tool-call output, token/provider extraction, content blocks, and the LangGraph naming + thread_id → session_id rules. 34/34 passing.
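One of the rules the suite covers, the token-extraction precedence, can be sketched in isolation as below. This is a hedged approximation rather than the handler's `_extract_token_info`: the function name, the returned key names, and the assumption that both llm_output and generation_info carry usage under `"token_usage"` are illustrative only. The `usage_metadata` keys follow the langchain_core standard.

```python
from typing import Any, Dict, Optional

Usage = Optional[Dict[str, Any]]


def extract_token_info(
    usage_metadata: Usage, llm_output: Usage, generation_info: Usage
) -> Dict[str, Any]:
    """Prefer usage_metadata, then fall back to llm_output / generation_info."""
    if usage_metadata:
        info: Dict[str, Any] = {
            "prompt_tokens": usage_metadata.get("input_tokens", 0),
            "completion_tokens": usage_metadata.get("output_tokens", 0),
            "total_tokens": usage_metadata.get("total_tokens", 0),
        }
        # Keep cache / reasoning breakdowns only when the provider reported them.
        details = {
            key: usage_metadata[key]
            for key in ("input_token_details", "output_token_details")
            if usage_metadata.get(key)
        }
        if details:
            info["token_details"] = details
        return info
    for source in (llm_output, generation_info):
        # Assumption: legacy sources expose usage under "token_usage".
        usage = (source or {}).get("token_usage")
        if usage:
            return {
                "prompt_tokens": usage.get("prompt_tokens", 0),
                "completion_tokens": usage.get("completion_tokens", 0),
                "total_tokens": usage.get("total_tokens", 0),
            }
    return {}
```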

End-to-end validation

Validated against a real LangGraph create_react_agent workflow streaming to an Openlayer inference pipeline (langchain-core 1.x, langgraph 1.x, langchain-openai 1.x):

  • The trace tree matches the expected structure — outer graph → nodes (buildSystemPrompt / callAgent / rewrite) → nested react agent (agent / tools / agent) → tool calls and chat completions — with node names, nested LCEL names, and the sub-graph name all preserved (this is what motivated the name-first precedence above).
  • thread_id → session_id, user_id, provider (OpenAI), and token counts all land on the resulting row.

New public surface (additive)

  • map_thread_id_to_session: bool = True constructor param on both handlers.
  • _message_to_dict may add a content_blocks key for v1 list content; token_details appears under step metadata when present.
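The interaction between an explicit session, metadata["thread_id"], and the map_thread_id_to_session flag might look like the sketch below. `resolve_session_id` is a hypothetical helper written for this example; only the flag name and the rule that an explicit session is never overwritten come from the description above.

```python
from typing import Any, Dict, Optional


def resolve_session_id(
    explicit_session_id: Optional[str],
    metadata: Optional[Dict[str, Any]],
    map_thread_id_to_session: bool = True,
) -> Optional[str]:
    """Return the session id for a trace, never overriding an explicit one."""
    if explicit_session_id is not None:
        # An explicitly provided session always wins.
        return explicit_session_id
    if not map_thread_id_to_session:
        # Opt-out: ignore thread_id entirely.
        return None
    thread_id = (metadata or {}).get("thread_id")
    # Assumption: thread ids are stringified before being used as a session id.
    return str(thread_id) if thread_id is not None else None
```

Because the flag defaults to True, callers who already pass a session see no change, and callers who only set a LangGraph thread_id get a session without extra wiring.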

Notes

  • on_agent_action / on_agent_finish are left as-is (dead with create_agent / LangGraph, but kept for langchain-classic AgentExecutor compatibility).
  • Pre-existing and unrelated: a test-isolation issue in tests/test_tracing_core.py surfaces only when the suite runs in a single process (it is masked by -n auto); confirmed identical on a clean main.

🤖 Generated with Claude Code

viniciusdsmello changed the title from "fix(langchain): capture retriever steps in sync handler and tool-call outputs" to "feat(langchain): modernize the LangChain/LangGraph callback handler for v1" Jun 16, 2026
viniciusdsmello self-assigned this Jun 16, 2026
viniciusdsmello force-pushed the vini/open-11315-python-improve-langchain-callback-handler-retriever-capture branch 2 times, most recently from 4404ea1 to 40ad1bc June 17, 2026 15:25
…andler for v1

Modernizes the sync + async Openlayer LangChain/LangGraph callback handler for
LangChain v1 (OPEN-11315). The callbacks API stayed backwards-compatible across
v1, so these are gap-fills and modernizations rather than a migration.

High-priority fixes:
- Wire on_retriever_start/end/error into the sync handler (previously only the
  async handler had them) so synchronous RAG pipelines produce a RETRIEVER step
  and populate the context column; fall back to the handler-owned trace when a
  root-level retriever has no external trace context.
- Fall back to serializing AIMessage.tool_calls in _extract_output when the
  generation text is empty, and preserve tool_calls in _message_to_dict, so
  tool-only agent turns no longer record empty output.

Modernization:
- Tokens: read AIMessage.usage_metadata first, then llm_output / generation_info;
  capture input/output token details (cache_read, cache_creation, reasoning)
  under step metadata.
- Provider: use metadata["ls_provider"] as the primary source (mapped to
  Openlayer provider names), with the _type map and LiteLLM prefixes as fallbacks.
- LangGraph metadata: chain steps prefer the runnable's own name, falling back to
  langgraph_node, then the serialized id (matching the TS handler) so node
  internals keep their real names; metadata["thread_id"] auto-maps to session_id
  unless an explicit session is set (opt-out via map_thread_id_to_session).
- v1 content blocks: normalize list / content-block message content (text joined
  into content, non-text blocks preserved under content_blocks).
- Drop the removed langchain.schema / langchain.callbacks.base import fallback;
  import from langchain_core only.

Tests:
- Add an offline test suite for the handler (real langchain_core objects,
  publishing disabled).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
viniciusdsmello force-pushed the vini/open-11315-python-improve-langchain-callback-handler-retriever-capture branch from 40ad1bc to 26458d4 June 17, 2026 15:44
viniciusdsmello changed the title from "feat(langchain): modernize the LangChain/LangGraph callback handler for v1" to "feat(closes OPEN-11315): modernize the LangChain/LangGraph callback handler for v1" Jun 17, 2026