Feb 2026 · 12 min

How I Fixed a Streaming Bug That Corrupted Parallel Tool Calls in LangChain

A deep dive into LangChain's merge_lists() function: why parallel tool calls from Bedrock/Anthropic got corrupted, and how to design a backward-compatible fix at the framework level.

LangChain · Streaming · Python

The Problem

A user reported that when using Claude via AWS Bedrock with LangChain, asking the model to call multiple tools in parallel produced garbage output. Instead of two separate tool calls, the model's response was merged into a single corrupted tool call — with the tool names, IDs, and JSON arguments all concatenated together.

For example, if the model wanted to call read_file(path="foo.txt") and search_text(query="bar") in parallel, the output would look like:

Tool call: "read_filesearch_text"
ID: "tooluse_ABCtooluse_DEF"
Args: '{"path": "foo.txt"}{"query": "bar"}'

This is clearly wrong — two separate tool calls got mashed into one.

Understanding the Streaming Architecture

To understand why this happened, I first needed to understand how LangChain handles streaming tool calls.

When an LLM streams a response with tool calls, it doesn't send the complete tool call in one shot. Instead, it sends it in chunks:

Chunk 1: {index: 0, id: "tooluse_ABC", name: "read_file", args: ""}
Chunk 2: {index: 0, id: null, name: null, args: '{"path"'}
Chunk 3: {index: 0, id: null, name: null, args: ': "foo.txt"}'}

LangChain's merge_lists() function in libs/core/langchain_core/utils/_merge.py is responsible for reassembling these chunks back into complete tool calls. It uses the index field to figure out which chunks belong together.
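To make the reassembly concrete, here is a stripped-down sketch of index-based merging. This is an illustrative simplification, not the actual langchain_core implementation, which handles many more field types and edge cases:

```python
# Simplified sketch of index-based chunk reassembly (illustrative only;
# NOT the real merge_lists from langchain_core).
def merge_chunks_by_index(chunks):
    merged = []
    for chunk in chunks:
        # Find an existing entry that shares the chunk's index.
        match = next(
            (e for e in merged if "index" in e and e["index"] == chunk["index"]),
            None,
        )
        if match is None:
            merged.append(dict(chunk))
        else:
            # Concatenate string fields; otherwise keep the non-None value.
            for key, value in chunk.items():
                if value is None:
                    continue
                if isinstance(match.get(key), str) and isinstance(value, str):
                    match[key] += value
                else:
                    match[key] = value
    return merged

chunks = [
    {"index": 0, "id": "tooluse_ABC", "name": "read_file", "args": ""},
    {"index": 0, "id": None, "name": None, "args": '{"path"'},
    {"index": 0, "id": None, "name": None, "args": ': "foo.txt"}'},
]
result = merge_chunks_by_index(chunks)
# result[0]["args"] == '{"path": "foo.txt"}'
```

The continuation chunks carry id=None and name=None, so only the args string grows across chunks — which works perfectly as long as one index maps to one tool call.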

Finding the Root Cause

I opened _merge.py and found the merge condition at line 117:

python
to_merge = [
    i
    for i, e_left in enumerate(merged)
    if "index" in e_left and e_left["index"] == e["index"]
]

This says: "find all existing items in the merged list that have the same index as the incoming chunk, and merge with them."

The problem: parallel tool calls from some providers can share the same `index` value while having completely different `id` values.

When Claude via Bedrock streams two parallel tool calls, both might come with index=0:

Chunk 1: {index: 0, id: "tooluse_ABC", name: "read_file", args: ...}
Chunk 2: {index: 0, id: "tooluse_DEF", name: "search_text", args: ...}

Since both have index=0, merge_lists() treated them as parts of the same tool call and merged them via merge_dicts(), which naively concatenates string fields.
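The corruption from the intro falls straight out of that concatenation behavior. A minimal sketch of a naive string-concatenating dict merge (illustrative, not the actual merge_dicts) reproduces it:

```python
# Naive dict merge that concatenates string fields (illustrative sketch,
# NOT the real merge_dicts from langchain_core).
def naive_merge_dicts(left, right):
    merged = dict(left)
    for key, value in right.items():
        if value is None:
            continue
        if isinstance(merged.get(key), str):
            merged[key] += value  # string fields are concatenated
        else:
            merged[key] = value
    return merged

a = {"index": 0, "id": "tooluse_ABC", "name": "read_file",
     "args": '{"path": "foo.txt"}'}
b = {"index": 0, "id": "tooluse_DEF", "name": "search_text",
     "args": '{"query": "bar"}'}

corrupted = naive_merge_dicts(a, b)
# corrupted["name"] == "read_filesearch_text"
# corrupted["args"] == '{"path": "foo.txt"}{"query": "bar"}'
```

Because both dicts look like chunks of the same tool call (same index), their names, IDs, and JSON argument strings are glued together, producing exactly the garbage shown at the top of the post.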

Why OpenAI Wasn't Affected

While investigating, I discovered something interesting: OpenAI already had a workaround for this. In LangChain's OpenAI integration, tool call indices are prefixed with lc_tc_ — converting the integer index to a unique string like lc_tc_0_ABC. This gave each tool call a unique index, bypassing the bug entirely.
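The workaround's idea can be sketched in a few lines: derive a string index that is unique per tool call instead of passing the raw integer through. (The function below is hypothetical, written only to illustrate the scheme; the string format follows the lc_tc_0_ABC example above.)

```python
# Hypothetical illustration of the unique-index scheme (not the actual
# code in LangChain's OpenAI integration).
def unique_index(raw_index: int, tool_call_id: str) -> str:
    # Combine the provider's integer index with the tool call id so that
    # two distinct parallel calls can never share an index value.
    return f"lc_tc_{raw_index}_{tool_call_id}"

# Two parallel calls that both arrive with index=0 now get distinct indices.
assert unique_index(0, "ABC") != unique_index(0, "DEF")
```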

But this was a provider-specific hack. The underlying bug in merge_lists() still affected every other provider: Bedrock, Anthropic direct, Ollama, etc.

The Discussion: Where Should This Be Fixed?

When I proposed this fix on the issue, the maintainer ccurme pushed back. His position was clear:

"Why not populate the index key correctly? That is the purpose of the key. If we implement a workaround in core, we will have two distinct blocks with different values for index in the aggregated result, which is wrong or at least confusing."

His reasoning made sense from a design purity standpoint: the index field exists precisely to tell merge_lists() which chunks belong together. If providers are sending the wrong index, the fix should be in the provider (the chat model integration), not in the core utility function.

I understood his concern but argued that a core-level fix was more practical:

  1. No one was fixing the providers. The bug had been open for a month with no provider-level fix in sight.
  2. Provider-specific workarounds don't scale. OpenAI already had its lc_tc_ prefix hack. Asking every provider to implement their own workaround means the same bug recurs every time a new provider integration is added.
  3. The impact of the bug was severe. Silently corrupting tool calls is much worse than having slightly confusing index values in the merged result.

A day later, ccurme merged the PR with this note:

"Merged — I'm not sure anyone is resolving these in the underlying chat models (haven't seen any reproducible examples) and IMO having incorrect values for index is less bad than merging distinct tool calls together."

The Fix

I added an id-aware check to the merge condition:

python
to_merge = [
    i
    for i, e_left in enumerate(merged)
    if (
        "index" in e_left
        and e_left["index"] == e["index"]  # index matches
        and (  # IDs not inconsistent
            e_left.get("id") is None
            or e.get("id") is None
            or e_left["id"] == e["id"]
        )
    )
]

The key design decision was the None handling:

  • Both have IDs and they match → same tool call, merge them
  • Either has `id=None` → streaming continuation, merge them
  • Both have IDs and they differ → different tool calls, keep them separate
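Pulled out as a standalone predicate, the three cases are easy to verify. This is a sketch mirroring the fixed condition, not code imported from langchain_core:

```python
# Standalone version of the id-aware merge predicate (sketch mirroring the
# fixed condition in merge_lists, not imported from langchain_core).
def can_merge(e_left: dict, e: dict) -> bool:
    return (
        "index" in e_left
        and e_left["index"] == e["index"]
        and (
            e_left.get("id") is None
            or e.get("id") is None
            or e_left["id"] == e["id"]
        )
    )

# Both have IDs and they match -> same tool call, merge.
assert can_merge({"index": 0, "id": "A"}, {"index": 0, "id": "A"})
# Continuation chunk with id=None -> still merge.
assert can_merge({"index": 0, "id": "A"}, {"index": 0, "id": None})
# Same index but different IDs -> distinct tool calls, keep separate.
assert not can_merge({"index": 0, "id": "A"}, {"index": 0, "id": "B"})
```

Treating a missing id as "compatible with anything" is what keeps the fix backward-compatible: providers that only ever send an id on the first chunk continue to merge exactly as before.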

Testing

I wrote three test scenarios to cover all the edge cases:

python
# 1. Two parallel tool calls with same index but different IDs → should NOT merge
left = create_tool_call_chunk(name="read_file", args='{"path": "foo.txt"}', id="tooluse_ABC", index=0)
right = create_tool_call_chunk(name="search_text", args='{"query": "bar"}', id="tooluse_DEF", index=0)
merged = merge_lists([left], [right])
assert len(merged) == 2  # Two separate tool calls

# 2. Streaming continuation with id=None → should still merge
first = create_tool_call_chunk(name="tool1", args="", id="id1", index=0)
continuation = create_tool_call_chunk(name=None, args='{"key": "value"}', id=None, index=0)
merged = merge_lists([first], [continuation])
assert len(merged) == 1  # One merged tool call

# 3. Three parallel tool calls all with same index → should remain separate
tc1 = create_tool_call_chunk(name="tool_a", args="{}", id="id_a", index=0)
tc2 = create_tool_call_chunk(name="tool_b", args="{}", id="id_b", index=0)
tc3 = create_tool_call_chunk(name="tool_c", args="{}", id="id_c", index=0)
merged = merge_lists([tc1], [tc2], [tc3])
assert len(merged) == 3  # Three separate tool calls

Key Takeaways

  1. Streaming is where bugs hide. The non-streaming path worked fine because complete tool calls were already separated. It's only when you reassemble chunks that assumptions about indices break down.
  2. Provider differences matter. OpenAI's workaround masked the bug for the most popular provider, which is why it took so long to surface. Always test with multiple LLM providers.
  3. Fix bugs at the framework level. Instead of adding per-provider workarounds, fixing merge_lists() itself benefits all providers at once.

Impact & Reflection

Impact: This fix went into langchain-core, the foundation package installed by every LangChain user — over 30 million monthly downloads. It resolved a critical data corruption bug that affected all non-OpenAI providers (Bedrock, Anthropic direct, Ollama, and any future provider). The PR was merged within 24 hours of submission.

What I learned about engineering judgment: The most valuable lesson from this contribution wasn't the code — it was the design discussion with the maintainer. ccurme's initial pushback ("fix it at the provider level") was architecturally sound. My counter-argument ("a pragmatic core fix is better because no one is fixing the providers") was practical. The final decision to merge showed me how experienced maintainers weigh design purity against real-world impact. In open source, the "correct" solution and the "right" solution aren't always the same thing.

What changed in my debugging approach: This was the first time I traced a bug through a streaming pipeline end-to-end. I now always check: "Does this code make assumptions about the uniqueness of identifiers across providers?" Streaming systems are where implicit assumptions become explicit bugs.