Feb 2026 · 8 min read

Adding Structured Reasoning to Haystack's Streaming Pipeline

Migrating the Anthropic and Ollama integrations from unstructured meta dict access to proper ReasoningContent objects, refactoring the accumulation logic and extracting a shared finalization helper along the way.

Haystack · Streaming · Anthropic

The Problem

Haystack 2.18.0 added a dedicated StreamingChunk.reasoning field with a ReasoningContent dataclass to provide structured access to LLM thinking/reasoning data. But the Anthropic and Ollama integrations hadn't been updated — they still dumped raw reasoning data into StreamingChunk.meta, an unstructured dictionary.

This meant downstream consumers had to write code like this to access reasoning:

```python
# Before: digging through unstructured meta
thinking = chunk.meta.get("delta", {}).get("thinking", "")
```

Instead of the clean:

```python
# After: structured field access
thinking = chunk.reasoning.reasoning_text
```

Understanding Anthropic's Streaming Protocol

Anthropic's streaming API is more complex than most. When the model uses "extended thinking" (Claude's reasoning mode), the stream contains several event types:

  1. `content_block_start` with type="thinking" — marks the beginning of a thinking block
  2. `content_block_delta` with type="thinking_delta" — contains a chunk of thinking text
  3. `content_block_delta` with type="signature_delta" — contains a cryptographic signature for the thinking block
  4. `content_block_start` with type="redacted_thinking" — marks a redacted thinking block (Anthropic sometimes redacts reasoning for safety)

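Concretely, a thinking block might arrive as a sequence like the following. This is an illustrative sketch with plain dicts standing in for the SDK's typed event objects, and made-up values:

```python
# Illustrative shape of an extended-thinking stream; real events are
# typed Anthropic SDK objects, and the values here are invented.
events = [
    {"type": "content_block_start", "content_block": {"type": "thinking"}},
    {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Let me work through"}},
    {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": " this step by step..."}},
    {"type": "content_block_delta", "delta": {"type": "signature_delta", "signature": "<sig-bytes>"}},
    {"type": "content_block_stop"},
]

# The thinking text is split across multiple deltas and has to be
# reassembled downstream; the signature arrives in its own event.
full_thinking = "".join(
    e["delta"]["thinking"]
    for e in events
    if e["type"] == "content_block_delta" and e["delta"]["type"] == "thinking_delta"
)
print(full_thinking)  # Let me work through this step by step...
```

In the real protocol a `content_block_stop` event closes each block, which is what makes accumulation (discussed below) necessary: no single event carries the complete reasoning text.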
Each of these needed to be mapped to the ReasoningContent dataclass:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ReasoningContent:
    reasoning_text: str
    extra: dict[str, Any] = field(default_factory=dict)  # for signatures, redacted data, etc.
```

The Fix: Anthropic Integration

I modified _convert_anthropic_chunk_to_streaming_chunk() in the utils module to populate the new reasoning field:

```python
reasoning = None

# content_block_start events
if chunk.type == "content_block_start":
    if chunk.content_block.type == "thinking":
        reasoning = ReasoningContent(reasoning_text="")
    elif chunk.content_block.type == "redacted_thinking":
        reasoning = ReasoningContent(
            reasoning_text="",
            extra={"redacted_thinking": getattr(chunk.content_block, "data", "")}
        )

# content_block_delta events
elif chunk.type == "content_block_delta":
    if chunk.delta.type == "thinking_delta":
        reasoning = ReasoningContent(reasoning_text=chunk.delta.thinking)
    elif chunk.delta.type == "signature_delta":
        reasoning = ReasoningContent(
            reasoning_text="",
            extra={"signature": chunk.delta.signature}
        )
```

Refactoring the Accumulation Logic

The bigger refactoring was in _process_reasoning_contents(), which accumulates streaming reasoning chunks back into complete reasoning blocks. The old code was digging through chunk.meta dictionaries:

```python
# Before: messy meta dict access
if (delta := chunk.meta.get("delta")) is not None:
    if delta.get("type") == "thinking_delta" and delta.get("thinking") is not None:
        content_block_text += delta.get("thinking", "")
```

I rewrote it to use the structured field:

```python
# After: clean field access
if chunk.reasoning is None:
    continue
if chunk.reasoning.reasoning_text:
    content_block_text += chunk.reasoning.reasoning_text
```

I also extracted a _finalize_reasoning_group() helper to eliminate duplicated group-finalization code.

The Fix: Ollama Integration

The Ollama case was much simpler. Ollama exposes thinking data as a plain message.thinking string per chunk — no signatures, no redacted blocks, no complex event types.

```python
reasoning = None
if hasattr(chunk_message, "thinking") and chunk_message.thinking:
    reasoning = ReasoningContent(reasoning_text=chunk_message.thinking)

return StreamingChunk(
    content=content,
    reasoning=reasoning,
    # ...
)
```

Key Takeaways

  1. Keep integrations consistent. When a framework adds a new structured field, all provider integrations should use it. Mixing structured and unstructured access creates a fragmented developer experience.
  2. Extract duplicated logic. The group-finalization code was repeated verbatim — extracting _finalize_reasoning_group() made the code shorter and easier to follow.
  3. Complexity varies by provider. The same conceptual feature (reasoning) required ~80 lines of changes for Anthropic (complex streaming protocol) vs ~10 lines for Ollama (simple thinking string). Understanding each provider's API shape is essential.

Impact & Reflection

Impact: These two PRs brought Haystack's Anthropic and Ollama integrations up to the framework's v2.18.0 standard for structured reasoning. Every Haystack user enabling Claude's extended thinking or Ollama's reasoning mode now gets clean, typed access to reasoning data instead of digging through raw dictionaries. Haystack has 18,000+ GitHub stars and is used in production RAG systems by companies worldwide.

What I learned about multi-provider abstraction: The 8:1 complexity ratio between Anthropic and Ollama implementations for the same feature was eye-opening. It taught me that "provider abstraction" in AI frameworks isn't just about wrapping APIs — it's about normalizing fundamentally different streaming protocols into a unified interface. The real engineering challenge is in the mapping layer, not the business logic.

How this changed my approach to reading code: Before this contribution, I would read code linearly. Tracing Anthropic's 4-event-type streaming protocol forced me to develop a new pattern: map out the state machine first (what events exist, what transitions are valid), then read the code through that lens. I now draw state diagrams before touching any streaming code.