Fixing Lost Rich Output in CAMEL's E2B Code Interpreter
An early-return chain silently dropped matplotlib plots and images. The fix: replace it with an accumulation pattern that collects output from all sources.
The Problem
CAMEL's E2BInterpreter is a sandboxed code execution environment powered by the e2b SDK. Users reported that when their agents generated matplotlib plots or other visual outputs, the results were silently lost — the interpreter returned only text output or an empty string.
For example, running:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.savefig("plot.png")
plt.show()
would return empty output, even though the plot was successfully generated in the sandbox.
Finding the Root Cause
I opened camel/interpreters/e2b_interpreter.py and found the run() method's output handling:
if execution.text and execution.text.lower() != "none":
    return execution.text
if execution.logs:
    if execution.logs.stdout:
        return ",".join(execution.logs.stdout)
    elif execution.logs.stderr:
        return ",".join(execution.logs.stderr)
return str(execution.error)
The problem is an early-return chain that only checks two places: execution.text and execution.logs. But the e2b SDK stores rich outputs (images, HTML, SVG) in a third location: execution.results. This field was completely ignored.
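To make the failure concrete, here is a minimal sketch that runs the same early-return chain against a stand-in object. The `plot_only` stub and the `old_run_output` wrapper are hypothetical names built with `SimpleNamespace`; they only imitate the attribute names mentioned above and are not real e2b objects.

```python
from types import SimpleNamespace


def old_run_output(execution):
    """Mirror of the early-return chain quoted above."""
    if execution.text and execution.text.lower() != "none":
        return execution.text
    if execution.logs:
        if execution.logs.stdout:
            return ",".join(execution.logs.stdout)
        elif execution.logs.stderr:
            return ",".join(execution.logs.stderr)
    return str(execution.error)


# Hypothetical stand-in for an execution whose only output is a plot.
plot_only = SimpleNamespace(
    text=None,
    logs=SimpleNamespace(stdout=[], stderr=[]),
    results=[SimpleNamespace(_repr_png_=lambda: "iVBORw0KGgo=")],
    error=None,
)

output = old_run_output(plot_only)
print(repr(output))  # the PNG payload in results never appears in the output
```

With this stub, the function never touches `results`, so the image data cannot reach the caller no matter what the sandbox produced.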
I verified this by reading the e2b SDK source — execution.results is a list of Result objects, each with methods like _repr_png_(), _repr_jpeg_(), _repr_svg_(), and _repr_html_(). This is the standard Jupyter-style _repr_*() protocol.
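As a rough illustration of that protocol, the sketch below probes an object for each `_repr_*_()` method and reports which formats actually carry data. `FakeResult` and `available_formats` are hypothetical names for this example; only the four method names come from the SDK as described above.

```python
class FakeResult:
    """Hypothetical stand-in for a Result; only the method names are real."""

    def __init__(self, png=None, jpeg=None, svg=None, html=None):
        self._data = {"png": png, "jpeg": jpeg, "svg": svg, "html": html}

    def _repr_png_(self):
        return self._data["png"]

    def _repr_jpeg_(self):
        return self._data["jpeg"]

    def _repr_svg_(self):
        return self._data["svg"]

    def _repr_html_(self):
        return self._data["html"]


def available_formats(result):
    """Return the names of the formats for which the result has data."""
    formats = []
    for name in ("png", "jpeg", "svg", "html"):
        method = getattr(result, f"_repr_{name}_", None)
        if callable(method) and method():
            formats.append(name)
    return formats


figure = FakeResult(png="iVBORw0KGgo=", html="<img alt='plot'>")
formats = available_formats(figure)
print(formats)  # ['png', 'html']
```

A single result can expose several representations at once, which is why the consumer has to choose an order in which to try them.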
I also noticed two other bugs in the original code:
- stdout and stderr were mutually exclusive due to `elif`: if both existed, stderr was lost.
- `str(execution.error)` returned the string "None" when there was no error, instead of something meaningful.
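Both of these secondary bugs can be reproduced in isolation. The sketch below uses a hypothetical `logs` stub with content on both streams; `logs_with_elif` is an illustrative name that copies only the branching from the original code.

```python
from types import SimpleNamespace

# Hypothetical log object with output on both streams.
logs = SimpleNamespace(
    stdout=["result: 42"],
    stderr=["DeprecationWarning: old API"],
)


def logs_with_elif(logs):
    """Same branching as the original code: stderr is only read if stdout is empty."""
    if logs.stdout:
        return ",".join(logs.stdout)
    elif logs.stderr:
        return ",".join(logs.stderr)
    return ""


elif_output = logs_with_elif(logs)
print(elif_output)  # result: 42

# When no error object exists, str() turns None into a misleading literal.
no_error_text = str(None)
print(repr(no_error_text))  # 'None'
```

The warning on stderr is dropped as soon as stdout has any content, and a successful run with no output reports the literal string "None".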
The Fix
I replaced the early-return chain with an accumulation pattern that collects output from all sources:
output_parts = []
# 1. Text output
if execution.text and execution.text.lower() != "none":
    output_parts.append(execution.text)
# 2. Rich outputs from execution.results
if execution.results:
    for result in execution.results:
        png_data = result._repr_png_()
        if png_data:
            output_parts.append(f"\n![image](data:image/png;base64,{png_data})\n")
            continue
        jpeg_data = result._repr_jpeg_()
        if jpeg_data:
            output_parts.append(f"\n![image](data:image/jpeg;base64,{jpeg_data})\n")
            continue
        svg_data = result._repr_svg_()
        if svg_data:
            output_parts.append(f"\n{svg_data}\n")
            continue
        html_data = result._repr_html_()
        if html_data:
            output_parts.append(html_data)
            continue
# 3. Logs (both stdout AND stderr now)
if execution.logs:
    if execution.logs.stdout:
        output_parts.append(",".join(execution.logs.stdout))
    if execution.logs.stderr:  # Changed from elif to if
        output_parts.append(f"[stderr] {','.join(execution.logs.stderr)}")
# 4. Combine all parts
if output_parts:
    return "\n".join(output_parts)
# 5. Error handling (only when there's no output)
if execution.error:
    return f"{execution.error.name}: {execution.error.value}\n{execution.error.traceback}"
return "Code executed successfully (no output)."
Key design decisions:
- Images are returned as markdown image embeds with base64 data URIs. This means downstream consumers (chat UIs, notebooks) can render them directly without file I/O.
- Priority order for rich results: PNG > JPEG > SVG > HTML > text fallback.
- Stderr is prefixed with [stderr] so users can distinguish it from stdout.
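The data-URI decision is easiest to see as a round trip. The sketch below assumes the PNG representation arrives as base64 text, which is what the embed format needs; `to_markdown_image` and `decode_markdown_image` are hypothetical helpers written for this example, not part of CAMEL or e2b.

```python
import base64


def to_markdown_image(b64_data, mime="image/png", alt="image"):
    """Wrap base64 image data in a markdown image embed with a data URI."""
    return f"![{alt}](data:{mime};base64,{b64_data})"


def decode_markdown_image(markdown):
    """Recover the raw bytes from an embed produced by to_markdown_image."""
    # The base64 alphabet never contains ")", so stripping it is safe.
    payload = markdown.split(";base64,", 1)[1].rstrip(")")
    return base64.b64decode(payload)


# The 8-byte PNG signature stands in for a real image here.
raw = b"\x89PNG\r\n\x1a\n"
b64 = base64.b64encode(raw).decode("ascii")

embed = to_markdown_image(b64)
decoded = decode_markdown_image(embed)
print(embed.startswith("![image](data:image/png;base64,"))  # True
print(decoded == raw)  # True
```

Because the whole image travels inside the returned string, a consumer can render or decode it without access to the sandbox filesystem. The trade-off is size: base64 inflates the payload by roughly a third.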
Key Takeaways
- Read the SDK documentation. The e2b SDK had a clear API for rich output (execution.results with _repr_*() methods), but the original code simply didn't use it.
- Early-return chains can hide bugs. The return execution.text at the top meant nothing below it ever ran when text output existed.
- Accumulate, don't early-return. When output can come from multiple sources, collect everything first and combine at the end.
Impact & Reflection
Impact: CAMEL is one of the leading multi-agent frameworks with 10,000+ GitHub stars. This fix restored a core capability — visual output from code execution — that was completely broken. Any agent workflow involving data visualization (matplotlib plots, chart generation, image processing) was silently returning empty strings. The fix also improved stdout/stderr handling for all E2B code execution, not just rich output.
What I learned about reading unfamiliar codebases: This was my first contribution to CAMEL, and I had to understand their interpreter abstraction layer quickly. The key was comparing the broken E2BInterpreter against the working JupyterKernelInterpreter in the same codebase — the Jupyter version already handled execution.results correctly. This "find a working sibling" pattern has become my go-to strategy when reading new codebases: before debugging broken code, find similar code that works and diff the approaches.
A pattern I now watch for: Early-return chains that silently swallow data are a code smell I actively look for during code review. The pattern if A: return A; if B: return B; return C is fine when the cases are mutually exclusive, but dangerous when they can overlap. I now default to the accumulation pattern (collect all outputs, combine at the end) whenever multiple data sources might produce results simultaneously.
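A stripped-down comparison of the two patterns, with the sources reduced to plain strings. `first_match` and `accumulate` are illustrative names for this sketch, not functions from CAMEL.

```python
def first_match(sources):
    """Early-return chain: stops at the first non-empty source."""
    for value in sources:
        if value:
            return value
    return ""


def accumulate(sources):
    """Accumulation: keeps every non-empty source, in order."""
    return "\n".join(value for value in sources if value)


# Mutually exclusive sources: both patterns agree.
exclusive = ["", "only stdout", ""]
print(first_match(exclusive) == accumulate(exclusive))  # True

# Overlapping sources: the early return silently drops the second one.
overlapping = ["text result", "stdout line", ""]
print(repr(first_match(overlapping)))  # 'text result'
print(repr(accumulate(overlapping)))   # 'text result\nstdout line'
```

The two functions only diverge when more than one source is populated, which is exactly the case that tends to go untested.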