Hunting Down a Memory Leak in LangChain's lru_cache
How Python's descriptor protocol creates new method objects on every access, causing @lru_cache to have 0% hit rate but 100% memory retention. The fix: 15 lines.
The Problem
Users reported that LangChain's RunnableSequence caused memory leaks. Objects that should have been garbage collected were staying alive forever. In long-running applications (like web servers or agent loops), this caused memory usage to grow indefinitely.
A minimal reproduction:
```python
import gc

from langchain_core.runnables import RunnablePassthrough


class ThingWithRunnable:
    def __init__(self):
        self.data = list(range(1000))  # some data

    def call(self, inputs: dict) -> dict:
        return {"result": "ok"}


thing = ThingWithRunnable()
(thing.call | RunnablePassthrough()).invoke({})  # Use it once
del thing     # Delete the reference
gc.collect()  # Force garbage collection
# thing is STILL alive in memory!
```

Understanding the GC Architecture
To find the leak, I needed to understand Python's garbage collection and how lru_cache interacts with it.
Python uses reference counting as its primary GC mechanism. When the reference count of an object drops to zero, it's immediately freed. The cyclic garbage collector handles cycles (A references B, B references A), but reference counting handles the rest.
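You can watch the reference count directly with `sys.getrefcount` (a CPython-specific detail; the count includes the function's own argument):

```python
import sys

a = []
b = a                       # a second strong reference to the same list
print(sys.getrefcount(a))   # 3 in CPython: a, b, plus getrefcount's argument
del b                       # the count drops by one
# When the count reaches zero, CPython frees the object immediately;
# only reference cycles need the cyclic collector.
```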
functools.lru_cache works by maintaining an internal dictionary that maps function arguments to cached results. The dictionary holds strong references to the keys (arguments) — this prevents them from being garbage collected as long as the cache exists.
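A minimal sketch of that retention behavior (the `Payload` class and `analyze` function here are illustrative, not from LangChain):

```python
import gc
import weakref
from functools import lru_cache

class Payload:
    """Stand-in for any object passed as a cache key."""

@lru_cache(maxsize=8)
def analyze(obj):
    return type(obj).__name__

p = Payload()
ref = weakref.ref(p)
analyze(p)                 # the cache dict now holds a strong reference to p
del p
gc.collect()
print(ref() is not None)   # True: the cache alone keeps p alive
```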
Tracing the Leak
I searched the codebase for @lru_cache decorators on functions that might receive object references as arguments. I found this in libs/core/langchain_core/runnables/utils.py:
```python
@lru_cache(maxsize=256)
def get_function_nonlocals(func: Callable) -> list[Any]:
    """Get the nonlocal variables accessed by a function."""
    # ... AST parsing logic
```

This function is called during RunnableSequence construction to analyze what variables a function closes over. The problem: when you pass a bound method like `thing.call`, the cache stores a strong reference to the method object.
The Subtle Part: Python's Descriptor Protocol
Here's where it gets interesting. In Python, bound methods are created dynamically via the descriptor protocol. Every time you access `thing.call`, Python creates a new method object that wraps `thing` and the underlying function:
```python
thing = ThingWithRunnable()
m1 = thing.call
m2 = thing.call
print(m1 is m2)  # False! Different method objects each time
```

This means `lru_cache` on `get_function_nonlocals` has a 0% cache hit rate for bound methods — every call creates a new method object that's never been seen before. But it has a 100% memory leak rate — each new method object (and its reference to `thing`) gets stored in the cache indefinitely.
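You can poke at the mechanism directly: attribute access is sugar for the function's `__get__` descriptor (the `Greeter` class here is illustrative):

```python
class Greeter:
    def greet(self):
        return "hi"

g = Greeter()

# Attribute access invokes the function's __get__ descriptor,
# which builds a brand-new bound method wrapping g each time:
m = Greeter.greet.__get__(g, Greeter)
print(m())                 # hi
print(m.__self__ is g)     # True: the method strongly references g
print(g.greet is g.greet)  # False: two accesses, two distinct objects
```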
The chain of references looks like:

```
lru_cache dict → method object → __self__ → thing (your object)
                                               ↓
                                           thing.data (your data)
```

Even after `del thing`, the cache dict still holds the chain alive.
The Fix
The fix is minimal — 15 lines of actual logic. I split the function into three parts:
```python
def get_function_nonlocals(func: Callable) -> list[Any]:
    # Bound methods: skip cache entirely (never had hits anyway)
    if inspect.ismethod(func):
        return _get_function_nonlocals_impl(func)
    # Regular functions/lambdas: still use cache for performance
    return _get_function_nonlocals_cached(func)


@lru_cache(maxsize=256)
def _get_function_nonlocals_cached(func: Callable) -> list[Any]:
    return _get_function_nonlocals_impl(func)


def _get_function_nonlocals_impl(func: Callable) -> list[Any]:
    # ... original AST parsing logic (unchanged)
```

The design rationale:
- Bound methods bypass the cache because they never benefited from it. Each `obj.method` access creates a new method object, so the cache key is always unique — zero hits, pure leak.
- Regular functions and lambdas are module-level singletons. `my_function` always resolves to the same object, so caching actually helps — high hit rate, no leak risk.
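The dispatch test is just `inspect.ismethod`, which cleanly separates the two cases (the class and function names here are illustrative):

```python
import inspect

class Client:
    def call(self):
        pass

def helper():
    pass

c = Client()
print(inspect.ismethod(c.call))       # True  -> bound method, bypass the cache
print(inspect.ismethod(Client.call))  # False -> plain function, cacheable
print(inspect.ismethod(helper))       # False -> module-level, stable cache key
print(inspect.ismethod(lambda: 0))    # False -> lambdas are functions too
```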
Testing
I wrote two tests using weakref to detect memory leaks:
```python
import gc
import weakref


def test_bound_method_no_memory_leak():
    obj = MyObject()
    ref = weakref.ref(obj)  # Weak reference doesn't prevent GC
    get_function_nonlocals(obj.call)  # This should NOT leak
    del obj
    gc.collect()
    assert ref() is None, "Object was not garbage collected (memory leak)"


def test_bound_method_in_runnable_sequence_no_leak():
    thing = ThingWithRunnable()
    ref = weakref.ref(thing)
    (thing.call | RunnablePassthrough()).invoke({})  # Full integration test
    del thing
    gc.collect()
    assert ref() is None, "Object was not garbage collected (memory leak)"
```

The `weakref.ref()` trick is the standard way to test for memory leaks in Python. A weak reference doesn't contribute to the reference count, so if `ref()` returns None after `gc.collect()`, the object was properly collected.
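The same pattern works outside LangChain. Here is a self-contained sketch (the `leaky` function and `Owner` class are illustrative) that both reproduces the bound-method leak and shows that clearing the cache releases the chain:

```python
import gc
import weakref
from functools import lru_cache

@lru_cache(maxsize=256)
def leaky(func):
    return func.__name__

class Owner:
    def method(self):
        pass

owner = Owner()
ref = weakref.ref(owner)
leaky(owner.method)   # the cache pins the bound method, and thus owner
del owner
gc.collect()
print(ref() is None)  # False: leaked

leaky.cache_clear()   # dropping the cache entries breaks the chain
gc.collect()
print(ref() is None)  # True: finally collected
```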
Key Takeaways
- `lru_cache` + bound methods = memory leak. This is a well-known Python gotcha, but it still catches people. Bound methods are created fresh on every attribute access, so they're effectively uncacheable.
- Zero-hit caches are pure cost. Before adding `@lru_cache`, always verify that the arguments actually repeat. A cache that never hits is worse than no cache — it wastes memory and, in this case, creates leaks.
- `weakref` + `gc.collect()` is the standard leak test pattern. If you suspect a leak, create a weak reference before the operation, delete the strong reference, force GC, and check if the weak reference is dead.
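One cheap way to verify that arguments repeat is `lru_cache`'s built-in `cache_info()` counter (the `parse` function here is illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def parse(key: str) -> str:
    return key.upper()

for key in ["a", "b", "a", "a"]:
    parse(key)

# "a" misses, "b" misses, then "a" hits twice:
print(parse.cache_info())  # CacheInfo(hits=2, misses=2, maxsize=256, currsize=2)
```

If `hits` stays at zero under realistic traffic, the cache is pure overhead and should come out.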
Impact & Reflection
Impact: This memory leak had been open for 6 months (issue #30667) and affected every LangChain user passing bound methods into RunnableSequence — a common pattern in production agent systems. The fix was 15 lines of logic but eliminated unbounded memory growth in long-running applications like web servers and agent loops.
What I learned about Python internals: Before this bug, I understood `lru_cache` at a surface level. Tracing through the descriptor protocol — understanding that `obj.method` creates a new object on every access — gave me a much deeper mental model of Python's object system. I now instinctively check: "Is this cache key actually stable?" before adding `@lru_cache` to any function.
What surprised me: The most counterintuitive part was that the cache had a 0% hit rate but nobody noticed. This taught me that performance optimizations can become invisible technical debt — a cache that "seems like a good idea" but never gets validated against real usage patterns can silently cause worse problems than no cache at all. Now I always instrument caches with hit/miss counters during development.