2014-11-20

Page
edited by
Matthew J. Levine

Changes between revision 2
and revision 3:

...

h3. {color:#0000ff}Python's Memory Management Model{color}

Let's talk about Python. We've already discussed Ruby's memory management [here|~mleblanc:Ruby Memory] and C's memory management [here|~mleblanc:Memory Management in C]. Memory management in C is especially relevant if you're using Cython, because the python garbage collector will be calling C's underlying malloc and free. We're generally not as concerned with low-level performance like that in Python, but we'll discuss the implication briefly anyways.

While implementations of the Python runtime will have slight differences, what we claim here should be generally true. We'll base our low-level claims on the 2010 reference compiler by Schemenauer and our higher level claims on the Python Reference (2014). Note that Schemenauer wrote the garbage collection service for Python (at least starting in Python 2, when cyclic detection was introduced), so his reference should be a decent springboard. We assume a C-based Python environment (as opposed to say, a Java-based one, or a custom Python environment).

h3. *Python Memory Leaks*

...

A memory leak is when a piece of memory is still reserved by the system for the application, but is inaccessible to the user. See our description of C memory for how these leaks can bring your entire computer to a grinding halt, as requests for new memory allocation must go to more distant memory locations (like disk) because the application thinks that the best spots are still in use.

Before Python 2.0, creating a memory leak was trivial. Consider an example from the release of Python 2
{code:language=python|title=Python 1.6 Memory Leak}instance = SomeClass()
instance.myself = instance
{code}

Since Python 2 only used a reference-counting model, this basic cycle was undetectable and would result in a leak*.

_Note_: For the sake of our discussion, we'll assume you aren't using any custom C extensions. Yes, C has potential memory leaks - it's trivial to say that you can get a memory leak in Python by getting a memory leak in C code written into Python.

After Python 2, memory leaks become subtle, but still are lurking in the shadows.

...

h5. 1. Memory Leaks Hiding in Default Parameters

Recall that *functions in Python are first class objects*. What does this mean in practice? Typically nothing. You can assign them to variables, you can add fields to them, but you can also hide memory leaks in the arguments\! Consider:
{code:language=python|title=Leaky Parameters}function memory_leaker( leaky_arg = [] ):
arg.append("LEAK!")
return arg

memory_leaker()
> ["Leak!"]
memory_leaker()
> ["Leak!","Leak!"]{code}
Notice what happens - the number of leaks increases each time the function is called\! That's because the argument is bound to the function when the definition occurs. This only makes sense when considering that the function is a first class object. Even if a reference to the function is lost, the function definition will be maintained with its default arguments, which may have grown exponentially.

h5. 2. Memory Leaks Hiding in Closures

Recall from our previous discussion what closures are... or don't we'll show you again here. A closure occurs when a scope captures a state for use later instead of recomputing it a runtime. We can hide a memory leak inside a closure without knowing it. Consider:
{code:language=python|title=Memory Leak Through Closure}def multiplyX( x ):
""" Returns a function which multiplies its input by X """
def helper( y ):
return x * y
return helper( x )

multiply3 = multiplyX( 3 )
multiply3( 4 )
> 12
multiply3( 5 )
> 15
# Note that the original 3 was "captured" and can re-used without offering it recurrently to the function{code}
No leak here\! Looks like I was wrong... or was I*\*
{code:language=python|title=Closure Memory Leak (2)}def multiplyX( x ):
""" A FAR worse implementation of the original multiplyX function """
def helper_evil(y):
l_malicious = l
return x * y
return helper_evil( x )

l = [ 0 for i in range(1000) ]
multiply3 = multiplyX( 3 ){code}

h5. 3. Memory Leaks Through Finalizers

There's a mysterious line the Python garbage collector that reads
{code:language=java|title=Mysterious GC Line }move_finalizer_reachable(&finalizers);{code}
This line directs the garbage collector to take any unreachable Python object that have a finalizer and move them into the "reachable" list. Now consider a modified version of the example from *1.*:
{code:language=python|title=Cyclic Memory Leak Python 2+}class CyclicLeaker:
def __init__(self):
self.__cycle = self

def __del__(self):
pass{code}

Clearly, the class declares a cycle in its constructor, but because it has a finalizer, it can't undergo the cycle checking, and causes a memory leak. This is because the finalizer actions occur (along with debugging, if enabled), after the garbage is cleared. Otherwise, finalizers could attempt to create there own new references, which would cause concurrent modification in the garbage collection process.

h5. 4. Memory Leak Through Tracebacks

Here's a really really good idea
{code:language=python|title=A Really Really Good Idea}Class Foo:
def foo():
try:
open("non-existant-file.txt","w")
except Exception as t:
this.really_good_idea = t.traceback.extract_tb(){code}
Because the traceback stores a reference to the stack frame, and the stack frame, of course, stores a reference to the traceback, this causes a pretty complex memory leak that GC is not guaranteed to find.

h3. Avoiding Memory Leaks

Suppose I want my object to reference itself, and have a finalizer. It doesn't seem like a generally good design choice, but it's bound to come up... what can I do. One option would be to use a managerie of wrapper objects to make sure the reference is cyclical, but that the finalizer never falls on the cycle-holder. This is definitely a recipe for catastrophically unmanageable code.

*Option 1: Weak References*

Python allows for an option called "weak references". Weak references allow a user to hold link to an instance, but not increase its reference count.

*Option 2: Manual GC*

The gc instance is public and accessible. You can manually call garbage collection - or even disable automatic collection and _only_ call it manually

*{+}Footnotes{+}*

...

\*It would result in a leak unless you were very careful. By manually deleting the reference to the cycle before deleting the reference to the instance, you could find salvation. You could also manipulate weakreferences to that end, I think, though those weren't introduced until 2.8 at which point you already had cycle detection

\**BUM BUM BUM

\**\* From python's point of view, it's still technically usable. I suppose if we did enough digging we could probably recover it somehow, but to anyone but the preposterously determined user, it's gone forever

View Changes Online
View All Revisions |
Revert To Version 2

Python's Memory Management Model

Let's talk about Python. We've already discussed Ruby's memory management [here] and C's memory management [here]. Memory management in C is especially relevant if you're using Cython, because the python garbage collector will be calling C's underlying malloc and free. We're generally not as concerned with low-level performance like that in Python, but we'll discuss the implication briefly anyways.

While implementations of the Python runtime will have slight differences, what we claim here should be generally true. We'll base our low-level claims on the 2010 reference compiler by Schemenauer and our higher level claims on the Python Reference (2014). Note that Schemenauer wrote the garbage collection service for Python (at least starting in Python 2, when cyclic detection was introduced), so his reference should be a decent springboard. We assume a C-based Python environment (as opposed to say, a Java-based one, or a custom Python environment).

Python Memory Leaks

A memory leak is when a piece of memory is still reserved by the system for the application, but is inaccessible to the user. See our description of C memory for how these leaks can bring your entire computer to a grinding halt, as requests for new memory allocation must go to more distant memory locations (like disk) because the application thinks that the best spots are still in use.

Before Python 2.0, creating a memory leak was trivial. Consider an example from the release of Python 2

Python 1.6 Memory Leak

Since Python 2 only used a reference-counting model, this basic cycle was undetectable and would result in a leak*.

Note: For the sake of our discussion, we'll assume you aren't using any custom C extensions. Yes, C has potential memory leaks - it's trivial to say that you can get a memory leak in Python by getting a memory leak in C code written into Python.

After Python 2, memory leaks become subtle, but still are lurking in the shadows.

1. Memory Leaks Hiding in Default Parameters

Recall that functions in Python are first class objects. What does this mean in practice? Typically nothing. You can assign them to variables, you can add fields to them, but you can also hide memory leaks in the arguments! Consider:

Leaky Parameters

Notice what happens - the number of leaks increases each time the function is called! That's because the argument is bound to the function when the definition occurs. This only makes sense when considering that the function is a first class object. Even if a reference to the function is lost, the function definition will be maintained with its default arguments, which may have grown exponentially.

2. Memory Leaks Hiding in Closures

Recall from our previous discussion what closures are... or don't we'll show you again here. A closure occurs when a scope captures a state for use later instead of recomputing it a runtime. We can hide a memory leak inside a closure without knowing it. Consider:

Memory Leak Through Closure

No leak here! Looks like I was wrong... or was I**

Closure Memory Leak (2)

3. Memory Leaks Through Finalizers

There's a mysterious line the Python garbage collector that reads

Mysterious GC Line

This line directs the garbage collector to take any unreachable Python object that have a finalizer and move them into the "reachable" list. Now consider a modified version of the example from 1.:

Cyclic Memory Leak Python 2+

Clearly, the class declares a cycle in its constructor, but because it has a finalizer, it can't undergo the cycle checking, and causes a memory leak. This is because the finalizer actions occur (along with debugging, if enabled), after the garbage is cleared. Otherwise, finalizers could attempt to create there own new references, which would cause concurrent modification in the garbage collection process.

4. Memory Leak Through Tracebacks

Here's a really really good idea

A Really Really Good Idea

Because the traceback stores a reference to the stack frame, and the stack frame, of course, stores a reference to the traceback, this causes a pretty complex memory leak that GC is not guaranteed to find.

Avoiding Memory Leaks

Suppose I want my object to reference itself, and have a finalizer. It doesn't seem like a generally good design choice, but it's bound to come up... what can I do. One option would be to use a managerie of wrapper objects to make sure the reference is cyclical, but that the finalizer never falls on the cycle-holder. This is definitely a recipe for catastrophically unmanageable code.

Option 1: Weak References

Python allows for an option called "weak references". Weak references allow a user to hold link to an instance, but not increase its reference count.

Option 2: Manual GC

The gc instance is public and accessible. You can manually call garbage collection - or even disable automatic collection and only call it manually

Footnotes

*It would result in a leak unless you were very careful. By manually deleting the reference to the cycle before deleting the reference to the instance, you could find salvation. You could also manipulate weakreferences to that end, I think, though those weren't introduced until 2.8 at which point you already had cycle detection

**BUM BUM BUM

*** From python's point of view, it's still technically usable. I suppose if we did enough digging we could probably recover it somehow, but to anyone but the preposterously determined user, it's gone forever

View Online

Show more