Department of Engineering

IT Services

Python memory usage

Usually you don't need to worry about how much memory your Python programs use, but efficient memory usage ("reducing the memory footprint") matters when Big Data is involved. When you scale up a small program (making the array sizes orders of magnitude bigger) you may hit memory limits and experience "resource starvation".

The amount of memory a program uses varies during execution, but on many systems the size of the process rarely shrinks, because freed memory is usually kept for re-use rather than handed back to the operating system. Whenever you create a list or numpy array, your process grows (unless it can recycle used memory). When Python knows that an array or list is no longer needed, it will perform "garbage collection" and recycle the memory, but if for some reason it can't recycle the memory and you repeatedly create arrays or lists, you may eventually run out of space.

Local variables (non-global variables created within a function) are easy for "garbage collection" to deal with - when a function ends, all the memory associated with its variables can be recycled, provided nothing else still refers to them. But in some other situations, "garbage collection" isn't so easy, for example when objects refer to each other in a cycle, or when a long-lived container still holds a reference. This Stack Overflow page shows some ways that you might be leaking memory. The packages you use may be leaking memory too.
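One case where recycling is less straightforward is a reference cycle: two objects that refer to each other stay alive after the function ends, until Python's cycle collector runs. The following is a minimal sketch of that behaviour, assuming CPython; the Node class and the make_cycle function are hypothetical names used only for illustration, and automatic collection is switched off so that the effect is visible.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.partner = None


def make_cycle():
    """Create two objects that refer to each other, then drop both names."""
    a = Node()
    b = Node()
    a.partner = b
    b.partner = a
    # A weak reference lets us observe the object without keeping it alive.
    return weakref.ref(a)


gc.disable()                           # switch off automatic cycle collection
watcher = make_cycle()
alive_before = watcher() is not None   # the cycle keeps both objects alive
gc.collect()                           # ask the cycle collector to run now
alive_after = watcher() is not None
gc.enable()

print("alive after function returned:", alive_before)
print("alive after gc.collect():     ", alive_after)
```

In normal use the collector runs automatically from time to time, so cycles like this are recycled eventually rather than immediately.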

This Stack Overflow page mentions some utilities that might help you discover where you're using up memory. Diagnosing Memory "Leaks" in Python may help too. You can get a fair idea of how big your processes might become by calculating the size of your biggest arrays, but running experiments is best. Here are 2 simple examples showing how to insert lines in your code to diagnose leakage on Unix machines. They print ru_maxrss, the peak resident set size of the process, which Linux reports in kilobytes (macOS reports bytes); because it is a peak, the figure can only stay the same or grow.

import resource  # Unix only

# ru_maxrss is the peak resident set size of this process
print('Memory usage before x created: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
x = [0] * 100000
print('Memory usage after  x created: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
del x  # drop the only reference so the list's memory can be re-used
print('Memory usage after  x deleted: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
y = [0] * 100000
print('Memory usage after  y created: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
z = [0] * 100000
print('Memory usage after  z created: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

When I run this I get

Memory usage before x created: 7324 (kb)
Memory usage after  x created: 8108 (kb)
Memory usage after  x deleted: 8108 (kb)
Memory usage after  y created: 8108 (kb)
Memory usage after  z created: 8660 (kb)

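x, y, and z each take up about 800 kB (100000 pointers of 8 bytes each on a 64-bit machine), which roughly matches the 784 kB growth when x is created. Note that creating x makes the process bigger. Deleting x doesn't lower the reported figure (ru_maxrss is a peak value, so it could not fall in any case), but it does mean that when y is created, it re-uses the memory that x used. Creating z makes the process grow again.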
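The size of a list like these can be estimated before running an experiment: a list stores one pointer per element, so its size is roughly the element count times the pointer size. Here is a minimal sketch of that calculation, assuming CPython; the variable names are illustrative, and sys.getsizeof measures only the list object itself, not the objects it points to.

```python
import struct
import sys

n = 100000
pointer_size = struct.calcsize("P")   # bytes per pointer on this build
estimate = n * pointer_size           # a list stores one pointer per element
x = [0] * n
actual = sys.getsizeof(x)             # the list object itself, not its elements

print("pointer size:       ", pointer_size, "bytes")
print("estimated list size:", estimate, "bytes")
print("sys.getsizeof(x):   ", actual, "bytes")
```

For numpy arrays the equivalent figure is the array's nbytes attribute, which is the element count times the size of one element.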

The next example shows what happens when functions are involved.

import resource

def eatmemory():
  x = [0] * 100000
  print('Memory usage after  x created in eatmemory: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

print('Memory usage initially: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
for i in range(3):
  eatmemory()
print('Memory usage finally: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

Output should be something like

Memory usage initially: 6620 (kb)
Memory usage after  x created in eatmemory: 7404 (kb)
Memory usage after  x created in eatmemory: 7404 (kb)
Memory usage after  x created in eatmemory: 7404 (kb)
Memory usage finally: 7404 (kb)

Note how the memory used when eatmemory is first called is recycled for use in subsequent calls. This is the kind of behaviour you want. In some languages it's easy to write functions that don't re-use memory. Each time they're called, the process grows until finally there's no memory left. Such a program is said to have a "memory leak".

The following (artificial!) program doesn't really have a memory leak but it does show how recursion can add to memory problems.

import resource

def eatmemory(i):
  x = [0] * 100000  # stays alive until this call returns
  print('Memory usage after  x created in eatmemory: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
  if i == 0:
    return
  return eatmemory(i - 1)  # the caller's x is still in use during this call

print('Memory usage initially: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
eatmemory(5)
print('Memory usage finally: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

When you run this you should get something like

Memory usage initially: 6608 (kb)
Memory usage after  x created in eatmemory: 7392 (kb)
Memory usage after  x created in eatmemory: 8184 (kb)
Memory usage after  x created in eatmemory: 8976 (kb)
Memory usage after  x created in eatmemory: 9504 (kb)
Memory usage after  x created in eatmemory: 10296 (kb)
Memory usage after  x created in eatmemory: 11088 (kb)
Memory usage finally: 11088 (kb)

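At the end, all the memory can be recycled, but while the recursion is in progress every call's x list is still in use (six lists at the deepest point), which is why the figure grows with each level of recursion.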
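Because ru_maxrss is a peak value, the output above cannot show the memory being released once the recursion unwinds. One way to see that is the standard tracemalloc module, which reports both the current and the peak amount allocated by Python, and which also works on systems where resource is not available. The following is a sketch under those assumptions, not a replacement for the experiment above; it reuses the eatmemory name but returns a measurement instead of printing.

```python
import tracemalloc


def eatmemory(i):
    x = [0] * 100000  # kept alive until this call returns
    if i == 0:
        # Deepest point: the x lists of all the outer calls are still alive.
        current, _ = tracemalloc.get_traced_memory()
        return current
    return eatmemory(i - 1)


tracemalloc.start()
deepest = eatmemory(5)                      # six nested calls, six live lists
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("in use at the deepest call:", deepest, "bytes")
print("in use after returning:    ", current, "bytes")
print("peak during the run:       ", peak, "bytes")
```

The figure after returning should be far smaller than the figure at the deepest call, while the peak stays at the high-water mark.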