Zero-Copy

Learn about Zero-Copy in Python.

Programs often have to deal with enormous amounts of data in the form of large arrays of bytes. Handling that much data as byte strings can be very inefficient once you start manipulating it by copying, slicing, and modifying it.

Memory profiler

Let’s consider a small program that reads a large file of binary data, and partially copies it into another file. To examine our memory usage, we will use memory_profiler, a nice Python package that allows us to see the memory usage of a program line by line.

To profile the code below, run it with the command python -m memory_profiler memoryview-copy.py. Running the script through memory_profiler is what makes the @profile decorator available, so the code does not need to import it.

@profile
def read_random():
    with open("/dev/urandom", "rb") as source:
        content = source.read(1024 * 10000)  # read about 10 MB of random bytes
        content_to_write = content[1024:]  # everything except the first KB
    print("Content length: %d, content to write length %d" %
          (len(content), len(content_to_write)))
    with open("/dev/null", "wb") as target:
        target.write(content_to_write)

if __name__ == '__main__':
    read_random()
Using memory_profiler
  • In line 4, we read 10 MB from /dev/urandom and do not do much with it. Python needs to allocate around 10 MB of memory to store this data as a bytes object.

  • In line 5, we copy the entire block of data minus the first kilobyte – because we won’t be writing those first 1024 bytes to the target file.

What is interesting in this example is that, as you can see, the memory usage of the program increases by about 10 MB when building the variable content_to_write. In fact, the slice operator is copying the entirety of content, minus the first KB, ...
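
The cost of the slice can also be checked without memory_profiler. The following is a minimal sketch, not part of the original program: it uses the standard library's tracemalloc module to measure how much memory is newly allocated while a slice is built. The helper traced_allocation is a hypothetical name introduced for illustration, and a zero-filled bytes object stands in for the data read from /dev/urandom so that the sketch runs on any system. For comparison, it also slices a memoryview of the same data, which refers to the existing buffer instead of copying it. The exact byte counts will vary between Python versions.

```python
import tracemalloc

SIZE = 1024 * 10000  # same amount of data as the example reads


def traced_allocation(make_slice, data):
    """Return (slice length, bytes newly allocated while building the slice)."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    sliced = make_slice(data)  # keep a reference so the memory stays allocated
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(sliced), after - before


# Zero-filled stand-in for the bytes read from /dev/urandom (an assumption
# made so the sketch does not depend on that device).
content = bytes(SIZE)

# Slicing a bytes object builds a new bytes object holding a copy of the data.
bytes_len, bytes_cost = traced_allocation(lambda d: d[1024:], content)

# Slicing a memoryview only creates a small view object over the same buffer.
view_len, view_cost = traced_allocation(lambda d: memoryview(d)[1024:], content)

print("bytes slice:      length %d, newly allocated %d bytes" %
      (bytes_len, bytes_cost))
print("memoryview slice: length %d, newly allocated %d bytes" %
      (view_len, view_cost))
```

Both slices report the same length, but only the bytes slice should account for roughly 10 MB of new memory. The memoryview slice should cost no more than a few hundred bytes for the view object itself.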