Using binary data instead of text files in Python

It’s been a while since we’ve posted anything here. Sorry about that. There has just been too much to do lately to find the time to write new posts.

This post is also arriving a bit late to be useful for anyone this semester, but what it shows might still be useful for others, or for someone taking the FYS1120 course at a later time. So I’ll post it here anyway.

In mandatory exercise 2 we needed to load a huge file containing data from an AM radio signal. This was available as a Matlab file and as a text file with all the values on each row. Those of us using Python realized quite quickly that most of the time spent on this exercise went to loading the data, before any computations were performed on it.

In hindsight, though, it would probably have been a better idea to save the data in a binary file instead of a text file. After all, binary data takes up less space and is usually quicker to load.

Doing this in NumPy is extremely simple. Just load the text file data and save it back in NumPy’s binary format:

import numpy as np

# Parse the text file once (slow), then store the array in NumPy's binary .npy format
data = np.loadtxt("input_data.txt")
np.save("binary_data.npy", data)

Now, loading the file in your application is just as simple:

import numpy as np

# Read the array straight back from the binary file
data = np.load("binary_data.npy")

The time you save by doing this for large data sets is enormous. Loading the data set we received in mandatory exercise 2 as a text file took me about 2 minutes and 21 seconds, while loading the same data from the binary file took only 0.06 seconds! Yes, that is 60 milliseconds.
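
If you want to check the difference on your own machine, a rough sketch along these lines should do. Note that it does not use the actual radio signal file: it generates a much smaller synthetic array as a stand-in, writes it to a temporary directory in both formats, and times each load. The array size and the variable names are my own choices for illustration, and the timings you get will depend on your hardware and NumPy version.

```python
import os
import tempfile
import time

import numpy as np

# Synthetic stand-in for the radio signal (assumption: the real file is far larger).
rng = np.random.default_rng(0)
signal = rng.standard_normal(200_000)

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "input_data.txt")
    npy_path = os.path.join(tmp, "binary_data.npy")

    # Write the same data in both formats.
    np.savetxt(txt_path, signal)
    np.save(npy_path, signal)

    # Time the text parser.
    start = time.perf_counter()
    from_text = np.loadtxt(txt_path)
    text_seconds = time.perf_counter() - start

    # Time the binary loader.
    start = time.perf_counter()
    from_binary = np.load(npy_path)
    binary_seconds = time.perf_counter() - start

print(f"loadtxt: {text_seconds:.3f} s, load: {binary_seconds:.3f} s")
```

A single run like this is only a rough measurement, but it is usually enough to see which of the two loaders is the bottleneck.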

On top of this, loading the data as text using “loadtxt” practically wasted 2 GB of memory on my computer, while loading it as a binary file used only about one hundred megabytes, which is just about the size of the binary data file.
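
The reason the binary load needs about as much memory as the file is large is that a .npy file is essentially the raw array bytes with a small header in front. Here is a small sketch that compares the two sizes, again using a made-up array rather than the real data set (the array length and names are just for illustration):

```python
import os
import tempfile

import numpy as np

# Hypothetical example array: 100 000 samples stored as 64-bit floats.
samples = np.zeros(100_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "binary_data.npy")
    np.save(npy_path, samples)
    file_bytes = os.path.getsize(npy_path)

# Bytes occupied by the array data in memory: 8 bytes per float64 value.
in_memory_bytes = samples.nbytes

# Whatever is left over in the file is the .npy header.
header_bytes = file_bytes - in_memory_bytes

print(in_memory_bytes, file_bytes, header_bytes)
```

The exact header size can vary between NumPy versions, but it stays tiny compared to the data itself.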

Now I just wish I had thought of this a couple of weeks earlier. But I guess that is what hindsight is for.
