Pickling

Consider a situation where information is stored in a file, and you've written a Python program to read that file and perform some operation on it. If the contents of the file change regularly, there is usually no recourse but to read the file each time you run your program. But if the contents of the file don't change very often, it may be worthwhile to store the data in a form which is easier for Python to read than plain text. The pickle module takes almost any Python object and writes it to a file in a format which Python can later read back into memory efficiently. On systems that support it, there will also be a cPickle module, which is implemented in the C programming language and will generally be much faster than the Python implementation; I'll refer to the cPickle module in the examples that follow. Pickling can often make your programs run much faster, especially when a large amount of processing needs to be done on a data set to create the Python objects you need. In the larger scheme of things, using a database may be a more appropriate way to achieve persistence, but the pickling approach is very simple and is adequate for many problems.

There are two steps to the pickling process. First, a Pickler object is created through a call to the Pickler function. You pass this function a file (or file-like) object, such as that returned by the built-in open function (see Section 5.4). If you pass Pickler a second argument of 1, it will write the pickled object in a more efficient binary format, instead of the default human-readable format. Second, having created a Pickler object, you invoke its dump method to pickle the object of your choice. At this point, you can invoke the close method on the file object passed to Pickler, or you can let Python close the file when your program terminates.
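The two steps can also be tried without touching the disk, since Pickler accepts any file-like object. The following is only a minimal sketch: the helper name pickle_to_bytes and the sample record are made up for illustration, io.BytesIO stands in for a real file, and the import fallback is an assumption for Python versions that do not ship a separate cPickle module. The format is passed explicitly (0 for the printable format, 1 for the binary one) rather than relying on the default.

```python
import io

try:
    import cPickle as pickle   # C implementation, where available
except ImportError:
    import pickle              # assumption: Python versions without cPickle

# Hypothetical sample data, not taken from a real file
record = {'firstname': 'sue', 'office': 207}

def pickle_to_bytes(obj, protocol):
    """Pickle obj into an in-memory file-like object and return its contents."""
    buf = io.BytesIO()                   # any object with a write method will do
    pkl = pickle.Pickler(buf, protocol)  # step 1: create the Pickler
    pkl.dump(obj)                        # step 2: dump the object
    return buf.getvalue()

text_form = pickle_to_bytes(record, 0)    # 0 selects the printable format
binary_form = pickle_to_bytes(record, 1)  # 1 selects the binary format

# Either form can be turned back into an equal object
restored_text = pickle.loads(text_form)
restored_binary = pickle.loads(binary_form)
```

Printing text_form and binary_form side by side is a quick way to see the difference between the two formats.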

As a simple example, let's consider a dictionary whose elements are dictionaries containing information about employees in a company. Of course, the real benefit of pickling comes when you are creating objects from some large, external data source, but this program will show the basic use of the module:

import cPickle, sys

# Employee records, keyed by last name
employees = {
    'smith': {'firstname': 'fred', 'office': 201, 'id': '0001', 'phone': 'x232'},
    'jones': {'firstname': 'sue', 'office': 207, 'id': '0003', 'phone': 'x225'},
    'williams': {'firstname': 'bill', 'office': 215, 'id': '0004', 'phone': 'x219'}}

# Open in binary mode ('wb'), since the binary pickle format is requested below
try:
    f = open('employees.dump', 'wb')
except IOError:
    print >>sys.stderr, 'Error opening employees.dump for write'
    sys.exit(1)

pkl = cPickle.Pickler(f, 1)   # second argument of 1 selects the binary format
pkl.dump(employees)
f.close()

When the employee dictionary is needed in another program, all we need to do is open the file containing the pickled object, and use the load function of the cPickle module:

>>> import cPickle
>>> f = open('employees.dump','rb')
>>> employees = cPickle.load(f)
>>> employees['jones']['office']
207
>>> employees['smith']['firstname']
'fred'
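The motivation given at the start of this section, avoiding repeated processing of a file that rarely changes, can be sketched by combining the dump and load steps. This is an illustration under stated assumptions, not a recommended design: build_word_counts is a hypothetical stand-in for an expensive processing step, load_or_build and the file name counts.dump are invented for the example, and the import fallback is an assumption for Python versions that do not ship a separate cPickle module.

```python
import os
import tempfile

try:
    import cPickle as pickle   # C implementation, where available
except ImportError:
    import pickle              # assumption: Python versions without cPickle

def build_word_counts(lines):
    """Hypothetical stand-in for an expensive processing step."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def load_or_build(cache_path, lines):
    """Return the processed data, reusing a pickled copy if one exists."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    data = build_word_counts(lines)
    with open(cache_path, 'wb') as f:
        pickle.dump(data, f, 1)   # 1 selects the binary format
    return data

# Hypothetical cache location in a fresh temporary directory
cache_path = os.path.join(tempfile.mkdtemp(), 'counts.dump')

first = load_or_build(cache_path, ['a b a', 'b c'])  # processes, then pickles
second = load_or_build(cache_path, [])               # read back from the pickle
```

Note that this sketch never checks whether the original data has changed since the pickle was written; a real program would need some rule for discarding a stale pickle file.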

Pickling can offer an especially attractive alternative to databases when you've created a class which represents a complex data structure (see Section 10.4).

Phil Spector 2003-11-12