Serializing Objects Using JSON
Learn how to serialize objects using JSON methods.
Overview
Many formats have been used for text-based data exchange over the years. Extensible Markup Language (XML) is popular, but the files tend to be large. YAML (originally Yet Another Markup Language, now officially YAML Ain't Markup Language) is another format that you may see referenced occasionally. Tabular data is frequently exchanged in the Comma-Separated Values (CSV) format. Many of these are fading into obscurity, and there are many more that you will encounter over time. Python has solid standard or third-party libraries for all of them.
Before using such libraries on untrusted data, make sure to investigate the security concerns with each of them. XML and YAML, for example, both have obscure features that, used maliciously, can allow arbitrary commands to be executed on the host machine. These features may not be turned off by default. Even something as simple-seeming as a ZIP file or a JPEG image can be crafted to create a data structure that can crash a web server.
JSON format
JavaScript Object Notation (JSON) is a human-readable format for exchanging data. JSON is a standard format that can be interpreted by a wide array of heterogeneous client systems. This means JSON is extremely useful for transmitting data between completely decoupled systems. The JSON format does not have any support for executable code; because only data can be serialized, it is more difficult to inject malicious content.
Because JSON can be easily interpreted by JavaScript engines, it is often used for transmitting data from a web server to a JavaScript-capable web browser. If the web application serving the data is written in Python, the server needs a way to convert internal data into the JSON format.
Object serializing using the json module
There is a module to do this, predictably named json. This module provides a similar interface to the pickle module, with dump(), load(), dumps(), and loads() functions. The default calls to these functions are nearly identical to those in pickle, so let's not repeat the details. There are a couple of differences: obviously, the output of these calls is valid JSON notation, rather than a pickled object. In addition, the json functions operate on str objects, rather than bytes. Therefore, when dumping to or loading from a file, we need to create text files rather than binary ones.
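As a minimal sketch of the four functions described above, the following round-trips a small dictionary through a string and through a text file. The settings dictionary and the settings.json filename are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# A plain dictionary made only of JSON-friendly types (illustrative data).
settings = {"name": "demo", "retries": 3, "tags": ["a", "b"], "ratio": 0.5}

# dumps() returns a str; loads() parses a str back into Python objects.
text = json.dumps(settings)
round_tripped = json.loads(text)

# dump() and load() work with file objects opened in text mode, not binary.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "settings.json"
    with path.open("w", encoding="utf-8") as target:
        json.dump(settings, target)
    with path.open("r", encoding="utf-8") as source:
        from_file = json.load(source)

print(type(text).__name__)  # str
```

Note that the files are opened with "w" and "r" rather than "wb" and "rb", which is what pickle would need.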
The JSON serializer is not as robust as the pickle module; it can only serialize basic types such as integers, floats, and strings, and simple containers such as dictionaries and lists. Each of these has a direct mapping to a JSON representation, but JSON is unable to represent objects unique to Python like class or function definitions.
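To make these limits concrete, here is a small sketch that serializes the basic types and then tries an instance of a user-defined class. The Widget class is a hypothetical stand-in, not something from this lesson.

```python
import json

# Basic types and simple containers map directly onto JSON values.
basic = {"count": 3, "ratio": 0.5, "label": "x", "items": [1, 2], "flag": True, "missing": None}
basic_json = json.dumps(basic)
print(basic_json)
# {"count": 3, "ratio": 0.5, "label": "x", "items": [1, 2], "flag": true, "missing": null}


class Widget:  # hypothetical class used only for this illustration
    def __init__(self, size):
        self.size = size


# An arbitrary object has no JSON mapping, so the encoder raises TypeError.
try:
    json.dumps(Widget(3))
    error_message = None
except TypeError as error:
    error_message = str(error)

print(error_message)
```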
By default, the json module's functions raise a TypeError when given an object they don't know how to serialize. A quick workaround is to serialize the value of the object's __dict__ attribute instead. A better approach is to supply custom code to serialize an object's state into a JSON-friendly dictionary. We also want to go the other way: deserializing a JSON dictionary to recover a Python object's state.
In the json module, both the object encoding and decoding functions accept optional arguments to customize the behavior. The dump() and dumps() functions accept a poorly named cls keyword argument. (It's short for class, which has to be spelled oddly because class is a reserved keyword.) If this argument is provided, it should be a subclass of the JSONEncoder class, with the default() method overridden. This overridden default() method accepts an arbitrary Python object and converts it to a dictionary that json can serialize. If it doesn't know how to process the object, it should call super().default(), so that the base class can raise the usual TypeError for unsupported types.
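A minimal sketch of this encoder pattern follows. The Point class, the PointEncoder name, and the "__class__" marker key are assumptions chosen for illustration; the json module does not require any of them.

```python
import json


class Point:  # hypothetical class used only for this illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(obj, Point):
            # The "__class__" marker is our own convention, not part of JSON.
            return {"__class__": "Point", "x": obj.x, "y": obj.y}
        # Anything else is handed to the base class, which raises TypeError.
        return super().default(obj)


encoded = json.dumps({"origin": Point(0, 0), "n": 1}, cls=PointEncoder)
print(encoded)
# {"origin": {"__class__": "Point", "x": 0, "y": 0}, "n": 1}
```

Including a marker such as "__class__" in the output is one possible design choice; it gives a decoder something to recognize when turning the dictionary back into an object.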
The load() and loads() functions also accept such a cls argument, which can be a subclass of the inverse class, JSONDecoder. However, it is normally sufficient to pass a function into these functions using the object_hook keyword argument. This function accepts a dictionary and returns an object; if it doesn't know what to do with the input dictionary, it can return it unmodified.
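The object_hook approach can be sketched as follows. The Point class, the decode_point function, and the "__class__" marker key are hypothetical names used only for this illustration.

```python
import json


class Point:  # hypothetical class used only for this illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y


def decode_point(dictionary):
    # Called once for every JSON object decoded, innermost objects first.
    if dictionary.get("__class__") == "Point":
        return Point(dictionary["x"], dictionary["y"])
    # Unrecognized dictionaries are returned unmodified.
    return dictionary


text = '{"origin": {"__class__": "Point", "x": 1, "y": 2}, "n": 1}'
data = json.loads(text, object_hook=decode_point)

print(type(data["origin"]).__name__, data["origin"].x, data["origin"].y)  # Point 1 2
```

The outer dictionary has no marker key, so it passes through the hook unchanged, while the nested one is rebuilt as a Point.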
Example
Let’s look at an example. Imagine we have the following simple contact class that we want to serialize:
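A minimal sketch of what such a class might look like, assuming a contact holds only a first and a last name. The attribute names first and last and the full_name property are illustrative assumptions, as is the sample data.

```python
import json


class Contact:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def full_name(self):
        # Computed on access, so it is not stored in the instance __dict__.
        return f"{self.first} {self.last}"


contact = Contact("John", "Smith")  # illustrative data

# Serializing __dict__ captures the stored attributes but not the property.
as_json = json.dumps(contact.__dict__)
print(as_json)  # {"first": "John", "last": "Smith"}
```

Dumping __dict__ works here, but the computed full_name value is missing from the output, which is one reason to write a custom encoder instead.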