Overview
We noted that our mypy type hint doesn’t guarantee that the JSON document is what we expected. The jsonschema package, available from the Python Package Index, fills this gap: it lets us provide a specification for a JSON document and then confirm whether or not the document meets the specification.
JSON Schema validation is a runtime check, unlike the mypy type hint, which is examined only by static analysis. This means validation adds processing time, but it can also help to diagnose subtly incorrect JSON documents. For details, visit https://json-schema.org. JSON Schema is evolving toward standardization, and several versions of the specification are available for compliance checking.
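To make the contrast between a static hint and a runtime check concrete, here is a minimal sketch. The one-attribute schema and the `petal_length()` helper are hypothetical names chosen for illustration; `jsonschema.validate()` and `jsonschema.ValidationError` are part of the jsonschema package.

```python
import jsonschema

# Hypothetical one-attribute schema, used only for illustration.
SCHEMA = {
    "type": "object",
    "properties": {"petal_length": {"type": "number"}},
    "required": ["petal_length"],
}


def petal_length(sample: dict[str, float]) -> float:
    # The hint is examined by mypy only; nothing enforces it at runtime.
    return sample["petal_length"]


bad_sample = {"petal_length": "1.4"}  # a string, not a number

# At runtime the hinted function accepts the bad value without complaint.
unchecked = petal_length(bad_sample)  # type: ignore[arg-type]

# The schema check runs at runtime and rejects the same document.
try:
    jsonschema.validate(instance=bad_sample, schema=SCHEMA)
    problem = None
except jsonschema.ValidationError as ex:
    problem = ex.message
```

After this runs, `unchecked` still holds the string, while `problem` holds the validator's description of what was wrong with the document.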
We’ll focus on newline-delimited JSON. This means we need a schema that describes an individual sample document within the larger collection of documents. This kind of additional validation might be relevant when receiving a batch of unknown samples to classify. Before doing anything else, we’d like to be sure each sample document has the right attributes.
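One way to apply a per-sample schema to newline-delimited JSON is to parse and validate each line separately, collecting problems by line number. The sketch below assumes a small two-attribute schema and a hypothetical `check_ndjson()` helper; it uses `Draft7Validator.iter_errors()` from the jsonschema package, though another draft's validator class could be substituted.

```python
import json

from jsonschema import Draft7Validator

# Hypothetical schema for one sample; the attribute names are assumptions.
SAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "sepal_length": {"type": "number"},
        "species": {"type": "string"},
    },
    "required": ["sepal_length", "species"],
}


def check_ndjson(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for every schema violation."""
    validator = Draft7Validator(SAMPLE_SCHEMA)
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines between documents
        document = json.loads(line)
        # iter_errors() yields every violation instead of stopping at the first.
        for error in validator.iter_errors(document):
            problems.append((line_number, error.message))
    return problems


batch = (
    '{"sepal_length": 5.1, "species": "Iris-setosa"}\n'
    '{"sepal_length": "wide", "species": "Iris-setosa"}\n'
    '{"species": "Iris-virginica"}\n'
)
problems = check_ndjson(batch)
```

Here the first line is accepted, the second is reported for a non-numeric value, and the third for a missing attribute. Collecting every problem, rather than raising on the first one, makes it easier to diagnose a whole batch in a single pass.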
A JSON Schema document is also written in JSON. It includes some metadata to help clarify the purpose and meaning of the document. It’s often a little easier to create a Python dictionary with the JSON Schema definition.
Example
Here’s a candidate definition for the Iris schema for an individual sample:
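As a minimal sketch of what such a definition could look like, assume each sample carries four numeric measurements and an optional species label. The attribute names, descriptions, species values, and choice of draft below are illustrative assumptions rather than a confirmed listing.

```python
import json

# Attribute names, descriptions, and species values are illustrative assumptions.
IRIS_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "title": "Iris Data Schema",
    "description": "Schema for one Iris sample",
    "type": "object",
    "properties": {
        "sepal_length": {"type": "number", "description": "Sepal length in cm"},
        "sepal_width": {"type": "number", "description": "Sepal width in cm"},
        "petal_length": {"type": "number", "description": "Petal length in cm"},
        "petal_width": {"type": "number", "description": "Petal width in cm"},
        "species": {
            "type": "string",
            "description": "Name of the species",
            "enum": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
        },
    },
    # Unknown samples have no species yet, so only the measurements are required.
    "required": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
}

# Because the schema is plain data, it serializes to a JSON Schema document.
schema_text = json.dumps(IRIS_SCHEMA, indent=2)
```

The title and description entries are metadata for human readers. The type, properties, and required entries are what a validator checks against each sample.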