Creating Samples From CSV Files

Learn how to create samples with the help of a CSV file and raise exceptions for bad data.

CSV (comma-separated values) is a plain-text format that can represent the rows of a spreadsheet. Within each row, the cell values are written as text, separated by commas. When this data is parsed by the DictReader class in Python’s csv module, each row is represented as a dictionary whose keys are the column names (taken by default from the first line of the file) and whose values are the cell values from that row.
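Here is a quick illustration of that behavior, using csv.DictReader on a small in-memory CSV string. The io.StringIO object stands in for an open file, and the two data rows are only illustrative values.

```python
import csv
import io

# A tiny CSV "file": the first line supplies the column names (dictionary keys).
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# Every value is still a string; DictReader does no type conversion.
print(rows[0])
# {'sepal_length': '5.1', 'sepal_width': '3.5', 'petal_length': '1.4', 'petal_width': '0.2', 'species': 'Iris-setosa'}
```

With a file on disk, the csv documentation recommends opening it with newline="", as in open(path, newline=""), before passing it to DictReader.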

For example, a row might look like this:

row = {"sepal_length": "5.1", "sepal_width": "3.5", "petal_length": "1.4",
       "petal_width": "0.2", "species": "Iris-setosa"}

The csv module’s DictReader class provides an iterable sequence of dict[str, str] row instances. We need to transform these raw rows into instances of one of the subclasses of Sample, provided every feature has a valid string value: a known species name and measurements that convert cleanly to floats. If the raw data isn’t valid, we need to raise an exception.
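To see the read, convert, and reject steps working together, here is a minimal sketch. It reads rows with DictReader, converts the measurements with float(), and raises an exception for bad data. The helper parse_features(), the FEATURES tuple, and this definition of InvalidSampleError are hypothetical stand-ins rather than the article’s code, and the sketch returns plain dictionaries of floats instead of Sample instances.

```python
import csv
import io

# Hypothetical stand-ins; not the article's definitions.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


class InvalidSampleError(ValueError):
    """Raised when a raw CSV row can't become a sample (ValueError base assumed)."""


def parse_features(row: dict[str, str]) -> dict[str, float]:
    """Convert the four measurement strings in a raw row to floats."""
    try:
        return {name: float(row[name]) for name in FEATURES}
    except (KeyError, TypeError, ValueError) as ex:
        # KeyError: the header lacks one of the columns.
        # TypeError: a short row; DictReader fills missing cells with None.
        # ValueError: text that float() can't parse.
        raise InvalidSampleError(f"invalid {row!r}") from ex


csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "5.1,three,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4\n"
)

good: list[dict[str, float]] = []
bad: list[str] = []
for row in csv.DictReader(io.StringIO(csv_text)):
    try:
        good.append(parse_features(row))
    except InvalidSampleError as err:
        bad.append(str(err))

print(len(good), len(bad))  # 1 2
```

Catching the exception at the loop, rather than inside the converter, keeps the decision about what to do with bad rows (skip, log, or stop) in the caller’s hands.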

KnownSample class

Given rows like the example above, here’s a method that translates the dictionary into a KnownSample instance. This is part of the KnownSample class:

@classmethod
def from_dict(cls, row: dict[str, str]) -> "KnownSample":
    # InvalidSampleError is a custom exception class defined elsewhere.
    if row["species"] not in {
            "Iris-setosa", "Iris-versicolour", "Iris-virginica"}:
        raise InvalidSampleError(f"invalid species in {row!r}")
    try:
        return cls(
            species=row["species"],
            sepal_length=float(row["sepal_length"]),
            sepal_width=float(row["sepal_width"]),
            petal_length=float(row["petal_length"]),
            petal_width=float(row["petal_width"]),
        )
    except ValueError as ex:
        # Chain the original error so the failing value stays in the traceback.
        raise InvalidSampleError(f"invalid {row!r}") from ex
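If you want to experiment with this method on its own, the sketch below wraps a condensed version of it in a hypothetical KnownSample dataclass. The real class belongs to a Sample hierarchy that isn’t shown here. The stand-in InvalidSampleError and the FEATURES and SPECIES constants are also illustrative additions. The sketch chains the original ValueError with raise ... from ex.

```python
from dataclasses import dataclass

# Hypothetical stand-ins so the method can run by itself.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")
SPECIES = {"Iris-setosa", "Iris-versicolour", "Iris-virginica"}


class InvalidSampleError(ValueError):
    """Stand-in exception; subclassing ValueError is an assumption."""


@dataclass
class KnownSample:
    species: str
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    @classmethod
    def from_dict(cls, row: dict[str, str]) -> "KnownSample":
        # Reject an unknown species before touching the measurements.
        if row["species"] not in SPECIES:
            raise InvalidSampleError(f"invalid species in {row!r}")
        try:
            measurements = {name: float(row[name]) for name in FEATURES}
        except ValueError as ex:
            # Chain the original error so the failing value stays visible.
            raise InvalidSampleError(f"invalid {row!r}") from ex
        return cls(species=row["species"], **measurements)


good_row = {"sepal_length": "5.1", "sepal_width": "3.5", "petal_length": "1.4",
            "petal_width": "0.2", "species": "Iris-setosa"}
sample = KnownSample.from_dict(good_row)
print(sample)
# KnownSample(species='Iris-setosa', sepal_length=5.1, sepal_width=3.5, petal_length=1.4, petal_width=0.2)

# Two bad rows: an unknown species and a non-numeric measurement.
rejections = []
for bad_row in ({**good_row, "species": "Iris-unknown"},
                {**good_row, "petal_width": "n/a"}):
    try:
        KnownSample.from_dict(bad_row)
    except InvalidSampleError as err:
        # __cause__ is None for the species check, the ValueError otherwise.
        rejections.append(err.__cause__)
print(rejections)
```

Chaining with from ex keeps the underlying ValueError available as __cause__, which makes a rejected row easier to diagnose. The species check raises without a cause because no earlier exception was involved.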

The from_dict() method first checks the species value, raising an exception if it’s not valid. It then attempts to create an instance of the class, applying the float() function to convert various measurements from ...