Creating Samples From CSV Files
Learn how to create samples with the help of a CSV file and raise exceptions for bad data.
CSV (Comma-Separated Values) can be used to define the rows of a spreadsheet. Within each row, the cell values are represented as text, separated by commas. When this data is parsed by Python’s csv module, each row can be represented by a dictionary where the keys are the column names and the values are the cell values from a particular row.
For example, a row might look like this:
row = {"sepal_length": "5.1", "sepal_width": "3.5", "petal_length": "1.4", "petal_width": "0.2", "species": "Iris-setosa"}
The csv module’s DictReader class provides an iterable sequence of dict[str, str] row instances. We need to transform these raw rows into instances of one of the subclasses of Sample, provided all of the features have valid string values. If the raw data isn’t valid, then we need to raise an exception.
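As a minimal sketch of what DictReader yields, the following reads a small in-memory CSV. The CSV_TEXT sample and the read_rows() helper are hypothetical names introduced only for illustration; a real application would likely open a file instead of using io.StringIO.

```python
import csv
import io

# Hypothetical in-memory CSV text standing in for a real file.
CSV_TEXT = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolour
"""


def read_rows(source):
    """Return every row of a CSV source as a dict of column name -> text."""
    reader = csv.DictReader(source)  # the first line supplies the keys
    return list(reader)


rows = read_rows(io.StringIO(CSV_TEXT))
first_row = rows[0]
print(first_row)
```

Every value, including the measurements, arrives as a string, which is why a separate conversion and validation step is needed.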
KnownSample class
Given rows like the example above, here’s a method that will translate from the dictionary to a more useful object. This is part of the KnownSample class:
@classmethod
def from_dict(cls, row: dict[str, str]) -> "KnownSample":
    if row["species"] not in {"Iris-setosa", "Iris-versicolour", "Iris-virginica"}:
        raise InvalidSampleError(f"invalid species in {row!r}")
    try:
        return cls(
            species=row["species"],
            sepal_length=float(row["sepal_length"]),
            sepal_width=float(row["sepal_width"]),
            petal_length=float(row["petal_length"]),
            petal_width=float(row["petal_width"]),
        )
    except ValueError as ex:
        raise InvalidSampleError(f"invalid {row!r}")
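To try the method above in isolation, here is a self-contained sketch. The InvalidSampleError definition (assumed here to subclass ValueError), the stripped-down KnownSample constructor, and the load_samples() helper are assumptions made for illustration and are not the lesson’s full class hierarchy.

```python
class InvalidSampleError(ValueError):
    """Assumed definition: raised when a source row cannot become a sample."""


class KnownSample:
    """Simplified stand-in for the lesson's KnownSample class."""

    def __init__(self, species, sepal_length, sepal_width,
                 petal_length, petal_width):
        self.species = species
        self.sepal_length = sepal_length
        self.sepal_width = sepal_width
        self.petal_length = petal_length
        self.petal_width = petal_width

    @classmethod
    def from_dict(cls, row: dict[str, str]) -> "KnownSample":
        if row["species"] not in {
            "Iris-setosa", "Iris-versicolour", "Iris-virginica"
        }:
            raise InvalidSampleError(f"invalid species in {row!r}")
        try:
            return cls(
                species=row["species"],
                sepal_length=float(row["sepal_length"]),
                sepal_width=float(row["sepal_width"]),
                petal_length=float(row["petal_length"]),
                petal_width=float(row["petal_width"]),
            )
        except ValueError as ex:
            raise InvalidSampleError(f"invalid {row!r}")


def load_samples(rows):
    """Hypothetical helper: split rows into built samples and error messages."""
    good, bad = [], []
    for row in rows:
        try:
            good.append(KnownSample.from_dict(row))
        except InvalidSampleError as ex:
            bad.append(str(ex))
    return good, bad


valid = {"sepal_length": "5.1", "sepal_width": "3.5",
         "petal_length": "1.4", "petal_width": "0.2",
         "species": "Iris-setosa"}
wrong_species = dict(valid, species="Rose")      # fails the species check
wrong_number = dict(valid, petal_width="wide")   # fails float() conversion

good, bad = load_samples([valid, wrong_species, wrong_number])
print(len(good), "valid sample(s);", len(bad), "rejected row(s)")
```

Collecting the error messages instead of stopping at the first bad row is only one possible design; a stricter loader could let the first InvalidSampleError propagate.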
The from_dict() method checks the species value, raising an exception if it’s not valid. It then attempts to create the sample, applying the float() function to convert various measurements from ...