Reading CSV Files

Learn how to read CSV files using various methods in Python while adhering to the DRY principle.

Using the from_dict() method

We’ll provide a common template for creating objects from CSV source data. The idea is to leverage the from_dict() methods of the various classes to create the objects our application uses:

import datetime


class TrainingData:
    def __init__(self, name: str) -> None:
        self.name = name
        self.uploaded: datetime.datetime
        self.tested: datetime.datetime
        self.training: list[TrainingKnownSample] = []
        self.testing: list[TestingKnownSample] = []
        self.tuning: list[Hyperparameter] = []

    def load(self, raw_data_iter) -> None:
        for n, row in enumerate(raw_data_iter):
            try:
                # Every fifth row (n = 0, 5, 10, ...) becomes a testing sample.
                if n % 5 == 0:
                    test = TestingKnownSample.from_dict(row)
                    self.testing.append(test)
                else:
                    train = TrainingKnownSample.from_dict(row)
                    self.training.append(train)
            except InvalidSampleError as ex:
                # Report the first invalid row and stop loading.
                print(f"Row {n+1}: {ex}")
                return
        # Reached only when every row loaded without error.
        self.uploaded = datetime.datetime.now(tz=datetime.timezone.utc)

Constructing the load() method

The load() method partitions the samples into testing and training subsets: every fifth row becomes a testing sample, and the rest become training samples. It expects an iterable source of dict[str, str] objects, such as those produced by a csv.DictReader object.
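As a rough illustration of the input load() expects, the sketch below feeds a csv.DictReader into a partitioning loop with the same n % 5 rule. The in-memory SOURCE text and the partition() helper are assumptions made for this example; they stand in for a real file and for the sample classes, which aren't shown here.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a real file; the header row
# supplies the dictionary keys.
SOURCE = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def partition(rows):
    """Split rows with the same rule as load(): rows 0, 5, 10, ... go to testing."""
    testing, training = [], []
    for n, row in enumerate(rows):
        if n % 5 == 0:
            testing.append(row)
        else:
            training.append(row)
    return testing, training


# DictReader yields one dict per data row; every value is still a string.
reader = csv.DictReader(io.StringIO(SOURCE))
testing, training = partition(reader)
print(len(testing), len(training))
print(type(testing[0]["sepal_length"]).__name__)
```

Because DictReader leaves every value as a string, converting to float and validating the species is left to whatever consumes the row, which in the lesson's design is the from_dict() method.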

The user experience implemented here is to report the first failure and return. This might lead to an error message like the following:

Row 2: invalid species in {'sepal_length': 7.9, 'sepal_width': 3.2, 'petal_length': 4.7, 'petal_width': 1.4, 'species': 'Buttercup'}

This message has all the required information, but may not be as helpful as desired. We ...
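The report-the-first-failure behavior described above can be sketched in isolation. Everything here is a stand-in: InvalidSampleError is redefined locally, sample_from_dict() plays the role of a from_dict() method, and the VALID_SPECIES set is an assumption, since the real classes aren't shown in this section.

```python
class InvalidSampleError(ValueError):
    """Local stand-in for the application's exception class."""


# Assumed set of acceptable species names, for illustration only.
VALID_SPECIES = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"}


def sample_from_dict(row: dict[str, str]) -> dict[str, str]:
    """Hypothetical stand-in for from_dict(): reject rows with an unknown species."""
    if row["species"] not in VALID_SPECIES:
        raise InvalidSampleError(f"invalid species in {row!r}")
    return row


def load(rows) -> tuple[list[dict[str, str]], bool]:
    """Load rows until the first invalid one; return what loaded and a success flag."""
    loaded = []
    for n, row in enumerate(rows):
        try:
            loaded.append(sample_from_dict(row))
        except InvalidSampleError as ex:
            # n is zero-based, so n + 1 is the data row number a user would count.
            print(f"Row {n+1}: {ex}")
            return loaded, False
    return loaded, True


rows = [
    {"sepal_length": "5.1", "species": "Iris-setosa"},
    {"sepal_length": "7.9", "species": "Buttercup"},
    {"sepal_length": "6.3", "species": "Iris-virginica"},
]
loaded, complete = load(rows)
print(len(loaded), complete)
```

The third row is valid but never examined, because processing stops at the first failure. A user fixing a file with several bad rows would have to rerun the load once per error.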