Reading CSV Files
Learn how to read CSV files using various methods in Python while adhering to the DRY principle.
Using the from_dict() method
We'll provide a common template for creating objects from CSV source data. The idea is to leverage the from_dict() methods of the various classes to create the objects our application uses:
    class TrainingData:
        def __init__(self, name: str) -> None:
            self.name = name
            self.uploaded: datetime.datetime
            self.tested: datetime.datetime
            self.training: list[TrainingKnownSample] = []
            self.testing: list[TestingKnownSample] = []
            self.tuning: list[Hyperparameter] = []

        def load(self, raw_data_iter) -> None:
            for n, row in enumerate(raw_data_iter):
                try:
                    if n % 5 == 0:
                        test = TestingKnownSample.from_dict(row)
                        self.testing.append(test)
                    else:
                        train = TrainingKnownSample.from_dict(row)
                        self.training.append(train)
                except InvalidSampleError as ex:
                    print(f"Row {n+1}: {ex}")
                    return
            self.uploaded = datetime.datetime.now(tz=datetime.timezone.utc)
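The from_dict() methods themselves are not shown in this lesson. As a rough sketch of the pattern, a class method can convert one row of strings into a validated object and raise InvalidSampleError when the row is unusable. The KnownSample class, the VALID_SPECIES set, and the definition of InvalidSampleError below are simplified assumptions for illustration, not the course's actual classes:

```python
from dataclasses import dataclass

# Assumed set of valid labels; the real application may define these elsewhere.
VALID_SPECIES = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"}


class InvalidSampleError(ValueError):
    """Raised when a source row cannot be converted into a sample."""


@dataclass
class KnownSample:
    # Hypothetical, simplified stand-in for the lesson's sample classes.
    species: str
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    @classmethod
    def from_dict(cls, row: dict[str, str]) -> "KnownSample":
        # Convert the string values of one row; a missing column or a
        # non-numeric measurement is reported as InvalidSampleError.
        try:
            sample = cls(
                species=row["species"],
                sepal_length=float(row["sepal_length"]),
                sepal_width=float(row["sepal_width"]),
                petal_length=float(row["petal_length"]),
                petal_width=float(row["petal_width"]),
            )
        except (KeyError, ValueError) as ex:
            raise InvalidSampleError(f"invalid {row!r}") from ex
        if sample.species not in VALID_SPECIES:
            raise InvalidSampleError(f"invalid species in {row!r}")
        return sample
```

Keeping the conversion and validation inside from_dict() means a loader only has to call the method and handle one exception type, which is what keeps the loading code free of repetition.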
Constructing the load() method
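The load() method partitions the samples into testing and training subsets: every fifth row (where n % 5 == 0) becomes a testing sample, and the remaining rows become training samples. It expects an iterable source of dict[str, str] objects, such as the rows produced by a csv.DictReader object.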
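To make the input shape concrete, here is a small sketch of csv.DictReader producing rows of strings and of the same n % 5 split applied to them. The in-memory CSV text and the testing_rows and training_rows names are made up for illustration; a real application would read from an open file and build sample objects rather than keep the raw rows:

```python
import csv
import io

# Hypothetical in-memory CSV; a real application would open a file instead.
source = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
)

# DictReader takes the column names from the first line and yields one
# dict per data row; every value is a str, including the measurements.
reader = csv.DictReader(source)

testing_rows: list[dict[str, str]] = []
training_rows: list[dict[str, str]] = []
for n, row in enumerate(reader):
    if n % 5 == 0:
        testing_rows.append(row)   # rows 0, 5, 10, ... are held out for testing
    else:
        training_rows.append(row)  # all other rows are used for training
```

With six data rows, the rows at positions 0 and 5 land in the testing list and the other four in the training list, which is roughly a 20/80 split.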
The user experience implemented here is to report the first failure and return. This might lead to an error message like the following:
    Row 2: invalid species in {'sepal_length': 7.9, 'sepal_width': 3.2, 'petal_length': 4.7, 'petal_width': 1.4, 'species': 'Buttercup'}
This message has all the required information, but may not be as helpful as desired. We ...