Case Study
Learn how to perform data validation and simplify the partitioning of samples.
We'll cover the following...
In this chapter, we’ll continue developing elements of the case study. We want to explore some additional features of object-oriented design in Python. The first is what is sometimes called syntactic sugar, a handy way to write something that offers a simpler way to express something fairly complex. The second is the concept of a manager for providing a context for resource management.
In the previous chapter, we built an exception for identifying invalid input data. We used the exception for reporting when the inputs couldn’t be used.
Here, we’ll start with a class to gather data by reading the file with properly classified training and test data. In this chapter, we’ll ignore some of the exception-handling details so we can focus on another aspect of the problem: partitioning samples into testing and training subsets.
Input validation
The TrainingData
object is loaded from a source file of samples, named bezdekIris.data
. Currently, we don’t make a large effort to validate the contents of this file. Rather than confirm the data contains correctly formatted samples with numeric measurements and a proper species name, we simply create Sample
instances, and hope nothing goes wrong. A small change to the data could lead to unexpected problems in obscure parts of our application. By validating the input data right away, we can focus on the problems and provide a focused, actionable report back to the user. Something like “Row 42 has an invalid petal_length
value of 1b.25
” with the line of data, the column, and the invalid value.
A file with training data is processed in our application via the load()
method
of TrainingData
. Currently, this method requires an ...