Classification Using Multiple Machine Learning Models

Learn how classification is done with support vector machines, random forests, and multilayer perceptrons.

We show here how to apply three different types of machine learning classifiers using their sklearn implementations: a support vector classifier (SVC), a random forest classifier (RFC), and a multilayer perceptron (MLP). We concentrate here on the mechanics of applying these classifiers and discuss what is behind them later. As the example, we use the classic iris flowers dataset, which we used in the previous chapter to demonstrate how to read data into NumPy arrays.
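As a rough illustration of these mechanics, the sketch below instantiates the three classifiers and runs each through the same `fit` and `predict` calls. It makes several assumptions that are not from the text: the data is loaded with sklearn's bundled `load_iris` rather than read from a file, the models are trained and evaluated on the full dataset only to show the shared interface, and the `random_state` and `max_iter` settings are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Assumption: use the copy of the iris data bundled with sklearn.
X, y = load_iris(return_X_y=True)

# The three classifier types named above; hyperparameters are illustrative.
classifiers = {
    "SVC": SVC(),
    "RFC": RandomForestClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(X, y)                       # train on feature matrix and labels
    predictions[name] = clf.predict(X)  # predict class labels
    accuracy = (predictions[name] == y).mean()
    print(f"{name}: training accuracy {accuracy:.3f}")
```

All three models expose the same two methods, so swapping one classifier for another usually changes only the line that constructs it. Accuracy on the training data is optimistic, so a real evaluation would use held-out test data.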

We will start with the SVC, which is a support vector machine (SVM). SVM was the original common name for this technique because it originated for classification problems. Later, the technique was generalized to support vector regression, and we follow here the abbreviation used in sklearn. The sklearn implementation is actually a wrapper for the LIBSVM implementation by Chih-Chung Chang and Chih-Jen Lin, which has been very popular for classification applications. Later in this chapter, we'll describe more of the math and tricks behind this method, but for now we'll use it to demonstrate the mechanics of applying a classifier.

We will apply this classifier to the iris dataset in the program IrisClassificationSklearn.ipynb. The program starts as usual by importing the necessary libraries. We then import the data in a similar way to the program discussed in the previous chapter. We choose here to split the data into a training set and a test set by using every second data point as a training point and the points in between as test points. For the training set, this is accomplished with the slice 0:-1:2, which starts at index 0, stops just before the last element (index -1, which is excluded), and uses a step of 2. Since the data is ordered and well balanced in the original data file, this leaves us with a balanced dataset.
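The split described above can be sketched as follows. This is not the notebook's actual code: it assumes the data comes from sklearn's bundled `load_iris`, and the test slice `1:-1:2` is an assumed counterpart to the training slice `0:-1:2` given in the text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Assumption: bundled iris data, 150 samples ordered by class (50 per class).
X, y = load_iris(return_X_y=True)

# Training set: even indices 0, 2, ..., 148 (the stop index -1 is exclusive).
X_train, y_train = X[0:-1:2], y[0:-1:2]
# Test set (assumed slice): odd indices 1, 3, ..., 147.
X_test, y_test = X[1:-1:2], y[1:-1:2]

print(X_train.shape, X_test.shape)                # (75, 4) (74, 4)
print(np.bincount(y_train), np.bincount(y_test))  # samples per class

# Train the support vector classifier and measure accuracy on held-out data.
clf = SVC()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Because the stop index is exclusive, the last sample (index 149) lands in neither set, so the test set has 74 points with one class holding 24 instead of 25. Using `X[1::2]` instead would include it.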
