- Model Application

Using logistic regression in MLlib to fit the training dataframe.

Logistic regression

Now that we have prepared our training and test datasets, we can use the logistic regression algorithm provided by MLlib to fit the training dataframe.

  1. We first create a logistic regression object and define the columns to use as labels and features.
  2. Next, we use the fit function to train the model on the training dataset.
  3. Finally, we use the transform function to apply the fitted model to our test dataset, as shown in the snippet below.
from pyspark.ml.classification import LogisticRegression
# specify the columns for the model
lr = LogisticRegression(featuresCol='features', labelCol='label')
# fit on training data
model = lr.fit(trainVec)
# predict on test data
predDF = model.transform(testVec)
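
To build intuition for what the fitted model does when transform is called, the following is a minimal pure-Python sketch of how a binary logistic regression turns one feature vector into class probabilities and a predicted label. It is not MLlib's implementation. The weights, intercept, and sample row are hypothetical values chosen for illustration, and the 0.5 threshold is assumed to match the usual default.

```python
import math


def sigmoid(z):
    """Map a real-valued score to the interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, intercept):
    """Return [P(label=0), P(label=1)] for a single feature vector."""
    # linear score: dot product of weights and features, plus the intercept
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    p1 = sigmoid(margin)
    return [1.0 - p1, p1]


def predict(features, weights, intercept, threshold=0.5):
    """Return 1.0 if P(label=1) exceeds the threshold, else 0.0."""
    p1 = predict_proba(features, weights, intercept)[1]
    return 1.0 if p1 > threshold else 0.0


# hypothetical coefficients standing in for what fitting would learn
weights = [0.8, -0.4]
intercept = 0.1

# hypothetical feature vector for one test row
row = [1.0, 2.0]

print(predict_proba(row, weights, intercept))
print(predict(row, weights, intercept))
```

In this sketch, fitting corresponds to learning weights and intercept from the training data, and transforming corresponds to running predict_proba and predict on every row of the test data.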

Results

The resulting dataframe now has a probability column, as shown in the table below. This column is a 22 ...