Model Application
Using logistic regression in MLlib to fit the training dataframe.
Logistic regression
Now that we have prepared our training and test datasets, we can use the logistic regression algorithm provided by MLlib to fit the training dataframe.
- We first create a logistic regression object and specify which columns hold the labels and the features.
- Next, we call the `fit` function to train the model on the training dataset.
- In the last step of the snippet below, we call the `transform` function to apply the trained model to our test dataset.
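```python
from pyspark.ml.classification import LogisticRegression

# trainVec and testVec are the training and test dataframes prepared earlier,
# each with a 'features' vector column and a 'label' column

# specify the columns for the model
lr = LogisticRegression(featuresCol='features', labelCol='label')

# fit on training data
model = lr.fit(trainVec)

# predict on test data
predDF = model.transform(testVec)
```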
Results
The resulting dataframe now has a `probability` column, as shown in the table below. This column is a ...
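To make the new columns concrete: for a binary model, Spark ML's logistic regression computes a margin m = w·x + b from the learned `coefficients` and `intercept`, stores `rawPrediction` as [-m, m], stores `probability` as [1 - σ(m), σ(m)] where σ is the sigmoid, and sets `prediction` to 1.0 when σ(m) is above the `threshold` parameter (0.5 by default). The plain-Python sketch below reproduces that arithmetic for a single row; `lr_outputs` is a hypothetical helper, and the weights are illustrative values, not ones learned in this lesson.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lr_outputs(coefficients, intercept, features, threshold=0.5):
    """Hypothetical helper mimicking the three columns a binary model appends."""
    # Linear margin w·x + b
    margin = sum(w * x for w, x in zip(coefficients, features)) + intercept
    p1 = sigmoid(margin)
    raw_prediction = [-margin, margin]
    probability = [1.0 - p1, p1]  # [P(label=0), P(label=1)]
    prediction = 1.0 if p1 > threshold else 0.0
    return raw_prediction, probability, prediction


# Illustrative (not learned) parameters for two features
w, b = [1.0, 1.0], -4.0
for row in ([0.5, 1.0], [3.5, 4.0]):
    print(row, lr_outputs(w, b, row))
```

Lowering `threshold` trades precision for recall by labeling more rows as class 1 without changing the probabilities themselves.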