Quiz: Predicting Diabetes Using PySpark MLlib
Learn how to build a predictive model to detect diabetes using PySpark MLlib.
Task 1: Load the diabetes prediction data into a PySpark DataFrame
To begin, create a SparkSession as you learned previously. Use it to load the data into a PySpark DataFrame, display the first few rows, and check the column types.
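# Import libraries
from pyspark.sql import SparkSession

# Create a SparkSession (the app name is arbitrary)
spark = SparkSession.builder.appName("DiabetesPrediction").getOrCreate()

# Read "diabetes.csv", which is available in your path, into a variable called 'diabetes_df'.
# header=True uses the first row as column names; inferSchema=True detects numeric types.
diabetes_df = spark.read.csv("diabetes.csv", header=True, inferSchema=True)

# Display the initial rows of the DataFrame
diabetes_df.show(5)

# Check the column types of diabetes_df
diabetes_df.printSchema()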
Task 2: Data preprocessing and EDA
In the data preprocessing task, we’ll ...