How to drop multiple columns from a PySpark DataFrame

Overview

The drop() method in PySpark removes one or more columns from a DataFrame.

Syntax

dataframe.drop(*cols)

Parameters

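cols - The names of the columns to remove, passed as separate arguments. A list of names must be unpacked with *. Names that do not exist in the DataFrame are ignored.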
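
Because drop() is a no-op for names that are not in the schema, a misspelled column name goes unnoticed. The following is a minimal sketch of one way to catch that before dropping. Both find_missing_columns and drop_strict are hypothetical helpers, not part of PySpark, and the sketch assumes only that df is a PySpark DataFrame exposing its column names through df.columns.

```python
def find_missing_columns(existing_columns, cols_to_remove):
    """Return the requested names that are not in existing_columns, in order."""
    existing = set(existing_columns)
    return [name for name in cols_to_remove if name not in existing]


def drop_strict(df, cols_to_remove):
    """Drop columns, raising an error instead of silently ignoring unknown names.

    Hypothetical wrapper: df is assumed to be a PySpark DataFrame.
    """
    missing = find_missing_columns(df.columns, cols_to_remove)
    if missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")
    # drop() takes names as separate arguments, so the list is unpacked with *.
    return df.drop(*cols_to_remove)
```

The check here is an exact, case-sensitive match. Spark resolves column names case-insensitively by default, so this helper is stricter than drop() itself.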

Return value

The method returns a new DataFrame without the specified columns. The original DataFrame is not modified.

Example

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('edpresso').getOrCreate()
data = [("James", "Smith", "USA", "CA"),
        ("Michael", "Rose", "USA", "NY"),
        ("Robert", "Williams", "USA", "CA"),
        ("Maria", "Jones", "USA", "FL")
       ]
columns = ["firstname", "lastname", "country", "state"]
df = spark.createDataFrame(data=data, schema=columns)
print("Initial dataframe")
df.show(truncate=False)
cols_to_remove = ["country", "firstname"]
new_df = df.drop(*cols_to_remove)
print("-" * 8)
print("Dataframe after removing the columns")
new_df.show(truncate=False)

Explanation

Lines 1–2: We import pyspark and SparkSession.
Line 3: A Spark session is created with the app name edpresso.
Lines 4–8: We define the data for the DataFrame.
Line 9: The column names of the DataFrame are defined.
Line 10: A DataFrame is created using the createDataFrame() method.
Lines 11–12: The initial DataFrame is printed.
Line 13: The columns to be removed are listed in cols_to_remove.
Line 14: The columns are dropped by calling the drop() method with cols_to_remove unpacked into separate arguments.
Lines 15–17: A separator is printed, followed by the new DataFrame, which contains only the lastname and state columns.
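
The columns that survive the drop can be worked out in plain Python, which is handy for checking the result or for expressing the same operation with select(). This is a sketch only: columns_to_keep is a hypothetical helper, not a PySpark function, and it uses the column names from the example above.

```python
def columns_to_keep(all_columns, cols_to_remove):
    """Return the columns left after removing cols_to_remove, in original order."""
    removed = set(cols_to_remove)
    return [name for name in all_columns if name not in removed]


columns = ["firstname", "lastname", "country", "state"]
cols_to_remove = ["country", "firstname"]

remaining = columns_to_keep(columns, cols_to_remove)
print(remaining)  # ['lastname', 'state']
```

With the DataFrame from the example, df.select(*columns_to_keep(df.columns, cols_to_remove)) should produce the same columns as df.drop(*cols_to_remove).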
