When we load a file into a pandas DataFrame object, we may find that it consumes more memory than we expected. There are two reasons for this: by default, every column in the file is loaded, and int64 is the default type for integer fields.

The first method is to load only some fields, not all of them. For example, suppose we load a CSV file with read_csv. By default, it loads all fields. However, read_csv allows you to pass a list of column names to usecols, so that only the columns in this list are loaded.
```python
import numpy as np
import pandas as pd
import os

# Create a dataset with 20000 rows and 10 columns,
# and assign names to these 10 columns.
d = np.random.randint(0, 20, size=(20000, 10))
df = pd.DataFrame(d, columns=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"])

# Export this dataset to a CSV file with sep='\t' and without the index.
os.makedirs("output", exist_ok=True)
df.to_csv("output/raw.csv", sep='\t', index=False)

# First, load all columns from this file and
# print the information of the resulting DataFrame object.
full_df = pd.read_csv("output/raw.csv", sep='\t')
full_df.info()

print("----------------------------------------------------------------")

# Then, load this file again, but with only 3 fields, and
# print the information of this DataFrame object as well.
less_df = pd.read_csv("output/raw.csv", sep='\t', usecols=["a", "b", "c"])
less_df.info()

os.remove("output/raw.csv")
```
As you can see from the output of this code widget:

- The memory usage of the first DataFrame object (the output of the first info() call) is 1.5MB.
- The memory usage of the second DataFrame object (the output of the second info() call) is about 468KB, roughly a third of the original.
Notice: Because of the limitations of this site, I can't create a very large dataset here. However, as the output of the last example shows, loading only the columns you need can reduce memory usage greatly if your dataset is huge.
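Before deciding which columns to drop, it helps to see where the memory actually goes. The memory_usage method reports the bytes consumed by each column. Here is a minimal sketch, assuming a raw.csv like the one written above (the path is a placeholder, and nrows can be used to inspect just a sample of rows):

```python
import pandas as pd

# Load the file (or just a sample of it via nrows) to inspect
# how much memory each column consumes.
df = pd.read_csv("output/raw.csv", sep='\t')  # placeholder path

# memory_usage returns a Series with one entry per column (plus the
# index), measured in bytes. deep=True also counts the Python objects
# behind object-typed (e.g. string) columns.
print(df.memory_usage(deep=True))

# Total memory usage in megabytes.
print(df.memory_usage(deep=True).sum() / 1024 ** 2, "MB")
```

Columns that dominate this list, but that your analysis doesn't need, are the first candidates to leave out of usecols.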
The second method is to specify the type of each column. read_csv allows you to pass a dict (the key is the column name, the value is the type) to dtype. In this example, every value is between 0 and 20, but the default type is int64, which occupies 8 bytes per value; uint8, which occupies a single byte, is more than enough for this range.
```python
import numpy as np
import pandas as pd
import os

# Create a dataset with 20000 rows and 10 columns,
# and assign names to these 10 columns.
d = np.random.randint(0, 20, size=(20000, 10))
df = pd.DataFrame(d, columns=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"])

# Export this dataset to a CSV file with sep='\t' and without the index.
os.makedirs("output", exist_ok=True)
df.to_csv("output/raw.csv", sep='\t', index=False)

# First, load all columns from this file and
# print the information of the resulting DataFrame object.
full_df = pd.read_csv("output/raw.csv", sep='\t')
full_df.info()

print("----------------------------------------------------------------")

# Specify the data type for each column.
# The key is the column name, the value is the data type.
dtype = {"a": 'uint8',
         "b": 'uint8',
         "c": 'uint8',
         "d": 'uint8',
         "e": 'uint8',
         "f": 'uint8',
         "g": 'uint8',
         "h": 'uint8',
         "i": 'uint8',
         "j": 'uint8'}

# Then, load this file again, but specify the data type for each column,
# and print the information of this DataFrame object as well.
less_df = pd.read_csv("output/raw.csv", sep='\t', dtype=dtype)
less_df.info()

os.remove("output/raw.csv")
```
As you can see from the output of this code widget:

- The memory usage of the first DataFrame object (the output of the first info() call) is 1.5MB.
- The memory usage of the second DataFrame object (the output of the second info() call) is about 195KB, roughly an eighth of the original, since each value now occupies 1 byte instead of 8.
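The dtype argument only helps at load time. If the DataFrame is already in memory, a similar saving is possible after the fact. Below is a minimal sketch using pd.to_numeric with downcast='unsigned', which picks the smallest unsigned integer type that can hold each column's values; the column names and value range are just the ones from the example above:

```python
import numpy as np
import pandas as pd

# Build a small frame whose values easily fit into uint8,
# but which pandas stores as int64 by default.
df = pd.DataFrame(np.random.randint(0, 20, size=(20000, 3)),
                  columns=["a", "b", "c"])
print(df.memory_usage().sum(), "bytes before")

# Downcast every column to the smallest unsigned integer type
# that can represent its values (uint8 here, since 0 <= v < 20;
# np.iinfo('uint8') shows the representable range, 0..255).
for col in df.columns:
    df[col] = pd.to_numeric(df[col], downcast='unsigned')

print(df.memory_usage().sum(), "bytes after")
print(df.dtypes)
```

This is handy when the data comes from a source other than read_csv, or when you don't know the value ranges until after the data has been loaded.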