Introduction to DataFrames

Explore the basics of the DataFrame data structure in pandas.

In pandas, the two-dimensional counterpart to the one-dimensional Series is the DataFrame. If we want to understand this data structure, we should start by looking at how it’s constructed.

Database and spreadsheet analogues

Many tabular data structures are row-oriented. Perhaps this is because spreadsheets and CSV files are processed row by row, or because many OLTP databases are row-oriented by default. If we think of a DataFrame as row-oriented, though, the interface will feel wrong. A DataFrame is typically used for analytical purposes and is better understood as column-oriented, where each column is a Series.
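To make the column-oriented view concrete, here is a minimal sketch that builds a DataFrame from two Series and pulls one column back out. The column labels and values are made up for illustration and do not come from the lesson.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
names = pd.Series(["Ada", "Grace", "Linus"])
ages = pd.Series([36, 45, 28])

# Build a DataFrame from a dict mapping column labels to Series.
df = pd.DataFrame({"name": names, "age": ages})

# Selecting a column gives back a Series.
age_column = df["age"]
print(type(age_column).__name__)   # Series
print(age_column.tolist())         # [36, 45, 28]

# Every column shares the DataFrame's row index.
print(age_column.index.equals(df.index))  # True
```

Passing a dict of Series is only one of several ways to construct a DataFrame; it is used here because it mirrors the "each column is a Series" idea directly.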

Note: In practice, many highly optimized analytical databases (those used for OLAP cubes) are also column-oriented. Laying out the data in columns can improve performance and reduce resource usage. A column holds values of a single type, so it compresses well. Analyzing a column requires loading only that column, whereas a row-oriented database would have to read every row of the table to assemble that column.
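The same idea can be seen in a small pandas sketch: each column has one type, and an aggregate works on a single column without touching the others. The inventory data below is invented for illustration, and the example shows only the in-memory DataFrame interface, not how a columnar database stores data on disk.

```python
import pandas as pd

# Made-up inventory data, used only for illustration.
df = pd.DataFrame({
    "product": ["pen", "desk", "lamp"],
    "price": [1.5, 120.0, 30.0],
    "in_stock": [True, False, True],
})

# Each column holds values of a single dtype.
print(df.dtypes)

# An aggregate over one column only needs that column's values.
average_price = df["price"].mean()
print(average_price)  # 50.5

# A single row cuts across all columns, so it mixes types.
first_row = df.iloc[0]
print(first_row)
```

Column-wise operations such as `mean()` are the natural fit for this layout, while row access has to gather one value from every column.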
