Concatenation

pandas provides some useful utilities to combine data from multiple files. Let's see how to use them.

We'll cover the following...

- append() with same column
- append() with different column
- concat with outer join
- concat with inner join
- Comparison with SQL join

`append()` with same column

In real life, data comes from different sources, so combining it into a single DataFrame is very useful.

Let’s begin with a very simple scenario.

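There are two DataFrames with the same column names, and your job is to combine them into one. Before trying it in pandas, do you remember how to do this with a native Python list? We would use append() to add items to the list. In pandas, the method is also called append(). It concatenates the two DataFrame objects along axis=0, that is, along the index: the rows of the second DataFrame are stacked below the rows of the first. Unlike list.append(), it returns a new DataFrame and leaves the original unchanged. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, pd.concat() does the same job.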

Notice:

- In the example, append() is given a single DataFrame. However, multiple DataFrames can be passed as a list.
- The resulting index can contain duplicates, because each DataFrame keeps its original index labels.
- Pass ignore_index=True if the index is not meaningful. The rows of the result are then relabeled 0, 1, ..., n-1.

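import pandas as pd

# Note: DataFrame.append() was removed in pandas 2.0, so this example needs pandas < 2.0.
# Two DataFrames with the same column names
d1 = {"a": [1, 2], "b": [2, 4]}
df1 = pd.DataFrame(d1)
print("The first DataFrame")
print(df1)
print("------------------------")
d2 = {"a": [3, 4], "b": [6, 8]}
df2 = pd.DataFrame(d2)
print("The second DataFrame")
print(df2)
print("------------------------")
# append() returns a new DataFrame; the original index labels (0, 1, 0, 1) are kept
appended = df1.append(df2)
print("Append the second DataFrame to the first one")
print(appended)
print("------------------------")
# ignore_index=True relabels the rows 0, 1, 2, 3
appended_reset = df1.append(df2, ignore_index=True)
print("Append the second DataFrame to the first one and set ignore_index=True")
print(appended_reset)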
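
On pandas 2.0 and later, where DataFrame.append() is no longer available, the same row-wise combination can be sketched with pd.concat(). This is a minimal illustration rather than the lesson's original example; the third DataFrame, df3, and the variable names stacked and stacked_reset are made up here to show that a list of several DataFrames can be passed at once.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [2, 4]})
df2 = pd.DataFrame({"a": [3, 4], "b": [6, 8]})
df3 = pd.DataFrame({"a": [5], "b": [10]})  # hypothetical third DataFrame

# Stack the rows of all three DataFrames; each keeps its own index labels,
# so the resulting index contains duplicates.
stacked = pd.concat([df1, df2, df3])
print(stacked)

# ignore_index=True discards the old labels and renumbers the rows from 0.
stacked_reset = pd.concat([df1, df2, df3], ignore_index=True)
print(stacked_reset)
```

The first result keeps the index labels 0, 1, 0, 1, 0, while the second is relabeled 0 through 4.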

`concat` with outer join

concat() is a more powerful function for joining DataFrames. It can combine DataFrame objects along either rows or columns, unlike append(), which works only along the rows.

Its default behavior (axis=0) is to stack DataFrames along the rows, just like append(). Pass axis=1 to join DataFrames side by side, along the columns.
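
The effect of the axis argument can be seen in a small sketch. The DataFrames and variable names below are made up for illustration: df_ab1 and df_ab2 share the columns a and b, while df_cd has different columns but the same default index.

```python
import pandas as pd

df_ab1 = pd.DataFrame({"a": [1, 2], "b": [2, 4]})
df_ab2 = pd.DataFrame({"a": [3, 4], "b": [6, 8]})
df_cd = pd.DataFrame({"c": [5, 6], "d": [7, 8]})

# Default axis=0: rows of df_ab2 are stacked below the rows of df_ab1.
by_rows = pd.concat([df_ab1, df_ab2])
print(by_rows)

# axis=1: columns of df_cd are placed to the right of df_ab1,
# with rows aligned on the index labels 0 and 1.
by_columns = pd.concat([df_ab1, df_cd], axis=1)
print(by_columns)
```

Stacking along the rows gives a 4 x 2 result, while joining along the columns gives a 2 x 4 result.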

join is a parameter that specifies how to handle the index labels on the other axis: "outer" (the default) takes the union of the labels, and "inner" takes the intersection.

Consider an outer join along axis=1 of two DataFrames. The first DataFrame has the index [1, 2] and the second has the index [2, 3], so they share one index label, 2. The rows with index 2 in both DataFrames are joined into a single row. The other rows are kept separate, and their empty cells are filled with NaN. After concat, the result has 3 rows.
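
The scenario just described can be sketched as follows. The column names and values are assumptions chosen for illustration; only the index labels [1, 2] and [2, 3] come from the text. An inner join on the same data is included for contrast.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]}, index=[1, 2])
right = pd.DataFrame({"b": [3, 4]}, index=[2, 3])

# Outer join: keep the union of the index labels (1, 2, 3).
# Cells with no matching row are filled with NaN.
outer = pd.concat([left, right], axis=1, join="outer")
print(outer)

# Inner join: keep only the labels present in both DataFrames (just 2).
inner = pd.concat([left, right], axis=1, join="inner")
print(inner)
```

The outer result has three rows, with NaN in column b for index 1 and in column a for index 3. The inner result has a single row, for index 2.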

Comparison with SQL join

If you are familiar with SQL, you can check the comparison table below.

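Join type	SQL	Supported by concat(join=...)
inner	INNER JOIN	yes
outer	FULL OUTER JOIN	yes
right	RIGHT OUTER JOIN	no (use merge() or DataFrame.join())
left	LEFT OUTER JOIN	no (use merge() or DataFrame.join())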
