What is a hash join in SQL?

What is join in SQL?

SQL join statement or command is often used to fetch data from multiple tables. SQL joins are used to join a row of data from multiple tables on the basis of a common column (field) between them. Multiple types of joins are used in the SQL listed below.

  • SQL inner join

  • SQL outer join

  • SQL left join

  • SQL right join

  • SQL semi join

  • SQL anti join

Hash Join

The name Hash join comes from the hash function(). This hash join is useful for middle to large inputs, but it is not efficient for every small set. Hash join requires at least one equi join(=), and it supports all joins (left/ right semi/ anti join). Hash join is the only physical operator that needs memory. Hash join consists of 2 phases.

  1. Building or blocking phase

  2. Probe or non-blocking phase

How it works

The Hash join is one of the three available joins for joining two tables. However, it is not only about joining. Hash join is used to find the matching in two tables with a hash table, several joins are available, like nested loop join, but the hash join is more efficient than the nested loop join.

Build phase

In the first phase, sever creates a hash table in the memory. In this hash table, rows of the input will be stored using join. Hash join attributes are used as hash table keys. This build is called build input. Let’s assume countries is designated as the build input. The hash join condition is countries.country_id, which belongs to the build input. It will be used as the key in the hash table. Once all the rows are stored in the hash table, the build phase is completed.

Build Phase of Hash Join Query with Database
Build Phase of Hash Join Query with Database

Probe phase

During the probe phase, the server reads rows from the probe input (persons in our illustration). For individual rows, the server probes the hash table for comparing rows using the value from persons.country_id as the lookup key. For the particular match, a joined row is sent to the client. In the end, the server scanned each input only once, using constant time lookup to find matching rows between the two inputs.

Code

Let’s see the working of hash join with the help of the following query as an example:

SELECT
person_name, country_name
FROM
persons JOIN countries ON persons.country_id = countries.country_id;

Code explanation

  • Line 1: The SELECT clause is used to select the columns from the table we want to display.

  • Line 2: The person_name and country_name are the column names in the tables.

  • Line 3: The FROM clause is used to mention the required table name.

  • Line 4: The JOIN clause is used to join two related connected tables on a specific matching condition for both tables.

Free Resources

Copyright ©2024 Educative, Inc. All rights reserved