How to implement linear hashing in Python

Hashing addresses the need to quickly locate or store an item in a collection. Hashing is a method for increasing productivity by effectively filtering the search.

What is hashing?

The process of translating keys and values into a hash table using a hash function is known as hashing. We use hashing for quicker access to elements.

The creation of hash tables is the most common application of hashing. It involves turning a string of characters into a key or a fixed-length value, which typically reflects the longer string but is shorter. Since it takes less time to discover an item using the shorter hashed key than it does to find it using the original value, databases employ hashing to index and retrieve data.

Hashing
Hashing

What is hashing used for?

  • Obtaining data: When looking for objects on an object data map, a hash may be utilized to focus our search.

    For instance, developers store data in the form of key and value pairs in hash tables, which may include customer records. The hash code or the integer is then translated to a predetermined size, and the key serves as an input to the hashing method to help identify the contents. The supported function by hash tables includes: insert (key, value) (key, value), get (key) (key), and delete (key) (key).

  • Electronic signatures: Hashing aids in the encryption and decryption of digital signatures need to authenticate message senders and recipients in addition to enabling quick data retrieval. In this case, the digital signature is changed by a hash function before the hashed value, and the signature is transmitted separately to the recipient.

What is linear hashing?

A disk-based index structure called linear hashing, dynamically updates, grows, or reduces one bucket at a time (or you can call it a dynamic hashing scheme). The index is used to locate the record that corresponds to a certain key or to make exact match searches easier.

The method is called linear hashing because the number of buckets grows or shrinks linearly. The maximum number of hashing functions that the system can use at once varies dynamically.

The distinction between linear hashing and other hashing

  • There is no required directory in linear hashing.

  • It is capable of handling long overflow chains.

  • It is more flexible with respect to the timing of buckets splits

  • It allows us to grow one slot at a time.

Linear hashing terminology

Terminologies in Linear Hashing

Terminology

Description

h0, h1,h2

A family of hash functions, where each funtion's range it twice of its predecessor

N

Initial number of buckets

d0

The number of bit to represent N

Level

Iindicate the number of spilt cycle completed, initially 0

Next

Pointer to the next bucket inline to be split

Linear hashing math properties

Linear Hashing Math Properties

Properties


N

= 2d0, for some d0

di

d0 + i

hi(Key)

h(key) mod(2iN)

As mentioned earlier, linear hashing is flexible with respect to the timing of the bucket split, and it allows us to choose from two splitting criteria.

  • Load factor: it is calculated by diving the number of entries by the number of buckets multiplied by bucket capacity. We can set the split trigger to any load factor value we want, giving us greater control over the space utilization of the hashing table.
    Load factor = 4/3*3 = 40%

  • Overflow: it is very simple to understand, once we detect an overflow after an incursion, we trigger the split.

Linear hashing implementation

Let's walk through an example to see how linear hashing works:

Linear Hashing Implementation

N

= 4

Level

= 0

Split by Overflow


Next

= 0 (the first bucket)

Insert 37

Since we are at level 0, size level = 0, h0(37) = 100101, di = d0 = 2, then we only need to look at the last 2 bits of H(37) = 01. The table will now look like this:

Inserting 37

h0





00

32

(100000)

44

(101100)

36

(100100)


01

9

(1001)

25

(11001)

5

(0101)

37

(100101)

10

14

(1110)

18

(10010)

10

(1010)

30

(11110)

11

31

(11111)

35

(100011)

7

(0111)

11

(1011)

Insert 43

We are still at level 0 therefore, the size level = 0, h0(43) = 101011, di - d0 = 2. We only need to look at the last 2 bits of H(43) = 11. We create a new column and insert an overflow of 43(101011), since the hash table to set to trigger on overflow is now set as split, the table looks like this now:

Inserting 43

h0





Overflow

00

32

(100000)

44

(101100)

36

(100100)



01

9

(1001)

25

(11001)

5

(0101)

37

(100101)


10

14

(1110)

18

(10010)

10

(1010)

30

(11110)


11

31

(11111)

35

(100011)

7

(0111)

11

(1011)

43

(101011)

Let's demonstrate this by splitting:

Splitting

N

= 4

Level

= 0

Split by Overflow


Next

= 1

Split is done on the bucket (00) where the pointer points to not the overflow bucket. So we split the next bucket and increment the pointer to the next bucket in the circle.

Now we need to relocate entries originally in the bucket(00) with respect to the new hlevel+1 and we need to look at the last 3 bits of each entry.

Splitting table

h1

h0





Overflow


00

32

(100000)

44

(101100)

36

(100100)




01

9

(1001)

25

(11001)

5

(0101)

37

(100101)



10

14

(1110)

18

(10010)

10

(1010)

30

(11110)



11

31

(11111)

35

(100011)

7

(0111)

11

(1011)

43

(101011)

100

00

44

(101100)

36

(100100)




Code example

Let's see how it works with an executable code:

#list definition
init_list = [120 , 111 , 80 , 260 , 118 , 110 , 100 , 97 , 85]
HashValues = []
#define a function to accept the data from the list
def linear_hash_function(init_list):
#another list with a none datatype
second_list = [None for i in range(10)]
for i in init_list:
#let's append the values from the second list to the HashTable
HashValues.append(i % len(second_list))
second_list[i % len(second_list)] = i
print(second_list)
print(init_list)
print(linear_hash_function)
print(linear_hash_function(init_list))

Code explanation

  • Line 2: We define a list with the name init_list and add nine values to the list.

  • Line 5: Another list HashValues is defined so as to save the final values after the linear hashing is done.

  • Line 8: We define a function with init_list as the parameter and went ahead to define a local list within the function name, which has a None datatype because we want to handle null datatypes in the list within the range of 10.

  • Line 11: We create a for loop. We append the values from the second_list to the HashTable. Then for all values (i) in second_list find the modulo and set it equal to i.

  • Line 23: Then, we print the function linear_hash_function(init_list) .

The output:

  • When we print the output, we get a None output because it contains null values or no values at all.

Note: None is not the same as an empty string, 0 or False, it is a datatype of its own (NoneType).

Conclusion

We learned about linear hashing in this Answer. We talked about what hashing is, what linear hashing itself means, how it is different from other hashing techniques, terminologies, and its implementation.

Copyright ©2024 Educative, Inc. All rights reserved