Hashing addresses the need to quickly locate or store an item in a collection. Hashing is a method for increasing productivity by effectively filtering the search.
The process of translating keys and values into a hash table using a hash function is known as hashing. We use hashing for quicker access to elements.
The creation of hash tables is the most common application of hashing. It involves turning a string of characters into a key or a fixed-length value, which typically reflects the longer string but is shorter. Since it takes less time to discover an item using the shorter hashed key than it does to find it using the original value, databases employ hashing to index and retrieve data.
Obtaining data: When looking for objects on an object data map, a hash may be utilized to focus our search.
For instance, developers store data in the form of key and value pairs in hash tables, which may include customer records. The hash code or the integer is then translated to a predetermined size, and the key serves as an input to the hashing method to help identify the contents. The supported function by hash tables includes: insert (key, value) (key, value), get (key) (key), and delete (key) (key).
Electronic signatures: Hashing aids in the encryption and decryption of digital signatures need to authenticate message senders and recipients in addition to enabling quick data retrieval. In this case, the digital signature is changed by a hash function before the hashed value, and the signature is transmitted separately to the recipient.
A disk-based index structure called linear hashing, dynamically updates, grows, or reduces one bucket at a time (or you can call it a dynamic hashing scheme). The index is used to locate the record that corresponds to a certain key or to make exact match searches easier.
The method is called linear hashing because the number of buckets grows or shrinks linearly. The maximum number of hashing functions that the system can use at once varies dynamically.
There is no required directory in linear hashing.
It is capable of handling long overflow chains.
It is more flexible with respect to the timing of buckets splits
It allows us to grow one slot at a time.
Terminology | Description |
h0, h1,h2 | A family of hash functions, where each funtion's range it twice of its predecessor |
N | Initial number of buckets |
d0 | The number of bit to represent N |
Level | Iindicate the number of spilt cycle completed, initially 0 |
Next | Pointer to the next bucket inline to be split |
Properties | |
N | = 2d0, for some d0 |
di | d0 + i |
hi(Key) | h(key) mod(2iN) |
As mentioned earlier, linear hashing is flexible with respect to the timing of the bucket split, and it allows us to choose from two splitting criteria.
Load factor: it is calculated by diving the number of entries by the number of buckets multiplied by bucket capacity. We can set the split trigger to any load factor value we want, giving us greater control over the space utilization of the hashing table.
Load factor = 4/3*3 = 40%
Overflow: it is very simple to understand, once we detect an overflow after an incursion, we trigger the split.
Let's walk through an example to see how linear hashing works:
N | = 4 |
Level | = 0 |
Split by Overflow | |
Next | = 0 (the first bucket) |
Since we are at level 0, size level = 0, h0(37) = 100101, di = d0 = 2, then we only need to look at the last 2 bits of H(37) = 01. The table will now look like this:
h0 | ||||
00 | 32 (100000) | 44 (101100) | 36 (100100) | |
01 | 9 (1001) | 25 (11001) | 5 (0101) | 37 (100101) |
10 | 14 (1110) | 18 (10010) | 10 (1010) | 30 (11110) |
11 | 31 (11111) | 35 (100011) | 7 (0111) | 11 (1011) |
We are still at level 0 therefore, the size level = 0, h0(43) = 101011, di - d0 = 2. We only need to look at the last 2 bits of H(43) = 11. We create a new column and insert an overflow of 43(101011), since the hash table to set to trigger on overflow is now set as split, the table looks like this now:
h0 | Overflow | ||||
00 | 32 (100000) | 44 (101100) | 36 (100100) | ||
01 | 9 (1001) | 25 (11001) | 5 (0101) | 37 (100101) | |
10 | 14 (1110) | 18 (10010) | 10 (1010) | 30 (11110) | |
11 | 31 (11111) | 35 (100011) | 7 (0111) | 11 (1011) | 43 (101011) |
Let's demonstrate this by splitting:
N | = 4 |
Level | = 0 |
Split by Overflow | |
Next | = 1 |
Split is done on the bucket (00) where the pointer points to not the overflow bucket. So we split the next bucket and increment the pointer to the next bucket in the circle.
Now we need to relocate entries originally in the bucket(00) with respect to the new hlevel+1 and we need to look at the last 3 bits of each entry.
h1 | h0 | Overflow | ||||
00 | 32 (100000) | 44 (101100) | 36 (100100) | |||
01 | 9 (1001) | 25 (11001) | 5 (0101) | 37 (100101) | ||
10 | 14 (1110) | 18 (10010) | 10 (1010) | 30 (11110) | ||
11 | 31 (11111) | 35 (100011) | 7 (0111) | 11 (1011) | 43 (101011) | |
100 | 00 | 44 (101100) | 36 (100100) |
Let's see how it works with an executable code:
#list definitioninit_list = [120 , 111 , 80 , 260 , 118 , 110 , 100 , 97 , 85]HashValues = []#define a function to accept the data from the listdef linear_hash_function(init_list):#another list with a none datatypesecond_list = [None for i in range(10)]for i in init_list:#let's append the values from the second list to the HashTableHashValues.append(i % len(second_list))second_list[i % len(second_list)] = iprint(second_list)print(init_list)print(linear_hash_function)print(linear_hash_function(init_list))
Line 2: We define a list with the name init_list
and add nine values to the list.
Line 5: Another list HashValues
is defined so as to save the final values after the linear hashing is done.
Line 8: We define a function with init_list
as the parameter and went ahead to define a local list within the function name, which has a None datatype because we want to handle null datatypes in the list within the range of 10
.
Line 11: We create a for
loop. We append the values from the second_list
to the HashTable. Then for all values (i
) in second_list
find the modulo and set it equal to i
.
Line 23: Then, we print the function linear_hash_function(init_list)
.
The output:
When we print the output, we get a None
output because it contains null values or no values at all.
Note:
None
is not the same as an empty string, 0 or False, it is a datatype of its own (NoneType).
We learned about linear hashing in this Answer. We talked about what hashing is, what linear hashing itself means, how it is different from other hashing techniques, terminologies, and its implementation.