Hash Table

What is a Hash Table?

A hash table or associative array is a key value lookup. In others words, it is a data structure that stores elements in key-value pairs where the key is a unique integer that is used for indexing the values and the value is the data associated with keys.

You might want to use hash tables because they have a time complexity of O(1). Using a hash table is one of the fastest ways to search, insert and delete.

In a hash table, a new index is processed using keys, and the element corresponding to that key is stored in the index. This process is called hashing.

Hashing is a technique or process of mapping keys values into the hash table. The efficiency of mapping depends on the efficiency of the hash function used.

Let a hash function H(x) maps the value x at the index x-1. For example if the list of values is [1,2,3,4,5] it will be stored at positions {0,1,2,3,4} in the Hash table. However, when the hash function generates the same index for multiple keys, there will be a hash collision.

The hash collision can be solved using two techniques, chaining or open addressing.

Hash Collision

Chaining

This solution uses a Doubly Linked List to store these elements in the same index.

Open Addressing

In this solution we store only one element into the same slot. There are different techniques like Linear Probing, Quadratic Probing and Double hashing.

Linear Probing and Quadratic Probing

In these collision cases, we resolve by checking the next slot. Linear polling increments the value linearly and Quadratic increments the value with an arbitrary quadratic polynomial.

Double hashing

In this case of collision we apply another hash function or the same with the differents parameters.

Implement a Hash Table

As always, our data structure will use a struct to group variables under the same data. This will be our Node with a key and value.

struct node 
{
	int key;
	int value;
};

Node * createNode(int key, int data){
    Node* node = (Node *)malloc(sizeof(Node));
 
    node->key = key;
    node->data = data;
    return node;
    // you can call it as Node * node = createNode(int, int);
}

We are going to build our hash table using a list of Nodes, to accomplish that we will need to store its size and capacity. Also, we are going to use a constant value CAPACITY, that we will be using on our entire code.

#define CAPACITY 10

typedef struct hash_table
{
  int capacity;
  int size;
  struct node **array;
}Hash_table;

Hash_table * createHashTable(){
    Hash_table* hash_table = (Hash_table *)malloc(sizeof(Hash_table));

    hash_table->size = 0;
    hash_table->capacity = CAPACITY;
    hash_table->array = calloc((size_t)hash_table->capacity, sizeof(Node*));

    return hash_table;
    // you can call it as Hash_table * hash_table = createHashTable();
}

A simple example of a hash function is dividing the key by the capacity.

int hashing(int key){
	return (key % CAPACITY);
}

Now, we can implement the three main functions in a hash table. Push, pop and search.

First, lets populate it. We need to use the hash function here, to collect the index to the array in the hash table.

Hash_table * push(Hash_table * hash_table, Node * node){
    int index = hashing(node->key) + (hashing(node->key) * attempt);

    hash_table->array[index] = node;
    hash_table->size++;

    return hash_table;
}

This way could work, but in case of collision it is not that effective. When do we know that happened a collision? Just check the position, if it is not null, it is not empty then it is a collision.

In cases of collision this example will use Double hashing. Create a variable attempt to store how many attempts we have made and help with the double hashing. With this we can control how many times it will recursivelly call push, and stop when it has reached the capacity of the hash table.

int attempt = 0;

Hash_table * push(Hash_table * hash_table, Node * node){
    int index = hashing(node->key) + (hashing(node->key) * attempt);

    Node * at_pos = hash_table->array[index];

    if (at_pos == NULL){
        hash_table->array[index] = node;
        hash_table->size++;
        attempt = 0;
    } else if (hash_table->capacity == attempt){
        printf("There isnt space left\n");
    } else{
        attempt++;
        hash_table = push(hash_table, node);
    }
    return hash_table;
}

For the pop, we use a loop to verify if the contest of the position is the same as in the Node. The hash function and attempt variable are also used here, to make the search for the Node quicker.

void pop(Hash_table * hash_table, Node * node) {
    int index = hashing(node->key) + (hashing(node->key) * attempt);
    Node * closestNode = hash_table->array[index];
    while (closestNode != NULL) {
        if (closestNode->key == node->key &&  closestNode->data == node->data) {
            free(closestNode);
            hash_table->array[index] = NULL;
            attempt = 0;
            break;
        }
        attempt++;
        index = hashing(node->key) + (hashing(node->key) * attempt);
        closestNode = hash_table->array[index];
    } 
    hash_table->size--;
}

The search follows the same logic as in the pop, we use a loop to search for the Node, and use the hashing function and attempt variable to make it quicker.

Node * search(Hash_table * hash_table, int key) {
    int index = hashing(key) + (hashing(key) * attempt);
    Node * closestNode = hash_table->array[index];
    while (closestNode != NULL) {
        if (closestNode->key == key) {
            attempt = 0;
            return closestNode;
        }
        attempt++;
        index = hashing(key) + (hashing(key) * attempt);
        closestNode = hash_table->array[index];
    }
}

The Big-O notation

Data Structure Time Complexity Space Complexity
Average Worst Worst
Access Search Insertion Deletion Access Search Insertion Deletion
Hash Table N/AO(1)O(1)O(1)N/A O(n) O(n) O(n) O(n)

Code

/*
 * Giovanna Borges Bottino 
 * giovannabbottino@gmail.com
 */

#include <stdio.h>
#include <stdlib.h>

#define CAPACITY 10

typedef struct node
{
  int key;
  int data;
}Node;

typedef struct hash_table
{
  int capacity;
  int size;
  struct node **array;
}Hash_table;

int attempt = 0;

int hashing(int key);
Node * createNode(int key, int data);
Hash_table * createHashTable();
Hash_table * push(Hash_table * hash_table, Node * node);
void pop(Hash_table * hash_table, Node * node);
Node * search(Hash_table * hash_table, int key);

int main(){

    Hash_table * hash_table = createHashTable();
    Node * node = createNode(1, 1);

    hash_table = push(hash_table, node);

    Node * searchNode = search(hash_table, node->key);

    printf("Key: %d\tData: %d\n", searchNode->key, searchNode->data);

    pop(hash_table, node);

    return 0;
}

int hashing(int key){
	return (key % CAPACITY);
}

Node * createNode(int key, int data){
    Node* node = (Node *)malloc(sizeof(Node));
 
    node->key = key;
    node->data = data;
    return node;

}

Hash_table * createHashTable(){
    Hash_table* hash_table = (Hash_table *)malloc(sizeof(Hash_table));

    hash_table->size = 0;
    hash_table->capacity = CAPACITY;
    hash_table->array = calloc((size_t)hash_table->capacity, sizeof(Node*));

    return hash_table;
}

Hash_table * push(Hash_table * hash_table, Node * node){
    int index = hashing(node->key) + (hashing(node->key) * attempt);

    Node * at_pos = hash_table->array[index];

    if (at_pos == NULL){

        hash_table->array[index] = node;
        hash_table->size++;
        attempt = 0;
    } else if (hash_table->capacity == attempt){
        printf("There isnt space left\n");
    } else{
        attempt++;
        hash_table = push(hash_table, node);
    }
    return hash_table;
}

void pop(Hash_table * hash_table, Node * node) {
    int index = hashing(node->key) + (hashing(node->key) * attempt);
    Node * closestNode = hash_table->array[index];
    while (closestNode != NULL) {
        if (closestNode->key == node->key &&  closestNode->data == node->data) {
            free(closestNode);
            hash_table->array[index] = NULL;
            attempt = 0;
            break;
        }
        attempt++;
        index = hashing(node->key) + (hashing(node->key) * attempt);
        closestNode = hash_table->array[index];
    } 
    hash_table->size--;
}

Node * search(Hash_table * hash_table, int key) {
    int index = hashing(key) + (hashing(key) * attempt);
    Node * closestNode = hash_table->array[index];
    while (closestNode != NULL) {
        if (closestNode->key == key) {
            attempt = 0;
            return closestNode;
        }
        attempt++;
        index = hashing(key) + (hashing(key) * attempt);
        closestNode = hash_table->array[index];
    }
}

References

[1] PROGRAMIZ. Hash Table. Programiz. Available at: https://www.programiz.com/dsa/hash-table. Accessed: January 09, 2022.