Algorithm

Hashing Algorithm

A hash function takes any input and produces a fixed-size number. Same input, same output. Always.

21 Mar 2024

Hashing Algorithm

A hash function takes any input and produces a fixed-size number. Same input, same output. Always.

That's it. That's hashing. But from this simple idea, you get hash maps, password storage, data integrity checks, and distributed caching.

What Makes a Good Hash Function

Three properties matter:

  1. Deterministic — same key always gives the same hash.
  2. Uniform distribution — hashes spread evenly across the output space. Clusters cause collisions.
  3. Fast — hashing should be cheaper than the operations it enables.

A Simple Hash Function

Javascript
function simpleHash(key, tableSize) {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash += key.charCodeAt(i);
  }
  return hash % tableSize;
}

console.log(simpleHash("hello", 10)); // 2
console.log(simpleHash("world", 10)); // 2 — collision!

This sums character codes and takes the modulo. It works, but it's a bad hash function. "hello" and "olleh" produce the same hash. Anagrams always collide.

A Better Hash Function

Weight each character by its position.

Javascript
function betterHash(key, tableSize) {
  let hash = 0;
  const prime = 31;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * prime + key.charCodeAt(i)) % tableSize;
  }
  return hash;
}

console.log(betterHash("hello", 10)); // 2
console.log(betterHash("olleh", 10)); // 8 — different!

The prime multiplier makes the hash sensitive to character order. This dramatically reduces collisions.

Handling Collisions

No hash function eliminates collisions entirely. Two strategies:

  • Chaining — each bucket holds a list. Collisions just add to the list. Simple. Works well with a good hash function.
  • Open addressing — if a slot is taken, probe the next slot (linear probing, quadratic probing, or double hashing). Better cache performance but more complex.

Complexity

  • Hash computation: O(k) where k is the key length.
  • Lookup with good hash: O(1) average.
  • Lookup with bad hash: O(n) worst case (everything in one bucket).

The Trade-off

Hashing gives you O(1) lookups — but only on average. The worst case is O(n). A good hash function and proper table sizing keep you in O(1) territory. A bad hash function turns your hash map into a linked list.

Hashing also doesn't preserve order. If you need sorted data or range queries, a tree structure is a better choice.