Cryptographic Hash Functions: A Complete Guide

Cryptographic hash functions are the silent workhorses of modern security. Every time you log in to a website, push code to Git, download a file, or make a Bitcoin transaction, hash functions are working behind the scenes. Yet most developers interact with them daily without understanding what they truly do — or what can go terribly wrong when they're misused.

This guide covers everything: the mathematics, the history, the broken algorithms, the modern standards, and the practical code you need to use hash functions correctly.

1. What Are Cryptographic Hash Functions?

A cryptographic hash function takes an input of any size and produces a fixed-size output, called a digest or hash. For example, SHA-256 always produces exactly 256 bits (64 hexadecimal characters), regardless of whether the input is a single character or an entire movie file.

Core Properties

Deterministic: The same input always produces the same output. SHA-256("hello") will always return 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824.

Fast to compute: Computing a hash should take milliseconds, not seconds. This efficiency is critical for file integrity checks and digital signatures, though it becomes a liability for password hashing (more on that later).

Pre-image resistance (one-way): Given a hash output H, it must be computationally infeasible to find any input m such that hash(m) = H. You cannot reverse-engineer the original data from its hash alone.

Second pre-image resistance: Given an input m1, it must be infeasible to find a different input m2 such that hash(m1) = hash(m2). Even if an attacker knows your original data, they cannot find a different input that produces the same hash.

Collision resistance: It must be infeasible to find any two distinct inputs m1 and m2 such that hash(m1) = hash(m2). This is a stronger requirement than second pre-image resistance.

Avalanche effect: A tiny change in the input, even a single bit flip, completely changes the output. Changing "hello" to "Hello" flips exactly one bit of the first byte (0x68 becomes 0x48), yet the resulting hash has no apparent relationship to the original.

SHA-256("hello") = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
SHA-256("Hello") = 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

These two hashes share no predictable relationship: on average, about half of the 256 output bits change. That's the avalanche effect in action.
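
The avalanche effect can be measured directly. The following is a minimal sketch using Python's standard hashlib; the helper name bit_difference is hypothetical, and the input pair "hello" / "Hello" is chosen for illustration because the two strings differ in exactly one input bit.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 SHA-256 output bits differ between two inputs."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# "hello" and "Hello" differ in exactly one input bit (0x68 vs 0x48).
changed = bit_difference(b"hello", b"Hello")
print(f"{changed} of 256 output bits changed")
```

For a well-designed hash, the count should land near 128, though any single pair can deviate from that average.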

2. A Brief History of Hash Algorithms

MD5 (1991)

Ronald Rivest designed MD5 as an improvement over MD4. It produces a 128-bit digest and was widely adopted through the 1990s for checksums and password storage. For over a decade, MD5 was the default choice for many security applications.

SHA-1 (1995)

The National Security Agency (NSA) designed SHA-1 (Secure Hash Algorithm 1) as part of the Digital Signature Standard. It produces a 160-bit digest. SHA-1 became the dominant hash algorithm for TLS/SSL certificates, code signing, and Git's object storage.

SHA-2 Family (2001)

Also designed by the NSA, SHA-2 is actually a family of six functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. SHA-256 and SHA-512 are the most commonly used. They produce 256-bit and 512-bit digests respectively and remain secure today.

SHA-3 / Keccak (2015)

After SHA-1's weaknesses became apparent, NIST held a public competition (2007–2012) to select a new hash standard built on different principles from the NSA-designed SHA-2. The winner, announced in 2012, was Keccak, designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche; NIST published it as the SHA-3 standard (FIPS 202) in 2015. Unlike SHA-2, which uses a Merkle–Damgård construction, SHA-3 uses a sponge construction. One practical consequence is that SHA-3 is not subject to the length-extension attacks that affect plain SHA-256 and SHA-512.

BLAKE2 (2012)

BLAKE2 is a cryptographic hash function faster than MD5 while providing the security of SHA-3. It was designed by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein as an improvement over BLAKE, which was a finalist in the SHA-3 competition. BLAKE2b is optimized for 64-bit platforms; BLAKE2s for 32-bit.

3. How SHA-256 Works

SHA-256 uses the Merkle–Damgård construction: the message is broken into fixed-size blocks, and a compression function is applied iteratively, feeding the output of one block as input to the next.

Step 1: Padding

The input message is padded so its total length is a multiple of 512 bits. A single 1 bit is appended, followed by zeros, followed by the original message length as a 64-bit big-endian integer.
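
The padding rule can be sketched in a few lines of Python. This is an illustrative helper (the name sha256_pad is hypothetical), working on whole bytes, so the appended 1 bit plus seven zero bits becomes the byte 0x80.

```python
def sha256_pad(message: bytes) -> bytes:
    """Return the message with SHA-256 padding applied (byte-aligned input)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a 1 bit followed by seven 0 bits
    # Zero-fill until 8 bytes remain in the final 64-byte (512-bit) block.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += bit_length.to_bytes(8, "big")  # 64-bit big-endian length
    return padded

padded = sha256_pad(b"abc")
print(len(padded), padded.hex())
```

Note that a 56-byte message no longer fits in one block once the 0x80 byte and the length field are added, so it pads out to two blocks.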

Step 2: Message Schedule

Each 512-bit block is expanded into 64 32-bit words using a schedule that mixes and rotates bits. Words W[0] through W[15] come directly from the message block; words W[16] through W[63] are computed as:

W[i] = σ1(W[i-2]) + W[i-7] + σ0(W[i-15]) + W[i-16]

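Where σ0 and σ1 combine bit rotations and shifts on 32-bit words: σ0(x) = ROTR7(x) ⊕ ROTR18(x) ⊕ SHR3(x) and σ1(x) = ROTR17(x) ⊕ ROTR19(x) ⊕ SHR10(x). All additions are taken modulo 2^32.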
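
The schedule expansion might look like this in Python. It is a sketch rather than an optimized implementation; the function names are hypothetical, and the rotation and shift amounts are those specified for SHA-256 in FIPS 180-4.

```python
MASK = 0xFFFFFFFF  # keep every result within 32 bits

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def message_schedule(block: bytes) -> list[int]:
    """Expand one 64-byte block into the 64 words W[0..63]."""
    if len(block) != 64:
        raise ValueError("SHA-256 blocks are exactly 64 bytes")
    # W[0..15]: the block read as sixteen big-endian 32-bit words.
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    # W[16..63]: each word mixes four earlier words.
    for i in range(16, 64):
        w.append((small_sigma1(w[i - 2]) + w[i - 7]
                  + small_sigma0(w[i - 15]) + w[i - 16]) & MASK)
    return w
```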

Step 3: Compression — 64 Rounds

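SHA-256 keeps an eight-word hash state (H0 through H7), initialized to the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers. For each block, the 8 working variables (a through h) start as a copy of the current state. For each of the 64 rounds, the algorithm applies: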

T1 = h + Σ1(e) + Ch(e,f,g) + K[i] + W[i]
T2 = Σ0(a) + Maj(a,b,c)
h = g; g = f; f = e; e = d + T1
d = c; c = b; b = a; a = T1 + T2

The round constants K[i] are the first 32 bits of the fractional parts of the cube roots of the first 64 primes. These are "nothing-up-my-sleeve" numbers: because anyone can recompute them from a simple public rule, there is little room to suspect that they were chosen to hide a backdoor.
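
Because the constants follow a public rule, they can be re-derived rather than taken on trust. The sketch below does so with exact integer arithmetic to avoid floating-point rounding; the helper names are hypothetical.

```python
import math

def first_primes(count: int) -> list[int]:
    """Return the first `count` primes by trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def integer_cube_root(n: int) -> int:
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# floor(sqrt(p) * 2**32); the low 32 bits are the fractional part.
INITIAL_STATE = [math.isqrt(p << 64) & 0xFFFFFFFF for p in PRIMES[:8]]
# floor(cbrt(p) * 2**32) equals the integer cube root of p * 2**96.
ROUND_CONSTANTS = [integer_cube_root(p << 96) & 0xFFFFFFFF for p in PRIMES]

print(f"H0 = {INITIAL_STATE[0]:08x}, K[0] = {ROUND_CONSTANTS[0]:08x}")
# H0 = 6a09e667, K[0] = 428a2f98
```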

Step 4: Output

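After the 64 rounds, the 8 working variables are added (modulo 2^32) to the hash state the block started with, and the result becomes the state for the next block. This "feed-forward" makes the compression function hard to invert, and chaining the state makes each block's output depend on all previous blocks. Once the last block is processed, the eight 32-bit state words are concatenated to form the final 256-bit digest.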
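
Putting the four steps together, a compact teaching implementation could look like the following. It is a sketch for study only, checked here against Python's hashlib; production code should always use a vetted library. The Σ0, Σ1, Ch, and Maj definitions are the ones given in FIPS 180-4, and all function names are hypothetical.

```python
import hashlib
import math

MASK = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK

def icbrt(n: int) -> int:
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

# First 64 primes (2..311), then the constants derived from their roots.
PRIMES = [p for p in range(2, 312) if all(p % d for d in range(2, math.isqrt(p) + 1))]
H_INIT = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def compress(state: list[int], block: bytes) -> list[int]:
    # Step 2: message schedule.
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((s1 + w[i - 7] + s0 + w[i - 16]) & MASK)
    # Step 3: 64 rounds over the working variables.
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e = g, f, e, (d + t1) & MASK
        d, c, b, a = c, b, a, (t1 + t2) & MASK
    # Feed-forward: add the working variables to the incoming state.
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message: bytes) -> str:
    # Step 1: padding.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    state = H_INIT
    for offset in range(0, len(padded), 64):
        state = compress(state, padded[offset:offset + 64])
    # Step 4: concatenate the eight state words.
    return "".join(f"{word:08x}" for word in state)

print(sha256(b"hello") == hashlib.sha256(b"hello").hexdigest())
```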

4. Why MD5 and SHA-1 Are Broken

MD5 Collisions (2004)

In 2004, Xiaoyun Wang and colleagues demonstrated practical collision attacks against MD5 — finding two different inputs that produce the same MD5 hash. By 2008, researchers used MD5 collisions to forge a fraudulent SSL certificate from a real CA, demonstrating a real-world attack against HTTPS infrastructure.

The attack uses sophisticated differential cryptanalysis and can generate MD5 collisions in seconds on modern hardware.

SHA-1 SHAttered (2017)

Researchers from Google and CWI Amsterdam produced the first practical SHA-1 collision in 2017, dubbed SHAttered. They generated two different PDF files with identical SHA-1 hashes. The attack required approximately 9.2 × 10¹⁸ SHA-1 computations: roughly 6,500 CPU-years for the first phase of the attack plus 110 GPU-years for the second phase. That is expensive, but well within reach of nation-states and large organizations.

What This Means in Practice

MD5 and SHA-1 are NOT safe for:

Digital signatures
Certificate fingerprints
Password storage
Any security-sensitive application

They are still acceptable for:

Non-cryptographic checksums (verifying file download integrity over a trusted channel)
Hash table lookups
Non-security deduplication
Legacy system compatibility (with appropriate caveats)

5. Real-World Use Cases

Password Storage

Never store passwords as plain text — or even as plain hashes. If your database is leaked, an attacker can crack plain hashes using dictionary attacks or rainbow tables within hours or days.

The correct approach uses a slow, salted hash function specifically designed for passwords: bcrypt, scrypt, or Argon2.

File Integrity Verification

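When you download software, the developer often publishes a SHA-256 checksum. After downloading, you compute the file's hash and compare. If they match, the file wasn't corrupted in transit. A match only rules out tampering if the checksum itself came from a trusted source, such as the developer's HTTPS site or a signed release file.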

sha256sum downloaded-file.tar.gz
# Compare with the checksum published by the developer
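
The same check can be scripted. The sketch below streams the file in chunks so that large downloads don't have to fit in memory; the function names and the 1 MiB chunk size are illustrative choices.

```python
import hashlib
import hmac

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally, reading chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare the file's digest with the checksum the developer published."""
    expected = published_hex.strip().lower().encode()
    return hmac.compare_digest(sha256_file(path).encode(), expected)
```

A typical call would be matches_published_checksum("downloaded-file.tar.gz", checksum_from_release_page), where the second argument is whatever hex string the developer published.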

Digital Signatures

Hash functions are fundamental to digital signatures. Rather than signing an entire document (which could be gigabytes), you hash it and sign only the hash. The recipient hashes the document independently and verifies the signature against that hash.

Blockchain

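Bitcoin uses SHA-256 twice (SHA-256d) for proof-of-work mining and to hash block headers and transactions. Miners must find a nonce such that the double hash of the block header is numerically below a network-defined target, which is often described informally as requiring a certain number of leading zeros. Finding such a nonce requires enormous computational effort, while checking one takes a single double-hash; that asymmetry underpins the security guarantees of the blockchain.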
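
A toy version of the mining loop illustrates the idea. This is a simplified sketch, not Bitcoin's actual format: the header bytes, the 8-byte nonce encoding, and the difficulty_bits parameter are all assumptions made for the example.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def is_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if the double hash, read as an integer, is below the target."""
    digest = double_sha256(header + nonce.to_bytes(8, "little"))
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces in order until one satisfies the target."""
    nonce = 0
    while not is_valid(header, nonce, difficulty_bits):
        nonce += 1
    return nonce

nonce = mine(b"toy block header", 12)  # about 2**12 attempts on average
print(nonce, double_sha256(b"toy block header" + nonce.to_bytes(8, "little")).hex())
```

Each extra difficulty bit doubles the expected work for the miner, while verification stays at one call to is_valid.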

Git Object Storage

Git uses SHA-1 to hash every commit, tree, and blob object. The hash serves as both the object's identifier and an integrity check. Because of SHA-1's weaknesses, Git has offered SHA-256 as an opt-in, experimental object format since version 2.29, although SHA-1 has remained the default.
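
To see how an object ID is derived, the sketch below reproduces what git hash-object computes for a blob in a SHA-1 repository: Git hashes a short header ("blob", the content length, and a NUL byte) followed by the file content. The function name is hypothetical.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty-blob ID)
```

Because the ID depends only on the content, two identical files anywhere in a repository's history share a single stored object.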

Storage Deduplication

Backup systems and content-addressable storage (like IPFS) use hashes to identify duplicate content. If two files have the same hash, they're stored only once.

6. Computing Hashes in Practice

JavaScript (Node.js)

const crypto = require('crypto');

// SHA-256
const sha256 = crypto.createHash('sha256')
  .update('Hello, World!')
  .digest('hex');
console.log(sha256);
// dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

// MD5 (use only for non-security purposes)
const md5 = crypto.createHash('md5')
  .update('Hello, World!')
  .digest('hex');
console.log(md5);
// 65a8e27d8879283831b664bd8b7f0ad4

// SHA-512
const sha512 = crypto.createHash('sha512')
  .update('Hello, World!')
  .digest('hex');
console.log(sha512);

Python

import hashlib

# SHA-256
h = hashlib.sha256(b"Hello, World!").hexdigest()
print(h)
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

# SHA-512
h512 = hashlib.sha512(b"Hello, World!").hexdigest()
print(h512)

# Multiple algorithms
for algo in ['md5', 'sha1', 'sha256', 'sha512']:
    h = hashlib.new(algo, b"Hello, World!").hexdigest()
    print(f"{algo}: {h}")

Bash / Shell

# SHA-256
echo -n "Hello, World!" | sha256sum
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f  -

# MD5
echo -n "Hello, World!" | md5sum
# 65a8e27d8879283831b664bd8b7f0ad4  -

# SHA-1
echo -n "Hello, World!" | sha1sum

# Hash a file
sha256sum /path/to/file.iso

7. HMAC: Hash-based Message Authentication Code

A plain hash verifies data integrity — it tells you whether data was corrupted. But it doesn't verify authenticity — it doesn't prove who created the data. Anyone can compute a hash.

HMAC (RFC 2104) solves this by combining a secret key with the hash function:

HMAC(K, m) = hash((K' ⊕ opad) || hash((K' ⊕ ipad) || m))

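Where K' is the key padded with zeros to the hash's block size (keys longer than one block are hashed first), and opad/ipad are the bytes 0x5c and 0x36 repeated to fill a block. The construction has a security proof under the assumption that the hash's compression function behaves like a pseudorandom function, and the nested structure protects it from length-extension attacks.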
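
The formula translates almost line for line into code. The sketch below builds HMAC-SHA256 by hand, following RFC 2104, purely to show the construction; real code should call the standard hmac module instead. The function name is hypothetical.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    block_size = 64  # SHA-256 processes 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(block_size, b"\x00")    # K': zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

# The hand-rolled version agrees with the standard library.
expected = hmac.new(b"my-secret-key", b"message", hashlib.sha256).digest()
print(hmac_sha256(b"my-secret-key", b"message") == expected)
```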

Common Uses

API Authentication: REST APIs use HMAC-SHA256 to sign requests. The server and client share a secret key. The client signs the request body with the key; the server verifies the signature.

JWT Signatures: JSON Web Tokens use HMAC-SHA256 (HS256) to sign the header and payload, ensuring the token wasn't tampered with.

Webhook Verification: GitHub, Stripe, and many other services sign webhook payloads with HMAC-SHA256 so receivers can verify the payload is genuine.

Computing HMAC

// Node.js
const crypto = require('crypto');

const hmac = crypto.createHmac('sha256', 'my-secret-key')
  .update('message to authenticate')
  .digest('hex');
console.log(hmac);

# Python
import hmac
import hashlib

key = b'my-secret-key'
message = b'message to authenticate'
sig = hmac.new(key, message, hashlib.sha256).hexdigest()
print(sig)
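
Computing the tag is only half of the job; the receiver also has to compare it safely. The sketch below shows one way a webhook receiver might verify a signature, using a constant-time comparison so that response timing does not leak how many characters matched. The function names and the hex encoding of the signature are assumptions; each service documents its own header format.

```python
import hashlib
import hmac

def sign(secret: bytes, payload: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 tag for a payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def is_authentic(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign(secret, payload)
    return hmac.compare_digest(expected.encode(), received_signature.encode())

secret = b"my-secret-key"
body = b'{"event": "payment.succeeded"}'
signature = sign(secret, body)
print(is_authentic(secret, body, signature))         # True
print(is_authentic(secret, body + b" ", signature))  # False
```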

8. Rainbow Tables and Salting

What Are Rainbow Tables?

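A rainbow table is a precomputed data structure that lets an attacker map hash values back to their original plaintext inputs. It stores chains of hashes in a compressed form, trading some lookup time for a large saving in storage. If an attacker obtains your database of unsalted password hashes, they don't need to brute-force each hash individually; they look it up in the table.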

For MD5 and SHA-1, rainbow tables covering all ASCII passwords up to 8 characters have been freely available for years. Websites like CrackStation maintain databases of billions of hash-to-password mappings.

How Salting Defeats Rainbow Tables

A salt is a random value, unique per user, that is combined with the password before hashing:

hash(salt + password) = stored_hash

The salt is stored alongside the hash (it doesn't need to be secret). Because every user gets a unique random salt, the attacker cannot use precomputed tables: they'd need a separate rainbow table for every possible salt value, which is computationally infeasible. Salting does not slow down guessing against a single hash, which is why a deliberately slow hash function is still required.
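
The effect of a per-user salt is easy to demonstrate. The sketch below uses scrypt from Python's standard hashlib as the slow hash; the function names and the cost parameters (n=2**14, r=8, p=1) are illustrative assumptions, not a tuned recommendation.

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); both are stored in the user's record."""
    salt = os.urandom(16)  # fresh random salt for every password
    derived = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, derived

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return hmac.compare_digest(candidate, stored)

# Two users with the same password end up with unrelated stored values.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")
print(hash_a != hash_b, verify_password("correct horse", salt_a, hash_a))
```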

bcrypt: Automatic Salting and Intentional Slowness

bcrypt was designed in 1999 specifically for password hashing. It automatically generates and incorporates a random salt, and includes a cost factor that controls how slow the hash computation is:

const bcrypt = require('bcrypt');

async function main() {
  // Hash a password (cost factor 12; roughly 250ms on typical modern hardware)
  const hash = await bcrypt.hash('user-password', 12);

  // Verify a login attempt against the stored hash
  const isMatch = await bcrypt.compare('user-password', hash);
  console.log(isMatch); // true
}

main();

The stored hash looks like: $2b$12$EixZaYVK1fsbw1ZfbX3OXePaWxn96p36WQoeG6Lruj3vjPGga31lW

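The $2b$12$ prefix encodes the algorithm version (2b) and the cost factor (12). The next 22 characters are the salt and the remaining 31 are the hash itself, so everything needed for verification lives in one string.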

9. Algorithm Comparison Table

Algorithm	Output Size	Speed	Security Status	Best For
MD5	128-bit	Very fast	❌ Broken (collisions)	Non-security checksums only
SHA-1	160-bit	Fast	❌ Broken (SHAttered)	Legacy systems only
SHA-256	256-bit	Fast	✅ Secure	General purpose, TLS, signing
SHA-512	512-bit	Fast on 64-bit	✅ Secure	High-security applications
SHA-3/Keccak	Variable	Moderate	✅ Secure	Alternative to SHA-2
BLAKE2b	Variable	Very fast	✅ Secure	Performance-critical hashing
bcrypt	184-bit	Slow (intentional)	✅ Secure	Password storage
Argon2id	Variable	Slow (intentional)	✅ Secure	Password storage (recommended)

10. Best Practices for Password Hashing

Passwords deserve special treatment because they are the keys to user accounts. A compromised password database can be devastating. Follow these rules without exception:

Rule 1: Never Store Plain Passwords

This should be obvious but still happens. In 2019, Facebook was found to have stored hundreds of millions of passwords in plain text internally.

Rule 2: Never Use Fast Hashes for Passwords

MD5, SHA-1, SHA-256, and SHA-512 are all too fast for password hashing. A modern GPU can compute billions of SHA-256 hashes per second, enabling brute-force attacks in hours.
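
A back-of-the-envelope calculation shows what "too fast" means. The guess rates below are assumptions picked for illustration (10 billion guesses per second for a fast hash on GPU hardware, 10,000 per second for a tuned slow hash); real figures depend on the hardware and the parameters.

```python
def seconds_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Time to try every password of the given length over the alphabet."""
    return alphabet_size ** length / guesses_per_second

# All 8-character passwords over letters and digits: 62**8 candidates.
fast = seconds_to_exhaust(62, 8, 1e10)  # assumed rate for a fast hash
slow = seconds_to_exhaust(62, 8, 1e4)   # assumed rate for a slow password hash

print(f"fast hash: {fast / 3600:.1f} hours")
print(f"slow hash: {slow / (3600 * 24 * 365):.0f} years")
```

Under these assumptions the fast hash falls in about six hours, while the slow hash pushes the same search to several centuries.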

Rule 3: Use Purpose-Built Password Hashing Algorithms

bcrypt (recommended minimum): Use a cost factor of 12 or higher. Widely supported, battle-tested.

scrypt: Memory-hard, making it resistant to GPU and ASIC attacks. Configurable memory and CPU costs.

Argon2id (recommended today): Winner of the 2015 Password Hashing Competition. Argon2id is the recommended variant as it provides resistance against both side-channel attacks and time-memory trade-off attacks. Configure with at minimum:

Memory: 64 MB
Iterations: 3
Parallelism: 4

Rule 4: Use a Unique Salt Per Password

Even with bcrypt/scrypt/Argon2 (which include automatic salting), understand why it matters: identical passwords must produce different hashes so that compromising one doesn't reveal others.

Rule 5: Tune Cost Factors Over Time

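As hardware gets faster, increase cost factors. Aim for ~250–500ms per hash for bcrypt on your production hardware. Because a stored hash can only be recomputed from the plaintext password, upgrade each user's hash at their next successful login.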

Rule 6: Consider Pepper

A pepper is a server-side secret (unlike a salt, it's not stored in the database). It's added to the password before hashing: hash(pepper + salt + password). Even if an attacker steals your database, they cannot crack passwords without the pepper.
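
One common way to apply a pepper is sketched below. Note the substitution: instead of concatenating the pepper as in the formula above, this variant mixes it in with HMAC-SHA256 before the slow salted hash, which avoids ambiguity about where the pepper ends and the password begins. The function name and the scrypt parameters are assumptions, and in practice the pepper would be loaded from a secrets manager or environment variable rather than generated inline.

```python
import hashlib
import hmac
import os

def hash_with_pepper(password: str, salt: bytes, pepper: bytes) -> bytes:
    """Key the password with the pepper, then apply the slow salted hash."""
    peppered = hmac.new(pepper, password.encode(), hashlib.sha256).digest()
    return hashlib.scrypt(peppered, salt=salt, n=2**14, r=8, p=1, dklen=32)

pepper = os.urandom(32)  # stand-in for a secret kept outside the database
salt = os.urandom(16)    # stored in the database next to the hash
stored = hash_with_pepper("user-password", salt, pepper)

# With the database alone (salt + hash) but the wrong pepper, guesses never match.
print(hash_with_pepper("user-password", salt, os.urandom(32)) == stored)  # False
```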

Conclusion

Cryptographic hash functions are fundamental to security, integrity, and trust across the internet. Understanding them — from their mathematical properties to their practical vulnerabilities — enables you to build systems that are genuinely secure.

The key takeaways:

SHA-256 and SHA-512 are your go-to general-purpose hashes
MD5 and SHA-1 are broken for cryptographic uses
For passwords, always use bcrypt, scrypt, or Argon2
Use HMAC when you need authentication, not just integrity
Salting defeats rainbow tables; bcrypt and Argon2 do it automatically
