Cryptographic Hash Functions: A Complete Guide
Cryptographic hash functions are the silent workhorses of modern security. Every time you log in to a website, push code to Git, download a file, or make a Bitcoin transaction, hash functions are working behind the scenes. Yet most developers interact with them daily without understanding what they truly do — or what can go terribly wrong when they're misused.
This guide covers everything: the mathematics, the history, the broken algorithms, the modern standards, and the practical code you need to use hash functions correctly.
1. What Are Cryptographic Hash Functions?
A cryptographic hash function takes an input of any size and produces a fixed-size output, called a digest or hash. For example, SHA-256 always produces exactly 256 bits (64 hexadecimal characters), regardless of whether the input is a single character or an entire movie file.
Core Properties
Deterministic: The same input always produces the same output. SHA-256("hello") will always return 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824.
Fast to compute: Computing a hash should take milliseconds, not seconds. This efficiency is critical for file integrity checks and digital signatures, though it becomes a liability for password hashing (more on that later).
Pre-image resistance (one-way): Given a hash output H, it must be computationally infeasible to find any input m such that hash(m) = H. You cannot reverse-engineer the original data from its hash alone.
Second pre-image resistance: Given an input m1, it must be infeasible to find a different input m2 such that hash(m1) = hash(m2). Even if an attacker knows your original data, they cannot find a different input that produces the same hash.
Collision resistance: It must be infeasible to find any two distinct inputs m1 and m2 such that hash(m1) = hash(m2). This is a stronger requirement than second pre-image resistance.
Avalanche effect: A tiny change in the input, even a single bit flip, completely changes the output. Changing "hello" to "Hello" flips exactly one input bit (0x68 becomes 0x48), yet it produces an entirely different hash with no apparent relationship to the original.
SHA-256("hello") = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
SHA-256("Hello") = 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
On average about half of the 256 output bits change, and the two digests share no predictable relationship. That's the avalanche effect in action.
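The avalanche effect can be measured directly by counting how many output bits differ between two digests. The sketch below is a minimal illustration using Python's standard `hashlib`; the helper name `bit_difference` is hypothetical, and the input is chosen so that exactly one input bit is flipped.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between SHA-256(a) and SHA-256(b)."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

original = b"hello"
# Flip the lowest bit of the last byte ('o' 0x6f becomes 'n' 0x6e).
flipped = original[:-1] + bytes([original[-1] ^ 0x01])

diff = bit_difference(original, flipped)
print(f"{diff} of 256 output bits differ")
```

For a well-behaved hash the count should land near 128, although the exact value varies from input to input.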
2. A Brief History of Hash Algorithms
MD5 (1991)
Ronald Rivest designed MD5 as an improvement over MD4. It produces a 128-bit digest and was widely adopted through the 1990s for checksums and password storage. For over a decade, MD5 was the default choice for many security applications.
SHA-1 (1995)
The National Security Agency (NSA) designed SHA-1 (Secure Hash Algorithm 1), and NIST published it in 1995 in the Secure Hash Standard for use with the Digital Signature Standard. It produces a 160-bit digest. SHA-1 became the dominant hash algorithm for TLS/SSL certificates, code signing, and Git's object storage.
SHA-2 Family (2001)
Also designed by the NSA, SHA-2 is actually a family of six functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. SHA-256 and SHA-512 are the most commonly used. They produce 256-bit and 512-bit digests respectively and remain secure today.
SHA-3 / Keccak (2015)
After SHA-1's weaknesses became apparent, NIST held a public competition (2007–2012) to find an entirely new hash standard independent of the NSA's SHA-2 design. The winner, announced in 2012, was Keccak, designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche; it was standardized as SHA-3 in 2015. Unlike SHA-2, which uses a Merkle–Damgård construction, SHA-3 uses a sponge construction, offering a fundamentally different security profile.
BLAKE2 (2012)
BLAKE2 is a cryptographic hash function that its designers report to be faster in software than MD5, SHA-1, SHA-2, and SHA-3 while being at least as secure as SHA-3. It was designed by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein as an improvement over BLAKE, which was a finalist in the SHA-3 competition. BLAKE2b is optimized for 64-bit platforms; BLAKE2s for 8- to 32-bit platforms.
3. How SHA-256 Works
SHA-256 uses the Merkle–Damgård construction: the message is broken into fixed-size blocks, and a compression function is applied iteratively, feeding the output of one block as input to the next.
Step 1: Padding
The input message is padded so its total length is a multiple of 512 bits. A single 1 bit is appended, followed by zeros, followed by the original message length as a 64-bit big-endian integer.
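The padding rule can be sketched in a few lines. This is a minimal illustration for byte-aligned messages, not a library API; the function name `sha256_pad` is hypothetical.

```python
import struct

def sha256_pad(message: bytes) -> bytes:
    """Return message plus SHA-256 padding; the result is a multiple of 64 bytes."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                     # the single 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zeros up to 8 bytes short of a block
    padded += struct.pack(">Q", bit_length)        # 64-bit big-endian message length
    return padded

padded = sha256_pad(b"abc")
print(len(padded), padded.hex())
```

A 55-byte message still fits in one 64-byte block, while a 56-byte message spills into a second block because the 1 bit and the 8-byte length no longer fit.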
Step 2: Message Schedule
Each 512-bit block is expanded into 64 32-bit words using a schedule that mixes and rotates bits. Words W[0] through W[15] come directly from the message block; words W[16] through W[63] are computed as:
W[i] = σ1(W[i-2]) + W[i-7] + σ0(W[i-15]) + W[i-16]
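Where σ0(x) = ROTR(x, 7) ⊕ ROTR(x, 18) ⊕ SHR(x, 3) and σ1(x) = ROTR(x, 17) ⊕ ROTR(x, 19) ⊕ SHR(x, 10). ROTR is a 32-bit right rotation, SHR is a right shift, and all additions are taken modulo 2³².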
Step 3: Compression — 64 Rounds
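SHA-256 keeps a 256-bit hash state of eight 32-bit words. The initial state is the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers. For each block, 8 working variables (a through h) are loaded from the current state, and each of the 64 rounds applies: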
T1 = h + Σ1(e) + Ch(e,f,g) + K[i] + W[i]
T2 = Σ0(a) + Maj(a,b,c)
h = g; g = f; f = e; e = d + T1
d = c; c = b; b = a; a = T1 + T2
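Here Σ0(a) = ROTR(a, 2) ⊕ ROTR(a, 13) ⊕ ROTR(a, 22), Σ1(e) = ROTR(e, 6) ⊕ ROTR(e, 11) ⊕ ROTR(e, 25), Ch(e,f,g) = (e AND f) ⊕ (NOT e AND g), and Maj(a,b,c) = (a AND b) ⊕ (a AND c) ⊕ (b AND c); all additions are modulo 2³². The round constants K[i] are the first 32 bits of the fractional parts of the cube roots of the first 64 primes. These are "nothing-up-my-sleeve" numbers: because anyone can recompute them, the design leaves no room for constants chosen to hide a backdoor.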
Step 4: Output
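After the 64 rounds, the 8 working variables are added (modulo 2³²) to the hash state that entered the block, and the sum becomes the state for the next block. This "feed-forward" keeps the compression function from being trivially invertible, while chaining the state from block to block ensures that each block's output depends on all previous blocks. Once the last block has been processed, the eight 32-bit state words are concatenated to form the final 256-bit digest.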
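Putting the four steps together, the following is a compact educational sketch of SHA-256 in pure Python. It derives the initial hash values and round constants from the primes instead of hard-coding them, and it checks itself against `hashlib`. The helper names (`_primes`, `_icbrt`, `rotr`) are illustrative choices, and the code is meant for study rather than production use.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found

def _icbrt(n):
    """Integer cube root (largest r with r**3 <= n) by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

_P = _primes(64)
# First 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in _P[:8]]
# First 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _P]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256(message: bytes) -> bytes:
    # Step 1: padding
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(message) * 8)

    state = list(H0)
    for offset in range(0, len(padded), 64):
        # Step 2: message schedule
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for i in range(16, 64):
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w.append((s1 + w[i - 7] + s0 + w[i - 16]) & MASK)

        # Step 3: 64 rounds of compression
        a, b, c, d, e, f, g, h = state
        for i in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            h, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK

        # Feed-forward: add the working variables to the state that entered the block
        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

    # Step 4: the digest is the concatenation of the eight state words
    return struct.pack(">8I", *state)

print(sha256(b"hello").hex() == hashlib.sha256(b"hello").hexdigest())
```

Deriving the constants rather than pasting them in is a deliberate choice here: it shows that they really are reproducible from the primes.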
4. Why MD5 and SHA-1 Are Broken
MD5 Collisions (2004)
In 2004, Xiaoyun Wang and colleagues demonstrated practical collision attacks against MD5, finding two different inputs that produce the same MD5 hash. In 2008, researchers used an MD5 collision to obtain a rogue CA certificate signed by a real, browser-trusted CA, demonstrating a real-world attack against HTTPS infrastructure.
The attack uses sophisticated differential cryptanalysis and can generate MD5 collisions in seconds on modern hardware.
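To make the idea of a collision concrete, the sketch below finds one by generic "birthday" search on a deliberately truncated hash. This is not the differential attack used against MD5. It only illustrates why an n-bit digest offers at most about 2^(n/2) generic collision resistance, which is the baseline that cryptanalytic attacks then improve on. The function name and the 24-bit truncation are assumptions chosen so the demo finishes quickly.

```python
import hashlib

def find_truncated_collision(bits: int = 24):
    """Find two different inputs whose SHA-256 digests agree in the first `bits` bits."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        digest = hashlib.sha256(message).digest()
        prefix = int.from_bytes(digest, "big") >> (256 - bits)
        if prefix in seen:
            return seen[prefix], message, counter + 1
        seen[prefix] = message
        counter += 1

first, second, attempts = find_truncated_collision(24)
print(first, second, attempts)
```

With 24 bits the search typically succeeds after a few thousand attempts, far fewer than the 2²⁴ possible prefixes.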
SHA-1 SHAttered (2017)
Researchers at Google and CWI Amsterdam produced the first practical SHA-1 collision in 2017, dubbed SHAttered. They generated two different PDF files with identical SHA-1 hashes. The attack required approximately 9.2 × 10¹⁸ (about 2⁶³) SHA-1 computations: roughly 6,500 CPU-years for the first phase plus 110 GPU-years for the second. That is more than 100,000 times faster than a brute-force birthday search and well within reach of nation-states and large organizations.
What This Means in Practice
MD5 and SHA-1 are NOT safe for:
- Digital signatures
- Certificate fingerprints
- Password storage
- Any security-sensitive application
They are still acceptable for:
- Non-cryptographic checksums (verifying file download integrity over a trusted channel)
- Hash table lookups
- Non-security deduplication
- Legacy system compatibility (with appropriate caveats)
5. Real-World Use Cases
Password Storage
Never store passwords as plain text, or even as plain unsalted hashes. If your database is leaked, an attacker can crack fast, unsalted hashes using dictionary attacks or precomputed tables within hours or days.
The correct approach uses a slow, salted hash function specifically designed for passwords: bcrypt, scrypt, or Argon2.
File Integrity Verification
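When you download software, the developer often publishes a SHA-256 checksum. After downloading, you compute the file's hash and compare. If they match, the file wasn't corrupted in transit. A match rules out tampering only if the checksum itself reached you over a channel the attacker could not modify.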
sha256sum downloaded-file.tar.gz
# Compare with the checksum published by the developer
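The same check can be scripted. The sketch below hashes a file in chunks so that large downloads need not fit in memory; the function names `sha256_file` and `matches_published` are hypothetical helpers, not part of any library.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally, reading chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the computed digest with a published one, ignoring case and whitespace."""
    return sha256_file(path) == published_hex.strip().lower()
```

A call such as `matches_published("downloaded-file.tar.gz", checksum_from_website)` would return `True` only when the digests agree.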
Digital Signatures
Hash functions are fundamental to digital signatures. Rather than signing an entire document (which could be gigabytes), you hash it and sign only the hash. The recipient hashes the document independently and verifies the signature against that hash.
Blockchain
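Bitcoin uses SHA-256 twice (SHA-256d) for proof-of-work mining and to hash block headers and transactions. Miners must find a nonce such that the hash of the block header, interpreted as a number, falls below a target value, which in practice means the hash starts with a certain number of zero bits. Finding such a nonce requires enormous computational effort, while checking one takes a single hash computation; this asymmetry underpins the blockchain's security guarantees.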
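A toy version of the mining loop makes the asymmetry visible. Requiring a number of leading zero bits is the same as requiring the hash to be below a target. This sketch is a simplification under stated assumptions: the header bytes, the 8-byte nonce encoding, and the 16-bit difficulty are invented for the demo and do not match Bitcoin's real block header format.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, difficulty_bits: int) -> int:
    """Return the first nonce whose sha256d value is below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        candidate = header + nonce.to_bytes(8, "little")
        if int.from_bytes(sha256d(candidate), "big") < target:
            return nonce
        nonce += 1

nonce = mine(b"toy block header", difficulty_bits=16)
print(nonce)
```

Finding the nonce takes tens of thousands of hash evaluations on average at this difficulty, while verifying it takes one; each extra difficulty bit doubles the expected work.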
Git Object Storage
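Git uses SHA-1 to hash every commit, tree, and blob object. The hash serves as both the object's identifier and an integrity check. In response to SHA-1's weaknesses, Git now uses a collision-detecting SHA-1 implementation and has added support for a SHA-256 object format, although SHA-1 repositories remain the norm.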
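Git does not hash the raw file bytes alone: for a blob it prepends a short header containing the object type and size. The sketch below reproduces a blob ID under Git's SHA-1 object format; `git_blob_id` is a hypothetical helper name.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the ID of the empty blob)
```

The result should match what `git hash-object` prints for a file with the same contents.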
Storage Deduplication
Backup systems and content-addressable storage (like IPFS) use hashes to identify duplicate content. If two files have the same hash, they're stored only once.
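As a rough sketch of the idea, a content-addressable store can be modelled as a dictionary keyed by digest. The `ContentStore` class below is a toy in-memory illustration, not how IPFS or any particular backup tool is implemented.

```python
import hashlib

class ContentStore:
    """Toy in-memory content-addressable store keyed by SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical content is stored only once
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)

store = ContentStore()
key_a = store.put(b"same bytes")
key_b = store.put(b"same bytes")
print(key_a == key_b, len(store))
```

This design relies on collision resistance: if two different files could share a digest, one would silently replace the other.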
6. Computing Hashes in Practice
JavaScript (Node.js)
const crypto = require('crypto');
// SHA-256
const sha256 = crypto.createHash('sha256')
  .update('Hello, World!')
  .digest('hex');
console.log(sha256);
// dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
// MD5 (use only for non-security purposes)
const md5 = crypto.createHash('md5')
  .update('Hello, World!')
  .digest('hex');
console.log(md5);
// 65a8e27d8879283831b664bd8b7f0ad4
// SHA-512
const sha512 = crypto.createHash('sha512')
  .update('Hello, World!')
  .digest('hex');
console.log(sha512);
Python
import hashlib
# SHA-256
h = hashlib.sha256(b"Hello, World!").hexdigest()
print(h)
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
# SHA-512
h512 = hashlib.sha512(b"Hello, World!").hexdigest()
print(h512)
# Multiple algorithms
for algo in ['md5', 'sha1', 'sha256', 'sha512']:
    h = hashlib.new(algo, b"Hello, World!").hexdigest()
    print(f"{algo}: {h}")
Bash / Shell
# SHA-256 (printf with single quotes avoids echo portability and "!" history-expansion issues)
printf '%s' 'Hello, World!' | sha256sum
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f  -
# MD5
printf '%s' 'Hello, World!' | md5sum
# 65a8e27d8879283831b664bd8b7f0ad4  -
# SHA-1
printf '%s' 'Hello, World!' | sha1sum
# Hash a file
sha256sum /path/to/file.iso
7. HMAC: Hash-based Message Authentication Code
A plain hash verifies data integrity — it tells you whether data was corrupted. But it doesn't verify authenticity — it doesn't prove who created the data. Anyone can compute a hash.
HMAC (RFC 2104) solves this by combining a secret key with the hash function:
HMAC(K, m) = hash((K' ⊕ opad) || hash((K' ⊕ ipad) || m))
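Where K' is the key zero-padded to the hash's block size (keys longer than a block are hashed first), and opad/ipad are the bytes 0x5c and 0x36 repeated to fill a block. The construction has security proofs that rest on the underlying compression function behaving like a pseudorandom function, a weaker assumption than collision resistance.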
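The formula translates almost line for line into code. The sketch below builds HMAC-SHA256 by hand so it can be compared with Python's standard `hmac` module; `hmac_sha256` is a hypothetical name, and real code should call the library rather than this re-implementation.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    block_size = 64                          # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # keys longer than a block are hashed first
    key = key.ljust(block_size, b"\x00")     # K': zero-padded to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

tag = hmac_sha256(b"my-secret-key", b"message to authenticate")
print(tag == hmac.new(b"my-secret-key", b"message to authenticate", hashlib.sha256).digest())
```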
Common Uses
API Authentication: REST APIs use HMAC-SHA256 to sign requests. The server and client share a secret key. The client signs the request body with the key; the server verifies the signature.
JWT Signatures: JSON Web Tokens use HMAC-SHA256 (HS256) to sign the header and payload, ensuring the token wasn't tampered with.
Webhook Verification: GitHub, Stripe, and many other services sign webhook payloads with HMAC-SHA256 so receivers can verify the payload is genuine.
Computing HMAC
// Node.js
const crypto = require('crypto');
const hmac = crypto.createHmac('sha256', 'my-secret-key')
.update('message to authenticate')
.digest('hex');
console.log(hmac);
# Python
import hmac
import hashlib
key = b'my-secret-key'
message = b'message to authenticate'
sig = hmac.new(key, message, hashlib.sha256).hexdigest()
print(sig)
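Computing the tag is only half of the job; the receiver also has to check it. A minimal sketch of the verification side follows, using `hmac.compare_digest` so that the comparison time does not reveal where the signatures first differ. The function names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac

def sign(secret: bytes, payload: bytes) -> str:
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret: bytes, payload: bytes, received_signature: str) -> bool:
    expected = sign(secret, payload)
    # Constant-time comparison; a plain == could leak timing information.
    return hmac.compare_digest(expected.encode(), received_signature.encode())

signature = sign(b"my-secret-key", b'{"event": "ping"}')
print(verify(b"my-secret-key", b'{"event": "ping"}', signature))
```

Verification should run over the raw request bytes as received, since re-serializing a parsed body can change whitespace or key order and break the signature.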
8. Rainbow Tables and Salting
What Are Rainbow Tables?
A rainbow table is a precomputed structure that lets an attacker map hash values back to plaintext inputs. It stores chains of hashes in compressed form, trading lookup time for far less storage than a full hash-to-password table. If an attacker obtains your database of unsalted password hashes, they don't need to brute-force each hash from scratch; they look it up.
For unsalted MD5 and SHA-1, precomputed tables covering short passwords have been freely available for years. Websites like CrackStation maintain lookup databases of billions of hash-to-password mappings.
How Salting Defeats Rainbow Tables
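A salt is a random value combined with the password before hashing:
hash(salt + password) = stored_hash
The salt is stored alongside the hash (it doesn't need to be secret). Because every user gets a unique random salt, the attacker cannot use precomputed tables: they would need a separate table for every possible salt value, which is computationally infeasible. Salting also ensures that two users with the same password get different stored hashes. It does not slow down a targeted brute-force attack on a single hash; that is the job of a slow hash function.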
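The effect of a salt is easy to see in code. The sketch below mirrors the hash(salt + password) formula with plain SHA-256 purely for illustration; a fast hash like this is not suitable for real password storage, and the function name `salted_hash` is hypothetical.

```python
import hashlib
import secrets

def salted_hash(password: str, salt: bytes) -> str:
    # Mirrors hash(salt + password); illustration only, not for production use.
    return hashlib.sha256(salt + password.encode()).hexdigest()

salt_alice = secrets.token_bytes(16)
salt_bob = secrets.token_bytes(16)

# Same password, different salts: the stored hashes differ.
print(salted_hash("hunter2", salt_alice))
print(salted_hash("hunter2", salt_bob))
```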
bcrypt: Automatic Salting and Intentional Slowness
bcrypt was designed in 1999 specifically for password hashing. It automatically generates and incorporates a random salt, and includes a cost factor that controls how slow the hash computation is:
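const bcrypt = require('bcrypt');
// await is not allowed at the top level of a CommonJS module, so wrap the calls
async function main() {
  // Hash a password (cost factor 12; on the order of 250ms, but measure on your own hardware)
  const hash = await bcrypt.hash('user-password', 12);
  // Verify
  const isMatch = await bcrypt.compare('user-password', hash);
  console.log(isMatch); // true
}
main();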
The stored hash looks like: $2b$12$EixZaYVK1fsbw1ZfbX3OXePaWxn96p36WQoeG6Lruj3vjPGga31lW
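The $2b$12$ prefix encodes the algorithm version (2b) and the cost factor (12, meaning 2¹² iterations of the key setup). The next 22 characters are the salt and the remaining 31 are the hash, so this single string is all you need to store.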
9. Algorithm Comparison Table
| Algorithm | Output Size | Speed | Security Status | Best For |
|---|---|---|---|---|
| MD5 | 128-bit | Very fast | ❌ Broken (collisions) | Non-security checksums only |
| SHA-1 | 160-bit | Fast | ❌ Broken (SHAttered) | Legacy systems only |
| SHA-256 | 256-bit | Fast | ✅ Secure | General purpose, TLS, signing |
| SHA-512 | 512-bit | Fast on 64-bit | ✅ Secure | High-security applications |
| SHA-3/Keccak | 224 to 512-bit (SHAKE: variable) | Moderate | ✅ Secure | Alternative to SHA-2 |
| BLAKE2b | Up to 512-bit | Very fast | ✅ Secure | Performance-critical hashing |
| bcrypt | 184-bit | Slow (intentional) | ✅ Secure | Password storage |
| Argon2id | Variable | Slow (intentional) | ✅ Secure | Password storage (recommended) |
10. Best Practices for Password Hashing
Passwords deserve special treatment because they are the keys to user accounts. A compromised password database can be devastating. Follow these rules without exception:
Rule 1: Never Store Plain Passwords
This should be obvious but still happens. In 2019, Facebook was found to have stored hundreds of millions of passwords in plain text internally.
Rule 2: Never Use Fast Hashes for Passwords
MD5, SHA-1, SHA-256, and SHA-512 are all too fast for password hashing. A modern GPU can compute billions of SHA-256 hashes per second, so an attacker with a leaked database can try every common or short password within hours.
Rule 3: Use Purpose-Built Password Hashing Algorithms
bcrypt (recommended minimum): Use a cost factor of 12 or higher. Widely supported, battle-tested. Be aware that bcrypt only uses the first 72 bytes of the password.
scrypt: Memory-hard, making it resistant to GPU and ASIC attacks. Configurable memory and CPU costs.
Argon2id (recommended today): Winner of the 2015 Password Hashing Competition. Argon2id is the recommended variant as it provides resistance against both side-channel attacks and time-memory trade-off attacks. Configure with at minimum:
- Memory: 64 MB
- Iterations: 3
- Parallelism: 4
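Argon2 is not part of Python's standard library, so the sketch below uses scrypt, the other memory-hard option above, through `hashlib.scrypt` (available when Python is built against a recent OpenSSL). The cost parameters `N`, `R`, `P` and the helper names are illustrative assumptions, not recommendations from this guide; tune them for your own hardware.

```python
import hashlib
import hmac
import secrets

# Illustrative cost parameters (assumptions): about 16 MiB of memory per hash.
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the cost parameters."""
    salt = secrets.token_bytes(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                         maxmem=MAXMEM, dklen=32)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P,
                         maxmem=MAXMEM, dklen=32)
    return hmac.compare_digest(key, expected)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
```

In an application that can take third-party dependencies, a maintained Argon2id binding would follow the same hash-then-verify shape.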
Rule 4: Use a Unique Salt Per Password
Even with bcrypt/scrypt/Argon2 (which include automatic salting), understand why it matters: identical passwords must produce different hashes so that compromising one doesn't reveal others.
Rule 5: Tune Cost Factors Over Time
As hardware gets faster, increase cost factors. Aim for ~250–500ms for bcrypt. Re-hash passwords on next login.
Rule 6: Consider Pepper
A pepper is a server-side secret (unlike a salt, it's not stored in the database). It's added to the password before hashing: hash(pepper + salt + password). An attacker who steals only the database cannot test password guesses without also obtaining the pepper, so keep it outside the database, for example in application configuration or a secrets manager.
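One common way to apply a pepper is to key an HMAC with it and feed the result into the password hasher, instead of concatenating strings. The sketch below shows that variant, which is a swapped-in technique rather than the literal hash(pepper + salt + password) formula. The function name `apply_pepper` and the example pepper value are hypothetical; a real pepper would be loaded from configuration or a secrets manager.

```python
import hashlib
import hmac

def apply_pepper(password: str, pepper: bytes) -> bytes:
    """Mix a server-side secret into the password before it reaches the password hasher."""
    return hmac.new(pepper, password.encode(), hashlib.sha256).digest()

# Hypothetical pepper for the demo only; never hard-code a real one.
PEPPER = b"example-pepper-loaded-from-config"

peppered = apply_pepper("user-password", PEPPER)
# `peppered` (32 bytes) is then passed, with a per-user salt, to bcrypt, scrypt, or Argon2.
print(len(peppered))
```

If the output goes to bcrypt, encoding it as hex or base64 first avoids problems with embedded zero bytes.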
Conclusion
Cryptographic hash functions are fundamental to security, integrity, and trust across the internet. Understanding them — from their mathematical properties to their practical vulnerabilities — enables you to build systems that are genuinely secure.
The key takeaways:
- SHA-256 and SHA-512 are your go-to general-purpose hashes
- MD5 and SHA-1 are broken for cryptographic uses
- For passwords, always use bcrypt, scrypt, or Argon2
- Use HMAC when you need authentication, not just integrity
- Salting defeats rainbow tables; bcrypt and Argon2 do it automatically