Choosing a Hash Algorithm: MD5, SHA-1, SHA-256 and Beyond
There is a subtle but costly mistake that keeps appearing in security audits: developers using the same hash function for storing passwords that they use for verifying file downloads. These are fundamentally different jobs, and confusing them has caused real breaches. Before you pick a hash algorithm, you need to decide what problem you are actually solving.
This article walks through the major hash functions in use today — what they do well, where they fail, and how to match the right one to the right task. We will cover MD5 and SHA-1 (both broken in different senses), the SHA-2 family, SHA-3, and the password-hashing algorithms that do not get enough attention in general security writing.
What a Hash Function Actually Does
A cryptographic hash function takes an input of arbitrary length and produces a fixed-length output, called a digest. Feed it a 4 GB ISO file or a single space character and you get the same number of output bits. Run the same input again and you get the identical digest. Change a single bit in the input and the output changes unpredictably — roughly half the bits in the digest flip, on average.
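The three behaviors described above (fixed-length output, determinism, and the avalanche effect) can be observed directly. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_hex` and `differing_bits` and the sample inputs are illustrative assumptions, not part of any library.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Fixed length: a one-byte input and a one-megabyte input both give 32 bytes.
short_digest = hashlib.sha256(b" ").digest()
long_digest = hashlib.sha256(b"x" * 1_000_000).digest()
print(len(short_digest), len(long_digest))

# Determinism: hashing the same input twice gives the same digest.
print(sha256_hex(b"hello") == sha256_hex(b"hello"))

# Avalanche: flip the lowest bit of the first byte and compare the digests.
message = b"hello"
flipped = bytes([message[0] ^ 0x01]) + message[1:]
flipped_bits = differing_bits(
    hashlib.sha256(message).digest(),
    hashlib.sha256(flipped).digest(),
)
print(f"{flipped_bits} of 256 digest bits changed")
```

The last line should report a count somewhere near 128, though the exact figure depends on the input chosen.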
The security properties that matter are:
- Pre-image resistance: Given a digest, it is computationally infeasible to find any input that produces it.
- Second pre-image resistance: Given an input, it is computationally infeasible to find a different input that produces the same digest.
- Collision resistance: It is computationally infeasible to find any two distinct inputs that hash to the same value.
When we say a hash algorithm is "broken," we almost always mean collision resistance has been undermined. That sounds abstract until you realize what it enables: an attacker who can manufacture collisions can prepare a benign file and a malicious file that share a digest, get the benign one reviewed or signed, and then substitute the malicious one while the hash still checks out.
MD5: Broken Since 1996, Still Everywhere
MD5 produces a 128-bit digest and was designed by Ron Rivest in 1991. Theoretical weaknesses appeared by 1996. By 2004, Xiaoyun Wang's team demonstrated practical collision attacks. By 2008, researchers had forged an intermediate CA certificate using MD5 collisions, meaning they created a certificate that browsers trusted but that could sign anything. The collision computation ran on a cluster of about 200 PlayStation 3 consoles.
MD5 is not just theoretically weak. Basic collisions can now be generated in seconds on an ordinary laptop, and the more powerful chosen-prefix collisions are within reach of anyone with a GPU and a weekend.
Where you still find MD5 in 2024: legacy database schemas storing password hashes, older software update mechanisms, internal checksum tools that never got updated, and — inexplicably — some new code written by developers who learned from old tutorials. None of these uses are acceptable for security-sensitive contexts.
The one place MD5 is not harmful: non-security checksums where collision attacks are not a concern and where you simply need to detect accidental corruption. rsync uses MD5 (and even weaker functions) for this purpose internally, and that is fine because a corrupt packet is not trying to fool you — it just has different bits.
SHA-1: Deprecated, but the Deprecation is Recent
SHA-1 produces a 160-bit digest and held up significantly longer than MD5. NIST deprecated it for digital signatures in 2011. In 2017, researchers from CWI Amsterdam and Google published SHAttered, demonstrating the first practical SHA-1 collision: two distinct PDF files with identical SHA-1 hashes. The attack required approximately 9.2 × 10^18 SHA-1 computations, equivalent to about 6,500 years of single-CPU time for the first phase plus 110 years of single-GPU time for the second. That is within reach of nation-states and serious criminal organizations.
By 2020, Gaëtan Leurent and Thomas Peyrin published a chosen-prefix collision attack against SHA-1 that is significantly cheaper. The cost estimate dropped to around $45,000 in cloud compute — accessible to sophisticated criminal groups.
Major browsers stopped accepting SHA-1 TLS certificates in 2017. Git, which used SHA-1 for object hashing, began migrating to SHA-256. GitHub began rejecting SHA-1-based RSA signatures for newly added SSH keys in 2022.
SHA-1 still appears in HMAC constructions (HMAC-SHA1) where collision resistance matters less than pre-image resistance. HMAC-SHA1 is not immediately catastrophic in, say, a legacy TOTP implementation — but you should migrate anyway, because "not immediately catastrophic" is a poor standard to defend.
SHA-256 and the SHA-2 Family: Currently Safe
SHA-256 is part of the SHA-2 family designed by the NSA and published by NIST in 2001. It produces a 256-bit digest. No practical attacks against SHA-256's collision resistance exist as of this writing. The best known collision and pre-image attacks apply only to reduced-round variants, not to the full algorithm.
SHA-2 variants you will encounter:
- SHA-224, SHA-256: Based on the same internal structure, 32-bit word operations. SHA-256 is the standard choice for most applications.
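- SHA-384, SHA-512: Use 64-bit word operations. SHA-512 is often faster than SHA-256 on 64-bit processors that lack dedicated SHA-256 instructions; where those instructions exist (Intel SHA extensions, ARMv8 cryptography extensions), SHA-256 usually wins. SHA-384 is SHA-512 with different initial values and the output truncated to 384 bits, and is used primarily in TLS.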
- SHA-512/256: Takes SHA-512's structure but produces 256-bit output. More resistant to length-extension attacks than SHA-256, useful in certain API signing scenarios.
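The variants above can be compared side by side. This is a small sketch using Python's `hashlib`; the helper name `digest_bits` and the sample message are assumptions for illustration. SHA-512/256 is looked up by the name `sha512_256`, which is only present when the underlying OpenSSL build provides it, so the code checks before using it.

```python
import hashlib

message = b"example message"  # arbitrary sample input


def digest_bits(name: str) -> int:
    """Return the digest length in bits for a named hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


# These four SHA-2 variants are guaranteed to be present in hashlib.
for name in ("sha224", "sha256", "sha384", "sha512"):
    digest = hashlib.new(name, message).hexdigest()
    print(f"{name:>10}: {digest_bits(name)} bits  {digest[:16]}...")

# SHA-512/256 availability depends on the OpenSSL build behind hashlib.
if "sha512_256" in hashlib.algorithms_available:
    print(f"sha512_256: {digest_bits('sha512_256')} bits")

# SHA-384 uses its own initial values, so its digest is not simply the
# first 48 bytes of the SHA-512 digest of the same message.
print(hashlib.sha512(message).digest()[:48] == hashlib.sha384(message).digest())
```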
SHA-256 is the right choice for file integrity verification, digital signatures, certificate fingerprints, and HMAC constructions. It is not the right choice for password storage — and this distinction is critical.
SHA-3: The Backup Algorithm You Should Know About
SHA-3 (Keccak) won NIST's hash competition in 2012. It uses a completely different internal structure — a sponge construction — compared to the Merkle-Damgård structure underlying MD5, SHA-1, and SHA-2. This matters because if a fundamental flaw is ever found in Merkle-Damgård, SHA-3 remains unaffected.
SHA-3 is not faster than SHA-2 in software on most commodity hardware — SHA-256 benchmarks faster on x86-64. SHA-3 is faster in hardware, which is why it appears in lightweight cryptography and embedded systems. For standard server-side applications, SHA-256 is the pragmatic choice; SHA-3 is the hedge if SHA-2 is ever compromised.
NIST also standardized SHAKE128 and SHAKE256 as extendable-output functions (XOFs) from the SHA-3 family. These are useful in specific cryptographic protocols but not something most application developers need to reach for directly.
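For readers who want to see the difference between a fixed-output SHA-3 function and an extendable-output function, here is a brief sketch with Python's `hashlib`. The variable names and sample input are illustrative assumptions.

```python
import hashlib

data = b"example message"  # arbitrary sample input

# SHA3-256 has a fixed 256-bit output, like SHA-256.
sha3_hex = hashlib.sha3_256(data).hexdigest()
print(len(sha3_hex) * 4, "bits:", sha3_hex)

# SHAKE128 is an XOF: the caller chooses the output length in bytes.
short_output = hashlib.shake_128(data).hexdigest(16)
long_output = hashlib.shake_128(data).hexdigest(64)
print(short_output)
print(long_output)

# A shorter SHAKE output is a prefix of a longer one for the same input.
print(long_output.startswith(short_output))
```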
The Password Storage Problem: Why SHA-256 is Wrong Here
This is where many security articles gloss over the most important practical point. SHA-256 is fast — that is its virtue for checksums and signatures. For password storage, speed is the enemy.
A modern GPU can compute roughly 8.5 billion SHA-256 hashes per second. If your database leaks and an attacker has your hashed passwords, they can attempt 8.5 billion guesses per second against each one. An eight-character lowercase password has 26^8, or about 209 billion, possible values, meaning the entire space can be searched in roughly 25 seconds on a $500 GPU.
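The arithmetic behind that estimate is easy to reproduce and to rerun for other password shapes. The sketch below takes the 8.5 billion guesses per second figure from the text as given; the function names are assumptions for illustration, and the result is the worst-case time to try every candidate (on average an attacker needs about half of it).

```python
GUESSES_PER_SECOND = 8.5e9  # SHA-256 rate quoted in the text


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length


def seconds_to_exhaust(alphabet_size: int, length: int,
                       rate: float = GUESSES_PER_SECOND) -> float:
    """Worst-case seconds needed to try every candidate at the given rate."""
    return keyspace(alphabet_size, length) / rate


# Eight lowercase letters, then eight mixed-case letters and digits.
for label, alphabet in (("lowercase", 26), ("alphanumeric", 62)):
    total = keyspace(alphabet, 8)
    secs = seconds_to_exhaust(alphabet, 8)
    print(f"8-char {label}: {total:,} candidates, {secs:,.1f} s to exhaust")
```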
Password hashing algorithms are specifically designed to be slow and to remain slow even as hardware improves:
- bcrypt: Designed in 1999. Uses a cost factor that lets you tune computational expense. At cost 12, bcrypt performs roughly 250 hashes per second on the same GPU that does 8.5 billion SHA-256 hashes. It has a 72-byte input limit, which is a real limitation for long passphrases — though in practice, passwords over 72 bytes are rare.
- scrypt: Adds memory hardness on top of time hardness. Memory-hard means an attacker cannot cheaply parallelize guesses, even with custom ASICs, because each hash attempt requires significant RAM. Used by some cryptocurrencies, specified in RFC 7914, and recommended by OWASP as a fallback when Argon2id is unavailable.
- Argon2: Won the Password Hashing Competition in 2015. Has three variants: Argon2d (optimized against GPU attacks, not safe against side-channel attacks), Argon2i (side-channel safe, slightly weaker against GPU attacks), and Argon2id (hybrid, recommended for most applications). Supports tunable time cost, memory cost, and parallelism. If you are building new software today, Argon2id is the correct choice.
- PBKDF2: Older, uses an underlying PRF (typically HMAC-SHA256) with a configurable iteration count. Not memory-hard, so vulnerable to GPU acceleration. Still widely used because it is a NIST-approved algorithm available in FIPS-validated modules. If FIPS compliance is a hard requirement, PBKDF2-HMAC-SHA256 with 600,000+ iterations is acceptable; otherwise prefer Argon2id.
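To make the idea of tunable cost concrete, here is a minimal sketch using `hashlib.scrypt` from Python's standard library. scrypt is used here only because it ships with Python, whereas Argon2id needs a third-party package. The cost parameters are illustrative assumptions, kept small enough to fit OpenSSL's default memory limit, and are not a recommendation; the function names are hypothetical.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (about 16 MiB of memory per hash).
# Production values should be chosen by benchmarking on real hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for a new password using a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Raising `n` increases both time and memory per guess, which is the lever a fast hash such as SHA-256 does not offer.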
A Practical Decision Framework
The choice of hash function depends almost entirely on what you are hashing:
File integrity / checksums (non-adversarial): SHA-256 is the standard. It is fast, widely supported, and produces a digest small enough to include in documentation. MD5 still appears in many tools for this purpose and is not dangerous in a purely accidental-corruption context, but SHA-256 has no downsides and should be preferred.
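A file checksum is usually computed in chunks so that large files never need to fit in memory. The following is a sketch with Python's `hashlib`; the function name `sha256_file` and the chunk size are assumptions for illustration.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

A caller would compare the result against the published checksum, for example `sha256_file("download.iso") == published_hex.lower()`, where both the file name and `published_hex` are placeholders.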
Digital signatures and certificates: SHA-256 minimum. SHA-384 or SHA-512 for higher-security contexts or long-validity certificates. Never MD5 or SHA-1.
HMAC for API authentication: HMAC-SHA256. Some legacy systems use HMAC-SHA1; migrate when possible.
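As a rough illustration of HMAC-SHA256 request signing, the sketch below uses Python's standard `hmac` module. The function names, the shared secret, and the message layout are hypothetical; a real API would define exactly which request fields are signed.

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message under secret, as hex."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, signature: str) -> bool:
    """Check a hex signature using a constant-time comparison."""
    expected = sign(secret, message)
    return hmac.compare_digest(expected, signature)


secret = b"shared-secret-for-illustration-only"
request_body = b'{"amount": 100, "currency": "EUR"}'
tag = sign(secret, request_body)
print(verify(secret, request_body, tag))
print(verify(secret, request_body + b" ", tag))
```

Using `hmac.compare_digest` instead of `==` avoids leaking, through timing, how many leading characters of a forged signature were correct.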
Password storage: Argon2id with at least m=19456 (19 MiB), t=2 iterations, p=1. If your framework only supports bcrypt, use cost 12 or higher, raise the cost as hardware improves, and rehash on login. Never a bare fast hash (MD5, SHA-1, SHA-2, or SHA-3), and never unsalted hashes of any kind.
Random token generation (API keys, session tokens, password reset links): Do not hash at all during generation — use a cryptographically secure random number generator. If you store the token in a database for comparison, SHA-256 of the token is acceptable (tokens are high-entropy, so slow hashing is unnecessary).
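One way to sketch this token pattern in Python uses the standard `secrets` module for generation and SHA-256 for the stored form. The function names are hypothetical, and the 32-byte token length is an assumption rather than a requirement from the text.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Return (token for the user, SHA-256 hex digest to store server-side)."""
    token = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
    stored = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, stored


def token_matches(presented: str, stored: str) -> bool:
    """Hash the presented token and compare it to the stored digest."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored)


token, stored_digest = issue_token()
print(token_matches(token, stored_digest))
print(token_matches("guessed-token", stored_digest))
```

Only the digest is stored, so a leaked database does not hand an attacker usable tokens.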
The Salting Rule That Never Goes Away
Whatever password hashing function you use, it must incorporate a unique random salt per password. Without salts, two users with the same password produce the same hash, enabling precomputed rainbow table attacks. Most Argon2id and bcrypt libraries generate the salt automatically and store it inside the encoded hash string. PBKDF2 requires you to generate and store the salt explicitly, typically 16 bytes from a CSPRNG.
A password hashing implementation that does not salt is broken regardless of which underlying algorithm it uses.
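Explicit salt handling with PBKDF2 might look like the sketch below, which uses `hashlib.pbkdf2_hmac` from Python's standard library. The record layout (`scheme$iterations$salt$hash`) and the function names are assumptions made for illustration, not a standard format.

```python
import hashlib
import hmac
import os

DEFAULT_ITERATIONS = 600_000  # iteration count mentioned earlier in the text


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Return a self-describing record containing the salt and derived key."""
    salt = os.urandom(16)  # unique random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${derived.hex()}"


def verify_password(password: str, record: str) -> bool:
    """Recompute the key using the stored salt and iteration count."""
    _scheme, iterations, salt_hex, derived_hex = record.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(derived_hex))


# The same password hashed twice yields different records because of the salt.
first = hash_password("hunter2")
second = hash_password("hunter2")
print(first != second)
print(verify_password("hunter2", first), verify_password("hunter2", second))
```

Storing the iteration count next to the salt lets the count be raised later without invalidating older records.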
Migration Strategy for Legacy Systems
If you have an existing system using MD5 or SHA-1 password hashes, you cannot convert them directly to a new algorithm, because that requires the original passwords. The standard approach: on each successful login, verify the submitted password against the old hash, then re-hash it with the new algorithm and replace the stored hash. Flag accounts that have not logged in by a cutoff date and force a password reset. This migration is gradual, but it never requires handling plaintext passwords outside the normal login flow.
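The rehash-on-login approach can be sketched as follows. Everything here is hypothetical: the user record is a plain dictionary, the legacy scheme is assumed to be unsalted MD5 stored as hex, and the new scheme is PBKDF2-HMAC-SHA256 from the standard library, standing in for whichever password hashing function the system adopts.

```python
import hashlib
import hmac
import os

NEW_ITERATIONS = 600_000  # assumed work factor for the new scheme


def new_record(password: str) -> dict:
    """Build a record for the new scheme with a fresh random salt."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, NEW_ITERATIONS
    )
    return {"scheme": "pbkdf2_sha256", "salt": salt,
            "iterations": NEW_ITERATIONS, "hash": derived}


def login(user: dict, password: str) -> bool:
    """Verify a login attempt, upgrading legacy MD5 records on success."""
    if user["scheme"] == "md5":
        legacy = hashlib.md5(password.encode("utf-8")).hexdigest()
        if not hmac.compare_digest(legacy, user["hash"]):
            return False
        # Password is correct: replace the legacy hash in place.
        user.update(new_record(password))
        return True
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), user["salt"], user["iterations"]
    )
    return hmac.compare_digest(derived, user["hash"])


# A legacy account is upgraded by its first successful login.
account = {"scheme": "md5", "hash": hashlib.md5(b"letmein").hexdigest()}
print(login(account, "letmein"), account["scheme"])
```

In a real system the updated record would be written back to the database inside the same login transaction.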
File checksum migrations are simpler: recalculate and republish. If you distribute software with MD5 checksums, generate SHA-256 equivalents and update your documentation. Old checksums should be explicitly marked as deprecated, not quietly removed.
The Bottom Line
MD5 is broken for security purposes. SHA-1 is broken for security purposes. Both still appear in production systems and tutorials with alarming frequency. SHA-256 is the correct general-purpose cryptographic hash for most applications. For password storage, SHA-256 is wrong — use Argon2id, bcrypt, or scrypt instead. SHA-3 is the algorithm to watch if SHA-2 is ever compromised. These are not preferences or style choices — they reflect specific, measurable attack capabilities against specific algorithmic weaknesses.