
Mnemonic phrases and how a wallet gets created

The previous lesson ended on a sentence that should bother you: "the private key, and only the private key, is your identity." If the private key is 32 random bytes, how is a human supposed to back that up safely? You can't memorise hex; you'll mis-type it; you'll lose the paper. The answer is one of the most elegant standards in cryptography, and it's the same standard used by almost every wallet you'll ever touch.

The problem with 32 random bytes

A private key looks like this:

4f3edf983ac636a65a842ce7c78d9aa706d3b113b37b6b1da19c89e5fcca7c84

64 hex characters. 32 bytes. Cryptographically random. This is your entire identity on a blockchain. There is no recovery from the issuer because there is no issuer.

Now imagine you have to write that on a piece of paper, store it somewhere safe, and read it back correctly five years from now. You will:

  • Mis-write one character and never notice
  • Confuse 0 and O, or 1 and l
  • Lose the paper to a flood, a fire, or just plain misplacement
  • Find it ten years later and not remember which wallet it belonged to

This is a real problem. Early cryptocurrency users lost meaningful amounts of money to exactly this failure mode. So in 2013, a standard was proposed that encoded a wallet's root randomness, 128 to 256 bits of it, as something humans could actually back up: a short sequence of ordinary English words.

That standard is BIP 39, and almost every wallet you'll meet uses it.

The full pipeline

Here's the end-to-end process that runs when you click "create a new wallet" in any modern wallet application.

  1. Entropy: 128 or 256 random bits from the OS entropy source
  2. Mnemonic: 12 or 24 words from a fixed 2048-word list, encoding the entropy plus checksum bits
  3. Seed: 64 bytes via PBKDF2-HMAC-SHA512 over the mnemonic + optional passphrase
  4. Master key, child keys, and addresses: a whole tree of key pairs, derived deterministically from the seed

Four stages, each with a specific job. Only the first stage is random; every stage after it is deterministic: the same input always produces the same output. This is why a mnemonic written on paper is a complete backup of every key and every address the wallet has ever generated, even ones it hasn't shown you yet.

Stage 1: Entropy

The wallet asks the operating system for high-quality random bytes: 128 bits for a 12-word mnemonic, 256 bits for a 24-word one. (BIP 39 also allows 160, 192, and 224 bits, giving 15, 18, and 21 words, but 12 and 24 are what you'll usually see.) The bytes come from the OS entropy pool, the same pool that secures TLS connections and SSH sessions on the same machine. A wallet that uses anything weaker is a broken wallet, full stop.

This step is short to describe and disastrous to get wrong. Past wallet failures with weak randomness have produced predictable private keys that attackers have systematically drained. The math of the rest of the pipeline only protects you if this first step is genuinely unpredictable.
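In code, Stage 1 comes down to one call to a secure random source. A minimal sketch, assuming a Node.js runtime and its built-in node:crypto module; generateEntropy and wordCount are hypothetical helper names, not part of any wallet library:

```typescript
import { randomBytes } from "node:crypto";

// Entropy sizes BIP 39 permits: 128 to 256 bits, in steps of 32.
const ALLOWED_ENTROPY_BITS = [128, 160, 192, 224, 256];

// Words in the mnemonic: the checksum adds ENT/32 bits, and each word carries 11 bits.
function wordCount(entropyBits: number): number {
  if (!ALLOWED_ENTROPY_BITS.includes(entropyBits)) {
    throw new Error(`unsupported entropy size: ${entropyBits} bits`);
  }
  return (entropyBits + entropyBits / 32) / 11;
}

// randomBytes is Node's cryptographically secure generator, backed by OS-provided entropy.
// Math.random() is NOT suitable here: it is not designed to be unpredictable.
function generateEntropy(entropyBits = 128): Buffer {
  wordCount(entropyBits); // throws on an unsupported size
  return randomBytes(entropyBits / 8);
}

const entropy = generateEntropy(128);
console.log(entropy.toString("hex")); // 32 random hex characters, different on every run
console.log(wordCount(128), wordCount(256)); // 12 24
```

The word count is just arithmetic: 128 bits of entropy plus 4 checksum bits is 132 bits, or twelve 11-bit words.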

Stage 2: Mnemonic generation

The wallet takes the entropy bytes, computes a short checksum, and appends it to the entropy. The checksum is the first ENT/32 bits of the SHA-256 hash of the entropy, where ENT is the entropy length in bits: 4 bits for 128-bit entropy, 8 bits for 256-bit. The resulting bit-string (132 bits for a 12-word mnemonic) is split into 11-bit chunks. Each chunk is a number between 0 and 2047. That number indexes into a fixed list of 2048 English words, defined once and shared by every BIP 39-compliant wallet on earth.
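The checksum-and-split step fits in a few lines. This sketch assumes Node.js; entropyToWordIndices is a hypothetical name, and it stops at word indices because the official 2048-word English list is not reproduced here:

```typescript
import { createHash } from "node:crypto";

// Bytes -> "0101..." bit-string, most significant bit first.
function toBits(bytes: Buffer): string {
  return Array.from(bytes).map((b) => b.toString(2).padStart(8, "0")).join("");
}

// Convert entropy into the 11-bit word indices of a BIP 39 mnemonic.
// A wallet would then look each index up in the 2048-word list.
function entropyToWordIndices(entropy: Buffer): number[] {
  const entropyBits = entropy.length * 8;
  if (entropyBits < 128 || entropyBits > 256 || entropyBits % 32 !== 0) {
    throw new Error("entropy must be 128-256 bits, a multiple of 32");
  }
  // Checksum: the first ENT/32 bits of SHA-256(entropy), appended to the entropy.
  const hash = createHash("sha256").update(entropy).digest();
  const allBits = toBits(entropy) + toBits(hash).slice(0, entropyBits / 32);

  // Split into 11-bit chunks; each chunk is an index from 0 to 2047.
  const indices: number[] = [];
  for (let i = 0; i < allBits.length; i += 11) {
    indices.push(parseInt(allBits.slice(i, i + 11), 2));
  }
  return indices;
}

console.log(entropyToWordIndices(Buffer.alloc(16))); // eleven 0s, then 3
```

For all-zero entropy the result is eleven 0s followed by 3: "abandon" eleven times, then "about", the first of the published BIP 39 test vectors. The final 3 comes entirely from the checksum, since the last chunk holds 7 zero entropy bits and the 4 checksum bits 0011.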

A 12-word example:

abandon ability able about above absent absorb abstract absurd abuse access accident

These are simply the first 12 words of the wordlist in order, shown as an easy-to-read illustration rather than a mnemonic anyone should use. Real mnemonics look random:

army van defense carry jealous true garbage claim echo media make crunch

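The wordlist is the same for every wallet. No two of its words share the same first four letters, and it avoids near-duplicates such as plurals of other entries and words easily confused with each other. This is why "rabb" is enough to uniquely identify "rabbit" when typing into a recovery field, why a typo that isn't on the list (say, "horsex" for "horse") is rejected immediately, and why a wrong word that is on the list is usually caught by the checksum with an "invalid mnemonic" error. How often it is caught depends on the checksum length: a 12-word mnemonic's 4 checksum bits let roughly 1 in 16 such substitutions slip through, while a 24-word mnemonic's 8 bits let through about 1 in 256.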
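Validating a typed-in mnemonic runs the same steps in reverse: rebuild the bits from the word indices, split off the checksum, and recompute it from the entropy. A minimal sketch under the same Node.js assumption, again working on indices rather than words (hasValidChecksum is a hypothetical name):

```typescript
import { createHash } from "node:crypto";

// Check the BIP 39 checksum of a mnemonic given as word indices (0-2047).
// A real wallet first maps each typed word to its index and rejects
// any word that is not on the list at all.
function hasValidChecksum(indices: number[]): boolean {
  if (![12, 15, 18, 21, 24].includes(indices.length)) return false;
  if (indices.some((i) => !Number.isInteger(i) || i < 0 || i > 2047)) return false;

  // Rebuild the full bit-string: 11 bits per word.
  const bits = indices.map((i) => i.toString(2).padStart(11, "0")).join("");
  const checksumBits = bits.length / 33; // total bits = ENT + ENT/32 = 33 * ENT/32
  const entropyBits = bits.length - checksumBits;

  // Recover the entropy bytes, then recompute the checksum from them.
  const entropy = Buffer.alloc(entropyBits / 8);
  for (let i = 0; i < entropy.length; i++) {
    entropy[i] = parseInt(bits.slice(i * 8, i * 8 + 8), 2);
  }
  const hash = createHash("sha256").update(entropy).digest();
  const expected = Array.from(hash)
    .map((b) => b.toString(2).padStart(8, "0"))
    .join("")
    .slice(0, checksumBits);

  return bits.slice(entropyBits) === expected;
}

console.log(hasValidChecksum(Array(11).fill(0).concat([3]))); // true: "abandon" x11 + "about"
console.log(hasValidChecksum(Array(11).fill(0).concat([0]))); // false: "abandon" x12
```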

A 24-word mnemonic uses 256 bits of entropy plus 8 checksum bits, split into 24 chunks of 11 bits each. The same algorithm, just with more bits at the start.

Stage 3: Mnemonic to seed

The mnemonic itself is never used directly as key material. The seed is. To convert one to the other, the wallet runs:

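```typescript
import { pbkdf2Sync } from "node:crypto";

// seed = PBKDF2-HMAC-SHA512(password = mnemonic, salt = "mnemonic" + passphrase,
//                           iterations = 2048, output = 64 bytes)
// BIP 39 normalises both strings to Unicode NFKD first.
function mnemonicToSeed(mnemonic: string, passphrase = ""): Buffer {
  const password = mnemonic.normalize("NFKD");
  const salt = ("mnemonic" + passphrase).normalize("NFKD");
  return pbkdf2Sync(password, salt, 2048, 64, "sha512");
}
```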

PBKDF2 is a key-stretching function: it applies HMAC-SHA512 over and over, feeding each output back in, so that producing the result is intentionally slow. The 2,048 iterations are a deliberate cost that adds milliseconds for a legitimate user but multiplies by 2,048 the work of every guess an attacker makes when brute-forcing a weak passphrase or a partially known mnemonic.
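To make "over and over" concrete, here is PBKDF2-HMAC-SHA512 written out by hand for the single 64-byte block BIP 39 needs, checked against Node's built-in pbkdf2Sync. It illustrates the definition only (pbkdf2Sha512OneBlock is a hypothetical name); real code should call the library function:

```typescript
import { createHmac, pbkdf2Sync } from "node:crypto";

// One 64-byte block of PBKDF2-HMAC-SHA512, which is exactly the BIP 39 seed size.
function pbkdf2Sha512OneBlock(password: string, salt: string, iterations: number): Buffer {
  // U1 = HMAC(password, salt || INT(1)), with INT(1) a 4-byte big-endian 1.
  const blockIndex = Buffer.from([0, 0, 0, 1]);
  let u = createHmac("sha512", password)
    .update(Buffer.concat([Buffer.from(salt, "utf8"), blockIndex]))
    .digest();
  const result = Buffer.from(u); // copy of U1

  // U_j = HMAC(password, U_(j-1)); the output is U1 xor U2 xor ... xor U_iterations.
  for (let j = 2; j <= iterations; j++) {
    u = createHmac("sha512", password).update(u).digest();
    for (let k = 0; k < result.length; k++) result[k] ^= u[k];
  }
  return result;
}

const phrase = "army van defense carry jealous true garbage claim echo media make crunch";
const manual = pbkdf2Sha512OneBlock(phrase, "mnemonic", 2048);
const builtIn = pbkdf2Sync(phrase, "mnemonic", 2048, 64, "sha512");
console.log(manual.equals(builtIn)); // true
```

Each of the 2,048 rounds is a full HMAC-SHA512 computation, and an attacker has to pay for all of them on every guess.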

The optional passphrase is the part most users don't know exists. If you supply one (any string at all, with no length limit), the resulting seed is completely different from the seed you'd get with the same mnemonic and no passphrase. This is sometimes called the "25th word" because it acts like an extra hidden word on top of the visible 12 or 24.

A passphrase is the killer feature for plausible deniability. Your written-down mnemonic plus an empty passphrase opens an obvious wallet; the same mnemonic plus your secret passphrase opens a completely different one. An attacker with the mnemonic can drain the visible wallet but cannot reach the hidden one without also knowing the passphrase.
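The two-wallets effect is easy to demonstrate. A sketch assuming Node.js, with a hypothetical mnemonicToSeed helper that applies the Stage 3 formula, including the NFKD normalisation BIP 39 specifies; the passphrase is made up for the example:

```typescript
import { pbkdf2Sync } from "node:crypto";

// Stage 3 formula: PBKDF2-HMAC-SHA512, salt "mnemonic" + passphrase, 2048 rounds, 64 bytes.
function mnemonicToSeed(mnemonic: string, passphrase = ""): Buffer {
  return pbkdf2Sync(
    mnemonic.normalize("NFKD"),
    ("mnemonic" + passphrase).normalize("NFKD"),
    2048,
    64,
    "sha512",
  );
}

// A published example mnemonic: never use it for real funds.
const words = "army van defense carry jealous true garbage claim echo media make crunch";
const visible = mnemonicToSeed(words);                         // empty passphrase
const hidden = mnemonicToSeed(words, "correct horse battery"); // hypothetical secret passphrase

console.log(visible.equals(hidden));                // false: two unrelated seeds, two wallets
console.log(visible.equals(mnemonicToSeed(words))); // true: fully deterministic
```

Both seeds are 64 bytes and both are valid; nothing in either one reveals that the other exists.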

The output of this stage is always 64 bytes regardless of mnemonic length. That 64-byte seed is what the rest of the system actually uses.

Stage 4: Seed to master key, and everything below

The 64-byte seed feeds into a tree-derivation algorithm (specified in BIP 32) that produces a master key and then any number of child keys below it. From the master key you can derive child keys; from each child key you can derive grandchildren; and so on, giving a practically unlimited number of key pairs from one seed.

The mechanics of that tree are detailed enough to deserve a treatment of their own, and we'll come back to them in context when they actually matter. For now, the important property is this: the entire tree is deterministic. Same seed, same tree. Same mnemonic, same seed, same tree.
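The child-key mechanics can wait, but the very first step, seed to master key, is short enough to show. A minimal sketch, assuming the BIP 32 scheme over secp256k1 that Bitcoin wallets and most Ethereum wallets use, and a Node.js runtime; seedToMasterKey is a hypothetical name:

```typescript
import { createHmac } from "node:crypto";

// secp256k1 group order n, as 64 lowercase hex characters.
const SECP256K1_N = "fffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141";

// BIP 32 master key: I = HMAC-SHA512(key = "Bitcoin seed", data = seed).
// Left 32 bytes of I = master private key; right 32 bytes = master chain code.
function seedToMasterKey(seed: Buffer): { privateKey: Buffer; chainCode: Buffer } {
  const i = createHmac("sha512", "Bitcoin seed").update(seed).digest();
  const privateKey = i.subarray(0, 32);
  const chainCode = i.subarray(32, 64);

  // BIP 32 treats a key equal to 0 or >= n as invalid. Both strings are 64
  // lowercase hex characters, so string comparison matches numeric comparison.
  const hex = privateKey.toString("hex");
  if (hex === "0".repeat(64) || hex >= SECP256K1_N) {
    throw new Error("invalid master key for this seed");
  }
  return { privateKey, chainCode };
}

// Placeholder seed for illustration; in a wallet this is the 64-byte Stage 3 output.
const master = seedToMasterKey(Buffer.alloc(64, 1));
console.log(master.privateKey.toString("hex"), master.chainCode.toString("hex"));
```

Everything below the master key is derived from these two 32-byte values, which is why the whole tree is fixed once the seed is.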

This is why a 12-word mnemonic is a complete wallet backup. Restore it on a new device, and the wallet runs the same pipeline: every key and every address shows up in the same order with the same values. Nothing else needs to be saved, apart from the passphrase if you chose to use one.

What this means in practice

Three takeaways that should shape how you think about wallets from here on.

The mnemonic is the wallet. Lose it and there is no recovery. The wallet software does not have a copy. The wallet vendor does not have a copy. The chain does not know who you are, only that you control a key derived from it.

Share the mnemonic and you've shared everything. Anyone who sees your 12 or 24 words can reconstruct every private key derived from it on any device. There is no "read-only" version of a mnemonic. Phishing attacks that ask for your seed phrase are stealing the entire wallet, not a session token.

A mnemonic with a passphrase and the same mnemonic without one are two different wallets. If you use a passphrase, you must back up both the mnemonic and the passphrase, and you must back them up separately. Lose the passphrase and the funds in the hidden wallet are unreachable even though the mnemonic is intact.

The playground below has a mnemonic-generator node. It produces a fresh mnemonic. Use only the test mnemonics it generates. Never paste a real wallet's mnemonic into any web tool, including this one.

Where this goes next

You now have the full flow from "random bits" to "a usable wallet." The next lesson goes back to the cryptographic primitives that make this wallet actually do something on a blockchain: digital signatures, the operation that lets you authorise a transaction using a private key without ever revealing it.