Passphrase entropy

A while back I was interested in coming up with a passphrase that would result in the same keypresses when typed on Colemak and Qwerty keyboard layouts. I concluded that it would be too hard to get a reasonable amount of entropy because there are only 13 keys that hold the same position in both layouts.

Tonight on a whim I went back to perform a more precise calculation using this quick and dirty script:

#!/usr/bin/env node

const {readFileSync} = require('fs');

const qwerty = `
  qwertyuiop
  asdfghjkl;
  zxcvbnm,./
`;

const colemak = `
  qwfpgjluy;
  arstdhneio
  zxcvbkm,./
`;

const normalize = layout => layout.replace(/\s+/g, '').split('');
const getCommon = (a, b) => a.filter((char, index) => char === b[index]);
const escape = char => char === '.' ? '\\.' : char;

const chars = getCommon(normalize(qwerty), normalize(colemak));
const regexp = new RegExp('^[' + chars.map(escape).join('') + ']+$');

const words = readFileSync('/usr/share/dict/words')
  .toString()
  .split(/\s+/)
  .map(s => s.toLowerCase());

const filtered = words.filter(word => word.match(regexp));

console.log(
  `Of ${words.length} words,\n` +
  `${filtered.length} words contain only ${chars.length} common keys\n` +
  `(${chars.join(', ')}).\n`
);

filtered.forEach(word => {
  console.log(`  ${word}`);
});

console.log('\nEntropy (bits) for an n-word passphrase:\n');
const bitsPerWord = Math.floor(Math.log2(filtered.length));
for (let i = 1; i < 10; i++) {
  console.log(`${i} word${i > 1 ? 's': ''}: ${bitsPerWord * i} bits`);
}

console.log(
  '\nFor comparison, dictionary words each have about 14 bits of entropy\n' +
  '(source: https://security.stackexchange.com/a/62911/151988).'
);

What’s this script doing? It’s scanning through the 235,887 words in /usr/share/dict/words and collecting the pool of just 132 words that contain only characters common to both Colemak and Qwerty, then printing out some entropy info at the end for passphrases of different word lengths.

For those too lazy to run it, here’s the output:

Of 235887 words,
132 words contain only 13 common keys
(q, w, a, h, z, x, c, v, b, m, ,, ., /).

  a
  a
  aa
  aam
  ab
  aba
  abac
  abaca
  abama
  abb
  abba
  abwab
  acca
  ach
  ah
  ah
  aha
  am
  ama
  amah
  amba
  amma
  amma
  ava
  aw
  awa
  ax
  azha
  b
  b
  ba
  baa
  bab
  baba
  bac
  bacaba
  bacach
  bacca
  bach
  bah
  baham
  bahama
  bam
  baw
  bhava
  c
  c
  ca
  caam
  caama
  cab
  caba
  caca
  cacam
  cachaza
  cam
  camaca
  camb
  cava
  caw
  caza
  cha
  chaa
  chab
  chac
  chacma
  cham
  cham
  chama
  chamma
  chaw
  cwm
  h
  h
  ha
  haab
  habab
  hah
  ham
  hammam
  hamza
  haw
  hawm
  hwa
  m
  m
  ma
  ma
  maam
  mab
  maba
  mac
  mac
  macaca
  macaw
  mah
  maha
  mam
  mamba
  mamma
  maw
  max
  maza
  mazama
  mwa
  q
  q
  v
  v
  w
  w
  wa
  wa
  waac
  wab
  wac
  wah
  waw
  wawa
  wawah
  wax
  waxhaw
  wha
  wham
  x
  x
  z
  z
  za
  zac
  zach
  zax

Entropy (bits) for an n-word passphrase:

1 word: 7 bits
2 words: 14 bits
3 words: 21 bits
4 words: 28 bits
5 words: 35 bits
6 words: 42 bits
7 words: 49 bits
8 words: 56 bits
9 words: 63 bits

For comparison, dictionary words each have about 14 bits of entropy
(source: https://security.stackexchange.com/a/62911/151988).

What can we conclude from all this?

  • If we drew words directly from /usr/share/dict/words without regard to layout, we could get an excellent 17 bits of entropy per word (for comparison, the word list used by 1Password is apparently only large enough to deliver about 14 bits per word). Unfortunately, many of the words in this list aren’t practical to use (consider an early example like "abdominohysterectomy", which nobody is ever going to accept), so we’re not really claiming 17 bits of entropy in the real world.
  • Our Colemak/Qwerty hybrid words have about half the entropy per word (a measly 7 bits), meaning that you need your passphrase to be twice as long to match the entropy you’d get with standard dictionary words: for 56 bits of entropy, for example, you’d need an 8-word passphrase instead of a 4-word one. It’s not going to be particularly memorable either, as it will end up being something like "mamba waxhaw zax macaca habab cachaza wab azha". I’ll grant that that’s pretty fun to say out loud, but that’s not a redeeming quality for a passphrase.
  • I guess we could inject more entropy by adding numbers and symbols, but the base set of words to draw from is still sucky.

I’m going to stick to my boring existing passphrase for now. For reference, and in case I forget it, it is "rosemary horde shotgun portrait".

Discuss: Twitter