Passphrase entropy

A while back I was interested in coming up with a passphrase that would result in the same keypresses when typed on Colemak and Qwerty keyboard layouts. I concluded that it would be too hard to get a reasonable amount of entropy because there are only 13 keys that hold the same position in both layouts.

Tonight on a whim I went back to perform a more precise calculation using this quick and dirty script:

#!/usr/bin/env node

const {readFileSync} = require('fs');

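// Physical key rows, top to bottom, as they appear on each layout.
const qwerty = `
  qwertyuiop
  asdfghjkl;
  zxcvbnm,./
`;

const colemak = `
  qwfpgjluy;
  arstdhneio
  zxcvbkm,./
`;

// Strip whitespace and turn a layout into an array of keys in position order.
const normalize = layout => layout.replace(/\s+/g, '').split('');

// Keep only the keys that occupy the same position in both layouts.
const getCommon = (a, b) => a.filter((char, index) => char === b[index]);

// Escape "." (harmless inside a character class, but explicit is clearer).
const escape = char => char === '.' ? '\\.' : char;

const chars = getCommon(normalize(qwerty), normalize(colemak));
const regexp = new RegExp('^[' + chars.map(escape).join('') + ']+$');

// One word per line; lowercase so capitalized entries can match too.
const words = readFileSync('/usr/share/dict/words', 'utf8')
  .split('\n')
  .map(s => s.toLowerCase());

const filtered = words.filter(word => word.match(regexp));

console.log(
  `Of ${words.length} words,\n` +
  `${filtered.length} words contain only ${chars.length} common keys\n` +
  `(${chars.join(', ')}).\n`
);

filtered.forEach(word => {
  console.log(`  ${word}`);
});

console.log('\nEntropy (bits) for an n-word passphrase:\n');
const bitsPerWord = Math.floor(Math.log2(filtered.length));
for (let i = 1; i < 10; i++) {
  console.log(`${i} word${i > 1 ? 's': ''}: ${bitsPerWord * i} bits`);
}

console.log(
  '\nFor comparison, dictionary words each have about 14 bits of entropy\n'
);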

What’s this script doing? It’s scanning through the 235,887 words in /usr/share/dict/words and collecting the pool of just 132 words made up entirely of characters whose keys sit in the same position on both Colemak and Qwerty, then printing out some entropy info at the end for passphrases of different word lengths.

For those too lazy to run it, here’s the output:

Of 235887 words,
132 words contain only 13 common keys
(q, w, a, h, z, x, c, v, b, m, ,, ., /).


Entropy (bits) for an n-word passphrase:

1 word: 7 bits
2 words: 14 bits
3 words: 21 bits
4 words: 28 bits
5 words: 35 bits
6 words: 42 bits
7 words: 49 bits
8 words: 56 bits
9 words: 63 bits

For comparison, dictionary words each have about 14 bits of entropy

What can we conclude from all this?

  • If we drew words directly from /usr/share/dict/words without regard to layout, we could get an excellent 17 bits of entropy per word (for comparison, the word list used by 1Password is apparently only large enough to deliver about 14 bits per word). Unfortunately, many of the words in this list aren’t practical to use (consider an early example like "abdominohysterectomy", which nobody is ever going to accept), so we’re not really claiming 17 bits of entropy in the real world.
  • Our Colemak/Qwerty hybrid words have about half the entropy per word (a measly 7 bits), meaning that you need your passphrase to be twice as long to match the entropy you’d get with standard dictionary words: for 56 bits of entropy, for example, you’d need an 8-word passphrase instead of a 4-word one. It’s not going to be particularly memorable either, as it will end up being something like "mamba waxhaw zax macaca habab cachaza wab azha". I’ll grant that that’s pretty fun to say out loud, but that’s not a redeeming quality for a passphrase.
  • I guess we could inject more entropy by adding numbers and symbols, but the base set of words to draw from is still sucky.
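The numbers in these bullets all come from one formula: n words drawn uniformly and independently from a pool of N words give n × log2(N) bits. Here’s a small sketch (the helper names are hypothetical, not part of the script above) that skips the script’s Math.floor rounding and picks words with Node’s cryptographically secure crypto.randomInt:

```javascript
const {randomInt} = require('crypto');

// Bits of entropy for wordCount words drawn uniformly and independently
// (with replacement) from a pool of poolSize words.
const passphraseBits = (poolSize, wordCount) => wordCount * Math.log2(poolSize);

// Smallest number of words that reaches at least targetBits.
const wordsNeeded = (poolSize, targetBits) =>
  Math.ceil(targetBits / Math.log2(poolSize));

// Pick words with crypto.randomInt (a CSPRNG) rather than Math.random.
const generatePassphrase = (pool, wordCount) =>
  Array.from({length: wordCount}, () => pool[randomInt(pool.length)]).join(' ');

console.log(passphraseBits(132, 1).toFixed(2));     // 7.04
console.log(passphraseBits(235887, 1).toFixed(2));  // 17.85
console.log(wordsNeeded(132, 56));                  // 8
console.log(wordsNeeded(2 ** 14, 56));              // 4
console.log(generatePassphrase(['mamba', 'waxhaw', 'zax', 'macaca'], 4));
```

The exact figure for the 132-word pool is a hair over 7 bits, which doesn’t change the conclusion: 8 words to reach 56 bits versus 4 words from a 14-bit list.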

I’m going to stick to my boring existing passphrase for now. For reference, and in case I forget it, it is "rosemary horde shotgun portrait".

Searching for the Holy Grail

I recently came to the conclusion that, when it comes to programming languages, there is no Holy Grail, just a bunch of cups, and perhaps we should just get on with drinking.

I’ve worked with a lot of languages over the years. I started as a 10-year-old writing in Commodore BASIC on the Commodore 64, progressed to AmigaBASIC, AMOS and finally 68000 assembly on the Amiga 500, and briefly dipped into x86 assembly during my short stint as the owner of a 486SX33. At college I did programming courses that used Ada.

Sadly, I don’t have any records of any of the programs that I wrote back in those days, although I fondly recall some. Whatever remains of them, fragilely stored on flimsy floppy disks or inscribed on dot-matrix printouts, is probably decomposing under countless layers of land-fill by now.

I got my first Mac in the mid-90s. My earliest Mac programs were in REALbasic, but with the birth of the web I wrote "discussion boards" with flat-file "databases" in Perl, "weblogs" in PHP, and I learned HTML, CSS and JS, as you do. Almost all of that early stuff is lost too.

It’s not until the 2000s that my personal fossil record has some surviving evidence. I used the CVS, Subversion, and SVK version control systems, and some of the artifacts that I created in those systems survived into the Git era, where I expect they’ll enjoy a much longer digital half-life.

I wrote large projects in C and Objective-C on the desktop, and for the web I learned Ruby and went deep into JS. These were my professional languages that paid the bills for me, but along the way I explored and wrote side-projects in Haskell, Go and Lua. At work I sometimes have to dip into Hack. I’ve written my fair share of Vimscript and Bash (etc.) stuff. Sometimes, I write snippets of Python. My latest experiments have been with Reason.

Looking back over that history, I envy the simplicity of the constrained choices that I used to make. I wrote BASIC because it came with the computer. I learned assembly because it was the only tool for the job (if by "the job" you meant writing games with fast graphics). Some of the choices were totally serendipitous and not really choices at all; I learned AMOS because someone gave me a copy of it. I learned Perl because it was basically "the way" that CGI was done back in the day. I learned C and Objective-C because those were the languages with which you could access Apple’s APIs and build desktop apps.

Starting with Rails, the pace of innovation and renewal seemed to really take off. Languages and frameworks were spawned en masse, duking it out with one another to achieve domination and establish a "new paradigm". The choices became less clear, the list of contenders impractically long. For all the languages I did learn, there are many more that I didn’t have time to even dabble in (not just the older languages — Erlang, Java, C++ — but countless others that they have spawned or influenced: Dart, Rust, Elm, PureScript, CoffeeScript, Elixir, Scala, Kotlin, D and many others).

I think there are some pretty clear conclusions to draw here: either we as language designers aren’t very good at what we do, or as human beings we’re afflicted with a crippling case of not-invented-here syndrome and an evolved inability to be happy and productive with the tools that we’re given. Either way, the costs are immense: we spend a disproportionate amount of time and energy reinventing our tooling relative to how much we invest in actually building things with those tools. It’s easy to mistake motion for progress, I guess.

Why am I writing about this now? I think it’s because I’ve felt more and more restless about JavaScript. JavaScript is an increasingly multi-paradigm language that wants to be all things to all people. It has "Feature Envy". Its aggressively forward-looking development model (the whole TC39 process with its keen involvement by significant industry players with the power to effect meaningful change in the deployed ecosystem, and the availability and widespread use of tools like Babel that enable rapid and aggressive experimentation), together with the need to maintain backwards compatibility forever (you can’t "break the web"), means that it accrues an ever-growing, never-shrinking set of functionality and syntax. It has (or will have) almost every feature available in any other language, up to and including a Kitchen Sink capability, but will probably stop short of the one thing that I really wish it had: a real, not-bolted-on type system. JavaScript is simultaneously the best language to teach to beginners (there is no cheaper way to build something with a UI that will run anywhere) and the worst (welcome to an ecosystem with a rapidly churning set of tooling that’s metastasizing fast enough that it may well collapse into a black hole one day).

Yet, I’ve come to the conclusion that all this searching for something better is a fool’s errand. You don’t make something better by combining the best bits of other languages. The most you can do is to take an opinionated stance about something that you consider to be really important, focus on getting that one thing really right, and then get on with the business of building useful things with it. Note that it doesn’t suffice to be just opinionated; you also have to be focused. Here are some examples of opinionated, focused stances:

  • Haskell:
    • Core thesis: Fully unlock the power of abstraction with a purely functional, lazily evaluated core.
    • Advantage: You get an expressive, sophisticated type system that allows you to succinctly materialize ideas with a high degree of machine-assisted verification.
    • Trade-off: Some things, like modifying deeply nested immutable data structures, are hard.
  • Go:
    • Core thesis: Simplicity is paramount.
    • Advantage: Out of simple primitives you can build robust, highly-performant concurrent solutions.
    • Trade-off: Code is "boring", "verbose", "pedestrian".
  • C:
    • Core thesis: Abstraction is overrated.
    • Advantage: Speak to the von Neumann architecture in its native tongue (almost) to build fast things, without needing to learn processor-specific assembly language.
    • Trade-off: When you build stuff out of gunpowder, wire, and spark plugs, you may just singe off your eyebrows.

If you try to make these languages better by blending together their best elements you wind up with behemoths like JavaScript and C++. These are not bad languages; they’ve been extremely successful, and many great things get built using them. And yet, people can’t resist somehow trying to "fix" them, either by augmentation or outright replacement. Something is rotten in the state of programming languages.

Inevitably with the good stuff comes some baggage. Sometimes the elements don’t combine well. You can’t make a better Ruby, for example, by adding Haskell’s strong static typing to it, because what you’d get wouldn’t be the loose, fluid, pleasant expressiveness of Ruby: you’d just get Haskell with an awkward syntax. I’ve previously remarked that programming in Ruby is like driving a rubber car without a seatbelt; that sounds like fun in a weird kind of way, and it is. By the time you’ve added static typing, the language is no longer a car, nor is it made of rubber, and I guess it doesn’t really make sense to ask whether it has a seatbelt any more (whatever "it" is). Large, growing languages inevitably tend towards resembling Frankenstein’s monster over time.

All of this may strike you as being perilously close to plain old common sense, and you may wonder why I’m bothering to write it. Why would I ever have thought that there was a Holy Grail in the first place?

It crept up on me insidiously at first. About 10 years ago I first got exposed to the idea that you should learn a new programming language every year, not necessarily to add to your practical tool-kit, but to expand your mind. Haskell was a popular choice for this purpose back then. The notion was that you should seek out "novel" ideas — note this means novel to you and not necessarily something universally recent — and train your mind by grappling with them. At the very least you kept your mental axe sharp by exercising it, and at best you might stumble across something that subtly (or even dramatically) changed your world view and in some nebulous, hard-to-articulate way, "made you a better developer".

Fast forward ten years: I haven’t learned ten new languages, but I have made significant incursions into about five. Even though I never lost sight of the reason why I was going through this whole exercise, a primitive, subconscious part of my brain was wondering if I would end up finding "The One": the language that would somehow feel so "right" and enable me to be so effective that I’d be able to stop searching and settle down for a decade, or two, or three. I’d forgotten that I wasn’t engaged in a search at all. This is what reading too many blog posts with titles like "Why we’re rewriting everything from X to Y" will do to you, given enough time. You start to think like a believer.

This year I was looking around for a new Language of the Year to dive into, but was struggling to find something that met my novelty criterion. Reason/OCaml felt too similar to Haskell. Rust definitely had some novel ideas, but it failed my other criterion: practical applicability (in the sense that I needed to find a well-suited, useful, motivating side-project in which to try it out).

I finally decided to go with Reason despite the relative lack of novelty, and this meant honestly confronting the fact that at least part of me (the irrational part) had been holding out, hoping to find a Holy Grail candidate. I had to let go of that. I went in knowing Reason would have some real strengths (e.g. fast compile times, great developer experience, solid performance, etc.) and some downsides (e.g. spartan documentation, few examples, and so on). There’d also be some mixed-bag stuff, like the small community, which you can consider a blessing or a curse depending on how you look at it.

All languages are going to suck in some way. But the bright side is that we have so many choices available to us now that we can choose languages that suck in the ways that we can tolerate, and conversely, excel in the areas that really matter to us. For me, Reason is interesting because two of the things that it gets really right (having a solid type system, and language-level support for immutable records) happen to represent a couple of the things whose absence really annoys me about JavaScript. I can compromise a lot on the rest — take the good, and the bad — because ultimately it doesn’t matter that much to me. I’m looking for a cup to drink out of, not a fountain of eternal youth.

In closing, I want to make sure that this post doesn’t end up being yet another "Why I’m rewriting everything from X to Y". I’ll continue to use the other "cups" I have in my cupboard. The cup I choose at any particular moment will depend on the circumstances. I am sure that every now and again I’ll want to try out a new vessel or two. But if you ever see me with a distant, glazed-over look in my eyes, and I look like I’m about to ride off towards the horizon in search of some mythical language that doesn’t exist — or worse still, I start showing signs of wanting to design my own — please try to snap me out of it. A slap in the face and, if that doesn’t work, a bucket of cold water should do the trick.

Building Relay Modern

There’s a long backstory about the development of Relay Modern that’s been bubbling around in the back of my head for a while. As I write this, version 1.0.0 is out, we’ve published an official blog post introducing the new version, and people out in the community have had time to write some useful introductory posts about it. There are already quadrillions of Facebook users getting their data delivered to them via Relay Modern, and even more importantly, I’ve ported my blog over to it. Seems like as good a time as any to tell this story, or at least part of it.

If you’re a GraphQL aficionado, recovering JS framework author (or user), or are simply interested in the question of how best to manage data flow in complex server/client applications, then I’m writing this for you.

Hello, Relay

I started working on Relay back in early 2014 when it wasn’t open source, wasn’t called Relay, and had only recently decided to be in the business of bridging the gap between React and GraphQL (it originally started off as a new routing solution, or so the legend has it). GraphQL was still a pretty young technology at that point, but it had seen rapid uptake and was used extensively across our native apps.

Like any emergent technology, GraphQL had some growing pains. Because of this, Relay set out not just to bring GraphQL to JavaScript — and note that that meant not just the web but also mobile, via the still-secret React Native project — but to rethink some of the assumptions that had been made in the native apps up to that point.

  • One of the big ideas was query colocation — the notion that you should be able to specify your data requirements for each view component inside the view itself and that the framework should transparently handle aggregation and efficient fetching.
  • Another was that we could totally eliminate overfetching by dynamically constructing queries at runtime based on a comparison of what data the developer had asked for in their component and the data that the framework had already stashed away in the cache as the result of previous queries.
  • Finally, we figured that GraphQL fragments — the basic unit of re-use that allows developers to assemble queries out of a bunch of otherwise redundant parts — should be parameterizable; that is, fragments should be able to take arguments, just like functions do, so that they could be used flexibly in multiple places without duplication.
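To make the second idea concrete, here is a toy sketch of diffing a query against cached data so that only the missing fields get fetched. The representation (a query as a nested object with `true` leaves, the cache as one nested object) and the `diffQuery` name are hypothetical and far simpler than what Relay actually did: no lists, arguments, or fragments.

```javascript
// Return the subset of `query` that `cached` cannot satisfy, or null if
// everything is already in the cache. Leaves of the query are `true`.
function diffQuery(query, cached) {
  const missing = {};
  for (const [field, selection] of Object.entries(query)) {
    const value = cached?.[field];
    if (value === undefined) {
      // Nothing cached for this field: fetch the whole selection.
      missing[field] = selection;
    } else if (selection !== true && value !== null) {
      // Linked object is cached: recurse to find missing sub-fields.
      const sub = diffQuery(selection, value);
      if (sub !== null) missing[field] = sub;
    }
  }
  return Object.keys(missing).length > 0 ? missing : null;
}

const query = {viewer: {name: true, friends: {count: true}}};
const cache = {viewer: {name: 'Ada'}};
console.log(JSON.stringify(diffQuery(query, cache)));
// {"viewer":{"friends":{"count":true}}}
```

The name is already cached, so only the friend count would go out over the network.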

This was a super exciting time to be at Facebook. React was taking off, and React Native, Relay, Flow and GraphQL were all angling towards open source release. There was a real sense that we had something awesome to share with the world.

The (First) Great Rewrite

As we approached the open source release, we realized it was time to rewrite a substantial part of Relay. GraphQL was going to come out in open source with a minimal but pretty rigorous spec, a new syntax and some subtle corrections and improvements from the organically grown internal implementation. We had some long standing bugs that we wanted to fix, and a bunch of ideas on how to improve performance by making use of immutable data structures. Not truly immutable ones, mind you, as JavaScript doesn’t have those: but ones that we’d build out of standard old mutable JS objects, and with which we’d carefully implement structural sharing and copy-on-write semantics, with Flow providing some assurances that we weren’t mutating things we didn’t own.
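The copy-on-write idea can be sketched in a few lines of plain JavaScript. `setIn` is a hypothetical helper, not a Relay API: it copies only the objects along the path being updated and shares every other subtree with the original, leaving the input untouched.

```javascript
// Return a tree with the value at `path` replaced. Only the plain objects
// on the path are copied; all other subtrees are shared by reference.
// (Arrays would need their own copying logic; this handles objects only.)
function setIn(tree, path, value) {
  if (path.length === 0) return value;
  const [key, ...rest] = path;
  const child = tree == null ? undefined : tree[key];
  const updated = setIn(child, rest, value);
  // Unchanged subtree: reuse the original object instead of copying it.
  if (tree != null && updated === child) return tree;
  return {...tree, [key]: updated};
}

const before = {user: {name: 'Ada', friends: {count: 3}}, settings: {theme: 'dark'}};
const after = setIn(before, ['user', 'friends', 'count'], 4);
console.log(after.user.friends.count);            // 4
console.log(before.user.friends.count);           // 3
console.log(after.settings === before.settings);  // true (shared)
console.log(after.user === before.user);          // false (copied)
```

Because unchanged subtrees keep their identity, a cheap `===` check is enough to tell whether anything under a given node changed.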

This is where I must introduce Joe Savona. Joe was pretty new at Facebook at the time, but he joined the Relay team and dived into tackling some of the hardest problems we had to solve in the rewrite. In fact, his continual production of new ideas was one of the things that fueled the desire to actually go ahead and do the rewrite, for real. We had always lived with a long backlog of stuff that we’d love to get to some day, some of it quite "moonshotty", but Joe had a talent for translating those ideas into a series of ordered, achievable steps. We came up with some pleasant APIs for traversing and transforming trees (query ASTs, data trees), and set about rewriting the guts, heart, brain, and peripheral appendages of Relay. I presented a deep dive on some of this stuff back in 2015 that you can watch if you want to learn more.

This was some of the most intellectually interesting work I’ve done, working on hard problems among talented, inspiring, hard-working peers. My favorite part was adding support for nested "deferred" queries. For the first part of this, I adapted somebody else’s very clever code for splitting apart a heterogeneous tree into a version that could do so recursively. Tied my brain in knots doing it. I then got to rewrite it on top of our new APIs and the result was satisfyingly simple compared to the old version. The same was true for all the other traversals that we had to reimplement. We finished the rewrite, open sourced Relay, and rode off into the sunset.

The (Second) Great Rewrite

Not quite. The sunset bit. Releasing the project was only the beginning. We had an ever-growing internal user-base at FB with increasingly demanding and diverse workloads to fulfill. We were faced with a critical problem: despite the fact that Relay was recently re-written and much better architected, it was crumpling under the weight of its own complexity. As we added features such as query persistence (the ability to reduce query upload sizes by saving the query on the server and sending up an identifier instead of the full query text), garbage collection, integration with offline disk caches on native platforms, and sophisticated new APIs for dealing with "connections" (paginated collections), we found ourselves frustrated with the speed at which we could make progress. This thing was intricate and complicated, hard to modify, stupefyingly magical and unpredictable.

We still had that long backlog of ideas, but we knew that we were adding to the tail of the queue faster than we were shifting from the head of it. It was a scary prospect, but we came to the conclusion that it was time to burn it all down and rewrite the thing from scratch. We knew we had to unlock performance wins that would require drastic changes, and rewriting was the only way we were going to be able to do it before old age, senility and burn-out took us out of the game. A risky move — big rewrites are often warned against for a reason — but we felt like we had to take the gamble. In doing the rewrite, we knew that the risk of failure (in typical "Second System Syndrome" fashion) was real, but inaction would have led to certain failure.

Everything old is new again

I can still remember the day in early 2016 when Joe and I grabbed a room in MPK 20, that fancy, Frank Gehry-designed thing with a park on the roof, and stood in front of a whiteboard wall to try and imagine what "Relay 2" would look like if we let go of all our previously held assumptions.

What if every query in Relay were statically known?

Woah, that’s crazy talk, Joe. What are you talking about? I’d been spending so long inside the bubble of the Relay philosophy (the one with the tenets about query colocation, dynamic query construction, and fragment parametrization) that I’d never really considered this. Those tenets were already in place before I joined the team, and I assumed, perhaps naively, that they must have been there for a good reason; people who’d been at Facebook much longer than I and had witnessed the birth and evolution of GraphQL had decided that there must be a better way, and something new should be tried. It never occurred to me that embracing the static, the rigid, the "inflexible" could be a step forward. Funny that I hadn’t, seeing as I had just prior to this built a new static API for writing Relay mutations (data updates) that aimed to replace the magical dynamism of the existing Relay mutations with something more predictable, debuggable and teachable.

But Joe hadn’t just considered it; he’d had the idea circling around in his conscious and unconscious mind for possibly weeks or months. He’d given it deep, painstaking thought, and he was nearing the conclusion that it just might work. Fully static queries, known ahead of time, would unlock new kinds of performance optimization by allowing us to burn cycles at build time precomputing optimal structures that would allow us to go faster at run time. And with static queries, we’d get query persistence effectively "for free", just like the native apps.
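Why persistence comes "for free" can be sketched as a build step plus a server-side lookup: if every query is known ahead of time, its ID can be computed once at build time. Everything here is an assumption for illustration: the helper names, MD5 as the ID hash, and the `{id, variables}` request shape are not Relay’s actual wire format.

```javascript
const {createHash} = require('crypto');

// Build time: hash each statically known query's text and record id -> text
// (in a real setup this map would be uploaded to the server).
function persistQueries(queryTexts) {
  const persisted = new Map();
  for (const text of queryTexts) {
    const id = createHash('md5').update(text).digest('hex');
    persisted.set(id, text);
  }
  return persisted;
}

// Run time, client side: send the short id instead of the full query text.
const makeRequestBody = (id, variables) => JSON.stringify({id, variables});

// Run time, server side: look the full text back up before executing it.
function resolveQuery(persisted, body) {
  const {id, variables} = JSON.parse(body);
  const text = persisted.get(id);
  if (text === undefined) throw new Error(`Unknown persisted query: ${id}`);
  return {text, variables};
}

const text = 'query UserQuery($id: ID!) { node(id: $id) { id } }';
const store = persistQueries([text]);
const [id] = store.keys();
const body = makeRequestBody(id, {id: '4'});
console.log(body.length < text.length);              // true
console.log(resolveQuery(store, body).text === text); // true
```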

So, back to that question.

What if every query in Relay were statically known?

It was heresy, but we went through the exercise anyway, figuring out what each of the existing APIs would look like if we wiped the slate clean and started from scratch without dynamic, runtime query construction. It meant giving up some features, jettisoning some magic. In return, users would get predictability, performance, and an execution model that mere mortals could understand. There would be a cost though: instead of having Relay figure out a minimal set of data to refetch when parameters change, we’d require users to specify a static query ahead of time. And we’d have to rewrite everything, again, in order to implement this.

On the flip side, rewriting would mean the ability to scratch some long-felt itches, like:

  • Switching to a purely POJO-based representation for cache data.
  • Abstracting all low-level record access behind a thin facade API that would allow us to plug in different kinds of underlying storage (including native data structures, mediated by a JavaScript-to-native bridge).
  • Aligning our terminology, API shape, and data-flow with the latest thinking on the iOS and Android side (for better interoperability and communication).
  • Dropping support for legacy GraphQL (pre-open source) syntax.
  • Splitting the code up into separate "compiler", generic "core/runtime" and "React" packages.
  • Implementing deterministic, performant garbage collection.
  • And many others.
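The POJO and facade items can be illustrated together with a sketch in which callers only ever see four methods, so the backing store can be swapped without touching them. The class and method names are hypothetical. A JavaScript `Map` happens to have the same `get`/`has`/`set`/`delete` methods, so it stands in here for a second backend; a native-backed store would be another object with that shape.

```javascript
// The facade: everything goes through get/has/set/delete by data ID.
class RecordSource {
  constructor(storage) { this._storage = storage; }
  get(dataID) { return this._storage.get(dataID); }
  has(dataID) { return this._storage.has(dataID); }
  set(dataID, record) { this._storage.set(dataID, record); }
  delete(dataID) { this._storage.delete(dataID); }
}

// Default backend: records are plain old JavaScript objects in a POJO map.
class ObjectStorage {
  constructor() { this._records = Object.create(null); }
  get(id) { return this._records[id]; }
  has(id) { return id in this._records; }
  set(id, record) { this._records[id] = record; }
  delete(id) { delete this._records[id]; }
}

const pojo = new RecordSource(new ObjectStorage());
const mapped = new RecordSource(new Map());
for (const source of [pojo, mapped]) {
  source.set('4', {__id: '4', name: 'Ada'});
  console.log(source.get('4').name, source.has('5')); // Ada false
}
```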

Relay Modern

As a tiny hat-tip to risk management, we decided to build a toy prototype before fully committing to the rewrite. Joe spent about two weeks building a little React Native app that could render and paginate through a list of friends, and navigate to a simple "permalink" view using two or three static queries. "I think it’s going to work", he said. So Joe and I started again, this time for real. It took us about 3 months to implement the new core, while in the meantime other Relay team members continued adding features to the existing codebase.

We knew perf was going to matter, so I built a microbenchmarking framework that uses the Wilcoxon Signed Rank test to give us an accurate picture of whether any given change made things better or worse. We maintained great test coverage and made sure everything was thoroughly Flow-typed. I built a "golden" test runner (this predated Jest’s "snapshots" feature) to enable us to maintain a large body of tests easily even as we made frequent changes to our internal query representation. I made a sample React Native app so that we could run on-device benchmarks. Basically, I was scrambling as fast as I could to lay down most of the supporting work while Joe built core abstractions on top. It was amazing to work with such a motivated, talented collaborator. Striving to keep up, providing deep code review on his diffs (which he knocked out at a humbling clip and level of quality), and the countless stimulating discussions around whiteboards: I know that the experience made me a better developer. It was deeply rewarding.
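For the curious, here’s a textbook version of the Wilcoxon signed-rank test applied to paired benchmark timings. It’s a sketch, not the framework described above: it uses the normal approximation (only reasonable for roughly 20 or more non-zero differences) and skips the tie correction to the variance.

```javascript
// Paired samples: the same benchmarks timed before and after a change.
function wilcoxonSignedRank(before, after) {
  const diffs = before
    .map((b, i) => after[i] - b)
    .filter(d => d !== 0)                        // zero differences are dropped
    .sort((x, y) => Math.abs(x) - Math.abs(y));
  const n = diffs.length;

  // 1-based ranks of |d|, with tied values sharing their average rank.
  const ranks = new Array(n);
  for (let i = 0; i < n; ) {
    let j = i;
    while (j + 1 < n && Math.abs(diffs[j + 1]) === Math.abs(diffs[i])) j++;
    const averageRank = (i + j + 2) / 2;         // mean of ranks i+1 .. j+1
    for (let k = i; k <= j; k++) ranks[k] = averageRank;
    i = j + 1;
  }

  let wPlus = 0;
  diffs.forEach((d, i) => { if (d > 0) wPlus += ranks[i]; });
  const wMinus = n * (n + 1) / 2 - wPlus;

  // Normal approximation to the distribution of W+ under "no change".
  const mean = n * (n + 1) / 4;
  const sd = Math.sqrt(n * (n + 1) * (2 * n + 1) / 24);
  const z = sd === 0 ? 0 : (wPlus - mean) / sd;
  return {n, wPlus, wMinus, z};
}

// Hypothetical timings (ms); negative z means "after" tends to be faster.
const result = wilcoxonSignedRank([10, 12, 11, 13, 10], [9, 11, 11, 12, 8]);
console.log(result.wPlus, result.wMinus, result.z.toFixed(2)); // 0 10 -1.83
```

A |z| above about 1.96 would suggest a real difference at roughly the 5% level (two-sided); the five-sample example is far too small for that to mean much and is only there to show the mechanics. A rank-based test is attractive for benchmarks because a few wild outliers can’t dominate it the way they can dominate a comparison of means.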

The moment of truth came when we were finally able to run an on-device normalization benchmark. (Normalization is the term we use for processing a query response from the network, transforming it from a hierarchical form into a flattened, "normalized" representation for storage in the on-device cache.) We knew Relay Modern should be faster because it was drastically simpler, we’d taken great care to avoid performance anti-patterns, and we were simply doing much less work at run-time. When the benchmarks came in we were a little stunned. We ran them again. We sanity-checked them on multiple devices. The results were consistent: normalization in Relay Modern was about 10 times faster. It’s true that normalization is just one of the things that Relay has to do, but it was clear that we were onto something. Relay Modern was going to be great for mobile devices and spotty networks. It would perform great on desktop environments too, but we’d aimed to solve a harder problem and it looked like that’s exactly what we’d done. The bet had paid off.
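Normalization itself can be sketched as a recursive walk that pulls every object with an `id` out into a flat map and leaves a link in its place. This is a toy under loud assumptions: every normalizable object carries an `id` field, and the `normalizeResponse` name and `{__ref: id}` link shape are made up. A real normalizer would typically be guided by the query’s structure rather than by inspecting the response.

```javascript
// Flatten a nested response into records keyed by id, replacing each
// identified object with a {__ref: id} link. Objects without an id stay inline.
function normalizeResponse(response, records = {}) {
  const visit = value => {
    if (Array.isArray(value)) return value.map(visit);
    if (value === null || typeof value !== 'object') return value;
    const record = {};
    for (const [key, field] of Object.entries(value)) record[key] = visit(field);
    if (value.id === undefined) return record;
    // Merge, so the same entity seen twice accumulates its fields.
    records[value.id] = {...records[value.id], ...record};
    return {__ref: value.id};
  };
  return {root: visit(response), records};
}

const response = {
  viewer: {id: '1', name: 'Ada', bestFriend: {id: '2', name: 'Grace'}},
};
const {root, records} = normalizeResponse(response);
console.log(JSON.stringify(root));        // {"viewer":{"__ref":"1"}}
console.log(records['1'].bestFriend);     // { __ref: '2' }
```

Each entity now lives in exactly one place, so an update to record `'2'` is visible everywhere it is referenced.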

The happy ending

All this happened in the first half of 2016. We actually thought we were on the brink of shipping it. I spoke about it publicly for the first time in August — ill-advisedly calling it "Relay 2" because we didn’t have a better name for it yet — and Joe followed soon after. We had a few road bumps on the way which led us to delay shipping; I’m sorry that it took so long, but I’m really happy to say that the product is finally out the door.

Between finishing the new core and shipping 1.0.0 there has been a lot of thankless grunt work done by a bunch of people on the team. It was a group effort, but in particular:

  • Yuzhi led an amazing effort to migrate thousands of Classic components and educate teams.
  • Jennifer built out prefetching (the ability to have native code on a mobile device start fetching a query for a React Native app before the JavaScript VM has even finished booting).
  • Jan did a fantastic job of making sure we had a great migration strategy and compatibility API for moving existing apps over from Relay Classic to Relay Modern.
  • Lee helped us prepare and package everything for an open source release.
  • Our manager Alex was a roving support agent who tirelessly helped out with anything and everything.

But this post is in large part a tribute to Joe Savona. Neither of us is working directly on Relay any more, but the experience will forever loom as an indelible and transformative part of my Facebook story. As a colleague, erstwhile neighbor, and friend, working on Relay with Joe was a once-in-a-lifetime experience. I’m sure that Relay will continue to be an important building block for teams at Facebook, and I hope that it’s useful to teams in the external community as well, but no matter what direction the framework ends up evolving towards in the future, I know that the design and architecture will retain elements of Joe’s brilliant touch for a long time to come. Thank you, Joe, and keep on hacking.