Filechute

A while ago I wrote about how to organize things inside your Dropbox folder. That post presents a bunch of heuristics to help you answer the vexing question of "where do I put my stuff" given a hierarchical structure in which any given item may only appear in exactly one place[1]. Since then, a couple of things have changed.

One is that I switched from Dropbox to Sync. Sync is a simpler, more focused product that offers encrypted storage and hasn’t succumbed to the slow feature creep that seems to have infected Dropbox over the years. Unlike Dropbox, Sync looks focused on building a robust and secure file synchronization service, and not a bunch of stuff that I don’t want[2]. Although I moved on some time ago, I continue to see complaints about Dropbox, citing its lack of support for Apple Silicon and its excessive CPU usage even during periods of supposed inactivity.[3]

One interesting thing about Sync is that, even though symlinks are not "officially supported", they do work in quite a reasonable way. This means that you can make things appear in more than one place at a time, although it is quite cumbersome to do so: you create symbolic links (with ln -s) and in the Sync web interface and also on the iOS client those links will look and feel just like the source files that they point at. Having said this, however, the awkwardness of managing large numbers of symlinks means that I am unlikely to make much use of this.
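To make the mechanics concrete, here is a minimal sketch of the same trick in Python rather than the shell. The folder and file names are made up for illustration, the helper `link_into` is hypothetical, and everything happens in a temporary directory on a POSIX-style filesystem rather than in a real Sync folder. It shows only what the local filesystem does, not how Sync treats the link, and it ends by showing how a link dangles once its target moves.

```python
import os
import tempfile
from pathlib import Path

def link_into(source: Path, folder: Path) -> Path:
    """Make `source` also appear inside `folder` via a symlink (like `ln -s`)."""
    folder.mkdir(parents=True, exist_ok=True)
    link = folder / source.name
    # A relative target keeps working if link and target are moved together.
    target = os.path.relpath(source, start=folder)
    os.symlink(target, link)
    return link

# Hypothetical layout, created in a scratch directory.
root = Path(tempfile.mkdtemp())
statement = root / "Finances" / "statement.pdf"
statement.parent.mkdir(parents=True)
statement.write_text("pretend this is a PDF")

link = link_into(statement, root / "Taxes" / "2021")
content = link.read_text()  # reading through the link yields the original content
print(os.readlink(link), "->", content)

# Moving the target without recreating the link leaves the link dangling.
statement.rename(statement.with_name("renamed.pdf"))
print(link.is_symlink(), link.exists())
```

Multiply that bookkeeping by every file you want to appear in two places and the awkwardness becomes clear.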

The other thing that happened is somebody shared with me their great blog post, "Designing better file organization around tags, not hierarchies". What’s interesting about that post is that it starts from the same problem ("fitting things into hierarchical structures is hard"), but arrives at a completely different destination — while my post modestly proposed a set of rules for living within a traditional filesystem, Nayuki proposes to reject hierarchical systems entirely and build something completely new.

This gave me some food for thought. Now, even though I love the content addressable storage ideas embodied in programs like Git, I am not interested in coming up with an alternative filesystem that addresses items by content (i.e. hashes) rather than paths. For all its flaws, the path-based paradigm has been dominant ever since the early days of UNIX and its existence is assumed pervasively within our operating systems and applications. Moving away from it would only bring a world of hurt, requiring a bunch of broken things to be fixed or replaced with "equivalent but probably quite buggy at first" rewrites built on the new paradigm. And the truth is, the current system is not even that bad. One of the reasons it is so dominant is precisely that: it’s not too bad. Faint praise, you may say, but it’s true.

For example, in software projects, where I spend a lot of my time, the ability to reference files by paths has proven to be perfectly adequate. You establish your conventions, whether they be things like having a models/ folder or a utils/ folder, and sticking tests in a certain location, and so on, and get to work. People have been able to build quite large systems without much friction using this pattern. In that context, alternatives like storing files in databases or looking them up by tags feel like solutions looking for a problem.

Likewise, shoving my photos into an opaque package on disk managed by Apple’s Photos.app also works reasonably well. I mean, I can get at the original files if I really need to, and letting the app abstract over all the messy details of tens of thousands of files with names like 49890870-2B89-482B-82AF-F70A4F98456A.jpeg, neatly surfacing the Exif-provided metadata, is a net win. In many ways, the app sucks, and like any large and complex software artifact, it is occasionally going to lock up, get slow, lose data, corrupt precious digital memories, or manifest bugs — but this is probably a good time to riff on the Churchillian trope: Photos.app is the worst possible photo management solution, except for all other photo management solutions…

But if we go back to the problem that originally motivated me to write my post — how to organize your Dropbox folder — I think this idea of literally throwing out the whole hierarchical filesystem abstraction is worthy of some serious consideration. A well-executed tag-based architecture can eliminate so many of the headaches that come with the doomed enterprise of maintaining a complex taxonomy. It may take some work — maybe a lot of work — to get there, but surely it’s at least worth thinking about it.

It’s true, even if you only build your new abstraction as a mere layer on top of the hierarchical filesystem (as opposed to replacing it), you’re still going to need custom applications in order to access it: probably an app for your desktop OS and another for your mobile, and while we’re at it we may as well throw in a web UI. But this is already true of Dropbox, so maybe it’s not too much of a concession (other than the fact that we’d have to write or otherwise source the darn apps on our own).

I did a little experiment. Despite writing that massive long blog post (an 18-minute read, apparently) and coming up with all those organizing principles, I’m still deeply unsatisfied with the state of my Sync (formerly Dropbox) folder. I don’t even have that much stuff in it (25GB on disk at the time of writing, or 186,000 files in 26,000 directories[4]), and I’m still unhappy with the level of chaos swirling within. When I found out that symlinks actually did work in Sync, I took them for a small spin to see whether I thought I could make it work. The experiment was brief. These symlinks were too painful to create and too easy to break[5], and ultimately even something crazy like, er, writing your own abstraction over the filesystem started to sound like it might be a better investment. Now, I don’t know if I’ll actually do that, but I can at least commit to writing a blog post about it[6].

So the thing about any tag-based alternative to a hierarchical taxonomy is that, in order to actually find stuff, the tags have to be rich. It’s not going to cut it if you have to find a needle in a haystack of 186,000 files and your stuff is barely tagged at all. This means that the friction involved in adding tags has to be near zero. In other words, though the engineering challenges of building a fast, scalable, and robust tag-based file storage mechanism sound super interesting, they all mean nothing without a UX that makes adding files into the corpus a delightful, lightweight experience.

I got to thinking about what a UI for this might look like. The search interface is easy. I’m thinking something like iTunes, where you have a list of matching items in the main pane, and above it a column-based widget for drilling down into the tag-space[7]. Get the tagging UX right, and the search problem becomes much easier, in a virtuous cycle.
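To make the column-based drill-down a little more concrete, here is a minimal sketch of how it could work over an in-memory index. Everything in it is an assumption made up for illustration: the `TagIndex` name, its methods, the tags, and the item identifiers are all hypothetical, and a real implementation would presumably sit on top of a database rather than a pair of dictionaries.

```python
from collections import Counter, defaultdict

class TagIndex:
    """Hypothetical in-memory index mapping tags to item identifiers."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)
        self._tags_by_item = defaultdict(set)

    def add(self, item, tags):
        """Record that `item` carries each of `tags`."""
        for tag in tags:
            self._items_by_tag[tag].add(item)
            self._tags_by_item[item].add(tag)

    def find(self, selected):
        """Return the items that carry every selected tag."""
        selected = list(selected)
        if not selected:
            return set(self._tags_by_item)  # nothing selected: everything matches
        sets = [self._items_by_tag.get(tag, set()) for tag in selected]
        return set.intersection(*sets)

    def next_column(self, selected):
        """Count the tags that could narrow the current selection further."""
        selected = set(selected)
        counts = Counter()
        for item in self.find(selected):
            counts.update(self._tags_by_item[item] - selected)
        return counts

# Made-up items and tags, purely for illustration.
index = TagIndex()
index.add("c7c8187f", {"bank", "statement", "2021"})
index.add("a1b2c3d4", {"bank", "statement", "2019"})
index.add("e5f6a7b8", {"tax", "2021"})

print(sorted(index.find(["bank", "2021"])))
print(index.next_column(["bank"]))
```

Each column in the widget would be populated by something like `next_column`, and each click would add one more tag to the selection passed to `find`. The counts only become useful once most items carry several tags, which is why the tagging experience matters more than the search code.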

While I was thinking this, the visual metaphor of dumping stuff into a chute came to mind. You have some document, say, a PDF from your bank with an amazing name like 967164278_02_03_2019_911687434.pdf, and you just want to get it into this system as quickly as possible and trust that you’ll be able to find it again when you need it, without ever having to think about that horrible name again. So basically, you want to be able to drag and drop this thing onto something, have it intelligently suggest some useful tags to you based on the type and contents of the file, whatever metadata can be extracted from properties in the file or perhaps associations with similarly named files that you’ve dragged in previously. And as quickly as possible, you want this thing to disappear into this black box whose internal structure you don’t need to know about (but the engineer in you is delighted to know is actually going to be something like c7/c8187f8280db72204227ee6d501ac8c7cba15d and not Finances/Bank/Statements/2021/January.pdf).
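The black box could be as simple as a Git-style content-addressed store. The following is a minimal sketch under several assumptions that go beyond the text above: SHA-1 as the hash (which matches the 40 hexadecimal characters of the example path), a two-character fan-out directory, and hypothetical helper names `ingest` and `suggest_tags`. The tag suggestions are deliberately dumb, based only on the file extension and any four-digit year in the name. Anything smarter would need to look inside the file.

```python
import hashlib
import re
import shutil
import tempfile
from pathlib import Path

def ingest(source: Path, store: Path) -> Path:
    """Copy `source` into `store` under a path derived from its content hash."""
    # Reads the whole file into memory, which is fine for a sketch.
    digest = hashlib.sha1(source.read_bytes()).hexdigest()
    # Git-style fan-out: the first two hex characters name the directory.
    destination = store / digest[:2] / digest[2:]
    destination.parent.mkdir(parents=True, exist_ok=True)
    if not destination.exists():  # identical content is stored only once
        shutil.copyfile(source, destination)
    return destination

def suggest_tags(source: Path) -> set[str]:
    """Guess a few tags from the file name alone (a deliberately naive heuristic)."""
    tags = set()
    if source.suffix:
        tags.add(source.suffix.lstrip(".").lower())
    # Treat a standalone four-digit run starting with 19 or 20 as a year.
    tags.update(re.findall(r"(?<!\d)((?:19|20)\d{2})(?!\d)", source.name))
    return tags

# Try it out in a scratch directory with a stand-in for the bank statement.
scratch = Path(tempfile.mkdtemp())
statement = scratch / "967164278_02_03_2019_911687434.pdf"
statement.write_bytes(b"pretend this is a bank statement")

stored = ingest(statement, scratch / "store")
print(stored.relative_to(scratch / "store"))
print(sorted(suggest_tags(statement)))
```

Because the stored path depends only on the content, dropping the same file in twice is harmless, and the horrible original name survives only as a source of tag suggestions.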

And funnily enough, a word like "Dropbox" describes pretty much exactly what you would want this thing to be: a place where you can painlessly drop things and trust that they will be "taken care of", so to speak (and not the hideous reality that Dropbox and all similar products actually are: unfathomably large junk drawers in which you play out your small part in contributing to the eventual heat death of the universe).

But the name "Dropbox" is taken, obviously, so the next thing that pops into mind for this vaporware is "Filechute"[8], hence the name of this article. Giving something a name makes it sound real, but make no mistake about it: I probably won’t build anything like this unless I suddenly find myself on an unexpected sabbatical year, and I just wanted to commit the idea to writing in the hope of visiting it more seriously some time in the future.

This thing wouldn’t seek to be a replacement for the hierarchical file system, and it wouldn’t seek to be something that would, for example, interoperate with all those legacy APIs and applications that expect a hierarchical filesystem by presenting a FUSE-powered compatibility layer. It wouldn’t be optimized for documents that change frequently, but instead lean in hard to the dropbox/filechute analogy where the entire thing would be optimized for doing two things with minimum effort: adding stuff and then finding it. It wouldn’t try to take over your iTunes or Photos libraries, nor manage your software projects. It really would just focus on managing that collection of typically immutable documents that we all end up shoving into Dropbox or an equivalent, usually in the belief that we might need it someday, but also knowing that the effort we expend on keeping it all organized is almost certainly barely worth it.

I think I’ll stop here, as that basically lays out the high-level vision, and from here the only direction to go is the indeterminate "off into the weeds". I am interested in going there in due time, so watch this space for a follow-up in which I get into some details about what I think the underlying architecture might be, and maybe I’ll even sketch up some mocks for the kind of UI I’m imagining. In the meantime, thanks for reading!


  1. Other than the obvious workaround of copying a file so that it appears in multiple places, with the evident downside that if you ever edit one of these copies, you’ve now created a divergence. ↩︎

  2. And not just "don’t want" in the sense of "this thing is not interesting to me", but rather "I actively wish for this thing not to be here", because I think the lack of focus leads to a more complex, less stable product. ↩︎

  3. I switched to Resilio Sync in late 2023, so most of the references to Sync are outdated at this point. ↩︎

  4. With the small disclaimer that I’m not actually manually managing so many items — those numbers are definitely inflated by the presence of a number of machine-managed subtrees in the form of Git repositories and the like. ↩︎

  5. You can break a symlink quite easily by moving the file it points at and forgetting to recreate the symlink. And in the specific case of Sync, I am not sure what would happen if you tried editing a file that you opened via a symlink (that is, when I tested this, I only verified that you can view such a file in the iOS client and in the web, neither of which permit you to actually edit the contents — it’s not clear what would happen if you tried such an edit on another machine running the Sync client, and I’m not even sure whether it would present itself as a symlink or an apparent copy). ↩︎

  6. You’re reading it right now. ↩︎

  7. I also thought of the ability to define programmatically derived "views" (like views in a relational database) that would allow you to really bend this thing into your desired shape, but very quickly concluded that the dumb column-based UI would probably be just as effective for search, provided the quality of the tagging was good. There’s always time to build a more "sophisticated" search interface later on if the dumb version turns out to be insufficient in practice. ↩︎

  8. A name apparently already used for one or more things, based on this Google search that currently returns a little over 19,000 results. ↩︎