A Generic Sync Server

Published Sunday, September 24th 2023 · 8min read

I originally wanted to use this month’s post to tell you about how I rebuilt my website in Astro, because Gridsome is pretty much dead by now and I even have trouble getting it to run at all in more recent Node versions. Sadly, I haven’t had the time (well, to be honest, the energy) to tackle this project yet.

So instead, I wanted to get some thoughts out about a concept that I’ve been thinking a lot about since last November when I kicked off the development of the next version of Qami.

Some Context

Qami is a creative writing app that I originally released in 2017 and which has been a constant companion on my design and development journey over the years. It provides a distraction-free environment for long-form creative writing and tracks how many words users write on a given day. This allows them to set writing goals and try to meet them in a low-stress and joyful manner. In short, it’s a fitness tracker for your creative writing.

The original version works by storing chapters as files in a single folder on the user’s file system. This makes syncing trivial, as all users need to do is set up this folder in a location that’s being synchronised with their regular cloud provider, such as Google Drive or iCloud Drive.

While quick and simple, this solution has a couple of drawbacks. First, users aren’t really meant to interact with the project folder or the files within it outside the application. It’s great to own the data and have it available as text files, but I stored them in a JSON-like format that has little tolerance for errors, so a careless manual edit could potentially break the app. Second, building the app in that way required direct access to the file system, which is barely possible from a web app running in a desktop browser, and even less so on a mobile device.

I want the next version of Qami to be truly cross-platform, which is why I’m building it as a progressive web app. In doing so, I’m also planning on storing the data in IndexedDB like I do with my other apps. This has multiple benefits, but offers no obvious way of syncing data between devices. I consider sync a crucial feature for an app that is all about being able to quickly jot down a poem on a phone while sitting at the lake, or more deliberately writing morning pages on a laptop in the kitchen after taking the first few sips of morning coffee, with all those words being tracked correctly across devices.

The Simple Options

Writing offline-first applications that sync isn’t trivial, but it has been done before. I could either fall back on an existing library or framework, or build my own solution tailored to Qami’s needs. Either way, I’d have to provide and administer the necessary infrastructure, which increases cost and overhead, something I’d ideally like to avoid.

There’s also a somewhat new, batteries-included solution for syncing IndexedDB databases available for Dexie, the library I use to interact with IndexedDB. Dexie Cloud is built specifically to make creating and running offline-first syncing web apps a breeze. It sounds like a dream; however, I’d also be offloading my users’ data to a (closed-source) third party that I’d have to trust. Not exactly something I’d do lightly, considering what just happened to all the people using Unity for their projects.

The Best of All Worlds

Those are some solid options that I could work with. Dexie Cloud in particular would make things extremely easy for me, and since sync is an optional add-on for Qami anyway and Dexie itself is open source, lock-in would be kept to a minimum. However, I feel like there’s a place for a more general, distributed solution that could work for Qami, but also a number of other apps.

A simple, generic sync server that anyone could self-host or be a user of, no matter what apps they’re using, as long as they speak the same sync protocol.

Such a system would allow me to keep the core of the app free and let technically well-versed or privacy-conscious users sync their data with their own instance, while I host and administer an official instance for paying customers. Sync would be decoupled from the app itself.

It would also mean that I could use this form of sync for all of my apps, not just Qami, and users could buy (or self-host) a single package to sync their data in all of them.

Core Principles

For me, such a system would have to adhere to a set of core principles to be trustworthy. It would have to be:

  • Fully open-source, for longevity and self-hosting
  • Fully encrypted: no unencrypted user data should be stored remotely where it could be accessed by the system administrator
  • Easy to set up and operate
  • Trivial to self-host

Encryption in particular is something I wouldn’t want to compromise on, even if it makes things more complicated. In an ideal world, the server wouldn’t even store any unencrypted personal information such as email addresses.

Additionally, using the service should be seamless for its users. They shouldn’t be forced to remember yet another password, and they shouldn’t have to periodically re-authenticate after setting up sync for the first time. At the same time, I am a firm believer that they should own their data and thus should be able to seamlessly enable and disable sync at any time and choose whether to delete any existing remote copies.

In my initial designs, users authenticate and authorise remote operations with one-time passwords generated and sent to their email. They also receive an app-specific symmetric encryption key when they first set up sync, which they can then manually export when setting up a secondary device, for example by scanning a QR code that encodes the key.
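To make the key hand-off a bit more concrete, here is a minimal sketch of what such a QR code could encode. Everything in it is an assumption for illustration: the `sync-key` prefix, the version field, the `qami` app id and the 32-byte key length are hypothetical, and generating the key itself (for example with the Web Crypto API) is deliberately left out.

```typescript
// Hypothetical payload format: "sync-key:<version>:<appId>:<key as hex>".
const PREFIX = "sync-key";
const VERSION = 1;
const KEY_BYTES = 32; // assumes a 256-bit symmetric key

function toHex(bytes: Uint8Array): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    hex += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return hex;
}

function fromHex(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || !/^[0-9a-f]*$/.test(hex)) {
    throw new Error("Key is not valid lowercase hex");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Builds the string that the first device would render as a QR code.
function encodeKeyExport(appId: string, key: Uint8Array): string {
  if (!/^[a-z0-9-]+$/.test(appId)) {
    throw new Error("App id may only contain a-z, 0-9 and '-'");
  }
  if (key.length !== KEY_BYTES) {
    throw new Error("Expected a " + KEY_BYTES + "-byte key");
  }
  return [PREFIX, String(VERSION), appId, toHex(key)].join(":");
}

// Parses and validates the scanned string on the secondary device.
function decodeKeyExport(payload: string): { appId: string; key: Uint8Array } {
  const parts = payload.split(":");
  if (parts.length !== 4 || parts[0] !== PREFIX || parts[1] !== String(VERSION)) {
    throw new Error("Not a supported key export");
  }
  const key = fromHex(parts[3]);
  if (key.length !== KEY_BYTES) {
    throw new Error("Expected a " + KEY_BYTES + "-byte key");
  }
  return { appId: parts[2], key };
}
```

The version field is there so that the format could change later without older exports being misread. Note that the key never touches the server in this flow; it only travels from screen to camera.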

Implementation Details

Based on my initial research, there are two viable ways I could go about creating something like this. Both involve a simple server that functions as a gatekeeper for the database and hosts a basic dashboard to manage users and their (encrypted) data. What differs is the backend that actually manages synchronisation and data storage.

CouchDB / PouchDB

CouchDB is the database system powering npm (among other things). It is built for replication, which in itself is a form of sync. I know of a couple of applications that use CouchDB as their backend to provide synchronisation for their users, so it seems to be a very viable option. There’s even a compatible library that is capable of running in a browser (as well as on the server): PouchDB.

Since replication is a built-in feature, I wouldn’t have to do the heavy lifting on that front, which is a big plus. There are also existing plugins for PouchDB that allow for encryption of all data leaving the device and even encrypting the data at rest, if I wanted to.

Unfortunately, it would also mean that I couldn’t use Dexie as my interface to IndexedDB. While CouchDB seems to be moving along at a steady pace, it seems like PouchDB has had some maintainability issues in the past and some plugins haven’t been updated in years. This makes me feel like I would be tying myself to a dying technology that could cause more trouble than it’s worth in the long run. Obviously, that’s a worrisome thought for a project that’s meant to last for a long time.

From a technical perspective, I’d use the server-side component to negotiate an initial handshake when a replication request comes in and, after confirming the user’s identity, create or connect to a server-side database specific to the user and the app they’re using. With that in place, CouchDB / PouchDB should be able to handle the rest from a backend perspective.

On the frontend, I’d have to ensure the data is properly encrypted before leaving the user’s device and decrypted when new data arrives, which would then also be the point at which some form of conflict resolution should happen in an app-specific manner. In Qami’s case, this would likely mean some form of manual resolution for user-generated content and automatic conflict resolution for tracked data.
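As a rough illustration of that split, here is a minimal sketch of how the two kinds of data could be merged after decryption. All types and function names are hypothetical and not taken from Qami. It assumes word counts are tracked per day and per device and only ever grow on a given device (in the style of a grow-only counter), and that the last synced version of a chapter is available as a base for comparison.

```typescript
type DeviceCounts = Record<string, number>;      // device id -> words written
type DailyCounts = Record<string, DeviceCounts>; // ISO date -> per-device counts

interface Chapter {
  id: string;
  text: string;
  updatedAt: number; // milliseconds since epoch
}

// Tracked data: merged automatically. Each device only increases its own
// counter, so the higher value per (day, device) is always the newer one.
function mergeDailyCounts(local: DailyCounts, remote: DailyCounts): DailyCounts {
  const merged: DailyCounts = {};
  for (const source of [local, remote]) {
    for (const day of Object.keys(source)) {
      const target: DeviceCounts = merged[day] || {};
      for (const device of Object.keys(source[day])) {
        const seen = target[device] || 0;
        target[device] = Math.max(seen, source[day][device]);
      }
      merged[day] = target;
    }
  }
  return merged;
}

// The number shown to the user is the sum over all devices.
function totalForDay(counts: DailyCounts, day: string): number {
  const perDevice = counts[day] || {};
  let total = 0;
  for (const device of Object.keys(perDevice)) total += perDevice[device];
  return total;
}

type ChapterMerge =
  | { kind: "resolved"; chapter: Chapter }
  | { kind: "conflict"; local: Chapter; remote: Chapter };

// User-generated content: only resolved automatically when one side is
// unchanged since the last sync; otherwise both versions go to the user.
function mergeChapter(base: Chapter | undefined, local: Chapter, remote: Chapter): ChapterMerge {
  if (local.text === remote.text) return { kind: "resolved", chapter: local };
  if (base && local.text === base.text) return { kind: "resolved", chapter: remote };
  if (base && remote.text === base.text) return { kind: "resolved", chapter: local };
  return { kind: "conflict", local, remote };
}
```

Keeping one counter per device is what makes the automatic merge safe: 120 words on the phone and 300 on the laptop add up to 420 for the day instead of one number overwriting the other.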

Custom DB-agnostic solution with Feathers

The main issue I have with going for a Couch- / Pouch-based solution is that I would be tying myself very strongly to a project that I cannot fully control. It offers a fair bit more freedom and security compared to a closed-source solution such as Dexie Cloud, but it would also mean that anyone wanting to set up their own sync server would have to run an instance of CouchDB or PouchDB (or another database supporting the replication protocol) as well.

For the system to be truly generic, I feel like it should support many, ideally any, backend database by providing a common query language and interface to interact with it. There’s actually a framework that makes that surprisingly easy: Feathers.js. I have worked with it before and really enjoyed it. It’s well maintained and continuously improves with every release, while also providing a lot of other building blocks (such as authentication and real-time events) out of the box. Using such a system would also make it possible to run a small, personal instance of the sync server on a service like Deta, which provides a generous free tier.
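To illustrate what a common interface buys, here is a minimal sketch of a storage contract with a throwaway in-memory backend behind it. The names `StoredRecord`, `RecordStore`, `MemoryStore` and `pullChanges` are hypothetical; this is not the Feathers API, only the general idea of the sync logic depending on a small interface rather than on a specific database. Real adapters would be asynchronous, and the sketch stays synchronous purely to keep it short.

```typescript
interface StoredRecord {
  id: string;
  ciphertext: string; // opaque to the server, encrypted on the device
  seq: number;        // server-assigned, increases with every write
}

// The only surface the sync logic depends on. Any database that can
// implement these three methods could serve as a backend.
interface RecordStore {
  put(id: string, ciphertext: string): StoredRecord;
  get(id: string): StoredRecord | undefined;
  changedSince(seq: number): StoredRecord[];
}

// Simplest possible backend, useful for tests or a tiny personal instance.
class MemoryStore implements RecordStore {
  private records: Record<string, StoredRecord> = {};
  private lastSeq = 0;

  put(id: string, ciphertext: string): StoredRecord {
    this.lastSeq += 1;
    const record: StoredRecord = { id, ciphertext, seq: this.lastSeq };
    this.records[id] = record;
    return record;
  }

  get(id: string): StoredRecord | undefined {
    return this.records[id];
  }

  changedSince(seq: number): StoredRecord[] {
    return Object.keys(this.records)
      .map((id) => this.records[id])
      .filter((record) => record.seq > seq)
      .sort((a, b) => a.seq - b.seq);
  }
}

// Sync logic written against the interface, not against a database.
// The returned checkpoint is what a device would send on its next pull.
function pullChanges(store: RecordStore, since: number) {
  const records = store.changedSince(since);
  const checkpoint = records.length > 0 ? records[records.length - 1].seq : since;
  return { records, checkpoint };
}
```

Because `pullChanges` only knows about `RecordStore`, swapping the in-memory backend for a relational or document database would not touch the sync logic at all.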

The main drawback of using Feathers would be that I’d have to build the actual synchronisation logic myself, which from my initial research seems highly complex. On top of that, I’d also have to handle encryption and decryption as well as all of the offline-first capabilities, meaning saving data locally before syncing it while also keeping it up to date with remote changes. There is a project called feathersjs-offline that technically has the foundation for such a system, but unfortunately, it seems to be unmaintained at the moment.
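One small piece of that offline-first puzzle can be sketched as an outbox: every local write is recorded in a queue first and only dropped once the server has confirmed it. The `Outbox` class and its method names below are hypothetical, the transport is left out entirely, and a real version would persist the queue (for example in IndexedDB) instead of keeping it in memory.

```typescript
interface Change {
  seq: number;      // local, monotonically increasing sequence number
  recordId: string; // which record was edited
  payload: string;  // already-encrypted content
}

class Outbox {
  private queue: Change[] = [];
  private nextSeq = 1;

  // Called right after the local write succeeded, online or not.
  enqueue(recordId: string, payload: string): Change {
    const change: Change = { seq: this.nextSeq, recordId, payload };
    this.nextSeq += 1;
    this.queue.push(change);
    return change;
  }

  // What to send on the next sync attempt: only the latest change per
  // record, since older edits to the same record are superseded.
  pending(): Change[] {
    const latest: Record<string, Change> = {};
    for (const change of this.queue) latest[change.recordId] = change;
    return Object.keys(latest)
      .map((id) => latest[id])
      .sort((a, b) => a.seq - b.seq);
  }

  // Called once the server confirmed everything up to and including upToSeq.
  acknowledge(upToSeq: number): void {
    this.queue = this.queue.filter((change) => change.seq > upToSeq);
  }
}
```

If a sync attempt fails, nothing is lost: the changes simply stay in the queue and are offered again by the next call to `pending()`.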

So while this option would definitely be the more flexible one, it would also require considerably more effort and thus provide a larger surface for bugs and errors.

Moving Forward

While I haven’t made up my mind yet on which option to pursue, I believe such a generic sync system has a lot of potential. As far as I’m aware, nothing like it exists already (the closest project being Userbase, which also seems abandoned)—which makes me wary, as it must mean that it’s either extremely complex or simply not viable. However, I’m willing to try nonetheless, which is why I’m deliberately keeping the scope small. While features like collaboration and asymmetric encryption would be great to have, I want to focus on doing one thing well: providing an open-source and generic way for a user to sync their own data to any server speaking the protocol in a safe and non-restrictive way.

What do you think about this idea? Do you know of any projects attempting something similar? Did I get something fundamentally wrong? Feel free to let me know on Mastodon.

As always: thank you for reading! 😊