Today, we're happy to announce that the newest version of our Firefox extension adds support for opt-in synchronization of browser history, rounding out a lineup that includes synchronization of bookmarks, passwords, and open tabs. This latest version also includes beta support for Firefox 4. We'll be rolling this out to the entire userbase in the coming weeks, but if you're interested in a sneak preview, you can download an early version here:
To enable History Sync, navigate to the "Sync" pane of the Xmarks Settings dialog (Tools → Xmarks Settings → Sync) and click the checkbox next to History:
Do the same thing on two or more computers running Xmarks for Firefox, and those computers will automatically and silently start exchanging history information. Easy!
If you're interested in understanding more deeply how it works and some of the design trade-offs we faced to make this happen, read on below the fold. And feel free to let us know how it works for you in the comments.
History Sync vs. Bookmark Sync
When we first started thinking about history sync, we naturally viewed it as being very similar to bookmark sync, and contemplated applying our existing Bookmark Sync Server to the task. Bookmarks and History being so similar in their structure, this seemed like an easy job. And our Bookmark Sync Server is battle-hardened and extremely reliable. Have you ever noticed that when you have a hammer every problem looks like a nail?
The problem crops up when you consider the volume of data generated by millions of users surfing and syncing their history. How many pages do you visit for each bookmark that you create? 100? 1000? Our Bookmark Sync Server is scalable, but as a small company, scaling a free service by a factor of 1000 is really out of reach for us. So we needed another approach.
Trying Not To Boil the Ocean
While there are lots of similarities between syncing bookmarks and syncing history, there are plenty of differences too. We started to explore those differences, looking for a more economical approach to the problem. The first observation is that users manage bookmarks, but the browser manages history. The browser takes all kinds of liberties with your history (e.g., creating new entries as you browse, purging old entries as they expire) that would not be acceptable when dealing with bookmarks. That yielded the first insight: since users don't really "own" their history in the same way that they own their bookmarks, absolute fidelity is not critical for history. So we can probably deliver an acceptable experience without syncing everything in your history.
Next, the advent of the Awesome Bar in Firefox has really changed the way that users interact with browser history data. Power users (who tend to be our biggest fans) learned soon after its introduction in Firefox 3 that the Awesome Bar allows them to get to their most frequently visited sites just by typing a few letters into the address bar. The folks at Mozilla who designed the Awesome Bar spent time developing a clever algorithm to rank items in your history so that the most useful sites appear at the top of the drop-down list when you start typing.
The algorithm is driven by "Frecency" -- a combination of the frequency with which you visit a site and the recency with which you've visited it. It tends to bubble up to the top of the drop-down list those sites that you have either visited a lot or have visited recently.
That leads neatly to the second insight: if you're going to sync only a subset of the entire browser history, make sure that you pick the sites that are most frecent, because those are the sites that are most likely to appear in the Awesome Bar. A corollary here is that good history sync must be consistent with frecency. That means that sync needs to accumulate visits to a site from all clients being synced so that frecency incorporates the sum total of all visits everywhere.
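To make the corollary concrete, here is a toy sketch in Python. It is not Firefox's actual frecency formula: the exponential decay, the 30-day half-life, and the names `toy_frecency` and `rank_by_frecency` are all assumptions chosen for illustration. It only shows why pooling visits from every client before scoring can change which sites rank highest.

```python
# Hypothetical half-life: a visit's weight halves every 30 days.
# This is an illustrative assumption, not Firefox's real weighting.
HALF_LIFE_DAYS = 30.0


def toy_frecency(visit_ages_days):
    """Sum of exponentially decayed weights, one per visit."""
    return sum(0.5 ** (age / HALF_LIFE_DAYS) for age in visit_ages_days)


def rank_by_frecency(visits_by_client):
    """Pool visits from every client, then rank URLs by the toy score.

    visits_by_client: {client_name: {url: [age_in_days, ...]}}
    """
    pooled = {}
    for visits in visits_by_client.values():
        for url, ages in visits.items():
            pooled.setdefault(url, []).extend(ages)
    return sorted(pooled, key=lambda url: toy_frecency(pooled[url]), reverse=True)


clients = {
    "laptop": {"https://news.example/": [0, 30], "https://wiki.example/": [60]},
    "desktop": {"https://wiki.example/": [0, 0, 30]},
}
print(rank_by_frecency(clients))
```

Scored on the laptop alone, the news site comes out ahead. Once the desktop's visits are pooled in, the wiki site moves to the top, which is the behavior you would want the Awesome Bar to reflect on both machines.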
The final insight here is that, inasmuch as the browser purges expired items from your history as part of its regular operation, deletes don't really need to be synced. That is, we don't have to sync the deletion of old history items from one browser to another, because the other browser will probably purge those expired items on its own. That does neglect one use case that we decided wasn't so important: syncing the manual deletion of an item from history. We figured that was a rarity, and in any event, users with foresight can gain the same protection by switching the browser into private browsing mode.
So what's the result of this exploration? It's all about the Awesome Bar, and feeding it the right data so that it provides good results when users type. It's not about making sure that every change to history is synced to every other browser. We want to make sure that users can go back and forth between two computers they use regularly and get at sites that they visited recently on either computer, and also be able to set up a brand new computer and have the history get populated with the user's most frequently visited sites.
Given these requirements, our Bookmark Server looks like overkill: it provides complete data fidelity, it versions every change that a user makes so that they can do backup & restore, etc. For this problem, we need something lighter.
Happily, we've got another tool in our shed: to deploy tab sync, we developed a fast, lightweight server that is well-matched to the problem of syncing open tabs. It turns out that it's rather well tuned for syncing history too. Here's how it works:
Periodically, the Xmarks addon queries Firefox's local history database, looking for the most frecent URLs, a set that changes constantly as you browse. Having determined the set of frecent URLs, it then finds actual visits made on this browser to those URLs. It builds a compact representation of this data and sends it to the server.
Then it queries the server for history data from other browsers. If there is any, the addon downloads it and adds any as-yet unseen visit data to the local history database, making it appear as if you had made those visits on this browser.
That's it. In actual deployment, the key variable is the size of the data being pushed from each browser to the server: the larger the data, the more history can be exported from one browser to another. But more data means more bandwidth and computation (for us and you). We've currently set the limit at 32KB, which typically allows for 200 or so URLs. So far, that looks like a decent compromise, but we welcome feedback. Do you tend to find the things you're looking for in the Awesome Bar?
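For readers who think in code, the export and import steps might be sketched roughly as follows. This is a simplified illustration, not the addon's real implementation: the JSON encoding, the dictionary layout, and the names `build_payload`, `merge_remote`, and `encode` are assumptions. Only the 32KB budget comes from the description above.

```python
import json

MAX_PAYLOAD_BYTES = 32 * 1024  # the 32KB limit described above


def encode(payload):
    """Serialize compactly; JSON is an assumed format, chosen for readability."""
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def build_payload(local_visits, frecency, limit=MAX_PAYLOAD_BYTES):
    """Pack visits for the most frecent URLs until the size budget is reached.

    local_visits: {url: [visit_timestamp, ...]} for visits made on this browser
    frecency:     {url: score}, where a higher score means more frecent
    """
    payload = {}
    for url in sorted(local_visits, key=lambda u: frecency.get(u, 0), reverse=True):
        candidate = dict(payload)
        candidate[url] = sorted(local_visits[url])
        if len(encode(candidate)) > limit:
            break  # adding this URL would exceed the budget
        payload = candidate
    return encode(payload)


def merge_remote(local_visits, remote_payload):
    """Add visits from another browser that are not yet in local history."""
    added = 0
    for url, timestamps in json.loads(remote_payload).items():
        known = set(local_visits.get(url, []))
        new = sorted(set(timestamps) - known)
        local_visits.setdefault(url, []).extend(new)
        added += len(new)
    return added


laptop = {"https://a.example/": [100, 200], "https://b.example/": [150]}
desktop = {"https://a.example/": [100]}
scores = {"https://a.example/": 5.0, "https://b.example/": 1.0}

blob = build_payload(laptop, scores)
print(merge_remote(desktop, blob))  # number of visits added to the desktop
print(desktop)
```

Note that nothing here handles deletions, in keeping with the design described earlier: the merge only ever adds visits that have not been seen locally, so running it twice with the same data adds nothing the second time.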
In the coming weeks, look for history sync to make its way onto other platforms that we support.