OK, so this has taken a bit longer than expected (vacation and homebuying will do that to a guy), but I’ve finally gotten around to looking at (and somewhat analyzing) the data!
First, I have to thank everyone (again) who installed the add-on. I know that for some of you, it caused some serious startup and shutdown lag. Feel free to uninstall whenever you like (I’ve already stopped the data collection server from actually collecting any data, so it’s not providing me with any more information anyway).
Now, for the interesting bits:
There were almost 5300 data sets submitted. I have no idea who submitted each data set, so if one of you submitted 5300 times, you’ve seriously ruined the validity of my data! ;)
Out of all those data sets, no one had more than 512MiB of data in the cache that was re-used from a previous session.
Even more interesting, only 3 of you ever used more than 128MiB of data from a previous session.
Of course, the amount of data that was re-used isn’t necessarily the most important thing. If you re-use only 128MiB but happen to visit a lot of new sites, your total cache usage will climb well past that. We wouldn’t want to set our max at 128MiB in that case, because the new data would keep evicting the useful data, causing lots of churn. So now, let’s take a look at some more useful data.
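To make “churn of useful data” a little more concrete, here’s a minimal sketch of how a single submission might be classified. The field names and the churn model are my own simplification for illustration; they’re not the add-on’s actual submission format or Firefox’s actual eviction logic.

```python
def churns_useful_data(reused_bytes: int, total_bytes: int, max_size: int) -> bool:
    """Return True if a cache capped at max_size would likely evict
    data the user was going to re-use next session.

    reused_bytes: bytes in the cache that were re-used from the
                  previous session (the "useful" data).
    total_bytes:  total bytes the cache held this session (useful
                  data plus newly cached data).
    """
    if total_bytes <= max_size:
        # Everything the session touched fits under the cap, so
        # nothing gets evicted at all.
        return False
    # Eviction has to happen. Assuming the eviction algorithm is doing
    # The Right Thing (keeping the entries most likely to be re-used),
    # useful data only starts getting pushed out once it no longer
    # fits under the cap by itself.
    return reused_bytes > max_size
```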
With the exception of 2 of you, no one ever used enough data to cause serious churn with our current maximum size (assuming our eviction algorithm is doing The Right Thing).
The vast majority (approximately 98%) would experience little to no churn of useful data (data that they would likely re-use next session) with a cache limited to 512MiB (half the current max size).
Even if we did limit to 128MiB, 87% of people would experience little to no churn of useful data.
Using a cache size of 320MiB (chosen because it mirrors the size of Chrome’s cache, which is architected very similarly to Firefox’s), 95% of people would experience little to no churn of useful data.
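For the curious, the percentages above could be recomputed from the raw submissions along roughly these lines, re-using the churns_useful_data sketch from earlier. The input shape is again hypothetical; only the candidate sizes come from the numbers above.

```python
MIB = 1024 * 1024
CANDIDATE_SIZES = (512 * MIB, 320 * MIB, 128 * MIB)

def percent_churn_free(datasets, max_size):
    """Percentage of submitted data sets that would see little to no
    churn of useful data with the cache capped at max_size.

    Each entry in datasets is assumed to look like
    {"reused_bytes": ..., "total_bytes": ...} -- a made-up shape for
    illustration, not the add-on's real submission format.
    """
    churn_free = sum(
        1 for d in datasets
        if not churns_useful_data(d["reused_bytes"], d["total_bytes"], max_size)
    )
    return 100.0 * churn_free / len(datasets)

def report(datasets):
    # Print the churn-free percentage for each candidate max size.
    for max_size in CANDIDATE_SIZES:
        pct = percent_churn_free(datasets, max_size)
        print(f"{max_size // MIB}MiB cap: {pct:.0f}% churn-free")
```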
That last point is especially interesting, given that Google has noticed performance issues with a cache larger than 320MiB. See William Chan’s post for more details.
So what’s the end result of all this? So far, nothing. This does give us the information to make a (hopefully) intelligent decision in the near future, though. I think it’s safe to say we’ll likely reduce the default max cache size at some point. We still have to decide precisely what the new default will be, as well as whether we want to make the change in one release (cue the sound of millions of hard disks spinning as they delete massive amounts of unused disk cache entries), or spread the “pain” out over 2 or 3 releases.