Paul Irish recently gave a talk at Bocoup with a dire warning: “The Mobile Web is in Trouble.” The general theme of the talk, somewhat apparent from the title, revolved around developers moving to native apps for performance, capabilities and discoverability. Near the end of the talk he offered a challenge–developers working on the Open Web should focus on expanding the capabilities of our apps and the Open Web in general.
One of the core preserves of native applications is numerical simulation and modeling. Numerical simulation is commonly used to model and predict the behavior of systems which cannot be modeled analytically or which have a significant random component. Let’s get the core objection out of the way: it’s crazy to perform numerical simulation in JavaScript. There are a half-dozen languages with full featured simulation and modeling capabilities built into the language or well maintained libraries1.
But let’s be crazy. Over the next few weeks I’ll be covering numerical simulation techniques in JavaScript, starting today with an overview of random number generation, then generating normally distributed random variables (along with other distributions) and finally adding in a robust resampling method to the Miso Project’s Dataset library.
What’s the matter with Math.random()?
JavaScript already has a facility to generate uniformly distributed random variables: Math.random(). It seems to cover all our needs, but let’s look at what ES5 has to say about the expectations for output:
Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.
Let’s not (yet) get hung up on the distinction between random and pseudo-random2. The real problematic area is “using an implementation-dependent algorithm or strategy.” Application designers can rely on Math.random() to provide a black-box of entropy. In that regard it’s pretty good. All of the modern browsers with open source engines use fairly decent random number generators and they’re all pretty fast. Why isn’t that enough?
Pseudo-random number generators output a long, non-recurring sequence of numbers. The exact value of that sequence is determined by an initial starting value referred to as the “seed”. For any given seed the sequence is apparently random but known and recoverable so long as we know the position and the seed. Each implementation of Math.random() requires a seed and most seed from the current time or a source of OS randomness (if available). As the spec indicates, Math.random() does not accept a seed value from the user and even if it could, the actual generator varies across browsers; Chrome, Firefox, Safari each use different random number generators3.
Chrome(v8): Uses a version of the Multiply-with-carry generator.
Firefox (SpiderMonkey): Uses a linear feedback shift generator similar to Java’s. There was discussion to move to the Mersenne Twister but the decision was made to wait until the Crypto API was implemented and simply use that.
Safari (JavaScriptCore): Used the Mersenne Twister until recently, then switched to a newer generator described here.
We can imagine two broad uses for random numbers where an implementation like Math.random() would be insufficient: simulation and procedurally generated content. Both of these require a generator which is both apparently random and perfectly repeatable.
A trustworthy simulation is one where the exact output can be reproduced deterministically. Scientists increasingly depend upon numerical simulation to prove theorems, test systems and inform policy decisions. A seedable random number generator is crucial to reproducibility of those results. But we don’t need to be simulating collisions in the Large Hadron Collider in order to want a means to reproduce our results! If we wrote a physics simulation and wanted users to be able to share their results, it would be physically impossible to do so using Math.random().
Most random number generation in games is perfectly suited to Math.random(). In general games want a fast source of entropy and don’t need client-side cryptographic strength RNGs. If we want to compute ricochets in a first person shooter, we care a lot about speed and apparent randomness–less so about the period of the generator or our ability to recover the exact numbers generated.
However if we’re working with procedurally generated content Math.random() puts us in a bind. We can generate content from the entropy provided but we can’t repeat or share the seed. If we wanted to build a side scroller where players explore procedurally generated landscape and let people play in the same landscape asynchronously, we want to be able to generate identical output for multiple players while storing a minimal amount of state. A seedable random number generator could allow players to share worlds in a web game just easily as they share Minecraft seeds.
One thing I haven’t mentioned is cryptography. The W3C has a draft Crypto API and there are a few good libraries which provide nominally cryptographic strength random number generators (namely CryptoJS and the Stanford JavaScript Crypto Library). But in general we do not want to do crypto in the browser. No amount of cleverness on the part of a developer can stop a client machine from simply reassigning our random number generator to reallyRandomYouGuys() { return 0.9;} and the client is always in the hands of the enemy.
How a random number generator works
There are a host of different random number generators, each with their own advantages and (occasionally fatal) flaws. One of the simplest is a linear congruential generator (LCG). It’s predecessor, the Lehmer generator, is even simpler but we can explore the LCG a bit to get a feel for how a computer generates random numbers.
A LCG works by exploiting a relationship between a (the multiplier), c (a shifter) and m (the modulus). The output z is an integer and serves to seed the next round of the generator. The apparently random numbers are generated by the modulus operation wrapping the value of a * z + c around m4. Each output z will range from 0 to m. A good generator will produce a very long sequence of apparently random numbers. The length of this sequence is referred to as the period of the generator (less commonly, the cycle). A properly configured LCG should have a period equal to its modulus, in this case, 25.
For longer streams with the same seed this pattern will repeat perpetually. At first this doesn’t seem too useful but if the period of our generator was something like 2^50 (perfectly achievable for some generators) we could generate a million numbers every second from now until 2038 and not run out. Most LCGs of practical values use a very large modulus in order to establish a long period. Unfortunately for us, the period isn’t the only thing which characterizes a random number generator.
Let’s make some random numbers!
Don’t worry, they’re not all like this.
I’ve written a number of implementations of standard random number generators in JavaScript. Each are written in a different style either to match the implementation in the browser (e.g. Chrome’s MWC generator or WebKit2’s invertible mapping generator) or to expose the basic function of the generator as with the linear congruential generator or the linear feedback shift generator.
Performance of these generators in the barest sense is actually pretty good. Comparisons with Math.random() across a number of browsers indicate that most basic generator are relatively competitive. Also included is David Bau’s ARC4 implementation as an example of a full featured, robust generator. If your primary goal is not speed and you’re balking at writing a strong random number generator from the ground up, Bau’s implementation is well documented, updated and generally very good. You can see the full source code here.
Broadly speaking, the lesson is that we don’t lose too much in performance by avoiding native code. So long as we choose our random number generator with a specific purpose in mind, generating robust, reliable sequences in JavaScript should be fast enough for most needs5.
Where do we go from here?
Over the next few weeks we’ll dig into generating numbers from distributions other than the uniform distribution by a number of methods. If you want a rough preview of what that might involve, you can take a look at one method for generating normally distributed variables. We’ll explain how that method works as well as a few others. As we step through creating the building blocks for numerical simulation in JavaScript we’ll also illustrate a few tests for randomness and methods to visualize the performance of a generator.
Basically what R is built for.↩
Pseudo-random versus random discussion has probably spilled more pointless bits than any debate which didn’t involve a semi-colon.↩
We don’t know about IE or Opera, as their engines aren’t open source.↩
Knuth Volume III offers a rather intensive introduction to the theory behind linear congruential generators as does Hull and Dobell.↩
Even a native implementation is only about 6000 times faster than a Lehmer RNG written for the IBM/360 45 years ago!↩