2015-10-15

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

PLEASE NOTE: This is a guest post by Llewelyn Richards-Ward.

When two packages appear to do the same thing, let's return to the Zen of Python, which suggests:

There should be one—and preferably only one—obvious way to do it.

Why is this particular mantra important? I think because the majority of users of packages like PyMC and PyStan are probably not programmers or statisticians. They are folk, like myself, who have an area of interest or expertise that these packages can help us understand. So the implicit invitation to choose the “better” package, frankly, is probably not in our skill set. With this in mind, I wrote to Prof. Andrew Gelman asking whether he would consider a blog post or comparison of these two packages, to add the expertise I don’t possess. He copied Bob Carpenter and Allen Riddell into the email, both of whom are subject-matter experts on Stan and Python, respectively. The subsequent discussion stimulated so many more questions that I believe it is worth sharing some of the insights they offered me. This short summary is for end users and others with a healthy degree of OCD and inquisitiveness.

How much is each implementation of Bayesian analysis used? Some (crude) indicators, counting Stack Overflow results (with Google search results in parentheses), showed:

PyMC2: 49 (4,150)

PyMC3: 173 (12,300)

Stan: 1,116 (262,000)

PyStan: 4 (4,720)

[ed. No idea how you search for Stan on Google — we should’ve listened to Hadley and named it sStan3 or something.]

This fits with Stan being the powerhouse, with PyMC3 gaining a Python following and PyStan either being so clear to use that no one asks questions, or simply not much used in Python.

What goes in…

The original question about which to choose arose after some online searching (Stack Exchange, Wikipedia) for a solid set of reasons to choose between learning PyStan or PyMC3. I’ve used PyMC2 and it is, well, Pythonic, but probably not referenced elsewhere that often. Where there are examples, they are reasonable as entry-level Bayesian modeling, but arguably also ‘toy’ examples.

In contrast, I’d read that Stan is powerful but disadvantaged by having to learn its model-coding language (actually, having never programmed in C++, I found this relatively easy, with well-documented syntax in the manual). The main drawback is not having access to attributes and methods within the text-formatted Stan language as you write the code, at least if you write it in Python. Turning statistical notation into Python function or expression code, while keeping Python’s spirit of simplicity, is what the Stan language, Theano, patsy, and others are all attempting. Each has strengths, weaknesses, and limitations, with probably no outright ‘best way’ at present. Overall, the key message is that coding a model in PyStan, PyMC2, or PyMC3 relies much more on understanding your model than on knowing how to write code.

What comes out…

Visualisation is at the heart of taking data and developing human intuition and understanding. Both PyMC and PyStan produce traces and have the capacity to simply query variables. Neither has a particularly well-developed wrapper for plots, although PyMC is clearly ahead. A step back from PyStan is the ShinyStan package (see also Introducing ShinyStan), which “can be used to explore the output of any MCMC program written in any programming language, including but not limited to Stan, JAGS, BUGS, MCMCPack, NIMBLE, emcee, and SAS”. As an aspirational plotting package for a Python implementation, either with PyStan or PyMC, it is well worth looking at. I’m unsure whether it can be imported and used via the rpy2 package. It is useful to know that there is agreement on, and intention for, collaboration between PyMC and PyStan in how plots are produced. Neither will have seaborn as a dependency; this is somewhat puzzling, as seaborn seems to handle traces from either quite well. Both can convert the trace to a pandas object, allowing use of the pandas plotting API. So there is an exciting opportunity here for matplotlib (soon to be significantly improved) gurus to join with Allen Riddell in creating the “best of all worlds” plotting API for PyMC/PyStan.
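As a small illustration of that last point, here is a sketch of converting a trace to pandas and summarizing it. The "trace" is simulated stand-in data (names and values are mine, purely illustrative), since from either package a trace reduces to named arrays of draws:

```python
import numpy as np
import pandas as pd

# Fake "trace": draws for two parameters, as either package can expose them.
np.random.seed(42)
trace = {"mu": np.random.normal(0.0, 1.0, size=1000),
         "sigma": np.abs(np.random.normal(1.0, 0.1, size=1000))}

# Convert to a DataFrame; from here the whole pandas plotting API applies,
# e.g. df.plot(kind="hist") or df["mu"].plot(kind="kde").
df = pd.DataFrame(trace)
summary = df.describe().loc[["mean", "std"]]
print(summary)
```

From the DataFrame, seaborn or matplotlib plotting is one call away, which is exactly why the lack of a polished built-in wrapper is not fatal.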

The bit in between…

‘Under active development’ probably best describes the bit in between the code for the model and the output. It is useful to read and participate in the Stan and PyMC online fora. This is why the original question about comparing PyMC3 and PyStan is simply wrong; both are excellent packages. They do different things across the two dimensions of implementation and usability. PyMC3 represents huge steps in modeling-language and sampler flexibility. Bob Carpenter wrote, “I think they’re probably behind in functions and distributions and maybe in sampler adaptation. But they’re way ahead in Python integration directness! And we so far only have continuous params. I don’t know what PyMC does for discrete params. It’s a tradeoff, because marginalizing them out where possible leads to much faster samplers but is a pain from the coding side.” For a useful overview of why Stan was created, and particularly of the strengths it possesses, Bob Carpenter’s presentation provides some key insights.

Additional considerations:

PyMC2 used MH samplers; PyMC3 uses a “No-U-Turn Sampler (NUTS; Hoffman, 2014), a self-tuning variant of Hamiltonian Monte Carlo (HMC; Duane, 1987)”. PyStan uses the NUTS variant of HMC by default, but allows other current and future customizations by virtue of its overall architecture. The expectation from NUTS/HMC is that convergence will be significantly more rapid, an expectation borne out by empirical testing. For the non-technical user, both allow use of NUTS, which means the weakness of HMC (the need to tune parameters at the start) is well managed.
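For intuition about what NUTS tunes away, here is a minimal random-walk Metropolis-Hastings sampler, the style of sampler PyMC2 used, in plain NumPy. Note the hand-chosen `step` parameter: that is exactly the sort of knob HMC/NUTS adapts automatically, and a poor choice of it is what makes plain MH converge slowly:

```python
import numpy as np

def metropolis(logp, x0, n_draws, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings. `step` must be tuned by hand;
    a self-tuning sampler like NUTS removes this burden."""
    rng = np.random.RandomState(seed)
    x = x0
    draws = np.empty(n_draws)
    for i in range(n_draws):
        proposal = x + step * rng.randn()
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.rand()) < logp(proposal) - logp(x):
            x = proposal
        draws[i] = x
    return draws

# Target: standard normal (log density up to a constant).
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_draws=20000)
print(draws.mean(), draws.std())  # roughly 0 and 1
```

In one dimension this works fine; in high dimensions the random walk wastes most proposals, which is where gradient-based HMC pulls ahead.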

The PyStan documentation offers a basic overview, but you need to read the Stan manual to really understand the code structure and depth. Similarly, the PyMC3 documentation (still in beta) is ‘emerging’, but introduces a powerful package. For a fuller treatment of current PyMC documentation, look to PyMC 2.3.4. Overall, the Stan manual is significantly more comprehensive and well written (hats off to Bob Carpenter). Additionally, there is the popular Stan-users list, and Stan’s library of pre-coded models (about 100 models from the BUGS and ARM examples).

Optimization also is important. Stan is based on C++ code; PyMC2’s optimization path is less clear. One would imagine Cython or other optimizations are possible, but in real-world analyses, where time is probably better spent on the problem rather than on solving optimization issues, this seems like a barrier to PyMC2’s utility. Enter Theano. PyMC3, through using Theano, is a huge step forward in optimization. Theano allows Python “to transparently transcode models to C and compile them to machine code, thereby boosting performance. Theano is a library that allows expressions to be defined using generalized vector data structures called tensors, which are tightly integrated with the popular NumPy ndarray data structure, and similarly allow for broadcasting and advanced indexing, just as NumPy arrays do. Theano also automatically optimizes the likelihood’s computational graph for speed and provides simple GPU integration.” (from the PyMC3 manual).
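To make the “computational graph” idea concrete, here is a toy sketch in plain Python (not Theano itself, and all class and function names are illustrative) of building an expression symbolically and then optimizing the graph before evaluation, here by folding constant subexpressions, one of the simpler optimizations such libraries perform:

```python
# Toy expression graph: nodes are either leaves (constants, variables)
# or operations over child nodes.
class Node:
    def __init__(self, op=None, args=(), value=None):
        self.op, self.args, self.value = op, args, value

def const(v): return Node(value=v)
def var(): return Node(op="var")
def add(a, b): return Node(op="add", args=(a, b))
def mul(a, b): return Node(op="mul", args=(a, b))

def fold(node):
    """Replace any subgraph whose inputs are all constants with a constant."""
    if not node.args:
        return node                      # constants and variables are leaves
    args = [fold(a) for a in node.args]
    if all(a.op is None for a in args):  # every input is a constant
        f = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}[node.op]
        return const(f(args[0].value, args[1].value))
    return Node(node.op, tuple(args))

# (2 * 3) + x: the optimizer pre-computes 2 * 3 once, at "compile" time.
x = var()
expr = add(mul(const(2), const(3)), x)
optimized = fold(expr)
print(optimized.args[0].value)  # -> 6
```

Theano does far more than this (C code generation, GPU dispatch, automatic differentiation), but the principle, optimize the graph once, then evaluate it many times per MCMC iteration, is the same.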

Missing data is more difficult to manage in PyStan/Stan; PyMC can easily leverage masked arrays, perhaps using pandas to clean the data first, to provide a tidier set of input data and/or manage missing data within the MCMC model.
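A small sketch of the two routes, using NumPy masked arrays on the PyMC side and an up-front pandas clean on the Stan side (the data is toy, purely illustrative):

```python
import numpy as np
import pandas as pd

# Raw observations with gaps, as they might arrive from a CSV.
raw = pd.Series([4.2, np.nan, 5.1, 3.8, np.nan, 4.9])

# Route 1 (PyMC-style): a masked array, where masked entries can be
# treated as parameters to impute inside the MCMC model.
masked = np.ma.masked_invalid(raw.values)
print(masked.mask)    # True where data is missing
print(masked.mean())  # computed over observed entries only

# Route 2 (Stan-style): drop or separate the missing cases up front,
# since a Stan data block cannot contain NaN.
observed = raw.dropna().values
print(observed)
```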

Discrete parameters seemed more difficult to manage in PyStan than in PyMC. Or at least that was my initial impression. However, as Bob Carpenter noted: “The key idea is that mixing is better when you marginalize out discrete parameters (as a consequence of the Rao-Blackwell theorem), and you get way more precise tail statistic estimates.” This is another useful illustration of how end-user impressions can be wrong and how we may simply not ask the right questions. A useful comparison for developing an understanding of both the theory and practice of discrete parameters and mixing is Chris Fonnesbeck’s example of the coal-mining disasters model. In PyMC it is relatively intuitive to code and solves efficiently; in PyStan, the coding is fully explained in the manual on pp. 117-122. I suggest that coding and comparing both approaches will help a lot in understanding how these packages approach problems and account for statistical and computational theory. [ed. PyMC3 and Stan models will look more similar in problems without discrete parameters]
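To see what marginalizing a discrete parameter looks like in practice, here is a toy changepoint calculation in plain NumPy, in the spirit of the coal-mining example. The rates are held fixed for simplicity (a real model would put priors on them, and the data here is simulated by me); the discrete changepoint s is summed out on the log scale, the pattern the Stan manual teaches:

```python
import numpy as np

# Toy changepoint data: early rate ~3, late rate ~1.
np.random.seed(1)
y = np.concatenate([np.random.poisson(3.0, 40), np.random.poisson(1.0, 60)])
n = len(y)

def pois_loglik(counts, rate):
    # Log Poisson likelihood, up to the constant -sum(log y!).
    return np.sum(counts * np.log(rate) - rate)

# Marginalize the discrete changepoint s out:
# p(y) = sum_s p(s) * p(y | s), with a uniform prior over s,
# computed on the log scale for numerical stability.
log_terms = np.array([pois_loglik(y[:s], 3.0) + pois_loglik(y[s:], 1.0)
                      for s in range(1, n)])
log_terms -= log_terms.max()          # stabilize before exponentiating
post = np.exp(log_terms) / np.exp(log_terms).sum()
s_hat = np.argmax(post) + 1
print(s_hat)  # should sit near the true changepoint at 40
```

The payoff Bob Carpenter describes is that `post` is an exact (conditional) posterior over s, rather than a noisy histogram of sampled discrete values.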

ODE solvers are implemented in the Stan code base. PyMC2’s capacity for this appears to rely on optimization through Cython, odeint [ed. Boost’s odeint is what we use right now in Stan 2.8 for solving ordinary diff eqs] or similar indirect methods. PyMC3’s use of ODE solvers is unknown at this stage.
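For readers unfamiliar with what an ODE solver buys you inside a model, here is a minimal example using SciPy's `odeint` (a different library from Boost's odeint, but the same idea) on an exponential-decay equation with a known analytic answer:

```python
import numpy as np
from scipy.integrate import odeint

# dy/dt = -k * y, solved numerically; the analytic answer is y0 * exp(-k t).
def decay(y, t, k):
    return -k * y

k, y0 = 0.5, 2.0
t = np.linspace(0.0, 4.0, 50)
y = odeint(decay, y0, t, args=(k,)).ravel()

# Compare against the closed form; the numerical error should be tiny.
print(np.max(np.abs(y - y0 * np.exp(-k * t))))
```

In a Bayesian model, the solver sits inside the likelihood: parameters like k become unknowns, and each MCMC iteration solves the ODE at the current parameter values.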

Stan has variational inference under development, although not yet in PyStan; PyMC3 does not appear to offer variational inference. There is another package, BayesPy, that currently offers the conjugate-exponential family (variational message passing) only, with plans for more.
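To give a feel for what variational inference does, here is a toy coordinate-ascent (CAVI) example for the textbook normal model with unknown mean and precision, the conjugate-exponential setting BayesPy targets. The update formulas follow the standard mean-field derivation; the data and prior choices are mine, purely illustrative:

```python
import numpy as np

# Simulated data: true mean 2, true precision 1.
np.random.seed(3)
y = np.random.normal(2.0, 1.0, size=200)
n, ybar = len(y), y.mean()

# Priors: mu ~ N(mu0, (lam0 * tau)^-1), tau ~ Gamma(a0, b0).
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Mean-field factorization q(mu, tau) = q(mu) q(tau); iterate the
# closed-form coordinate updates until convergence.
E_tau = 1.0  # initial guess for E[tau]
for _ in range(50):
    # q(mu) = N(mu_n, 1/lam_n)
    mu_n = (lam0 * mu0 + n * ybar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    # q(tau) = Gamma(a_n, b_n)
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (np.sum((y - mu_n) ** 2) + n / lam_n
                      + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
    E_tau = a_n / b_n

print(mu_n, E_tau)  # posterior mean near 2, posterior mean precision near 1
```

Unlike MCMC, this runs in milliseconds: VI turns integration into optimization, trading exactness for speed, which is why both camps want it.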

Scaling of algorithms is important too. Michael Betancourt presents a useful examination of why scaling in size and dimension is an ongoing challenge, rounding this out with a discussion of solutions being developed in Stan.

Stan and PyStan do not compile models as directed acyclic graphs (DAGs); PyMC3 does use the DAG approach. Because of the DAG implementation, PyMC3 can produce graphics of the model, which is a very useful feature. Stan relies on the language syntax at the start to illustrate the model.
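Why does a DAG make model graphics easy? Once a model is a graph of named nodes and parent edges, drawing it, or walking it in dependency order, is mechanical, as this toy sketch shows (node names are illustrative, loosely a simple linear regression):

```python
# Each node maps to its list of parents; this is the information a DAG-based
# package holds internally and a graph plotter would consume.
model = {
    "alpha": [],              # intercept prior
    "beta": [],               # slope prior
    "sigma": [],              # noise prior
    "mu": ["alpha", "beta"],  # deterministic: mu = alpha + beta * x
    "y_obs": ["mu", "sigma"], # likelihood node
}

def topological_order(graph):
    """Return nodes so that every node appears after all of its parents."""
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in graph[node]:
            visit(parent)
        order.append(node)
    for node in graph:
        visit(node)
    return order

print(topological_order(model))
# Each (parent -> child) pair is one arrow a graph plotter would draw.
edges = [(p, c) for c, ps in model.items() for p in ps]
print(edges)
```

Stan, by contrast, compiles the model down to a single log-density function, which is efficient but discards this node-and-edge structure.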

Although maybe not an immediate consideration for end users of these packages, it is also worth considering how each package might adapt to and solve open research questions in Bayesian computing: future-proofing. A couple of examples, amongst many no doubt, are how variational inference will be handled, or how HMC and discrete parameters interact. If these or similar future areas are relevant, or even might be, I would strongly recommend finding out where each package stands in terms of a plan to approach the problem.

The wider stuff…

Both PyMC and PyStan have support from statisticians and academics who are leaders in their field. While it requires a reasonable statistical background, Bayesian Data Analysis (3rd edition) is a ‘must-have’ if you are serious about this area. Examples in the Appendix are in Stan, easily implementable in PyStan, and show clear links between textbook and application. A gentler introduction is to be had from Gelman and Hill’s multilevel regression book.

There are some excellent recordings of presentations by the leaders in this field, notably by Bob Carpenter who presents a ‘start-to-go’ MCMC presentation, followed by a practical introduction to Stan and its capability.

John Salvatier (from an engineering background) and Chris Fonnesbeck (bio-statistics) both have a strong online presence around PyMC3. [ed. John was the one who recommended we use autodiff in Stan!] Many of their IPython notebooks offer exemplars of PyMC2 and PyMC3. Chris Fonnesbeck also has a series of three talks introducing and then developing Bayesian and MCMC techniques in python, including use of PyMC2:

Bayesian Statistical Analysis Using Python, Part 1,

Part 2 and

Part 3.

There are other examples of using PyStan in climatology.

And next?

Initially my idea was to compare PyMC3 and PyStan on a more complex problem. Bob Carpenter suggested IRT 2PL problems as a useful benchmark, or possibly using an LKJ prior for correlation matrices (p. 420 of the Stan manual). These ideas appeal as a way forward, comparing the two packages on greater complexity and breadth rather than via the more simplistic comparisons to date. What about a forum to compare and contrast, or a competition at the next SciPy event for the most elegant comparison?! An excellent ‘go-to’ comparison currently available is Frequentism and Bayesianism 4: Bayesian in Python. My thought is that some productive friction between packages can only help move toward a best-practice position.
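As a starting point for such a benchmark, simulating 2PL item-response data is straightforward. This sketch (all parameter choices are mine, purely illustrative) generates a response matrix that either package could then fit: the 2PL model says P(person i answers item j correctly) = logistic(a_j * (theta_i - b_j)), with abilities theta, difficulties b, and discriminations a:

```python
import numpy as np

np.random.seed(7)
n_people, n_items = 500, 20
theta = np.random.normal(0.0, 1.0, n_people)  # person abilities
b = np.random.normal(0.0, 1.0, n_items)       # item difficulties
a = np.random.lognormal(0.0, 0.3, n_items)    # item discriminations (> 0)

# 2PL success probabilities and simulated binary responses.
logit = a[None, :] * (theta[:, None] - b[None, :])
p = 1.0 / (1.0 + np.exp(-logit))
responses = (np.random.rand(n_people, n_items) < p).astype(int)
print(responses.shape, responses.mean())  # about half correct on average
```

Fitting this same matrix in PyMC3 and PyStan, and comparing recovery of theta, a, and b, would be exactly the kind of non-toy head-to-head the discussion above calls for.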

After further discussion, Prof. Gelman also commented on how PyStan, Stan and PyMC could all combine/evolve toward the common goal each has. In his words, “Ultimately I think/hope that PyMC can evolve into a wrapper for PyStan that will allow graphical models, missing-data imputation, fake-data simulation, posterior predictive checks, predictive model comparison (as in this paper, etc).  As I see it, Stan is super-efficient and will continue to improve when it comes to sampling, optimization, and distributional approximation, so it would make sense for PyMC to use Stan as its “engine” rather than reinventing the wheel. But, from a statistical point of view, there’s a big gap between a graphical model and an unnormalized posterior density.  And if Python users could work with graphical models and have Stan in the background ready to compute, that could be the best of all possible worlds.  We’ve been talking about doing this in R but perhaps Python would be a better environment for it.  I’d be happy to talk with the PyMC people (I’m cc-ing John Salvatier here) if there is interest in moving in that direction.  I think this approach would be cool:  if PyMC includes PyStan as its engine, PyMC could automatically have all the benefits of Stan but do so much more, enough to put Python at the forefront of serious Bayesian computing.”

Conclusion

The only reasonable conclusion, I believe, is that one should go well beyond naive “Which is better?” or “Which is easier?” questions. Various opinions weight things like the language used to model the problem, the output, and the ‘black box’ that is the sampler. PyMC3, however, seems to offer a significant step up from PyMC 2.3.4. With the integration of Python behind it, PyMC3, Stan, and PyStan now seem to be running in the same race. My choice, for what it is worth, will be to go with PyStan, which for me just feels more robust computationally. It has a clear development plan, a rich set of functions, comprehensive documentation, a dedicated user forum, textbooks that inform and lead the package, and the robustness of a large multi-disciplinary team across programming, statistics, and mathematical theory.

P.S. Llewelyn Richards-Ward wrote the above post. But you can blame Andrew for the title. Following up on the success of this recent post, we’re gonna try to go all Buzzfeed from now on.

The post You’ll never guess what’s been happening with PyStan and PyMC—Click here to find out. appeared first on Statistical Modeling, Causal Inference, and Social Science.
