2017-02-02



Introduction

This is test 5 in a series of real-world examples where Tableau's capabilities are compared to those of Power BI (PBI). Once again, a completely random example was selected for this test, with no preconceived notion of how each tool would perform. The story of how the mathematical approach used in this test came about is funny (and true!) and is explained in the first video shown below.

This test represents a challenge in which PBI claims to have an advantage over Tableau. A couple of months ago, I remember reading a Microsoft claim that the PBI data engine was 10 to 100 times faster than the Tableau Data Engine. This statement intrigued me, so I thought I would try to verify it myself.

The question I want to answer with this testing is simple: how long does it take each tool to compute a series of numbers? In other words, I am after a direct comparison of the computational speed of the two data engines, to find out which one is faster.

My Background in Benchmarking

I’ve got a long history (30 years) of benchmarking computer programs. Typically, the benchmarks I completed were of two types:

Given a few different compilers, take a section of code (Fortran, C, Pascal, Basic, etc.), compile it, run it, and compare the time required to complete the computational test. These benchmarks helped us determine which compilers were best for serious number crunching, memory allocation, and other nerdy topics. Those results were always conclusive because we controlled the parameters of the test (i.e., how many iterations were allowed, the convergence criteria used, etc.).

Given a single compiler, compare the computational speed of algorithms 1, 2, and 3 for doing the same task, and determine things like the big-O behavior of each algorithm. This type of test might compare different sorting algorithms or iteration methods, for example. Once again, the benchmark results were repeatable and definitive.

The Initial Test Case – Computing Squares With a Finite Series

In the first video shown below, I discuss how I came up with the idea for this test and show some initial testing in Tableau. The square of any number can be computed with an iterative method: a finite series that uses recursion to build each square from the previous one. I programmed this approach in Tableau to see how it would perform under a heavy computational load.

This approach isn’t efficient, but I didn’t want efficiency – I needed to make the data engines do some work. By feeding a list of 10 million numbers into Tableau and asking it to compute the square of each one, I forced it to perform at least 30M operations. The iterations were possible using the calculated field I show in the video (i.e., using the previous_value(0) function).
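The exact calculated field is shown in the video, but the idea behind it can be sketched in a few lines. This is a minimal Python sketch, assuming the standard identity n² = (n−1)² + 2n − 1, which matches the two additions and one subtraction mentioned later in this article; the `previous` variable plays the role of Tableau's previous_value(0).

```python
# Sketch of the iterative "finite series" square computation.
# Each row's square is built from the previous row's square using
# n^2 = (n-1)^2 + n + n - 1  (two additions, one subtraction, no multiply).

def squares_by_series(n_max):
    previous = 0  # previous_value(0) returns 0 on the first row
    result = []
    for n in range(1, n_max + 1):
        previous = previous + n + n - 1  # (n-1)^2 + 2n - 1 = n^2
        result.append(previous)
    return result

print(squares_by_series(5))  # [1, 4, 9, 16, 25]
```

Note that nothing here is ever multiplied: the entire list of squares falls out of repeated addition and subtraction, which is exactly what makes it a good workload for stressing a data engine row by row.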

This form of recursion is very useful in Tableau and is what gives it the ability to offer many of the readily available quick calculations, like running totals. I need you to remember the importance of this feature for the next few minutes, at least until you get down to the PBI explanation!
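To see why this one feature covers so many quick calculations, here is the same previous-value pattern expressed as a running total. The Tableau side of this is my paraphrase (something along the lines of PREVIOUS_VALUE(0) + SUM([Sales])); the Python below is just a sketch of the idea, not the actual implementation in either tool.

```python
# Sketch: a running total built from the same previous-value recursion
# used for the squares -- each row adds its value to the prior row's total.

def running_total(values):
    previous = 0  # the previous-value seed for the first row
    totals = []
    for v in values:
        previous = previous + v  # previous row's total plus this row
        totals.append(previous)
    return totals

print(running_total([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

Swap the addition for a different operation and the same recursion yields running products, differences from previous, moving calculations, and so on – which is why access to the previous value matters so much.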

The Tableau Results

The time required for Tableau to compute 10M squares was less than 35 seconds. When Tableau draws the resulting curve of 10M squares, it draws all 10M marks, and in this case the graphical rendering takes longer to complete than the computation of the squares itself.

The Alteryx Results

Just for the fun of it, I added Alteryx to the test, which really isn’t fair because it is such a highly optimized program. The time for Alteryx to compute 10M squares was 0.1 seconds, and 0.2 seconds for 100M squares! If a browse tool was added to view the 10M rows of data, the time required varied from 3.7 to 4.7 seconds. Yikes. Now you know why I wrote this article.

The Power BI Results

To do this test in Power BI, I had to do some DAX research. I had to find the equivalent of Tableau's previous_value() function to be able to do the iterations. Essentially, I needed to iterate on a new calculated field (called a measure in Power BI lingo). Even with DAX's row-based iterators (SUMX, etc.), there is no way to access previous values of a measure.

In other words, I needed a way to do multi-row operations like the ones I used in Alteryx. Unfortunately, there are no such functions in PBI (at least none that I could find). Therefore, I was not able to make a direct comparison of Tableau to PBI for this numerical test.

I am now wondering if the lack of this feature is one reason why Microsoft has not been able to deliver a whole series of standard-issue quick calculations like the ones available in Tableau.

A Modified Test

To get an idea of the computational speed comparison, however, I used a different method to compute the squares: I let the measure compute each square directly as Number * Number. This single multiplication is fewer operations than the three used in the Tableau formulation (two additions and one subtraction), so in theory it should take less time.
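The two formulations can be compared side by side in a rough timing sketch. This is not the original benchmark – it is a standalone Python illustration, scaled down from the article's 10M numbers, that confirms both formulations produce identical results while letting you compare their per-row cost.

```python
# Rough timing sketch (illustration only, not the original benchmark):
# the iterative finite-series formulation vs. direct Number * Number.
import time

N = 1_000_000  # scaled down from the article's 10M for a quick run

start = time.perf_counter()
previous = 0
series_squares = []
for n in range(1, N + 1):
    previous = previous + n + n - 1  # 2 additions, 1 subtraction per row
    series_squares.append(previous)
series_time = time.perf_counter() - start

start = time.perf_counter()
direct_squares = [n * n for n in range(1, N + 1)]  # 1 multiplication per row
direct_time = time.perf_counter() - start

# Both formulations must agree exactly
assert series_squares == direct_squares
print(f"series: {series_time:.3f}s, direct: {direct_time:.3f}s")
```

As in the article's tests, the absolute numbers depend heavily on the machine and runtime; the point is only that the workloads are mathematically equivalent, so any timing difference reflects the formulation, not the answer.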

For this formulation, PBI completed the 10M squares in 28 seconds. Tableau also completed the 10M squares in 28 seconds. The programs tied in this case.

The Results

The research and testing I conducted tried to determine which program is faster with respect to pure computational speed. In the first test I designed, I was not able to apply the same mathematical formulation to both Tableau and PBI. In the second test, I applied a less strenuous formulation and both programs completed the task in 28 seconds, for a virtual tie.

Since I was testing programs that perform multiple functions (computation, graphical rendering, and more) and have differing capabilities, it is not easy to conclusively state which data engine is faster. What I can say is that both data engines are very fast, very robust, and will make you happy with what they can do. At this point, I’d say they are tied in computational speed, although Tableau does have some additional computational flexibility.

Questions for Microsoft

Although this research and testing may not be totally definitive, I now feel like I need to ask Microsoft these three strategic questions:

How did you determine that the PBI engine is faster than the Tableau data engine?

What testing were you able to do that was able to isolate the pure computational engine from the rendering engine?

Can you publish the results of your testing so that I can see what you did?

Future Work?

Sometimes when I do this type of research, I will think of new approaches that can be used to answer the question. Now that I know more about what DAX can and cannot do, there might be other tests I can design to force the programs to work very hard in a computational sense. It is clear that longer computational times will be needed to determine which engine is faster.

With that being said, maybe I’ll revisit this topic later. For now, however, all I can say is that PBI is fast, Tableau is fast, and Alteryx has the fastest and most robust data engine of them all.

Some Funny Things Can Happen When You Do This Type of Work

Thanks for reading.
