2014-12-18

A/B testing tools like Optimizely or VWO make testing easy, and that’s about it. They’re tools to run tests, and not exactly designed for post-test analysis. Most testing tools have gotten better at it over the years, but they still can’t match what you can do with Google Analytics, which is pretty much everything.

When you run a test until you’ve reached validity (not the same as significance), you have to do post-test analysis to decide on the way forward.

Looking at a summary screen like this is not enough:

Use these at-a-glance views for a quick check of the overall status. But you need to go beyond them once the test is “cooked”.

Your test can really only end in 3 different ways:

Control wins

No difference

Treatment(s) win(s)

Even when your testing tool tells you that one of these is the final outcome, that’s not where your job ends. You need to conduct post-test analysis. And in most cases you need to do that OUTSIDE of the testing tool. Sure – Optimizely enables you to see the results across pre-defined segments, but that’s not enough either.

You need to integrate each test with Google Analytics

Both VWO and Optimizely come with built-in Google Analytics integrations, and data for each test should be sent to Google Analytics. It’s not only to enhance your analysis capabilities, but also to be more confident in the data. Your testing tool might be recording the data incorrectly, and if you have no other source for your test data, you can never be sure whether to trust it or not. Create multiple sources of data.

In Optimizely setting up the integration is under Project Settings:

You definitely want to use Universal Analytics instead of Classic Google Analytics. If you haven’t switched your GA tracker over yet, do it as soon as you can.

Not only will you be able to take advantage of new GA features, you can have up to 20 concurrent A/B tests sending data to Google Analytics. With Classic it’s only 5.

And once this is done on a global level, you need to pick a slot for each test:

Make sure that there aren’t multiple tests that use the same Custom Dimension (or Custom Variable for Classic) slot in GA – they will overwrite each other’s data, and you can’t trust it anymore. One test per slot.
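One way to catch slot collisions before launch is to keep a simple list of running tests and the slot each one uses. The sketch below is a hypothetical helper, not something Optimizely or Google Analytics provides; the test names and slot numbers are made up for illustration.

```javascript
// Hypothetical inventory of running tests and the GA Custom Dimension slot each one uses.
const runningTests = [
  { name: 'Homepage headline', slot: 1 },
  { name: 'Checkout trust badges', slot: 2 },
  { name: 'Pricing page layout', slot: 2 }, // collides with the test above
];

// Returns an object mapping each slot used by more than one test to those test names.
function findSlotCollisions(tests) {
  const bySlot = {};
  for (const test of tests) {
    (bySlot[test.slot] = bySlot[test.slot] || []).push(test.name);
  }
  const collisions = {};
  for (const slot of Object.keys(bySlot)) {
    if (bySlot[slot].length > 1) collisions[slot] = bySlot[slot];
  }
  return collisions;
}

console.log(findSlotCollisions(runningTests));
```

An empty result means every running test has its own slot.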

Optimizely’s manual has a step-by-step instruction for this integration as well, including how to set up custom dimensions.

Once done, you’re able to look at any test result in Google Analytics using Custom Reports. You can make the report show you ANY data you want:

Some variation has more revenue per user? Why is that – well let’s look at average cart value or average quantity – those metrics can shed some light here.

Use whatever metrics are useful in your particular case. Swipe the custom report used in the example here.

Note that Google Analytics won’t tell you anything about statistical significance (p-values), power levels, error margins and so on. You’d need to pull that data into an Excel / Google spreadsheet or something where you auto-calculate that. Don’t start the analysis in GA before the data is cooked. Make sure the needed sample size and significance + power levels are there.
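If you’d rather script the significance check than build it in a spreadsheet, here’s a minimal sketch. It assumes a pooled two-proportion z-test with a two-sided p-value, which is one common choice and not necessarily what your testing tool uses. The function names and the visitor and conversion counts are made up, and it says nothing about power or required sample size.

```javascript
// Error function via the Abramowitz-Stegun polynomial approximation (formula 7.1.26).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal cumulative distribution function.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Pooled two-proportion z-test; returns both conversion rates, z and a two-sided p-value.
function twoProportionZTest(convA, visitorsA, convB, visitorsB) {
  const rateA = convA / visitorsA;
  const rateB = convB / visitorsB;
  const pooledRate = (convA + convB) / (visitorsA + visitorsB);
  const standardError = Math.sqrt(
    pooledRate * (1 - pooledRate) * (1 / visitorsA + 1 / visitorsB)
  );
  const z = (rateB - rateA) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { rateA, rateB, z, pValue };
}

// Made-up numbers pulled from a GA custom report: control vs. variation.
console.log(twoProportionZTest(100, 1000, 150, 1000));
```

A p-value below your chosen threshold (0.05 is the usual convention) only tells you about significance. You still need the sample size and power checks mentioned above.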

Send variations as events to use advanced segments

The built-in Google Analytics integration is not foolproof. Sometimes part of the data is not passed on, and you end up with a 20% to 50% discrepancy – somewhere, somehow, part of the data gets lost. There can be many reasons for that: how the scripts are loaded, the order they load in, script timeouts and other issues. I’ve dealt with a lot of different problems over the years.

My good friend Ton Wesseling taught me this “trick” that I now use for every test: sending an event to Google Analytics each time a variation is loaded.

All you need to do is add one line to the test Global Javascript (executed for all variations), plus a line of event tracking code as the last line for each test variation.

So this is the line you should add in the Global Experiment Javascript console:

window.ga=window.ga||function(){(window.ga.q=window.ga.q||[]).push(arguments);};window.ga.l=+new Date();

This makes sure that the GA tracker gets all the information once it loads.
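To see why this works, here’s the same stub written out against a plain object. The `win` object is a stand-in for the browser’s `window`, used only so the sketch also runs outside a browser; in Optimizely you’d keep the one-liner as is.

```javascript
// Stand-in for the browser's window object (an assumption for this demo only).
const win = {};

// Same stub as the one-liner: if the real tracker isn't there yet,
// define ga as a function that parks its arguments in a queue.
win.ga = win.ga || function () {
  (win.ga.q = win.ga.q || []).push(arguments);
};
win.ga.l = +new Date(); // timestamp of when the stub was set up

// A variation fires its event before analytics.js has loaded...
win.ga('send', 'event', 'Optimizely', 'exp-2207684569', 'Variation1', { nonInteraction: 1 });

// ...and the call waits in the queue instead of throwing or getting lost.
console.log(win.ga.q.length);            // 1
console.log(Array.from(win.ga.q[0])[0]); // "send"
```

When analytics.js finishes loading, it works through whatever is sitting in `ga.q`, so events sent early still get recorded.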

Here’s where you do it in Optimizely. First open up the Settings while editing a test:

And now choose Experiment Javascript. Add the code there:

And now you need to add a line of event tracking code at the end of each variation (including Original). You need to just change the Experiment ID number and the name of the Variation:

window.ga('send', 'event', 'Optimizely', 'exp-2207684569', 'Variation1', {'nonInteraction': 1});

So what the code does is send an event to GA where the event category is Optimizely, the action is the experiment ID (you can get that from your URL while editing a test) and the label is Variation1 (it can also be Original, Variation2 etc). Non-interaction means that no engagement is recorded. Otherwise your bounce rate for experiment pages would be 0%.
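If you run a lot of tests, retyping that line for every variation invites typos. Here’s a sketch of a small wrapper. `variationEventArgs` and `trackVariation` are hypothetical names, not part of Optimizely or GA, and whether a helper defined in Experiment Javascript is visible to your variation code depends on how the tool scopes it, so verify that before relying on it.

```javascript
// Hypothetical helper: builds the arguments for the variation event described above.
function variationEventArgs(experimentId, variationName) {
  return ['send', 'event', 'Optimizely', 'exp-' + experimentId, variationName,
    { nonInteraction: 1 }];
}

// Hypothetical wrapper: passes those arguments to whatever ga function it is given.
function trackVariation(ga, experimentId, variationName) {
  ga.apply(null, variationEventArgs(experimentId, variationName));
}

// Demo with a recording stand-in for window.ga so the sketch runs outside a browser.
const recorded = [];
function fakeGa() {
  recorded.push(Array.from(arguments));
}

trackVariation(fakeGa, '2207684569', 'Original');
trackVariation(fakeGa, '2207684569', 'Variation1');
console.log(recorded[0][4]); // "Original"
```

In a real variation you’d pass the actual tracker, e.g. `trackVariation(window.ga, '2207684569', 'Variation1');`.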

Here’s where you add the code in Optimizely:

Now you’re able to create segments in Google Analytics for each of the variations.

Segment setup:

Create separate segments for each variation, and apply them onto any report that you want. So you could see something like this:

Illustrative data only.

The same thing can of course be done with Custom Dimensions. Just make sure the data is consistent – compare thank you page visits, revenue numbers etc between your Optimizely results panel and the GA custom dimension or event based report.
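The consistency check itself is simple arithmetic. Here’s a rough sketch; the counts are made up, and the 10% tolerance is an assumption you should replace with whatever gap you’re willing to accept.

```javascript
// Relative gap between the testing tool's count and GA's count,
// using the testing tool as the baseline.
function relativeDiscrepancy(toolCount, gaCount) {
  return Math.abs(toolCount - gaCount) / toolCount;
}

// Made-up thank you page visit counts per variation from each source.
const thankYouVisits = [
  { variation: 'Original', tool: 412, ga: 398 },
  { variation: 'Variation1', tool: 455, ga: 301 },
];

// Assumed tolerance, not a recommendation.
const MAX_DISCREPANCY = 0.1;

for (const row of thankYouVisits) {
  const gap = relativeDiscrepancy(row.tool, row.ga);
  const verdict = gap > MAX_DISCREPANCY ? 'investigate' : 'ok';
  console.log(row.variation + ': ' + (gap * 100).toFixed(1) + '% gap, ' + verdict);
}
```

With these numbers Original passes and Variation1 gets flagged, which is your cue to find out where the data is getting lost before you analyze anything.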

No difference between test variations. Now what?

Let’s say the overall outcome is ‘no significant difference’ between variations. Move on to something else? Not so fast. Keep these 2 things in mind:

1. Your test hypothesis might have been right, but the implementation sucked

Let’s say your qualitative research says that concern about security is an issue. How many ways do we have to beef up the perception of security? Unlimited.

You might be on to something – just the way you did something sucked. If you have data that supports your hypothesis, try a few more iterations.

2. Even if there was no difference overall, the treatment might have beaten control in a segment or two.

If you got a lift in returning visitors and mobile visitors, but a drop for new visitors and desktop users – those segments might cancel each other out, and it seems like it’s a case of “no difference”. Analyze your test across key segments to see this.
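Here’s a small worked example of segments cancelling each other out. All counts are made up, and the function names are just for this sketch.

```javascript
// Made-up counts: the treatment wins for returning visitors and loses for new ones.
const segmentData = {
  returning: {
    control: { visitors: 1000, conversions: 100 },
    treatment: { visitors: 1000, conversions: 120 },
  },
  newVisitors: {
    control: { visitors: 1000, conversions: 100 },
    treatment: { visitors: 1000, conversions: 80 },
  },
};

function conversionRate(arm) {
  return arm.conversions / arm.visitors;
}

// Relative lift of treatment over control, e.g. 0.2 means +20%.
function relativeLift(segment) {
  return conversionRate(segment.treatment) / conversionRate(segment.control) - 1;
}

// Sums visitors and conversions over all segments, separately for each arm.
function pooled(segments) {
  const total = {
    control: { visitors: 0, conversions: 0 },
    treatment: { visitors: 0, conversions: 0 },
  };
  for (const segment of Object.values(segments)) {
    for (const arm of ['control', 'treatment']) {
      total[arm].visitors += segment[arm].visitors;
      total[arm].conversions += segment[arm].conversions;
    }
  }
  return total;
}

function asPercent(x) {
  return (x * 100).toFixed(1) + '%';
}

console.log('returning:', asPercent(relativeLift(segmentData.returning)));
console.log('new:', asPercent(relativeLift(segmentData.newVisitors)));
console.log('overall:', asPercent(relativeLift(pooled(segmentData))));
```

Returning visitors convert about 20% better, new visitors about 20% worse, and the overall lift is zero. The summary screen would call this “no difference”.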

Look at the test results at least across these segments (make sure each segment has adequate sample size):

Desktop vs Tablet/Mobile

New vs Returning

Traffic that lands directly on the page you’re testing vs came via internal link

If your treatment performed well for a specific segment, it’s time to consider a personalized approach for that particular segment.

There’s no difference, but you like B better than A

We’re human beings, and we have personal preferences. So if your test says that there’s no significant difference between variations, but you like B better – there’s really no reason not to go with B.

If B is a usability improvement or represents your brand image better, go for it. But those are not good reasons to go with B if B performs worse in a test.

Conclusion

Don’t rely on a single source of data, and go deeper with your analysis than just looking at overall outcomes. You’ll find more wins and have better data to make decisions. Integrating your testing tool with Google Analytics is an excellent way to go about it.
