Introduction to A/B Testing

A/B testing is a method that compares two versions of something to determine which one performs better. It’s an old idea that existed in the world of science long before websites existed, and it later made its way into marketing.

In web analytics, it means comparing two versions of the same webpage to see which one performs better. The differences between the two versions can be minor, like a button colour change, or as large as a complete redesign of a page. In practice, half of the visitors are shown version A and the other half see version B. By tracking visitors’ engagement on both the control (A) and the variation (B) over some period, we are able to conclude which version results in a better conversion rate.
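
To make the 50/50 split concrete, here is a minimal Python sketch of one common way to assign visitors: hashing a visitor ID so that the same visitor always sees the same version. The function name assign_variant, the experiment name and the visitor IDs are made up for illustration. A/B testing tools handle this for you, and their actual assignment logic may differ.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "signup-form-test") -> str:
    """Deterministically assign a visitor to 'A' (control) or 'B' (variation).

    Hashing the experiment name together with the visitor ID means a
    returning visitor lands in the same group on every visit.
    """
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "A" if bucket < 50 else "B"


# Hypothetical visitor IDs, e.g. taken from a first-party cookie.
for visitor in ["visitor-001", "visitor-002", "visitor-003"]:
    print(visitor, "->", assign_variant(visitor))
```

Because the assignment depends only on the ID and the experiment name, no lookup table is needed to remember who saw what.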

What is a conversion, conversion rate and CRO?

A conversion is the desired action that a user completes when visiting a website. A conversion might mean purchasing a product, signing up for a newsletter, signing up for a trial period, etc. So, a conversion is any visitor activity that is important for a website’s success. It’s specific to each website: something that is considered a conversion on one website might not be an important action on another.

Websites and apps often have multiple conversion goals, and each will have its own conversion rate. A conversion rate is the number of conversions divided by the total number of visitors. Improving conversion rates is known as conversion rate optimisation (CRO).
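
As a quick illustration of the formula, here is a small Python sketch that computes a conversion rate for two different goals on the same page. The visitor and conversion counts are invented for the example.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Return conversions divided by visitors (0.0 when there are no visitors)."""
    if visitors <= 0:
        return 0.0
    return conversions / visitors


# Hypothetical numbers: 1,200 visitors, 54 newsletter sign-ups, 18 purchases.
signup_rate = conversion_rate(54, 1200)
purchase_rate = conversion_rate(18, 1200)

print(f"Sign-up rate: {signup_rate:.1%}")     # prints "Sign-up rate: 4.5%"
print(f"Purchase rate: {purchase_rate:.1%}")  # prints "Purchase rate: 1.5%"
```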

A/B testing is the most common process used for conversion rate optimisation.

There is no universal A/B test because each website requires an individual approach. However, there are some elements that are usually tested:

Calls to action – placement, wording, size.
Copywriting – value propositions, product descriptions.
Forms – their length, field types, text on the forms.
Layout – homepage, content pages, landing pages.
Product pricing – try testing for revenue by testing your prices.
Images – their placement, content and size.
Amount of content on the page (short vs. long).

Conversion optimisation requires a methodical approach, and a good A/B test is implemented by following these steps:

Research phase

In the research phase your goal is to collect as much information as you can about how visitors experience and interact with your webpage, to understand why they act the way they do. It will help you make assumptions about what is causing, for instance, a low conversion rate for clicks or sign-ups, or high bounce rates on your landing pages. To get a fuller picture in this phase, it is advisable to use web analytics tools, heatmaps and session recordings, and to get users’ feedback through surveys, chat, etc.

You can find out what is happening on your site by using Google Analytics: which features visitors use and which they don’t. But Analytics won’t tell you why visitors did or did not convert. The best way to get qualitative data from your visitors is through surveys.

When possible, always segment your analytics data because it will give you deeper insight. For example, your overall data might be telling you that your landing page is not performing well, but when looking into the segments, you might see it is performing well on mobile phones.
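
Here is a small Python sketch of that kind of segment breakdown. The rows below are invented numbers standing in for an analytics export, and rates_by_segment is a hypothetical helper, not part of any analytics tool.

```python
from collections import defaultdict

# Hypothetical export rows: (device segment, visitors, conversions).
rows = [
    ("desktop", 4000, 60),
    ("tablet", 1000, 15),
    ("mobile", 5000, 250),
]


def rates_by_segment(rows):
    """Sum visitors and conversions per segment and return each segment's rate."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [visitors, conversions]
    for segment, visitors, conversions in rows:
        totals[segment][0] += visitors
        totals[segment][1] += conversions
    return {seg: conv / vis for seg, (vis, conv) in totals.items() if vis > 0}


overall = sum(r[2] for r in rows) / sum(r[1] for r in rows)
print(f"overall: {overall:.2%}")
for segment, rate in rates_by_segment(rows).items():
    print(f"{segment}: {rate:.2%}")
```

In this made-up data the overall rate hides the fact that mobile converts several times better than desktop and tablet, which is exactly the kind of detail the overall figure cannot show.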

It’s important to spend enough time in the research phase, because diving deeper into the problems will help you come up with the right hypothesis and develop higher-quality tests.

Sometimes the research phase will reveal many problems across different pages on your website. Before continuing with A/B test preparation, you should prioritize your problems.

Pages that are the most important for your business should be tested first. Those can be pages with sign-up forms, checkout pages, etc. Pages with high traffic should also take priority, because tests on high-traffic pages finish sooner, so you can move on to the next tests faster, which will in the end speed up your optimisation process.

Generate hypothesis

Once you have used the tools from the research phase, gained insight into visitor behaviour and spotted problems on your webpage, it’s time to come up with a hypothesis as to why there are problems with your conversion rates or bounce rates. Your hypothesis should state clearly what is being changed, what you believe the outcome will be, and why you think that’s the case.

For example, suppose you saw that your users give up after filling out three or four fields of your sign-up form. Your hypothesis in this case could be:

‘A shorter form might encourage visitors to fill it out completely’.

When creating the hypothesis, it’s important to be as specific as possible. Testing one change at a time will help you understand which changes had an effect on your visitors’ behaviour and which ones did not. Over time, you will be able to combine the effects of multiple winning changes from experiments and create more powerful tests.

A bad hypothesis in this case would be:

‘Redesign of sign-up page will encourage visitors to fill it out completely’.

Why is this bad?

This is not what your analytics are telling you. Your visitors did start to fill out the sign-up form, so you are probably doing something right on the webpage.
Redesigning the whole page won’t help you learn from this problem. Your variation might have a better conversion rate, but will you know why? Your variation might also be a loser and you still won’t know why that happened, so you will be back at the beginning.

So you will benefit most from more specific hypotheses and smaller changes in your variation(s). When a test is based on thorough research and a clear hypothesis, you will learn something about your audience after each test.

Of course, sometimes you won’t be able to apply just small changes, because sometimes big problems that need to be resolved will require big changes.

Create variation(s)

Now you have an assumption about why visitors are not converting as much as you want, and it’s time to design and develop variation(s) based on your hypothesis. You can have as many variations as you want when setting up an experiment. So, if you want to have a control (original) and create 2 variations, that is completely OK, but keep in mind that if your site is not getting much traffic, you might end up waiting for significant results for ages.

To create a variation you will usually need a designer and a developer, depending on the changes you want to make and which tool you are using to implement the variations. A change can be as simple as a new colour for a call-to-action button or a reordering of elements on the page, or it can be something more complex. Most leading A/B testing tools provide a visual editor, so some small changes can be made very easily and promptly.

You can see in the image below that you can make changes simply by clicking elements inside the visual editor. Some of the most popular tools are Optimizely, VWO and Google Optimize. I will compare them in my next post.

Before starting an experiment, test it properly to make sure it works the way you want, because bugs that you notice afterwards might make the whole test invalid. Also, if you are not skilled enough to make the changes on your own, get help from a developer, because you won’t get any benefit from a badly set up experiment.

Run the experiment

Once you’ve tested your variation(s) properly across all browsers and devices and made sure there are no odd things happening, it’s time to put your experiment live. The A/B testing tool will take care of reporting, and you will be able to see the number of visitors and the conversion rates for your goals.

How long should you run a test?

There are a few online calculators that can tell you approximately how long you should run a test or how big your sample size needs to be. VWO has a good calculator that will give you an idea of the parameters that matter when calculating the duration of your tests: https://vwo.com/ab-split-test-duration/

The more traffic you have going through the pages tested, the quicker your test will complete.
The more variations you run as part of a test, the longer you’ll have to wait for the test to complete.
The lower your current conversion rate is, the longer you’ll have to wait. If your conversion rate is 5%, then your tests will complete quicker than if your conversion rate is only 1%.
The smaller the improvement of your best variation over the control, the longer you’ll have to wait.
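
The relationships in the list above can be sketched in Python. This is a rough estimate using a common textbook approximation for comparing two proportions, not the formula behind VWO’s calculator. The function names, the 5% significance level and the 80% power defaults are my own assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variation to detect a relative lift.

    baseline_rate: current conversion rate, e.g. 0.05 for 5%
    relative_lift: smallest improvement worth detecting, e.g. 0.2 for +20%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def estimated_days(n_per_variation, variations_including_control, daily_visitors):
    """Days needed if daily traffic is split evenly across all versions."""
    return math.ceil(n_per_variation * variations_including_control / daily_visitors)


# Hypothetical page: 5% baseline rate, hoping for a 20% relative lift,
# one control plus one variation, 2,000 visitors per day.
n = sample_size_per_variation(0.05, 0.2)
print("visitors per variation:", n)
print("estimated days:", estimated_days(n, 2, 2000))
```

Trying a 1% baseline instead of 5%, a smaller lift, or a third version shows the required sample and the duration growing, in line with the points above.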

For some high-traffic sites you will reach the required sample size in a day or two. But that isn’t a representative sample, because it doesn’t cover a full business cycle with all weekdays and weekends. So, to get a representative sample and accurate data, experts recommend that you run your test for a minimum of one to two weeks. By doing so, you will have covered all the different days of the week.

We know for example that shopping behaviour during the weekend is different from shopping behaviour during weekdays.

If you want to analyse your test results across segments, you need even more conversions. It’s a good idea to run tests targeting a specific segment, for example separate tests for desktop, tablet and mobile.

Along with the duration of your test, it’s important to determine if results are statistically significant.

What is statistical significance?

Statistical significance is a way of quantifying how unlikely it is that an observed result arose by chance alone. The result of an experiment is said to be statistically significant if, at a chosen significance level, it is unlikely to have been caused by chance.

For example, suppose you run an A/B test at a 95% confidence level (that is, a significance level of 5%) and it declares a winner. This means that a difference this large would have appeared by chance less than 5% of the time if the two versions really performed the same. In other words, you accept up to a 5% risk of declaring a winner when there is no real difference.
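
For readers who want to see the arithmetic, here is a minimal Python sketch of a two-proportion z-test, one standard frequentist way to check whether two conversion rates differ. The visitor and conversion counts are invented, and real A/B testing tools may use different or more sophisticated methods.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("cannot test when the pooled rate is 0% or 100%")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: control converts 500 of 10,000 visitors,
# the variation converts 580 of 10,000.
z, p_value = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("significant at the 5% level:", p_value < 0.05)
```

A p-value below 0.05 corresponds to the 95% level discussed above. The check is only trustworthy if the sample size was fixed before the test started.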

If you check the results repeatedly during a test, you might see them reach statistical significance a few times and then drop back, so you must set a proper sample size in advance to know when to stop the test. On the other hand, if you have been running an experiment for a considerable time and you still haven’t reached statistical significance, your test is inconclusive.

Analyse Results

There are three possible scenarios after your experiment is finished:

Your variation is a winner

Good job. You set up your hypothesis very well, and the next step is to implement those changes and show your new design to 100% of your visitors. But keep in mind that you can always look for further improvement. Having a winning variation on one page doesn’t mean you should stop analysing, making new hypotheses and running new tests.

Your variation is a loser

This is a very common situation and you shouldn’t be disappointed. Look at the data and segments and try to understand why the test failed. You will learn something from the results and you will apply that knowledge when making new hypotheses and setting up new tests.

Test results are inconclusive

This means that your experiment has been running for a considerable amount of time and you didn’t reach statistical significance. This usually happens when your hypothesis is wrong or your variation didn’t differ much from the control, particularly on lower-traffic sites. Often, though, you will find that one of the variations was a winner in a specific segment. That’s an insight you can build on. If you have a test which failed on all segments, revise your hypothesis and test again with a new one.

Statistical methods behind determining the winner of A/B testing

Two commonly referenced methods of computing statistical significance are Frequentist and Bayesian statistics.

The Frequentist approach is most often used by A/B testing software. It means making predictions using only data from the current experiment. Optimizely uses Frequentist methods to calculate statistical significance because they offer ‘guarantees’ about future performance: statistical outputs from an experiment that predict whether or not a variation will actually be better than the baseline when implemented, given enough time.

Bayesian tests make use of past knowledge to calculate experiment results. That past knowledge is encoded into a statistical device known as a prior, and this prior is combined with current experiment data to reach a conclusion. Provided that the assumptions drawn from historical data to build the prior are correct, using previous information should help experimenters reach conclusions faster. But there is a risk that prior knowledge may not actually match how an effect is being generated in a new experiment, and it’s possible to be led to an incorrect conclusion. VWO and Google Optimize use the Bayesian method.
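
To show how a prior is combined with experiment data, here is a rough Python sketch of a simple Beta-Binomial model that estimates the probability that B converts better than A. This is a textbook simplification and not how VWO or Google Optimize implement their statistics. The function name, the flat Beta(1, 1) prior and the numbers are all assumptions made for the example.

```python
import random


def probability_b_beats_a(conv_a, n_a, conv_b, n_b,
                          prior_alpha=1.0, prior_beta=1.0,
                          draws=50_000, seed=42):
    """Estimate P(rate_B > rate_A) by sampling from Beta posteriors.

    prior_alpha and prior_beta encode past knowledge; Beta(1, 1) is a
    flat prior that assumes nothing about the conversion rate.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    wins = 0
    for _ in range(draws):
        # Posterior = prior updated with observed conversions and non-conversions.
        rate_a = rng.betavariate(prior_alpha + conv_a, prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b, prior_beta + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical results: control converts 500 of 10,000 visitors,
# the variation converts 580 of 10,000.
chance = probability_b_beats_a(500, 10_000, 580, 10_000)
print(f"Probability that B beats A: {chance:.1%}")
```

Passing an informative prior, for example prior_alpha=50 and prior_beta=950 to express a belief that the rate is around 5%, pulls both estimates towards that belief, which helps when the belief is right and misleads when it is wrong.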

If you are interested in learning more about this topic, I recommend reading these articles:

https://conversionxl.com/blog/bayesian-frequentist-ab-testing/

https://blog.optimizely.com/2015/03/04/bayesian-vs-frequentist-statistics/

Conclusion

A/B testing, apart from being a method for increasing your conversion rates, will help you learn about your visitors and give you valuable insights into their behaviour, which in turn will lead to more powerful tests and a bigger increase in conversions. Not all of your A/B tests will be successful, but that shouldn’t stop you from testing. You will start seeing considerable improvements in website performance after a few different experiments. And remember, no matter how well your landing page or sign-up page is doing, there is always room for improvement.