Quantitative methods are an essential part of user experience (UX) research: they allow researchers to provide accurate estimates of user performance in terms of errors and response time. However, under traditional statistical approaches, the small sample sizes and high variability typical of UX studies often preclude meaningful inference. In this post I present Bayesian Data Analysis as a viable alternative, one well suited to the challenges and settings common to UX research.
Belief versus ‘the long run’
Traditional, frequentist approaches define probability as long-run frequency: how often an event would occur over N replications. To illustrate this, imagine flipping a coin. If the coin is fair, after flipping it 20 times you should get (about) 10 heads, and after 100 you should get (about) 50. The Bayesian formulation is quite different: instead of describing probability as long-run frequency, it treats probability as a degree of belief that updates in light of evidence (read: data). While this seems like a subtle difference, its impact is profound. In particular, because Bayesian Data Analysis (BDA) is a process of updating beliefs about a phenomenon, it demands that you have some initial beliefs to update (even if your belief is that you know nothing); these are known as priors. Beliefs are inherently subjective, and thus priors are also subjective, but, importantly, they are not immune to scrutiny: priors are only as appropriate as the audience perceives them to be. Consider how much you would be willing to bet on the fairness of a coin, P(Heads) = .5, if you got the coin from your bank versus at a carnival game (assuming you are at the most boring carnival in history). Given your prior knowledge that carnival games are biased, it makes sense that it would take more evidence (coin flips) to convince you that the coin was, in fact, fair. This practical and obvious kind of reasoning is explicitly quantified in BDA, but completely absent from frequentist inferential statistics.
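To make the coin example concrete: if we express our belief about P(Heads) as a Beta distribution, the Bayesian update after observing flips has a simple closed form. Below is a minimal Python sketch (the post's own code is in R). The specific priors — Beta(50, 50) for the trusted bank coin and Beta(2, 8) for the suspect carnival coin — are illustrative assumptions of mine, not values from the post.

```python
def posterior_mean(prior_a, prior_b, heads, tails):
    """Beta-Binomial conjugate update: a Beta(a, b) prior plus binomial
    data yields a Beta(a + heads, b + tails) posterior, whose mean is
    (a + heads) / (a + b + heads + tails)."""
    return (prior_a + heads) / (prior_a + prior_b + heads + tails)

heads, tails = 12, 8  # hypothetical data: 12 heads in 20 flips

# Bank coin: strong prior belief the coin is fair -> Beta(50, 50)
# Carnival coin: skeptical prior leaning toward bias -> Beta(2, 8)
for label, (a, b) in {"bank": (50, 50), "carnival": (2, 8)}.items():
    mean = posterior_mean(a, b, heads, tails)
    print(f"{label}: posterior mean P(Heads) = {mean:.3f}")
# bank: posterior mean P(Heads) = 0.517
# carnival: posterior mean P(Heads) = 0.467
```

Note how the same 20 flips move the two beliefs differently: the strong bank prior barely budges from .5, while the carnival prior needs far more data before fairness becomes believable.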
A UX example
To take us out of the carnival and into the lab, consider a simple usability test of an e-commerce website. Let's assume the client is interested in testing a particular shopping cart link location and wants to ensure that at least 80% of users will be able to successfully find the shopping cart in one click. Based on your expertise as a user researcher and the previous literature, you guess that maybe only 2 in 5 participants (40%) will be able to accurately identify the shopping cart icon in its current location. In the frequentist framework this knowledge is irrelevant, since long-run frequency is immune to your personal beliefs. But you're the expert! Surely your intuition is not without some value. BDA allows this information, and how strongly you believe it, to be incorporated into hypothesis testing, parameter estimation, and other inferential statistics.
The frequentist approach to this research question would be to define a priori how many participants will be run and then, after data collection, compute a 95% confidence interval around the observed proportion of successes (Sauro & Lewis, 2012). If .8 does not fall within the bounds of the 95% confidence interval (CI), then we reject it as a plausible estimate. The Bayesian approach differs in two important ways. 1) Sample size does not need to be set a priori. Because frequentist tests are based on exact replications of a particular study ad infinitum, the researcher's sampling intention is essential to their interpretation (see Kruschke, 2013 for discussion). BDA, in contrast, allows one to keep sampling until a conclusion is reached, since each new data point is simply additional evidence. 2) We express our prior beliefs as a probability distribution and then update it based on the data. This means that your prior knowledge, past research, or expertise will affect the final decision in a quantifiable manner.
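As a rough sketch of the Bayesian decision rule, the Python snippet below encodes the 2-in-5 guess as a Beta(2, 3) prior (my assumed parameterization), updates it with hypothetical data, and checks whether .8 survives in the 95% credible interval. It draws posterior samples with the standard library's `random.betavariate` rather than a closed-form quantile.

```python
import random

random.seed(1)

def beta_credible_interval(prior_a, prior_b, successes, failures,
                           level=0.95, draws=20_000):
    """Equal-tailed credible interval for a success rate, from Monte Carlo
    draws of the Beta(prior_a + successes, prior_b + failures) posterior."""
    a, b = prior_a + successes, prior_b + failures
    s = sorted(random.betavariate(a, b) for _ in range(draws))
    lo = s[int(draws * (1 - level) / 2)]
    hi = s[int(draws * (1 + level) / 2)]
    return lo, hi

# Informed prior encoding the 2-in-5 guess: Beta(2, 3) (an assumption).
# Hypothetical data: 4 of 10 participants find the cart in one click.
lo, hi = beta_credible_interval(2, 3, successes=4, failures=6)
print(f"95% credible interval for the success rate: [{lo:.2f}, {hi:.2f}]")
print("0.80 still believable?", lo <= 0.8 <= hi)
```

If more participants trickle in later, you simply add their successes and failures to the counts and recompute — no sampling plan to violate.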
To illustrate this point I ran a small simulation of the scenario described above for sample sizes between 3 and 25. I created a population following a binomial distribution with success probability .5, meaning that the true rate at which users identify the shopping cart is actually 50%. For each sample size N I drew 100 random samples from this population and computed a 95% frequentist confidence interval along with 95% Bayesian credible intervals under an uninformative prior (all values equally believable), an informed prior (initial guess 2/5), and a badly informed prior (initial guess 4/5). I then computed the proportion of those 100 draws, for each N, that correctly rejected .8 as a believable value. The results are displayed in the figure below.
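The downloadable code for this simulation is in R; the Python sketch below mirrors the same logic under some assumptions of mine: the three priors are encoded as Beta(1, 1), Beta(2, 3), and Beta(4, 1), and a simple Wald interval stands in for the frequentist CI (the original may use a different interval, such as the adjusted Wald).

```python
import math
import random

random.seed(7)

def wald_ci(successes, n, z=1.96):
    """Simple 95% Wald interval (an assumed stand-in for the post's
    frequentist CI)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def beta_ci(successes, n, prior_a, prior_b, draws=500):
    """Equal-tailed 95% credible interval via Monte Carlo draws from the
    Beta(prior_a + successes, prior_b + failures) posterior."""
    a, b = prior_a + successes, prior_b + (n - successes)
    s = sorted(random.betavariate(a, b) for _ in range(draws))
    return s[int(draws * 0.025)], s[int(draws * 0.975)]

TRUE_P, TARGET, REPS = 0.5, 0.8, 50
# Assumed Beta encodings of the three priors described in the post:
PRIORS = {"uninformative": (1, 1), "informed": (2, 3), "badly informed": (4, 1)}

for n in range(3, 26):
    rejections = {"frequentist": 0, **{k: 0 for k in PRIORS}}
    for _ in range(REPS):
        successes = sum(random.random() < TRUE_P for _ in range(n))
        lo, hi = wald_ci(successes, n)
        rejections["frequentist"] += not (lo <= TARGET <= hi)
        for label, (a, b) in PRIORS.items():
            lo, hi = beta_ci(successes, n, a, b)
            rejections[label] += not (lo <= TARGET <= hi)
    # Proportion of samples at this N that correctly rejected .8:
    print(n, {k: v / REPS for k, v in rejections.items()})
```

Exact proportions will differ run to run (and from the figure), but the qualitative pattern — informed priors rejecting .8 at smaller N — should hold.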
We can see that both the uninformed and informed Bayesian intervals outperformed the frequentist CI. In addition, the informed interval reached 75% correct rejection at only N = 7, whereas the frequentist CI didn't reach that level of accuracy until N = 22. Finally, the badly informed prior only slightly underperformed the frequentist interval.
This example demonstrates measurable benefits of implementing BDA in UX settings. By leveraging expert knowledge in a systematic, probabilistic manner, we were able to improve estimates and reduce the number of participants needed to reach a conclusion. In this example, using BDA with an informed prior would have reduced the number of participants needed to achieve modest statistical power by 13 compared to traditional frequentist testing. Furthermore, no additional data collection procedures or assumptions were required: just a reformulation of the concept of probability and an application of the expertise already present in most UX research firms. The benefits of BDA extend beyond simple binomial research questions; graphical probability models can be expanded hierarchically to accommodate a wide variety of distributions and dependencies. And given that the majority of UX research involves small samples and simple one- or two-sample t-tests and ANOVAs, the learning curve for Bayesian versions of these basic tests is flattening dramatically as software availability increases. In future blog posts I will provide additional examples with real data, exploring how expert knowledge and task analysis methods can inform and improve effect estimates in small-sample studies.
- Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics for user research. Elsevier.
- Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573.
Download R code for the simulation and graph here.