Metrics to get started with in your user testing
Thinking about using metrics in your usability testing? Metrics can be great when used at the right time, but they’re not always appropriate.
What do we mean by metrics?
Metrics are measurements that assign numerical scores to aspects of a product’s performance. Numerical scores make it easier to quantify and compare sets of data that would otherwise be qualitative.
Things to consider before using metrics in your testing
Gathering metrics can increase costs through longer analysis times, the need for specialist tools, and greater numbers of participants to obtain reliable quantitative data (you should be testing with 20 or more per design).
Metrics are great when you want to gain comparable data to:
See where you stand amongst the competition: by measuring your product against competitor products
Track progress: by comparing your product with earlier iterations of the same product
Meet standards: by checking whether your product meets specific internal or external standards
Types of metrics
Performance metrics
These metrics are great to introduce into your study design when doing usability testing, as they are observable and give us extra information about how users carry out or ‘perform’ tasks.
Efficiency:
How long it takes users to complete tasks
If you’re using the “think aloud” technique, it’s best to avoid time taken as a metric: the data won’t be reliable because participants spend different amounts of time thinking aloud
The number of clicks users take to complete tasks
Click counts can work better with “think aloud”, as they reflect the decisions people are making rather than the time taken.
Effectiveness:
Measure task success rates (what proportion of users completed the task)
Measure the number of errors users make completing tasks
This requires a robust definition of what counts as an error, to ensure the data is recorded consistently
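To make these concrete, here’s a minimal Python sketch of how time on task, click count, and task success might be derived from a timestamped session log. The log format and event names are hypothetical, not taken from any particular tool:

```python
from datetime import datetime

# Hypothetical session log for one participant on one task: (timestamp, event).
log = [
    (datetime(2024, 1, 1, 10, 0, 0), "task_start"),
    (datetime(2024, 1, 1, 10, 0, 12), "click"),
    (datetime(2024, 1, 1, 10, 0, 30), "click"),
    (datetime(2024, 1, 1, 10, 1, 5), "task_complete"),
]

start = next(t for t, e in log if e == "task_start")
events = [e for _, e in log]

clicks = events.count("click")       # efficiency: number of clicks
success = "task_complete" in events  # effectiveness: did they finish?

if success:
    end = next(t for t, e in log if e == "task_complete")
    time_on_task = (end - start).total_seconds()  # efficiency: time taken
    print(f"Completed in {time_on_task:.0f}s with {clicks} clicks")
else:
    print(f"Did not complete the task ({clicks} clicks)")
```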
Issue metrics
Issue metrics are useful for highlighting the most problematic areas of your product and tracking the impact of design changes. Typically, these will be uncovered during analysis. In most cases you will want to count the number of issues found across the whole platform, specific tasks or journeys, or specific interface elements.
Issue metrics include:
Number of usability problems identified during the testing
Severity of usability problems (e.g. minor, moderate, critical)
Types of usability problems (for example, issues with navigation)
Experience metrics
Experience metrics help us to understand how a product makes users feel, which can be useful if you want to understand whether your design creates the intended reaction.
Single ease question
The single ease question (SEQ) is an easy metric for gathering a general sense of how users feel about a system. To get a score after each task, simply ask users:
Overall, how difficult or easy was the task to complete? Use a scale of 1-7, where 1 is very difficult and 7 is very easy.
The SEQ is widely used, so your scores can easily be compared with other systems. Its global average response is around 5.5, which is above the scale’s true midpoint of 4, so make sure you keep that in mind!
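As a quick illustration, here’s a minimal sketch of averaging SEQ responses for a task and comparing the result to that 5.5 benchmark (the response values are hypothetical):

```python
# Hypothetical SEQ responses for one task: one 1-7 rating per participant.
seq_responses = [6, 5, 7, 4, 6, 5, 6, 7, 5, 6]

mean_seq = sum(seq_responses) / len(seq_responses)
print(f"Mean SEQ: {mean_seq:.1f}")

# Compare against the widely reported global average (~5.5),
# not the scale's true midpoint of 4.
benchmark = 5.5
if mean_seq < benchmark:
    print("Below the typical SEQ average - this task may be worth a closer look")
```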
Other metrics
A product that creates an appropriate feeling goes down better with users, but first you need a clear idea of what users will want from your product. This can be achieved through discovery research.
A design that gets the heart racing and feels exciting may not be the best fit for a banking app, but might work well for a fast-paced gaming platform.
The easiest way to gather these is to ask participants to rate categories such as those below on a scale of 1-10:
Trust
Pleasure
Frustration
Be careful about trusting the ratings alone! Self-reported data can be misleading, as people find it difficult to quantify their feelings accurately. Make sure you gather qualitative data too, to validate your scores!
Results
Once you’ve got the data, it’s time to figure out how to present it. In usability testing it is typical to record your chosen metrics for each user, task by task. Once you have all of the data for each user, and have double-checked it, it’s time to calculate your averages!
Use a “mean” calculation:
Add up the total values for each metric on each task.
Then divide by the number of data points.
For example: if you have used “number of errors” add up each user’s number of errors on a particular task then divide this number by the total number of users.
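Here’s what that looks like as a minimal Python sketch, using hypothetical per-user error counts recorded task by task:

```python
# Hypothetical raw data: each user's error count on each task.
errors_by_task = {
    "Task 1": [0, 2, 1, 1, 2],  # five users' error counts
    "Task 2": [1, 0, 1, 2, 0],
}

for task, counts in errors_by_task.items():
    mean_errors = sum(counts) / len(counts)  # total errors / number of users
    print(f"{task}: mean errors = {mean_errors:.1f}")
```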
Example of metrics and data we used for a recent benchmarking project:
| Task | Success | Time (mm:ss) | Clicks | Errors | Satisfaction |
|---|---|---|---|---|---|
| Task 1 | 74% | 5:05 | 9.2 | 1.2 | 6.0 |
| Task 2 | 77% | 3:42 | 11.9 | 0.9 | 5.2 |
| Task 3 | 42% | 3:37 | 10.0 | 7.3 | 4.0 |
Once you have your averages, you can use these figures to compare against future iterations or other similar products.
Tips for repeat testing
You can use repeat rounds of testing to compare your product with future iterations or competitor products. To make sure differences in the product(s) are the only things affecting the results you gain, you should:
Make sure to use the same tasks
Gather the same metrics in the same way
For example, use the same definition of severity
Recruit a participant panel that is as similar as possible to the group from the first round
Tech literacy and prior knowledge of the system are the most important criteria to match
Use the same researchers if possible
Long-term monitoring
If your product is designed for continued or repeated use over a long period, you may want to consider gathering data on engagement. This can be useful for tracking the impact of redesigns on long-term behaviour, as well as any changes to typical long-term behaviour that result from other real-world influences.
Engagement metrics:
Number of visits over a time period, for example visits per week
Duration of engagement: how long do users keep coming back to the site?
Drop-off rate: what proportion of users do not return over a given time period?
Engagement data is often gathered through analytics, which are great for answering surface-level questions about what is happening.
It is not advisable to gather this information through longitudinal UX studies such as diary studies, because they rely on self-reported behaviour, which can be unreliable!
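For illustration, here’s a minimal sketch of calculating a drop-off rate from an analytics-style export of visit dates; the data and the two-week cutoff are hypothetical choices:

```python
from datetime import date, timedelta

# Hypothetical analytics export: visit dates per user.
visits = {
    "user_a": [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15)],
    "user_b": [date(2024, 1, 2)],
    "user_c": [date(2024, 1, 3), date(2024, 1, 20)],
}

period_end = date(2024, 1, 31)
cutoff = period_end - timedelta(days=14)  # "returned in the last two weeks"

# Drop-off rate: proportion of users whose last visit falls before the cutoff.
dropped = sum(1 for dates in visits.values() if max(dates) < cutoff)
print(f"Drop-off rate: {dropped / len(visits):.0%}")
```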
Getting metrics right
There are many good reasons to use metrics in a study. They are great at providing comparable measurements and answering the ‘what’ and ‘how many’ questions you may have about your users’ behaviour more broadly.
Metrics alone are poor at answering “why?” questions, as they cannot explain the reasons for user behaviour. With this in mind, when trying to explain behaviour, metrics are best used to complement qualitative insights and help triangulate your findings.
If you are on a tight budget and want to maximise value, we recommend sticking to qualitative methods, where a small number of participants can provide a huge number of findings.
To sum up, don’t use metrics for the sake of it - use them at the right time. If you focus on identifying and fixing usability issues first, your data will be more reliable and robust. Then bring out your metrics to fine-tune the product once it has reached a higher level!
Further reading:
Rating the Severity of Usability Problems, Jeff Sauro (https://measuringu.com/rating-severity/)