Frustration Index: The Upcoming mPulse Tool You Should Learn (and Love) to Transform Your Website User Experiences
By Lance Hester and Amit Singh
Children have felt it.
Traffic-jammed commuters have felt it.
And, you are probably feeling it right now reading this teeny-tiny text.
What are we talking about? It's frustration: "that feeling of being upset or annoyed, especially because [of the] inability to change or achieve something."[1]
If you’ve ever used a website, we’re guessing you’ve felt it there too.
People come to websites with specific goals: buying products, signing up for a service, or consuming information. When page elements are unresponsive, content shifts unexpectedly, or a site slows to a crawl, it's a disappointing letdown. Disappointment leads to bad experiences. Bad experiences lead to website abandonment. The time and effort spent attracting visitors to a site are wasted. Worst of all, jilted website owners might swiftly discover that they and their customers are never, ever, ever getting back together.
A Brand New Hope
The Frustration Index Score is here to help!
At Akamai, we examine how real users access and interact with websites in real time. We pore over loads of this Real User Monitoring (RUM) data to understand how websites are performing. We identify areas for improvement, and we can infer whether website visitors are elated or irritated. The Frustration Index Score tells the two apart. This new User eXperience (UX) metric identifies when a web page is aggravating its users, so website owners can remedy problems.
The Score is easy to grasp. It’s a single integer value indicating the level of frustration while loading a web page. Ranging from 0 to 100, the greater the value, the greater the likelihood that visitors will not "escape [your web page] with their happiness intact."[2]
WebPerf from a New Perspective
"Big images in 1997. Slow servers or overly fancy widgets... [today]. Same effect. Make your website snappy, and you'll have a big leg up on the competition and their slow sites .”[3]
Fast is good, faster is always better – that's been the prevailing maxim. But therein lies the rub. Perhaps it is not speed, but s-e-p-a-r-a-t-i-o-n between metrics that instigates irritation. As the space shrinks, so does frustration. The Frustration Index Score looks at the differences in time between timer metrics: the gaps. Small gaps lead to little or no frustration; larger gaps, the opposite. Improving a single web performance metric by reducing its time does not necessarily result in a better UX, because that improvement can widen the gaps to neighboring metrics. Slower timers don't always bother users. Large delays between timers do.
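For illustration only (the numbers here are hypothetical), compare two pages. Page A reaches Time to First Byte at 0.9 seconds, First Contentful Paint at 1.8 seconds, Largest Contentful Paint at 2.6 seconds, and Page Load at 3.4 seconds; no single gap exceeds a second, so little frustration accrues even though nothing is blazingly fast. Page B starts quicker, with Time to First Byte at 0.3 seconds and First Contentful Paint at 0.6 seconds, but its Largest Contentful Paint does not land until 4.5 seconds; that nearly four-second gap between first content and visual readiness is exactly the kind of separation that aggravates users, despite the faster start.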
The Frustration Index Score looks at four key milestones perceived by end users while traversing a webpage:
- Time when a Title is visible (i.e., Time to First Byte),
- Time when First Content is visible (i.e., First Contentful Paint),
- Time when Content is Visually Ready (i.e., Largest Contentful Paint or Time to Visually Ready), and
- Time when a Web Page Looks Ready (i.e., Page Load Time, Time to Interactive, or Largest Contentful Paint).
It calculates the gaps (time differences) between these milestones, throws in a few more computations, and then finally accumulates the values to arrive at a scaled frustration score.
The pseudo-coded algorithm encompassing the four milestones has the following form.
Where:
- TTFB = Time to First Byte
- FCP = First Contentful Paint
- FP = First Paint
- LCP = Largest Contentful Paint
- TTVR = Time to Visually Ready
- TTI = Time to Interactive
- PLT = Page Load Time
And:
- (Milestone 1) A = frustration points between navigation start and TTFB
- (Milestone 2) B = frustration points between TTFB and FCP (or FP)
- (Milestone 3) C = frustration points between FCP (or FP) and LCP (or TTVR)
- (Milestone 4) D = frustration points between LCP (or TTVR) and largest value of (TTI and PLT)
So that:
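the Frustration Index Score accumulates A, B, C, and D and scales the total into the 0 to 100 range. Below is a minimal sketch of that form in TypeScript. The gap thresholds, the quadratic penalty, and the scaling factor are illustrative assumptions made for readability only; the actual weights and penalty functions are detailed in [4].

```typescript
// Minimal sketch of the Frustration Index accumulation. The thresholds, the
// quadratic penalty, and the scaling factor below are illustrative assumptions;
// the actual weights and penalty functions are described in [4].

interface PageTimers {
  ttfb: number;      // Time to First Byte, in ms
  fcp: number;       // First Contentful Paint (or First Paint), in ms
  lcp: number;       // Largest Contentful Paint (or Time to Visually Ready), in ms
  pageReady: number; // max(Time to Interactive, Page Load Time), in ms
}

// Hypothetical per-gap penalty: no frustration while a gap stays under its
// threshold, growing quadratically once it exceeds it.
function gapFrustration(gapMs: number, thresholdMs: number): number {
  if (gapMs <= thresholdMs) return 0;
  const secondsOver = (gapMs - thresholdMs) / 1000;
  return secondsOver * secondsOver;
}

function frustrationIndex(t: PageTimers): number {
  const a = gapFrustration(t.ttfb, 1000);              // Milestone 1: navigation start to TTFB
  const b = gapFrustration(t.fcp - t.ttfb, 1000);      // Milestone 2: TTFB to FCP (or FP)
  const c = gapFrustration(t.lcp - t.fcp, 1000);       // Milestone 3: FCP (or FP) to LCP (or TTVR)
  const d = gapFrustration(t.pageReady - t.lcp, 1000); // Milestone 4: LCP (or TTVR) to max(TTI, PLT)

  // Accumulate, scale, and clamp to the 0-100 range (see footnote 2).
  const raw = (a + b + c + d) * 10; // illustrative scaling factor
  return Math.min(100, Math.round(raw));
}
```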
A more detailed walkthrough of the algorithm and examples of the Frustration Index calculation can be found in [4].
Why We Are Confident Frustration Index Scores Expose Frustrating User Experiences
Of course, understanding what Frustration Index Scores relate to and having confidence that they convey an irksome website journey is easier said than done. Luckily, the methodology for deriving the Frustration Index Score is backed up by research in the field of human-computer interaction (HCI) and UX.
The Science Underpinning Frustration Index Scores
Existing studies on web performance timers suggest that making timers faster improves a user's website experience [5, 6, 7, 8]. The Frustration Index Score embraces this need for speed with a slight twist. The hallmark of the Frustration Index Score is that it targets the gaps (i.e., differences) between key timers instead of focusing on each timer individually [4]. Because it is not a timer but a score, it also goes a bit further to amend the mantra that faster is always better [9]. Hastening some timers can have the unintended consequence of worsening a user's online voyage. As any website owner will attest, a zero page load time, albeit wicked fast, may not deliver the smooth-sailing journey you envisioned.[1]
So, why relate velocity with vexation? It all flows back to, well, flow. Flow is that state of being in "the zone" – fully engrossed in an activity [10]. It's pleasurable. People feel satisfied when in it. Contented users go about their website transactions in "flow," uninterrupted. However, during a visitor's moment-to-moment interactions on a site, even a few seconds of delay will sap flow. Users lose control. It's excruciating to have to wait to carry out a task. And the stress of not knowing whether or not a transaction went through is not only frustrating, it hurts!
Keeping users happy and "flow" flowing requires ensuring fast response times and minimizing delays. When users encounter delay after delay after delay, they'll just give up on a webpage unless they are extremely committed to a task or resigned to stay because there is no alternative. Web properties don't want that kind of brand recognition. Usability studies recommend the following time limits to keep users happy: at 0.1 second (100 ms), delays are imperceptible; at 1 second, delays are perceived; and by 10 seconds, delays are too long and users abandon their activities [11].
The three main time limits determined by human perceptual abilities, and their accompanying user experiences [11]. Image sources: [12, 13, 14].
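As a rough illustration of those limits, a single delay could be bucketed as follows. This is a sketch only; the band labels are ours, and the cut-offs simply restate the usability thresholds cited above [11].

```typescript
// The cut-offs restate the usability thresholds cited above [11]; the labels are ours.
function classifyDelay(delayMs: number): string {
  if (delayMs <= 100) return "instantaneous";          // ~0.1 s: no perceptible delay
  if (delayMs <= 1_000) return "noticed, flow intact"; // ~1 s: delay perceived, flow of thought survives
  if (delayMs <= 10_000) return "flow broken";         // up to ~10 s: attention strained, frustration builds
  return "abandonment risk";                           // beyond ~10 s: users give up on the activity
}
```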
The Frustration Index Score is a good indicator of user frustration because the gaps between timers represent delays. When delays exceed thresholds derived from the usability recommendations, the Score's computations signal flowlessness, or frustration [4].
Based on user satisfaction (i.e., not breaking flow), Frustration Index Scores are a decent measure of user annoyance because the calculations associate small gaps with low or no frustration and larger gaps with higher frustration.
At the same time, a byproduct of this Score tabulation is that each milestone’s gap determination provides insights as to where and when a user experiences frustration based on the compared timer values.
We Don’t Do Direct
An inherent weakness in our translation of Frustration Index Scores into actual user frustration is the lack of a direct feedback mechanism verifying whether users were in fact frustrated. Such mechanisms might include:
- Performing supervised observations of website users navigating a site possibly even with sophisticated biometric devices,
- Conducting surveys of user experiences,
- Holding online chats with users,
- Discussing customer experiences via telephone calls, or
- Receiving customer feedback email responses.
While these methods are certainly valuable for obtaining real-world interaction data, there are fundamental issues applying them in a RUM context. The biggest challenge is scalability. We compute tens of thousands of Frustration Index Scores per minute per website property. Yep, tens of thousands. It is simply impractical to supervise that many remote, ad-hoc interactions. The same goes for online chats or phone calls.
Surveys and customer feedback do seem viable, and they are helpful for fine-tuning the Frustration Index Score algorithm, but there are issues here too. First off, only customers willing to go through the hassle of reporting non-positive reactions respond, which contributes to a form of sampling bias [15]. This type of bias occurs when surveys are deployed in such a way that some members of the user base are systematically more or less likely to be sampled than others. Feedback is slanted toward only certain levels and sources of frustration. As a result, we will no doubt miss factors that grind users' gears simply because users never communicate them. These might be the very factors that sour them on engaging with a website again and, worse yet, the items they share with friends to steer them clear of the website in the first place.
Let us also not forget that self-reporting is hard. Articulating feedback can be tricky business. Understanding what is off-putting during a web journey and expressing what is dissatisfying don't always align. Plus, your mileage may vary with what is reported. Recency bias [16] may be a factor: customers may report the last (most recent) pain point, the problematic piece that happened during the middle of their trek, or maybe the very first thing. Lastly, there are simply unreported annoyances. Some visitors just accept poor or overly slow services. They don't consider them a problem, so they don't report them.
The Future of Frustration Detection is Indirect
So, what can we do?
While direct methods to corroborate website user frustration with Frustration Index Scores are not available, we can associate existing web performance frustration signals with Frustration Index Scores to strengthen the assertion that Scores are a good measure of frustration.
Going with What We Know
Consider the web performance metrics most often used to track frustration: bounce rates, average time on a page, changes in conversion rates, and session lengths. We have shown that Frustration Index Scores have strong positive correlations with bounce rates and strong negative correlations with conversion rates and session lengths.
Bounce rate and page load times have strong positive correlations with Frustration Index Scores. The figures reveal that as Frustration Index Scores increase from 0 to 100 on the x-axis, metric values on the y-axis similarly increase.
Conversion rate and session length have strong negative correlations with Frustration Index Scores. The figures show that as Frustration Index Scores increase from 0 to 100 on the x-axis, metric values on the y-axis decrease.
All scatter plots in this post are derived from one week’s worth of mPulse RUM data collected from a retail website. Frustration Index Scores are displayed along the x-axis. Timer or metric values are presented along the y-axis. The size of each scatter plot data point is proportional to the number of metric/timer values found at a particular Frustration Index Score.[2]
When Frustration Index Scores increase, bounce rates increase. In contrast, when Frustration Index Scores increase, conversion rates and session lengths decrease. While we cannot definitively say that every bounce, drop in conversion, or reduction in sessions is a result of a dissatisfied user, these relationships still help to support the case that Frustration Index Scores are a viable measure of annoying UX experiences.
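For readers curious how such relationships can be quantified, here is a minimal sketch of a Pearson correlation between Frustration Index Scores and per-score bounce rates. The beacon shape and field names are assumptions for illustration, not the mPulse schema.

```typescript
// Minimal Pearson correlation sketch; the beacon fields below are illustrative,
// not the mPulse schema.
interface Beacon {
  frustrationIndex: number; // Frustration Index Score (0-100) for a page view
  bounced: boolean;         // whether the session ended with this single page view
}

// Pearson correlation coefficient between two equal-length series.
function pearson(xs: number[], ys: number[]): number {
  const n = xs.length;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Example: correlate Frustration Index Scores with the bounce rate observed at each score.
function scoreVsBounceRate(beacons: Beacon[]): number {
  const byScore = new Map<number, { total: number; bounces: number }>();
  for (const b of beacons) {
    const bucket = byScore.get(b.frustrationIndex) ?? { total: 0, bounces: 0 };
    bucket.total += 1;
    if (b.bounced) bucket.bounces += 1;
    byScore.set(b.frustrationIndex, bucket);
  }
  const scores = [...byScore.keys()];
  const rates = scores.map((s) => {
    const bucket = byScore.get(s)!;
    return bucket.bounces / bucket.total;
  });
  return pearson(scores, rates);
}
```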
Taking it to the Limit
We can continue to expand on this notion of correlation. What if we correlate some of the top web performance timers with Frustration Index Scores? Except here we only count the cases where a timer has exceeded its poor operating threshold, as described in Table 1 (a rough sketch of this tally appears below). Individual timer threshold values were obtained from [17] when available. If a timer's threshold was not available there, we applied the threshold of the timer closest to it in the Frustration Index tabulation. For example, First Paint takes on the threshold of First Contentful Paint, and Time to Visually Ready takes on the threshold of Largest Contentful Paint.
Table 1. Poor operating region thresholds for key web performance timers.

| Timer | Poor Operating Region Threshold |
| --- | --- |
| Time to First Byte (TTFB) | 1800 ms |
| First Contentful Paint (FCP) | 3000 ms |
| First Paint (FP) | 3000 ms |
| First Input Delay (FID)* | 300 ms |
| Total Blocking Time (TBT) | 600 ms |
| Largest Contentful Paint (LCP) | 4000 ms |
| Time to Visually Ready (TTVR) | 4000 ms |
| Time to Interactive (TTI) | 5000 ms |
| Page Load Time (PLT) | 3000 ms |

*Note: future analyses will replace FID with Interaction to Next Paint (INP).
The correlogram above is derived from over three weeks of real data (via mPulse) from a retail website. It shows the correlation between Frustration Index Scores and the percentage of instances in which key web performance timers operated at or above their poor operating thresholds. The key takeaway is the first column: as Frustration Index Scores increase, the percentage of instances in which timers operated in their poor operating regions (i.e., at or above their thresholds) also increases, resulting in strong positive correlations.
As the correlogram's first column illustrates, as Frustration Index Scores surge, the number of cases in which a timer operates in its poor operating region swells. Operating in a timer's poor region tends to provoke a poor user experience. Again, we can indirectly make the case that Frustration Index Scores indicate frustration.
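To make the construction of the correlogram more concrete, here is a rough sketch of its first step: tallying, for each Frustration Index Score, the share of page views in which a timer landed at or above its poor threshold. The thresholds come from Table 1; the beacon field names are assumptions for illustration.

```typescript
// Poor-operating thresholds from Table 1, in ms; beacon field names are illustrative.
const POOR_THRESHOLDS_MS: Record<string, number> = {
  ttfb: 1800,
  fcp: 3000,
  fp: 3000,
  fid: 300,
  tbt: 600,
  lcp: 4000,
  ttvr: 4000,
  tti: 5000,
  plt: 3000,
};

// A hypothetical flattened beacon: a Frustration Index Score plus raw timer values in ms.
type TimerBeacon = Record<string, number> & { frustrationIndex: number };

// For each Frustration Index Score, compute the share of beacons in which the given
// timer landed at or above its poor threshold, the quantity plotted in the correlogram.
function poorShareByScore(beacons: TimerBeacon[], timer: string): Map<number, number> {
  const threshold = POOR_THRESHOLDS_MS[timer];
  const counts = new Map<number, { total: number; poor: number }>();
  for (const b of beacons) {
    const c = counts.get(b.frustrationIndex) ?? { total: 0, poor: 0 };
    c.total += 1;
    if (b[timer] >= threshold) c.poor += 1;
    counts.set(b.frustrationIndex, c);
  }
  return new Map([...counts].map(([score, c]) => [score, c.poor / c.total]));
}
```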
Bringing in the Big Gun
The last association we want to make, and probably the most influential, is between Frustration Index Scores and rage clicks. As the name implies, rage clicks are a direct indicator of an infuriating user experience. They are the repetitive, rapid mouse clicks or screen taps a user makes when a visual screen element doesn't behave as expected. They are the physical manifestation of a user venting anger digitally. Showing that Frustration Index Scores have a strong positive correlation with rage clicks, as illustrated below, is a very convincing signal that Frustration Index Scores communicate frustration.
Scatter plot illustrating the strong positive correlation between rage clicks and Frustration Index Scores. As frustration increases, the rate of rage clicks increases, especially in areas of high to very high frustration (Frustration Index Scores of 20-100). Increases in rage clicks in areas of no or low frustration (Scores of 0-19) can be explained by rage clicks resulting from non-delay-related annoyances, which are not explicitly incorporated in Frustration Index Score calculations.
Correlation does not imply causation, so we cannot explicitly state that Frustration Index Scores are a result of rage clicks or vice versa. Rage clicks might be caused by confusing elements or a poor site design. Without direct user feedback, it is not easy to determine the exact cause of a rage click. However, taking into account the delay information implied by a high Frustration Index Score, possible correlations with timers operating in poor regions, and the rage clicks present, it is probably a safe bet that users were having an uncomfortable time on the site.
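As an aside, a common heuristic for flagging rage clicks is a burst of clicks on the same element within a short window. The sketch below uses illustrative values (three clicks within one second) and is not mPulse's actual detection logic.

```typescript
// Illustrative rage-click heuristic: a burst of clicks on the same element within a
// short window. The window size and click count are assumptions, not mPulse's logic.
interface ClickEvent {
  targetId: string;  // identifier of the clicked element
  timestamp: number; // milliseconds since navigation start
}

function countRageClickBursts(clicks: ClickEvent[], windowMs = 1000, minClicks = 3): number {
  // Group clicks by the element they landed on.
  const byTarget = new Map<string, number[]>();
  for (const c of clicks) {
    const times = byTarget.get(c.targetId) ?? [];
    times.push(c.timestamp);
    byTarget.set(c.targetId, times);
  }

  let bursts = 0;
  for (const times of byTarget.values()) {
    times.sort((a, b) => a - b);
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      // Slide the window so all clicks in it fall within windowMs of each other.
      while (times[end] - times[start] > windowMs) start++;
      // Count the moment a burst first reaches the rage threshold.
      if (end - start + 1 === minClicks) bursts++;
    }
  }
  return bursts;
}
```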
Conclusion
While a plethora of WebPerf metrics exist to characterize page load events and user experiences, no single "unicorn metric" exists that can relate everything. Even Google has managed to reduce the multitude of measures down to only three Core Web Vitals. These three already wrestle with relating the speed, interactivity, and visual stability of a webpage. They shouldn’t have to wrestle with frustration too.
Marketing teams and company executives do not necessarily speak the language of web performance, but they do understand what it's like to be irritated when using a website. The Frustration Index Score provides a reliable measure of user dissatisfaction on a web page tied to actual website performance. Everyone from the WebPerf aficionado to the product sales rep can easily grasp its meaning. Such Frustration Index Score revelations might be unpleasant, but the Score is a gift: a tool to help website owners and web developers fight user frustration today.
References
Footnotes
A web page could load insanely fast because it’s broken (i.e., experiencing JavaScript errors) or possibly because key page resources are not being transmitted. ↩︎
Because Frustration Index Score values are clamped to the range 0 to 100, calculated Score values exceeding 100 are capped at 100. The relatively large data point at Frustration Index Score 100 represents the entire "long tail" of the distribution of Scores exceeding 100. ↩︎