Resolve New England Usability Test

Client: Non-profit organization serving the infertility community within the New England region.
Research Method: Expert Review/Heuristic Evaluation, Formative Usability Test
Skills: Moderating usability tests, screening & recruiting, quantitative analysis, qualitative analysis, test plan, usability test report
Tools Used: Zoom, Qualtrics, DocuSign, Doodle Poll
Role: Moderator, Content Reviewer and Editor, Data Analyst, Document Manager
Time Period: Sep 2019 – Dec 2019

Process & Method

Participants: In the screener questionnaire, I included questions to capture each individual's demographics (gender, age), online health-search experience, and general experience with RNE. The screener served to:

  • Identify whether participants had been through the family-building process
  • Gauge each participant's level of web-browsing expertise to ensure a balanced pool
  • Flag whether past experience with RNE might influence our usability test results
  • Set participants' expectations about when and where the session would occur, whom to contact with questions, and the incentive of a $25 gift card

17 individuals completed the screener questionnaire, and 9 signed up for the usability test. Due to competing priorities, 3 participants dropped out; a total of 6 participants signed the informed consent form and completed our study.

Scheduling: For participants who expressed interest, we sent a Doodle Poll to coordinate availability. Once a time was confirmed, we sent a confirmation email along with a Participant Onboarding Document I created, which included instructions for joining Zoom. We also sent the informed consent form through DocuSign.

Pilot Test: Before the usability test sessions, two teammates and I conducted a pilot test to refine our study tools: the Screening Questionnaire, Moderator Guide, Participant Onboarding Document, and Informed Consent Form.

Usability Test Sessions:

I developed the fictional scenario and five tasks for the usability study based on the primary persona's motivations and goals. Participants were asked to imagine that they were in the process of building a family and seeking support from RNE (see image for a task summary). These tasks were then incorporated into the moderator guide.

We then conducted a formative (exploratory) study and employed the think-aloud technique to examine what users were thinking as they performed each task. Each session included a moderator and an observer. The observer used an observer guide I developed to take detailed notes and record the participant's behavior.

Because the tasks were exploratory, the moderator was able to ask unscripted follow-up questions during the sessions to clarify participants' behavior and expectations.

All sessions were conducted remotely via Zoom, and each ran approximately one hour.

Quantitative Results

Pre-Test Questionnaire: Before the usability tasks, the moderators asked participants about their family-building journey and their experiences with Resolve New England. We found that the most challenging issues for this population included the cost of in-vitro fertilization, the need for emotional support, lack of control and general uncertainty, and low success rates.

Single Ease Question (SEQ)

  • After each assigned task, participants rated how easy or difficult it was to complete on a 5-point Likert scale
  • 4 of the 5 assigned tasks had high (>4) SEQ scores
  • Participants struggled most with the first task, locating a resource for donor eggs, which scored an SEQ of 2.67
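A per-task SEQ score is simply the mean of the 1–5 ratings. The sketch below uses hypothetical ratings (not the study's raw data), chosen only to show how six responses can average to 2.67:

```python
from statistics import mean

def seq_score(ratings):
    """Mean Single Ease Question rating (1 = very difficult, 5 = very easy)."""
    return round(mean(ratings), 2)

# Hypothetical ratings for six participants on a single task
task1_ratings = [2, 3, 3, 2, 3, 3]
print(seq_score(task1_ratings))  # 2.67
```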

Task Completion Rate: 

  • During the usability tests, the observer noted task completion by assessing whether the participant reached the task's success criterion (stop page)
  • Task completion rate was calculated as the number of participants who successfully completed the task divided by the number of participants who attempted it
  • The completion rate for the first task (Locate a resource) was 16.67%, whereas all participants successfully completed the other tasks (100.00%)
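The completion-rate arithmetic above can be sketched as follows (the function name is mine, not from the study):

```python
def completion_rate(n_completed, n_attempted):
    """Percentage of participants who reached the task's stop page."""
    return round(n_completed / n_attempted * 100, 2)

print(completion_rate(1, 6))  # 16.67 (first task)
print(completion_rate(6, 6))  # 100.0 (all other tasks)
```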

Error Rate – Count of Incorrect Menu Selections:

  • The average number of incorrect menu selections for the first task was 3.33, markedly greater than all other tasks, which averaged 0
  • The total number of incorrect menu selections for the first task was 20, also far higher than for the other tasks administered
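As a quick consistency check on the two figures above (the function name is illustrative), 20 total incorrect selections spread over 6 participants averages 3.33 per participant:

```python
def mean_incorrect_selections(total_incorrect, n_participants):
    """Average incorrect menu selections per participant for one task."""
    return round(total_incorrect / n_participants, 2)

print(mean_incorrect_selections(20, 6))  # 3.33
```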

System Usability Score (SUS): 

The average SUS score was 63.3, which corresponds to a grade of C (OK). We note the outlier SUS score of 27.5 (F – Poor), shown in the figure below, as a key factor skewing our results; our data set therefore does not follow a normal distribution.
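For reference, individual SUS scores are conventionally computed with Brooke's scoring rule: odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is scaled by 2.5 onto a 0–100 range. A minimal sketch, assuming the standard ten-item questionnaire:

```python
def sus_score(responses):
    """Score one participant's ten 1-5 SUS responses on the 0-100 scale."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# A neutral respondent (all 3s) lands at the midpoint of the scale
print(sus_score([3] * 10))  # 50.0
```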

Post-Test Questionnaire

In addition to the quantitative metrics collected, the moderator debriefed the participants, asking about their general impressions of the website and their feedback for Resolve New England. Along with the pre-test questionnaires and the observation notes, these data were analyzed qualitatively.

Qualitative Results

Apart from the quantitative results presented above, we summarized our qualitative findings by conducting a thematic analysis. Our team independently reviewed notes from the observation guide and identified recurring patterns across all participants. We then organized our findings in a Google Sheet into global or local (task-specific) issues, recording the frequency of each issue and the participants who reported it.

Criticality Score Calculation

In order to prioritize the issues, I developed a schema to calculate a Criticality Score for each problem: the greater the Criticality Score, the higher the priority level.

Criticality = Frequency Ranking + Severity Score

Frequency Ranking: Frequency was calculated as the total number of instances across the heuristic evaluation and the usability test, divided by 10 (as there were 4 experts and 6 usability test participants). A ranking was then assigned to each frequency using the Frequency Ranking Scale Assignment shown in the table to the right.

Severity Score: Each issue was also given a severity score (see below) based on the Dumas & Redish (1999) scale in reverse, in which a higher score indicates greater severity. Based on the criticality score, we prioritized our findings into high, medium, and low priority, as outlined and color coded in Table 7.
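The schema above can be sketched in code. The ranking cut-offs below are hypothetical stand-ins, since the study's actual Frequency Ranking Scale Assignment table is not reproduced here:

```python
def frequency(n_instances, n_evaluators=10):
    """Share of the 10 evaluators (4 experts + 6 participants) who hit the issue."""
    return n_instances / n_evaluators

def frequency_ranking(freq):
    """Map a frequency to a 1-5 ranking (hypothetical cut-offs)."""
    if freq >= 0.9:
        return 5
    if freq >= 0.7:
        return 4
    if freq >= 0.5:
        return 3
    if freq >= 0.3:
        return 2
    return 1

def criticality(n_instances, severity):
    """Criticality = Frequency Ranking + Severity Score; higher = higher priority."""
    return frequency_ranking(frequency(n_instances)) + severity
```

For example, an issue hit by 7 of the 10 evaluators with severity 4 would rank 4 for frequency and score a criticality of 8 under these assumed cut-offs.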

Top Problem Areas

  1. Confusing Information Architecture: Participants had difficulty finding information due to ineffective layout organization, with too many tabs and cluttered pages
  2. Mismatched Mental Model: The way site content was presented did not match users' expectations. For instance, resources were shown as blogs and articles, but participants expected educational brochures or fact sheets
  3. Inconsistent Information: There were numerous instances where meeting information was inconsistent, causing confusion among participants
  4. Scrolling Frustration: The long, scrolling format of pages caused frustration and difficulty in finding information
  5. Unhelpful Registration Forms: Several users commented on the lack of auto-population in forms
  6. Unrelatable Images: Participants commented that stock images felt sterile and unrepresentative of the community

Limitation

Non-Iterative Testing – Per best practice, formative testing should be conducted with a prototype design in iterative cycles. Due to time constraints, we did not produce a prototype or conduct more than one cycle of testing. However, the study still allowed DesignReel to gain valuable insights for design improvement.

Sample Size – Due to last-minute conflicts and cancellations, the study included only 6 participants. The small sample size prevents us from claiming statistically significant results. Despite the low number of participants, we discovered several issues that occurred consistently across participants, and we were thus able to prioritize our findings using the frequency scale.

Diversity of Participants – Because participants were recruited from among RNE community members, all of them had prior experience using the RNE website, and five of the six had attended RNE support groups. Furthermore, all of our participants were female, Caucasian, and within the same age group. This further limits our ability to generalize our results to the broader population.