This research has three implications for conversational design:
01. Example-based guidance yields better task performance; rule-based guidance yields greater improvement over time
02. Providing examples upon request works better than giving them after users fail
03. Design the conversation around its purpose: performance-driven or learning-driven
How can we provide better guidance
for task-oriented chatbot users?
I designed and led pioneering research on conversational user interfaces, co-authored a paper published at a top human-computer interaction conference, and received a best paper award (top 5%).
Between-subjects experiment, Literature Review, Interviews,
Survey, Affinity Diagramming,
Sep 2021 - Aug 2022
2 Researchers (including me)
2 Coding assistants
1 Project Lead
Discovering Research Gap
Initially, I was intrigued by the challenge of helping users recover from conversational failure. After reading over 100 research papers, my team and I narrowed our focus to designing better guidance. We found that previous research lacked consensus, presenting conflicting ideas about the ideal timing and type of guidance to offer.
Designing Guidance Combinations
To fill this gap, we explored the effects of eight combinations of two guidance types (Example-based and Rule-based) and four timings (Service-onboarding, Task-intro, Upon-request, and After-failure) on user performance and experience.
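The 2 x 4 between-subjects design above can be sketched as a simple enumeration; the labels below paraphrase the paper's condition names, and the `@` formatting is purely illustrative:

```python
from itertools import product

# Two guidance types crossed with four delivery timings
guidance_types = ["example-based", "rule-based"]
timings = ["service-onboarding", "task-intro", "upon-request", "after-failure"]

# Each pairing is one experimental cell; a ninth, no-guidance
# control condition is added on top of these eight.
conditions = [f"{g} @ {t}" for g, t in product(guidance_types, timings)]
print(len(conditions))  # 8 cells
```

Each participant experienced exactly one of these cells (or the control), which is what makes the design between-subjects.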
To guide our research, we formulated three research questions and identified the necessary data to answer them. To ensure comprehensive results, we adopted a mixed-methods approach that included a lab experiment and reflection sessions.
Justifying Context & Task
Once we had defined the research scope, we turned our attention to selecting the tasks. We chose to build chatbots for two popular contexts, travel arrangement and movie booking, so that the outcomes would be broadly applicable.
We developed IBM Watson chatbots and crafted nine guidance conditions, including a control group that received no guidance. I led the conversation-design process, which entailed collecting sample dialogues, mapping 12 conversation flows, and iterating the design through 10+ pilot tests.
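To make the two guidance types concrete, here is a minimal sketch of what an example-based versus a rule-based guidance message might look like for the movie-booking context. The wording is hypothetical, not taken from the study's actual Watson dialogs:

```python
# Illustrative guidance messages for the two guidance types.
# "Dune" and the phrasing below are invented examples.
GUIDANCE = {
    "example-based": 'You can say things like: "Book two tickets for Dune at 7 pm."',
    "rule-based": "State the movie title, number of tickets, and showtime in one message.",
}

def guidance_for(guidance_type: str) -> str:
    """Return the guidance message shown for a given guidance type."""
    return GUIDANCE[guidance_type]
```

An example-based message shows a concrete utterance users can imitate, while a rule-based message describes the abstract pattern a valid request must follow.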
Talking, And More Talking!
The study consisted of two phases. In the first phase, I observed participants as they interacted with the chatbot on six tasks of varying complexity, with the chatbot providing one of the nine possible guidance conditions. Participants then completed a survey measuring their satisfaction with the guidance provided.
In the second phase, I conducted interviews with the participants to gain insight into their perceptions, attitudes, and concerns regarding each guidance combination. During this phase, participants were also asked to rank the guidance combinations in order of preference and provide explanations for their rankings.
Getting Our Hands Dirty With Data
We chose physical affinity diagramming because of the abstract nature of the problem: physically moving and rearranging notes in a tangible space helped us gain clarity.
I led the team through this process, synthesizing over 1,000 notes into three main topics: task efficiency, performance improvement, and diverse opinions on guidance and timing. The process was challenging: we had to examine numerous notes carefully, and toward the end, generating fresh insights became difficult because of repetition.
In addition to quotes, we believed that users' actual interactions reflect their performance and overall experience. To dig deeper, we analyzed task-completion time, non-progress events, and improvement over time using statistical methods in R and Python. This part was fairly straightforward; I led the discussion of which statistical methods to use and defined the quantitative metrics.
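A first step in this kind of analysis is aggregating per-condition metrics before running formal tests. The sketch below uses only the standard library and entirely invented trial records; the actual study's data, condition labels, and statistical tests may differ:

```python
import statistics
from collections import defaultdict

# Hypothetical records: (condition, task_completion_seconds, non_progress_events)
trials = [
    ("example @ upon-request", 62.0, 0),
    ("example @ upon-request", 70.5, 1),
    ("rule @ after-failure", 95.2, 2),
    ("rule @ after-failure", 88.0, 3),
    ("control", 110.4, 4),
    ("control", 101.9, 3),
]

# Group completion times by condition
times = defaultdict(list)
for condition, seconds, _events in trials:
    times[condition].append(seconds)

# Per-condition mean and standard deviation of completion time
summary = {c: (round(statistics.mean(v), 1), round(statistics.stdev(v), 1))
           for c, v in times.items()}
```

In practice, descriptive summaries like this would be followed by inferential tests (e.g., an ANOVA or its non-parametric equivalent) to compare conditions, which is where R and Python's statistical libraries come in.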
Working on this project for almost a year has really taught me how to conduct research with both rigor and attention to detail. I've also come to realize that exploring research gaps that are truly worth exploring requires a significant investment of time.
Being The First
We not only identified patterns in the effectiveness of these pairings but also explored the underlying reasons for them, generating a set of design recommendations for chatbot practitioners. Our study is just a starting point, and we encourage future researchers to validate the effectiveness of our proposed designs in real-life settings.
Taking thorough and relevant notes is crucial to helping the team synthesize information more effectively and arrive at better insights that inform design decisions.
Like UX case studies, academic papers are about telling a good story. Incorporating various methods can greatly improve the quality of my narrative.
I was involved in high-level tasks such as selecting appropriate statistical methods and defining the key terms to consider. I'd like to further hone my quantitative execution skills.