Building AI chatbots people actually trust


In a recent event hosted by MERL Tech’s Sandbox Working Group, Alyssa Young and Xian Ho of Dimagi walked through several years of work building, testing, and deploying generative AI (GenAI) chatbots for family planning and sexual and reproductive health in Kenya and Senegal.

The project, Open Chat Studio (OCS), started before the arrival of ChatGPT as a way to build simple, menu-based chatbots. Once GenAI arrived, the team changed its approach. They realized that both their own team and their implementing partners needed a way to create their own GenAI chatbots, so they built OCS and then open-sourced it as a digital public good with support from the Gates Foundation.

OCS sits between the user and the large language model (LLM). It is agnostic to the underlying model, whether that model comes from Anthropic, OpenAI, another provider, or is an open-source model from Hugging Face. It supports a number of channels, including WhatsApp, Facebook Messenger, and Telegram. Because OCS sits in the middle, Dimagi can orchestrate how the different models respond to user input.
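
To make the "sitting in the middle" idea concrete, here is a minimal Python sketch of a provider-agnostic layer. It is not OCS’s actual code or API: the `ChatModel` interface, the `EchoModel` stub, and the `Orchestrator` class are hypothetical names, and the stub simply echoes the last message instead of calling a real provider’s SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


class ChatModel(Protocol):
    """Hypothetical interface that every provider adapter implements."""

    def complete(self, history: List[Message]) -> str: ...


class EchoModel:
    """Stub standing in for a real provider (Anthropic, OpenAI, a Hugging Face model, ...).
    A real adapter would call that provider's SDK here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, history: List[Message]) -> str:
        return f"[{self.name}] {history[-1].text}"


@dataclass
class Orchestrator:
    """Sits between a messaging channel and whichever model is configured."""

    models: Dict[str, ChatModel]
    default: str
    history: List[Message] = field(default_factory=list)  # a single conversation, for simplicity

    def handle(self, user_text: str, model_name: Optional[str] = None) -> str:
        self.history.append(Message("user", user_text))
        model = self.models[model_name or self.default]
        reply = model.complete(self.history)
        self.history.append(Message("assistant", reply))
        return reply


bot = Orchestrator(models={"a": EchoModel("provider_a"), "b": EchoModel("provider_b")}, default="a")
print(bot.handle("hello"))          # [provider_a] hello
print(bot.handle("hi again", "b"))  # [provider_b] hi again
```

Swapping providers then means registering another adapter under a new name; the channel-facing `handle` method does not change. A real deployment would also keep a separate history per user and per channel.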

The session covered key lessons that Dimagi and two partner organizations working on adolescent and youth sexual and reproductive health learned while developing GenAI chatbots in Kenya and Senegal.

In Kenya, the Dimagi team worked with Shujaaz Inc, a Nairobi-based youth brand whose comics, radio, and social channels reach a majority of adolescent girls and young women in the country each month, to build the Tubonge chatbot. The chatbot features two older characters who already have a combined following of nearly 5 million users; they were selected through co-design practices as the voices to share the messaging.

In Senegal, they partnered with RAES, a Dakar-based NGO behind the edutainment series C’est la vie!, which already had around 1 million followers across Facebook, Instagram, YouTube, and TikTok. There, the chatbot takes on the personas of Marguerite, a midwife, and Dr. Moulaye, a health specialist.

Key takeaways to consider in chatbot design:

1. Put safety ahead of conversation

When a user messages one of the Kenya bots, the query doesn’t go straight to a conversational model. First, it is screened by a crisis classifier built to tell a joke in slang apart from a genuine cry for help. If the classifier detects an active crisis, it triggers two actions:

  • A silent alert to the research team for follow-up
  • A referral that routes the user to urgent hotline numbers and care options
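
The gating order above can be sketched in Python. This is an illustration under stated assumptions, not Dimagi’s implementation: `toy_crisis_classifier` is a deliberately naive keyword stand-in (a keyword list cannot separate slang jokes from real distress, which is the reason for a dedicated classifier), and `Alert`, `REFERRAL_MESSAGE`, and `handle_message` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Alert:
    user_id: str
    text: str


# Hypothetical placeholder; real referrals would list verified local hotlines and services.
REFERRAL_MESSAGE = "You don't have to face this alone. Please reach out now: <hotline numbers and care options>"


def toy_crisis_classifier(text: str) -> bool:
    """Naive keyword stand-in. It cannot tell slang jokes from real distress;
    a deployed system would call a dedicated classifier model here."""
    phrases = ("want to die", "kill myself", "hurt myself")
    return any(p in text.lower() for p in phrases)


def handle_message(
    user_id: str,
    text: str,
    is_crisis: Callable[[str], bool],
    alerts: List[Alert],
    router: Callable[[str], str],
) -> str:
    # 1. The safety check runs before any conversational model sees the message.
    if is_crisis(text):
        alerts.append(Alert(user_id, text))  # silent alert: queued for the research team, never shown to the user
        return REFERRAL_MESSAGE  # the user gets the referral instead of a generated reply
    # 2. Only messages judged safe reach the main router.
    return router(text)
```

Passing the classifier and router in as functions keeps the safety check independent of whichever conversational model sits behind the router.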

Only if a message is deemed safe is it passed to the main router, which applies chain-of-thought reasoning to the current message and chat history to decide which specialized node is best suited to respond. The nodes typically include an advisor (a neutral “probe-then-solve” conversationalist), a referral node (which pulls verified clinics), a roleplay node, a quiz node, and a Shujaaz brand-logistics node. Underlying both chatbots is retrieval-augmented generation (RAG) that draws only on validated local source material (national health policies, partner content, curated databases, etc.).
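
A rough sketch of the routing step follows, assuming the router LLM is asked to reason first and then name a node on its last line. The node names mirror the article, but their bodies, the prompt wording, the `NODE:` output convention, and the fallback to the advisor node are hypothetical choices for illustration; the `llm` argument stands in for a call to whichever model is configured.

```python
from typing import Callable, Dict, List

# Node names follow the article; each body is a hypothetical stub.
NODES: Dict[str, Callable[[str], str]] = {
    "advisor": lambda msg: "advisor: tell me a bit more about what's going on",
    "referral": lambda msg: "referral: here are verified clinics near you",
    "roleplay": lambda msg: "roleplay: I'll play the other person, you start",
    "quiz": lambda msg: "quiz: true or false?",
    "brand_logistics": lambda msg: "brand: Shujaaz logistics info",
}


def build_router_prompt(message: str, history: List[str]) -> str:
    """Ask the model to reason step by step, then name exactly one node."""
    return (
        "Conversation so far:\n" + "\n".join(history) + "\n"
        + f"Latest message: {message}\n"
        + "Think step by step about what the user needs. On the final line write "
        + "NODE: <one of " + ", ".join(NODES) + ">"
    )


def parse_node(llm_output: str, fallback: str = "advisor") -> str:
    """Read only the final 'NODE:' line; the reasoning above it is discarded."""
    lines = llm_output.strip().splitlines()
    last = lines[-1].strip() if lines else ""
    if last.startswith("NODE:"):
        name = last[len("NODE:"):].strip()
        if name in NODES:
            return name
    return fallback  # malformed or unknown answers go to the neutral advisor


def route(message: str, history: List[str], llm: Callable[[str], str]) -> str:
    node = parse_node(llm(build_router_prompt(message, history)))
    return NODES[node](message)
```

Having the model reason before committing to a label is the chain-of-thought part; parsing only the last line keeps that reasoning out of the user-facing reply. Retrieval over the validated source material would happen inside each node and is omitted here.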

2. Listen to users

Roleplay and quiz features weren’t in the original spec. They came directly out of co-design sessions with young people. Dimagi found that roleplaying gave users a safe simulation space to rehearse hard conversations, such as those with a strict parent, a difficult partner, or a judgmental nurse.

Users also requested sexual and reproductive health (SRH) and mental health content for boys and men, and offered feedback on the quality of the chatbot’s local-language responses and on ways to interact with it.

3. Make sure there is a human in the loop

Deployment of this chatbot has not been a case of ‘set it and forget it.’ Transcripts are reviewed daily, with a prioritization mechanism that flags conversations needing closer follow-up and a defined human-escalation plan. Dimagi staff serve as chatbot managers, a role that transfers to local partners as the tools are handed over. Because SRH is sensitive and the technology is new, the live implementation includes a safety disclaimer and consent flow, with an age requirement (18 and up) at the front door. As the team put it, the goal is to support, not replace, human referrals and relationships.
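
The front-door flow described above (disclaimer, age check, consent) maps naturally onto a small state machine. The sketch below is hypothetical: the stage names, message wording, and `front_door` function are illustrative, and the age check only sees what the user types, so it records a self-reported age rather than verifying one.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

MIN_AGE = 18  # the age requirement the article describes

# Hypothetical wording; a real disclaimer would be written and reviewed with partners.
DISCLAIMER_TEXT = "This is an AI chatbot, not a health worker. In an emergency, contact local services."
UNDERAGE_TEXT = f"Sorry, this service is only for people aged {MIN_AGE} and over."


class Stage(Enum):
    DISCLAIMER = auto()
    ASK_AGE = auto()
    ASK_CONSENT = auto()
    ACTIVE = auto()
    BLOCKED = auto()


@dataclass
class Session:
    stage: Stage = Stage.DISCLAIMER


def front_door(session: Session, text: str) -> Optional[str]:
    """Return the gate's reply, or None once the user may reach the chatbot."""
    if session.stage is Stage.ACTIVE:
        return None
    if session.stage is Stage.BLOCKED:
        return UNDERAGE_TEXT
    if session.stage is Stage.DISCLAIMER:
        # First contact: show the disclaimer whatever the user typed.
        session.stage = Stage.ASK_AGE
        return DISCLAIMER_TEXT + " How old are you?"
    if session.stage is Stage.ASK_AGE:
        answer = text.strip()
        if not answer.isdigit():
            return "Please reply with your age as a number."
        if int(answer) < MIN_AGE:
            session.stage = Stage.BLOCKED
            return UNDERAGE_TEXT
        session.stage = Stage.ASK_CONSENT
        return "Do you agree to continue? Reply YES or NO."
    # Remaining stage: ASK_CONSENT
    if text.strip().upper() == "YES":
        session.stage = Stage.ACTIVE
        return "Thank you. How can I help?"
    return "You need to agree before we can continue. Reply YES when you are ready."
```

Only when `front_door` returns `None` would the message continue on to the safety classifier and router.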

4. Building it is easy; proving it’s useful is hard

Dr. Ho’s closing reflection offered the project’s biggest takeaway: developing an LLM chatbot is simple, but building a genuinely useful one (and keeping it that way) is hard. To address this, the team developed a four-part evaluation framework that they apply continuously.

Measuring the performance of GenAI chatbots

The framework includes:

  • Regular user testing, measuring acceptability and user experience with a validated questionnaire.
  • Expert evaluations, in which study-team reviewers rate query-and-response pairs on key metrics: accuracy, acceptability, safety, authenticity, empathy, and language quality.
  • Automated evaluations, using another LLM as a judge to score the same metrics more quickly and at larger scale.
  • Head-to-head comparisons, pitting the custom bots against general-purpose ‘vanilla’ bots in blind tests.
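
One way the scores from this framework might be summarized is sketched below. The 1 to 5 scale, the `Rating` record, and the agreement and win-rate formulas are assumptions for illustration, not the team’s published method: `judge_agreement` checks how closely the LLM judge tracks expert reviewers on the same query-and-response pairs, and `win_rate` summarizes blind head-to-head preferences.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

METRICS = ("accuracy", "acceptability", "safety", "authenticity", "empathy", "language_quality")


@dataclass
class Rating:
    """Scores for one query-and-response pair on an assumed 1-to-5 scale."""

    rater: str  # "expert" or "llm_judge"
    pair_id: str
    scores: Dict[str, int]


def metric_means(ratings: List[Rating], rater: str) -> Dict[str, float]:
    """Average score per metric for one kind of rater (raises if that rater has no ratings)."""
    rows = [r.scores for r in ratings if r.rater == rater]
    return {m: mean(s[m] for s in rows) for m in METRICS}


def judge_agreement(ratings: List[Rating], metric: str) -> float:
    """Mean absolute gap between expert and LLM-judge scores on pairs both rated (0 = perfect agreement)."""
    expert = {r.pair_id: r.scores[metric] for r in ratings if r.rater == "expert"}
    judge = {r.pair_id: r.scores[metric] for r in ratings if r.rater == "llm_judge"}
    shared = expert.keys() & judge.keys()
    return mean(abs(expert[p] - judge[p]) for p in shared)


def win_rate(preferences: List[str]) -> float:
    """Share of blind trials where raters preferred the custom bot; ties count as half a win."""
    points = {"custom": 1.0, "tie": 0.5, "vanilla": 0.0}
    return sum(points[p] for p in preferences) / len(preferences)
```

Tracking judge-versus-expert agreement is one way to decide how far the faster automated scores can be trusted before leaning on them at scale.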

A resource to explore further:

Open Chat Studio, Dimagi’s open-source platform for building, testing, and deploying LLM-based chatbots with built-in guardrails. Learn more at sites.dimagi.com/open-chat-studio or browse the code at github.com/dimagi/open-chat-studio.
