No Impact Without Engagement: Towards Standardised Product Metrics for Social and Behaviour Change Chatbots


Back in 2024, MTI convened a workshop with 50+ development practitioners to understand if and how to integrate AI into SBC programming, and what evidence practitioners need to support their decision-making. During the co-creation of this research agenda, participants flagged that the field lacked a shared set of metrics and benchmarks for what a ‘successful’ AI intervention looks like. The following year, our SBC learning group hosted a session on metrics from AI-powered chatbots across health and agriculture; in the course of just an hour, we identified commonalities and differences in how implementers are measuring engagement with their chatbots. Without engagement, meaningful impact is virtually impossible.

In parallel with these efforts, the Generative AI Evaluation Playbook, developed by CGD and The Agency Fund, offers a four-level framework to support the overall evaluation of GenAI behavioural chatbots, including a set of metrics to consider at Level 2 (Product Evaluation). The need for common metrics and benchmarks for digital tools is not new, but it has been made more urgent by the proliferation of funding focused on AI chatbots. MTI is currently working on a project that explores the importance of formative research, digital design, and strong gender and connectivity lenses when designing GenAI chatbots and conducting monitoring, evaluation, research, and learning (MERL) on them.

We believe the time is right to come together as a community to co-develop shared metrics and promote transparency around benchmarks, enhancing the quality of digital products and the success of AI investments.

Join our event on May 12 at 11 am ET / 5 pm CET / 6 pm EAT

Join us on May 12 to think through these questions collectively and start to unpack which metrics might be useful for GenAI chatbots. Together with Caryl Feldacker (Gates Foundation), Chelsea McKevitt and Ruth Orbach (GSMA), and Nicola Harford (iMedia Associates), we will explore questions such as:

  • What are some historical challenges with identifying and defining metrics for social and behavioural change activities? How might we adapt commercial metrics to our sector?
  • What are examples of these metrics, past and present, and how are they being tracked?
  • How can metrics related to gender and inclusion be strengthened?
  • What common indicators might we agree on across behavioural chatbots? And what contextual and thematic nuances might require us to track different or separate indicators?
  • What do we mean by “scale” and what are reasonable expectations for scale?
  • And more!

We are also interested in hearing from you: How are you measuring the success of your AI tools in terms of engagement and early impact proxies?

Get involved 

Whether you are a tool developer, a practitioner in digital development projects, a trainer/capacity builder working in the sector, or simply someone who would like to engage with this work more broadly, we’d love to hear from you. We’ll be organising an invite-only working session following this initial meeting. Please sign up using this form, and our team will get in touch with you.

Image by Annie Spratt via Unsplash.
