RightsCon Recap – Assessing humanitarian AI: what M&E frameworks do humanitarians need in the face of emerging AI?


On February 26th, MTI convened a RightsCon panel that included Linda Raftree – MTI Founder, Quito Tsui – MTI Core Collaborator, Helen McElhinney – Executive Director, CDAC Network, Heather Leson – Digital Innovation Lead, IFRC, and Sarah Spencer – independent consultant and expert on AI policy and governance. The panel focused on ‘Assessing humanitarian AI: what M&E frameworks do humanitarians need in the face of emerging AI?’

The humanitarian sector is awash with conversations about AI – from chatbots to anticipatory action, the hype is palpable. But the sector has seen numerous emerging technologies lead to unexpected harms for communities. Amidst the AI discussion, there has been little talk of how AI tools will be evaluated. How will humanitarians assess the contributions of AI tools? What do humanitarian-specific frameworks for evaluating AI look like? What information gaps might impede humanitarians’ ability to understand how AI use may undermine humanitarian principles?

Humanitarians need to know how to measure the impact of the AI tools they are seeking to deploy. This is not an easy task in the face of a rapidly changing technological and humanitarian landscape. In our session, facilitated by Linda, our expert speakers discussed the current state of M&E of humanitarian AI and considered what kind of M&E frameworks are necessary to ensure the sector can effectively assess AI tools.

AI tools in their current iteration are likely to make it more difficult for humanitarians to fulfill promises of participation and community engagement

“We risk making mistakes at the time of greatest need.”

Technology is inhuman, and in a sector focused on serving people directly, the coldness of AI tools should not be overlooked. Helen shared her thoughts on the importance of trust in technology, and in AI in particular. Building public AI systems requires trust and transparency to ensure genuine insight into, and oversight of, AI tools.

Technology failure is commonplace. Even in contexts with strong checks and balances, robust guardrails around innovation and experimentation, and the rule of law, mishaps are common. Preventing AI harm is a complex process that goes beyond regulation and oversight; it requires ongoing engagement with users and with those subjected to AI-mediated outcomes. In humanitarian contexts with ‘a great vulnerability and real need,’ Helen noted the importance of trust and of humanitarian accountability to communities and to peers.

Quito explained how AI tools often play into the sector’s biggest stressors – efficiency pressures, resource constraints and solutionism – making them especially tempting in spaces where good solutions are expensive or hard to come by. Given this, it is vital for humanitarians to have a robust understanding of what participation means, and to ensure that AI tools are attuned to those clearly delineated participatory methods rather than being inserted wholesale as a solution.

Humanitarians need to gather more evidence on whether and how AI tools contribute to change and deliver impact

“In the sector we speak in platitudes but we need specific inquiries – what impact will this have on humanitarians? On communities?”

When it comes to assessing the impact of AI, the details matter! Sarah emphasised the importance of gathering evidence on the benefits and costs of using AI in very specific use cases, as well as in the humanitarian agenda more broadly. Breaking down how we think and talk about the impact of AI must take place before AI tools are deployed. Broad claims about the ability of AI tools to generate efficiencies – to reduce bureaucracy and improve productivity – do little to illuminate how AI-related benefits in one area of humanitarian work can have adverse consequences, such as job losses, in another. Speaking to this dynamic, Sarah stated that ‘who wins and who loses’ – and where they are (‘Geneva or Goma?’) – should also be accounted for in the impact discussion.

The nature of AI tools can make it difficult to measure benefits and trade-offs with any level of specificity. The ‘amorphous nature of AI tools’ described by Quito means AI is often seen as a jack-of-all-trades. This brings evaluation challenges: as AI tools become integrated into other workflows, understanding the specific risks and harms related to AI use becomes less straightforward.

For tools to be properly assessed and understood, the sector as a whole needs better digital literacy. Sustaining digital literacy efforts means ensuring humanitarians carry their knowledge and awareness of data- and digital-related risks with them as they move from one emergency to the next. Gathering evidence on risks is often hindered by a lack of personnel, and many of those previously engaged in data and digital literacy work are moving into AI literacy work, leaving a knowledge and capacity gap.

Lastly, the metrics humanitarians select shape the sector’s understanding of AI. Sarah cautioned that humanitarians need to consider how they measure models, which metrics they choose, and where those metrics come from. Defining measures of efficacy for models is a contested endeavour in both the humanitarian space and the tech space. Testing assumptions about AI-related benefits, and gaining insight into the conditions in which those benefits are or are not realised, requires rigorous testing and analysis before humanitarians can substantiate the wide-ranging claims made about AI.
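To make this concrete, here is a minimal sketch of our own – not an example discussed on the panel – showing how metric choice changes the story a model tells. The scenario and all numbers are invented: a hypothetical classifier that flags households for assistance can look strong on accuracy while missing most of the households actually in need.

    # Hypothetical illustration (Python): the same predictions scored with different metrics.
    # Invented scenario: out of 1,000 households, 100 truly need assistance. The model
    # correctly flags 40 of them, misses 60, and wrongly flags 20 that are not in need.

    true_positives = 40    # flagged and truly in need
    false_negatives = 60   # in need but not flagged
    false_positives = 20   # flagged but not in need
    true_negatives = 880   # correctly left unflagged

    total = true_positives + false_negatives + false_positives + true_negatives

    accuracy = (true_positives + true_negatives) / total
    recall = true_positives / (true_positives + false_negatives)
    precision = true_positives / (true_positives + false_positives)

    print(f"Accuracy:  {accuracy:.0%}")   # 92% – looks impressive
    print(f"Recall:    {recall:.0%}")     # 40% – most households in need are missed
    print(f"Precision: {precision:.0%}")  # 67% – one in three flags is wrong

Which of these numbers ‘counts’ is precisely the contested choice described above: a procurement decision made on accuracy alone would hide the fact that this hypothetical model misses six in ten households in need.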

Developing humanitarian-specific standards, principles and guidance requires a nuanced discussion and consideration of impact and success

“There are hundreds of standards, technical and non-technical standards, UN agencies, humanitarian and others not specific to humanitarian organizations that can inform a sector approach. Humanitarians cannot keep waiting for things to be delivered in the systems they are used to.”

The breadth of AI applications makes standard-setting difficult. Additionally, the knowledge required – as pointed out above – may be out of reach for organisations. Organisations cannot be expected to hold all the knowledge needed to properly evaluate tools in-house, argued Heather. Both Heather and Sarah suggested that organisations instead think about how to develop a rigorous approach to vetting and evaluation that includes external third-party auditors.

Sarah went on to describe the high costs of pre-procurement testing and questioned whether humanitarians should be using tools that they cannot properly test. This points to an underlying tension in the M&E of AI tools: organisations lack both the internal knowledge to properly test tools and the resources to have them tested externally. Donors and funders need to step up and offer resources for vetting AI systems and tools. The viability of standards and guidance is intricately bound up with actual implementation and its possible limitations or challenges – and humanitarians themselves are often unclear about the kind of support they would need to implement standards properly.

Helen emphasised the importance of listening to communities, knowing ‘what they think is important during moments of crisis,’ and defining evaluation frameworks in tandem with communities. CDAC has already undertaken work on understanding how conflict-affected groups think about tech and AI, finding that digital literacy and AI familiarity are two distinct things. Community engagement can help ground any use of AI in community understanding and preferences; however, as Quito explained, clear accountability mechanisms and pathways are also needed to complement community work.

In addition, lessons from past mistakes and system shocks can help inform the approach to AI. Data minimisation policies, reviews of biometric use, and other sector-level pivots show how the sector has learnt from its own previous challenges. We can also learn from other spaces, such as academic circles, where salient conversations on power in relation to innovation can help guide humanitarian AI use.

Key takeaways

  • The dominance of GenAI tools in humanitarian AI discussions is a distraction; most AI use is still in the realm of non-generative AI tools. As Sarah pointed out, GenAI tools are actually ‘not where the most exciting uses and opportunities are.’
  • Even more important than digital literacy might be an understanding of how humanitarian situations operate, and honesty about limitations, especially in relation to implementation.
  • The sector cannot overlook M&E teams, who are often the first to use AI tools in their workflows and are, as a result, more experienced when it comes to testing and evaluating them. Humanitarian organisations can look to their M&E colleagues to take the first steps toward a more robust and coherent framework for evaluating humanitarian AI.
  • Shared conversation is key – the sector needs spaces where its different parts, especially smaller humanitarian NGOs, can gather. Humanitarians are already using AI tools, meaning the sector must consider how to properly prepare itself to grapple with a range of hard questions.
  • Lifecycle-focused M&E processes are necessary to understand humanitarian AI across the stages of inception, development and deployment. At each stage, humanitarians need a clear approach to understanding the relationship between harms and benefits.

Looking forward

The AI conversation is still unfolding within the sector. We really enjoyed hearing from other experts about how they think humanitarians should approach the monitoring and evaluation of humanitarian AI. We intend to continue these threads of conversation within the Humanitarian AI+MERL Working Group. If you’re interested in learning more and becoming part of these discussions, make sure to join the NLP CoP, a community of 1,500 development and humanitarian practitioners working at the intersection of AI, digital development, and MERL.

A big thank you to Helen McElhinney, Sarah Spencer, and Heather Leson for joining the conversation and contributing their valuable insights.

Image via Unsplash.
