Push that boundary! Hold that boundary! Exploring multiple paths of inquiry for AI and Evaluation


This post was written by Silva Ferretti, Grace-Lyn Higdon, Kecia Bertermann and Linda Raftree, collaborators on an upcoming session on AI and Ethics at the European Evaluation Society Conference, September 23-27, 2024.

Illustration by Silva Ferretti: Two cartoon characters asking if they should “push that boundary or hold that boundary?”

When our team of four first gathered to prepare for our upcoming European Evaluation Society (EES) 2024 Conference session on “Ethical Use of Emerging AI,” what began as introductions quickly evolved into a comparison of personal AI experiences. Unlike typical software discussions, where the scope is narrower, conversations about AI tend to expand into bigger pictures and diverse stances. Our approaches to this technology provide a glimpse into how AI is reshaping our professional landscapes and personal worldviews.

When people passionate about AI get together, the conversation naturally and immediately broadens. And this aspect is crucial: AI isn’t just about sharing expertise on a set of tools whose perimeter is quite clear; it’s about discovering opportunities that will evolve in response to who’s using it, how, and for which task. These discussions are essential for collectively assessing AI’s potential while also exposing our concerns and fears. That’s why the upcoming EES fishbowl is such an exciting opportunity: it’s a space where we can explore AI together, in a way that’s deeply connected to who we are.

The “Ethical Use of Emerging AI: How Can Strategic Guidance Support Rather Than Stifle Safe Innovation for Evaluation?” fishbowl session will take place on Wednesday, 25 Sept at 5:15pm in Borgo 1. See the full EES program here.

Our guiding ideas emerged immediately, revealing themselves for what they truly are: core questions that we continuously ask ourselves, reflecting our deepest concerns. These are the questions we want to put front and center, but they’re not the kind we can easily answer or fully unpack. Every attempt to unravel them only leads to more layers of nested considerations.

And while we think of AI as being so new, we should remember that evaluation itself is relatively new! The current methods are only a few decades old – and not set in stone! So there is ample space for discovery. AI might be a tool that allows us to more easily do more of the same – more efficient, cheaper conventional evaluations. It could also be a tool that lets us explore completely new things that we never imagined. Regardless of which angle we’re looking at, there are a lot of questions, concerns and possibilities when it comes to AI.

AI opens multiple paths of inquiry

We began walking along a number of paths of inquiry at our prep meeting, and we are very excited to venture further down them at our session, with a room of people who share our mix of curiosity and skepticism. Some provocations we are excited to chat with other evaluators about include:

AI is not simply ‘software.’ We need to explore and discuss the multifaceted nature of AI, its impact on various domains, and how our personal and professional experiences shape our interaction with it – whilst dispelling the limited understanding we often see amongst M&E professionals. (The oft-asked question “what AI app do you use?” reveals that the core nature of AI, and the ways it might affect evaluation, is not yet on all of our radars.) “AI is not just software” means we recognise the many forms that AI can take. It also reminds us that while we can look at the backend of a software program to learn how it arrives at its outputs, the inner functioning of AI, its “black box,” remains largely opaque even to its creators and programmers. We need to better understand AI and its nature if we are to integrate it into our Evaluation practice.

“People always confabulate. Half-truths and misremembered details are hallmarks of human conversation: Confabulation is a signature of human memory. These models are doing something just like people.” – Geoffrey Hinton, ‘Godfather of AI’, MIT Technology Review

Replicability matters. Between the replicability crisis in social science and conversations pushing the envelope on traditional considerations of validity, the requirement for replicability in rapidly changing contexts where variables cannot remain static has come under intense scrutiny. AI highlights that not everything is replicable, and sometimes not even traceable. Forcing replicability onto complexity may be like fitting round pegs into square holes. Understanding which elements of validity are important, rather than replicability, may be the more central question both for complexity evaluation and for AI in evaluation. We need to continue having this discussion in the Evaluation sector.

Bias is universal. A lot is discussed about the potential biases of AI, but humans are also biased. Could AI actually help us recognise and address our own biases more effectively? In fact, could AI be used as a tool to reveal such bias (e.g. perspectives that are male, white, and global-minority based) in a way that (human) colleagues may struggle to? What about the range of evidence that shows how AI reinforces existing biases? How might we work with biased tools to challenge our own explicit and implicit biases? As Justin Wolfers notes in a 2023 talk about the use of Generative AI by university students, AI is critiqued for being “trained on a canon of dead white men” yet “so are our intro econ students.” The same could be said of how most evaluators are trained. We need to address bias everywhere!

Screenshot from a 2023 talk by Justin Wolfers about GenAI use among university students.

Ethical guardrails created through guidance and practice are central to our work and to our EES fishbowl activity. The need for ethical guardrails extends beyond widely known challenges around consent to touch on fundamental questions about AI’s role in our field. We must consider the implications of AI-driven analysis and simulated stakeholders, where human involvement is limited to final approval. This scenario isn’t entirely new – we’ve seen similar challenges in sectors such as predictive policing and credit scoring – but AI’s capabilities raise the stakes significantly as governments and others yearn to improve efficiency and ‘objectivity’ with the use of AI. While EU data protection regulations give people the right to contest automated decisions (in other words, decisions made by AI algorithms) and to ask that they be reviewed by humans, are evaluators aware of how they might be ‘automating decisions’ by using AI in their own practice? Are the subjects of evaluation informed about this right and able to demand it? Automated decisions are only one part of the big ethical questions that evaluators need to consider when adopting AI.

As we navigate this landscape, we must ensure that our ethical frameworks evolve alongside the technology, promoting responsible innovation while safeguarding the integrity and human-centered nature of evaluation. 

Come to our EES Session!

AI highlights the tension between pushing boundaries and maintaining ethical guardrails. Our EES 2024 fishbowl session is an opportunity to collectively explore how AI is reshaping evaluation and what ethical guardrails we want as we explore its uses in our field. Whether you’re an AI enthusiast or a skeptic, your voice and experiences are welcome in this ongoing dialogue. Please join us!

1 comment

  1. There are a couple of statements in this note that I really connect with. I came to AI as a means of checking my own biases. At the time people were quite surprised about that use and I’m glad that you are raising this.

    I am also very interested in issues of validity, so the comments on trying to fit our ‘traditional’ approaches to validity into complexity touch on something that I’ve been talking about for years – our worship of Theories of Change and “Program Logic” has blinded us to this simple fact. Hope to see you all at EES, although I think I have to leave town before your session.
