Event Recap: AI and MERL in Latin America, launching the Working Group


By María de los Ángeles Lasa and Fernando C. Barbosa*. Read in Spanish here.

On May 19, we kicked off “AI and MERL in Latin America”, a new group hosted by the MERL Tech Initiative’s Community of Practice. The online session, held in Spanish, brought together practitioners interested in how artificial intelligence is already being used in monitoring, evaluation, research, and learning (MERL) across the region.

Earlier this year, we consulted members of the community, and their needs were consistent: practical tools, real examples and challenges from the region, as well as responsible-use questions on ethics, safeguards, and data governance. With that in mind, our aim was to create a sustained space to discuss these issues grounded in the realities of Latin America (and in the languages of our region).

For the launch, we invited two practitioners working with AI in different but complementary settings: Ana Henríquez Orrego from Chile, Director of Academic Audits at Universidad de Las Américas and a member of its Observatory on Artificial Intelligence in Education; and Dr. Alejandra Lucero Manzano from Argentina, a professor and consultant in the planning and evaluation of development policies and international cooperation programs. The idea was to ground the conversation in their day-to-day practice and open the space for future exchanges.

Write a job profile, not just a prompt

Ana Henríquez Orrego introduced an easy-to-grasp way to stop chasing the “perfect prompt”. When a team brings in a new colleague, it does not simply give that person one instruction. Instead, it defines a role with quality standards: “Imagine someone is coming to collaborate with you. What would you ask of them, and what instructions would you give so they understand what to do?” Ana applies the same logic to AI through what she calls a “job profile” (perfil de cargo).

This framing turns prompting into workflow design. A useful AI assistant needs context, a defined role, expected outputs, quality criteria, limits, approved sources, and a way to check its work. Going through those decisions also reveals whether a tool offers enough control for professional use.
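As a rough illustration, the elements of a job profile can be written down as a small structure before any tool is opened. The sketch below is hypothetical: the class name, field names, and example values are assumptions made for this recap, not something Ana presented. It only shows how an empty field (here, the limits) becomes visible once the profile is made explicit.

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical container for the decisions a job profile has to settle."""

    role: str
    context: str
    expected_outputs: list[str]
    quality_criteria: list[str]
    limits: list[str]
    approved_sources: list[str]
    self_check: str

    def missing_fields(self) -> list[str]:
        """Name every field that was left empty."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Render the profile as plain-text instructions for an assistant."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Role", self.role),
            ("Context", self.context),
            ("Expected outputs", bullets(self.expected_outputs)),
            ("Quality criteria", bullets(self.quality_criteria)),
            ("Limits", bullets(self.limits)),
            ("Approved sources", bullets(self.approved_sources)),
            ("How to check your work", self.self_check),
        ]
        return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


# Illustrative values only; none of this text comes from the session.
profile = JobProfile(
    role="Reviewer of evaluation matrices",
    context="Mid-term evaluation of a regional education program",
    expected_outputs=["A list of evaluation questions that lack an indicator"],
    quality_criteria=["Every remark points to a row in the matrix"],
    limits=[],
    approved_sources=["The evaluation matrix supplied by the team"],
    self_check="Re-read each remark and confirm the row it cites exists.",
)

print(profile.missing_fields())  # prints ['limits']
```

The rendered text can be pasted into whichever assistant a team has approved; the point of the structure is that gaps are caught before the assistant is asked to work.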

Ana also outlined three roles that AI can play:

Assistant: completes a task that the user already understands and can supervise.

Tutor: helps a person or team learn while doing the work.

Character: simulates a person or scenario, for example to test an interview guide or survey instrument.

In academic quality assurance, Ana has created two complementary tutors to support accreditation audits: one in NotebookLM and another as a custom GPT. Deans and program directors use them to understand what will be assessed, what evidence is expected, and how to prepare. The tools do not conduct the audit independently; they help people participate more fully in a process that already exists. She also relies on purpose-built assistants: a curricular advisor, a learning-outcomes reviewer, even a report corrector she applies to her own drafts.

This framing matters well beyond higher education. In MERL, AI may help organize evidence, review an evaluation matrix, compare documents, or produce a first draft. But practitioners still have to understand the task well enough to specify the role and assess the result. As Ana emphasized, AI should strengthen and accompany existing work, not replace it.

Map AI onto the evaluation cycle, then start small

Alejandra Lucero Manzano zoomed out to the whole evaluation cycle. AI can support preparation, methodological design, data collection, analysis, communication, and the use of findings. The value of a tool, however, depends on the task it is being asked to perform.

She started with a readiness question: What is the organization’s current level of AI maturity? An independent consultant and a large organization face different opportunities and constraints. Adoption should begin from that reality rather than from an ideal version.

Walking through each phase, Alejandra showed how AI can support background reviews and terms of reference, visualize a theory of change, prototype small applications, simulate respondents to test instruments, and assist in qualitative coding, quantitative analysis, and the communication of findings.

Yet her most useful recommendation was to resist automating the entire cycle at once. For teams at an early stage, building small workflows is more realistic and easier to verify. One example was to draft a survey with a conversational AI, request an output compatible with KoboToolbox, upload it, and then validate the instrument and its branching logic.
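One way to picture the validation step: KoboToolbox accepts forms in the XLSForm format, where skip logic lives in a "relevant" column and refers to other questions as ${question_name}. The sketch below is a minimal pre-upload check, not a replacement for testing the form in KoboToolbox itself. The function name and sample rows are hypothetical, and the rule that a reference must point to an earlier question is an assumption of this sketch.

```python
import re

# Matches XLSForm-style question references such as ${consent}
REFERENCE = re.compile(r"\$\{([^}]+)\}")


def check_branching(rows):
    """Return a list of problems found in the 'relevant' column of a survey sheet.

    `rows` is a list of dicts, one per row of the survey sheet, in form order.
    """
    problems = []
    seen = set()
    for row in rows:
        name = row.get("name", "")
        # Every reference in the skip logic should point to an earlier question.
        for ref in REFERENCE.findall(row.get("relevant", "")):
            if ref not in seen:
                problems.append(
                    f"{name}: 'relevant' refers to '{ref}', "
                    "which is not defined earlier in the form"
                )
        if name:
            if name in seen:
                problems.append(f"{name}: question name is used more than once")
            seen.add(name)
    return problems


# Hypothetical AI-drafted rows; note the misspelled reference in the second row.
draft = [
    {"type": "select_one yes_no", "name": "consent", "label": "Do you agree?"},
    {"type": "text", "name": "reason", "label": "Why not?",
     "relevant": "${consnt} = 'no'"},
]

for problem in check_branching(draft):
    print(problem)
```

A check like this catches only mechanical slips. Whether the branching makes sense for respondents still has to be judged by walking through the form.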

The same principle applies to analysis. Evaluators should specify the analytical approach, know the source documents, and distinguish between organizing information and interpreting it. AI can miss regional expressions, irony, sarcasm, and other contextual signals. Alejandra suggested requesting specific citations and separating descriptive processing from inference; otherwise, a system may blend the two. The evaluator’s judgment remains the guardrail.
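To make the idea of requesting citations concrete, here is a small sketch of how a team might check them. It assumes the assistant was asked to return each statement with a label ("descriptive" or "inference"), a source identifier, and a verbatim quote; that output shape, the function names, and the sample data are all hypothetical. Descriptive statements are checked against the source text, while inferences are set aside for the evaluator.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def triage(claims, sources):
    """Sort claims into verified, unverified, and inference buckets."""
    result = {"verified": [], "unverified": [], "inference": []}
    for claim in claims:
        if claim.get("kind") == "inference":
            # Interpretation is never auto-verified; it goes to the evaluator.
            result["inference"].append(claim)
            continue
        source_text = normalize(sources.get(claim.get("source", ""), ""))
        quote = normalize(claim.get("quote", ""))
        bucket = "verified" if quote and quote in source_text else "unverified"
        result[bucket].append(claim)
    return result


# Invented example data for illustration.
sources = {"interview_03": "We stopped attending because the\nschedule changed."}
claims = [
    {"kind": "descriptive", "source": "interview_03",
     "quote": "because the schedule changed"},
    {"kind": "descriptive", "source": "interview_03",
     "quote": "the staff were unhelpful"},
    {"kind": "inference", "text": "Scheduling is the main barrier to attendance."},
]

buckets = triage(claims, sources)
print({name: len(items) for name, items in buckets.items()})
```

Exact matching confirms only that a quote exists in the source. Irony, regional expressions, and whether the quote supports the statement remain questions for the evaluator.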

Her warning was clear: a vague prompt produces a vague response, and no prompt can substitute for methodological clarity or contextual knowledge.

Responsible use begins with institutional rules

Questions from participants brought the discussion to trust, disclosure, and data protection. Should an evaluation report disclose the use of AI? How can teams build confidence in AI-assisted products?

Ana argued that institutions need to set the rules before expecting consistent individual decisions. In the institutions where she works, AI use must be declared, and approved tools are governed by institutional policies. Personal data should not be processed through free, personal-use accounts. Systems that handle such data require appropriate security review, administration, and informed consent.

Alejandra added practical safeguards: review privacy settings rather than accepting defaults, separate professional and personal accounts, anonymize data before using cloud services, and understand where information is stored. Consent, confidentiality, and responsible data management have long been part of sound MERL practice. AI makes them more urgent and makes failures less visible.
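As a very small illustration of the anonymization step, the sketch below masks email addresses and phone-like numbers before text leaves a local machine. The patterns and function name are assumptions made for this example, not a tool mentioned in the session. Pattern-based masking is not full anonymization: names, places, and indirect identifiers need separate handling, and the phone pattern can also catch other long digit sequences such as dates.

```python
import re

# Deliberately simple patterns; they will miss some cases and over-match others.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def redact(text):
    """Mask email addresses and phone-like numbers in a piece of text."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


# Invented example; the address and number are placeholders.
print(redact("Contact ana@example.org or +56 9 1234 5678."))
# prints: Contact [EMAIL] or [PHONE].
```

Running the masking locally, and reviewing a sample of the output by hand, keeps the decision about what leaves the organization with the team rather than with the cloud service.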

The discussion reinforced a broader point: keeping a human “in the loop” is not enough if that person lacks the authority, time, or contextual knowledge to challenge the output. Meaningful oversight requires people who understand the evidence, can interrogate the model’s reasoning, and remain accountable for the final judgment.

Building a regional space for practice and learning

Across both presentations, several shared principles emerged: start with the problem and workflow rather than the tool; clearly define the AI system’s role, sources, criteria, and limits; match adoption to organizational readiness; begin with small, verifiable workflows; and keep contextual interpretation, accountability, transparency, privacy, and data governance at the center of the process.

These concerns are also regional. Tools developed elsewhere do not automatically understand Latin American institutions, languages, professional practices, or social contexts. Creating space to compare experience across the region can help practitioners distinguish what transfers and what requires adaptation.

This first event began that exchange. The group will convene public sessions each year and will also serve as an open channel for members to propose topics, share resources, and connect with peers in between.

Watch the recording of the launch event and join the NLP Community of Practice to take part in future AI and MERL in Latin America activities.

Our thanks to Ana and Alejandra for a generous first session, and to everyone who joined and contributed excellent questions. See you at the next one!

* Codex was used to summarize the event transcript and adjust the text language. 
