by Linda Raftree, Independent Consultant and MERL Tech organizer
Back in 2014, the humanitarian and development sectors were in the heyday of excitement over innovation and Information and Communication Technologies for Development (ICT4D). The role of ICTs specifically for monitoring, evaluation, research and learning (aka “MERL Tech“) had not been systematized (as far as I know), and it was unclear whether there actually was “a field.” I had the privilege of writing a discussion paper with Michael Bamberger to explore how and why new technologies were being tested and used in the different steps of a traditional planning, monitoring and evaluation cycle. (See graphic 1 below, from our paper).
The approaches highlighted in 2014 focused on mobile phones, for example: text messages (SMS), mobile data gathering, use of mobiles for photos and recording, and mapping with dedicated handheld global positioning system (GPS) devices or GPS installed in mobile phones. Promising technologies included tablets, which were only beginning to be used for M&E; “the cloud,” which enabled easier updating of software and applications; remote sensing and satellite imagery; dashboards; and online software that helped evaluators do their work more easily. Social media was also really taking off in 2014. It was seen as a potential way to monitor discussions among program participants and gather their feedback, and as an underutilized tool for wider dissemination of evaluation results and learning. Real-time data, big data, and feedback loops were emerging as ways to improve program monitoring and enable quicker adaptation.
In our paper, we outlined five main challenges for the use of ICTs for M&E: selectivity bias; technology- or tool-driven M&E processes; over-reliance on digital data and remotely collected data; low institutional capacity and resistance to change; and privacy and protection. We also suggested key areas to consider when integrating ICTs into M&E: quality M&E planning, design validity; value-add (or not) of ICTs; using the right combination of tools; adapting and testing new processes before roll-out; technology access and inclusion; motivation to use ICTs, privacy and protection; unintended consequences; local capacity; measuring what matters (not just what the tech allows you to measure); and effectively using and sharing M&E information and learning.
We concluded that:
The field of ICTs in M&E is emerging and activity is happening at multiple levels and with a wide range of tools and approaches and actors.
The field needs more documentation on the utility and impact of ICTs for M&E.
Pressure to show impact may open up space for testing new M&E approaches.
A number of pitfalls need to be avoided when designing an evaluation plan that involves ICTs.
Investment in the development, application and evaluation of new M&E methods could help evaluators and organizations adapt their approaches throughout the entire program cycle, making them more flexible and adjusted to the complex environments in which development initiatives and M&E take place.
Where are we now: MERL Tech in 2019
Much has happened globally over the past five years in the wider field of technology, communications, infrastructure, and society, and these changes have influenced the MERL Tech space. Our 2014 focus on basic mobile phones, SMS, mobile surveys, mapping, and crowdsourcing might now appear quaint, considering that worldwide access to smartphones and the Internet has expanded beyond the expectations of many. We know that access is not evenly distributed, but the fact that more and more people are getting online cannot be disputed. Some MERL practitioners are using advanced artificial intelligence, machine learning, biometrics, and sentiment analysis in their work. And as smartphone and Internet use continue to grow, more data will be produced by people around the world. The way that MERL practitioners access and use data will likely continue to shift, and the composition of MERL teams and their required skillsets will also change.
The excitement over innovation and new technologies seen in 2014 could also be seen as naive, however, considering some of the negative consequences that have emerged, for example social media inspired violence (such as that in Myanmar), election and political interference through the Internet, misinformation and disinformation, and the race to the bottom through the online “gig economy.”
In this changing context, a team of MERL Tech practitioners (both enthusiasts and skeptics) embarked on a second round of research in order to try to provide an updated “State of the Field” for MERL Tech that looks at changes in the space between 2014 and 2019.
Based on MERL Tech conferences and wider conversations in the MERL Tech space, we identified three general waves of technology emergence in MERL:
First wave: Tech for Traditional MERL. Use of technology (including mobile phones, satellites, and increasingly sophisticated databases) to do ‘what we’ve always done,’ with a focus on digital data collection and management. For these uses of “MERL Tech” there is a growing evidence base.
Second wave: Big Data. Exploration of big data and data science for MERL purposes. While plenty has been written about big data for other sectors, the literature on the use of big data and data science for MERL is somewhat limited, and it is more focused on potential than actual use.
Third wave: Emerging approaches. Technologies and approaches that generate new sources and forms of data; offer different modalities of data collection; provide ways to store and organize data, and provide new techniques for data processing and analysis. The potential of these has been explored, but there seems to be little evidence base to be found on their actual use for MERL.
We’ll be doing a few sessions at the American Evaluation Association conference this week to share what we’ve been finding in our research. Please join us if you’ll be attending the conference!
by asking for information that can then be misused.
In the quest for understanding What Works, the focus is often too narrowly on program goals rather than the safety of people. A classic example in the environmental domain is the use of DDT: “promoted as a wonder-chemical, the simple solution to pest problems large and small. Today, nearly 40 years after DDT was banned in the U.S., we continue to live with its long-lasting effects.” The original evaluation of its effects had failed to identify harm and emphasized its benefits. Only when harm to the ecosystem became more apparent was evidence presented in Rachel Carson’s book Silent Spring. We should not have to wait for failure to be so apparent before evaluating for harm.
Ethical standards have been developed for evaluators, which are discussed at conferences and included in professional training. Yet institutional monitoring and evaluation practices still struggle to fully get to grips with the reality of harm in the pressure to get results reported. If we want monitoring and evaluation to be safer for the 21st Century we need to shift from training and evaluator-to-evaluator discussions to changing institutional practices.
At a workshop convened by Oxfam and the Rockefeller Foundation in 2019, we sought to identify core issues that could cause harm and get to grips with areas where institutions need to change practices. The workshop brought together partners from UN agencies, philanthropies, research organizations and NGOs. This meeting sought to give substance to issues. It was noted by a participant that though the UNEG Norms and Standards and UNDP’s evaluation policy are designed to make evaluation safe, in practice there is little consideration given to capturing or understanding the unintended or perverse consequences of programs or policies. The workshop explored this and other issues and identified three areas of practice that could help to reframe institutional monitoring and evaluation in a practical manner.
1. Data rights, privacy and protection:
In working on rights in the 21st Century, data and information are some of the most important ‘levers’ pulled to harm and disadvantage people. Oxfam has had a Responsible Data in Program policy in place since 2015, which goes some way towards recognizing this. But we know we need to more fully implement data privacy and protection measures in our work.
Planned and future work includes stronger governance, standardized baseline measures of privacy & information security, and communications/guidance/change management. This includes changes in evaluation protocols related to how we assess risk to the people we work with, who gets access to the data, and how we ensure consent for how the data will be used.
This is a start, but consistent implementation is hard. If we know we aren’t competent at operating the controls within our reach, it becomes more difficult to call others out when they cause harm by misusing theirs.
2. Harm prevention lens for evaluation
The discussion highlighted that evaluation has not often sought to understand the harm of practices or interventions. When evaluations do, however, the results can powerfully shed new light on an issue. A case that starkly illustrates potential under-reporting is that of the UN Mission in Liberia (UNMIL). UNMIL was put in place with the aim “to consolidate peace, address insecurity and catalyze the broader development of Liberia”. Traditionally we would evaluate this objective. Taking a harm lens, we may instead evaluate the sexual exploitation and abuse related to the deployment. The reporting system highlights low levels of abuse: 14 in 2007–2008 and 6 in 2015. A study by Beber, Gilligan, Guardado and Karim, however, estimated through a representative randomized survey that more than half of eighteen- to thirty-year-old women in greater Monrovia have engaged in transactional sex and that most of them (more than three-quarters, or about 58,000 women) have done so with UN personnel, typically in exchange for money.
Changing evaluation practice should not just focus on harm in the human systems, but also provide insight in the broader ecosystem. Institutionally there needs to be championship for identifying harm within and through monitoring and evaluation practice and changes in practice.
3. Strengthening safeguarding and evaluation skills
We need to resource teams appropriately so they have the capacity to be responsive to harm and reflective on the potential for harm. This is both about tools and procedures and conceptual frames.
Tools and procedures can include, for example:
Codes-of-conduct that create a safe environment for reporting issues
Transparent reporting lines to safeguarding/safe programming advisors
Training based on actual cases
Safe data protocols (see above)
All of these fall by the wayside, however, if the values and concepts that guide implementation are absent. At the workshop, Rodney Hopson, drawing on environmental policy and concepts of ecology, presented a frame for increasing evaluators’ usefulness in complex ecologies where safeguarding issues are prevalent. The frame emphasizes:
Relationships – the need to identify and relate to key interests, interactions, variables and stakeholders amid dynamic and complex issues in an honest manner that is based on building trust.
Responsibilities – acting with propriety, doing what is proper, fair, right, just in evaluation against standards.
Relevance – being accurate and meaningful technically, culturally and contextually.
Safe monitoring and evaluation in the 21st Century does not just seek ‘What Works’ and will need to be relentless at looking at ‘How we can work differently?’. This includes us understanding connectivity in harm between human and environmental systems. The three areas noted here are a start of a conversation and a challenge to institutions to think more about what it means to be safe in monitoring and evaluation practice.
Planning to attend the American Evaluation Association Conference this week? Join us for the session “Institutionalizing Doing no Harm in Monitoring and Evaluation” on Thursday, Nov 14, 2019, from 8:00 to 9:00 AM, in room CC M100 H.
Panelists will discuss ideas to better address harm with regard to: (i) harm identification and mitigation in evaluation practice; (ii) responsible data practice; (iii) understanding harm in an international development context; and (iv) evaluation in complex ecologies.
The panel will be chaired by Veronica M Olazabal (Senior Advisor & Director, Measurement, Evaluation and Organizational Performance, The Rockefeller Foundation), with speakers Stephen Porter (Evaluation Strategy Advisor, World Bank), Linda Raftree (Independent Consultant, Organizer of MERL Tech), Dugan Fraser (Prof & Director CLEAR-AA – University of the Witwatersrand, Johannesburg) and Rodney Hopson (Prof of Evaluation, Department of Ed Psych, University of Illinois Urbana-Champaign). View the full program here: https://lnkd.in/g-CHMEj
People are accessing the Internet, smartphones, and social media like never before, and the social and behavior change communication community is exploring the use of digital tools and social media for influencing behavior. The MERL Tech session, “Engaging for responsible change in a connected world: Good practices for measuring SBCC impact” was put together by Linda Raftree, Khwezi Magwaza, and Yvonne MacPherson, and it set out to help dive into Digital Social and Behavior Change Communication (SBCC).
Linda is the MERL Tech Organizer, but she also works as an independent consultant. She has worked as an Advisor for Girl Effect on research and digital safeguarding in digital behavior change programs with adolescent girls. She also recently wrote a landscaping paper for iMedia on Digital SBCC. Linda opened the session by sharing lessons from the paper, complemented by learning drawn from research and practice at Girl Effect.
Digital SBCC is expanding due to smartphone access. In the work with Girl Effect, it was clear that even when girls in lower income communities did not own smartphones they often borrowed them. Project leaders should consider several relevant theories on influencing human behavior, such as social cognitive theory, behavioral economics, and social norm theory. Additionally, an ethical issue in SBCC projects is whether there is transparency about the behavior change efforts an organization is carrying out, and whether people even want their behaviors to be challenged or changed.
When it comes to creating a SBCC project, Linda shared a few tips:
Users are largely unaware of data risks when sharing personal information online
We need to understand people’s habits. Being in tune with local context is important, as is designing for habits, preferences, and interests.
Avoid being fooled by vanity metrics. For example, even if something had a lot of clicks, how do you know an action was taken afterwards?
Data can be sensitive to deal with. For some, just looking at information online, such as facts on contraception, can put them at risk. Be sure to be careful of this when developing content.
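The vanity-metrics tip above can be made concrete with a small sketch. The figures, titles and field names below are invented for illustration and do not come from any program discussed in the session. The idea is simply to report a follow-through rate next to the raw click count, assuming the program has some way of verifying that an action was taken.

```python
from dataclasses import dataclass


@dataclass
class ContentStats:
    """Hypothetical engagement record for one piece of SBCC content."""
    title: str
    clicks: int          # raw clicks (the "vanity" number)
    actions_taken: int   # verified follow-up actions (assumed to be measurable)


def follow_through_rate(stats: ContentStats) -> float:
    """Share of clicks that led to a follow-up action (0.0 if there were no clicks)."""
    if stats.clicks == 0:
        return 0.0
    return stats.actions_taken / stats.clicks


# Invented example data: one item with many clicks, one with far fewer.
items = [
    ContentStats("Quiz A", clicks=10_000, actions_taken=50),
    ContentStats("Story B", clicks=800, actions_taken=120),
]

# Rank by follow-through rather than by clicks.
ranked = sorted(items, key=follow_through_rate, reverse=True)
for item in ranked:
    rate = follow_through_rate(item)
    print(f"{item.title}: {item.clicks} clicks, {rate:.1%} follow-through")
```

In this made-up case the item with the most clicks ranks last once follow-through is considered, which is the point of the tip.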
The session’s second presenter was Khwezi Magwaza who has worked as a writer and radio, digital, and television producer. She worked as a content editor for Praekelt.org and also served as the Content Lead at Girl Effect. Khwezi is currently providing advisory to an International Rescue Committee platform in Tanzania that aims to support improved gender integration in refugee settings. Lessons from Khwezi from working in digital SBCC included:
Sex education can be taboo, and community healthcare workers are often people’s first touch point.
There is a difference between social behavior change and, more precisely, individual behavior change.
People and organizations working in SBCC need to think outside the box and learn how to measure it in non-traditional ways.
Just because something is free doesn’t mean people will like it. We need to aim for high quality, modern, engaging content when creating SBCC programs.
It’s also critical to hire the right staff. Khwezi suggested building up engineering capacity in house rather than relying entirely on external developers. Having a digital company hand something over to you that you’re stuck with is like inheriting a dinosaur. Organizations need to have a real working relationship with their tech supplier and to make sure the tech can grow and adapt as the program does.
The third panelist was Yvonne MacPherson, the U.S. Director of BBC Media Action, the BBC’s international NGO, which uses communication and media to further development. Yvonne noted that:
Donors often want an app, but it’s important to push back on solely digital platforms.
Face-to-face contact and personal connections are vital in programs, and social media should not be the only form of communication within SBCC programs.
There is a need to learn from social media outreach experiences in various sectors. However, the contexts that INGOs and national NGOs work in are different from the environments where most people with digital engagement skills have worked, so we need more research, and it’s critical to understand the local context and behaviors of the populations we want to engage.
Challenges are being seen with so-called “dark channels,” (WhatsApp, Facebook Messenger) where many people are moving and where it becomes difficult to track behaviors. Ethical issues with dark channels have also emerged, as there are rich content options on them, but researchers have yet to figure out how to obtain consent to use these channels for research without interrupting the dynamic within channels.
I asked Yvonne if, in her experience and research, she thought Instagram or Facebook influencers (like celebrities) influenced young girls more than local community members could. She said there’s really no one answer to that. Detailed ethnographic research is needed to understand the local context before making any decisions on the design of an SBCC campaign. It’s critical to understand the target group: how old they are, where they come from, and other similar questions.
Resources for the Reader
To learn more about digital SBCC check out these resources, or get in touch with each of the speakers on Twitter:
by: Sylvia Otieno, MPA candidate at George Washington University and Consultant at the World Bank’s IEG; and Allana Nelson, Senior Manager for the Digital Principles at DIAL
For nearly a decade, the Principles for Digital Development (Digital Principles) have served to guide practitioners in developing and implementing digital tools in their programming. The plenary session at MERL Tech DC 2019 titled “Living Our Vision: Applying the Principles of Digital Development as an Evaluative Methodology” introduced attendees to four evaluation tools that have been developed to help organizations incorporate the Digital Principles into their design, planning, and assessments.
This panel – organized and moderated by Allana Nelson, Senior Manager for the Digital Principles stewardship at the Digital Impact Alliance (DIAL) – highlighted digital development frameworks and tools developed by SIMLab, USAID in collaboration with John Snow Inc., Digital Impact Alliance (DIAL) in collaboration with TechChange, and the Response Innovation Lab. These frameworks and toolkits were built on the good practice guidance provided by the Principles for Digital Development. They are intended to assist development practitioners to be more thoughtful about how they use technology and digital innovations in their programs and organizations. Furthermore, the toolkits assist organizations with building evidence to inform program development.
Laura Walker McDonald, Senior Director for Insights and Impact at DIAL, presented the Monitoring and Evaluation Framework (developed during her time at SIMLab), which assists practitioners in measuring the impact of their work and the contribution of inclusive technologies to their impact and outcomes. This Monitoring and Evaluation Framework was developed out of the need for more evidence of the successes and failures of technology for social change. “We have almost no evidence of how innovation is brought to scale. This work is trying to reflect publicly the practice of sharing learnings and evaluations. Technology and development isn’t as good as it could be because of this lack of evidence,” McDonald said. The Principles for Digital Development provide the Framework’s benchmarks. McDonald continues to refine this Framework based on feedback from community experts, and she welcomes input that can be shared through this document.
Christopher Neu, COO of TechChange, introduced the new, cross-sector Digital Principles Maturity Matrix Tool for Proposal Evaluation that his team developed on behalf of DIAL. The Maturity Matrix tool helps donors and implementers assess how a program plans to apply the Digital Principles, starting at the proposal stage. Donors may use the tool to evaluate proposal responses to their funding opportunities, and implementers may use the tool as they write their proposals. “This is a tool to give donors and implementers a way to talk about the Digital Principles in their work. This is the beginning of the process, not the end,” Neu said during the session. Users of the Maturity Matrix Tool score themselves from one to three against metrics that span each of the nine Digital Principles and the four stages of the Digital Principles project lifecycle. A program is scored one when it loosely incorporates the identified activity or action into proposals and implementation. A score of two indicates that the program is clearly in line with best practices or that the proposal’s writers have at least thought considerably about them. Those who incorporate the Digital Principles on a deeper level and provide an action plan to increase engagement earn a score of three. It is important to note that not every project will require the same level of Digital Principles maturity, and not every Digital Principle will need to be applied in every program. The scores are intended to provide donors and organizations evidence that they are making the best and most responsible investment in technology.
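The scoring scheme described above (a score from one to three for each Digital Principle at each of four lifecycle stages) could be recorded along these lines. This is only a sketch and not the actual Maturity Matrix Tool: the stage labels are placeholders because the post does not name the four stages, the principle names are the nine as published by the Digital Principles community at the time, and averaging the scores per principle is an assumption made here for illustration. Cells marked `None` stand for principles or stages that do not apply to a given program.

```python
from statistics import mean
from typing import Dict, Optional

# The nine Principles for Digital Development (names not taken from this post).
PRINCIPLES = [
    "Design with the User",
    "Understand the Existing Ecosystem",
    "Design for Scale",
    "Build for Sustainability",
    "Be Data Driven",
    "Use Open Standards, Open Data, Open Source, and Open Innovation",
    "Reuse and Improve",
    "Address Privacy & Security",
    "Be Collaborative",
]
# Placeholder labels: the post mentions four lifecycle stages but does not name them.
STAGES = ["stage_1", "stage_2", "stage_3", "stage_4"]
VALID_SCORES = {1, 2, 3}

Matrix = Dict[str, Dict[str, Optional[int]]]


def average_by_principle(matrix: Matrix) -> Dict[str, float]:
    """Validate a self-assessment and return the mean score per principle.

    Averaging is an illustrative choice, not part of the tool as described.
    """
    averages: Dict[str, float] = {}
    for principle, by_stage in matrix.items():
        if principle not in PRINCIPLES:
            raise ValueError(f"Unknown principle: {principle}")
        scores = []
        for stage, score in by_stage.items():
            if stage not in STAGES:
                raise ValueError(f"Unknown stage: {stage}")
            if score is None:  # not applicable to this program
                continue
            if score not in VALID_SCORES:
                raise ValueError(f"Score must be 1, 2 or 3, got {score}")
            scores.append(score)
        if scores:
            averages[principle] = mean(scores)
    return averages


# Invented self-assessment covering two of the nine principles.
example: Matrix = {
    "Design with the User": {"stage_1": 3, "stage_2": 2, "stage_3": 2, "stage_4": None},
    "Be Data Driven": {"stage_1": 1, "stage_2": 1, "stage_3": None, "stage_4": None},
}
averages = average_by_principle(example)
for principle, value in averages.items():
    print(f"{principle}: {value:.2f}")
```

A low average for one principle would not necessarily be a problem, since, as noted above, not every project requires the same level of maturity.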
Steve Ollis, Senior Digital Health Advisor at John Snow Inc., presented the Digital Health Investment Review Tool (DHIRT), which assists donors investing in Digital Health programs to make informed decisions about their funding. The tool asks donors to adhere to the Digital Principles and the Principles of Donor Alignment for Digital Health (Digital Investment Principles), which are also based on the Digital Principles. When implementing this tool, practitioners can assess implementer proposals across 12 criteria. After receiving a score from one to five (one being nascent and five being optimized), organizations can better assess how effectively they incorporate the Digital Principles and other best practices (including change management) into their project proposals.
Max Vielle, Global Director of Response Innovation Lab, introduced the Innovation Evidence Toolkit, which helps technology innovators in the humanitarian sector build evidence to thoughtfully develop and assess their prototypes and pilots. “We wanted to build a range of tools for implementors to assess their ability to scale the project,” Vielle said of the toolkit. Additionally, the tool assists innovators in determining the scalability of their technologies. The Innovation Evidence Toolkit helps humanitarian innovators and social entrepreneurs think through how they use technology when developing, piloting, and scaling their projects. “We want to remove the barriers for non-humanitarian actors to act in humanitarian responses to get services to people who need them,” Vielle said. This accessible toolkit can be used by organizations with varying levels of capacity and is available offline for those working in low-connectivity environments.
Evidence-based decision making is key to improving the use of technologies in the development industry. The coupling of the Principles for Digital Development and evaluation methodologies will assist development practitioners, donors, and innovators not only in building evidence, but also in effectively implementing programs that align with the Digital Principles.
Guest post from Jo Kaybryn, an international development consultant currently directing evaluation frameworks, evaluation quality assurance services, and leading evaluations for UN agencies and INGOs.
“Upping the Ex Ante” is a series of articles aimed at evaluators in international development exploring how our work is affected by – and affects – digital data and technology. I’ve been having lots of exciting conversations with people from all corners of the universe about our brave new world. But I’ve also been conscious that for those who have not engaged a lot with the rapid changes in technologies around us, it can be a bit daunting to know where to start. These articles explore a range of technologies and innovations against the backdrop of international development and the particular context of evaluation. For readers not yet well versed in technology there are lots of sources to do further research on areas of interest.
The series is half way through, with 4 articles published.
In Part 1 the series has gone back to the olden days (1948!) to consider the origin story of cybernetics and the influences that are present right now in algorithms and big data. The philosophical and ethical dilemmas are a recurring theme in later articles.
Part 2 examines the problems of distance, an area in which technology offers huge strides forward and yet which is never fully solved, with a discussion on what blockchains mean for the veracity of data.
Part 3 considers qualitative data and shines a light on the gulf between our digital data-centric and analogue-centric worlds and the need for data scientists and social scientists to cooperate to make sense of it.
Part 4 looks at quantitative data and the implications for better decision making; why evaluators really don’t like an algorithmic “black box”; and reflections on how humans’ assumptions and biases leak into our technologies, whether digital or analogue.
The next few articles will see a focus on ethics, psychology and bias; a case study on a hypothetical machine learning intervention to identify children at risk of maltreatment (lots more risk and ethical considerations); and some thoughts about putting it all in perspective (i.e. Don’t
FHI 360 Academy Hall, 8th Floor 1825 Connecticut Avenue NW Washington, DC 20009
We gathered at the first MERL Tech Conference in 2014 to discuss how technology was enabling the field of monitoring, evaluation, research and learning (MERL). Since then, rapid advances in technology and data have altered how most MERL practitioners conceive of and carry out their work. New media and ICTs have permeated the field to the point where most of us can’t imagine conducting MERL without the aid of digital devices and digital data.
The rosy picture of the digital data revolution and an expanded capacity for decision-making based on digital data and ICTs has been clouded, however, with legitimate questions about how new technologies, devices, and platforms — and the data they generate — can lead to unintended negative consequences or be used to harm individuals, groups and societies.
Join us in Washington, DC, on September 5-6 for this year’s MERL Tech Conference where we’ll be taking stock of changes in the space since 2014; showcasing promising technologies, ideas and case studies; sharing learning and challenges; debating ideas and approaches; and sketching out a vision for an ideal MERL future and the steps we need to take to get there.
Tech and traditional MERL: How is digital technology enabling us to do what we’ve always done, but better (consultation, design, community engagement, data collection and analysis, databases, feedback, knowledge management)? What case studies can be shared to help the wider sector learn and grow? What kinks do we still need to work out? What evidence base exists that can support us to identify good practices? What lessons have we learned? How can we share these lessons and/or skills with the wider community?
Data, data, and more data: How are new forms and sources of data allowing MERL practitioners to enhance their work? How are MERL practitioners using online platforms, big data, digitized administrative data, artificial intelligence, machine learning, sensors, drones? What does that mean for the ways that we conduct MERL and for who conducts MERL? What concerns are there about how these new forms and sources of data are being used and how can we address them? What evidence shows that these new forms and sources of data are improving MERL (or not improving MERL)? What good practices can inform how we use new forms and sources of data? What skills can be strengthened and shared with the wider MERL community to achieve more with data?
Emerging tools and approaches: What can we do now that we’ve never done before? What new tools and approaches are enabling MERL practitioners to go the extra mile? Is there a use case for blockchain? What about facial recognition and sentiment analysis in MERL? What are the capabilities of these tools and approaches? What early cases or evidence are there to indicate their promise? What ideas are taking shape that should be tried and tested in the sector? What skills can be shared to enable others to explore these tools and approaches? What are the ethical implications of some of these emerging technological capabilities?
The Future of MERL: Where should we be going and what should the future of MERL look like? What does the state of the sector, of digital data, of technology, and of the world in which we live mean for an ideal future for the MERL sector? Where do we need to build stronger bridges for improved MERL? How should we partner and with whom? Where should investments be taking place to enhance MERL practices, skills and capacities? How will we continue to improve local ownership, diversity, inclusion and ethics in technology-enabled MERL? What wider changes need to happen in the sector to enable responsible, effective, inclusive and modern MERL?
Cross-cutting themes include diversity, inclusion, ethics and responsible data, and bridge-building across disciplines.
You’ll join some of the brightest minds working on MERL across a wide range of disciplines – evaluators, development and humanitarian MERL practitioners, small and large non-profit organizations, government and foundations, data scientists and analysts, consulting firms and contractors, technology developers, and data ethicists – for 2 days of in-depth sharing and exploration of what’s been happening across this multidisciplinary field and where we should be heading.
There is no real evidence base about what does and does not work for applying blockchain technology to interventions seeking social impacts. Most current blockchain interventions are driven by developers (programmers) and visionary entrepreneurs. There is little thinking in current blockchain interventions around designing for “social” impact (there is an overabundant trust in technology to achieve the outcomes and little focus on the humans interacting with the technology) or around integrating relevant evidence from behavioral economics, behavior change design, human centered design, etc.
To build the needed evidence base, Monitoring, Evaluation, Research and Learning (MERL) practitioners will have to not only get to know the broad strokes of blockchain technology but the specifics of token design and tokenomics (the political economics of tokenized ecosystems). Token design could become the focal point for MERL on blockchain interventions since:
The vast majority of blockchain interventions, if not all, will involve some type of desired behavior change
The token provides the link between the ledger (which is the blockchain) and the social ecosystem created by the token in which the behavior change is meant to happen
Hence the token is the “nudge” meant to leverage behavior change in the social ecosystem while governing the transactions on the blockchain ledger.
(While this blog will focus on these points, it will not go into a full discussion of what tokens are and how they create ecosystems. There are some very good resources that do this, which you can review at your leisure. The Complexity Institute has published a book exploring the various attributes of complexity and the main themes involved with tokenomics, while Outlier Ventures has published what I consider to be the best guidance on token design. The Outlier Ventures guidance contains many of the tools MERL practitioners will be familiar with (problem analysis, stakeholder mapping, etc.) and should be consulted.)
Hence it could be that by understanding token design and its requirements and mapping it against our current MERL thinking, tools and practices, we can develop new thinking and tools that could be the beginning point in building our much-needed evidence base.
What is a “blockchain intervention”?
As MERL practitioners we roughly define an “intervention” as a group of inputs and activities meant to leverage outcomes within a given eco-system. “Interventions” are what we are usually mandated to assess, evaluate and
When thinking about MERL and blockchain, it is useful to think of two categories of “blockchain interventions”.
1) Integrating the blockchain into MERL data collection, entry, management, analysis or dissemination practices and
2) MERL strategies for interventions using the blockchain in some way shape or form.
Here we will focus on #2 and in so doing demonstrate that while the blockchain is an innovative, potentially disruptive technology, evaluating its applications on social outcomes is still an issue of assessing behavior change against dimensions of intervention design.
Designing for Behavior Change
We generally design
interventions (programs, projects, activities) to “nudge” a certain type of behavior (stated as
outcomes in a theory of change) amongst a certain population (beneficiaries,
stakeholders, etc.). We often attempt to
integrate mechanisms of change into our intervention design, but often do not
for a variety of reasons (lack of understanding, lack of resources, lack of
political will, etc.). This lack of due
diligence in design is partly responsible for the lack of evidence around what
works and what does not work in our current universe of interventions.
Enter blockchain technology, which we as MERL practitioners will be responsible for assessing in the foreseeable future. Hence, we will need to determine how interventions using the blockchain attempt to nudge behavior, what behaviors they seek to nudge, amongst whom, when, and how well the design of the intervention accomplishes these functions. In order to do that we will need to better understand how blockchains use tokens to nudge behavior.
The Centrality of the Token
We have all used tokens before. Stores issue coupons that can only be used at those stores, we get receipts for groceries as soon as we pay, and arcades make you buy tokens instead of just using quarters. The coupons and arcade tokens can be considered utility tokens, meaning that they can only be used in a specific “ecosystem,” which in this case is a store and an arcade respectively. The grocery store receipt is a token because it demonstrates ownership: if you are stopped on the way out of the store and you show your receipt, you are demonstrating that you now have rights of ownership over the foodstuffs in your bag.
Whether you realize it or not at the time, these tokens are trying to nudge your behavior. The store gives you the coupon because the more time you spend in their store trying to redeem coupons, the greater the likelihood you will spend additional money there. The grocery store wants you to pay for all your groceries while the arcade wants you to buy more tokens than you end up
If needed, we could design
MERL strategies to assess how well these different tokens nudged the desired
behaviors. We would do this, in part, by thinking about how each token is
designed relative to the behavior it wants (i.e. the value, frequency and
duration of coupons, etc.).
Thinking about these ecosystems and their respective tokens will help us understand the interdependence between 1) the blockchain as a ledger that records transactions, 2) the token that captures the governance structures for how transactions are stored on the blockchain ledger as well as the incentive models for 3) the mechanisms of change in the social eco-system created by the token.
Figure #1: The inter-relationship between the blockchain (ledger), token and social eco-system
Token Design as Intervention Design
Just as we assess
theories of change and their mechanisms against intervention design, we will
assess blockchain based interventions against their token design in much the
same way. This is because blockchain
tokens capture all the design dimensions of an intervention; namely the problem
to be solved, stakeholders and how they influence the problem (and thus the
solution), stakeholder attributes (as mapped out in something like a
stakeholder analysis), the beneficiary population, assumptions/risks, etc.
Outlier Ventures has adapted what they call a Token Utility Canvas as a milestone in their token design process. The canvas can be correlated to the various dimensions of an evaluability assessment tool (I am using the evaluability assessment tool as a demonstration of the necessary dimensions of an intervention’s design, meaning that the evaluability assessment tool assesses the health of all the components of an intervention design). The canvas captures many of the problem diagnostic, stakeholder assessment and other due diligence tools that are familiar to MERL practitioners who have seen them used in intervention design. Hence token design could be largely thought of as intervention design and evaluated as such.
Comparing Token Design with Dimensions of Program Design (as represented in an
This table is not meant to be exhaustive and not all of the fields will be explained here but in general, it could be a useful starting point in developing our own thinking and tools for this emerging space.
The Token as a Tool for Behavior Change
Coming up with a taxonomy of blockchain interventions and relevant tokens is a necessary task, but all blockchains that need to nudge behavior will have to have a token.
Consider supply chain management. Blockchains are increasingly being used as the ledger system for supply chain management. Supply chains are typically comprised of numerous actors packaging, shipping, receiving, applying quality control protocols to various goods, all with their own ledgers of the relevant goods as they snake their way through the supply chain. This leads to ample opportunities for fraud, theft and high costs associated with reconciling the different ledgers of the different actors at different points in the supply chain. Using the blockchain as the common ledger system, many of these costs are diminished as a single ledger is used with trusted data, hence transactions (shipping, receiving, repackaging, etc.) can happen more seamlessly and reconciliation costs drop.
However, even in “simple” applications such as this there are behavior change implications. We still want the supply chain actors to perform their functions in a manner that adds value to the supply chain ecosystem as a whole, which means rewarding them for good behavior within the ecosystem and punishing them for bad behavior.
What if those shippers trying to pass on a faulty product had already deposited a certain value of currency in an escrow account (housed in a smart contract on the blockchain)? If they are found to be attempting a prohibited behavior (passing on faulty products), they automatically surrender a certain amount from the escrow account in the smart contract. How much should be deposited in the escrow account? What is the ratio between the degree of punishment and the undesired action? These are behavior questions around a mechanism of change that are dimensions of current intervention designs and will be increasingly relevant in token design.
The point of this is to demonstrate that even “benign”
applications of the blockchain, like supply chain management, have behavior
change implications and thus require good due diligence in token design.
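The escrow mechanism described above can be sketched in a few lines of ordinary Python. This is only an illustration of the design questions (how much to deposit, and what ratio of punishment to undesired action), not a real smart contract: the class name, the `severity` measure and the `penalty_ratio` parameter are all hypothetical and appear nowhere in the working paper.

```python
from dataclasses import dataclass


@dataclass
class EscrowAccount:
    """Toy stand-in for an escrow balance held in a smart contract (hypothetical)."""
    shipper: str
    balance: float

    def penalize(self, severity: float, penalty_ratio: float) -> float:
        """Forfeit severity * penalty_ratio, capped at the remaining balance.

        severity: assumed monetary measure of the undesired action
        penalty_ratio: assumed ratio of punishment to undesired action
        """
        forfeited = min(self.balance, severity * penalty_ratio)
        self.balance -= forfeited
        return forfeited


# A shipper deposits 1000 units, then is found passing on faulty goods worth 200.
escrow = EscrowAccount(shipper="shipper-A", balance=1000.0)
forfeited = escrow.penalize(severity=200.0, penalty_ratio=1.5)
print(forfeited, escrow.balance)  # 300.0 700.0
```

In this framing, the deposit size and `penalty_ratio` are exactly the token design parameters a MERL strategy would need evidence on: set too low, they may not deter the behavior; set too high, they may discourage shippers from joining the ecosystem at all.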
There is a lot that could be said about the validation function of this process: who validates that the bad behavior has taken place and should be punished, or that good behavior should be rewarded? There are lessons to be learned from results based contracting and the role of the validator in such a contracting vehicle. This “validating” function will need to be thought out in terms of what can be automated and what needs a “human touch” (and who is responsible, what methods they should use,
Implications for MERL
If tokens are fundamental to MERL strategies for blockchain
interventions, there are several critical implications:
MERL practitioners will need to be heavily integrated into the due diligence processes and tools for token design
MERL strategies will need to be highly formative, if not developmental, in facilitating the timeliness and overall effectiveness of the feedback loops informing token design
New thinking and tools will need to be developed to assess the relationships between blockchain governance, token design and mechanisms of change in the resulting social ecosystem.
The opportunity cost for impact and “learning” could go up the less MERL practitioners are integrated into the due diligence of token design. This is because the costs to adapt token design are relatively low compared to current social interventions, partly due to the ability to integrate automated feedback.
Blockchain based interventions present us with significant learning opportunities due to our ability to use the technology itself as a data collection/management tool in learning about what does and does not work. Feedback from an appropriate MERL strategy could inform decision making around token design that could be coded into the token on an iterative basis. For example, as stakeholders’ incentives shift (e.g. supply chain shippers incur new costs and their value proposition changes), token adaptation can respond in a timely fashion so long as the MERL feedback that informs the token design is accurate.
There is a need to determine what components of these feedback loops can be completed by automated functions and what requires a “human touch”. For example, what dimensions of token design can be informed by smart infrastructure (e.g. temperature gauges on shipping containers in the supply chain) versus household surveys completed by enumerators? This will be a task to complete and iteratively improve, starting with initial token design and lasting through the lifecycle of the intervention.
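As a rough illustration of splitting a feedback loop between automated functions and a “human touch,” the sketch below triages temperature readings from a shipping container. Clear-cut readings are handled automatically, while borderline ones are routed to a person. The function name, the 8°C limit and the 1°C margin are assumptions made up for this example.

```python
def triage_reading(temp_c: float, limit_c: float = 8.0, margin_c: float = 1.0) -> str:
    """Classify one container temperature reading (thresholds are hypothetical).

    Returns "ok", "automated_breach", or "human_review".
    """
    if temp_c <= limit_c - margin_c:
        return "ok"  # clearly within range: no action needed
    if temp_c >= limit_c + margin_c:
        return "automated_breach"  # clearly out of range: can be handled automatically
    return "human_review"  # borderline: needs a human judgment call


readings = [4.5, 7.6, 11.2]
print([triage_reading(r) for r in readings])
# ['ok', 'human_review', 'automated_breach']
```

Where to draw the margin is itself a MERL question: a wider margin sends more cases to human validators, which raises cost but may improve trust in the outcome.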
Token design dimensions, outlined in the Token Utility Canvas, and decision-making
will need to result in MERL questions that are correlated to the best strategy
to answer them, automated or human, much the same as we do now in current
Many of our current due diligence tools used in both intervention and evaluation design (things like stakeholder mapping, problem analysis, cost benefit analysis, value propositions, etc.) will need to be adapted to the types of relationships found within tokenized eco-systems. These include the relationships of influence between the social eco-system and the blockchain ledger itself (or more specifically the governance of that ledger), as demonstrated in figure #1.
This could be our biggest priority as MERL practitioners. While blockchain interventions could create incredible opportunities for social experimentation, the need for human-centered due diligence (incentivizing humans for positive behavior change) in token design is critical. Over-reliance on the technology to drive social outcomes is already a well-evidenced opportunity cost that could be avoided with blockchain-based solutions if the gap between technologists, social scientists and practitioners can be bridged.
Guest post by Michael Cooper, a former DoS, MCC Associate Director for Policy and Evaluation who now runs Emergence. Mike advises numerous donors, private clients and foundations on program design, MEL, adaptive management and other analytical functions.
International development projects using the blockchain in
some way are increasing at a rapid
rate and our window for developing evidence around what does and does not
work (and more importantly why) is narrow before we run into un-intended
consequences. Given that blockchain is a
highly disruptive technology, these un-intended consequences could be significant,
creating a higher urgency to generate the evidence to guide how we design and
evaluate blockchain applications.
To inform this discussion, Emergence has put out a working
paper that outlines 1.) what the blockchain is, 2.) how it can be used to
leverage behavior change outcomes in international development projects and 3.)
the implications for how we could design and evaluate blockchain based
interventions. The paper utilizes systems
and behaviorism principles in comparing how we currently design behavior change
interventions to how we could design/evaluate the same interventions using the
blockchain. This article summarizes the
main points of the paper and its conclusions to generate discussion around how
to best produce the evidence we need to fully realize the potential of
blockchain interventions for social impact.
Given the scope of possibilities surrounding the blockchain, both in how it could be used and in the impact it could leverage, the implications for how MEL is conducted are significant. The time is long gone when value-adding MEL practitioners were not involved in intervention design. Blockchain based interventions will require additional integration of MEL skill sets in the early design phases since so much will need to be “tested” to determine what is and is not working. While rigid statistical evaluations will be needed for some of these blockchain based interventions, the level of complexity involved and the lack of an evidence base indicate that more flexible, adaptive and formative MEL approaches will be needed. The more these approaches are proactive and involved in intervention design, the more frequent and informative the feedback loops will be into our evidence base.
The Blockchain as a Decentralizing
At its core, the blockchain is just a ledger, but the importance of ledgers in how society functions cannot be overstated. Ledgers, and the control of them, are crucial in how supply chains are managed, how financial transactions are conducted, how data is shared, etc. Control of ledgers is a primary factor in limiting access to life changing goods and services, especially for the world’s poor. In part, the discussion over decentralization is essentially a discussion over who owns ledgers and how they are managed.
Decentralization has been a prominent theme in international development and there is strong evidence of its positive impact across various sectors, especially regarding local service delivery. One of the primary value adds of decentralization is empowering those further from traditional concentrations of power to have more authority over the problems that impact them. As a decentralizing technology, the blockchain holds a lot of potential in reaching these same impacts from decentralization (empowerment, etc.) in a more efficient and effective manner, partly due to its ability to better align interests around common problems. With better aligned interests, fewer resources (inputs) are needed to try and facilitate a desired behavior change.
Up until now, efforts of international development actors have focused on “nudging” behavior change amongst stakeholders and, in very rare cases such as results based financing, on giving loosely defined parameters to implementers with less emphasis on the manner in which outcomes are achieved. Both of these approaches are relevant in the design and testing of blockchain based interventions, but they will be integrated in unique new ways that will require new thinking and skill sets amongst practitioners.
Current Designing and Evaluating for Behavior Change
MEL usually starts with the relevant theory of change, namely what mechanisms bring about targeted behavior change and how. Recent years have seen a focus on how behavior change is achieved through an understanding of mindsets and how they can be nudged to achieve a social outcome. However, the international development space has recognized the limitations of designing interventions that attempt to nudge behavior change. These limitations center around the level of complexity involved, the inability to recognize and manage this complexity, and a lack of awareness about the root causes of problems.
Hence the rise in things like results
based financing where the type of prescribed top-down causal pathway
(usually laid out in a theory of change) is not as heavily emphasized as in
more traditional interventions. Donors
using this approach can still mandate certain principles of implementation
(such as the inclusion of vulnerable populations, environmental safeguards,
timelines, etc.) but there is much more flexibility to create a causal pathway
to achieve the outcome.
Or, for example, take the popular PDIA approach where the focus is on iteratively identifying and solving problems encountered on the pathway to reform. These efforts do not start with a mandated theory of change, but instead start with generally described targeted outcomes, and then the pathway to those outcomes is iteratively created, similar to what Lant Pritchett has called “crawling the design space”. Such an approach has large overlaps with adaptive management practices and other more integrative MEL frameworks and could lend itself to how blockchain based interventions are designed, implemented and evaluated.
How the Blockchain Could Achieve Outcomes and Implications for MEL
Because of its decentralizing effects, any theory of change for a blockchain based intervention could include some possible common attributes that influence how outcomes are achieved:
Empowerment of those closest to problems to inform the relevant solutions
Alignment of interests
Alleviation of traditional intermediary services and relevant third party actors
Assessing these three attributes, and how they influence outcomes, could be the foundation of any appropriate MEL strategy for a blockchain-based intervention. This is because these attributes are the “value add” of a blockchain-based intervention. For example, traditional financial inclusion interventions may seek to extend financial services of a bank to rural areas through digital money, extension agents, etc. A blockchain-based solution, however, may cut out the bank entirely and empower local communities to receive financial services from completely new providers from anywhere in the world on much more affordable terms and in a much more convenient manner. Such a solution could see an alignment of interests amongst producers and consumers of these services since the new relationships are mutually serving. Because of this alignment there is less of a need, or even less of a benefit, in having donors script out the causal pathway for the outcomes to be achieved. Those closest to the problem(s) and solutions can work it out because it is in their interest to do so.
Hence, while a MEL framework for such a project could still use more standardized measures around outcomes like increased access to financial services, and could even use statistical methods to evaluate questions around attributable changes in poverty status, there will need to be adaptive and formative MEL that assesses the dynamics of these attributes given their criticality to whether and how outcomes could be achieved. The dynamics between these attributes and the surrounding social eco-system have the potential to be very fluid (going back to the disruptive nature of blockchain technology), hence flexible MEL will be required to respond to new trends as they emerge.
Table: Blockchain Intervention Attributes and the Skill Sets to Assess Them
Empowerment of those closest to problems to inform the relevant solutions: Problem driven design and MEL approach, stakeholder mapping (to identify relevant actors); Decentralization focused MEL (MEL that focuses on outcomes associated with decentralization)
Alignment of interests: Political economy analysis to identify incentives and interests; Adaptive MEL to assess shifting alignment of interest between various actors
Alleviation of traditional intermediary services: Political economy analysis to inform risk mitigation strategy for potential spoilers and relevant MEL
While there will need to be standard accountability and
other uses, feedback from an appropriate MEL strategy could have two primary
end uses in a blockchain based intervention: governance and trust.
The Role of Governance and Trust
Blockchain governance sets out the rules for how consensus (i.e. agreement) is achieved for deciding what transactions are valid on a blockchain. While this may sound mundane, it is critical for achieving outcomes, since how the blockchain is governed decides how well those closest to the problems are empowered to identify and achieve solutions and aligned interests. Hence the governance framework for the blockchain will need to be informed by an appropriate MEL strategy. A giant learning gap we currently have is how to iteratively adapt blockchain governance structures, using MEL feedback, into increasingly more efficient versions. Closing this gap will be critical to assessing the cost effectiveness of blockchain based solutions over other solutions (i.e. alternatives/cost benefit analysis tools) as well as maximizing impact.
Another focus of an appropriate MEL strategy would be to facilitate trust in the blockchain-based solution amongst users, much the same as for other technology-led solutions like mobile money or pay as you go metering for service delivery. This includes not only the digital interface between the user and the technology (a phone app, SMS or other interface) but other dimensions of “trust” that would facilitate uptake of the technology. These dimensions of trust would be informed by an analysis of the barriers to uptake of the technology amongst intended users, given it could be an entirely new service for beneficiaries or an old service delivered in a new fashion. There is already a good evidence base around what works in this area (e.g. marketing and communication tools for digital financial services, assistance in completing registration paperwork for pay as you go metering, etc.).
The Road Ahead
There is A LOT we need to learn and a short time to do it in
before we feel the negative effects from a lack of preparedness. This risk is heightened when you consider
that the international development industry has a poor
track record of designing and evaluating technology-led solutions
(primarily due to the fact that these projects usually neglect uptake of the
technology and operate on the assumption that the technology will drive
outcomes instead of users using the technology as a tool to drive the
The lessons from MEL in results based financing could be
especially informative to the future of evaluating blockchain-based solutions
given their similarities in letting solutions work themselves out and the role
of the “validator” in ensuring outcomes are achieved. In fact the blockchain has already
been used in this role in some simple output based programming.
As alluded to, pre-existing MEL skill sets can add a lot of
value to building an evidence base but MEL practitioners will need to develop a
greater understanding of the attributes of blockchain technology, otherwise our
MEL strategies will not be suited to blockchain based programming.
by Isaac D. Castillo, Director of Outcomes, Assessment, and Learning at Venture Philanthropy Partners.
Evaluators don’t make mistakes.
Or do they?
Well, actually, they do. In fact, I’ve got a number of fantastic failures under my belt that turned into important learning opportunities. So, when I was asked to share my experience at the MERL Tech DC 2018 session on failure, I jumped at the chance.
Part of the Problem
As someone of Mexican descent, I am keenly aware of the problems that can arise when culturally and linguistically inappropriate evaluation practices are used. However, as a young evaluator, I was often part of the problem.
Early in my evaluation career, I was tasked with collecting data to determine why teenage youth became involved in gangs. In addition to developing the interview guides, I was also responsible for leading all of the on-site interviews in cities with large Latinx populations. Since I am Latinx, I had a sufficient grasp of Spanish to prepare the interview guides and conduct the interviews. I felt confident that I would be sensitive to all of the cultural and linguistic challenges to ensure an effective data collection process. Unfortunately, I had forgotten an important tenet of effective culturally competent evaluation: cultures and languages are not monolithic. Differences in regional cultures or dialects can lead even experienced evaluators into embarrassment, scorn, or the worst outcome of all: inaccurate data.
Sentate, Por Favor
For example, when first interacting with the gang members, I introduced myself and, to start the interview, asked them to “please sit down” by saying “Siéntate, por favor.” What I did not know at the time is that a large portion of the gang members I was interviewing were born in El Salvador or were of Salvadoran descent, and the accurate way to say it using Salvadoran Spanish would have been, “Sentate, por favor.”
Does one word make that much difference? In most cases it did not matter, but it caused several gang members to openly question my Spanish from the outset, which created an uncomfortable beginning to interviews about potentially sensitive subjects.
Amigo or Chero?
I next asked the gang members to think of their “friends.” In most dialects of Spanish, using amigos to ask about friends is accurate and proper. However, in the context of street slang, some gang members prefer the term chero, especially in informal contexts.
Again, was this a huge mistake? No. But it did lead to enough quizzical looks and requests for clarification that I started to doubt whether I was getting completely honest or accurate answers from some of the respondents. Unfortunately, this error did not arise until I had conducted nearly 30 interviews. I had not thought to test the wordings of the questions in multiple Spanish-speaking communities across several states.
Would You Like a Concha?
Perhaps my most memorable mistake during this evaluation occurred after I had completed an interview with a gang leader outside of a bakery. After we were done, the gang leader called over the rest of his gang to meet me. As I was meeting everyone, I glanced inside the bakery and noticed a type of Mexican pastry that I enjoyed as a child. I asked the gang leader if he would like to go inside and join me for a concha, a round pastry that looks like a shell. Everyone (except me) began to laugh hysterically. The gang leader then let me in on the joke. He understood that I was asking about the pan dulce (sweet bread), but he informed me that in his dialect, concha was used as a vulgar reference to female genitalia. This taught me a valuable lesson about how even casual references or language choices can be interpreted in many different ways.
What did I learn from this?
While I can look back on these mistakes and laugh, I am also reminded of the important lessons learned that I carry with me to this day.
Translate with the local context in mind. When translating materials or preparing for field work, get a detailed sense of who you will be collecting data from, including what cultures and subgroups people represent and whether or not there are specific topics or words that should be avoided.
Translate with the local population in mind. When developing data collection tools (in any language, even if you are fluent in it), take the time to pre-test the language in the tools.
Be okay with your inevitable mistakes. Recognize that no matter how much preparation you do, you will make mistakes in your data collection related to culture and language issues. Remember it is how you respond in those situations that is most important.
Digitization is everywhere! Digital technologies and data have changed the way we engage with each other and how we work. We cannot escape the effects of digitization, whether in our personal capacity (how our own data is being used) or in our professional capacity (understanding how to use data and technology). These changes are exciting! But we also need to consider the challenges they present to the MERL community and their impact on development.
The advent and proliferation of big data has the potential to change how evaluations are conducted. New skills are needed to process and analyse big data. Mathematics, statistics and analytical skills will be ever more important. As evaluators, we need to be discerning about the data we use. In a world of copious amounts of data, we need to ensure we have the ability to select the right data to answer our evaluation questions.
We also have an ethical and moral duty to manage data responsibly. We need new strategies and tools to guide the ways in which we collect, store, use and report data. As evaluators, we need to improve our skills in processing and analysing data. Evaluative thinking in the digital age is evolving, and we need to consider the technical and soft skills required to maintain the integrity of the data and its interpretation.
Though technology can make data collection faster and cheaper, two important considerations are access to technology by vulnerable groups and data integrity. Women, girls and people in rural areas normally do not have the same levels of access to technology as men and boys. This limits our ability to rely solely on technology to collect data from these population groups, because we need to be aware of inclusion, bias and representativity. Equally, we need to consider how to maintain the quality of data being collected through new technologies such as mobile phones and to understand how the use of new devices might change or alter how people respond.
In a rapidly changing world where technologies such as AI, Blockchain, Internet of Things, drones and machine learning are on the horizon, evaluators need to be robust and agile in how we change and adapt.
For this reason, a new strand has been introduced at the African Evaluation Association (AfrEA) conference, taking place from 11 – 15 March 2019 in Abidjan, Cote d’Ivoire. This stream, The Fourth Industrial Revolution and its Impact on Development: Implications for Evaluation, will focus on five sub-themes:
Guide to Industry 4.0 and Next Generation Tech
Talent and Skills in Industry 4.0
Changing World of Work
Evaluating youth programmes in Industry 4.0
Genesis Analytics will be curating this strand. We are excited to invite experts working in digital development and practitioners at the forefront of technological innovation for development and evaluation to submit abstracts for this strand.