Guest post from Jo Kaybryn, an international development consultant currently directing evaluation frameworks, evaluation quality assurance services, and leading evaluations for UN agencies and INGOs.
“Upping the Ex Ante” is a series of articles aimed at evaluators in international development exploring how our work is affected by – and affects – digital data and technology. I’ve been having lots of exciting conversations with people from all corners of the universe about our brave new world. But I’ve also been conscious that for those who have not engaged a lot with the rapid changes in technologies around us, it can be a bit daunting to know where to start. These articles explore a range of technologies and innovations against the backdrop of international development and the particular context of evaluation. For readers not yet well versed in technology there are lots of sources to do further research on areas of interest.
The series is halfway through, with 4 articles published.
In Part 1 the series has gone back to the olden days (1948!) to consider the
origin story of cybernetics and the influences that are present right now in
algorithms and big data. The philosophical and ethical dilemmas are a recurring
theme in later articles.
Part 2 examines the problems of distance, something in which technology offers
huge strides forward and yet which is never fully solved, with a
discussion of what blockchains mean for the veracity of data.
Part 3 considers qualitative data and shines a light on the gulf between our digital
data-centric and analogue-centric worlds and the need for data scientists and social
scientists to cooperate to make sense of it.
Part 4 looks at quantitative data and the implications for better decision making, why
evaluators really don’t like an algorithmic “black box”, and reflections on how humans’
assumptions and biases leak into our technologies whether digital or analogue.
The next few articles will see a focus on ethics, psychology and bias; a case study on a
hypothetical machine learning intervention to identify children at risk of
maltreatment (lots more risk and ethical considerations); and some thoughts about putting it all
in perspective (i.e. Don’t
FHI 360 Academy Hall, 8th Floor 1825 Connecticut Avenue NW Washington, DC 20009
We gathered at the first MERL Tech Conference in 2014 to discuss how technology was enabling the field of monitoring, evaluation, research and learning (MERL). Since then, rapid advances in technology and data have altered how most MERL practitioners conceive of and carry out their work. New media and ICTs have permeated the field to the point where most of us can’t imagine conducting MERL without the aid of digital devices and digital data.
The rosy picture of the digital data revolution and an expanded capacity for decision-making based on digital data and ICTs has been clouded, however, with legitimate questions about how new technologies, devices, and platforms — and the data they generate — can lead to unintended negative consequences or be used to harm individuals, groups and societies.
Join us in Washington, DC, on September 5-6 for this year’s MERL Tech Conference where we’ll be taking stock of changes in the space since 2014; showcasing promising technologies, ideas and case studies; sharing learning and challenges; debating ideas and approaches; and sketching out a vision for an ideal MERL future and the steps we need to take to get there.
Tech and traditional MERL: How is digital technology enabling us to do what we’ve always done, but better (consultation, design, community engagement, data collection and analysis, databases, feedback, knowledge management)? What case studies can be shared to help the wider sector learn and grow? What kinks do we still need to work out? What evidence base exists that can support us to identify good practices? What lessons have we learned? How can we share these lessons and/or skills with the wider community?
Data, data, and more data: How are new forms and sources of data allowing MERL practitioners to enhance their work? How are MERL Practitioners using online platforms, big data, digitized administrative data, artificial intelligence, machine learning, sensors, drones? What does that mean for the ways that we conduct MERL and for who conducts MERL? What concerns are there about how these new forms and sources of data are being used and how can we address them? What evidence shows that these new forms and sources of data are improving MERL (or not improving MERL)? What good practices can inform how we use new forms and sources of data? What skills can be strengthened and shared with the wider MERL community to achieve more with data?
Emerging tools and approaches: What can we do now that we’ve never done before? What new tools and approaches are enabling MERL practitioners to go the extra mile? Is there a use case for blockchain? What about facial recognition and sentiment analysis in MERL? What are the capabilities of these tools and approaches? What early cases or evidence is there to indicate their promise? What ideas are taking shape that should be tried and tested in the sector? What skills can be shared to enable others to explore these tools and approaches? What are the ethical implications of some of these emerging technological capabilities?
The Future of MERL: Where should we be going and what should the future of MERL look like? What does the state of the sector, of digital data, of technology, and of the world in which we live mean for an ideal future for the MERL sector? Where do we need to build stronger bridges for improved MERL? How should we partner and with whom? Where should investments be taking place to enhance MERL practices, skills and capacities? How will we continue to improve local ownership, diversity, inclusion and ethics in technology-enabled MERL? What wider changes need to happen in the sector to enable responsible, effective, inclusive and modern MERL?
Cross-cutting themes include diversity, inclusion, ethics and responsible data, and bridge-building across disciplines.
You’ll join some of the brightest minds working on MERL across a wide range of disciplines – evaluators, development and humanitarian MERL practitioners, small and large non-profit organizations, government and foundations, data scientists and analysts, consulting firms and contractors, technology developers, and data ethicists – for 2 days of in-depth sharing and exploration of what’s been happening across this multidisciplinary field and where we should be heading.
There is no real evidence base about what does and does not work for applying blockchain technology to interventions seeking social impacts. Most current blockchain interventions are driven by developers (programmers) and visionary entrepreneurs. There is little thinking in current blockchain interventions around designing for “social” impact (there is an overabundant trust in technology to achieve the outcomes and little focus on the humans interacting with the technology) or around integrating relevant evidence from behavioral economics, behavior change design, human centered design, etc.
To build the needed evidence base, Monitoring, Evaluation, Research and Learning (MERL) practitioners will have to not only get to know the broad strokes of blockchain technology but also the specifics of token design and tokenomics (the political economics of tokenized ecosystems). Token design could become the focal point for MERL on blockchain interventions since:
The vast majority of blockchain interventions, if not all, will involve some type of desired behavior change
The token provides the link between the ledger (which is the blockchain) and the social ecosystem created by the token in which the behavior change is meant to happen
Hence the token is the “nudge” meant to leverage behavior change in the social ecosystem while governing the transactions on the blockchain ledger.
(While this blog will focus on these points, it will not go into a full discussion of what tokens are and how they create ecosystems. But there are some very good resources out there that do this, which you can review at your leisure and to the degree that works for you. The Complexity Institute has published a book exploring the various attributes of complexity and main themes involved with tokenomics, while Outlier Ventures has published what I consider to be the best guidance on token design. The Outlier Ventures guidance contains many of the tools MERL practitioners will be familiar with (problem analysis, stakeholder mapping, etc.) and should be consulted.)
Hence it could be that by understanding token design and its requirements and mapping it against our current MERL thinking, tools and practices, we can develop new thinking and tools that could be the beginning point in building our much-needed evidence base.
What is a “blockchain intervention”?
As MERL practitioners
we roughly define an “intervention” as a group of inputs and activities meant
to leverage outcomes within a given eco-system.
“Interventions” are what we are usually mandated to assess, evaluate and
When thinking about MERL and blockchain, it is useful to think of two categories of “blockchain interventions”.
1) Integrating the blockchain into MERL data collection, entry, management, analysis or dissemination practices and
2) MERL strategies for interventions using the blockchain in some way, shape or form.
Here we will focus on #2 and in so doing demonstrate that while the blockchain is an innovative, potentially disruptive technology, evaluating its applications on social outcomes is still an issue of assessing behavior change against dimensions of intervention design.
Designing for Behavior Change
We generally design
interventions (programs, projects, activities) to “nudge” a certain type of behavior (stated as
outcomes in a theory of change) amongst a certain population (beneficiaries,
stakeholders, etc.). We often attempt to
integrate mechanisms of change into our intervention design, but often do not
for a variety of reasons (lack of understanding, lack of resources, lack of
political will, etc.). This lack of due
diligence in design is partly responsible for the lack of evidence around what
works and what does not work in our current universe of interventions.
Enter blockchain technology, which we as MERL practitioners will be responsible for assessing in the foreseeable future. Hence, we will need to determine how interventions using the blockchain attempt to nudge behavior, what behaviors they seek to nudge, amongst whom, when, and how well the design of the intervention accomplishes these functions. In order to do that we will need to better understand how blockchains use tokens to nudge behavior.
The Centrality of the Token
We have all used tokens before. Stores issue coupons that can only be used at those stores, we get receipts for groceries as soon as we pay, and arcades make you buy tokens instead of just using quarters. The coupons and arcade tokens can be considered utility tokens, meaning that they can only be used in a specific “ecosystem”, which in this case is a store and an arcade respectively. The grocery store receipt is a token because it demonstrates ownership: if you are stopped on the way out of the store and you show your receipt, you are demonstrating that you now have rights of ownership over the foodstuffs in your bag.
Whether you realize
it or not at the time, these tokens are trying to nudge your behavior. The store gives you the coupon because the
more time you spend in their store trying to redeem coupons, the greater
the likelihood you will spend additional money there. The grocery store wants you to pay for all
your groceries while the arcade wants you to buy more tokens than you end up using.
If needed, we could design
MERL strategies to assess how well these different tokens nudged the desired
behaviors. We would do this, in part, by thinking about how each token is
designed relative to the behavior it wants (i.e. the value, frequency and
duration of coupons, etc.).
Thinking about these ecosystems and their respective tokens will help us understand the interdependence between 1) the blockchain as a ledger that records transactions, 2) the token that captures the governance structures for how transactions are stored on the blockchain ledger as well as the incentive models for 3) the mechanisms of change in the social eco-system created by the token.
Figure #1: The inter-relationship between the blockchain
(ledger), token and social eco-system
Token Design as Intervention Design
Just as we assess
theories of change and their mechanisms against intervention design, we will
assess blockchain based interventions against their token design in much the
same way. This is because blockchain
tokens capture all the design dimensions of an intervention; namely the problem
to be solved, stakeholders and how they influence the problem (and thus the
solution), stakeholder attributes (as mapped out in something like a
stakeholder analysis), the beneficiary population, assumptions/risks, etc.
Outlier Ventures has adapted what they call a Token
Utility Canvas as a milestone in
their token design process. The canvas
can be correlated to the various dimensions of an evaluability
assessment tool (I am using the evaluability
assessment tool as a demonstration of the necessary dimensions of an
intervention’s design, meaning that the evaluability assessment tool assesses
the health of all the components of an intervention design). The Token Utility Canvas is a useful
milestone in the token design process that captures many of the problem
diagnostic, stakeholder assessment and other due diligence tools that are
familiar to MERL practitioners who have seen them used in intervention
design. Hence token design could be
largely thought of as intervention design and evaluated as such.
Comparing Token Design with Dimensions of Program Design (as represented in an
This table is not meant to be exhaustive and not all of the fields will be explained here but in general, it could be a useful starting point in developing our own thinking and tools for this emerging space.
The Token as a Tool for Behavior Change
Coming up with a taxonomy of blockchain interventions and relevant tokens is a necessary task, but all blockchains that need to nudge behavior will have to have a token.
Consider supply chain management. Blockchains are increasingly being used as the ledger system for supply chain management. Supply chains are typically comprised of numerous actors packaging, shipping, receiving, applying quality control protocols to various goods, all with their own ledgers of the relevant goods as they snake their way through the supply chain. This leads to ample opportunities for fraud, theft and high costs associated with reconciling the different ledgers of the different actors at different points in the supply chain. Using the blockchain as the common ledger system, many of these costs are diminished as a single ledger is used with trusted data, hence transactions (shipping, receiving, repackaging, etc.) can happen more seamlessly and reconciliation costs drop.
However, even in “simple” applications such as this there are behavior change implications. We still want the supply chain actors to perform their functions in a manner that adds value to the supply chain ecosystem as a whole, rewarding them for good behavior within the ecosystem and punishing them for bad behavior.
What if those shippers trying to pass on a faulty product had already deposited a certain value of currency in an escrow account (housed in a smart contract on the blockchain)? If they are found to be attempting a prohibited behavior (passing on faulty products), they would automatically surrender a certain amount from the escrow account in the blockchain smart contract. How much should be deposited in the escrow account? What is the ratio between the degree of punishment and the undesired action? These are behavior questions around a mechanism of change that are dimensions of current intervention designs and will be increasingly relevant in token design.
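To make these design questions concrete, here is a minimal sketch of the escrow logic in plain Python. It is not actual smart contract code, and it is not how any particular blockchain platform works. The names EscrowAccount, apply_penalty and penalty_rate are hypothetical, as are the deposit and the 25% rate, which are chosen purely for illustration. The point is that the deposit size and the punishment ratio are explicit design parameters that a MERL strategy could test and adapt.

```python
from dataclasses import dataclass


@dataclass
class EscrowAccount:
    """Hypothetical escrow deposit held for one supply chain actor."""
    owner: str
    balance: float


def apply_penalty(account: EscrowAccount, violation_found: bool, penalty_rate: float) -> float:
    """Forfeit a share of the deposit when a validator flags a prohibited behavior.

    penalty_rate is an assumed design parameter (0 to 1) expressing the ratio
    between the punishment and the undesired action. Returns the amount forfeited.
    """
    if not 0.0 <= penalty_rate <= 1.0:
        raise ValueError("penalty_rate must be between 0 and 1")
    if not violation_found:
        return 0.0
    forfeited = account.balance * penalty_rate
    account.balance -= forfeited
    return forfeited


# Illustrative values only: a 1,000 unit deposit and a 25% penalty.
shipper = EscrowAccount(owner="shipper_a", balance=1000.0)
forfeited = apply_penalty(shipper, violation_found=True, penalty_rate=0.25)
print(forfeited, shipper.balance)
```

Note that the sketch takes violation_found as a given input. Who or what supplies that judgment is exactly the validation question raised in this post.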
The point of this is to demonstrate that even “benign”
applications of the blockchain, like supply chain management, have behavior
change implications and thus require good due diligence in token design.
There is a lot that could be said about the validation function
of this process: who validates that the bad behavior has taken place and should
be punished, or that good behavior should be rewarded? There are lessons to be learned from results
based contracting and the role of the validator in such a contracting
vehicle. This “validating” function will
need to be thought out in terms of what can be automated and what needs a
“human touch” (and who is responsible, what methods they should use,
Implications for MERL
If tokens are fundamental to MERL strategies for blockchain
interventions, there are several critical implications:
MERL practitioners will need to be heavily integrated into the due diligence processes and tools for token design
MERL strategies will need to be highly formative, if not developmental, in facilitating the timeliness and overall effectiveness of the feedback loops informing token design
New thinking and tools will need to be developed to assess the relationships between blockchain governance, token design and mechanisms of change in the resulting social ecosystem.
The opportunity cost for impact and “learning” could go up the less MERL practitioners are integrated into the due diligence of token design. This is because the costs to adapt token design are relatively low compared to current social interventions, partly due to the ability to integrate automated feedback.
Blockchain based interventions present us with significant learning opportunities due to our ability to use the technology itself as a data collection/management tool in learning about what does and does not work. Feedback from an appropriate MERL strategy could inform decision making around token design that could be coded into the token on an iterative basis. For example, as stakeholders’ incentives shift (e.g. supply chain shippers incur new costs and their value proposition changes), token adaptation can respond in a timely fashion so long as the MERL feedback that informs the token design is accurate.
There is a need to determine what components of these feedback loops can be completed by automated functions and what requires a “human touch”. For example, what dimensions of token design can be informed by smart infrastructure (e.g. temperature gauges on shipping containers in the supply chain) versus household surveys completed by enumerators? This will be a task to complete and iteratively improve, starting with initial token design and lasting through the lifecycle of the intervention.
Token design dimensions, outlined in the Token Utility Canvas, and decision-making
will need to result in MERL questions that are correlated to the best strategy
to answer them, automated or human, much the same as we do now in current
Many of our current due diligence tools used in both intervention and evaluation design (things like stakeholder mapping, problem analysis, cost benefit analysis, value propositions, etc.) will need to be adapted to the types of relationships that exist within tokenized ecosystems. These include the relationships of influence between the social ecosystem and the blockchain ledger itself (or more specifically the governance of that ledger), as demonstrated in figure #1.
This could be our biggest priority as MERL practitioners. While blockchain interventions could create incredible opportunities for social experimentation, the need for human centered due diligence (incentivizing humans for positive behavior change) in token design is critical. Over-reliance on the technology to drive social outcomes is already a well evidenced opportunity cost that could be avoided with blockchain-based solutions if the gap between technologists, social scientists and practitioners can be bridged.
Guest post by Michael Cooper, a former DoS, MCC Associate Director for Policy and Evaluation who now runs Emergence. Mike advises numerous donors, private clients and foundations on program design, MEL, adaptive management and other analytical functions.
International development projects using the blockchain in some way are increasing at a rapid rate and our window for developing evidence around what does and does not work (and more importantly why) is narrow before we run into unintended consequences. Given that blockchain is a highly disruptive technology, these unintended consequences could be significant, creating a higher urgency to generate the evidence to guide how we design and evaluate blockchain applications.
To inform this discussion, Emergence has put out a working
paper that outlines 1.) what the blockchain is, 2.) how it can be used to
leverage behavior change outcomes in international development projects and 3.)
the implications for how we could design and evaluate blockchain based
interventions. The paper utilizes systems
and behaviorism principles in comparing how we currently design behavior change
interventions to how we could design/evaluate the same interventions using the
blockchain. This article summarizes the
main points of the paper and its conclusions to generate discussion around how
to best produce the evidence we need to fully realize the potential of
blockchain interventions for social impact.
Given the scope of possibilities surrounding the blockchain,
both in how it could be used and in the impact it could leverage, the
implications for how MEL is conducted are significant. The time is long gone when value adding MEL practitioners
were not involved in intervention design.
Blockchain based interventions will require additional integration of
MEL skill sets in the early design phases since so much will need to be
“tested” to determine what is and is not working. While rigid statistical evaluations will be
needed for some of these blockchain based interventions, the level of
complexity involved and the lack of an evidence base indicate that more
flexible, adaptive and more formative MEL approaches will be needed. The more these approaches are proactive and
involved in intervention design, the more frequent and informative the feedback
loops will be into our evidence base.
The Blockchain as a Decentralizing
At its core, the blockchain is just a ledger, but the importance of ledgers in how society functions cannot be overstated. Ledgers, and the control of them, are crucial in how supply chains are managed, how financial transactions are conducted, how data is shared, etc. Control of ledgers is a primary factor in limiting access to life changing goods and services, especially for the world’s poor. In part, the discussion over decentralization is essentially a discussion over who owns ledgers and how they are managed.
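For readers who want to see what “just a ledger” can look like, the following is a minimal sketch in Python of a hash-linked ledger, where each record stores a fingerprint of the record before it so that later edits become detectable. It illustrates the ledger idea only. It leaves out the distributed copies and consensus rules that real blockchains add, and the function names and supply chain entries are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def block_hash(entry: dict, previous_hash: str) -> str:
    """Fingerprint a ledger entry together with the hash of the record before it."""
    payload = json.dumps({"entry": entry, "previous_hash": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(ledger: list, entry: dict) -> None:
    """Add a record that is linked to the current end of the ledger."""
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append({
        "entry": entry,
        "previous_hash": previous_hash,
        "hash": block_hash(entry, previous_hash),
    })


def is_consistent(ledger: list) -> bool:
    """Recompute every fingerprint and check that the links are unbroken."""
    previous_hash = GENESIS_HASH
    for record in ledger:
        if record["previous_hash"] != previous_hash:
            return False
        if record["hash"] != block_hash(record["entry"], previous_hash):
            return False
        previous_hash = record["hash"]
    return True


# Hypothetical supply chain transactions recorded on one shared ledger.
ledger = []
append_entry(ledger, {"actor": "packer", "action": "packaged", "units": 100})
append_entry(ledger, {"actor": "shipper", "action": "shipped", "units": 100})
untampered = is_consistent(ledger)

# Quietly changing an earlier record breaks its fingerprint.
ledger[0]["entry"]["units"] = 90
tampered = is_consistent(ledger)
print(untampered, tampered)
```

In this sketch, whoever holds the list can still rewrite it and recompute every hash. What changes the question of who controls the ledger is spreading copies across many parties, which is where governance comes in.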
Decentralization has been a prominent theme in international development and there is strong evidence of its positive impact across various sectors, especially regarding local service delivery. One of the primary value adds of decentralization is empowering those further from traditional concentrations of power to have more authority over the problems that impact them. As a decentralizing technology, the blockchain holds a lot of potential in reaching these same impacts from decentralization (empowerment, etc.) in a more efficient and effective manner, partly due to its ability to better align interests around common problems. With better aligned interests, fewer resources (inputs) are needed to try and facilitate a desired behavior change.
Up until now, efforts of international development actors have focused on “nudging” behavior change amongst stakeholders and, in very rare cases such as results based financing, on giving loosely defined parameters to implementers with less emphasis on the manner in which outcomes are achieved. Both of these approaches are relevant in the design and testing of blockchain based interventions, but they will be integrated in unique new ways that will require new thinking and skill sets amongst practitioners.
Current Designing and Evaluating for Behavior Change
MEL usually starts with the relevant theory of change,
namely what mechanisms bring about targeted behavior change and how. Recent years have seen a focus on how
behavior change is achieved through an understanding
of mindsets and how they can be nudged
to achieve a social outcome. However the
international development space has recognized the limitations of designing
interventions that attempt to nudge behavior change. These limitations center around the level of
complexity involved, the inability to recognize and manage this complexity and lack
of awareness about the root causes of problems.
Hence the rise in things like results
based financing where the type of prescribed top-down causal pathway
(usually laid out in a theory of change) is not as heavily emphasized as in
more traditional interventions. Donors
using this approach can still mandate certain principles of implementation
(such as the inclusion of vulnerable populations, environmental safeguards,
timelines, etc.) but there is much more flexibility to create a causal pathway
to achieve the outcome.
Or, for example, take the popular PDIA approach where the focus is on
iteratively identifying and solving problems encountered on the pathway to
reform. These efforts do not start with
a mandated theory of change, but instead start with generally described
targeted outcomes and then the pathway to those outcomes is iteratively
created, similar to what Lant Pritchett has called “crawling
the design space”. Such an approach
has large overlaps with adaptive management practices and other more
integrative MEL frameworks and could lend itself to how blockchain based
interventions are designed, implemented and evaluated.
How the Blockchain Could Achieve Outcomes and Implications for MEL
Because of its decentralizing effects, any theory of change for a blockchain based intervention could include some possible common attributes that influence how outcomes are achieved:
Empowerment of those closest to problems to inform the relevant solutions
Alignment of interests
Alleviation of traditional intermediary services and relevant third party actors
Assessing these three attributes, and how they influence
outcomes, could be the foundation of any appropriate MEL strategy for a
blockchain-based intervention. This is
because these attributes are the “value add” of a blockchain-based
intervention. For example, traditional
financial inclusion interventions may seek to extend financial services of a
bank to rural areas through digital money, extension agents, etc. A blockchain-based solution, however, may cut
out the bank entirely and empower local communities to receive financial
services from completely new providers from anywhere in the world on much more
affordable terms and in a much more convenient manner. Such a solution could see an alignment of
interests amongst producers and consumers of these services since the new
relationships are mutually serving.
Because of this alignment there is less of a need, or even less of a
benefit, of having donors script out the causal pathway for the outcomes to be
achieved. Because of this alignment of
interests, those closest to the problem(s) and solutions can work it out
because it is in their interest to do so.
Hence, while a MEL framework for such a project could still use more standardized measures around outcomes like increased access to financial services and could even use statistical methods to evaluate questions around attributable changes in poverty status, there will need to be adaptive and formative MEL that assesses the dynamics of these attributes given their criticality to whether and how outcomes could be achieved. The dynamics between these attributes and the surrounding social eco-system have the potential to be very fluid (going back to the disruptive nature of blockchain technology), hence flexible MEL will be required to respond to new trends as they emerge.
Table: Blockchain Intervention Attributes and the Skill Sets to Assess Them
Attribute: Empowerment of those closest to problems to inform the relevant solutions
Skill sets: Problem driven design and MEL approach; stakeholder mapping (to identify relevant actors); decentralization focused MEL (MEL that focuses on outcomes associated with decentralization)
Attribute: Alignment of interests
Skill sets: Political economy analysis to identify incentives and interests; adaptive MEL to assess shifting alignment of interest between various actors
Attribute: Alleviation of traditional intermediary services
Skill sets: Political economy analysis to inform risk mitigation strategy for potential spoilers and relevant MEL
While there will need to be standard accountability and
other uses, feedback from an appropriate MEL strategy could have two primary
end uses in a blockchain based intervention: governance and trust.
The Role of Governance and Trust
Blockchain governance sets out the rules for how consensus (i.e. agreement) is achieved for deciding what transactions are valid on a blockchain. While this may sound mundane, it is critical for achieving outcomes, since how the blockchain is governed decides how well those closest to the problems are empowered to identify and achieve solutions and aligned interests. Hence the governance framework for the blockchain will need to be informed by an appropriate MEL strategy. A giant learning gap we currently have is how to iteratively adapt blockchain governance structures, using MEL feedback, into increasingly more efficient versions. Closing this gap will be critical to assessing the cost effectiveness of blockchain based solutions over other solutions (i.e. alternatives/cost benefit analysis tools) as well as maximizing impact.
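As a rough illustration of what a governance rule can look like, the Python sketch below treats consensus as a simple approval threshold among validators. This is an assumption made for illustration only: real blockchains reach consensus through a variety of mechanisms, and the function name, validator names and thresholds here are hypothetical. It shows that the same set of votes can produce a different outcome depending on the rule chosen, which is the kind of design choice MEL feedback could inform.

```python
def transaction_is_valid(votes: dict, approval_threshold: float) -> bool:
    """Accept a transaction when the share of approving validators meets the threshold.

    votes maps a validator name to True (approve) or False (reject).
    approval_threshold is an assumed governance parameter between 0 and 1.
    """
    if not votes:
        return False  # no validators means no agreement
    approvals = sum(1 for approved in votes.values() if approved)
    return approvals / len(votes) >= approval_threshold


# Hypothetical votes from three validators on one transaction.
votes = {"validator_a": True, "validator_b": True, "validator_c": False}

simple_majority = transaction_is_valid(votes, approval_threshold=0.5)
supermajority = transaction_is_valid(votes, approval_threshold=0.75)
print(simple_majority, supermajority)
```

Here two of three validators approve, so the transaction passes under a simple majority rule but fails under a 75% rule.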
Another focus of an appropriate MEL strategy would be to
facilitate trust in the blockchain-based solution amongst users much the same
as other technology-led solutions like mobile money or pay as you go metering
for service delivery. This includes not
only the digital interface between the user and the technology (a phone app,
SMS or other interface) but other dimensions of “trust” that would facilitate
uptake of the technology. These
dimensions of trust would be informed by an analysis of the barriers to uptake
of the technology amongst intended users, given it could be an entirely new
service for beneficiaries or an old service delivered in a new fashion. There is already a good evidence base around
what works in this area (i.e. marketing and communication tools for digital
financial services, assistance in completing registration paperwork for pay as
you go metering, etc.).
The Road Ahead
There is A LOT we need to learn and a short time to do it in
before we feel the negative effects from a lack of preparedness. This risk is heightened when you consider
that the international development industry has a poor
track record of designing and evaluating technology-led solutions
(primarily due to the fact that these projects usually neglect uptake of the
technology and operate on the assumption that the technology will drive
outcomes instead of users using the technology as a tool to drive the
The lessons from MEL in results based financing could be
especially informative to the future of evaluating blockchain-based solutions
given their similarities in letting solutions work themselves out and the role
of the “validator” in ensuring outcomes are achieved. In fact the blockchain has already
been used in this role in some simple output based programming.
As alluded to, pre-existing MEL skill sets can add a lot of value to building an evidence base, but MEL practitioners will need to develop a greater understanding of the attributes of blockchain technology, otherwise our MEL strategies will not be suited to blockchain-based programming.
by Isaac D. Castillo, Director of Outcomes, Assessment, and Learning at Venture Philanthropy Partners.
Evaluators don’t make mistakes.
Or do they?
Well, actually, they do. In fact, I’ve got a number of fantastic failures under my belt that turned into important learning opportunities. So, when I was asked to share my experience at the MERL Tech DC 2018 session on failure, I jumped at the chance.
Part of the Problem
As someone of Mexican descent, I am keenly aware of the problems that can arise when culturally and linguistically inappropriate evaluation practices are used. However, as a young evaluator, I was often part of the problem.
Early in my evaluation career, I was tasked with collecting data to determine why teenage youth became involved in gangs. In addition to developing the interview guides, I was also responsible for leading all of the on-site interviews in cities with large Latinx populations. Since I am Latinx, I had a sufficient grasp of Spanish to prepare the interview guides and conduct the interviews. I felt confident that I would be sensitive to all of the cultural and linguistic challenges to ensure an effective data collection process. Unfortunately, I had forgotten an important tenet of effective culturally competent evaluation: cultures and languages are not monolithic. Differences in regional cultures or dialects can lead even experienced evaluators into embarrassment, scorn, or the worst outcome of all: inaccurate data.
Sentate, Por Favor
For example, when first interacting with the gang members, I introduced myself and asked them to “Please sit down,” to start the interview by saying “Siéntate, por favor.” What I did not know at the time is that a large portion of the gang members I was interviewing were born in El Salvador or were of Salvadoran descent, and the accurate way to say it using Salvadoran Spanish would have been, “Sentate, por favor.”
Does one word make that much difference? In most cases it did not matter, but it caused several gang members to openly question my Spanish from the outset, which created an uncomfortable beginning to interviews about potentially sensitive subjects.
Amigo or Chero?
I next asked the gang members to think of their “friends.” In most dialects of Spanish, using amigos to ask about friends is accurate and proper. However, in the context of street slang, some gang members prefer the term chero, especially in informal contexts.
Again, was this a huge mistake? No. But it did lead to enough quizzical looks and requests for clarification that I started to doubt whether I was getting completely honest or accurate answers from some of the respondents. Unfortunately, this error did not arise until I had conducted nearly 30 interviews. I had not thought to test the wordings of the questions in multiple Spanish-speaking communities across several states.
Would You Like a Concha?
Perhaps my most memorable mistake during this evaluation occurred after I had completed an interview with a gang leader outside of a bakery. After we were done, the gang leader called over the rest of his gang to meet me. As I was meeting everyone, I glanced inside the bakery and noticed a type of Mexican pastry that I enjoyed as a child. I asked the gang leader if he would like to go inside and join me for a concha, a round pastry that looks like a shell. Everyone (except me) began to laugh hysterically. The gang leader then let me in on the joke. He understood that I was asking about the pan dulce (sweet bread), but he informed me that in his dialect, concha was used as a vulgar reference to female genitalia. This taught me a valuable lesson about how even casual references or language choices can be interpreted in many different ways.
What did I learn from this?
While I can look back on these mistakes and laugh, I am also reminded of the important lessons learned that I carry with me to this day.
Translate with the local context in mind. When translating materials or preparing for field work, get a detailed sense of who you will be collecting data from, including what cultures and subgroups people represent and whether or not there are specific topics or words that should be avoided.
Translate with the local population in mind. When developing data collection tools (in any language, even if you are fluent in it), take the time to pre-test the language in the tools.
Be okay with your inevitable mistakes. Recognize that no matter how much preparation you do, you will make mistakes in your data collection related to culture and language issues. Remember it is how you respond in those situations that is most important.
Digitization is everywhere! Digital technologies and data have changed the way we engage with each other and how we work. We cannot escape the effects of digitization, whether in our personal capacity (how our own data is being used) or in our professional capacity (understanding how to use data and technology). These changes are exciting! But we also need to consider the challenges they present to the MERL community and their impact on development.
The advent and proliferation of big data has the potential to change how evaluations are conducted. New skills are needed to process and analyse big data. Mathematics, statistics and analytical skills will be ever more important. As evaluators, we need to be discerning about the data we use. In a world of copious amounts of data, we need to ensure we have the ability to select the right data to answer our evaluation questions.
We also have an ethical and moral duty to manage data responsibly. We need new strategies and tools to guide the ways in which we collect, store, use and report data. Evaluators need to improve our skills as related to processing and analysing data. Evaluative thinking in the digital age is evolving and we need to consider the technical and soft skills required to maintain integrity of the data and interpretation thereof.
Though technology can make data collection faster and cheaper, two important considerations are access to technology by vulnerable groups and data integrity. Women, girls and people in rural areas normally do not have the same levels of access to technology as men and boys. This affects our ability to rely solely on technology to collect data from these population groups, because we need to be aware of inclusion, bias and representativity. Equally, we need to consider how to maintain the quality of data being collected through new technologies such as mobile phones and to understand how the use of new devices might change or alter how people respond.
In a rapidly changing world where technologies such as AI, Blockchain, Internet of Things, drones and machine learning are on the horizon, evaluators need to be robust and agile in how we change and adapt.
For this reason, a new strand has been introduced at the African Evaluation Association (AfrEA) conference, taking place from 11 – 15 March 2019 in Abidjan, Cote d’Ivoire. This stream, The Fourth Industrial Revolution and its Impact on Development: Implications for Evaluation, will focus on five sub-themes:
Guide to Industry 4.0 and Next Generation Tech
Talent and Skills in Industry 4.0
Changing World of Work
Evaluating youth programmes in Industry 4.0
Genesis Analytics will be curating this strand. We are excited to invite experts working in digital development and practitioners at the forefront of technological innovation for development and evaluation to submit abstracts for this strand.
by Zach Tilton, a Peacebuilding Evaluation Consultant and a Doctoral Research Associate at the Interdisciplinary PhD in Evaluation program at Western Michigan University.
In 2013 Dan Ariely quipped, “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it….” In 2015 the metaphor was imported to the international development sector by Ben Ramalingam, in 2016 it became a MERL Tech DC lightning talk, and it has been ringing in our ears ever since. So, what about 2018? Well, unlike US national trends in teenage sex, there are some signals that big or at least ‘bigger’ data is continuing to make its way not only into the realm of digital development, but also evaluation. I recently attended the 2018 MERL Tech DC pre-conference workshop Big Data and Evaluation where participants were introduced to real ways practitioners are putting this trope to bed (sorry, not sorry). In this blog post I share some key conversations from the workshop framed against the ethics of using this new technology, but to do that let me first provide some background.
I entered the workshop on my heels. Given the recent spate of security breaches and revelations about micro-targeting, ‘Big Data’ has been somewhat of a bogeyman for me and others. I have taken some pains to limit my digital data-footprint, have written passionately about big data and surveillance capitalism, and have long been skeptical of big data applications for serving marginalized populations in digital development and peacebuilding. As I found my seat before the workshop started I thought, “Is it appropriate or ethical to use big data for development evaluation?” My mind caught hold of a 2008 Evaluation Café debate between evaluation giants Michael Scriven and Tom Cook on causal inference in evaluation and the ethics of Randomized Control Trials. After hearing Scriven’s concerns about the ethics of withholding interventions from control groups, Cook asks, “But what about the ethics of not doing randomized experiments?” He continues, “What about the ethics of having causal information that is in fact based on weaker evidence and is wrong? When this happens, you carry on for years and years with practices that don’t work whose warrant lies in studies that are logically weaker than experiments provide.”
While I sided with Scriven for most of that debate, this question haunted me. It reminded me of an explanation of structural violence by peace researcher Johan Galtung who writes, “If a person died from tuberculosis in the eighteenth century it would be hard to conceive of this as violence since it might have been quite unavoidable, but if he dies from it today, despite all the medical resources in the world, then violence is present according to our definition.” Galtung’s intellectual work on violence deals with the difference between potential and the actual realizations and what increases that difference. While there are real issues with data responsibility, algorithmic biases, and automated discrimination that need to be addressed, if there are actually existing technologies and resources not being used to address social and material inequities in the world today, is this unethical, even violent? “What about the ethics of not using big data?” I asked myself back. The following are highlights of the actually existing resources for using big data in the evaluation of social amelioration.
Actually Existing Data
During the workshop, Kerry Bruce from Social Impact shared with participants her personal mantra, “We need to do a better job of secondary data analysis before we collect any more primary data.” She challenged us to consider how to make use of the secondary data available to our organizations. She gave examples of potential big data sources such as satellite images, remote sensors, GPS location data, social media, internet searches, call-in radio programs, biometrics, administrative data and integrated data platforms that merge many secondary data files such as public records and social service agency and client files. The key here is there are a ton of actually existing data, many of which are collected passively, digitally, and longitudinally. She noted real limitations to accessing existing secondary data, including donor reluctance to fund such work, limited training in appropriate methodologies in research teams, and differences in data availability between contexts. Even so, to underscore the potential of using secondary data, she shared a case study where she led a team to use large amounts of secondary indirect data to identify ecosystems of modern day slavery at a significantly lower cost than collecting the data first-hand. The outputs of this work will help pinpoint interventions and guide further research into the factors that may lead to predicting and prescribing what works well for stopping people from becoming victims of slavery.
Actually Existing Tech (and math)
Peter York from BCT Partners provided a primer on big data and data science including the reality-check that most of the work is the unsexy “ETL,” or the extraction, transformation, and loading of data. He contextualized the potential of the so-called big data revolution by reminding participants that the V’s of big data, Velocity, Volume, and Variety, are made possible by the technological and social infrastructure of increasingly networked populations and how these digital connections enable the monitoring, capturing, and tracking of ever increasing aspects of our lives in an unprecedented way. He shared, “A lot of what we’ve done in research were hacks because we couldn’t reach entire populations.” With advances in the tech stacks and infrastructure that connect people and their internet-connected devices with each other and the cloud, the utility of inferential statistics and experimental design lessens when entire populations of users are producing observational behavior data. When this occurs, evaluators can apply machine learning to discover the naturally occurring experiments in big data sets, what Peter terms ‘Data-driven Quasi-Experimental Design.’ This is exactly what Peter does when he builds causal models to predict and prescribe better programs for child welfare and juvenile justice to automate outcome evaluation, taking cues from precision medicine.
One example of a naturally occurring experiment was the 1854 Broad Street cholera outbreak in which physician John Snow used a dot map to identify a pattern that revealed the source of the outbreak, the Broad Street water pump. By finding patterns in the data John Snow was able to lay the groundwork for rejecting the false Miasma Theory and replacing it with a proto-typical Germ Theory. And although he was already skeptical of miasma theory, by using the data to inform his theory-building he was also practicing a form of proto-typical Grounded Theory. Grounded theory is simply building theory inductively, after data collection and analysis, not before, resulting in theory that is grounded in data. Peter explained, “Machine learning is Grounded Theory on steroids. Once we’ve built the theory, found the pattern by machine learning, we can go back and let the machine learning test the theory.” In effect, machine learning is like having a million John Snows to pore over data to find the naturally occurring experiments or patterns in the maps of reality that are big data.
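To make the dot-map idea concrete, here is a minimal sketch in Python of the kind of pattern a dot map surfaces: each case is assigned to its nearest water pump and the cases per pump are counted. The pump names and all coordinates are invented for illustration and are not Snow's 1854 data; the helper names `nearest_pump` and `cases_per_pump` are likewise hypothetical.

```python
from collections import Counter
from math import hypot

# Invented toy coordinates for illustration only (NOT Snow's actual 1854 data).
PUMPS = {"Pump A": (0.0, 0.0), "Pump B": (5.0, 0.0), "Pump C": (0.0, 5.0)}
CASES = [(0.5, 0.2), (1.0, 1.0), (0.2, 0.9), (4.5, 0.5),
         (1.5, 0.3), (0.3, 4.0), (0.8, 0.1)]


def nearest_pump(case, pumps):
    """Return the name of the pump closest to a case (straight-line distance)."""
    return min(
        pumps,
        key=lambda name: hypot(case[0] - pumps[name][0], case[1] - pumps[name][1]),
    )


def cases_per_pump(cases, pumps):
    """Count how many cases fall closest to each pump."""
    return Counter(nearest_pump(case, pumps) for case in cases)


counts = cases_per_pump(CASES, PUMPS)
for pump, n in counts.most_common():
    print(pump, n)
```

In this toy data most cases cluster around one pump, which is the sort of pattern that prompts a hypothesis. The count alone does not establish a cause; it only points to where to look next.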
A key aspect of the value of applying machine learning in big data is that patterns more readily present themselves in datasets that are ‘wide’ as opposed to ‘tall.’ Peter continued, “If you are used to datasets you are thinking in rows. However, traditional statistical models break down with more features, or more columns.” So, Peter and evaluators like him who are applying data science to their evaluative practice are evolving from traditional Frequentist to Bayesian statistical approaches. While there is more to the distinction here, the latter uses prior knowledge, or degrees of belief, to determine the probability of success, where the former does not. This distinction is significant for evaluators who are wanting to move beyond predictive correlation to prescriptive evaluation. Peter expounded, “Prescriptive analytics is figuring out what will best work for each case or situation.” For example, with prediction, we can make statements that a foster child with certain attributes is 70% likely not to find a home. Using the same data points with prescriptive analytics we can find 30 children that are similar to that foster child and find out what they did to find a permanent home. In a way, only using predictive analytics can cause us to surrender while including prescriptive analytics can cause us to endeavor.
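The "find similar children and see what worked for them" idea can be sketched as a simple similar-case lookup. The Python below is only an illustration under stated assumptions: the records, field names (`age`, `placements`, `supports`, `found_home`) and the plain Euclidean distance are all invented here, and this is not Peter York's actual modelling approach.

```python
from collections import Counter
from math import sqrt

# Invented illustrative records; every field name and value is hypothetical.
CASES = [
    {"age": 12, "placements": 3, "supports": {"mentoring", "kinship_search"}, "found_home": True},
    {"age": 13, "placements": 4, "supports": {"kinship_search"}, "found_home": True},
    {"age": 14, "placements": 3, "supports": {"mentoring"}, "found_home": False},
    {"age": 13, "placements": 2, "supports": {"kinship_search", "therapy"}, "found_home": True},
    {"age": 6, "placements": 0, "supports": {"therapy"}, "found_home": True},
    {"age": 17, "placements": 7, "supports": {"mentoring"}, "found_home": False},
]


def distance(a, b):
    """Euclidean distance on two attributes (an assumed similarity measure)."""
    return sqrt((a["age"] - b["age"]) ** 2 + (a["placements"] - b["placements"]) ** 2)


def similar_cases(target, cases, k):
    """Return the k cases most similar to the target."""
    return sorted(cases, key=lambda case: distance(target, case))[:k]


def supports_among_successes(target, cases, k):
    """Tally the supports received by similar cases that found a home."""
    tally = Counter()
    for case in similar_cases(target, cases, k):
        if case["found_home"]:
            tally.update(case["supports"])
    return tally


target = {"age": 13, "placements": 3}
tally = supports_among_successes(target, CASES, k=4)
print(tally.most_common())
```

A tally like this is descriptive rather than causal. It suggests which supports are worth examining for a given child, and any prescription would still need the kind of causal modelling described above.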
The last category of existing resources for applying big data for evaluation was mostly captured by the comments of independent evaluation consultant, Michael Bamberger. He spoke of the latent capacity that existed in evaluation professionals and teams, but that we’re not taking full advantage of big data: “Big data is being used by development agencies, but less by evaluators in these agencies. Evaluators don’t use big data, so there is a big gap.”
He outlined two scenarios for the future of evaluation in this new wave of data analytics: a state of divergence where evaluators are replaced by big data analysts and a state of convergence where evaluators develop a literacy with the principles of big data for their evaluative practice. One problematic consideration with this hypothetical is that many data scientists are not interested in causation, as Peter York noted. To move toward the future of convergence, Michael shared how big data can enhance the evaluation cycle from appraisal and planning through monitoring, reporting and evaluating sustainability. He went on to share a series of caveats emptor that include issues with extractive versus inclusive uses of big data, the fallacy of large numbers, data quality control, and different perspectives on theory, all of which could warrant their own blog posts for development evaluation.
While I deepened my basic understanding of data analytics, including the tools and techniques, benefits and challenges, and guidelines for big data and evaluation, my biggest takeaway is reconsidering big data for social good by considering the ethical dilemma of not using existing data, tech, and capacity to improve development programs, possibly even prescribing specific interventions by identifying their probable efficacy through predictive models before they are deployed.
The MERL Tech Conference explores the intersection of Monitoring, Evaluation, Research and Learning (MERL) and technology. The main goals of “MERL Tech” as an initiative are to:
Transform and modernize MERL in an intentionally responsible and inclusive way
Promote ethical and appropriate use of tech (for MERL and more broadly)
Encourage diversity & inclusion in the sector & its approaches
Improve development, tech, data & MERL literacy
Build/strengthen community, convene, help people talk to each other
Help people find and use evidence & good practices
Provide a platform for hard and honest talks about MERL and tech and the wider sector
Spot trends and future-scope for the sector
Our fifth MERL Tech DC conference took place on September 6-7, 2018, with a day of pre-workshops on September 5th. Some 300 people from 160 organizations joined us for the two days, and another 70 people attended the pre-workshops.
Attendees came from a wide diversity of professions and disciplines:
An unofficial estimate on speaker racial and gender diversity is here.
Building bridges, connections, community, and capacity
Sharing experiences, examples, challenges, and good practice
Strengthening the evidence base on MERL Tech and ICT4D approaches
Facing our challenges and shortcomings
Exploring the future of MERL
As always, sessions were related to: technology for MERL, MERL of ICT4D and Digital Development programs, MERL of MERL Tech, digital data for adaptive decisions/management, ethical and responsible data approaches and cross-disciplinary community building.
Sessions included plenaries, lightning talks and breakout sessions. You can find a list of sessions here, including any presentations that have been shared by speakers and session leads. (Go to the agenda and click on the session of interest. If we have received a copy of the presentation, there will be a link to it in the session description).
One topic that we explored more in-depth over the two days was the need to get better at measuring ourselves and understanding both the impact of technology on MERL (the MERL of MERL Tech) and the impact of technology overall on development and societies.
As Anahi Ayala Iacucci said in her opening talk — “let’s think less about what technology can do for development, and more about what technology does to development.” As another person put it, “We assume that access to tech is a good thing and immediately helps development outcomes — but do we have evidence of that?”
Some 17.5% of participants filled out our post-conference feedback survey, and 70% of them rated their experience either “awesome” or “good”. Another 7% of participants rated individual sessions through the “Sched” app, with an average session satisfaction rating of 8.8 out of 10.
Topics that survey respondents suggested for next time include: more basic tracks and more advanced tracks, more sessions relating to ethics and responsible data and a greater focus on accountability in the sector. Read the full Feedback Report here!
What’s next? State of the Field Research!
In order to arrive at an updated sense of where the field of technology-enabled MERL is, a small team of us is planning to conduct some research over the next year. At our opening session, we did a little crowdsourcing to gather input and ideas about what the most pressing questions are for the “MERL Tech” sector.
We’ll be keeping you informed here on the blog about this research and welcome any further input or support! We’ll also be sharing more about individual sessions here.
As we all know, big data and data science are becoming increasingly important in all aspects of our lives. There is a similar rapid growth in the applications of big data in the design and implementation of development programs. Examples range from the use of satellite images and remote sensors in emergency relief and the identification of poverty hotspots, through the use of mobile phones to track migration and to estimate changes in income (by tracking airtime purchases) and social media analysis to track sentiments and predict increases in ethnic tension, to the use of smartphones and the Internet of Things (IoT) to monitor health through biometric indicators.
Despite the rapidly increasing role of big data in development programs, there is speculation that evaluators have been slower to adopt big data than have colleagues working in other areas of development programs. Some of the evidence for the slow take-up of big data by evaluators is summarized in “The future of development evaluation in the age of big data”. However, there is currently very limited empirical evidence to test these concerns.
To try to fill this gap, my colleagues Rick Davies and Linda Raftree and I would like to invite those of you who are interested in big data and/or the future of evaluation to complete the attached survey. This survey, which takes about 10 minutes to complete, asks evaluators to report on the data collection and data analysis techniques that they use in the evaluations they design, manage or analyze, while at the same time asking data scientists how familiar they are with evaluation tools and techniques.
The survey was originally designed to obtain feedback from participants in the MERL Tech conferences on “Exploring the Role of Technology in Monitoring, Evaluation, Research and Learning in Development” that are held annually in London and Washington, DC, but we would now like to broaden the focus to include a wider range of evaluators and data scientists.
One of the ways in which the findings will be used is to help build bridges between evaluators and data scientists by designing integrated training programs for both professions that introduce the tools and techniques of both conventional evaluation practice and data science, and show how they can be combined to strengthen both evaluations and data science research. “Building bridges between evaluators and big data analysts” summarizes some of the elements of a strategy to bring the two fields closer together.
The findings of the survey will be shared through this and other sites, and we hope this will stimulate a follow-up discussion. Thank you for your cooperation and we hope that the survey and the follow-up discussions will provide you with new ways of thinking about the present and potential role of big data and data science in program evaluation.
This year at MERL Tech DC, in addition to the regular conference on September 6th and 7th, we’re offering two full-day, in-depth workshops on September 5th. Join us for a deeper look into the possibilities and pitfalls of Blockchain for MERL and Big Data for Evaluation!
What can Blockchain offer MERL? with Shailee Adinolfi, Michael Cooper, and Val Gandhi, co-hosted by Chemonics International, 1717 H St. NW, Washington, DC 20016.
Tired of the blockchain hype, but still curious on how it will impact MERL? Join us for a full day workshop with development practitioners who have implemented blockchain solutions with social impact goals in various countries. Gain knowledge of the technical promises and drawbacks of blockchain technology as it stands today and brainstorm how it may be able to solve for some of the challenges in MERL in the future. Learn about ethical design principles for blockchain and how to engage with blockchain service providers to ensure that your ideas and programs are realistic and avoid harm. See the agenda here.
Big Data and Evaluation with Michael Bamberger, Kerry Bruce and Peter York, co-hosted by the Independent Evaluation Group at the World Bank – “I” Building, Room: I-1-200, 1850 I St NW, Washington, DC 20006.
Join us for a one-day, in-depth workshop on big data and evaluation where you’ll get an introduction to Big Data for Evaluators. We’ll provide an overview of applications of big data in international development evaluation, discuss ways that evaluators are (or could be) using big data and big data analytics in their work. You’ll also learn about the various tools of data science and potential applications, as well as run through specific cases where evaluators have employed big data as one of their methods. We will also address the important question as to why many evaluators have been slower and more reluctant to incorporate big data into their work than have their colleagues in research, program planning, management and other areas such as emergency relief programs. Lastly, we’ll discuss the ethics of using big data in our work. See the agenda here!