MERL Tech News

Oops! Satellite Imagery Cannot Predict Human Development Indicators

Guest post from Wayan Vota

In many developing country environments, it is difficult or impossible to obtain recent, reliable estimates of human development. Nationally representative household surveys, which are the standard instrument for determining development policy and priorities, are typically too expensive to collect with any regularity.

Recently, however, researchers have shown that remote sensing technologies may offer a solution to this data constraint. In particular, recent work indicates that satellite imagery can be processed with deep neural networks to accurately estimate the sub-regional distribution of wealth in sub-Saharan Africa.

Testing Neural Networks to Process Satellite Imagery

In the paper, Can Human Development be Measured with Satellite Imagery?, Andrew Head, Mélanie Manguin, Nhat Tran, and Joshua Blumenstock explore the extent to which the same approach – of using convolutional neural networks to process satellite imagery – can be used to measure a broader set of human development indicators, in a broader range of geographic contexts.

Their analysis produces three main results:

  • They successfully replicate prior work showing that satellite images can accurately infer a wealth-based index of poverty in sub-Saharan Africa.
  • They show that this approach can generalize to predicting poverty in other countries and continents, but that the performance is sensitive to the hyperparameters used to tune the learning algorithm.
  • They find that this approach does not trivially generalize to predicting other measures of development such as educational attainment, access to drinking water, and a variety of health-related indicators.

This paper shows that while satellite imagery and machine learning may provide a powerful paradigm for estimating the wealth of small regions in sub-Saharan Africa, the same approach does not trivially generalize to other geographical contexts or to other measures of human development.

In this assessment, it is important to emphasize what they mean by “trivially,” because in truth the point they are making is a carefully circumscribed one. Specifically, what they have shown is that the exact framework—of retraining a deep neural network on night-lights data, and then using those features to predict the wealth of small regions in sub-Saharan Africa—cannot be directly applied to predicting arbitrary indicators in any country with uniformly good results.
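
To make the shape of that framework concrete, here is a minimal sketch of its second stage only, assuming the CNN has already been fine-tuned on night-lights and per-cluster image features have been exported. The file names, array shapes, and the choice of ridge regression are illustrative assumptions, not the authors' actual code.

```python
# Minimal sketch of the second stage of the transfer-learning pipeline described
# above: image features (assumed to come from a CNN fine-tuned to predict
# night-lights from daytime satellite images) are used to predict a survey-based
# wealth index for each cluster. File names and shapes are hypothetical.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: one row per survey cluster.
features = np.load("cluster_cnn_features.npy")       # shape (n_clusters, 4096)
wealth_index = np.load("cluster_wealth_index.npy")   # shape (n_clusters,)

# Ridge regression with cross-validated regularization, standing in for the
# regularized linear model typically fit on CNN features.
model = RidgeCV(alphas=np.logspace(-3, 3, 13))

# Cross-validated R^2 indicates how much of the out-of-sample variation in the
# wealth index the image features can explain.
r2_scores = cross_val_score(model, features, wealth_index, cv=5, scoring="r2")
print(f"Mean cross-validated R^2: {r2_scores.mean():.2f}")
```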

This is an important point to make because absent empirical evidence to the contrary, it is likely that policymakers eager to gain quick access to micro-regional measurements of development might be tempted to do exactly what they have done in this paper, without paying careful attention to the thorny issues of generalizability that they have uncovered in this analysis.

It is not the researchers’ intent to impugn the potential for related approaches to provide important new methods for measuring development, but rather to say that such efforts should proceed with caution, and with careful validation.

Why Satellite Imagery Might Fail to Predict Development

The results showed that while some indicators like wealth and education can be predicted reasonably well in many countries, other development indicators are much more brittle, exhibiting high variance between and within countries, and others perform poorly everywhere.

Thus it is useful to distinguish between two possible reasons why the current approach may have failed to generalize to these measures of development.

  • It may be that this exercise is fundamentally not possible, and that no amount of additional work would yield qualitatively different results.
  • It may be that their investigation to date has not been sufficiently thorough, and that more concerted efforts could significantly improve the performance of these models.

Insufficient “signal” in the satellite imagery.

The researchers’ overarching goal is to use information in satellite images to measure different aspects of human development. The premise of such an approach is that the original satellite imagery must contain useful information about the development indicator of interest. Absent such a signal, no matter how sophisticated the computational model, it is destined to fail.

The fact that wealth specifically can be measured from satellite imagery is quite intuitive. For instance, there are visual features one might expect to correlate with wealth—large buildings, metal roofs, nicely paved roads, and so forth.

It may be the case that other measures of human development simply cannot be seen from above. For instance, inferring the prevalence of malnutrition from satellite imagery may be a fundamentally difficult task if regions with high and low rates of malnutrition look similar from above, even though the researchers hypothesize that such indicators should correlate with the wealth index.

They were, however, surprised by the relative under-performance of models designed to predict access to drinking water, as they expected the satellite-based features to capture proximity to bodies of water, which in turn might affect access to drinking water.

(Over-) reliance on night-lights may not generalize.

Their reliance on night lights might help explain why some indicators were predicted less successfully in some countries than others. An example in their study includes Nepal, where the accuracy in predicting access to electricity was much lower (R2 = 0.24) than in the other countries (R2 = 0.69, 0.44, and 0.54 in Rwanda, Nigeria, and Haiti, respectively).

This may be partly due to the fact that Nepal has a very low population density (half as dense as Haiti and Rwanda) and very high levels of electrification (twice as high as Haiti, Rwanda, and Nigeria).

If the links between electrification, night-lights, and daytime imagery are broken in Nepal, they would expect their modeling approach to fail. More generally, they expect that when a development indicator does not clearly relate to the presence of nighttime lights, it may be unreasonable to expect good performance from the transfer learning process as a whole.

Deep learning vs. supervised feature engineering.

In this paper, the researchers focused explicitly on using the deep/transfer learning approach to extracting information from satellite images. While powerful, it is also possible that other approaches to feature engineering might be more successful than the brute force approach of the convolutional neural network.

For instance, Gros and Tiecke have recently shown how hand-labeled features from satellites, and specifically information about the types of buildings that are present in each image, can be quite effective in predicting population density. Labeling images in this manner is resource intensive, and they did not have the opportunity to test such approaches.

However, they believe that careful encoding of the relevant information from satellite imagery would likely bolster the performance of specific prediction tasks.

Neural Networks Can Still Process Satellite Imagery

Broadly, the researchers remain optimistic that future work using novel sources of data and new computational algorithms can engender significant advances in the measurement of human development.

However, it is imperative that such work proceeds carefully, with appropriate benchmarking and external calibration. Promising new tools for measurement have the potential to be implemented widely, possibly by individuals who do not have extensive expertise in the underlying algorithms.

Applied blindly, these algorithms have the potential to skew subsequent policy in unpredictable and undesirable ways. They view the results of this study as a cautionary example of how a promising algorithm should not be expected to work “off the shelf” in a context that is significantly different from the one in which it was originally developed.

The post Oops! Satellite Imagery Cannot Predict Human Development Indicators appeared first on ICTworks.

Tips for Increasing Mobile Survey Response Rates

Guest post from James Mwangi, the Deployment Lead at Echo Mobile. This post was originally published on the Echo Mobile blog.

Users often ask us, “What response rate will I get from my survey?” or “How can I increase my survey’s response rate?”

The truth is… it depends!

Response rates depend on your organisation, your respondents, and their motivation for responding. Most of our users assume that financial incentives are the most effective way to stimulate engagement, and research does show they can enhance response rates. But they are not always necessary and are rarely sufficient. The design of your survey — its structure, tone and content — is equally important and often ignored.

In a recent SMS survey conducted for the third time on behalf of a UN agency and government ministry, Echo’s Deployment team demonstrated that minor adjustments to survey design can drastically increase response rates, regardless of financial incentives.

In May 2017, the team sent a survey with a KES 35 airtime incentive to 25,000 Kenyan government employees, 21% of whom completed it. In October 2017, Deployment sent the same survey to the same group with the same airtime incentive. This time only 16% completed it. In February 2018, we sent the survey again, with minor design tweaks and no financial incentives. The completion rate nearly doubled to 29%.

Win-win! Our client saved money by dropping the airtime transfers and got more results. More of their beneficiaries were able to engage and provide critical feedback. Here are the design changes we made to the survey. Consider them next time you’re using Echo for Monitoring and Evaluation (M&E):

Personalize the content

The Echo Platform allows users to personalize messages using standard fields — basic, common data points like name, ID and location, which can be stored in Echo contact profiles and can be integrated into large-scale messages.

Unlike in 2017, in the 2018 version of the UN survey, our Deployment team added the NAME field to the first SMS. As a result, all recipients immediately saw their name before automatically progressing to the first question. This builds a sense of trust, captures recipients’ attention, and makes the message less likely to be mistaken for spam.

And you don’t need to just stick to standard fields! Any prior response to a survey can be stored as a custom field. If you ask recipients their favorite football team and store the response as a custom field, the next time you send them SMS you can personalize your content even further: “Hi [NAME]. Hope [FOOTBALL_TEAM] is doing well this week….”
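
As a rough sketch of what this kind of personalization amounts to, the snippet below merges standard and custom fields into a message template. The field names, template syntax, and contact records are purely illustrative assumptions, not Echo's actual API.

```python
# Illustrative sketch of merging contact fields into a personalized SMS invite.
# Field names, phone numbers, and template syntax are hypothetical.
contacts = [
    {"NAME": "Amina", "FOOTBALL_TEAM": "Gor Mahia", "phone": "+2547XXXXXXXX"},
    {"NAME": "Brian", "FOOTBALL_TEAM": "AFC Leopards", "phone": "+2547YYYYYYYY"},
]

template = "Hi {NAME}. Hope {FOOTBALL_TEAM} is doing well this week. We'd value your feedback:"

for contact in contacts:
    # Substitute standard and custom fields into the template for each contact.
    message = template.format(**contact)
    print(contact["phone"], message)
```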

Skip the “opt-in”

The Echo platform’s survey builder allows you to add an invitation message as the first SMS sent to a contact. To move from this intro message on to the first question, recipients must “opt-in” by responding to this initial message with something like “ok” or “begin” (any word/number will do).

Sample survey designs, before optimisation.

Invitation messages are extremely useful. They help you be polite, introduce yourself if the recipient doesn’t know you, and explain what your survey is about, why it matters, and how the recipient can proceed (more below on instructions!). But they can also create a barrier to completion.

Observing that many respondents had failed to opt in to our 2017 survey, for the 2018 version we dropped the invitation message. Instead, we took that content and sent it as an info question, which, by design, automatically progresses to the next question whether or not the recipient responds.

Optimised survey: personalised, does not require the respondent to opt in, and has clear instructions on how to reply.

Removing the opt-in invitation message won’t always be an option, but in this case, respondents were employees of our client and had been engaging on their shortcode for years. In some ways the intro message just added an extra step for them, as they had already provided their phone numbers and given consent for our client to engage them. No personally identifiable information (PII) is collected or shared, and respondents can unsubscribe entirely from our system at any time by sending the word STOP, an option that has been communicated to them repeatedly.

In other cases, users might be suspicious of the opt-in request. Many Kenyans have encountered premium SMS services that push messages to unknowing respondents and deduct airtime from them once they opt in. Messaging with Echo is totally free for your respondents, but consider how they might react to an opt-in intro message, and design your survey accordingly!

Give clear instructions

Keeping in mind the SMS character limit, our Deployment team added quick instructions at the end of each question in the 2018 survey. These guided respondents on how to answer specific question types. In the prior 2017 versions, each SMS had contained only the question, without instructions on how to answer.

Send reminders

For the 2017 surveys, we automated a reminder, sent 24 hours after the survey to those who had not yet started or completed it. For the 2018 version we added a second reminder, sent 12 hours later.

Reminders like these nudge contacts who are willing to respond to the survey but may have become distracted before completing it. This is especially true for long surveys like the one we have been deploying for the UN, which risk respondent fatigue. Reminders are a subtle way of urging them to finish the survey. Better yet — keep it lean!
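
As an illustration of the reminder logic described above (a generic sketch, not Echo's actual implementation), the snippet below flags non-completers for a first reminder 24 hours after the invite and a second reminder 12 hours after that. The contact records and their fields are made up for this example.

```python
# Sketch of picking who gets a reminder: contacts invited more than 24 hours ago
# who have not completed the survey get a first reminder; a second reminder
# follows 12 hours later. Contact records and statuses are hypothetical.
from datetime import datetime, timedelta

now = datetime.utcnow()
contacts = [
    {"phone": "+2547XXXXXXXX", "invited_at": now - timedelta(hours=30),
     "completed": False, "reminders_sent": 0},
    {"phone": "+2547YYYYYYYY", "invited_at": now - timedelta(hours=40),
     "completed": True, "reminders_sent": 1},
]

for c in contacts:
    if c["completed"]:
        continue  # no reminder needed once the survey is finished
    hours_since_invite = (now - c["invited_at"]).total_seconds() / 3600
    if c["reminders_sent"] == 0 and hours_since_invite >= 24:
        print("Send first reminder to", c["phone"])
    elif c["reminders_sent"] == 1 and hours_since_invite >= 36:  # 24h + 12h
        print("Send second reminder to", c["phone"])
```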

So, what’s the take away here?

While research on the potential impact of financial incentives is clear, no amount of money or airtime can make up for suboptimal survey design!

Monetary rewards can move the response rate at the margins, but not always, and only if you get the design right first. Financial incentives complement a well-designed survey that has useful and clear content, an efficient structure, and a personal tone.

That said, non-financial incentives — the broader reasons why your contacts might want to engage with you at all — are an extremely important consideration. Not everyone’s time and information can be bought.

Consider for your next survey or engagement what informational, relational, or emotional incentives you might be explicitly or implicitly offering up front. As with any relationship, both sides ultimately need to feel like there is some benefit to the commitment. We’ll blog more about this idea soon!

Want to learn more from the Echo Deployment team? We consult on mobile engagement strategy and techniques, and can provide implementation support for survey creation, setup, optimization, deployment, and tracking on the Echo Platform.

Early Concepts for Designing and Evaluating Blockchain Interventions for Behavior Change

Guest post by Michael Cooper, a former DoS, MCC Associate Director for Policy and Evaluation who now runs Emergence.  Mike advises numerous donors, private clients and foundations on program design, MEL, adaptive management and other analytical functions.

International development projects using the blockchain in some way are increasing at a rapid rate, and our window for developing evidence around what does and does not work (and more importantly why) is narrow before we run into unintended consequences. Given that blockchain is a highly disruptive technology, these unintended consequences could be significant, creating a higher urgency to generate the evidence to guide how we design and evaluate blockchain applications.

Our window for developing evidence around what does and does not work (and more importantly why) is narrow before we run into unintended consequences.

To inform this discussion, Emergence has put out a working paper that outlines 1.) what the blockchain is, 2.) how it can be used to leverage behavior change outcomes in international development projects and 3.) the implications for how we could design and evaluate blockchain based interventions.  The paper utilizes systems and behaviorism principles in comparing how we currently design behavior change interventions to how we could design/evaluate the same interventions using the blockchain.  This article summarizes the main points of the paper and its conclusions to generate discussion around how to best produce the evidence we need to fully realize the potential of blockchain interventions for social impact.

Given the scope of possibilities surrounding the blockchain, both in how it could be used and in the impact it could leverage, the implications for how MEL is conducted are significant. The days when value-adding MEL practitioners were not involved in intervention design are long gone. Blockchain-based interventions will require even deeper integration of MEL skill sets in the early design phases, since so much will need to be “tested” to determine what is and is not working. While rigid statistical evaluations will be needed for some of these blockchain-based interventions, the level of complexity involved and the lack of an evidence base indicate that more flexible, adaptive, and formative MEL approaches will also be needed. The more these approaches are proactive and involved in intervention design, the more frequent and informative the feedback loops into our evidence base will be.

The Blockchain as a Decentralizing Technology

At its core, the blockchain is just a ledger, but the importance of ledgers in how society functions cannot be overstated. Ledgers, and the control of them, are crucial to how supply chains are managed, how financial transactions are conducted, how data is shared, and so on. Control of ledgers is a primary factor in limiting access to life-changing goods and services, especially for the world’s poor. In part, the discussion over decentralization is essentially a discussion over who owns ledgers and how they are managed.

Decentralization has been a prominent theme in international development, and there is strong evidence of its positive impact across various sectors, especially regarding local service delivery. One of the primary value adds of decentralization is empowering those further from traditional concentrations of power to have more authority over the problems that affect them. As a decentralizing technology, the blockchain holds a lot of potential to achieve these same impacts (empowerment, etc.) in a more efficient and effective manner, partly due to its ability to better align interests around common problems. With better-aligned interests, fewer resources (inputs) are needed to facilitate a desired behavior change.

Up until now, the efforts of international development actors have focused on “nudging” behavior change amongst stakeholders and, in rare cases such as results-based financing, on giving loosely defined parameters to implementers with less emphasis on the manner in which outcomes are achieved. Both of these approaches are relevant to the design and testing of blockchain-based interventions, but they will be integrated in new ways that will require new thinking and skill sets amongst practitioners.

Current Designing and Evaluating for Behavior Change

MEL usually starts with the relevant theory of change, namely what mechanisms bring about targeted behavior change and how. Recent years have seen a focus on how behavior change is achieved through an understanding of mindsets and how they can be nudged to achieve a social outcome. However, the international development space has recognized the limitations of designing interventions that attempt to nudge behavior change. These limitations center on the level of complexity involved, the inability to recognize and manage this complexity, and a lack of awareness of the root causes of problems. Hence the rise of approaches like results-based financing, where the kind of prescribed top-down causal pathway (usually laid out in a theory of change) is not as heavily emphasized as in more traditional interventions. Donors using this approach can still mandate certain principles of implementation (such as the inclusion of vulnerable populations, environmental safeguards, timelines, etc.), but there is much more flexibility in creating a causal pathway to achieve the outcome.

Or take, for example, the popular PDIA approach, where the focus is on iteratively identifying and solving problems encountered on the pathway to reform. These efforts do not start with a mandated theory of change; instead they start with generally described target outcomes, and the pathway to those outcomes is iteratively created, similar to what Lant Pritchett has called “crawling the design space”. Such an approach has large overlaps with adaptive management practices and other more integrative MEL frameworks, and could lend itself to how blockchain-based interventions are designed, implemented, and evaluated.

How the Blockchain Could Achieve Outcomes and Implications for MEL

Because of its decentralizing effects, any theory of change for a blockchain-based intervention could include several common attributes that influence how outcomes are achieved:

  • Empowerment of those closest to problems to inform the relevant solutions
  • Alignment of interests around these solutions
  • Alleviation of traditional intermediary services and relevant third party actors

Assessing these three attributes, and how they influence outcomes, could be the foundation of any appropriate MEL strategy for a blockchain-based intervention. This is because these attributes are the “value add” of a blockchain-based intervention. For example, traditional financial inclusion interventions may seek to extend the financial services of a bank to rural areas through digital money, extension agents, etc. A blockchain-based solution, however, may cut out the bank entirely and empower local communities to receive financial services from completely new providers anywhere in the world, on much more affordable terms and in a much more convenient manner. Such a solution could see an alignment of interests amongst producers and consumers of these services, since the new relationships are mutually serving. Because of this alignment, there is less of a need, or even less of a benefit, in having donors script out the causal pathway for the outcomes to be achieved. Those closest to the problem(s) and solutions can work it out because it is in their interest to do so.

Hence, while a MEL framework for such a project could still use more standardized measures around outcomes like increased access to financial services, and could even use statistical methods to evaluate questions around attributable changes in poverty status, there will need to be adaptive and formative MEL that assesses the dynamics of these attributes, given how critical they are to whether and how outcomes are achieved. The dynamics between these attributes and the surrounding social ecosystem have the potential to be very fluid (going back to the disruptive nature of blockchain technology), so flexible MEL will be required to respond to new trends as they emerge.

Table: Blockchain Intervention Attributes and the Skill Sets to Assess Them

Each blockchain attribute is paired with possible MEL approaches for assessing it:

  • Empowerment of those closest to problems to inform the relevant solutions: problem-driven design and MEL approaches; stakeholder mapping (to identify relevant actors); decentralization-focused MEL (MEL that focuses on outcomes associated with decentralization)
  • Alignment of interests: political economy analysis to identify incentives and interests; adaptive MEL to assess the shifting alignment of interests between various actors
  • Alleviation of traditional intermediary services: political economy analysis to inform a risk mitigation strategy for potential spoilers, and relevant MEL

While there will need to be standard accountability and other uses, feedback from an appropriate MEL strategy could have two primary end uses in a blockchain based intervention: governance and trust.

The Role of Governance and Trust

Blockchain governance sets out the rules for how consensus (i.e., agreement) is achieved when deciding which transactions are valid on a blockchain. While this may sound mundane, it is critical for achieving outcomes, since how the blockchain is governed determines how well those closest to the problems are empowered to identify and achieve solutions and to align interests. Hence the governance framework for the blockchain will need to be informed by an appropriate MEL strategy. A giant learning gap we currently have is how to iteratively adapt blockchain governance structures, using MEL feedback, into increasingly more efficient versions. Closing this gap will be critical to assessing the cost effectiveness of blockchain-based solutions against other solutions (i.e., alternatives/cost-benefit analysis tools) as well as to maximizing impact.

A giant learning gap we currently have is how to iteratively adapt blockchain governance structures, using MEL feedback, into increasingly more efficient versions. 

Another focus of an appropriate MEL strategy would be to facilitate trust in the blockchain-based solution amongst users, much as for other technology-led solutions like mobile money or pay-as-you-go metering for service delivery. This includes not only the digital interface between the user and the technology (a phone app, SMS, or other interface) but also other dimensions of “trust” that would facilitate uptake of the technology. These dimensions of trust would be informed by an analysis of the barriers to uptake amongst intended users, given that it could be an entirely new service for beneficiaries or an old service delivered in a new fashion. There is already a good evidence base around what works in this area (e.g., marketing and communication tools for digital financial services, assistance in completing registration paperwork for pay-as-you-go metering, etc.).

The Road Ahead

There is A LOT we need to learn and a short time to do it in before we feel the negative effects of a lack of preparedness. This risk is heightened when you consider that the international development industry has a poor track record of designing and evaluating technology-led solutions, primarily because these projects usually neglect uptake of the technology and operate on the assumption that the technology itself will drive outcomes, rather than users employing the technology as a tool to drive the outcomes.

The lessons from MEL in results based financing could be especially informative to the future of evaluating blockchain-based solutions given their similarities in letting solutions work themselves out and the role of the “validator” in ensuring outcomes are achieved.  In fact the blockchain has already been used in this role in some simple output based programming. 

As alluded to, pre-existing MEL skill sets can add a lot of value to building an evidence base, but MEL practitioners will need to develop a greater understanding of the attributes of blockchain technology; otherwise our MEL strategies will not be suited to blockchain-based programming.

Mobile survey completion rates: Correlation versus causation

by Kim Rodgers, Software Engineer at Echo Mobile. Original post appeared on Medium.

Introduction

We hear the terms “correlation” and “causation” a lot, but what do they actually mean?

Correlation: describes how two variables relate to each other as they change. When one variable increases, the other may increase, decrease, or remain the same. For example, when it rains more, people tend to buy more umbrellas.

Causation: implies that one variable causes another variable to change. For example, we can confidently conclude that more rain causes more people to acquire umbrellas.

In this post, I will explore the meaning of the terms and try to explain a way of deciding how they relate. I will use a real-world example to explore and explain.

Survey completion rate correlations

Echo Mobile helps organizations in Africa engage, influence, and understand their target audience via mobile channels. Our core product is a web-based SaaS platform that, among many other things, enables users to design, send and analyze the results of mobile surveys. Our users can deploy their surveys via SMS (Short Messaging Service), USSD (Unstructured Supplementary Service Data), IVR (Interactive Voice Response), and Android apps, but SMS is the most heavily used channel.

Surveys are key to our overall mission, as they give our users a tool to better understand their target audiences — usually their customers or beneficiaries. To optimize the effectiveness of this tool, one thing that we really wanted to do was identify key factors that lead to more people completing surveys sent by our users from the Echo platform. This would enable us to advise our users on how to get more value from our platform through better engagement and understanding of their audiences.

The completion rate of a survey is the percentage of people who complete a survey after being invited to take part in it. We came up with different factors that we thought could affect the completion rate of surveys:

  • post_incentive: The incentive (a small amount of money or airtime) offered after completing the survey
  • invite_day_of_month: The date of the month a respondent was invited to the survey
  • invite_day_of_the_week: The day of the week a respondent was asked to take part in the survey
  • invite_hour: The hour of the day the respondent was invited to the survey
  • num_questions: The number of questions in the survey
  • reminded: whether the respondent was reminded to complete the survey or not
  • channel: How the survey was deployed, whether by SMS, USSD, IVR, web, or Android app. SMS is the most popular channel and accounts for over 90% of surveys
  • completion_rate: Of those invited to a survey, the percentage that completed

We used the performance of surveys deployed from the beginning of 2017 to August 2017 to look for correlations between the factors above. The correlations between the factors are shown in the table below. Since the interest was mainly in how the completion rate relates to the other factors, I will concentrate on those relationships.

The bigger the correlation magnitude, the stronger the relationship. A positive correlation indicates that when one factor increases, the other should also increase. For a negative correlation, the relationship is inverse: when one increases, the other decreases.

Correlations between different survey factors. completion_rate has the strongest correlation with invite_hour

The rows of the table are arranged in descending order of the correlation between the completion rate and the other factors. Looking at the table, invite_hour, with a positive correlation of 0.25, is the factor most strongly correlated with the completion rate. It is followed by reminded, while invite_day_of_month is the most negatively correlated with completion_rate. The correlation between any other pair of factors can also be read from the table; for example, the correlation between num_questions and reminded is 0.05.
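
As a rough illustration of how a table like this could be produced, the sketch below computes pairwise correlations with pandas and orders the factors by their correlation with completion_rate. The file name and its contents are hypothetical, and the categorical channel factor is left out for simplicity.

```python
# Minimal sketch of computing a correlation table like the one described above.
# The CSV file is a hypothetical stand-in for the survey-level factors listed earlier.
import pandas as pd

surveys = pd.read_csv("survey_performance_2017.csv")  # one row per deployed survey

factors = ["post_incentive", "invite_day_of_month", "invite_day_of_the_week",
           "invite_hour", "num_questions", "reminded", "completion_rate"]

corr = surveys[factors].corr()  # pairwise Pearson correlations

# Order factors by the strength of their correlation with completion_rate,
# mirroring how the rows of the table in the post are arranged.
print(corr["completion_rate"].drop("completion_rate").sort_values(ascending=False))
```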

Survey completion causations?

The findings above can lead to incorrect conclusions if one is not careful. For example, one might conclude that invite hour, with a correlation of 0.25, has the greatest causal influence on the completion_rate of a survey. As a result, you might start trying to find the right time to send out surveys in the hope of getting more of them completed. With this mentality, it might be concluded that some particular invite hour is the optimum time to send out a survey. But that would be to hold to the (incorrect) idea that correlation implies causation.

A high correlation may mean that one factor causes the other, that the factors mutually cause each other, that both factors are caused by the same third factor, or even that the correlation is simply a coincidence.

We can, therefore, see that correlation does not always imply causation. With careful investigation, however, it is possible to more confidently conclude whether correlation implies that one variable causes the other.

How can we verify if correlation might imply causation?

1. Use statistically sound techniques to determine the relationship.

Ensure that you use statistically legitimate methods to find the correlation. These include:

  • using variables that correctly quantify the relationship;
  • making sure there are no outliers;
  • ensuring the sample is an appropriate representation of the population;
  • using a correlation coefficient appropriate to the scales of the relationship metrics (one option is sketched below).
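
As an illustration of that last point, the following sketch contrasts Pearson's coefficient, which assumes an approximately linear relationship between interval-scale variables, with Spearman's rank coefficient, which only assumes a monotonic relationship. The data are made up for demonstration.

```python
# Sketch of choosing a correlation coefficient to match the data: Pearson assumes
# a roughly linear relationship between interval-scale variables, while Spearman
# only assumes a monotonic relationship and is less sensitive to outliers.
from scipy import stats

# Hypothetical paired observations.
invite_hour = [7, 8, 9, 10, 11, 13, 15, 17, 19, 21]
completion_rate = [0.12, 0.15, 0.18, 0.22, 0.25, 0.24, 0.21, 0.19, 0.16, 0.10]

pearson_r, pearson_p = stats.pearsonr(invite_hour, completion_rate)
spearman_r, spearman_p = stats.spearmanr(invite_hour, completion_rate)

print(f"Pearson r = {pearson_r:.2f} (p = {pearson_p:.2f})")
print(f"Spearman rho = {spearman_r:.2f} (p = {spearman_p:.2f})")
```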

2. Explain the relationships found

  • Check that exposure always precedes the outcome: if A is supposed to cause B, A must always occur before B.
  • Check whether the relationship ties in with other existing theories.
  • Check whether the proposed relationship is similar to relationships found in related fields.
  • Check whether some other relationship could explain the one observed. For example, a correlation between sleeping with your shoes on and waking up with a headache is better explained by a third factor, such as drinking the night before.

3. Validate the relationships

  • Conditions 1 and 2 above should be tested to determine whether they hold. The most common methods of testing are experiments and checking the consistency of the relationship. An experiment usually requires a model of the relationship, a testable hypothesis based on the model, variance control measures, collection of suitable metrics for the relationship, and an appropriate analysis. Experiments repeated several times should lead to consistent conclusions.

We have not yet carried out these tests on our completion rate correlations. So we don’t yet know, for example, whether particular invite hours cause higher completion rates — only whether they are correlated.

Conclusion

We need to be careful before concluding that a particular relationship implies causation. It is generally better not to have a conclusion than to land on an incorrect one which might lead to wrong actions being taken!


The original version of this post was written by Kim Rodgers. Kim works at Echo Mobile as a Software Engineer, is interested in data science, and enjoys writing.

Integrating MERL with program design is good program management

by Yaquta Fatehi, Program Manager of Performance Measurement at the William Davidson Institute at the University of Michigan; and Heather Esper, Senior Program Manager of Performance Measurement at the William Davidson Institute at the University of Michigan.

At MERL Tech DC 2018, we — Yaquta Fatehi and Heather Esper — led a session titled “Integrating MERL with program design: Presenting an approach to balance your MERL strategy with four principles.” The session focused on our experience of implementing this approach.

The challenge: There are a number of pressing tensions and challenges in development programs related to MERL implementation. These include project teams and MERL teams working in silos and, just as importantly, leadership’s lack of understanding and commitment to MERL (as leadership often views MERL only in terms of accountability). And while there are solutions developed to address some of these challenges, our consortium, the Balanced Design, Monitoring, Evaluation, Research, and Learning (BalanceD-MERL) consortium (under U.S. Agency for International Development’s (USAID’s) MERLIN program) saw that there was still a strong need for integration of MERL in program design for good program management and adaptive management. We chose four principles – relevant, right-sized, responsible, and trustworthy – to guide this approach to enable sustainable integration of MERL with program design and adaptive management. Definitions of the principles can be found here.

How to integrate program design and MERL (a case example): Our consortium aimed to identify the benefits of such integration and the application of these principles in the Women + Water Global Development Alliance program. The Alliance is a five-year public/private partnership among USAID, Gap, Inc., and four other non-profit sector partners. The Alliance draws upon these organizations’ complementary strengths to improve and sustain the health and well-being of women and communities touched by the apparel industry in India. Gap, Inc. had not partnered with USAID before and had limited experience with MERL on a complex program such as this one, which consisted of multiple individual activities or projects implemented by multiple partners. The BalanceD-MERL consortium’s services were requested during the program design stage to develop a rigorous, program-wide, high-level MERL strategy. We proposed co-developing the MERL activities with the Women + Water partners as listed in the MERL Strategy Template (see Table 1 in the case study shared below), which was developed by our consortium partner, the Institute for Development Impact.

Our first step was to co-design the program’s theory of change with the Women + Water partners to establish a shared understanding of the problem and how the program would address it. We used the theory of change as a communication asset that helped bring about a shared understanding of the solution among partners. Through this process we also identified gaps in the program design that could then be addressed, in turn making the program design stronger. Grounded in the theory of change, and in order to be relevant and trustworthy, we co-developed a risk matrix, which was one of the most useful exercises for Gap, Inc. because it helped them place judgment on their assumptions and identify risks that needed to be monitored frequently. Following this, we co-identified the key performance indicators and associated metadata using the Performance Indicator Reference Sheets format. This exercise, done iteratively with all partners, helped them understand the tradeoffs between the trustworthy and right-sized principles; helped ensure the feasibility of data collection and that indicators were right-sized and relevant; verified that methods were responsible and did not place unnecessary burden on key stakeholders; and confirmed that data were trustworthy enough to provide insights on the activity’s progress and changing context.

In order to integrate MERL with the program design, we closely co-created these key components with the partners. We also co-developed questions for a learning agenda and recommended adaptive management tasks such as quarterly pause and reflect sessions so that leadership and program managers could make necessary adaptations to the program based on performance data. The consortium was also tasked with developing the performance management information system.

Findings: Through this experience, we found that the theory of change can serve as a key tool for integrating MERL with program design, and that it can form the foundation on which to build the remaining MERL activities. We also confirmed that MERL can be compromised by an immature program design informed by an incomplete needs assessment. For all key takeaways from this experience of applying the approach and principles, as well as action items for program and MERL practitioners and key questions for leadership, please see the following case study.

All in all, it was an engaging session and we heard good questions and comments from our audience. To learn more, or if you have any questions about the approach, feel free to email us at wdi-performancemeasurement@umich.edu.

This publication was produced by the William Davidson Institute at the University of Michigan (WDI) in collaboration with World Vision (WV) under the BalanceD-MERL Program, Cooperative Agreement Number AID-OAA-A-15-00061, funded by the U.S. Agency for International Development (USAID). This publication is made possible by the generous support of the American people through USAID. The contents are the responsibility of the William Davidson Institute and World Vision and do not necessarily reflect the views of USAID or the United States Government.

Using Social Network Analysis and Feedback to Measure Systems Change

by Alexis Smart, Senior Technical Officer, and Alexis Banks, Technical Officer, at Root Change

As part of their session at MERL Tech DC 2018, Root Change launched Pando, an online platform that makes it possible to visualize, learn from, and engage with the systems where you work. Pando harnesses the power of network maps and feedback surveys to help organizations strengthen systems and improve their impact.

Decades of experience in the field of international development has taught our team that trust and relationships are at the heart of social change. Our research shows that achieving and sustaining development outcomes depends on the contributions of multiple actors embedded in thick webs of social relationships and interactions. However, traditional MERL approaches have failed to help us understand the complex dynamics within those relationships. Pando was created to enable organizations to measure trust, relationships, and accountability between development actors.

Relationship Management & Network Maps

Grounded in social network analysis, Pando uses web-based relationship surveys to identify diverse organizations within a system and track relationships in real time. The platform automatically generates a network map that visualizes the organizations and relationships within a system. Data filters and analysis tools help uncover key actors, areas of collaboration, and network structures and dynamics.
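
To give a flavor of the underlying social network analysis (a generic sketch, not Pando's actual implementation), the snippet below builds a directed graph from hypothetical relationship survey responses and ranks organizations by degree centrality, one simple way of surfacing key actors in a system.

```python
# Generic sketch of the kind of social network analysis described above.
# The organizations and reported relationships are made up for illustration.
import networkx as nx

# Each tuple is one reported relationship: (respondent org, named partner org).
relationships = [
    ("Org A", "Org B"), ("Org A", "Org C"), ("Org B", "Org C"),
    ("Org D", "Org B"), ("Org E", "Org B"), ("Org E", "Org F"),
]

graph = nx.DiGraph()
graph.add_edges_from(relationships)

# Degree centrality highlights organizations that many others report working with,
# one simple way to surface "key actors" within the system.
centrality = nx.degree_centrality(graph)
for org, score in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{org}: {score:.2f}")
```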

Feedback Surveys & Analysis

Pando is integrated with Keystone Accountability’s Feedback Commons, an online tool that gives map administrators the ability to collect and analyze feedback about levels of trust and relationship quality among map participants. The combined power of network maps and feedback surveys helps create a holistic understanding of the system of organizations that impact a social issue, facilitate dialogue, and track change over time as actors work together to strengthen the system.

Examples of Systems Analysis

During Root Change’s session, “Measuring Complexity: A Real-Time Systems Analysis Tool,” Root Change Co-Founder Evan Bloom and Senior Technical Officer Alexis Smart highlighted four examples of using network analysis to create social change from our work:

  • Evaluating Local Humanitarian Response Systems: We worked with the Harvard Humanitarian Institute (HHI) to evaluate the effect of local capacity development efforts on local ownership within humanitarian response networks in the Philippines, Kenya, Myanmar, and Ethiopia. Using social network analysis, Root Change and HHI assessed the roles of local and international organizations within each network to determine the degree to which each system was locally led.
  • Supporting Collective Impact in Nigeria: Network mapping has also been used in the USAID-funded Strengthening Advocacy and Civic Engagement (SACE) project in Nigeria. Over five years, more than 1,300 organizations and 2,000 relationships across 17 advocacy issue areas were identified and tracked. Nigerian organizations used the map to form meaningful partnerships, set common agendas, coordinate strategies, and hold the government accountable.
  • Informing Project Design in Kenya: Root Change and the Aga Khan Foundation (AKF) collected relationship data from hundreds of youth and organizations supporting youth opportunities in coastal Kenya. Analysis revealed gaps in expertise within the system and opportunities to improve relationships among organizations and youth. These insights helped inform AKF’s program design, and ongoing mapping will be used to monitor system change.
  • Tracking Local Ownership: This year, under USAID Local Works, Root Change is working with USAID missions to measure local ownership of development initiatives using newly designed localization metrics on Pando. USAID Bosnia and Herzegovina (BiH) launched a national Local Works map, identifying over 1,000 organizations working together on community development. Root Change and USAID BiH are exploring a pilot to use this map to continue to collect data, track localization metrics, and train a local organization to support this process.

Join the MERL Tech DC Network Map

As part of the MERL Tech DC 2018 conference, Root Change launched a map of the MERL Tech community. Event participants were invited to join this collaborative mapping effort to identify and visualize the relationships between organizations working to design, fund, and implement technology that supports monitoring, evaluation, research, and learning (MERL) efforts in development.

It’s not too late to join! Email info@mypando.org for an invitation to join the MERL Tech DC map and a chance to explore Pando.

Learn more about Pando

Pando is the culmination of more than a decade of experience providing training and coaching on the use of social network analysis and feedback surveys to design, monitor, and evaluate systems change initiatives. Initial feedback from international and local NGOs, governments, community-based organizations, and more is promising. But don’t take our word for it. We want to hear from you about ways that Pando could be useful in your social impact work. Contact us to discuss ways Pando could be applied in your programs.

Blockchain for International Development: Using a Learning Agenda to Address Knowledge Gaps

Guest post by John Burg, Christine Murphy, and Jean Paul Pétraud, international development professionals who presented a one-hour session at the  MERL Tech DC 2018 conference on Sept. 7, 2018. Their presentation focused on the topic of creating a learning agenda to help MERL practitioners gauge the value of blockchain technology for development programming. Opinions and work expressed here are their own.

We attended the MERL Tech DC 2018 conference held on Sept. 7, 2018 and led a session related to the creation of a learning agenda to help MERL practitioners gauge the value of blockchain technology for development programming.

As a trio of monitoring, evaluation, research, and learning (MERL) practitioners in international development, we are keenly aware of the quickly growing interest in blockchain technology. Blockchain is a type of distributed database that creates a nearly unalterable record of cryptographically secure peer-to-peer transactions without a central, trusted administrator. While it was originally designed for digital financial transactions, it is also being applied to a wide variety of interventions, including land registries, humanitarian aid disbursement in refugee camps, and evidence-driven education subsidies. International development actors, including government agencies, multilateral organizations, and think tanks, are looking at blockchain to improve effectiveness or efficiency in their work.

Naturally, as MERL practitioners, we wanted to learn more. Could this radically transparent, shared database managed by its users, have important benefits for data collection, management, and use? As MERL practice evolves to better suit adaptive management, what role might blockchain play? For example, one inherent feature of blockchain is the unbreakable and traceable linkages between blocks of data. How might such a feature improve the efficiency or effectiveness of data collection, management, and use? What are the advantages of blockchain over other more commonly used technologies? To guide our learning we started with an inquiry designed to help us determine if, and to what degree, the various features of blockchain add value to the practice of MERL. With our agenda established, we set out eagerly to find a blockchain case study to examine, with the goal of presenting our findings at the September 2018 MERL Tech DC conference.

What we did

We documented 43 blockchain use-cases through internet searches, most of which were described with glowing claims like “operational costs… reduced up to 90%,” or with the assurance of “accurate and secure data capture and storage.” We found a proliferation of press releases, white papers, and persuasively written articles. However, we found no documentation or evidence of the results blockchain was purported to have achieved in these claims. We also did not find lessons learned or practical insights, as are available for other technologies in development.

We fared no better when we reached out directly to several blockchain firms, via email, phone, and in person. Not one was willing to share data on program results, MERL processes, or adaptive management for potential scale-up. Despite all the hype about how blockchain will bring unheralded transparency to processes and operations in low-trust environments, the industry is itself opaque. From this, we determined the lack of evidence supporting value claims of blockchain in the international development space is a critical gap for potential adopters.

What we learned

Blockchain firms supporting development pilots are not practicing what they preach — improving transparency — by sharing data and lessons learned about what is working, what isn’t working, and why. There are many generic decision trees and sales pitches available to convince development practitioners of the value blockchain will add to their work. But, there is a lack of detailed data about what happens when development interventions use blockchain technology.

Since the function of MERL is to bridge knowledge gaps and help decision-makers take action informed by evidence, we decided to explore the crucial questions MERL practitioners may ask before determining whether blockchain will add value to data collection, management, and use. More specifically, rather than a go/no-go decision tool, we propose using a learning agenda to probe the role of blockchain in data collection, data management, and data use at each stage of project implementation.

“Before you embark on that shiny blockchain project, you need to have a very clear idea of why you are using a blockchain.”

Avoiding the Pointless Blockchain Project, Gideon Greenspan (2015)

Typically, “A learning agenda is a set of questions, assembled by an organization or team, that identifies what needs to be learned before a project can be planned and implemented.” The process of developing and finding answers to learning questions is most useful when it’s employed continuously throughout the duration of project implementation, so that changes can be made based on what is learned about changes in the project’s context, and to support the process of applying evidence to decision-making in adaptive management.

We explored various learning agenda questions for data collection, management and use that should continue to be developed and answered throughout the project cycle. However, because the content of a learning agenda is highly context-dependent, we focused on general themes. Examples of questions that might be asked by beneficiaries, implementing partners, donors, and host-country governments, include:

  • What could each of a project’s stakeholder groups gain from the use of blockchain across the stages of design and implementation, and, would the benefits of blockchain incentivize them to participate?
  • Can blockchain resolve trust or transparency issues between disparate stakeholder groups, e.g. to ensure that data reported represent reality, or that they are of sufficient quality for decision-making?
  • Are there less expensive, more appropriate, or easier-to-execute existing technologies that already meet each group’s MERL needs?
  • Are there unaddressed MERL management needs blockchain could help address, or capabilities blockchain offers that might inspire new and innovative thinking about what is done, and how it gets done?

This approach resonated with other MERL for development practitioners

We presented this approach to a diverse group of professionals at MERL Tech DC, including other MERL practitioners and IT support professionals, representing organizations from multilateral development banks to US-based NGOs. Facilitated as a participatory roundtable, the session participants discussed how MERL professionals could use learning agendas to help their organizations both decide whether blockchain is appropriate for intervention design, as well as guide learning during implementation to strengthen adaptive management.

Questions and issues raised by the session participants ranged widely, from how blockchain works, to expressing doubt that organizational leaders would have the risk appetite required to pilot blockchain when time and costs (financial and human resource) were unknown. Session participants demonstrated an intense interest in this topic and our approach. Our session ran over time and side conversations continued into the corridors long after the session had ended.

Next Steps

Our approach, as it turns out, echoes others in the field who question whether the benefits of blockchain add value above and beyond existing technologies, or accrue to stakeholders beyond the donors that fund them. This trio of practitioners will continue to explore ways MERL professionals can help their teams learn about the benefits of blockchain technology for international development. But, in the end, it may turn out that the real value of blockchain wasn’t the application of the technology itself, but rather as an impetus to question what we do, why we do it, and how we could do it better.

Creative Commons License
Blockchain for International Development: Using a Learning Agenda to Address Knowledge Gaps by John Burg, Christine Murphy, and Jean-Paul Petraud is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Using Real-Time Data to Improve International Development Programming

by Erica Gendell, Program Analyst at USAID; and Rebecca Saxton-Fox, ICT Policy Advisor at USAID

Real-time data applications in international development

There are a wide range of applications of real-time data in international development programs, including:

  • Gathering demographic and assessment data following trainings, in order to improve outputs and outreach for future trainings;
  • Tracking migration flows following natural disasters to understand population locations and best locate relief efforts;
  • Analyzing real-time disease outbreak data to understand where medical resources will be most effectively deployed; and
  • Analyzing radio and social media to understand and adapt communication outreach.

Using digital tools (such as mobile phone based text messaging, web-based applications, social media platforms, etc.) or large digital datasets (such as satellite or cell phone tower data) for collecting real-time data helps programs and projects respond quickly to community needs or potentially changing circumstances on the ground. However, these digital tools and datasets are often not well understood or mapped into decision-making processes.

Real Example of Real-time Data

In USAID/Ghana’s ADVANCE II program, project staff implemented a smart card ID technology that collects and stores data in an effort to have more accurate monitoring and evaluation data on project beneficiaries. The ID cards allowed USAID and project officers to see real-time results and build more effective and targeted programming. ADVANCE II has been successful in providing unique beneficiary data for over 120,000 people who participated in 5,111 training sessions. This information enabled the project to increase the number of trainings tailored to female farmers, a previously underrepresented population in trainings. This is a great example of how to incorporate data use and digital tools into a project or activity.

Data to Action Framework

At MERL Tech DC, we presented the ADVANCE II project as a way to use the “Data to Action” Framework. This is one approach to map how information flows and how decisions are made across a set of stakeholders in a program. It can be used as a conversation tool to identify barriers to action. You can also use it to identify where digital tools could help move information to decision makers faster.

This framework is just one tool to start thinking about uses of real-time data to enable adaptive management in development programs.

USAID explores these and other topics in a newly released portfolio of research on Real-time Data for Adaptive Management (RTD4AM), which gives insight into the barriers to real-time data use in development. We look forward to continuing to build the community of practice of adaptive management within the MERL community.


We Wrote the Book on Evaluation Failures. Literally.

by Isaac D. Castillo, Director of Outcomes, Assessment, and Learning at Venture Philanthropy Partners.

Evaluators don’t make mistakes.

Or do they?

Well, actually, they do. In fact, I’ve got a number of fantastic failures under my belt that turned into important learning opportunities. So, when I was asked to share my experience at the MERL Tech DC 2018 session on failure, I jumped at the chance.

Part of the Problem

As someone of Mexican descent, I am keenly aware of the problems that can arise when culturally and linguistically inappropriate evaluation practices are used. However, as a young evaluator, I was often part of the problem.

Early in my evaluation career, I was tasked with collecting data to determine why teenage youth became involved in gangs. In addition to developing the interview guides, I was also responsible for leading all of the on-site interviews in cities with large Latinx populations. Since I am Latinx, I had a sufficient grasp of Spanish to prepare the interview guides and conduct the interviews. I felt confident that I would be sensitive to all of the cultural and linguistic challenges to ensure an effective data collection process. Unfortunately, I had forgotten an important tenet of effective culturally competent evaluation: cultures and languages are not monolithic. Differences in regional cultures or dialects can lead even experienced evaluators into embarrassment, scorn, or the worst outcome of all: inaccurate data.

Sentate, Por Favor

For example, when first interacting with the gang members, I introduced myself and asked them to sit down to start the interview, saying, "Siéntate, por favor." What I did not know at the time is that a large portion of the gang members I was interviewing were born in El Salvador or were of Salvadoran descent, and the accurate way to say it in Salvadoran Spanish would have been, "Sentate, por favor."

Does one word make that much difference? In most cases it did not matter, but it caused several gang members to openly question my Spanish from the outset, which created an uncomfortable beginning to interviews about potentially sensitive subjects.

Amigo or Chero?

I next asked the gang members to think of their "friends." In most dialects of Spanish, using amigos to ask about friends is accurate and proper. In street slang, however, some gang members prefer the term chero, especially in informal contexts.

Again, was this a huge mistake? No. But it did lead to enough quizzical looks and requests for clarification that I started to doubt whether I was getting completely honest or accurate answers from some of the respondents. Unfortunately, I did not catch this error until I had conducted nearly 30 interviews. I had not thought to test the wording of the questions in multiple Spanish-speaking communities across several states.

Would You Like a Concha?

Perhaps my most memorable mistake during this evaluation occurred after I had completed an interview with a gang leader outside of a bakery. After we were done, the gang leader called over the rest of his gang to meet me. As I was meeting everyone, I glanced inside the bakery and noticed a type of Mexican pastry that I enjoyed as a child. I asked the gang leader if he would like to go inside and join me for a concha, a round pastry that looks like a shell. Everyone (except me) began to laugh hysterically. The gang leader then let me in on the joke. He understood that I was asking about the pan dulce (sweet bread), but he informed me that in his dialect, concha was used as a vulgar reference to female genitalia. This taught me a valuable lesson about how even casual references or language choices can be interpreted in many different ways.

What did I learn from this?

While I can look back on these mistakes and laugh, I am also reminded of the important lessons learned that I carry with me to this day.

  • Translate with the local context in mind. When translating materials or preparing for field work, get a detailed sense of who you will be collecting data from, including what cultures and subgroups people represent and whether or not there are specific topics or words that should be avoided.
  • Translate with the local population in mind. When developing data collection tools (in any language, even if you are fluent in it), take the time to pre-test the language in the tools.
  • Be okay with your inevitable mistakes. Recognize that no matter how much preparation you do, you will make mistakes in your data collection related to culture and language issues. Remember it is how you respond in those situations that is most important.

As far as failures like this go, it turns out I’m in good company. My story is one of 22 candid, real-life examples from seasoned evaluators that are included in Kylie Hutchinson’s new book, Evaluation Failures: 22 Tales of Mistakes Made and Lessons Learned. Entertaining and informative, I guarantee it will give you plenty of opportunities to reflect and learn.

3 Lessons Learned using Machine Learning to Measure Media Quality

by Samhir Vasdev, Technical Adviser for Digital Development at IREX’s Center for Applied Learning and Impact. The post 3 Lessons Learned using Machine Learning to Measure Media Quality appeared first on ICTworks.

Moving from hype to practice is an important but challenging step for ICT4D practitioners. As the technical adviser for digital development at IREX, a global development and education organization, I’ve been watching with cautious optimism as international development stakeholders begin to explore how artificial intelligence tools like machine learning can help them address problems and introduce efficiencies to amplify their impact.

So while USAID was developing their guide to making machine learning work for international development and TechChange rolled out their new course on Artificial Intelligence for International Development, we spent a few months this summer exploring whether we could put machine learning to work to measure media quality.

Of course, we didn’t turn to machine learning just for the sake of contributing to the “breathless commentary of ML proponents” (as USAID aptly puts it).

As we shared in a session with our artificial intelligence partner Lore at MERL Tech DC 2018, some of our programs face a very real set of problems that could be alleviated through smarter use of digital tools.

Our Machine Learning Experiment

In our USAID-funded Media Strengthening Program in Mozambique, for example, a small team of human evaluators manually scores thousands of news articles based on 18 measures of media quality.

This process is time consuming (some evaluators spend up to four hours a day reading and evaluating articles), inefficient (when staff turns over, we need to reinvest resources to train up new hires), and inconsistent (even well-trained evaluators might score articles differently).

To test whether we can make the process of measuring media quality less resource-intensive, we spent a few months training software to automatically detect one of these 18 measures of media quality: whether journalists keep their own opinions out of their news articles. The results of this experiment are very compelling:

  • The software had 95% accuracy in recognizing sentences containing opinions within the dataset of 1,200 articles.
  • The software’s ability to “learn” was evident. Anecdotally, the evaluation team noticed a marked improvement in the accuracy of the software’s suggestions after showing it only twenty sentences that had opinions. The accuracy, precision, and recall results highlighted above were achieved after only sixteen rounds of training the software.
  • Accuracy and precision increased the more that the model was trained. There is a clear relationship between the number of times the evaluators trained the software and the accuracy and precision of the results. The recall results did not improve over time as consistently.

These results, although promising, simplify some numbers and calculations. Check out our full report for details.
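The post does not describe how the underlying software works, so the sketch below is only one plausible baseline for the same task: a TF-IDF representation plus a logistic regression classifier that flags likely opinion sentences for human review. The example sentences and labels are invented; in practice the training data would be the sentences evaluators labelled during the rounds of training described above.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Invented labelled sentences: 1 = contains the journalist's opinion, 0 = factual reporting.
    sentences = [
        "The minister announced the new budget on Tuesday.",
        "This reckless budget will clearly ruin the province.",
        "Officials reported 40 new cholera cases in the district.",
        "It is obvious the council has failed its citizens.",
    ]
    labels = [0, 1, 0, 1]

    # TF-IDF features feeding a simple linear classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(sentences, labels)

    # Scan new sentences and surface likely opinions for an evaluator to confirm.
    new_sentences = ["The mayor unveiled the roads plan.", "Frankly, this plan is a disaster."]
    for text, prob in zip(new_sentences, model.predict_proba(new_sentences)[:, 1]):
        print(f"{prob:.2f} opinion probability: {text}")

Retraining a pipeline like this after each small batch of newly labelled sentences mirrors the round-by-round improvement the evaluation team observed.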

What does this all mean? Let’s start with the good news. The results suggest that some parts of media quality—specifically, whether an article is impartial or whether it echoes its author’s opinions—can be automatically measured by machine learning.

The software also introduces the possibility of unprecedented scale, scanning thousands of articles in seconds for this specific indicator. These implications introduce ways for media support programs to spend their limited resources more efficiently.

3 Lessons Learned from using Machine Learning

Of course, the machine learning experience was not without problems. With any cutting-edge technology, there will be lessons we can learn and share to improve everyone’s experience. Here are our three lessons learned working with machine learning:

1. Forget about being tech-literate; we need to be more problem-literate.

Defining a coherent, specific, actionable problem statement was one of the most important steps of this experiment. This wasn't easy. Hard trade-offs had to be made (Which of the 18 indicators should we focus on?), and we had to focus on things we could measure in order to demonstrate efficiency gains using this new approach (How much time do evaluators currently spend scoring articles?).

When planning your own machine learning project, devote plenty of time at the outset—together with your technology partner—to define the specific problem you’ll try to address. These conversations result in a deeper shared understanding of both the sector and the technology that will make the experiment more successful.

2. Take the time to communicate results effectively.

Since completing the experiment, people have asked me to explain how "accurate" the software is. But in practice, machine learning software can define "accuracy" in different ways, and those definitions can vary according to the specific model (the software we used deploys several models).

What starts off as a simple question (How accurate is our software?) can easily turn into a discussion of related concepts like precision, recall, false positives, and false negatives. We found that producing clean visuals became the most effective way to explain our results.
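A small worked example, with made-up counts rather than the project's actual results, shows why a single "accuracy" number can hide what matters: when opinion sentences are rare, a model can be highly accurate while still missing many of them.

    # Made-up confusion-matrix counts, not the project's actual results.
    true_positives = 40    # opinion sentences correctly flagged
    false_positives = 10   # factual sentences wrongly flagged
    false_negatives = 20   # opinion sentences the software missed
    true_negatives = 930   # factual sentences correctly left alone

    total = true_positives + false_positives + false_negatives + true_negatives
    accuracy = (true_positives + true_negatives) / total             # 0.97
    precision = true_positives / (true_positives + false_positives)  # 0.80
    recall = true_positives / (true_positives + false_negatives)     # about 0.67

    print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")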

3. Start small and manage expectations.

Stakeholders with even a passing awareness of machine learning will be aware of its hype. Even now, some colleagues ask me how we “automated the entire media quality assessment process”—even though we only used machine learning to identify one of 18 indicators of media quality. To help mitigate inflated expectations, we invested a small amount into this “minimum viable product” (MVP) to prove the fundamental concept before expanding on it later.

Approaching your first machine learning project this way might help to keep expectations in line with reality, minimize risks associated with experimentation, and provide air cover for you to adjust your scope as you discover limitations or adjacent opportunities during the process.