by Erica Gendell, Program Analyst at USAID; and Rebecca Saxton-Fox, ICT Policy Advisor at USAID
Real-time data applications in international development
There is a wide range of applications of real-time data in international development programs, including:
Gathering demographic and assessment data following trainings, in order to improve outputs and outreach for future trainings;
Tracking migration flows following natural disasters to understand population locations and best locate relief efforts;
Analyzing real-time disease outbreak data to understand where medical resources will be most effectively deployed; and
Analyzing radio and social media to understand and adapt communication outreach.
Using digital tools (such as mobile-phone-based text messaging, web-based applications, social media platforms, etc.) or large digital datasets (such as satellite or cell phone tower data) for collecting real-time data helps programs and projects respond quickly to community needs or potentially changing circumstances on the ground. However, these digital tools and datasets are often not well understood or mapped into decision-making processes.
Real Example of Real-time Data
In USAID/Ghana’s ADVANCE II program, project staff implemented a smart card ID technology that collects and stores data in an effort to have more accurate monitoring and evaluation data on project beneficiaries. The ID cards allowed USAID and project officers to see real-time results and build more effective and targeted programming. ADVANCE II has been successful in providing unique beneficiary data for over 120,000 people who participated in 5,111 training sessions. This information enabled the project to increase the number of trainings tailored to female farmers, a previously underrepresented population in trainings. This is a great example of how to incorporate data use and digital tools into a project or activity.
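Part of what makes a smart card ID useful is that each beneficiary is counted once, even when the same person attends several trainings, and results can be disaggregated (for example, by sex). A minimal sketch of that deduplication, assuming a hypothetical attendance log keyed by card ID; the schema and values are invented for illustration and do not describe ADVANCE II's actual system:

```python
from collections import Counter

# Hypothetical smart-card attendance log: (card_id, sex, session_id).
# The schema and values are invented; ADVANCE II's real data model is not described.
attendance = [
    ("C001", "F", "S1"),
    ("C002", "M", "S1"),
    ("C001", "F", "S2"),  # the same farmer attending a second session
    ("C003", "F", "S2"),
    ("C004", "M", "S3"),
]

def summarize(records):
    """Return (number of sessions, unique participants, unique participants by sex)."""
    sessions = {session for _, _, session in records}
    # One entry per card ID, so repeat attendees are counted only once.
    people = {card: sex for card, sex, _ in records}
    return len(sessions), len(people), Counter(people.values())

n_sessions, n_people, by_sex = summarize(attendance)
print(n_sessions, n_people, by_sex["F"] / n_people)  # 3 4 0.5
```

Counting five attendance records as five people would overstate reach; keying on the card ID is what turns raw attendance into unique beneficiary data.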
Data to Action Framework
At MERL Tech DC, we presented the ADVANCE II project as an example of how to use the “Data to Action” Framework. This framework is one approach to mapping how information flows and how decisions are made across a set of stakeholders in a program. It can be used as a conversation tool to identify barriers to action. You can also use it to identify where digital tools could help move information to decision makers faster.
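One way to make such a map concrete is to treat stakeholders as steps in a chain and record how long information takes to pass between them. The sketch below is our own illustration, not part of the framework itself: the stakeholder names and delays are hypothetical, and it simply totals the delay along a path and points to the slowest hand-off, where a digital tool might help most.

```python
# Hypothetical information flow for one program: (sender, receiver) -> days
# before data reaches the next stakeholder. Names and delays are invented.
flow = {
    ("trainer", "field_officer"): 7,
    ("field_officer", "me_officer"): 14,
    ("me_officer", "chief_of_party"): 5,
    ("chief_of_party", "usaid_manager"): 30,
}

def path_delay(flow, path):
    """Total days for information to travel along a chain of stakeholders."""
    return sum(flow[(a, b)] for a, b in zip(path, path[1:]))

def slowest_hop(flow, path):
    """The hop with the longest delay: a candidate place for a digital tool."""
    return max(zip(path, path[1:]), key=lambda hop: flow[hop])

path = ["trainer", "field_officer", "me_officer", "chief_of_party", "usaid_manager"]
print(path_delay(flow, path))   # 56
print(slowest_hop(flow, path))  # ('chief_of_party', 'usaid_manager')

# What if a shared dashboard cut the last hop to one day?
with_dashboard = {**flow, ("chief_of_party", "usaid_manager"): 1}
print(path_delay(with_dashboard, path))  # 27
```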
This framework is just one tool to start thinking about uses of real-time data to enable adaptive management in development programs.
USAID explores these and other topics in a newly released portfolio of research on Real-time Data for Adaptive Management (RTD4AM), which gives insight into the barriers to real-time data use in development. We look forward to continuing to build the community of practice of adaptive management within the MERL community.
by Isaac D. Castillo, Director of Outcomes, Assessment, and Learning at Venture Philanthropy Partners.
Evaluators don’t make mistakes.
Or do they?
Well, actually, they do. In fact, I’ve got a number of fantastic failures under my belt that turned into important learning opportunities. So, when I was asked to share my experience at the MERL Tech DC 2018 session on failure, I jumped at the chance.
Part of the Problem
As someone of Mexican descent, I am keenly aware of the problems that can arise when culturally and linguistically inappropriate evaluation practices are used. However, as a young evaluator, I was often part of the problem.
Early in my evaluation career, I was tasked with collecting data to determine why teenage youth became involved in gangs. In addition to developing the interview guides, I was also responsible for leading all of the on-site interviews in cities with large Latinx populations. Since I am Latinx, I had a sufficient grasp of Spanish to prepare the interview guides and conduct the interviews. I felt confident that I would be sensitive to all of the cultural and linguistic challenges to ensure an effective data collection process. Unfortunately, I had forgotten an important tenet of effective culturally competent evaluation: cultures and languages are not monolithic. Differences in regional cultures or dialects can lead even experienced evaluators into embarrassment, scorn, or the worst outcome of all: inaccurate data.
Sentate, Por Favor
For example, when first interacting with the gang members, I introduced myself and asked them to “Please sit down,” to start the interview by saying “Siéntate, por favor.” What I did not know at the time is that a large portion of the gang members I was interviewing were born in El Salvador or were of Salvadoran descent, and the accurate way to say it using Salvadoran Spanish would have been, “Sentate, por favor.”
Does one word make that much difference? In most cases it did not matter, but it caused several gang members to openly question my Spanish from the outset, which created an uncomfortable beginning to interviews about potentially sensitive subjects.
Amigo or Chero?
I next asked the gang members to think of their “friends.” In most dialects of Spanish, using amigos to ask about friends is accurate and proper. However, in the context of street slang, some gang members prefer the term chero, especially in informal contexts.
Again, was this a huge mistake? No. But it did lead to enough quizzical looks and requests for clarification that I started to doubt whether I was getting completely honest or accurate answers from some of the respondents. Unfortunately, this error did not come to light until I had conducted nearly 30 interviews. I had not thought to test the wording of the questions in multiple Spanish-speaking communities across several states.
Would You Like a Concha?
Perhaps my most memorable mistake during this evaluation occurred after I had completed an interview with a gang leader outside of a bakery. After we were done, the gang leader called over the rest of his gang to meet me. As I was meeting everyone, I glanced inside the bakery and noticed a type of Mexican pastry that I enjoyed as a child. I asked the gang leader if he would like to go inside and join me for a concha, a round pastry that looks like a shell. Everyone (except me) began to laugh hysterically. The gang leader then let me in on the joke. He understood that I was asking about the pan dulce (sweet bread), but he informed me that in his dialect, concha was used as a vulgar reference to female genitalia. This taught me a valuable lesson about how even casual references or language choices can be interpreted in many different ways.
What did I learn from this?
While I can look back on these mistakes and laugh, I am also reminded of the important lessons learned that I carry with me to this day.
Translate with the local context in mind. When translating materials or preparing for field work, get a detailed sense of who you will be collecting data from, including what cultures and subgroups people represent and whether or not there are specific topics or words that should be avoided.
Translate with the local population in mind. When developing data collection tools (in any language, even if you are fluent in it), take the time to pre-test the language in the tools.
Be okay with your inevitable mistakes. Recognize that no matter how much preparation you do, you will make mistakes in your data collection related to culture and language issues. Remember it is how you respond in those situations that is most important.
Moving from hype to practice is an important but challenging step for ICT4D practitioners. As the technical adviser for digital development at IREX, a global development and education organization, I’ve been watching with cautious optimism as international development stakeholders begin to explore how artificial intelligence tools like machine learning can help them address problems and introduce efficiencies to amplify their impact.
So while USAID was developing their guide to making machine learning work for international development and TechChange rolled out their new course on Artificial Intelligence for International Development, we spent a few months this summer exploring whether we could put machine learning to work to measure media quality.
Of course, we didn’t turn to machine learning just for the sake of contributing to the “breathless commentary of ML proponents” (as USAID aptly puts it).
As we shared in a session with our artificial intelligence partner Lore at MERL Tech DC 2018, some of our programs face a very real set of problems that could be alleviated through smarter use of digital tools.
Our Machine Learning Experiment
In our USAID-funded Media Strengthening Program in Mozambique, for example, a small team of human evaluators manually scores thousands of news articles based on 18 measures of media quality.
This process is time consuming (some evaluators spend up to four hours a day reading and evaluating articles), inefficient (when staff turns over, we need to reinvest resources to train up new hires), and inconsistent (even well-trained evaluators might score articles differently).
To test whether we could make the process of measuring media quality less resource-intensive, we spent a few months training software to automatically detect one of these 18 measures of media quality: whether journalists keep their own opinions out of their news articles. The results of this experiment are very compelling:
The software had 95% accuracy in recognizing sentences containing opinions within the dataset of 1,200 articles.
The software’s ability to “learn” was evident. Anecdotally, the evaluation team noticed a marked improvement in the accuracy of the software’s suggestions after showing it only twenty sentences that had opinions. The accuracy, precision, and recall results highlighted above were achieved after only sixteen rounds of training the software.
Accuracy and precision increased the more that the model was trained. There is a clear relationship between the number of times the evaluators trained the software and the accuracy and precision of the results. The recall results did not improve over time as consistently.
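The post does not describe how Lore's models work. Purely to illustrate what “training software to detect opinion sentences” can mean, here is a tiny multinomial naive Bayes classifier, a textbook technique we swapped in for illustration, trained on six invented English sentences labeled as opinion or fact. Real systems use far more data and richer models.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, sentence):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each token under class c.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for token in tokenize(sentence):
                count = self.word_counts[c][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

# Invented training sentences; evaluators would supply real labeled sentences.
sentences = [
    "I believe the minister's plan is a disgrace.",
    "In my view this policy is terrible.",
    "Sadly, the council once again failed us.",
    "The minister announced the plan on Tuesday.",
    "The council approved the budget by a vote of 9 to 2.",
    "The policy takes effect in January.",
]
labels = ["opinion"] * 3 + ["fact"] * 3

model = NaiveBayes().fit(sentences, labels)
print(model.predict("I believe this plan is terrible."))    # opinion
print(model.predict("The budget takes effect on Tuesday."))  # fact
```

Each new batch of labeled sentences updates the word counts, which is one simple reason accuracy can climb as evaluators keep “showing” the software examples.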
What does this all mean? Let’s start with the good news. The results suggest that some parts of media quality—specifically, whether an article is impartial or whether it echoes its author’s opinions—can be automatically measured by machine learning.
The software also introduces the possibility of unprecedented scale, scanning thousands of articles in seconds for this specific indicator. These implications introduce ways for media support programs to spend their limited resources more efficiently.
3 Lessons Learned from using Machine Learning
Of course, the machine learning experience was not without problems. With any cutting-edge technology, there will be lessons we can learn and share to improve everyone’s experience. Here are our three lessons learned working with machine learning:
1. Forget about being tech-literate; we need to be more problem-literate.
Defining a coherent, specific, actionable problem statement was one of the important steps of this experiment. This wasn’t easy. Hard trade-offs had to be made (Which of 18 indicators should we focus on?), and we had to focus on things we could measure in order to demonstrate efficiency gains using this new approach (How much time do evaluators currently spend scoring articles?).
When planning your own machine learning project, devote plenty of time at the outset—together with your technology partner—to define the specific problem you’ll try to address. These conversations result in a deeper shared understanding of both the sector and the technology that will make the experiment more successful.
2. Take the time to communicate results effectively.
Since completing the experiment, people have asked me to explain how “accurate” the software is. But in practice, machine learning software uses different methods to define “accuracy”, which in turn can vary according to the specific model (the software we used deploys several models).
What starts off as a simple question (How accurate is our software?) can easily turn into a discussion of related concepts like precision, recall, false positives, and false negatives. We found that producing clean visuals became the most effective way to explain our results.
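To see why “How accurate is it?” has no single answer, it helps to compute the standard metrics side by side. A minimal sketch using invented counts (100 sentences, 10 of which contain opinions; the model flags 8, of which 6 are correct), not the experiment's actual results:

```python
def classification_metrics(y_true, y_pred, positive="opinion"):
    """Accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # opinions caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # opinions missed
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: 10 true opinions, 90 factual sentences.
y_true = ["opinion"] * 10 + ["fact"] * 90
y_pred = ["opinion"] * 6 + ["fact"] * 4 + ["opinion"] * 2 + ["fact"] * 88

print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.94, 'precision': 0.75, 'recall': 0.6}
```

Because opinion sentences are rare in this made-up sample, accuracy looks high even though the model misses 4 of the 10 opinions, which is exactly the kind of nuance a single “accuracy” number hides.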
3. Start small and manage expectations.
Stakeholders with even a passing awareness of machine learning will be aware of its hype. Even now, some colleagues ask me how we “automated the entire media quality assessment process”—even though we only used machine learning to identify one of 18 indicators of media quality. To help mitigate inflated expectations, we invested a small amount into this “minimum viable product” (MVP) to prove the fundamental concept before expanding on it later.
Approaching your first machine learning project this way might help to keep expectations in line with reality, minimize risks associated with experimentation, and provide air cover for you to adjust your scope as you discover limitations or adjacent opportunities during the process.
Our team wanted to evaluate our impact, so we applied a new framework to find answers.
What We Tested
Every social organization, GlobalGiving included, needs to know if it’s having an impact on the communities it serves. For us, that means understanding the ways in which we are (or aren’t!) helping our nonprofit partners around the world improve their own effectiveness and capacity to create change, regardless of the type of work they do.
Why It Matters
Without this knowledge, social organizations can’t make informed decisions about the strategies to use to deliver their services. Unfortunately, this kind of rigorous impact evaluation is usually quite expensive and can take years to carry out. As a result, most organizations struggle to evaluate their impact.
We knew the challenges going into our own impact research would be substantial, but it was too important for us not to try.
The Big Question
Do organizations with access to GlobalGiving’s services improve their performance differently than organizations that don’t? Are there particular focus areas where GlobalGiving is having more of an impact than others?
Ideally, we’d randomly assign certain organizations to receive the “treatment” of being part of GlobalGiving and then compare their performance with another randomly assigned control group. But, we can’t just tell random organizations that they aren’t allowed to be part of our community. So, instead we compared a treatment group—organizations that have completed the GlobalGiving vetting process and become full partners on the website—with a control group of organizations that have successfully passed the vetting process but haven’t joined the web community. Since we can’t choose these groups randomly, we had to ensure the organizations in each group are as similar as possible so that our results aren’t biased by underlying differences between the control and treatment groups.
To do this, we worked only with organizations based in India. We chose India because we have lots of relationships with organizations there, and we needed as large a sample size as possible to increase confidence that our conclusions are reliable. India is also well-suited for this study because it requires organizations to have special permission to receive funds from overseas under the Foreign Contribution Regulation Act (FCRA). Organizations must have strong operations in place to earn this permission. The fact that all participant organizations are established enough to earn both an FCRA certification and pass GlobalGiving’s own high vetting standards means that any differences in our results are unlikely to be caused by geographic or quality differences.
We also needed a way to measure nonprofit performance in a concrete way. For this, we used the “Organizational Performance Index” (OPI) framework created by Pact. The OPI provides a structured way to understand a nonprofit’s capacity along eight different categories, including its ability to deliver programs, the diversity of its funding sources, and its use of community feedback. The OPI scores organizations on a scale of 1 (lowest) to 4 (highest). With the help of a fantastic team of volunteers in India, we gathered two years of OPI data from both the treatment and control groups, then compared how their scores changed over time to get an initial indicator of GlobalGiving’s impact.
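Comparing how scores changed over time boils down to comparing the average change in each group (a simple difference-in-differences of group means). A minimal sketch with invented OPI scores for one category; the actual study's statistical methodology is described in its full write-up and is not reproduced here:

```python
from statistics import mean

# Hypothetical OPI scores (1 = lowest, 4 = highest) for one category,
# as (baseline, follow_up) pairs per organization. Invented values.
treatment = [(2, 3), (2, 2), (3, 4), (1, 2), (3, 3), (2, 3)]
control = [(2, 3), (2, 1), (3, 3), (1, 1), (3, 2), (2, 2)]

def mean_change(pairs):
    """Average of (follow-up score minus baseline score) across organizations."""
    return mean(after - before for before, after in pairs)

effect = mean_change(treatment) - mean_change(control)
print(round(mean_change(treatment), 2), round(mean_change(control), 2), round(effect, 2))
# 0.67 -0.17 0.83
```

A positive difference only suggests an effect; whether it is statistically significant depends on sample size and variability, which is why a formal test is still needed.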
The most notable result we found was that organizations that were part of GlobalGiving demonstrated significantly more participatory planning and decision-making processes (what we call “community leadership”), and improved their use of stakeholder feedback to inform their work, in comparison to control group organizations. We did not see a similar significant result in the other seven categories that the OPI tracks. The easiest way to see this result is to visualize how organizations’ scores shifted over time. The chart below shows differences in target population scores—Pact’s wording for “community leadership and feedback.”
Differences in Target Population Score Changes
For example, look at the organizations that started out with a score of two in the control group on the left. Roughly one third of those increased their score to three, one third stayed the same, and one third had their scores drop to one. In contrast, in the treatment group on the right, nearly half the organizations increased their scores and about half stayed the same, while only a tiny fraction dropped. You can see a similar pattern across the two groups regardless of their starting score.
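The chart's logic (of the organizations that started at a given score, what share went up, stayed the same, or went down?) is easy to tabulate. A minimal sketch with invented (baseline, follow-up) scores, not GlobalGiving's data:

```python
from collections import Counter

def transition_shares(pairs, start):
    """For organizations that began at `start`, the share that rose, stayed, or fell."""
    ends = [after for before, after in pairs if before == start]
    if not ends:
        return {}
    moves = Counter(
        "up" if after > start else "same" if after == start else "down"
        for after in ends
    )
    return {k: moves[k] / len(ends) for k in ("up", "same", "down")}

# Hypothetical (baseline, follow_up) scores; organizations starting at other
# scores are ignored when we look at start == 2.
control = [(2, 3), (2, 2), (2, 1), (3, 3), (1, 2)]
treatment = [(2, 3), (2, 3), (2, 2), (2, 2), (3, 4)]

print({k: round(v, 2) for k, v in transition_shares(control, 2).items()})
# {'up': 0.33, 'same': 0.33, 'down': 0.33}
print(transition_shares(treatment, 2))
# {'up': 0.5, 'same': 0.5, 'down': 0.0}
```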
In contrast, here’s the same diagram for another OPI category where we didn’t see a statistically significant difference between the two groups. There’s not nearly as clear a pattern—both the treatment and control organizations change their scores about the same amount.
Differences in Delivery Score Changes
For more technical details about our research design process, our statistical methodology, and the conclusions we’ve drawn, please check out the full write-up of this work, which is available on the Social Science Research Network.
Our initial finding—that our emphasis on feedback is having a measurable impact—is an encouraging sign.
On the other hand, we didn’t see that GlobalGiving was driving significant changes in any of the other seven OPI categories. Some of these categories, like adherence to national or international standards, aren’t areas where GlobalGiving focuses much. Others, like how well an organization learns over time, are closely related to what we do (Listen, Act, Learn. Repeat. is one of our core values). We’ll need to continue to explore why we’re not seeing results in these areas and, if necessary, make adjustments to our programs accordingly.
Make It Yours
Putting together an impact study, even a smaller one like this, is a major undertaking for any organization. Many organizations talk about applying a more scientific approach to their impact, but few nonprofits or funders take on the challenge of carrying out the research needed to do so. This study demonstrates how organizations can make meaningful progress towards rigorously measuring impact, even without a decade of work and an eight-figure budget.
If your organization is considering something similar, here are a few suggestions to keep in mind that we’ve learned as a result of this project:
1. If you can’t randomize, make sure you consider possible biases.
Logistics, processes, and ethics are all reasons why an organization might not be able to randomly assign treatment groups. If that’s the case for you, think carefully about the rest of your design and how you’ll reduce the chance that a result you see can be attributed to a different cause.
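One common way to check for such biases is to compare the groups' baseline characteristics. A minimal sketch computing standardized mean differences (difference in means divided by the pooled standard deviation); the covariates and values are invented, and the 0.1 cut-off is a widely used rule of thumb rather than a GlobalGiving standard:

```python
import math
from statistics import mean, stdev

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled (average-variance) standard deviation."""
    pooled_sd = math.sqrt((stdev(treated) ** 2 + stdev(control) ** 2) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical baseline covariates: (treatment values, control values).
baseline = {
    "years_operating": ([10, 12, 8, 15, 9], [11, 12, 9, 14, 8]),
    "staff_count": ([25, 40, 18, 60, 30], [20, 22, 15, 35, 18]),
}

for name, (treated, control) in baseline.items():
    smd = standardized_mean_difference(treated, control)
    flag = "check for bias" if abs(smd) > 0.1 else "looks balanced"
    print(f"{name}: SMD = {smd:.2f} ({flag})")
```

Here the groups have the same average age, but treatment organizations are much larger, so a difference in outcomes could partly reflect size rather than the treatment.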
2. Choose a measurement framework that aligns with your theory of change and is as precise as possible.
We used the OPI because it was easy to understand, reliable, and well-accepted in the development sector. But, the OPI’s four-level scale made it difficult to make precise distinctions between organizations, and there were some categories that didn’t make sense in the context of how GlobalGiving works. These are areas we’ll look to improve in future versions of this work.
3. Get on the record.
Creating a clear record of your study, both inside and outside your organization, is critical for avoiding “scope creep.” We used Git to keep track of all changes in our data, code, and written analysis, and shared our initial study design at the 2017 American Evaluation Association conference.
4. Enlist outside help.
This study would not have been possible without lots of extra help, from our volunteer team in India, to our friends at Pact, to the economists and data scientists who checked our math, particularly Alex Hughes at UC Berkeley and Ted Dunmire at Booz Allen Hamilton.
We’re pleased about what we’ve learned about GlobalGiving’s impact, where we can improve, and how we might build on this initial work, and we can’t wait to continue this progress in service of improved outcomes for our nonprofit partners worldwide.
Digitization is everywhere! Digital technologies and data have changed the way we engage with each other and how we work. Whether in our personal capacity (how our own data is being used) or in our professional capacity (understanding how to use data and technology), we cannot escape the effects of digitization. These changes are exciting! But we also need to consider the challenges they present to the MERL community and their impact on development.
The advent and proliferation of big data has the potential to change how evaluations are conducted. New skills are needed to process and analyse big data. Mathematics, statistics and analytical skills will be ever more important. As evaluators, we need to be discerning about the data we use. In a world of copious amounts of data, we need to ensure we have the ability to select the right data to answer our evaluation questions.
We also have an ethical and moral duty to manage data responsibly. We need new strategies and tools to guide the ways in which we collect, store, use and report data. As evaluators, we need to improve our skills in processing and analysing data. Evaluative thinking in the digital age is evolving and we need to consider the technical and soft skills required to maintain the integrity of the data and its interpretation.
Though technology can make data collection faster and cheaper, two important considerations are access to technology by vulnerable groups and data integrity. Women, girls and people in rural areas normally do not have the same levels of access to technology as men and boys. This impacts on our ability to rely solely on technology to collect data from these population groups, because we need to be aware of inclusion, bias and representativity. Equally, we need to consider how to maintain the quality of data being collected through new technologies such as mobile phones and to understand how the use of new devices might change or alter how people respond.
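When some groups are under-represented among respondents to a mobile survey, one standard partial fix is post-stratification weighting: each respondent is weighted by the ratio of the group's population share to its sample share. A minimal sketch with invented shares and counts:

```python
# Hypothetical population shares (e.g. from a census) and mobile-survey respondent counts.
population_share = {"women_rural": 0.30, "women_urban": 0.20, "men_rural": 0.28, "men_urban": 0.22}
sample_counts = {"women_rural": 60, "women_urban": 140, "men_rural": 110, "men_urban": 190}

def poststratification_weights(population_share, sample_counts):
    """Weight = population share / sample share, so weighted groups match the population."""
    n = sum(sample_counts.values())
    return {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

weights = poststratification_weights(population_share, sample_counts)
for group, w in weights.items():
    print(f"{group}: weight {w:.2f}")
# Rural women are 30% of the population but only 12% of respondents, so each counts 2.5 times.
```

Weighting cannot help a group with no respondents at all, and it assumes the people who did respond resemble those who didn't, so it complements rather than replaces inclusive data collection.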
In a rapidly changing world where technologies such as AI, Blockchain, Internet of Things, drones and machine learning are on the horizon, evaluators need to be robust and agile in how we change and adapt.
For this reason, a new strand has been introduced at the African Evaluation Association (AfrEA) conference, taking place from 11 – 15 March 2019 in Abidjan, Cote d’Ivoire. This stream, The Fourth Industrial Revolution and its Impact on Development: Implications for Evaluation, will focus on five sub-themes:
Guide to Industry 4.0 and Next Generation Tech
Talent and Skills in Industry 4.0
Changing World of Work
Evaluating youth programmes in Industry 4.0
Genesis Analytics will be curating this strand. We are excited to invite experts working in digital development and practitioners at the forefront of technological innovation for development and evaluation to submit abstracts for this strand.
by Stacey Berlow, Managing Partner at Project Balance and Jana Melpolder, MERL Tech DC Volunteer and Communications Manager at Inveneo. Find Jana on Twitter: @JanaMelpolder
At MERL Tech DC 2018, Project Balance’s Stacey Berlow led a session titled “Application Maintenance Isn’t Sexy, But Critical to Success.” In her session and presentation, she outlined several reasons why software maintenance planning and funding is essential to the sustainability of an M&E software solution.
The problems that arise with software or applications go well beyond day-to-day care and management. A foundational study on software maintenance by B. P. Lientz and E. B. Swanson looked at the activities of 487 IT organizations and found that maintenance activities can be broken down into four types:
Corrective (bug fixing),
Adaptive (impacts due to changes outside the system),
Perfective (enhancements), and
Preventive (monitoring and optimization).
The table below outlines the percentage of time IT departments spend on the different types of maintenance. Note that most of the time dedicated to maintenance is spent not on fixing defects (corrective) but on enhancing (perfective) the tool or system.
Corrective (Total: 21.7%)
Emergency fixes: 12.4%
Routine debugging: 9.3%
Adaptive (Total: 23.6%)
Changes to data inputs and files: 17.4%
Changes to hardware and system software: 6.2%
Perfective (Total: 51.3%)
Customer enhancements: 41.8%
Improvements to documentation: 5.5%
Improvements to code efficiency: 4.0%
Other (Total: 3.4%)
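A quick arithmetic check on the category totals in the table confirms that they cover all maintenance time and shows how little of it is bug fixing. A minimal sketch using only the four totals reported above:

```python
# Category totals from the table (percent of maintenance time).
totals = {"corrective": 21.7, "adaptive": 23.6, "perfective": 51.3, "other": 3.4}

total = sum(totals.values())
non_corrective = total - totals["corrective"]

print(f"total: {total:.1f}%")                    # total: 100.0%
print(f"not bug fixing: {non_corrective:.1f}%")  # not bug fixing: 78.3%
print(max(totals, key=totals.get))               # perfective
```

In other words, roughly four out of every five maintenance hours go to something other than fixing defects.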
The study also pointed out some of the most common maintenance problems:
Poor quality application system documentation
Excessive demand from customers
Competing demands for maintenance personnel time
Inadequate training of user personnel
Turnover in the user organizations
Does Your Project Need Innovations or Just Maintenance?
Organizations often prioritize innovation over maintenance. They have a list of enhancing strategies or improvements they want to make, and they’ll start new projects when what they should really be focusing on is maintenance. International development organizations often want to develop new software with the latest technology — they want NEW software for their projects. In reality, what is usually needed is software maintenance and enhancement of an existing product.
Moreover, when an organization is considering adopting a new piece of software, it’s absolutely vital that it think about the cost of maintenance in addition to the cost of development. Experts estimate that the cost of maintenance can vary from 40% to 90% of the original build cost. Maintenance costs a lot more than many organizations realize.
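To turn that estimate into a budget line, multiply the build cost by both ends of the range. A minimal sketch, assuming a hypothetical $150,000 build; the 40% and 90% defaults come from the estimate above, and the source does not say whether the range is annual or over the software's lifetime:

```python
def maintenance_cost_range(build_cost, low=0.40, high=0.90):
    """Return (low, high) maintenance cost estimates as fractions of the build cost."""
    if not 0 <= low <= high:
        raise ValueError("expected 0 <= low <= high")
    return build_cost * low, build_cost * high

low, high = maintenance_cost_range(150_000)  # hypothetical build cost in USD
print(f"${low:,.0f} to ${high:,.0f}")        # $60,000 to $135,000
```

Even the low end is a large share of the original budget, which is why the range belongs in the proposal rather than being discovered after launch.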
It’s also not easy to know beforehand or to estimate what the actual cost of maintenance will be. Creating a Service Level Agreement (SLA), which specifies the time required to respond to issues or deploy enhancements as part of a maintenance contract, is vital to having a handle on the human resources, price levels and estimated costs of maintenance.
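An SLA is easier to budget and enforce when its terms are written down precisely. A minimal sketch, assuming hypothetical priority levels and response times (real SLAs vary, and many count business hours rather than the calendar time used here):

```python
from datetime import datetime, timedelta

# Hypothetical SLA terms: maximum time to first response, by issue priority.
SLA_RESPONSE = {
    "critical": timedelta(hours=4),
    "high": timedelta(hours=24),
    "normal": timedelta(days=3),
    "enhancement": timedelta(days=10),
}

def response_deadline(reported_at, priority):
    """Latest acceptable response time under the SLA (calendar time)."""
    return reported_at + SLA_RESPONSE[priority]

def breached(reported_at, responded_at, priority):
    """True if the vendor responded after the SLA deadline."""
    return responded_at > response_deadline(reported_at, priority)

reported = datetime(2018, 9, 7, 9, 0)
print(response_deadline(reported, "high"))                          # 2018-09-08 09:00:00
print(breached(reported, datetime(2018, 9, 7, 14, 0), "critical"))  # True
```

Tracking how many issues arrive at each priority level, and how long each takes, gives the human-resource and cost figures the SLA is meant to pin down.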
As Stacey emphasizes, “Open Source does not mean ‘free’. Updates to DHIS2 versions, Open MRS, Open HIE, Drupal, WordPress, and more WILL require maintenance to custom code.”
It’s All About the Teamwork
Another point to consider when it comes to the cost of maintenance for your app or software is the time and money spent on staff. Members of your team will not always be well-versed in a certain type of software. Also, when transferring a software asset to a funder or ministry/government entity, consider the skill level of the receiving team as well as the time availability of team members. Many software products cannot be well maintained by teams that were not involved in developing them. As a result, they often fall into disrepair and become unusable. A software vendor may be better equipped to monitor and respond to issues than the team.
What Can You Do?
So what are effective ways to ensure the sustainability of software tools? There are a few strategies you can use. First of all, ensure that your IT staff members are involved in the planning of your project or organization’s RFP process. They will give you valuable metrics on effort and cost, right up front, so that you can secure funding. Second, scale down the size of your project so that your tool budget matches your funds. Consider what the minimum software functionality is that you need, and enhance the tools later. Third, invite the right stakeholders and IT staff members to meetings and conference calls as soon as the project begins. Having the right people on board early on will make a huge difference in how you manage and transition software to country stakeholders later at the end of the project!
The session at MERL Tech ended with a discussion of the great need for, and value of, involving local skills and IT experts as part of the programming team. Local knowledge and IT expertise are among the most important, if not the most important, pieces of the application maintenance puzzle. One of the key ideas I learned was that application maintenance should start at the local level and grow from there. Local IT personnel will be able to answer many technical questions and address many maintenance issues. Furthermore, IT staff members from international development agencies will be able to learn from local IT experts as well, giving a boost in the capacity of all staff members across the board.
Application maintenance may not be the most interesting part of an international development project, but it is certainly one of the most vital to help ensure the project’s success and ongoing sustainability.
Written by Jana Melpolder, MERL Tech DC Volunteer and former ICT Works Editor. Find Jana on Twitter: @JanaMelpolder
As organizations grow, they become increasingly aware of how important MERL (Monitoring, Evaluation, Research, and Learning) is to their international development programs. To meet this challenge, new hires need to be brought on board, but more importantly, changes need to happen in the organization’s culture.
How can nonprofits and organizations change to include more MERL? Friday afternoon’s MERL Tech DC session “Creating a MERL Culture at Your Nonprofit” set out to answer that question. Representatives from Salesforce.org and Samaschool.org were part of the discussion.
Salesforce.org staff members Eric Barela and Morgan Buras-Finlay emphasized that their organization has set aside resources (financial and otherwise) for international and external M&E. “A MERL culture is the foundation for the effective use of technology!” shared Eric Barela.
Data is a vital part of MERL, but those providing it to organizations often need to “hold the hands” of those on the receiving end. What is especially vital is helping people understand this data and gain deeper insight from it. It’s not just about the numbers – it’s about what is meant by those numbers and how people can learn and improve using the data.
According to Salesforce.org, an organization’s MERL culture consists of its understanding of the benefit of defining, measuring, understanding, and learning for social impact with rigor. And building or maintaining a MERL culture doesn’t just mean letting the data team do whatever they like or being the ones in charge. Instead, it’s vital to focus on outcomes. Salesforce.org discussed how its MERL staff prioritize keeping a foot in the door in many places and meeting often with people from different departments.
Where does technology fit into all of this? According to Salesforce.org, the push is to keep the technology ethical. Morgan Buras-Finlay described it well, saying “technology goes from building a useful tool to a tool that will actually be used.”
Another participant on Friday’s panel was Samaschool’s Director of Impact, Kosar Jahani. Samaschool describes itself as a San Francisco-based nonprofit focused on preparing low-income populations to succeed as independent workers. The organization has “brought together a passionate group of social entrepreneurs and educators who are reimagining workforce development for the 21st century.”
Samaschool creates a MERL culture through regular Learning Calls with its different audiences and funders. These Learning Calls follow a clear agenda, and some even take place openly on Facebook Live.
By ensuring a high level of transparency, Samaschool also aims to create a culture of accountability in which it can learn from failures as well as successes. Social media opens doors, giving people easier access to information that would otherwise be difficult to obtain.
Kosar acknowledged some downsides of this kind of transparency: putting information in such a public place carries risks and can lead to lost future investment. However, the organization feels that it has helped build relationships and improve interactions.
Sadly, flight delays prevented a third panelist, Andrew Means, founder of Big Elephant Studios, from attending MERL Tech. Luckily, Eric Barela presented Andrew’s slides, which highlighted three things needed to create a MERL culture:
Tools – Investments in tools that help an organization acquire, access, and analyze the data it needs to make informed decisions
Processes – Investments in time to focus on utilizing data and supporting decision making
Culture – Organizational values that ensure that data is invested in, utilized, and listened to
One of Andrew’s main points was that generally, people really do want to gain insight and learn from data. The other members of the panel reiterated this as well.
A few lingering questions from the audience included:
How do you measure how culture is changing within an organization?
How does one determine whether an organization’s culture is more focused on MERL than it was previously?
Which social media platforms and strategies can be used to create a MERL culture that provides transparency to clients, funders, and other stakeholders?
What about you? How do you create and measure the “MERL Culture” in your organization?
MERL Tech DC kicked off with a pre-conference workshop on September 5th that focused on what the blockchain is and how it could influence MEL.
The workshop was broken into four parts: 1) blockchain 101, 2) how the blockchain is influencing and could influence MEL, 3) case studies to demonstrate early lessons learned, and 4) outstanding issues and emerging themes.
This blog post focuses on and builds on the fourth area. At the end, we provide additional resources for anyone interested in exploring how the blockchain could disrupt and affect international development at large.
Workshop Takeaways and Afterthoughts
For our purposes here, we have distilled some of the key takeaways from the workshop. This section includes a series of questions that we will respond to and link to various related reference materials.
Who are the main blockchain providers and what are they offering?
Any time a new “innovation” is introduced into the international development space, potential users lack knowledge about what the innovation is, the value it can add, and the costs of implementing it. This lack of knowledge opens the door for “snake oil salesmen” who engage in predatory attempts to sell their services to users who don’t have the knowledge to make informed decisions.
We’ve seen this phenomenon play out with blockchain. Take, for example, the numerous Initial Coin Offerings (ICOs) that defrauded their investors, or the many instances of service providers offering low-quality blockchain education, training, and/or project solutions.
Education is the best defense against being taken advantage of by snake oil salesmen. If you’re looking for general education about blockchain, we’ve included a collection of helpful tools in the table below. If your group is working to determine whether a blockchain solution is right for the problem at hand, the USAID Blockchain Primer offers easy-to-use decision trees that can help. Beyond these, Mercy Corps has just published Block by Block, which compares the attributes of various distributed ledgers along dimensions that are very useful when deciding which distributed ledger technology to use.
Words of warning aside, there are agencies that provide genuine blockchain solutions. For a full list of providers please visit www.blockchainomics.tech, an information database run by The Development CAFE on all things blockchain.
Bottom Line: Beware of snake oil salesmen who preach the benefits of blockchain but are silent on the feasibility of their solution. Unless a service provider is just as focused on your problem as you are, be wary that they may simply be pitching a solution (viable or not) rather than solving the problem. Before approaching companies or service providers, always identify your problem and assess whether blockchain is indeed a viable solution.
How does governance of the blockchain influence its sustainability?
In the past, we’ve seen technology-led social impact solutions make initial gains that diminish over time until no impact is sustained. Current evidence shows that many solutions of this sort fail because they are not designed to solve a specific problem in a relevant ecosystem. This insight has given rise to the Principles for Digital Development and the ethical considerations that should be taken into account for blockchain solutions.
Bottom Line: Impact is achieved and sustained by the people who use a tool. Thus, blockchain, as a tool, does not sustain impacts on its own. People do so by applying knowledge about the principles and ethics needed for impact. Understanding this, our next step is to generate more customized principles and ethical considerations for blockchain solutions through case studies and other desperately needed research.
How do the blockchain, big data, and Artificial Intelligence influence each other?
The blockchain is a new type of distributed ledger system that could have massive social implications. Big Data refers to the exponential increase in data generated by the Internet of Things (IoT) and other sources (smart infrastructure, etc.). Artificial Intelligence (AI) helps identify and analyze this new data far faster than was previously possible.
In essence, a blockchain is a distributed ledger: a database of transactions. Like any other database, it is a repository, and it contributes to the growth of Big Data. AI can be used to automate the process of entering data into the blockchain. This is how the three are connected.
The blockchain is considered a leading contender as the ledger of choice for big data because: 1) due to its distributed nature it can handle much larger amounts of data in a more secure fashion than is currently possible with cloud computing, and 2) it is possible to automate the way big data is uploaded to the blockchain. AI tools are easily integrated into blockchain functions to run searches and analyze data, and this opens up the capacity to collect, analyze and report findings on big data in a transparent and secure manner more efficiently than ever before.
Bit by Bit is a very readable and innovative overview of how to conduct social science research in the digital age of big data, artificial intelligence, and the blockchain. It gives the reader a quality introduction to some of the dominant themes and issues to consider when evaluating a technology-led solution or using technology to conduct social research.
Given its immutability, how can an adaptive management system work with the blockchain?
This is a critical point. The blockchain is an immutable record: it is almost impossible to hijack, hack, or alter (meaning it has never been done, and there are no simulations in which current technology is able to take control of a properly designed blockchain). Thus, the blockchain provides the security needed to mitigate corruption and facilitate audits.
This immutability does not preclude an adaptive management approach, however. Adaptive management requires small, iterative course corrections informed by quality data on what is and is not working. The data record and the course corrections together form a rich dataset that is extremely valuable to replication efforts, because it addresses the main barrier to replication: lack of data on what does and does not work. In this case, then, the immutability of the blockchain adds value to adaptive management. This is more a question of good adaptive management practice than of whether the blockchain is a viable tool for these purposes.
It is important to note that you can append information to the chain (but not amend it), so there will always be a record of previous mistakes (auditability), while the most recent layer of truth is what gets viewed, queried, and verified. Hence, immutability is not a hurdle but a help.
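To make the tamper-evidence idea concrete, here is a minimal Python sketch of a hash-linked ledger. It is a single-process illustration that assumes SHA-256 hashing and JSON serialization; the `Block` and `Ledger` classes and the payment records are hypothetical, and a real blockchain adds distributed consensus across many nodes, which this sketch omits. It shows only why a quiet edit to a past record is detectable during an audit.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    data: dict
    prev_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        # Hash this block's contents together with the previous block's hash,
        # so each block commits to everything recorded before it.
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical single-node ledger: no network, no consensus."""

    def __init__(self) -> None:
        genesis = Block(index=0, data={"note": "genesis"}, prev_hash="0" * 64)
        genesis.block_hash = genesis.compute_hash()
        self.chain = [genesis]

    def append(self, data: dict) -> Block:
        prev = self.chain[-1]
        block = Block(index=prev.index + 1, data=data, prev_hash=prev.block_hash)
        block.block_hash = block.compute_hash()
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        # An auditor recomputes every hash and checks every link.
        if self.chain[0].block_hash != self.chain[0].compute_hash():
            return False
        for prev, block in zip(self.chain, self.chain[1:]):
            if block.prev_hash != prev.block_hash:
                return False
            if block.block_hash != block.compute_hash():
                return False
        return True


ledger = Ledger()
ledger.append({"payment_id": "P-001", "amount": 100})  # hypothetical records
ledger.append({"payment_id": "P-002", "amount": 250})
print(ledger.is_valid())  # True

ledger.chain[1].data["amount"] = 1000  # quietly alter a past record
print(ledger.is_valid())  # False: the stored hash no longer matches the contents
```

Re-hashing the altered block does not help, because the next block still points at the old hash; on a real network an attacker would also have to overwrite the copies held by other nodes.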
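The append-not-amend pattern can be sketched in a few lines of Python. The `AppendOnlyLog` class below is a hypothetical in-memory stand-in for records written to a chain, and the keys and values are illustrative. Corrections are new entries, the current value is the most recent entry for a key, and the full history, including the original mistake, stays available for audit.

```python
from typing import Any


class AppendOnlyLog:
    """Hypothetical log: records are only ever appended, never edited in place."""

    def __init__(self) -> None:
        self._entries: list[dict[str, Any]] = []

    def record(self, key: str, value: Any, note: str = "") -> None:
        # A correction is simply another appended entry for the same key.
        self._entries.append(
            {"seq": len(self._entries), "key": key, "value": value, "note": note}
        )

    def current(self, key: str) -> Any:
        # The "most recent layer of truth": the last value appended for this key.
        for entry in reversed(self._entries):
            if entry["key"] == key:
                return entry["value"]
        raise KeyError(key)

    def history(self, key: str) -> list[dict[str, Any]]:
        # Every earlier value, including mistakes, remains available for audit.
        return [entry for entry in self._entries if entry["key"] == key]


log = AppendOnlyLog()
log.record("site_A/trainees", 120)
log.record("site_A/trainees", 112, note="correction: duplicate registrations removed")
print(log.current("site_A/trainees"))       # 112
print(len(log.history("site_A/trainees")))  # 2
```

In adaptive management terms, each course correction becomes a new, dated entry rather than an overwrite, which is what preserves the record of what did and did not work.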
What are the first steps an organization should take when deciding on whether to adopt a blockchain solution?
Each problem that an organization faces is unique, but the following simple steps can help one make a decision:
Identify your problem (using tools such as Developmental Evaluation or Principles of Digital Development)
Understand the blockchain technology, concepts, functionality, requirements and cost
See if your problem can be solved by blockchain rather than a centralized database
Consider the advantages and disadvantages
Identify the right provider and work with them in developing the blockchain
Consider ethical principles and privacy concerns as well as other social inequalities
Deploy in pilot phases and evaluate the results using an agile approach
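The steps about comparing blockchain with a centralized database are often framed as a short series of yes/no questions. The sketch below encodes one such series in Python. The questions and the `suggest_ledger` function are illustrative assumptions, loosely modeled on widely circulated “do you need a blockchain?” flowcharts; they are not the criteria in the USAID Blockchain Primer, and the output is a starting point for discussion rather than a decision.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical yes/no answers about a use case (illustrative questions only)."""
    needs_shared_record: bool             # Must records be kept and shared at all?
    multiple_writers: bool                # Do several independent parties write to it?
    trusted_intermediary_available: bool  # Is there an operator all parties already trust?
    writers_known: bool                   # Are all writers identifiable in advance?


def suggest_ledger(case: UseCase) -> str:
    """Return a rough starting suggestion, not a decision."""
    if not case.needs_shared_record or not case.multiple_writers:
        # One writer (or no shared record) is what ordinary databases are built for.
        return "centralized database"
    if case.trusted_intermediary_available:
        # A trusted operator can run a conventional database for everyone.
        return "centralized database run by the trusted party"
    if case.writers_known:
        return "permissioned blockchain (candidate; pilot first)"
    return "permissionless blockchain (candidate; pilot first)"


# Example: several NGOs and a ministry share payment records, no single party
# is trusted by all, and every participant is known (hypothetical scenario).
print(suggest_ledger(UseCase(True, True, False, True)))
# permissioned blockchain (candidate; pilot first)
```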
What can be done to protect PII and other sensitive information on a blockchain?
Blockchain uses cryptography to secure its data, so PII and other information can be protected from anyone who does not hold the ‘keys’. While developing a blockchain, it’s important to ensure that what goes in is protected and that access to it is regulated. Another critical step is promoting literacy among stakeholders on the use of blockchain and its features.
References Correlated to Takeaways
This table organizes current reference materials according to the main questions we discussed in the workshop. (The question is in the left-hand column, and the reference material, with a brief explanation and hyperlink, is in the right-hand column.)
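One common way to keep raw PII off a shared ledger altogether is to record only a keyed pseudonym. The sketch below swaps in that technique (HMAC-SHA256 pseudonymization from Python’s standard library) rather than the encryption scheme any particular platform uses; the `pseudonymize` and `matches` helpers and the beneficiary record are hypothetical. Only holders of the secret key can link a token back to a known person, and because names and birth dates are guessable, the key must stay secret and off-chain.

```python
import hashlib
import hmac
import secrets


def pseudonymize(pii: str, key: bytes) -> str:
    # Keyed hash (HMAC-SHA256): the same person always maps to the same token,
    # but without the key the token cannot be recomputed or linked to the PII.
    normalized = pii.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def matches(pii: str, token: str, key: bytes) -> bool:
    # Key holders (e.g., authorized M&E staff) can check a record against a known person.
    return hmac.compare_digest(pseudonymize(pii, key), token)


key = secrets.token_bytes(32)  # kept off-chain, by authorized staff only
token = pseudonymize("Jane Doe, 1985-03-02", key)  # hypothetical beneficiary
on_chain_record = {"beneficiary": token, "training": "T-017"}  # no raw PII stored
print(matches(" jane doe, 1985-03-02", token, key))  # True
```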
Resources and Considerations
Who are the main blockchain platforms? Who are the providers and what are they offering?
IBM, ConsenSys, Microsoft, AWS, Cognizant, R3, and others are offering products and enterprise solutions.
Block by Block is a valuable comparison tool for assessing various platforms.
How does governance of the blockchain influence its sustainability?
See the Beeck Center’s Blockchain Ethical Design Framework. Decentralization (how many nodes), equity among nodes, rules, and transparency are all factors in long-term sustainability. Likewise, the Principles for Digital Development have substantial evidence behind their contributions to sustainability.
How do the blockchain, big data and Artificial Intelligence influence each other?
They can be combined in various ways to strengthen a particular service or product. There is no blanket approach, just as there is no blanket solution to any social impact problem. The key is to know the root cause of the problem at hand and how each tool, used separately or in conjunction, can address that root cause.
Given its immutability, how can an adaptive management system work with the blockchain?
Ask how mistakes are corrected when creating a customized solution or purchasing a product. Usually there will be a way to do so through an easy-to-use user interface.
What are the first steps an organization should take when they are deciding on whether to adopt a blockchain solution?
Participate in demos and test some of the solutions for your own purposes or use cases. Use the USAID Blockchain Primer and reach out to trusted experts for advice. Because most blockchain code is open source, once you have decided that a blockchain is a viable solution for your problem, you will find plenty of open source code on GitHub that you can modify for your own purposes.
by Zach Tilton, a Peacebuilding Evaluation Consultant and a Doctoral Research Associate at the Interdisciplinary PhD in Evaluation program at Western Michigan University.
In 2013, Dan Ariely quipped, “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it….” In 2015 the metaphor was imported into the international development sector by Ben Ramalingam; in 2016 it became a MERL Tech DC lightning talk; and it has been ringing in our ears ever since. So, what about 2018? Well, unlike US national trends in teenage sex, there are some signals that big, or at least ‘bigger’, data is continuing to make its way not only into the realm of digital development but also into evaluation. I recently attended the 2018 MERL Tech DC pre-conference workshop Big Data and Evaluation, where participants were introduced to real ways practitioners are putting this trope to bed (sorry, not sorry). In this blog post I share some key conversations from the workshop, framed against the ethics of using this new technology, but first let me provide some background.
I entered the workshop on my heels. Given the recent spate of security breaches and revelations about micro-targeting, ‘Big Data’ has been something of a bogeyman for me and others. I have taken some pains to limit my digital data footprint, have written passionately about big data and surveillance capitalism, and have long been skeptical of big data applications for serving marginalized populations in digital development and peacebuilding. As I found my seat before the workshop started, I thought, “Is it appropriate or ethical to use big data for development evaluation?” My mind caught hold of a 2008 Evaluation Café debate between evaluation giants Michael Scriven and Tom Cook on causal inference in evaluation and the ethics of Randomized Controlled Trials. After hearing Scriven’s concerns about the ethics of withholding interventions from control groups, Cook asks, “But what about the ethics of not doing randomized experiments?” He continues, “What about the ethics of having causal information that is in fact based on weaker evidence and is wrong? When this happens, you carry on for years and years with practices that don’t work whose warrant lies in studies that are logically weaker than experiments provide.”
While I sided with Scriven for most of that debate, this question haunted me. It reminded me of an explanation of structural violence by peace researcher Johan Galtung, who writes, “If a person died from tuberculosis in the eighteenth century it would be hard to conceive of this as violence since it might have been quite unavoidable, but if he dies from it today, despite all the medical resources in the world, then violence is present according to our definition.” Galtung’s intellectual work on violence deals with the difference between the potential and the actual, and with what increases that difference. While there are real issues with data responsibility, algorithmic bias, and automated discrimination that need to be addressed, if there are actually existing technologies and resources that are not being used to address social and material inequities in the world today, is failing to use them unethical, even violent? “What about the ethics of not using big data?” I asked myself in turn. The following are highlights of the actually existing resources for using big data in the evaluation of social amelioration.
Actually Existing Data
During the workshop, Kerry Bruce from Social Impact shared her personal mantra with participants: “We need to do a better job of secondary data analysis before we collect any more primary data.” She challenged us to consider how to make use of the secondary data available to our organizations. She gave examples of potential big data sources such as satellite images, remote sensors, GPS location data, social media, internet searches, call-in radio programs, biometrics, administrative data, and integrated data platforms that merge many secondary data files, such as public records and social service agency and client files. The key here is that there is a ton of actually existing data, much of it collected passively, digitally, and longitudinally. She noted real limitations to accessing existing secondary data, including donor reluctance to fund such work, limited training in appropriate methodologies among research teams, and differences in data availability between contexts. Still, to underscore the potential of secondary data, she shared a case study in which she led a team that used large amounts of secondary, indirect data to identify ecosystems of modern-day slavery at a significantly lower cost than collecting the data first-hand. The outputs of this work will help pinpoint interventions and guide further research into the factors that may help predict and prescribe what works for stopping people from becoming victims of slavery.
Actually Existing Tech (and Math)
Peter York from BCT Partners provided a primer on big data and data science, including the reality check that most of the work is the unsexy “ETL”: the extraction, transformation, and loading of data. He contextualized the potential of the so-called big data revolution by reminding participants that the V’s of big data (Velocity, Volume, and Variety) are made possible by the technological and social infrastructure of increasingly networked populations, and that these digital connections enable the monitoring, capturing, and tracking of ever more aspects of our lives in an unprecedented way. He shared, “A lot of what we’ve done in research were hacks because we couldn’t reach entire populations.” With advances in the tech stacks and infrastructure that connect people and their internet-connected devices with each other and the cloud, the utility of inferential statistics and experimental design lessens when entire populations of users are producing observational behavior data. When this occurs, evaluators can apply machine learning to discover the naturally occurring experiments in big data sets, an approach Peter terms ‘Data-driven Quasi-Experimental Design.’ This is exactly what Peter does when he builds causal models that predict and prescribe better programs for child welfare and juvenile justice, automating outcome evaluation and taking cues from precision medicine.
One example of a naturally occurring experiment was the 1854 Broad Street cholera outbreak, in which physician John Snow used a dot map to identify a pattern that revealed the source of the outbreak: the Broad Street water pump. By finding patterns in the data, John Snow laid the groundwork for rejecting the false miasma theory and replacing it with a prototypical germ theory. And although he was already skeptical of miasma theory, by using the data to inform his theory-building he was also practicing an early form of grounded theory. Grounded theory is simply building theory inductively, after data collection and analysis rather than before, so that the resulting theory is grounded in data. Peter explained, “Machine learning is Grounded Theory on steroids. Once we’ve built the theory, found the pattern by machine learning, we can go back and let the machine learning test the theory.” In effect, machine learning is like having a million John Snows poring over data to find the naturally occurring experiments or patterns in the maps of reality that are big data.
A key aspect of the value of applying machine learning to big data is that patterns more readily present themselves in datasets that are ‘wide’ as opposed to ‘tall.’ Peter continued, “If you are used to datasets you are thinking in rows. However, traditional statistical models break down with more features, or more columns.” So Peter, and evaluators like him who apply data science to their evaluative practice, are evolving from traditional Frequentist to Bayesian statistical approaches. While there is more to the distinction, the latter uses prior knowledge, or degrees of belief, to determine the probability of success, whereas the former does not. This distinction is significant for evaluators who want to move beyond predictive correlation to prescriptive evaluation. Peter expounded, “Prescriptive analytics is figuring out what will best work for each case or situation.” For example, with prediction we can state that a foster child with certain attributes is 70% likely not to find a permanent home. Using the same data points, prescriptive analytics lets us find 30 children who are similar to that foster child and learn what they did to find a permanent home. In a way, relying only on predictive analytics can lead us to surrender, while including prescriptive analytics can lead us to endeavor.
The last category of existing resources for applying big data to evaluation was mostly captured by the comments of independent evaluation consultant Michael Bamberger. He spoke of the latent capacity that exists among evaluation professionals and teams, and noted that we are not taking full advantage of big data: “Big data is being used by development agencies, but less by evaluators in these agencies. Evaluators don’t use big data, so there is a big gap.”
He outlined two scenarios for the future of evaluation in this new wave of data analytics: a state of divergence, where evaluators are replaced by big data analysts, and a state of convergence, where evaluators develop literacy in the principles of big data for their evaluative practice. One problematic consideration with this hypothetical, as Peter York noted, is that many data scientists are not interested in causation. To move toward a future of convergence, Michael shared how big data can enhance the evaluation cycle, from appraisal and planning through monitoring, reporting, and evaluating sustainability. He went on to share a series of caveat emptor warnings covering extractive versus inclusive uses of big data, the fallacy of large numbers, data quality control, and differing perspectives on theory, each of which could warrant its own blog post for development evaluation.
While I deepened my basic understanding of data analytics, including its tools and techniques, benefits and challenges, and guidelines for big data and evaluation, my biggest takeaway is a reconsideration of big data for social good in light of the ethical dilemma of not using existing data, tech, and capacity to improve development programs, possibly even prescribing specific interventions by identifying their probable efficacy through predictive models before they are deployed.
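Since most of the work is ETL, a toy pipeline shows what the three steps mean in practice. This sketch uses only Python’s standard library (`csv`, `sqlite3`); the column names, values, and cleaning rules are hypothetical, chosen to show typical transformations such as trimming whitespace, turning “NA” into a missing value, and normalizing inconsistently typed labels.

```python
import csv
import io
import sqlite3

# Extract: in practice this might be an export from a case-management system;
# here a small inline CSV stands in for it (hypothetical columns and values).
RAW_CSV = """client_id,intake_date,age,exit_status
101,2018-01-15, 16 ,Permanent home
102,2018-02-03,NA,still in care
103,2018-02-20,12,PERMANENT HOME
"""


def extract(text: str) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple]:
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        cleaned.append((
            int(row["client_id"]),
            row["intake_date"],
            int(age) if age.isdigit() else None,  # "NA" becomes NULL
            row["exit_status"].strip().lower() == "permanent home",  # normalize labels
        ))
    return cleaned


def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clients ("
        "client_id INTEGER PRIMARY KEY, intake_date TEXT, "
        "age INTEGER, permanent_home INTEGER)"
    )
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*), SUM(permanent_home) FROM clients").fetchone())  # (3, 2)
```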
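The pattern Snow found can be sketched as assigning each death to its nearest pump and counting. The coordinates below are hypothetical points on an arbitrary grid, not Snow’s 1854 data, and straight-line distance is a simplifying assumption (walking distance along streets would be more faithful). The point is only that a simple grouping lets the pattern surface from the data before any theory is imposed.

```python
import math
from collections import Counter

# Hypothetical coordinates on an arbitrary grid; NOT John Snow's actual 1854 data.
PUMPS = {
    "Broad St": (0.0, 0.0),
    "Rupert St": (-3.0, -1.0),
    "Bridle Ln": (2.5, -2.5),
}


def nearest_pump(point, pumps):
    """Name of the pump closest to a point (straight-line distance)."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


def deaths_by_nearest_pump(deaths, pumps):
    """Count deaths grouped by the pump nearest to each one."""
    return Counter(nearest_pump(d, pumps) for d in deaths)


deaths = [(0.2, 0.1), (-0.4, 0.3), (0.5, -0.6), (0.1, 0.8), (-2.8, -1.2), (0.3, 0.4)]
print(deaths_by_nearest_pump(deaths, PUMPS).most_common(1))  # [('Broad St', 5)]
```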
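Peter’s foster-care example of prescriptive analytics can be sketched as a nearest-neighbor lookup: find the historical cases most similar to a new case, then compare outcomes by the service those cases received. The features, services, and outcomes below are hypothetical, unscaled Euclidean distance is a simplifying assumption, and this is not BCT Partners’ actual model. Because the data are observational, the comparison suggests what has worked for similar cases; it does not by itself establish cause.

```python
import math
from collections import Counter

# Hypothetical historical cases: (features, service received, found permanent home?)
# Features (illustrative): (age in years, months in care, prior placements)
HISTORY = [
    ((12, 24, 3), "kinship placement", True),
    ((13, 30, 4), "kinship placement", True),
    ((12, 26, 2), "group home", False),
    ((14, 28, 3), "group home", False),
    ((11, 22, 3), "mentoring + foster family", True),
    ((13, 25, 3), "mentoring + foster family", False),
    ((4, 6, 0), "adoption services", True),
    ((5, 8, 1), "adoption services", True),
]


def k_nearest(target, cases, k):
    """Return the k cases whose features are closest to the target (Euclidean)."""
    return sorted(cases, key=lambda case: math.dist(target, case[0]))[:k]


def success_rate_by_service(neighbors):
    """Share of similar cases that reached a permanent home, per service received."""
    totals, successes = Counter(), Counter()
    for _features, service, success in neighbors:
        totals[service] += 1
        successes[service] += int(success)
    return {service: successes[service] / totals[service] for service in totals}


new_child = (12, 25, 3)  # hypothetical new case
rates = success_rate_by_service(k_nearest(new_child, HISTORY, k=6))
print(rates)
# {'kinship placement': 1.0, 'mentoring + foster family': 0.5, 'group home': 0.0}
print(max(rates, key=rates.get))  # kinship placement
```

With only two neighbors per service, these rates are very noisy; in practice one would scale the features, use far more cases (Peter’s example uses 30 similar children), and check how comparable the matched groups really are.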
The MERL Tech Conference explores the intersection of Monitoring, Evaluation, Research and Learning (MERL) and technology. The main goals of “MERL Tech” as an initiative are to:
Transform and modernize MERL in an intentionally responsible and inclusive way
Promote ethical and appropriate use of tech (for MERL and more broadly)
Encourage diversity & inclusion in the sector & its approaches
Improve development, tech, data & MERL literacy
Build/strengthen community, convene, help people talk to each other
Help people find and use evidence & good practices
Provide a platform for hard and honest talks about MERL and tech and the wider sector
Spot trends and future-scope for the sector
Our fifth MERL Tech DC conference took place on September 6-7, 2018, with a day of pre-workshops on September 5th. Some 300 people from 160 organizations joined us for the two days, and another 70 people attended the pre-workshops.
Attendees came from a wide range of professions and disciplines.
An unofficial estimate on speaker racial and gender diversity is here.
Building bridges, connections, community, and capacity
Sharing experiences, examples, challenges, and good practice
Strengthening the evidence base on MERL Tech and ICT4D approaches
Facing our challenges and shortcomings
Exploring the future of MERL
As always, sessions were related to: technology for MERL, MERL of ICT4D and Digital Development programs, MERL of MERL Tech, digital data for adaptive decisions/management, ethical and responsible data approaches and cross-disciplinary community building.
Sessions included plenaries, lightning talks and breakout sessions. You can find a list of sessions here, including any presentations that have been shared by speakers and session leads. (Go to the agenda and click on the session of interest. If we have received a copy of the presentation, there will be a link to it in the session description).
One topic that we explored more in-depth over the two days was the need to get better at measuring ourselves and understanding both the impact of technology on MERL (the MERL of MERL Tech) and the impact of technology overall on development and societies.
As Anahi Ayala Iacucci said in her opening talk, “let’s think less about what technology can do for development, and more about what technology does to development.” As another person put it, “We assume that access to tech is a good thing and immediately helps development outcomes, but do we have evidence of that?”
Some 17.5% of participants filled out our post-conference feedback survey, and 70% of them rated their experience either “awesome” or “good”. Another 7% of participants rated individual sessions through the “Sched” app, with an average session satisfaction rating of 8.8 out of 10.
Topics that survey respondents suggested for next time include: more basic tracks and more advanced tracks, more sessions relating to ethics and responsible data and a greater focus on accountability in the sector. Read the full Feedback Report here!
What’s next? State of the Field Research!
In order to arrive at an updated sense of where the field of technology-enabled MERL is, a small team of us is planning to conduct some research over the next year. At our opening session, we did a little crowdsourcing to gather input and ideas about what the most pressing questions are for the “MERL Tech” sector.
We’ll be keeping you informed here on the blog about this research and welcome any further input or support! We’ll also be sharing more about individual sessions here.