
3 Lessons Learned using Machine Learning to Measure Media Quality

by Samhir Vasdev, Technical Adviser for Digital Development at IREX’s Center for Applied Learning and Impact. This post originally appeared on ICTworks.

Moving from hype to practice is an important but challenging step for ICT4D practitioners. As the technical adviser for digital development at IREX, a global development and education organization, I’ve been watching with cautious optimism as international development stakeholders begin to explore how artificial intelligence tools like machine learning can help them address problems and introduce efficiencies to amplify their impact.

So while USAID was developing their guide to making machine learning work for international development and TechChange rolled out their new course on Artificial Intelligence for International Development, we spent a few months this summer exploring whether we could put machine learning to work to measure media quality.

Of course, we didn’t turn to machine learning just for the sake of contributing to the “breathless commentary of ML proponents” (as USAID aptly puts it).

As we shared in a session with our artificial intelligence partner Lore at MERLTech DC 2018, some of our programs face a very real set of problems that could be alleviated through smarter use of digital tools.

Our Machine Learning Experiment

In our USAID-funded Media Strengthening Program in Mozambique, for example, a small team of human evaluators manually score thousands of news articles based on 18 measures of media quality.

This process is time-consuming (some evaluators spend up to four hours a day reading and evaluating articles), inefficient (when staff turn over, we need to reinvest resources to train new hires), and inconsistent (even well-trained evaluators might score articles differently).

To test whether we can make the process of measuring media quality less resource-intensive, we spent a few months training software to automatically detect one of these 18 measures of media quality: whether journalists keep their own opinions out of their news articles. The results of this experiment are very compelling:

  • The software had 95% accuracy in recognizing sentences containing opinions within the dataset of 1,200 articles.
  • The software’s ability to “learn” was evident. Anecdotally, the evaluation team noticed a marked improvement in the accuracy of the software’s suggestions after showing it only twenty sentences that had opinions. The accuracy, precision, and recall results highlighted above were achieved after only sixteen rounds of training the software.
  • Accuracy and precision increased the more that the model was trained. There is a clear relationship between the number of times the evaluators trained the software and the accuracy and precision of the results. The recall results did not improve over time as consistently.

These results, although promising, simplify some numbers and calculations. Check out our full report for details.
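This post doesn’t describe how the software works under the hood (it was built by our partner Lore), but for readers who want a feel for the general approach, here is a minimal sketch of a supervised sentence classifier trained on hand-labeled examples. Everything in it, from the example sentences to the choice of model, is an illustrative assumption rather than a description of the actual tool.

```python
# Illustrative sketch only: a supervised sentence classifier trained on
# hand-labeled examples. The system described above was built by Lore and
# may work quite differently; the sentences and labels here are invented.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 1 = sentence contains the journalist's opinion, 0 = it does not
sentences = [
    "The minister announced the new budget on Tuesday.",
    "Frankly, this budget is a disaster for ordinary citizens.",
    "Police reported three arrests after the protest.",
    "It is obvious the council has no interest in transparency.",
]
labels = [0, 1, 0, 1]

# TF-IDF features plus logistic regression: a common baseline for text classification
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(sentences, labels)

# Score new, unlabeled sentences from an article
new_sentences = [
    "The election results were published this morning.",
    "Clearly, the mayor's plan will never work.",
]
for sentence, prediction in zip(new_sentences, model.predict(new_sentences)):
    print(prediction, sentence)
```

A production system would of course be trained on thousands of labeled sentences and evaluated with the accuracy, precision, and recall measures discussed below.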

What does this all mean? Let’s start with the good news. The results suggest that some parts of media quality—specifically, whether an article is impartial or whether it echoes its author’s opinions—can be automatically measured by machine learning.

The software also introduces the possibility of unprecedented scale, scanning thousands of articles in seconds for this specific indicator. Together, these capabilities point to ways media support programs could spend their limited resources more efficiently.

3 Lessons Learned from using Machine Learning

Of course, the machine learning experience was not without problems. With any cutting-edge technology, there will be lessons we can learn and share to improve everyone’s experience. Here are our three lessons learned working with machine learning:

1. Forget about being tech-literate; we need to be more problem-literate.

Defining a coherent, specific, actionable problem statement was one of the most important steps of this experiment. This wasn’t easy. Hard trade-offs had to be made (Which of 18 indicators should we focus on?), and we had to focus on things we could measure in order to demonstrate efficiency gains using this new approach (How much time do evaluators currently spend scoring articles?).

When planning your own machine learning project, devote plenty of time at the outset—together with your technology partner—to define the specific problem you’ll try to address. These conversations result in a deeper shared understanding of both the sector and the technology that will make the experiment more successful.

2. Take the time to communicate results effectively.

Since completing the experiment, people have asked me to explain how “accurate” the software is. But in practice, machine learning software uses different methods to define “accuracy”, which in turn can vary according to the specific model (the software we used deploys several models).

What starts off as a simple question (How accurate is our software?) can easily turn into a discussion of related concepts like precision, recall, false positives, and false negatives. We found that producing clean visuals (like this or this) became the most effective way to explain our results.
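To make those terms concrete, the short example below (with made-up labels and predictions) shows how accuracy, precision, and recall are all derived from the same confusion matrix, and why a single accuracy figure can mask quite different precision and recall.

```python
# Made-up labels and predictions, just to show how the three metrics relate.
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# 1 = sentence truly contains / was flagged as containing an opinion, 0 = not
actual    = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print("false positives:", fp, "false negatives:", fn)
print("accuracy: ", accuracy_score(actual, predicted))    # share of all sentences classified correctly
print("precision:", precision_score(actual, predicted))   # of flagged sentences, how many truly contain opinions
print("recall:   ", recall_score(actual, predicted))      # of true opinion sentences, how many were flagged
```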

3. Start small and manage expectations.

Stakeholders with even a passing awareness of machine learning will be aware of its hype. Even now, some colleagues ask me how we “automated the entire media quality assessment process”—even though we only used machine learning to identify one of 18 indicators of media quality. To help mitigate inflated expectations, we invested a small amount into this “minimum viable product” (MVP) to prove the fundamental concept before expanding on it later.

Approaching your first machine learning project this way might help to keep expectations in line with reality, minimize risks associated with experimentation, and provide air cover for you to adjust your scope as you discover limitations or adjacent opportunities during the process.

How does GlobalGiving tell whether it’s having an impact?

by Nick Hamlin, Data Scientist at GlobalGiving. This post was originally published here on October 1, 2018, titled “How Can We Tell if GlobalGiving is Making an Impact.” The full study can be found here.

Our team wanted to evaluate our impact, so we applied a new framework to find answers.


What We Tested

Every social organization, GlobalGiving included, needs to know if it’s having an impact on the communities it serves. For us, that means understanding the ways in which we are (or aren’t!) helping our nonprofit partners around the world improve their own effectiveness and capacity to create change, regardless of the type of work they do.

Why It Matters

Without this knowledge, social organizations can’t make informed decisions about the strategies to use to deliver their services. Unfortunately, this kind of rigorous impact evaluation is usually quite expensive and can take years to carry out. As a result, most organizations struggle to evaluate their impact.

We knew the challenges going into our own impact research would be substantial, but it was too important for us not to try.

The Big Question

Do organizations with access to GlobalGiving’s services improve their performance differently than organizations that don’t? Are there particular focus areas where GlobalGiving is having more of an impact than others?

Our Method

Ideally, we’d randomly assign certain organizations to receive the “treatment” of being part of GlobalGiving and then compare their performance with another randomly assigned control group. But, we can’t just tell random organizations that they aren’t allowed to be part of our community. So, instead we compared a treatment group—organizations that have completed the GlobalGiving vetting process and become full partners on the website—with a control group of organizations that have successfully passed the vetting process but haven’t joined the web community. Since we can’t choose these groups randomly, we had to ensure the organizations in each group are as similar as possible so that our results aren’t biased by underlying differences between the control and treatment groups.

To do this, we worked only with organizations based in India. We chose India because we have lots of relationships with organizations there, and we needed as large a sample size as possible to increase confidence that our conclusions are reliable. India is also well-suited for this study because it requires organizations to have special permission to receive funds from overseas under the Foreign Contribution Regulation Act (FCRA). Organizations must have strong operations in place to earn this permission. The fact that all participant organizations are established enough to earn both an FCRA certification and pass GlobalGiving’s own high vetting standards means that any differences in our results are unlikely to be caused by geographic or quality differences.

We also needed a way to measure nonprofit performance in a concrete way. For this, we used the “Organizational Performance Index” (OPI) framework created by Pact. The OPI provides a structured way to understand a nonprofit’s capacity along eight different categories, including its ability to deliver programs, the diversity of its funding sources, and its use of community feedback. The OPI scores organizations on a scale of 1 (lowest) to 4 (highest). With the help of a fantastic team of volunteers in India, we gathered two years of OPI data from both the treatment and control groups, then compared how their scores changed over time to get an initial indicator of GlobalGiving’s impact.
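For readers who want to try a similar comparison on their own data, here is a rough sketch using invented scores and a Mann-Whitney U test on year-over-year changes. The column names and the choice of test are illustrative assumptions, not necessarily the methodology used in the full study.

```python
# Hypothetical data and a simple comparison of OPI score changes by group.
# This is an illustration, not the statistical approach used in the full study.
import pandas as pd
from scipy.stats import mannwhitneyu

df = pd.DataFrame({
    "org_id":      [1, 2, 3, 4, 5, 6],
    "group":       ["treatment", "treatment", "treatment", "control", "control", "control"],
    "score_year1": [2, 3, 2, 2, 3, 1],
    "score_year2": [3, 3, 3, 2, 2, 1],
})
df["change"] = df["score_year2"] - df["score_year1"]

treatment_changes = df.loc[df["group"] == "treatment", "change"]
control_changes = df.loc[df["group"] == "control", "change"]

print(df.groupby("group")["change"].mean())               # average change per group
stat, p_value = mannwhitneyu(treatment_changes, control_changes, alternative="two-sided")
print("Mann-Whitney U:", stat, "p-value:", round(p_value, 3))
```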

The Results

The most notable result we found was that organizations that were part of GlobalGiving demonstrated significantly more participatory planning and decision-making processes (what we call “community leadership”), and improved their use of stakeholder feedback to inform their work, in comparison to control group organizations. We did not see a similar significant result in the other seven categories that the OPI tracks. The easiest way to see this result is to visualize how organizations’ scores shifted over time. The chart below shows differences in target population scores—Pact’s wording for “community leadership and feedback.”

Differences in Target Population Score Changes

For example, look at the organizations that started out with a score of two in the control group on the left. Roughly one third of those increased their score to three, one third stayed the same, and one third had their scores drop to one. In contrast, in the treatment group on the right, nearly half the organizations increased their scores and about half stayed the same, while only a tiny fraction dropped. You can see a similar pattern across the two groups regardless of their starting score.

In contrast, here’s the same diagram for another OPI category where we didn’t see a statistically significant difference between the two groups. There’s not nearly as clear a pattern—both the treatment and control organizations change their scores about the same amount.

Differences in Delivery Score Changes
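If you want to build this kind of transition view from your own data, a row-normalized crosstab of starting versus ending scores per group captures the same information as the diagrams above. The data frame below is hypothetical.

```python
# Hypothetical scores; each row is one organization observed in both years.
import pandas as pd

df = pd.DataFrame({
    "group":       ["treatment"] * 4 + ["control"] * 4,
    "score_year1": [2, 2, 3, 1, 2, 2, 3, 1],
    "score_year2": [3, 2, 3, 2, 1, 2, 3, 1],
})

# For each group: of the organizations starting at each score, what share ended at each score?
for group, sub in df.groupby("group"):
    transitions = pd.crosstab(sub["score_year1"], sub["score_year2"], normalize="index")
    print(f"\n{group}")
    print(transitions.round(2))
```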

For more technical details about our research design process, our statistical methodology, and the conclusions we’ve drawn, please check out the full write-up of this work, which is available on the Social Science Research Network.

The Ultimate Outcome

GlobalGiving spends lots of time focusing on helping organizations use feedback to become more community-led, because we believe that’s what delivers greater impact.

Our initial finding—that our emphasis on feedback is having a measurable impact—is an encouraging sign.

On the other hand, we didn’t see that GlobalGiving was driving significant changes in any of the other seven OPI categories. Some of these categories, like adherence to national or international standards, aren’t areas where GlobalGiving focuses much. Others, like how well an organization learns over time, are closely related to what we do (Listen, Act, Learn. Repeat. is one of our core values). We’ll need to continue to explore why we’re not seeing results in these areas and, if necessary, make adjustments to our programs accordingly.

Make It Yours

Putting together an impact study, even a smaller one like this, is a major undertaking for any organization. Many organizations talk about applying a more scientific approach to their impact, but few nonprofits or funders take on the challenge of carrying out the research needed to do so. This study demonstrates how organizations can make meaningful progress towards rigorously measuring impact, even without a decade of work and an eight-figure budget.

If your organization is considering something similar, here are a few suggestions to keep in mind that we’ve learned as a result of this project:

1. If you can’t randomize, make sure you consider possible biases.

    •  Logistics, processes, and ethics are all reasons why an organization might not be able to randomly assign treatment groups. If that’s the case for you, think carefully about the rest of your design and how you’ll reduce the chance that a result you see can be attributed to a different cause.

2. Choose a measurement framework that aligns with your theory of change and is as precise as possible.

    •  We used the OPI because it was easy to understand, reliable, and well-accepted in the development sector. But, the OPI’s four-level scale made it difficult to make precise distinctions between organizations, and there were some categories that didn’t make sense in the context of how GlobalGiving works. These are areas we’ll look to improve in future versions of this work.

3. Get on the record. 

    • Creating a clear record of your study, both inside and outside your organization, is critical for avoiding “scope creep.” We used Git to keep track of all changes in our data, code, and written analysis, and shared our initial study design at the 2017 American Evaluation Association conference.

4. Enlist outside help. 

    • This study would not have been possible without lots of extra help, from our volunteer team in India, to our friends at Pact, to the economists and data scientists who checked our math, particularly Alex Hughes at UC Berkeley and Ted Dunmire at Booz Allen Hamilton.

We’re pleased about what we’ve learned about GlobalGiving’s impact, where we can improve, and how we might build on this initial work, and we can’t wait to continue to build on this progress moving forward in service of improved outcomes for our nonprofit partners worldwide.

Find the full study here.

Blockchain: the ultimate solution?

by Ricardo Santana, MERL Practitioner

I had the opportunity during MERL Tech London 2018 to attend a very interesting session discussing blockchains and how they can be applied in the MERL space. This session was led by Valentine Gandhi, Founder of The Development CAFÉ, Zara Rahman, Research and Team Lead at The Engine Room, and Wayan Vota, Co-founder of Kurante.

The first part of the session was an introduction to blockchain, which is basically a distributed ledger system. Why is it an interesting solution? Because the geographically distributed traces left across multiple devices make for a very robust and secure system. It is not possible to take a unilateral decision to scrap or eliminate data, because doing so would be reflected in the distributed constitution of the data chain. Is it possible to corrupt the system? Well, yes, but what makes it robust and secure is that for that to happen, a majority of the participants in the blockchain network would have to agree to do so.

That is the powerful innovation of the technology. It is somewhat similar to torrent technology for file sharing: it is very hard to control the data when it is stored not on a single server but on an enormous number of end-user terminals.

What I want to share from this session, however, is not how the technology works! That information is readily available on the Internet and other sources.

What I really found interesting was the part of the session where we, the professionals interested in blockchain, shared our doubts and the questions we would need to clarify in order to decide whether blockchain technology is required or not.

Some of the most interesting shared doubts and concerns around this technology were:

What sources of training and other useful resources are available if you want to implement blockchain?

  • Say the organization or leadership team decides that a blockchain is required for the solution. I am pretty sure it is not hard to find information about blockchain on the Internet, but we all face the same problem: the enormous amount of information available makes it tricky to reach the holy grail that provides just enough information without losing hours to desktop research. It would be incredibly beneficial to have a suggested place where this information can be found, even more so if it were a specialized guide aimed at the MERL space.

What are the data space constraints?

  • I found this question very important. It is a key aspect of the design and scalability of the solution. I assume the amount of data will not be large, but I really don’t know. And maybe it is not a significant amount of information for a desktop or a laptop, but what if we are using cell phones as end terminals too? This needs to be addressed so that the design is based on facts and not assumptions.

Use cases.

  • Again, there are probably a lot of them to be found all over the Internet, but they are hardly going to be insightful for a specific MERL approach. Is it possible to have a repository of relevant cases for the MERL space?

When is blockchain really required?

  • It would be really helpful to have a simple guide that helps any professional clarify whether the volume or importance of the information is worth the implementation of a Blockchain system or not.

Is there a right to be forgotten in Blockchain?

  • Recent events give special relevance to this question. Blockchains are very powerful for achieving traceability, but what if I want my information to be eliminated because that is simply my right? This is an important aspect of technologies with a distributed logic. How can we use the powerful advantages of blockchain while respecting every individual’s right to make unilateral decisions about their private or personal information?

I am not an expert in the matter but I do recognize the importance of these questions and the hope is that the people able to address them can pick them up and provide useful answers and guidance to clarify some or all of them.

If you have answers to these questions, or more questions about blockchain and MERL, please add them in the comments!

If you’d like to be a part of discussions like this one, register to attend the next MERL Tech conference! MERL Tech Jozi is happening August 1-2, 2018 and we just opened up registration today! MERL Tech DC is coming up September 6-7. Today’s the last day to submit your session ideas, so hurry up and fill out the form if you have an idea to present or share!

 

 

MERL Tech London 2018 Agenda is out!

We’ve been working hard over the past several weeks to finish up the agenda for MERL Tech London 2018, and it’s now ready!

We’ve got workshops, panels, discussions, case studies, lightning talks, demos, community building, socializing, and an evening reception with a Fail Fest!

Topics range from mobile data collection, to organizational capacity, to learning and good practice for information systems, to data science approaches, to qualitative methods using mobile ethnography and video, to biometrics and blockchain, to data ethics and privacy and more.

You can search the agenda to find the topics, themes, and tools that are most interesting, identify sessions that are most relevant to your organization’s size and approach, pick the session methodologies that you prefer (some of us like participatory and some of us like listening), and learn more about the different speakers and facilitators and their work.

Tickets are going fast, so be sure to snap yours up before it’s too late! (Register here!)

View the MERL Tech London schedule & directory.

 

DataDay TV: MERL Tech Edition

What data superpower would you ask for? How would you describe data to your grandparents? What’s the worst use of data you’ve come across? 

These are a few of the questions that TechChange’s DataDay TV Show tackles in its latest episode.

The DataDay Team (Nick Martin, Samhir Vasdev, and Priyanka Pathak) traveled to MERL Tech DC last September to ask attendees some tough data-related questions. They came away with insightful, unusual, and occasionally funny answers….

If you’re a fan of discussing data, technology and MERL, join us at MERL Tech London on March 19th and 20th. 

Tickets are going fast, so be sure to register soon if you’d like to attend!

If you want to take your learning to the next level with a full-blown course, TechChange has a great 2018 schedule, including topics like blockchain, AI, digital health, data visualization, e-learning, and more. Check out their course catalog here.

What about you, what data superpower would you ask for?

 

M&E software – 8 Tips on How to Talk to IT Folks


You want to take your M&E system one step further and introduce proper M&E software? That’s great, because software has the potential to make the monitoring process more efficient and transparent, reduce errors, and produce more accurate data. But how to go about it? You have three options:

  1. You build your own system, for example in Microsoft Excel;
  2. You purchase an M&E software package off-the-shelf;
  3. You hire an IT consultant to set up a customized M&E system according to your organization’s specific requirements.

If options one and two do not work out for you, you can hire consultants to develop a solution for you. You will probably start a public tender to find the most suitable IT company to entrust with this task. While there are a lot of things to pay attention to when formulating the Terms of Reference (TOR), I would like to give you some tips specifically about communication with the hired IT consultants. These insights come from years of experience on both sides: as the party who wants a tool and needs to describe it to the implementing programmers, and as the IT guy (or rather lady) who implements Excel and web-based database tools for M&E.

To be on the safe side, I recommend working with this assumption: IT consultants have no clue about M&E. Few IT companies come from the development sector, as energypedia consult does, and are familiar with M&E concepts such as indicators, logframes and impact chains. To still get what you need, you should pay attention to the following communication tips:

  1. Take your time explaining what you need: Writing TOR takes time – but it takes even longer and becomes more costly when you hire somebody for something that is not thought through. If you don’t know all the details right from the start, get some expert assistance in formulating terms – it’s worthwhile.
  2. Use graphs: Instead of using words to describe your monitoring logic and the system you need, it is much easier to make graphs to depict the structure, user groups, linking of information, flow of monitoring data etc.
  3. Give examples: When unsure about how to put a feature into words, send a link or a screenshot of the function that you might have come across elsewhere and wish to have in your tool.
  4. Explain concepts and terminology: Many results frameworks work with the terms “input” and “output”. Most IT guys, however, will not have equipment and finished schools in mind, but rather data flows that consist of inputs and outputs. Make sure you clarify this. Also, the term web-based or web monitoring itself is a source of misunderstanding. In the IT world, web monitoring refers to monitoring activity in the internet, for example website visits or monitoring a server. That is probably not what you want when building up an M&E system for e.g. a good governance programme.
  5. Meet in person: In your budget calculation, allow for at least one workshop where you meet in person, for example a kick-off workshop in which you clarify your requirements. This is not only a possibility to ask each other questions, but also to get a feeling of the other party’s language and way of thinking.
  6. Maintain a dialogue: During the implementation phase, make sure to stay in regular touch with the programmers. Ask them to show you updates every once in a while so you can give feedback. If you detect that the programmers are heading in the wrong direction, you want to find out sooner rather than later.
  7. Document communication: When we implement web-based systems, we typically create a page within the web platform itself that outlines all the agreed steps. This list serves as a to-do list and an implementation protocol at the same time. It facilitates communication, particularly when multiple people are involved on both sides who are not always present in all meetings or phone calls.
  8. Be prepared for misunderstandings: They happen. It’s normal. Plan for some buffer days before launching the final tool.

In general, the implementation phase should allow for some flexibility. As both parties learn from each other during the process, you should not be afraid to adjust initial plans, because the final tool will benefit greatly from it (if the contract has some flexibility). Big customized IT projects take some time.

If you need more advice on this matter and some more insights on setting up IT-based M&E systems, please feel free to contact me any time! In the past we have supported clients by setting up a prototype for their web-based M&E system with our flexible WebMo approach. During the prototype process the client learnt a lot, and afterwards it was quite easy for other developers to copy the prototype and migrate it to, for example, their Microsoft SharePoint environment (in case your IT guys don’t believe in open source or don’t want to host third-party software on their server).

Please leave your comments, if you think that I have missed an important communication rule.

Good luck!

Qualitative Coding: From Low Tech to High Tech Options

by Daniel Ramirez-Raftree, MERL Tech volunteer

In their MERL Tech DC session on qualitative coding, Charles Guedenet and Anne Laesecke from IREX together with Danielle de Garcia of Social Impact offered an introduction to the qualitative coding process followed by a hands-on demonstration on using Excel and Dedoose for coding and analyzing text.

They began by defining content analysis as any effort to make sense of qualitative data that takes a volume of qualitative material and attempts to identify core consistencies and meanings. More concretely, it is a research method that uses a set of procedures to make valid inferences from text. They also shared their thoughts on what makes for a good qualitative coding method.

In their view, it should:

  • consider what is already known about the topic being explored
  • be logically grounded in this existing knowledge
  • use existing knowledge as a basis for looking for evidence in the text being analyzed

With this definition laid out, they moved to a discussion about the coding process, where they elaborated on four general steps:

  1. develop codes and a codebook
  2. decide on a sampling plan
  3. code your data (then go back and do it again!)
  4. test for reliability

Developing codes and a codebook is important for establishing consistency in the coding process, especially if there will be multiple coders working on the data. A good way to start developing these codes is to consider what is already known. For example, you can think about literature that exists on the subject you’re studying. Alternatively, you can simply turn to the research questions the project seeks to answer and use them as a guide for creating your codes. Beyond this, it is also useful to go through the content and think about what you notice as you read. Once a codebook is created, it will lend stability and some measure of objectivity to the project.

The next important issue is the question of sampling. When determining sample size, though a larger sample will yield more robust results, one must of course consider the practical constraints of time, cost and effort. Does the benefit of higher quality results justify the additional investment? Fortunately, the type of data will often inform sampling. For example, if there is a huge volume of data, it may be impossible to analyze it all, but it would be prudent to sample at least 30% of it. On the other hand, usually interview and focus group data will all be analyzed, because otherwise the effort of obtaining the data would have gone to waste.

Regarding sampling method, session leads highlighted two strategies that produce sound results. One is systematic random sampling and the other is quota sampling–a method employed to ensure that the proportions of demographic group data are fairly represented.

Once these key decisions have been made, the actual coding can begin. Here, all coders should work from the same codebook and apply the codes to the same unit of analysis. Typical units of analysis are: single words, themes, sentences, paragraphs, and items (such as articles, images, books, or programs). Consistency is essential. A way to test the level of consistency is to have a 10% overlap in the content each coder analyzes and aim for 80% agreement between their coding of that content. If the coders are not applying the same codes to the same units this could either mean that they are not trained properly or that the code book needs to be altered.
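As a concrete illustration of that check, the sketch below computes percent agreement on a hypothetical overlap sample coded by two coders, along with Cohen’s kappa, a common chance-corrected complement (the kappa statistic was not part of the session itself).

```python
# Hypothetical codes applied by two coders to the same ten units of analysis.
from sklearn.metrics import cohen_kappa_score

coder_a = ["access", "cost", "cost", "quality", "access", "cost", "quality", "access", "cost", "quality"]
coder_b = ["access", "cost", "quality", "quality", "access", "cost", "quality", "cost", "cost", "quality"]

matches = sum(a == b for a, b in zip(coder_a, coder_b))
percent_agreement = matches / len(coder_a)
print(f"percent agreement: {percent_agreement:.0%}")            # the session's rule of thumb: aim for 80%+
print(f"Cohen's kappa:     {cohen_kappa_score(coder_a, coder_b):.2f}")
```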

Along a similar vein, the fourth step in the coding process is to test for reliability. Challenges in producing stable and consistent results in coding could include: using a unit of analysis that is too large for a simple code to be reliably applied, coding themes or concepts that are ambiguous, and coding nonverbal items. For each of these, the central problem is that the units of analysis leave too much room for subjective interpretation that can introduce bias. Having a detailed codebook can help to mitigate against this.

After giving an overview of the coding process, the session leads suggested a few possible strategies for data visualization. One is to use a word tree, which helps one look at the context in which a word appears. Another is a bubble chart, which is useful if one has descriptive data and demographic information. Thirdly, correlation maps are good for showing what sorts of relationships exist among the data. The leads suggested visiting the website stephanieevergreen.com/blog for more ideas about data visualization.

Finally, the leads covered low-tech and high-tech options for coding. On the low-tech end of the spectrum, paper and pen get the job done. They are useful when there are few data sources to analyze, when the coding is simple, and when there is limited tech literacy among the coders. Next up the scale is Excel, which works when there are few data sources and when the coders are familiar with Excel. Then the session leads closed their presentation with a demonstration of Dedoose, which is a qualitative coding tool with advanced capabilities like the capacity to code audio and video files and specialized visualization tools. In addition to Dedoose, the presenters mentioned Nvivo and Atlas as other available qualitative coding software.

Despite the range of qualitative content available for analysis, a few core principles can help ensure that it is analyzed well; these include consistency and disciplined methodology. And if qualitative coding will be an ongoing part of your organization’s operations, there are several options for specialized software available for you to explore. [Click here for links and additional resources from the session.]

Data quality in the age of lean data

by Daniel Ramirez-Raftree, MERL Tech support team.

Evolving data collection methods call for evolving quality assurance methods. In their session titled Data Quality in the Age of Lean Data, Sam Schueth of Intermedia, Woubedle Alemayehu of Oxford Policy Management, Julie Peachey of the Progress out of Poverty Index, and Christina Villella of MEASURE Evaluation discussed problems, solutions, and ethics related to digital data collection methods. [Bios and background materials here]

Sam opened the conversation by comparing the quality assurance and control challenges in paper assisted personal interviewing (PAPI) to those in digital assisted personal interviewing (DAPI). Across both methods, the fundamental problem is that the data that is delivered is a black box. It comes in, it’s turned into numbers and it’s disseminated, but in this process alone there is no easily apparent information about what actually happened on the ground.

During the age of PAPI, this was dealt with by sending independent quality control teams to the field to review the paper questionnaire that was administered and perform spot checks by visiting random homes to validate data accuracy. Under DAPI, the quality control process becomes remote. Survey administrators can now schedule survey sessions to be recorded automatically and without the interviewer’s knowledge, thus effectively gathering a random sample of interviews that can give them a sense of how well the sessions were conducted. Additionally, it is now possible to use GPS to track the interviewers’ movements and verify the range of households visited. The key point here is that with some creativity, new technological capacities can be used to ensure higher data quality.

Woubedle presented next and elaborated on the theme of quality control for DAPI. She brought up the point that data quality checks can be automated, but that this requires pre-implementation decisions about which indicators to monitor and how to manage the data. The amount of work put into programming this upfront design has a direct bearing on the ultimate data quality.

One useful tool is a progress indicator. Here, one collects information on trends such as the number of surveys attempted compared to those completed. Processing this data could lead to further questions about whether there is a pattern in the populations that did or did not complete the survey, thus alerting researchers to potential bias. Additionally, one can calculate the average time taken to complete a survey and use it to identify outliers that took too little or too long to finish. Another good practice is to embed consistency checks in the survey itself; for example, making certain questions required or including two questions that, if answered in a particular way, would be logically contradictory, thus signaling a problem in either the question design or the survey responses. One more practice could be to apply constraints to the survey, depending on the households one is working with.
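A minimal sketch of what such automated checks could look like on an export of digitally collected interviews appears below. The column names, thresholds, and the consistency rule are all illustrative assumptions, not a prescribed standard.

```python
# Illustrative quality checks on a hypothetical export of digital interviews.
import pandas as pd

surveys = pd.DataFrame({
    "interviewer":      ["A", "A", "B", "B", "B", "C"],
    "completed":        [True, True, True, False, True, True],
    "duration_minutes": [34, 31, 8, 12, 36, 95],
    "household_size":   [4, 3, 2, None, 5, 6],
    "num_children":     [2, 1, 5, None, 2, 1],
})

# Progress indicator: completion rate per interviewer
print(surveys.groupby("interviewer")["completed"].mean())

# Outlier check: completed interviews far shorter or longer than the median duration
completed = surveys[surveys["completed"]]
median_minutes = completed["duration_minutes"].median()
outliers = completed[(completed["duration_minutes"] < 0.5 * median_minutes) |
                     (completed["duration_minutes"] > 2.0 * median_minutes)]
print(outliers)

# Consistency check: more children than household members is logically contradictory
print(surveys[surveys["num_children"] > surveys["household_size"]])
```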

After this discussion, Julie spoke about research that was done to assess the quality of different methods for measuring the Progress out of Poverty Index (PPI). She began by explaining that the PPI is a household level poverty measurement tool unique to each country. To create it, the answers to 10 questions about a household’s characteristics and asset ownership are scored to compute the likelihood that the household is living below the poverty line. It is a simple, yet effective method to evaluate household level poverty. The research project Julie described set out to determine if the process of collecting data to create the PPI could be made less expensive by using SMS, IVR or phone calls.
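For context on the scoring mechanic Julie described: each of the ten answers earns a number of points, and the total is mapped to a poverty likelihood through a country-specific lookup table. The tiny sketch below only illustrates that mechanic; every point value and likelihood in it is invented rather than taken from a real PPI scorecard.

```python
# Invented point values and likelihoods, purely to illustrate the PPI mechanic.
def poverty_likelihood(answer_points, lookup):
    score = sum(answer_points)               # total scorecard points for the household
    for threshold, likelihood in lookup:     # lookup rows sorted by score threshold
        if score <= threshold:
            return likelihood
    return 0.0

hypothetical_lookup = [(20, 0.80), (40, 0.55), (60, 0.30), (80, 0.12), (100, 0.03)]
answers = [4, 0, 8, 6, 2, 9, 5, 3, 7, 1]     # points for each of the 10 questions
print(poverty_likelihood(answers, hypothetical_lookup))   # -> 0.30 for a score of 45
```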

Grameen Foundation conducted the study and tested four survey methods for gathering data: 1) in-person and at home, 2) in-person and away from home, 3) in-person and over the phone, and 4) automated and over the phone. Further, it randomized key aspects of the study, including the interview method and the enumerator.

Ultimately, Grameen Foundation determined that the interview method does affect completion rates, responses to questions, and the resulting estimated poverty rates. However, the differences in estimated poverty rates were likely not due to the method itself, but rather to completion rates (which were affected by the method). Thus, as long as completion rates don’t differ significantly, neither will the results. Given that the in-person at home and in-person away from home surveys had similar completion rates (84% and 91% respectively), either could feasibly be used with little deviation in output. On the other hand, in-person over the phone surveys had a 60% completion rate and automated over the phone surveys had a 12% completion rate, making both methods fairly problematic. With this understanding, developers of the PPI have an evidence-based sense of the quality of their data.

This case study illustrates the possibility of testing data quality before any changes are made to collection methods, which is a powerful strategy for minimizing the use of low-quality data.

Christina closed the session with a presentation on ethics in data collection. She spoke about digital health data ethics in particular, which is the intersection of public health ethics, clinical ethics, and information systems security. She grounded her discussion in MEASURE Evaluation’s experience thinking through ethical problems, which include: the vulnerability of devices where data is collected and stored, the privacy and confidentiality of the data on these devices, the effect of interoperability on privacy, data loss if the device is damaged, and the possibility of wastefully collecting unnecessary data.

To explore these issues, MEASURE conducted a landscape assessment in Kenya and Tanzania and analyzed peer reviewed research to identify key themes for ethics. Five themes emerged: 1) legal frameworks and the need for laws, 2) institutional structures to oversee implementation and enforcement, 3) information systems security knowledge (especially for countries that may not have the expertise), 4) knowledge of the context and users (are clients comfortable with their data being used?), and 5) incorporating tools and standard operating procedures.

Based on this framework, MEASURE has made progress towards rolling out tools that can help institute a stronger ethics infrastructure. They’ve been developing guidelines that countries can use to develop policies, building health informatics capacity through a university course, and working with countries to strengthen their health information systems governance structures.

Finally, Christina explained her take on how ethics are related to data quality. In her view, it comes down to trust. If a device is lost, this may lead to incomplete data. If the clients are mistrustful, this could lead to inaccurate data. If a health worker is unable to check or clean data, this could create a lack of confidence. Each of these risks can lead to the erosion of data integrity.

Register for MERL Tech London, March 19-20th 2018! Session ideas due November 10th.

MERL Tech and the World of ICT Social Entrepreneurs (WISE)

by Dale Hill, an economist/evaluator with over 35 years of experience in development and humanitarian work. Dale led the session on “The growing world of ICT Social Entrepreneurs (WISE): Is social Impact significant?” at MERL Tech DC 2018.

Roger Nathanial Ashby of OpenWise and Christopher Robert of Dobility share experiences at MERL Tech.

What happens when evaluators trying to build bridges with new private sector actors meet real social entrepreneurs? A new appreciation for the dynamic “World of ICT Social Entrepreneurs (WISE)” and the challenges they face in marketing, pricing, and financing (not to mention measurement of social impact).

During this MERL Tech session on WISE, Dale Hill, evaluation consultant, presented grant funded research on measurement of social impact of social entrepreneurship ventures (SEVs) from three perspectives. She then invited five ICT company CEOs to comment.

The three perspectives are:

  • the public: How to hold companies accountable, particularly if they have chosen to be legal or certified “benefit corporations”?
  • the social entrepreneurs, who are plenty occupied trying to reach financial sustainability or profit goals, while also serving the public good; and
  • evaluators, who see the important influence of these new actors, but know their professional tools need adaptation to capture their impact.

Dale’s introduction covered the overlapping definitions of various categories of SEVs, including legally defined “benefit corporations” and “B Corps”, which are intertwined with the certification options available to social entrepreneurs. The “new middle” of SEVs sits on a spectrum between for-profit companies on one end and not-for-profit organizations on the other. Various types of funders, including social impact investors, new certification agencies, and monitoring and evaluation (M&E) professionals, are now interested in measuring the growing social impact of these enterprises. A show of hands revealed that representatives of most of these types of actors were present at the session.

The five social entrepreneur panelists all had ICT businesses with global reach, but they varied in legal and certification status and the number of years operating (1 to 11). All aimed to deploy new technologies to non-profit organizations or social sector agencies on high value, low price terms. Some had worked in non-profits in the past and hoped that venture capital rather than grant funding would prove easier to obtain. Others had worked for Government and observed the need for customized solutions, which required market incentives to fully develop.

The evaluator and CEO panelists’ identification of challenges converged in some cases:

  • maintaining affordability and quality when using market pricing
  • obtaining venture capital or other financing
  • worry over “mission drift” – if financial sustainability imperatives or shareholder profit maximization preferences prevail over founders’ social impact goals; and
  • the still-present digital divide when serving global customers (insufficient bandwidth, affordability issues, and limited small business capital in some client countries).

New issues raised by the CEOs (and some social entrepreneurs in the audience) included:

  • the need to provide incentives to customers to use quality assurance or security features of software, to avoid falling short of achieving the SEV’s “public good” goals;
  • the possibility of hostile takeover, given high value of technological innovations;
  • the fact that mention of a “social impact goal” was a red flag to some funders who then went elsewhere to seek profit maximization.

There was also a rich discussion on the benefits and costs of obtaining certification: it was a useful “branding and market signal” to some consumers, but a negative one to some funders; also, it posed an added burden on managers to document and report social impact, sometimes according to guidelines not in line with their preferences.

Surprises?

a) Despite the “hype”, social impact investment funding proved elusive to the panelists. Options for them included: sliding scale pricing; establishment of a complementary for-profit arm; or debt financing;

b) Many firms were not yet implementing planned monitoring and evaluation (M&E) programs, despite M&E being one of their service offerings; and

c) The legislation on reporting social impact of benefit corporations among the 31 states varies considerably, and the degree of enforcement is not clear.

A conclusion for evaluators: Social entrepreneurs’ use of market solutions indeed provides an evolving, dynamic environment which poses more complex challenges for measuring social impact, and requires new criteria and tools, ideally timed with an understanding of market ups and downs, and developed with full participation of the business managers.

Discrete choice experiment (DCE) to generate weights for a multidimensional index

In his MERL Tech Lightning Talk, Simone Lombardini, Global Impact Evaluation Adviser, Oxfam, discussed his experience with an innovative method for applying tech to help determine appropriate metrics for measuring concepts that escape easy definition. To frame his talk, he referenced Oxfam’s recent experience with using discrete choice experiments (DCE) to establish a strategy for measuring women’s empowerment.

Two methods already exist, Simone points out, for transforming soft concepts into hard metrics. First, the evaluator could assume full authority and responsibility over defining the metrics. Alternatively, the evaluator could design the evaluation so that relevant stakeholders are incorporated into the process and use their input to help define the metrics.

Though both methods are common, they are missing (for practical reasons) the level of mass input that could make them truly accurate reflections of the social perception of whatever concept is being considered. Tech has a role to play in scaling the quantity of input that can be collected. If used correctly, this could lead to better evaluation metrics.

Simone described this approach as “context-specific” and “multi-dimensional.” The process starts by defining the relevant characteristics (such as those found in empowered women) in their social context, then translating these characteristics into indicators, and finally combining indicators into one empowerment index for evaluating the project.

After the characteristics are defined, a discrete choice experiment can be used to determine each one’s “weight” in a particular social context. A discrete choice experiment (DCE) is a technique that has frequently been used in health economics and marketing, but not much in impact evaluation. To implement a DCE, researchers present different hypothetical scenarios to respondents and ask them to decide which one they consider best reflects the concept in question (i.e. women’s empowerment). The responses are used to assess the indicators covered by the DCE, and these can then be used to develop an empowerment index.
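As a very rough sketch of how choices over hypothetical profiles can yield indicator weights: real DCE analysis typically fits a conditional logit over choice sets, but the toy example below substitutes a plain logistic regression on chosen versus not-chosen profiles (ignoring the choice-set structure) just to show the shape of the pipeline. The indicators, data, and weighting step are all invented and are not Oxfam’s methodology.

```python
# Toy stand-in for DCE weight estimation: a plain logistic regression on
# chosen vs. not-chosen hypothetical profiles (a real analysis would fit a
# conditional logit over choice sets). Data and indicators are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: hypothetical indicators, e.g. controls household income, attends
# community meetings, owns productive assets (1 = present in the scenario).
profiles = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
])
chosen = np.array([1, 0, 1, 0, 1, 0])   # 1 = respondent picked this profile as "more empowered"

model = LogisticRegression().fit(profiles, chosen)
raw = model.coef_[0]
weights = np.clip(raw, 0, None)
weights = weights / weights.sum()        # keep non-negative weights that sum to 1
print("indicator weights:", weights.round(2))

# Combine a surveyed woman's observed indicators into a single index score
observed = np.array([1, 0, 1])
print("empowerment index:", round(float(observed @ weights), 2))
```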

The DCE was integrated into the data collection process, adding 10 minutes at the end of a one-hour survey, and was made practicable by the ubiquity of smartphones. The results from Oxfam’s trial run using this method are still being analyzed. For more on this, watch Lombardini’s video below!