Tag Archives: MERL

Evaluating ICT4D projects against the Digital Principles

By Laura Walker McDonald. This post was originally published on the Digital Impact Alliance’s blog on March 29, 2018.

As I have written about elsewhere, we need more evidence of what works and what doesn’t in the ICT4D and tech for social change spaces – and we need to hold ourselves to account more thoroughly and share what we know so that all of our work improves. We should be examining how well a particular channel, tool or platform works in a given scenario or domain; how it contributes to development goals in combination with other channels and tools; how the team selected and deployed it; whether it is a better choice than not using technology or using a different sort of technology; and whether or not it is sustainable.

At SIMLab, we developed our Framework for Monitoring and Evaluation of Technology in Social Change projects to help implementers to better measure the impact of their work. It offers resources towards a minimum standard of best practice which implementers can use or work toward, including on how to design and conduct evaluations. With the support of the Digital Impact Alliance (DIAL), the resource is now finalized and we have added new evaluation criteria based on the Principles for Digital Development.

Last week at MERL Tech London, DIAL was able to formally launch this product by sharing a 2-page summary available at the event and engaging attendees in a conversation about how it could be used. At the event, we joined over 100 organizations to discuss Monitoring, Evaluation, Research and Learning related to technology used for social good.

Why evaluate?

Evaluations provide snapshots of the ongoing activity and the progress of a project at a specific point in time, based on systematic and objective review against certain criteria. They may inform future funding and program design; adjust current program design; or gather evidence to establish whether a particular approach is useful. They can be used to examine how, and how far, technology contributes to wider programmatic goals. If set up well, your program should already have evaluation criteria and research questions defined, well before it’s time to commission the evaluation.

Evaluation criteria provide a useful frame for an evaluation, bringing in an external logic that might go beyond the questions that implementers and their management have about the project (such as ‘did our partnerships on the ground work effectively?’ or ‘how did this specific event in the host country affect operations?’) to incorporate policy and best practice questions about, for example, protection of target populations, risk management, and sustainability. The criteria for an evaluation could be any set of questions that draw on an organization’s mission, values, principles for action; industry standards or other best practice guidance; or other thoughtful ideas of what ‘good’ looks like for that project or organization. Efforts like the Principles for Digital Development can set useful standards for good practice, and could be used as evaluation criteria.

Evaluating our work, and sharing learning, is radical – and critically important

While the potential for technology to improve the lives of vulnerable people around the world is clear, it is also evident that these improvements are not keeping pace with the advances in the sector. Understanding why requires looking critically at our work and holding ourselves to account. There is still insufficient evidence of the contribution technology makes to social change work. What evidence there is often is not shared or the analysis doesn’t get to the core issues. Even more important, the learnings from what has not worked and why have not been documented and absorbed.

Technology-enabled interventions succeed or fail based on their sustainability, business models, data practices, choice of communications channel and technology platform; organizational change, risk models, and user support – among many other factors. We need to build and examine evidence that considers these issues and that tells us what has been successful, what has failed, and why. Holding ourselves to account against standards like the Principles is a great way to improve our practice, and honor our commitment to the people we seek to help through our work.

Using the Digital Principles as evaluation criteria

The Principles for Digital Development are a set of living guidance intended to help practitioners succeed in applying technology to development programs. They were developed, based on some pre-existing frameworks, by a working group of practitioners and are now hosted by the Digital Impact Alliance.

These nine principles could also form a useful set of evaluation criteria, not unlike the OECD evaluation criteria or the Sphere standards. The Principles overlap, so data can be used to examine more than one criterion, and not every evaluation would need to consider all of the Digital Principles.

Below are some examples of Digital Principles and sample questions that could initiate, or contribute to, an evaluation.

Design with the User: Great projects are designed with input from the stakeholders and users who are central to the intended change. How far did the team design the project with its users, based on their current tools, workflows, needs and habits, and work from clear theories of change and adaptive processes?

Understand the Existing Ecosystem: Great projects and programs are built, managed, and owned with consideration given to the local ecosystem. How far did the project work to understand the local, technology and broader global ecosystem in which the project is situated? Did it build on existing projects and platforms rather than duplicating effort? Did the project work sensitively within its ecosystem, being conscious of its potential influence and sharing information and learning?

Build for Sustainability: Great projects factor in the physical, human, and financial resources that will be necessary for long-term sustainability. How far did the project: 1) think through the business model, ensuring that the value for money and incentives are in place not only during the funded period but afterwards, and 2) ensure that long-term financial investments in critical elements like system maintenance and support, capacity building, and monitoring and evaluation are in place? Did the team consider whether there was an appropriate local partner to work through, hand over to, or support the development of, such as a local business or government department?

Be Data Driven: Great projects fully leverage data, where appropriate, to support project planning and decision-making. How far did the project use real-time data to make decisions, use open data standards wherever possible, and collect and use data responsibly according to international norms and standards?

Use Open Standards, Open Data, Open Source, and Open Innovation: Great projects make appropriate choices, based on the circumstances and the sensitivity of their project and its data, about how far to use open standards, open the project’s data, use open source tools and share new innovations openly. How far did the project: 1) take an informed and thoughtful approach to openness, thinking it through in the context of the theory of change and considering risk and reward, 2) communicate about what being open means for the project, and 3) use and manage data responsibly according to international norms and standards?
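To make this concrete, here is a minimal sketch (in Python, purely illustrative) of how a team might turn a handful of the Digital Principles into structured evaluation criteria with sample questions and a simple rating scale. The question wording is adapted from the examples above; the scale, the scores, and the summarize helper are hypothetical and are not part of the Framework or the Principles guidance.

```python
# Illustrative only: a subset of the Digital Principles encoded as evaluation
# criteria with sample questions. The rating scale and scores are invented.
criteria = {
    "Design with the User": [
        "Did the team design with users, based on their tools, workflows, needs and habits?",
        "Did the project work from a clear theory of change and adapt as it learned?",
    ],
    "Build for Sustainability": [
        "Is the business model viable beyond the funded period?",
        "Is there an appropriate local partner to work through or hand over to?",
    ],
    "Be Data Driven": [
        "Did the project use real-time data to make decisions where appropriate?",
        "Was data collected and used responsibly, following international norms?",
    ],
}

SCALE = {1: "not met", 2: "partially met", 3: "largely met", 4: "fully met"}

def summarize(ratings):
    """Print each criterion's questions and the average of its question scores (1-4)."""
    for criterion, questions in criteria.items():
        scores = ratings[criterion]
        avg = sum(scores) / len(scores)
        print(f"{criterion}: {avg:.1f} ({SCALE[round(avg)]})")
        for question, score in zip(questions, scores):
            print(f"  [{score}] {question}")

# Example scores an evaluator might assign after reviewing the evidence.
summarize({
    "Design with the User": [3, 4],
    "Build for Sustainability": [2, 2],
    "Be Data Driven": [3, 2],
})
```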

For a more complete set of guidance, see the complete Framework for Monitoring and Evaluating Technology, and the more nuanced and in-depth guidance on the Principles, available on the Digital Principles website.

Digital Data Collection and the Maturing of a MERL Technology

by Christopher Robert, CEO of Dobility (SurveyCTO). This post was originally published on March 15, 2018, on the SurveyCTO blog.

Digital data collection: stakeholders and complex relationships

Needs, markets, and innovation combine to produce technological change. This is as true in the international development sector as it is anywhere else. And within that sector, it’s as true in the broad category of MERL (monitoring and evaluation, research, and learning) technologies as it is in the narrower sub-category of digital data collection technologies. Here, I’ll consider the recent history of digital data collection technology as an example of MERL technology maturation – and as an example, more broadly, of the importance of market structure in shaping the evolution of a technology.

My basic observation is that, as digital data collection technology has matured, the same stakeholders have been involved – but the market structure has changed their relative power and influence over time. And it has been these very changes in power and influence that have changed the cost and nature of the technology itself.

First, when it comes to digital data collection in the development context, who are the stakeholders?

  • Donors. These are the primary actors who fund development work, evaluation of development policies and programs, and related research. There are mega-actors like USAID, Gates, and the UN agencies, but also many other charities, philanthropies, and public or nonprofit actors, from Catholic Charities to the U.S. Centers for Disease Control and Prevention.
  • Developers. These are the designers and software engineers involved in producing technology in the space. Some are students or university faculty, some are consultants, many work full-time for nonprofits or businesses in the space. (While some work on open-source initiatives in a voluntary capacity, that seems quite uncommon in practice. The vast majority of developers working on open-source projects in the space get paid for that work.)
  • Consultants and consulting agencies. These are the technologists and other specialists who help research and program teams use technology in the space. For example, they might help to set up servers and program digital survey instruments.
  • Researchers. These are the folks who do the more rigorous research or impact evaluations, generally applying social-science training in public health, economics, agriculture, or other related fields.
  • M&E professionals. These are the people responsible for program monitoring and evaluation. They are most often part of an implementing program team, but it’s also not uncommon to share more centralized (and specialized) M&E teams across programs or conduct outside evaluations that more fully separate some M&E activities from the implementing program team.
  • IT professionals. These are the people responsible for information technology within those organizations implementing international development programs and/or carrying out MERL activities.
  • Program beneficiaries. These are the end beneficiaries meant to be aided by international development policies and programs. The vast majority of MERL activities are ultimately concerned with learning about these beneficiaries.

Digital data collection stakeholders

These different stakeholders have different needs and preferences, and the market for digital data collection technologies has changed over time – privileging different stakeholders in different ways. Two distinct stages seem clear, and a third is coming into focus:

  1. The early days of donor-driven pilots and open source. These were the days of one-offs, building-your-own, and “pilotitis,” where donors and developers were effectively in charge and there was a costly additional layer of technical consultants between the donors/developers and the researchers and M&E professionals who had actual needs in the field. Costs were high, and some combination of donor and developer preferences reigned supreme.
  2. Intensifying competition in program-adopted professional products. Over time, professional products emerged that began to directly market to – and serve – researchers and M&E professionals. Costs fell with economies of scale, and the preferences of actual users in the field suddenly started to matter in a more direct, tangible, and meaningful way.
  3. Intensifying competition in IT-adopted professional products. Now that use of affordable, accessible, and effective data-collection technology has become ubiquitous, it’s natural for IT organizations to begin viewing it as a kind of core organizational infrastructure, to be adopted, supported, and managed by IT. This means that IT’s particular preferences and needs – like scale, standardization, integration, and compliance – start to become more central, and costs unfortunately rise.

While I still consider us to be in the glory days of the middle stage, where costs are low and end-users matter most, there are still plenty of projects and organizations living in that first stage of more costly pilots, open source projects, and one-offs. And I think that the writing’s very much on the wall when it comes to our progression toward the third stage, where IT comes to drive the space, innovation slows, and end-user needs are no longer dominant.

Full disclosure: I myself have long been a proponent of the middle phase, and I am proud that my social enterprise has been able to help graduate thousands of users from that costly first phase. So my enthusiasm for the middle phase began many years ago and in fact helped to launch Dobility.

THE EARLY DAYS OF DONOR-DRIVEN PILOTS AND OPEN SOURCE

Digital data collection stage 1 (the early days)

In the beginning, there were pioneering developers, patient donors, and program or research teams all willing to take risks and invest in a better way to collect data from the field. They took cutting-edge technologies and found ways to fit them into some of the world’s most difficult, least-cutting-edge settings.

In these early days, it mattered a lot what could excite donors enough to open their checkbooks – and what would keep them excited enough to keep the checks coming. So the vital need for large and ongoing capital injections gave donors a lot of influence over what got done.

Developers also had a lot of sway. Donors couldn’t do anything without them, and they also didn’t really know how to actively manage them. If a developer said “no, that would be too hard or expensive” or even “that wouldn’t work,” what could the donor really say or do? They could cut off funding, but that kind of leverage only worked for the big stuff, the major milestones and the primary objectives. For that stuff, donors were definitely in charge. But for the hundreds or thousands of day-to-day decisions that go into any technology solution, it was the developers effectively in charge.

Actual end-users in the field – the researchers and M&E professionals who were piloting or even trying to use these solutions – might have had some solid ideas about how to guide the technology development, but they had essentially no levers of control. In practice, the solutions being built by the developers were often so technically-complex to configure and use that there was an additional layer of consultants (technical specialists) sitting between the developers and the end-users. But even if there wasn’t, the developers’ inevitable “no, sorry, that’s not feasible,” “we can’t realistically fit that into this release,” or simple silence was typically the end of the story for users in the field. What could they do?

Unfortunately, without meaning any harm, most developers react by pushing back on whatever is contrary to their own preferences (I say this as a lifelong developer myself). Something might seem like a hassle, or architecturally unclean, and so a developer will push back, say it’s a bad idea, drag their heels, even play out the clock. In the past five years of Dobility, there have been hundreds of cases where a developer has said something to the effect of “no, that’s too hard” or “that’s a bad idea” to things that have turned out to (a) take as little as an hour to actually complete and (b) provide massive amounts of benefit to end-users. There’s absolutely no malice involved, it’s just the way most of them/us are.

This stage lasted a long time – too long, in my view! – and an entire industry of technical consultants and paid open-source contributors grew up around an approach to digital data collection that didn’t quite embrace economies of scale and never quite privileged the needs or preferences of actual users in the field. Costs were high and complaints about “pilotitis” grew louder.

INTENSIFYING COMPETITION IN PROGRAM-ADOPTED PROFESSIONAL PRODUCTS

Digital data collection stage 2 (the glory days)

But ultimately, the protagonists of the early days succeeded in establishing and honing the core technologies, and in the process they helped to reveal just how much was common across projects of different kinds, even across sectors. Some of those protagonists also had the foresight and courage to release their technologies with the kinds of permissive open-source licenses that would allow professionalization and experimentation in service and support models. A new breed of professional products directly serving research, program, and M&E teams was born – in no small part out of a single, tremendously-successful open-source project, Open Data Kit (ODK).

These products tended to be sold directly to end-users, and were increasingly intended for those end-users to be able to use themselves, without the help of technical staff or consultants. For traditionalists of the first stage, this was a kind of heresy: it was considered gauche at best and morally wrong at worst to charge money for technology, and it was seen as some combination of impossible and naive to think that end-users could effectively deploy and manage these technologies without technical assistance.

In fact, the new class of professional products were not designed to be used entirely without assistance. But they were designed to require as little assistance as possible, and the assistance came with the product instead of being provided by a separate (and separately-compensated) internal or external team.

A particularly successful breed of products came to use a “Software as a Service” (SaaS) model that streamlined both product delivery and support, ramping up economies of scale and driving down costs in the process (like SurveyCTO). When such products offered technical support free-of-charge as part of the purchase or subscription price, there was a built-in incentive to improve the product: since tech support was so costly to deliver, improving the product such that it required less support became one of the strongest incentives driving product development. Those who adopted the SaaS model not only had to earn every dollar of revenue from end-users, but they had to keep earning that revenue month in, month out, year in, year out, in order to retain business and therefore the revenue needed to pay the bills. (Read about other SaaS benefits for M&E in this recent DevResults post.)

It would be difficult to overstate the importance of these incentives to improve the product and earn revenue from end-users. They are nothing short of transformative. Particularly once there is active competition among vendors, users are squarely in charge. They control the money, their decisions make or break vendors, and so their preferences and needs are finally at the center.

Now, in addition to the “it’s heresy to charge money or think that end-users can wield this kind of technology” complaints that used to be more common, there started to be a different kind of complaint: there are too many solutions! It’s too overwhelming, how many digital data collection solutions there are now. Some go so far as to decry the duplication of effort, to claim that the free market is inefficient or failing; they suggest that donors, consultants, or experts be put back in charge of resource allocation, to re-impose some semblance of sanity to the space.

But meanwhile, we’ve experienced a kind of golden age in terms of who can afford digital data collection technology, who can wield it effectively, and in what kinds of settings. There are a dizzying number of solutions – but most of them cater to a particular type of need, or have optimized their business model in a particular sort of way. Some, like us, rely nearly 100% on subscription revenues, others fund themselves more primarily from service provision, others are trying interesting ways to cross-subsidize from bigger, richer users so that they can offer free or low-cost options to smaller, poorer ones. We’ve overcome pilotitis, economies of scale are finally kicking in, and I think that the social benefits have been tremendous.

INTENSIFYING COMPETITION IN IT-ADOPTED PROFESSIONAL PRODUCTS

Digital data collection stage 3 (the coming days)

It was the success of the first stage that laid the foundation for the second stage, and so too it has been the success of the second stage that has laid the foundation for the third: precisely because digital data collection technology has become so affordable, accessible, and ubiquitous, organizations are increasingly thinking that it should be IT departments that procure and manage that technology.

Part of the motivation is the very proliferation of options that I mentioned above. While economics and the historical success of capitalism have taught us that a marketplace thriving with competition is most often a very good thing, it’s less clear that a wide variety of options is good within any single organization. At the very least, there are very good reasons to want to standardize some software and processes, so that different people and teams can more effortlessly share knowledge and collaborate, and so that there can be some economies of scale in training, support, and compliance.

Imagine if every team used its own product and file format for writing documents, for example. It would be a total disaster! The frictions across and between teams would be enormous. And as data becomes more and more core to the operations of more organizations – the way that digital documents became core many years ago – it makes sense to want to standardize and scale data systems, to streamline integrations, just for efficiency purposes.

Growing compliance needs only up the ante. The arrival of the EU’s General Data Protection Regulation (GDPR) this year, for example, raises the stakes for EU-based (or even EU-touching) organizations considerably, imposing stiff new data privacy requirements and steep penalties for violations. Coming into compliance with GDPR and other data-security regulations will be effectively impossible if IT can’t play a more active role in the procurement, configuration, and ongoing management of data systems; and it will be impractical for IT to play such a role for a vast array of constantly-shifting technologies. After all, IT will require some degree of stability and scale.

But if IT takes over digital data collection technology, what changes? Does the golden age come to an end?

Potentially. And there are certainly very good reasons to worry.

First, changing who controls the dollars – who’s in charge of procurement – threatens to entirely up-end the current regime, where end-users are directly in charge and their needs and preferences are catered to by a growing body of vendors eager to earn their business.

It starts with the procurement process itself. When IT is in charge, procurement processes are long, intensive, and tend to result in a “winner take all” contract. After all, it makes sense that IT departments would want to take their time and choose carefully; they tend to be choosing solutions for the organization as a whole (or at least for some large class of users within the organization), and they most often intend to choose a solution, invest heavily in it, and have it work for as long as possible.

This very natural and appropriate method that IT uses to procure is radically different from the method used by research, program, and M&E teams. And it creates a radically different dynamic for vendors.

Vendors first have to buy into the idea of investing heavily in these procurement processes – which some may simply choose not to do. Then they have to ask themselves, “what do these IT folks care most about?” In order to win these procurements, they need to understand the core concerns driving the purchasing decision. As in the old saying “nobody ever got fired for choosing IBM,” safety, stability, and reputation are likely to be very important. Compliance issues are likely to matter a lot too, including the vendor’s established ability to meet new and evolving standards. Integrations with corporate systems are likely to count for a lot too (e.g., integrating with internal data and identity-management systems).

Does it still matter how well the vendor meets the needs of end-users within the organization? Of course. But note the very important shift in the dynamic: vendors now have to get the IT folks to “yes” and so would be quite right to prioritize meeting their particular needs. Nobody will disagree that end-users ultimately matter, but meanwhile the focus will be on the decision-makers. The vendors that meet the decision-makers’ needs will live, the others will die. That’s simply one aspect of how a free market works.

Note also the subtle change in dynamic once a vendor wins a contract: the SaaS model where vendors had to re-earn every customer’s revenue month in, month out, is largely gone now. Even if the contract is formally structured as a subscription or has lots of exit options, the IT model for technology adoption is inherently stickier. There is a lot more lock-in in practice. Solutions are adopted, they’re invested in at large scale, and nobody wants to walk away from that investment. Innovation can easily slow, and nobody wants to repeat the pain of procurement and adoption in order to switch solutions.

And speaking of the pain of the procurement process: costs have been rising. After all, the procurement process itself is extremely costly to the vendor – especially when it loses, but even when it wins. So that’s got to get priced in somewhere. And then all of the compliance requirements, all of the integrations with corporate systems, all of that stuff’s really expensive too. What had been an inexpensive, flexible, off-the-shelf product can easily become far more expensive and far less flexible as it works itself through IT and compliance processes.

What had started out on a very positive note (“let’s standardize and scale, and comply with evolving data regulations”) has turned in a decidedly dystopian direction. It’s sounding pretty bad now, and you wouldn’t be wrong to think “wait, is this why a bunch of the products I use for work are so much more frustrating than the products I use as a consumer?” or “if Microsoft had to re-earn every user’s revenue for Excel, every month, how much better would it be?”

While I don’t think there’s anything wrong with the instinct for IT to take increasing control over digital data collection technologies, I do think that there’s plenty of reason to worry. There’s considerable risk that we lose the deep user orientation that has just been picking up momentum in the space.

WHERE WE’RE HEADED: STRIKING A BALANCE

Digital data collection stage 4 (finding a balance?)

If we don’t want to lose the benefits of a deep user orientation in this particular technology space, we will need to work pretty hard – and be awfully clever – to avoid that loss. People will say “oh, but IT just needs to consult research, program, and M&E teams, include them in the process,” but that’s hogwash. Or rather, it’s woefully inadequate. The natural power of those controlling resources to bend the world to their preferences and needs is just too strong for mere consultation or inclusion to overcome.

And the thing is: what IT wants and needs is good. So the solution isn’t just “let’s not let them anywhere near this, let’s keep the end-users in charge.” No, that approach collapses under its own weight eventually, and certainly it can’t meet rising compliance requirements. It has its own weaknesses and inefficiencies.

What we need is an approach – a market structure – that allows the needs of IT and the needs of end-users both to matter to appropriate degrees.

With SurveyCTO, we’re currently in an interesting place: we’re becoming split between serving end-users and serving IT organizations. And I suppose as long as we’re split, with large parts of our revenue coming from each type of decision-maker, we remain incentivized to keep meeting everybody’s needs. But I see trouble on the horizon: the IT organizations can pay more, and more organizations are shifting in that direction… so once a large-enough proportion of our revenue starts coming from big, winner-take-all IT contracts, I fear that our incentives will be forever changed. In the language of economics, I think that we’re currently living in an unstable equilibrium. And I really want the next equilibrium to serve end-users as well as the last one!

M&E Squared: Evaluating M&E Technologies

by Roger Nathanial Ashby, Co-Founder & Principal Consultant, OpenWise

The universe of MERL Tech solutions has grown exponentially. In 2008, monitoring and evaluation tech within global development was mostly confined to mobile data collection tools like Open Data Kit (ODK) and Excel spreadsheets for analyzing and visualizing survey data. In the intervening decade, a myriad of tools, companies and NGOs have been created to advance the efficiency and effectiveness of monitoring, evaluation, research and learning (MERL) through the use of technology. Whether it’s M&E platforms or suites, satellite imagery, remote sensors, or chatbots, new innovations are being deployed every day in the field.

However, how do we evaluate the impact when MERL Tech is the intervention itself? That was the question and task put to participants of the “M&E Squared” workshop at MERL Tech 2017.

Workshop participants were separated into three groups that were each given a case study to discuss and analyze. One group was given a case about improving the learning efficiency of health workers in Liberia through the mHero Health Information System (HIS). The system was deployed as a possible remedy to some of the information communication challenges identified during the 2014 West African Ebola outbreak. A second group was given a case about the use of RapidPro to remind women to attend antenatal care (ANC) for preventive malaria medicine in Guinea. The USAID StopPalu project goal was to improve the health of infants by increasing the percent of women attending ANC visits. The final group was given a case about using remote images to assist East African pastoralists. The Satellite Assisted Pastoral Resource Management System (SAPARM) informs pastoralists of vegetation through remote sensing imagery so they can make better decisions about migrating their livestock.

After familiarizing themselves with the particulars of the case studies, each group pondered a series of questions and then presented its findings to all participants. Some of the issues under discussion included:

(1) “How would you assess your MERL Tech’s relevance?”

(2) “How would you evaluate the effectiveness of your MERL Tech?”

(3) “How would you measure efficiency?” and

(4) “How will you assess sustainability?”.

Each group came up with some innovative answers to the questions posed and our facilitators and session leads (Alexandra Robinson & Sutyajeet Soneja from USAID and Molly Chen from RTI) will soon synthesize the workshop findings and notes into a concise written brief for the MERL Tech community.

Before the workshop closed we were all introduced to the great work done by SIMLab (Social Impact Lab) in this area through their SIMLab Monitoring and Evaluation Framework. The framework identifies key criteria for evaluating M&E technologies, including:

  1. Relevance – The extent to which the technology choice is appropriately suited to the priorities and capacities of the context of the target group or organization.
  2. Effectiveness – A measure of the extent to which an information and communication channel, technology tool, technology platform, or a combination of these attains its objectives.
  3. Efficiency – Measure of the outputs (qualitative and quantitative) in relation to the inputs.
  4. Impact – The positive and negative changes produced by the introduction of a technology, or by a change in a technology tool or platform, on the overall development intervention (directly or indirectly; intended or unintended).
  5. Sustainability – Measure of whether the benefits of a technology tool or platform are likely to continue after donor funding has been withdrawn.
  6. Coherence – The extent to which the technology relates to the broader policy context (development, market, communication networks, data standards & interoperability mandates, and national & international law) within which it was developed and implemented.

While it’s unfortunate that SIMLab stopped most operations in early September 2017, their exceptional work in this and other areas lives on and you can access the full framework here.

I learned a great deal in this session from the facilitators and my colleagues attending the workshop. I would encourage everyone in the MERL Tech community to take the ideas generated during this workshop and the great work done by SIMLab into their development practice. We certainly intend to integrate many of these insights into our work at OpenWise. Read more about “The Evidence Agenda” here on SIMLab’s blog.


Maturity Models: Visualizing Progress Towards Next-Generation Transparency and Accountability

By Alison Miranda (TAI) and Megan Colnar (Open Society Foundations). This is a cross-post of a piece published on September 17th on the Transparency and Accountability Initiative’s blog.

How can we assess progress on a second-generation way of working in the transparency, accountability and participation (TAP) field? Monitoring, evaluation, research, and learning (MERL) maturity models can provide some inspiration. The 2017 MERL Tech conference in Washington, DC was a two-day bonanza of lightning talks, plenary sessions, and hands-on workshops among participants who use technology for MERL.

Here are key conference takeaways from two MEL practitioners in the TAP field.

1. Making open data useful

Several MERL Tech sessions resonated deeply with the TAP field’s efforts to transition from fighting for transparent and open data towards linking open data to accountability and governance outcomes. Development Gateway and InterAction drew on findings from “Avoiding Data Graveyards” as we explored progress and challenges for International Aid Transparency Initiative (IATI) data use cases. While transparency is an important value, what is gained (or lost) in data use for collaboration when there are many different potential data consumers?

A partnership between Freedom House and DataKind is moving the Freedom in the World study towards a more transparent display of index sub-indicators, and building a more robust – and usable! – data set by reformatting and integrating their data and other secondary big data sets. What could such an initiative yield for the Extractive Industry Transparency Initiative (EITI), for example, if equivalent data sets were available?

And finally, as TAP practitioners are keenly aware, power and politics can overshadow evidence in decision making. Another Development Gateway presentation reminded us that it is important to work with data producers and users to identify decisions that are (or can be) data-driven, and to recognize when other factors are driving decisions. (The incentives to supply open data are a whole other can of worms!)

Drawing on our second-generation TAP approach, more work is needed for the TAP and MERL fields to move from “open data everywhere, all of the time” to planning for, and encouraging, more effective data use.

2. Tech for MERL for improved policy, practice, and outcomes

Among our favorite moments at MERL Tech was when Dobility Founder and CEO Christopher Robert remarked that “the most interesting innovations at MERL Tech aren’t the new, cutting-edge technology developments, but generic technology applied in innovative ways.” Unsurprising for a tech company focused on using affordable technology to enable quality data collection for social impact, but a refreshing reminder amidst the talk of ‘AI’, ‘chatbots’, and ‘blockchains’ for development coursing through the conference.

The TAP field is certainly not a stranger to employing technology from apps to curb trade corruption in Nigeria to Citizen Helpdesks in Nepal, Liberia, and Mali to crowdsourced political campaign expenditure monitoring in Bolivia, but our second-generation TAP insights remind us technology tools are not an end in themselves. MERL and technology are our means for collecting effective data, generating important insights and learning, building larger movements, and gathering context-specific evidence on transparency and accountability.

We are undoubtedly on the precipice of revolutionary technological advancements that can be readily (and maybe even affordably) deployed[1] to solve complex global challenges, but they will still be tools and not solutions.

3. Balancing inclusion and participation with efficiency and accuracy

We explored a constant conundrum for MERL: how to balance inclusion and participation with efficiency and accuracy. Girl Effect and Praekelt Foundation took “mixed methods” research to another level, combining online and offline efforts to understand user needs of adolescent girls and to support user-developed digital media content. Their iterative process showcased an effective way to integrate tech into the balancing act of inclusive – and holistic – design, paired with real-time data use.

This session on technology in citizen-generated data brought to light two case studies of how tech can both help and hinder this balancing act. The World Café discussions underscored the importance of planning for – and recognizing the constraints on – feedback loops. And it provided us a helpful reminder that MERL and tech professionals are often considering different “end users” in their design work!

So, which is it – balancing act or zero-sum game between inclusion and efficiency? The MERL community has long applied participatory methods. And tech solutions abound that can help with efficiency, accuracy, and inclusion. Indeed, the second-generation TAP focus on learning and collaboration is grounded in effective data use – but there are many potential “end users” to consider. These principles and practices can force uncomfortable compromises – particularly in the face of finite resources and limited data availability – but they are not at odds with each other. Perhaps the MERL and TAP communities can draw lessons from each other in striking the right balance.

4. Tech sees no development sector silos

One of the things that makes MERL Tech such an exciting conference is the deliberate mixing of tech nerds with MERL nerds. It’s pretty unique in its dual targeting of both types of professionals who share a common purpose of social impact (whereas conferences like ICT4D cast a wider net looking at the application of technology to broader development issues). And, though we MERL professionals like to think of design and iteration as squarely within our wheelhouse, being in a room full of tech experts can quickly remind you that our adaptation game has a lot of room to grow. We talk about user-centered design in TAP, but when the tech crowd was asked in plenary “would you even think of designing software or an app without knowing who was going to use it?” they responded with a loud and exuberant laugh.

Tech has long employed systematic approaches to user-centered design, prototyping, iteration, and adaptation, all of which can offer compelling lessons to guide MERL practices and methods. Though we know Context is King, it is exhilarating to know that the tech specialists assembled at the conference work across traditional silos of development work (from health to corruption, and everything in between). End users are, of course, crucial to the final product but the life cycle process and steps follow a regular pattern, regardless of the topic area or users.

The second-generation wave in TAP similarly moved away from project-specific, fragmented, or siloed planning and learning towards a focus on collective approaches and long-term, more organic engagement.

American Evaluation Association President, Kathy Newcomer, quipped that maybe an ‘Academy Awards for Adaptation’ could inspire better informed and more adept evolutions to strategy as circumstances and context shift around us. Adding to this, and borrowing from the tech community, we wonder where we can build more room to beta test, follow real demand, and fail fast. Are we looking towards other sectors and industries enough or continuing to reinvent the wheel?

Alison left thinking:

  • Concepts and practices are colliding across the overlapping MERL, tech, and TAP worlds! In leading the Transparency and Accountability Initiative’s learning strategy, and supporting our work on data use for accountability, I often find myself toggling between different meanings of ‘data’, ‘data users’, and tech applications that can enable both of these themes in our strategy. These worlds don’t have to be compatible all the time, and concepts don’t have to compute immediately (I am personally still working out hypothetical blockchain applications for my MERL work!). But this collision of worlds is a helpful reminder that there are many perspectives to draw from in tackling accountable governance outcomes.
  • Maturity models come in all shapes and sizes, as we saw in the creative depictions created at MERL Tech, which included steps, arrows, paths, circles, cycles, and carrots! And the transparency and accountability field is collectively pursuing a next generation of more effective practice that will take unique turns for different accountability actors and outcomes. Regardless of what our organizational or programmatic models look like, MERL Tech reminded me that champions of continuous improvement are needed at all stages of the model – in MERL, in tech for development, and in the TAP field.

Megan left thinking:

  • That I am beginning to feel like I’m in a Dr. Seuss book. We talked ‘big data’, ‘small data’, ‘lean data’, and ‘thick data’. Such jargon-filled conversations can be useful for communicating complex concepts simply with others. Ah, but this is also the problem. This shorthand glosses over the nuances that explain what we actually mean. Jargon is also exclusive—it clearly defines the limits of your community and makes it difficult for newcomers. In TAP, I can’t help but see missed opportunities for connecting our work to other development sectors. How can health outcomes improve without holding governments and service providers accountable for delivering quality healthcare? How can smallholder farmers expect better prices without governments budgeting for and building better roads? Jargon is helpful until it divides us up. We have collective, global problems and we need to figure out how to talk to each other if we’re going to solve them.
  • In general, I’m observing a trend towards organic, participatory, and inclusive processes—in MERL, in TAP, and across the board in development and governance work. This is, almost universally speaking, a good thing. In MERL, a lot of this movement is a backlash to randomistas and imposing The RCT Gold Standard on social impact work. And, while I confess to being overjoyed that the “RCT-or-bust” mindset is fading out, I can’t help but think we’re on a slippery slope. We need scientific rigor, validation, and objective evidence. There has to be a line between ‘asking some good questions’ and ‘conducting an evaluation’. Collectively, we are working to eradicate unjust systems and eliminate poverty, and these issues require not just our best efforts and intentions, but workable solutions. Listen to Freakonomics’ recent podcast When Helping Hurts and commit with me to find ways to keep participatory and inclusive evaluation techniques rigorous and scientific, too.

[1] https://channels.theinnovationenterprise.com/articles/ai-in-developing-countries

Building bridges between evaluators and big data analysts

By Michael Bamberger, Independent Evaluation Consultant. Michael has been involved in development evaluation for 50 years and recently wrote the report: “Integrating Big Data into the Monitoring and Evaluation of Development Programs” for UN Global Pulse.


In Part 1 of this series we argued that, while applications of big data and data analytics are expanding rapidly in many areas of development programs, evaluators have been slow to adopt these applications. We predicted that one possible future scenario could be that evaluation may no longer be considered a separate function, and that it may be treated as one of the outputs of the integrated information systems that will gradually be adopted by many development agencies. Furthermore, many evaluations will use data analytics approaches, rather than conventional evaluation designs. (Image: Big Data session notes from USAIDLearning’s Katherine Haugh [@katherine_haugh]. MERL Tech DC 2016).

Here, in Part 2 we identify some of the reasons why development evaluators have been slow to adopt big data analytics and we propose some promising approaches for building bridges between evaluators and data analysts.

Why have evaluators been slow to adopt big data analytics?

Caroline Heider at the World Bank Independent Evaluation Group identifies four sets of data collection-related challenges affecting the adoption of new technologies by evaluators: ethics, governance, biases (potentially amplified through the use of ICT), and capacity.

We also see:

1. Weak institutional linkages. Over the past few years some development agencies have created data centers to explore ways to exploit new information technologies. These centers are mainly staffed by people with a background in data science or statistics and the institutional links to the agency’s evaluation office are often weak.

2. Many evaluators have limited familiarity with big data/analytics. Evaluation training programs tend to only present conventional experimental, quasi-experimental and mixed-methods/qualitative designs. They usually do not cover smart data analytics (see Part 1 of this blog). Similarly, many data scientists do not have a background in conventional evaluation methodology (though there are of course exceptions).

3. Methodological differences. Many big data approaches do not conform to the basic principles that underpin conventional program evaluation, for example:

  • Data quality: real-time big data provides one of the potentially most powerful sources of data for development programs. Among other things, real-time data can provide early warning signals of potential diseases (e.g. Google Flu), ethnic tension, drought and poverty (Meier 2015). However, when an evaluator asks if the data is biased or of poor quality, the data analyst may respond “Sure the data is biased (e.g. only captured from mobile phone users or twitter feeds) and it may be of poor quality. All data is biased and usually of poor quality, but it does not matter because tomorrow we will have new data.” This reflects the very different kinds of data that evaluators and data analysts typically work with, and the difference can be explained, but a statement such as the above can create the impression that data analysts do not take issues of bias and data quality very seriously.
  • Data mining: Many data analytics methods are based on the mining of large data sets to identify patterns of correlation, which are then built into predictive models, normally using Bayesian statistics. Many evaluators frown on data mining due to its potential to identify spurious associations.
  • The role of theory: Most (but not all) evaluators believe that an evaluation design should be based on a theoretical framework (theory of change or program theory) that hypothesizes the processes through which the intended outcomes will be achieved. In contrast, there is plenty of debate among data analysts concerning the role of theory, and whether it is necessary at all. Some even go as far as to claim that data analytics means “the end of theory” (Anderson 2008). This, combined with data mining, creates the impression among some evaluators that data analytics uses whatever data is easily accessible, with no theoretical framework to guide the selection of evaluation questions or to assess the adequacy of available data.
  • Experimental designs versus predictive analytics: Most quantitative evaluations are based on an experimental or quasi-experimental design using a pretest/posttest comparison group. Given the high cost of data collection, statistical power calculations are frequently used to estimate the minimum size of sample required to ensure a certain level of statistical significance. Usually this means that analysis can only be conducted on the total sample, as sample size does not permit statistical significance testing for sub-samples. In contrast, predictive analytics usually employ Bayesian probability models. Due to the low cost of data collection and analysis, it is usually possible to conduct the analysis on the total population (rather than a sample), so that disaggregated analysis can be conducted to compare sub-populations, and often (particularly when also using machine learning) to compute outcome probabilities for individual subjects. There continue to be heated debates concerning the merits of each approach, and there has been much less discussion of how experimental and predictive analytics approaches could complement each other; a stylized numerical sketch of this contrast follows after the quote below.
As Pete York at CommunityScience.com observes: “Herein lies the opportunity – we evaluators can’t battle the wave of big data and data science that will transform the way we do research. However, we can force it to have to succumb to the rules of objective rigor via the scientific method. Evaluators/researchers train people how to do it, they can train machines. We are already doing so.”  (Personal communication 8/7/17)
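To illustrate the contrast in the bullet above with invented numbers: the first half of this sketch runs the kind of power calculation an experimental design relies on to fix a minimum sample size, while the second half applies simple Bayesian (Beta-Binomial) updating to a simulated full population, producing disaggregated estimates for each subgroup. It is a stylized example under hypothetical assumptions, not a recommendation of either approach.

```python
import numpy as np
from scipy import stats

# Experimental-design logic: minimum sample size per arm from a standard
# two-sample power calculation (two-sided alpha, normal approximation).
alpha, power, effect_size = 0.05, 0.80, 0.30   # conventional, hypothetical values
z_alpha = stats.norm.ppf(1 - alpha / 2)
z_beta = stats.norm.ppf(power)
n_per_arm = int(np.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2))
print(f"Experimental design: roughly {n_per_arm} subjects needed per arm")

# Predictive-analytics logic: with near-zero marginal data cost, compute a
# Bayesian (Beta-Binomial) posterior for every subgroup rather than one
# pooled estimate from a minimum sample. Districts and rates are simulated.
rng = np.random.default_rng(0)
districts = {"district_A": (1200, 0.42), "district_B": (800, 0.55), "district_C": (300, 0.35)}
for name, (n, true_rate) in districts.items():
    successes = rng.binomial(n, true_rate)                     # simulated full-population data
    posterior = stats.beta(1 + successes, 1 + n - successes)   # Beta(1, 1) prior
    lo, hi = posterior.ppf([0.025, 0.975])
    print(f"{name}: posterior mean {posterior.mean():.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```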

4. Ethical and political concerns: Many evaluators also have concerns about who designs and markets big data apps and who benefits financially. Many commercial agencies collect data on low-income populations (for example their consumption patterns) which may then be sold to consumer products companies, with little or no benefit going to the populations from which the information is collected. Some of the algorithms may also include biases against poor and vulnerable groups (O’Neil 2016) that are difficult to detect, given the proprietary nature of the algorithms.

Another set of issues concerns whether the ways in which big data are collected and used (for making decisions affecting poor and vulnerable groups) tend to be exclusive (governments and donors use big data to make decisions about programs affecting the poor without consulting them), or whether big data is used to promote inclusion (giving voice to vulnerable groups). These issues are discussed in a recent Rockefeller Foundation blog. There are also many issues around privacy and data security. There is of course no simple answer to these questions, but many of these concerns are often lurking in the background when evaluators are considering the possibility of incorporating big data into their evaluations.

Table 1. Reasons evaluators have been slow to adopt big data and opportunities for bridge building between evaluators and data analysts

Reasons for slow adoption (numbered) are paired below with opportunities for bridge building (bulleted):

1. Weak institutional linkages
  • Strengthening formal and informal links between data centers and evaluators
2. Evaluators have limited knowledge about big data and data analytics
  • Capacity development programs covering both big data and conventional evaluation
  • Collaborative pilot evaluation projects
3. Methodological differences
  • Creating opportunities for dialogue to explore differences and to determine how they can be reconciled
  • Viewing data analytics and evaluation as being complementary rather than competing
4. Ethical and political concerns about big data
  • Greater focus on ethical codes of conduct, privacy and data security
  • Focusing on making approaches to big data and evaluation inclusive and avoiding exclusive/extractive approaches

Building bridges between evaluators and big data/analytics 

There are a number of possible steps that could be taken to build bridges between evaluators and big data analysts, and thus to promote the integration of big data into development evaluation. Catherine Cheney (2016) presents interviews with a number of data scientists and development practitioners stressing that data driven development needs both social and computer scientists. No single approach is likely to be successful, and the best approach(es) will depend on each specific context, but we could consider:

  • Strengthening the formal and informal linkages between data centers and evaluation offices. It may be possible to achieve this within the existing organizational structure, but it will often require some formal organizational changes in terms of lines of communication. Linda Raftree provides a useful framework for understanding how different “buckets” of data (including among others, traditional data and big data) can be brought together, which suggests one pathway to collaboration between data centers and evaluation offices.
  • Identifying opportunities for collaborative pilot projects. A useful starting point may be to identify opportunities for collaboration on pilot projects in order to test/demonstrate the value-added of cooperation between the data analysts and evaluators. The pilots should be carefully selected to ensure that both groups are involved equally in the design of the initiative. Time should be budgeted to promote team-building so that each team can understand the other’s approach.
  • Promoting dialogue to explore ways to reconcile differences of approach and methodology between big data and evaluation. While many of these differences may at first appear to be fundamental, some result at least in part from questions of terminology, and in other cases different approaches can be applied at different stages of the evaluation process. For example:
    • Many evaluators are suspicious of real-time data from sources such as Twitter, or analysis of phone records, due to selection bias and issues of data quality. However, evaluators are familiar with exploratory data (collected, for example, during project visits, or gathered as feedback from staff), which is then checked more systematically in a follow-up study. When presented in this way, the two teams would be able to discuss, in a non-confrontational way, how many kinds of real-time data could be built into evaluation designs.
    • When using Bayesian probability analysis it is necessary to begin with a prior distribution. The probabilities are then updated as more data becomes available. The results of a conventional experimental design can often be used as an input to the definition of the prior distribution. Consequently, it may be possible to consider experimental designs and Bayesian probability analysis as sequential stages of an evaluation rather than as competing approaches (a stylized numerical sketch follows after this list).
  • Integrated capacity development programs for data analysts and evaluators. These activities would both help develop a broader common methodological framework and serve as an opportunity for team building.
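A stylized numerical sketch of that sequential idea, with invented figures: the result of a conventional experiment seeds a Beta prior, which low-cost follow-on data then updates.

```python
from scipy import stats

# Stage 1: a conventional impact evaluation (hypothetical) found 50 adopters
# among 125 sampled households. Encode that result as a Beta prior.
a, b = 1 + 50, 1 + 75

# Stage 2: low-cost real-time data (hypothetical counts from a platform)
# updates the same quantity via simple Beta-Binomial conjugacy.
new_adopters, new_non_adopters = 420, 580
posterior = stats.beta(a + new_adopters, b + new_non_adopters)

print(f"Prior mean (from the experiment): {stats.beta(a, b).mean():.3f}")
print(f"Posterior mean (after new data):  {posterior.mean():.3f}")
```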

Conclusion

There are a number of factors that together explain the slow take-up of big data and data analytics by development evaluators. A number of promising approaches are proposed for building bridges to overcome these barriers and to promote the integration of big data into development evaluation.

See Part 1 for a list of useful references!

Buckets of data for MERL

by Linda Raftree, Independent Consultant and MERL Tech Organizer

It can be overwhelming to get your head around all the different kinds of data and the various approaches to collecting or finding data for development and humanitarian monitoring, evaluation, research and learning (MERL).

Though there are many ways of categorizing data, lately I find myself conceptually organizing data streams into four general buckets when thinking about MERL in the aid and development space:

  1. ‘Traditional’ data. How we’ve been doing things for (pretty much) ever. Researchers, evaluators and/or enumerators are in relative control of the process. They design a specific questionnaire or a data gathering process and go out and collect qualitative or quantitative data; they send out a survey and request feedback; they do focus group discussions or interviews; or they collect data on paper and eventually digitize the data for analysis and decision-making. Increasingly, we’re using digital tools for all of these processes, but they are still quite traditional approaches (and there is nothing wrong with traditional!).
  2. ‘Found’ data. The Internet, digital data and open data have made it much easier to find, share, and re-use datasets collected by others, whether internally in our own organizations, with partners or just in general. These tend to be datasets collected in traditional ways, such as government or agency data sets. In cases where the datasets are digitized, have proper descriptions and clear provenance, consent has been obtained for use/re-use, and care has been taken to de-identify them, they can eliminate the need to collect the same data over again. Data hubs are springing up that aim to collect and organize these data sets to make them easier to find and use.
  3. ‘Seamless’ data. Development and humanitarian agencies are increasingly using digital applications and platforms in their work — whether bespoke or commercially available ones. Data generated by users of these platforms can provide insights that help answer specific questions about their behaviors, and the data is not limited to quantitative data. This data is normally used to improve applications and platform experiences, interfaces, content, etc., but it can also provide clues into a host of other online and offline behaviors, including knowledge, attitudes, and practices. One cautionary note is that because this data is collected seamlessly, users of these tools and platforms may not realize that they are generating data or understand the degree to which their behaviors are being tracked and used for MERL purposes (even if they’ve checked “I agree” to the terms and conditions). This has big implications for privacy that organizations should think about, especially as new regulations come into force, such as the EU’s General Data Protection Regulation (GDPR). The commercial sector is great at this type of data analysis, but the development sector is only just starting to get more sophisticated at it.
  4. ‘Big’ data. In addition to data generated ‘seamlessly’ by platforms and applications, there are also ‘big data’ and data that exists on the Internet that can be ‘harvested’ if one only knows how. The term ‘big data’ describes the application of analytical techniques to search, aggregate, and cross-reference large data sets in order to develop intelligence and insights. (See this post for a good overview of big data and some of the associated challenges and concerns.) Data harvesting is a term used for the process of finding ‘unstructured’ content (message boards, a webpage, a PDF file, Tweets, videos, comments) and turning it into ‘semi-structured’ data so that it can then be analyzed (a minimal sketch of this idea follows this list). (Estimates are that 90 percent of the data on the Internet exists as unstructured content.) Currently, big data seems to be more apt for predictive modeling than for looking backward at how well a program performed or what impact it had. Development and humanitarian organizations (self included) are only just starting to understand concepts around big data and how it might be used for MERL. (This is a useful primer.)
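
As a rough illustration of what turning unstructured content into semi-structured data can look like, here is a minimal Python sketch using only the standard library; the sample posts and the extracted fields are invented for illustration.

```python
# Minimal 'harvesting' sketch: free-text snippets become semi-structured
# records that can be counted, filtered and analyzed. Sample data is invented.
import re
import json

raw_posts = [
    "Clinic queue was 3 hours again today #healthaccess @district_office",
    "New borehole working well in our village! #water",
]

def to_record(text):
    """Extract a few structured fields from a free-text post."""
    return {
        "text": text,
        "hashtags": re.findall(r"#(\w+)", text),
        "mentions": re.findall(r"@(\w+)", text),
        "word_count": len(text.split()),
    }

records = [to_record(post) for post in raw_posts]
print(json.dumps(records, indent=2))
```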

Thinking about these four buckets of data can help MERL practitioners to identify data sources and how they might complement one another in a MERL plan. Categorizing them as such can also help to map out how the different kinds of data will be responsibly collected/found/harvested, stored, shared, used, and maintained/retained/destroyed. Each type of data also has certain implications in terms of privacy, consent and use/re-use and how it is stored and protected. Planning for the use of different data sources and types can also help organizations choose the data management systems needed and identify the resources, capacities and skill sets required (or needing to be acquired) for modern MERL.
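
One way to operationalize this kind of mapping is a simple data inventory that tags each source with its bucket and its consent, storage and retention decisions. The sketch below is purely illustrative; the sources and field names are hypothetical.

```python
# Illustrative data inventory for a MERL plan; all entries are hypothetical.
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str
    bucket: str            # "traditional", "found", "seamless" or "big"
    consent_obtained: bool
    contains_pii: bool
    storage: str
    retention: str

inventory = [
    DataSource("Baseline household survey", "traditional",
               consent_obtained=True, contains_pii=True,
               storage="encrypted project server",
               retention="destroy 5 years after project close"),
    DataSource("Chatbot usage logs", "seamless",
               consent_obtained=False, contains_pii=True,
               storage="vendor platform",
               retention="review retention terms with vendor"),
]

# Flag sources that need attention before they are used for MERL.
for source in inventory:
    if source.contains_pii and not source.consent_obtained:
        print(f"Review consent and privacy safeguards for: {source.name}")
```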

Organizations and evaluators are increasingly comfortable using mobile phones and/or tablets to do traditional data gathering, but they often are not using ‘found’ datasets. This may be because these datasets are not very ‘find-able,’ because organizations are not creating them, because re-using data is not a common practice for them, because the data are of questionable quality/integrity or lack descriptors, or for a variety of other reasons.

The use of ‘seamless’ data is something that development and humanitarian agencies might want to get better at. Even though large swaths of the populations that we work with are not yet online, this is changing. And if we are using digital tools and applications in our work, we shouldn’t let that data go to waste if it can help us improve our services or better understand the impact and value of the programs we are implementing. (At the very least, we had better understand what seamless data the tools, applications and platforms we’re using are collecting, so that we can manage the data privacy and security of our users and ensure their privacy is not being violated by third parties!)

Big data is also new to the development sector, and there may be good reasons it is not yet widely used. Many of the populations we are working with are not producing much data — though this is also changing as digital financial services and mobile phone use become almost universal and the use of smartphones is on the rise. Organizations normally require new knowledge, skills, partnerships and tools to access and use existing big data sets or to do any data harvesting. Some say that big data, along with ‘seamless’ data, will one day replace our current form of MERL. As artificial intelligence and machine learning advance, who knows… (and it’s not only MERL practitioners who will be out of a job – but that’s a conversation for another time!)

Not every organization needs to be using all four of these kinds of data, but we should at least be aware that they are out there and consider whether they are of use to our MERL efforts, depending on what our programs look like, who we are working with, and what kind of MERL we are tasked with.

I’m curious how other people conceptualize their buckets of data, and where I’ve missed something or defined these buckets erroneously…. Thoughts?

Better or different or both?

by Linda Raftree, Independent Consultant and MERL Tech Organizer

As we delve into why, when, where, if, and how to incorporate various types of technology and digital data tools and approaches into monitoring, evaluation, research and learning (MERL), it can be helpful to think about MERL technologies from two angles:

  1. Doing our work better:  How can new technologies and approaches help us do what we’ve always done — the things that we know are working and having an impact — but do them better? (E.g., faster, with higher quality, more efficiently, less expensively, with greater reach or more inclusion of different voices)
  2. Doing our work differently:  What brand new, previously unthinkable things can be done because of new technologies and approaches? How might these totally new ideas contribute positively to our work or push us to work in an entirely different way?

Sometimes these two things happen simultaneously and sometimes they do not. Some organizations are better at Thing 1, and others are set up well to explore Thing 2. Not all organizations need to feel pressured into doing Thing 2, however, and sometimes it can be a distraction from Thing 1. Some organizations may be better off letting early adopters focus on Thing 2 and investing their own budgets and energy in Thing 1 until innovations have been tried and tested by the early adopters. Organizations may also have staff members or teams working on Thing 1 and Thing 2 separately. Others may conceptualize this as a process or pathway moving from Thing 2 to Thing 1, where Thing 2 (once tested and evaluated) feeds a pipeline into Thing 1.

Here are some potentially useful past discussions on the topic of innovations within development organizations that flesh out some of these thoughts:

Many of the new tools and approaches that were considered experimental 10 years ago have moved from being “brand new and innovative” to simply “helping us do what we’ve always done.” Some of these earlier “innovations” are related to digital data and data collection and processing, and they help us do better monitoring, evaluation and research.

On the flip side, monitoring, evaluation and research have played a key role in helping organizations and the sector overall learn more about how, where, when, why and in what contexts these different tools and approaches (including digital data for MERL) can be adopted. MERL on ICT4D and Digital Development approaches can help calibrate the “hype cycle” and weed out the shiny new tools and approaches that are actually not very effective or useful to the sector and highlight those that cause harm or put people at risk.

There are always going to be new tools and approaches that emerge. Humanitarian and development organizations, then, need to think strategically about what kind of organization they are (or want to be) and where they fit on the MERL Tech continuum between Thing 1 and Thing 2.

What capacities does an organization have for working on Thing 2 (brand new and different)? When, and for how long, should an organization focus on Thing 1, building on what it knows is working or could work, while keeping an eye on the early adopters who are working on Thing 2? When does an organization have enough “proof” to start adopting new tools and approaches that seem to add value? How are these new tools and approaches being monitored, evaluated and researched to improve our use of them?

It’s difficult for widespread adoption to happen in the development space, where there is normally limited time and capacity for failure or for experimentation, without solid MERL. And even with “solid MERL” it can be difficult for organizations to adapt and change due to a multitude of factors, both internal and external.

I’m looking forward to September’s MERL Tech Conference in DC where we have some sessions that explore “the MERL on ICT4MERL?” and others that examine aspects of organizational change related to adopting newer MERL Tech tools and approaches.

(Register here if you haven’t already!)


Discrete choice experiment (DCE) to generate weights for a multidimensional index

In his MERL Tech Lightning Talk, Simone Lombardini, Global Impact Evaluation Adviser, Oxfam, discussed his experience with an innovative method for applying tech to help determine appropriate metrics for measuring concepts that escape easy definition. To frame his talk, he referenced Oxfam’s recent experience with using discrete choice experiments (DCE) to establish a strategy for measuring women’s empowerment.

Two methods already exist, Simone points out, for transforming soft concepts into hard metrics. First, the evaluator could assume full authority and responsibility over defining the metrics. Alternatively, the evaluator could design the evaluation so that relevant stakeholders are incorporated into the process and use their input to help define the metrics.

Though both methods are common, they are missing (for practical reasons) the level of mass input that could make them truly accurate reflections of the social perception of whatever concept is being considered. Tech has a role to play in scaling the quantity of input that can be collected. If used correctly, this could lead to better evaluation metrics.

Simone described this approach as “context-specific” and “multi-dimensional.” The process starts by defining the relevant characteristics (such as those found in empowered women) in their social context, then translating these characteristics into indicators, and finally combining indicators into one empowerment index for evaluating the project.

After the characteristics are defined, a discrete choice experiment can be used to determine each one’s “weight” in a particular social context. A discrete choice experiment (DCE) is a technique that has frequently been used in health economics and marketing, but not much in impact evaluation. To implement a DCE, researchers present different hypothetical scenarios to respondents and ask them to decide which one they consider best reflects the concept in question (i.e., women’s empowerment). The responses are used to estimate the weights of the indicators covered by the DCE, and these can then be used to develop an empowerment index.
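
As a rough illustration of how indicator weights might be derived from paired-choice responses (a simplified sketch, not necessarily the analysis Oxfam used), one common approach is to model the probability of choosing scenario A over scenario B as a function of the difference in their indicator levels, and to normalize the fitted coefficients into weights. All data below are simulated.

```python
# Simplified DCE analysis sketch with simulated data; not Oxfam's actual code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_choices, n_indicators = 500, 4          # e.g., 4 empowerment indicators

# Indicator levels (0/1) describing scenario A and scenario B in each task.
x_a = rng.integers(0, 2, size=(n_choices, n_indicators))
x_b = rng.integers(0, 2, size=(n_choices, n_indicators))

# Simulate respondents' choices driven by "true" weights (unknown in practice).
true_weights = np.array([0.4, 0.3, 0.2, 0.1])
p_choose_a = 1 / (1 + np.exp(-(x_a - x_b) @ (true_weights * 4)))
chose_a = rng.random(n_choices) < p_choose_a

# Fit a logit on the difference in levels; no intercept since A/B are symmetric.
model = LogisticRegression(fit_intercept=False).fit(x_a - x_b, chose_a)

# Normalize coefficients so the weights sum to 1 for use in an index.
weights = model.coef_.ravel() / model.coef_.ravel().sum()
print("Estimated indicator weights:", np.round(weights, 2))
```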

This process was integrated into the data collection process, adding about 10 minutes to the end of a one-hour survey, and was made practicable by the ubiquity of smartphones. The results from Oxfam’s trial run using this method are still being analyzed. For more on this, watch Lombardini’s video below!

Community-led mobile research–What could it look like?

Adam Groves, Head of Programs at On Our Radar, gave a presentation at MERL Tech London in February where he elaborated on a new method for collecting qualitative ethnographic data remotely.

The problem On Our Radar sought to confront, Adam explained, is the cold and impenetrable bureaucratic machinery of complex organizations. To many people, the unresponsiveness and inhumanity of the bureaucracies that provide them with services is dispiriting, and this is a challenge to overcome for anyone who wants to provide a quality service.

On Our Radar’s solution is to enable people to share their real-time experiences of services by recording audio and SMS diaries with their basic mobile phones. Because of the intimacy they capture, these first-person accounts have the capacity to grab the attention of the people behind services and make them listen to and experience the customer’s thoughts and feelings as they happened.

Responses obtained from audio and SMS diaries are different from those obtained from other qualitative data collection methods because, unlike solutions that crowdsource feedback, these diaries contain responses from a small group of trained citizen reporters that share their experiences in these diaries over a sustained period of time. The product is a rich and textured insight into the reporters’ emotions and priorities. One can track their journeys through services and across systems.

On Our Radar worked with British Telecom (BT) to implement this technique. The objective was to help BT understand how their customers with dementia experience their services. Over a few weeks, forty people living with dementia recorded audio diaries about their experiences dealing with big companies.

Adam explained how the audio diary method was effective for this project:

  • Because diaries and dialogues are in real time, they captured emotional highs and lows (such as the anxiety of picking up the phone and making a call) that would not be recalled in after-the-fact interviews.
  • Because diaries are focused on individuals and their journeys instead of on discrete interactions with specific services, they showed how encountering seemingly unrelated organizations or relationships impacted users’ experiences of BT. For example, cold calls became terrifying for people with dementia and made them reluctant to answer the phone for anyone.
  • Because this method follows people’s experiences over time, it allows researchers to place individual pain points and problems in the context of a broader experience.
  • Because the data is in first person and in the moment, it moved people emotionally. Data was shared with call center staff and managers, and they found it compelling. It was an emotional human story told in one’s own words. It invited decision makers to walk in other people’s shoes.

On Our Radar’s future projects include working in Sierra Leone with local researchers to understand how households are changing their practices post-Ebola and a major piece of research with the London School of Hygiene and Tropical Medicine in Malaysia and the Philippines to gain insight on people’s understanding of their health systems.

For more, find a video of Adam’s original presentation below!

Cost-benefit comparisons of IVR, SMS, and phone survey methods

In his MERL Tech London Lightning Talk back in February, Jan Liebnitzky of Firetail provided a research-backed assessment of the costs and benefits of using interactive voice response surveys (IVR), SMS surveys, and phone surveys for MERL purposes.

First, he outlined the opportunities and challenges of using phones for survey research:

  • They are a good means for providing incentives. And research shows that incentives don’t have to be limited to airtime credits. The promise of useful information is sometimes the best motivator for respondents to participate in surveys.
  • They are less likely to reach subgroups. Though mobile phones are ubiquitous, one challenge is that groups like women, illiterate people and people in low-connectivity areas do not always have access to them. Thus, phones may not be as effective as one would hope for reaching the people most often targeted by aid programs.
  • They are scalable and have expansive reach. Scripting and outsourcing phone-based surveys to call centers takes time and capacity. Fixed costs are high, while marginal costs for each new question or respondent are low. This means that they can be cost effective (compared to on-the-ground surveys) if implemented at a large scale or in remote and high-risk areas with problematic access (see the toy calculation after this list).
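
To make the fixed- versus marginal-cost point concrete, here is a toy calculation; all cost figures and response rates are invented for illustration.

```python
# Toy cost-per-response calculation illustrating economies of scale.
def cost_per_completed_response(fixed_cost, cost_per_attempt,
                                attempts, response_rate):
    """Total cost divided by the number of completed responses."""
    completed = attempts * response_rate
    return (fixed_cost + cost_per_attempt * attempts) / completed

# High fixed cost (scripting, call-center setup), low marginal cost per attempt.
for attempts in (200, 2_000, 20_000):
    unit_cost = cost_per_completed_response(
        fixed_cost=5_000, cost_per_attempt=0.50,
        attempts=attempts, response_rate=0.2)
    print(f"{attempts:>6} attempts -> {unit_cost:.2f} per completed response")
```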

Then, Jan shared some strategies for using phones for MERL purposes:

1. Interactive Voice Response Surveys

    • These are pre-recorded and automated surveys. Respondents can reply to them by voice or with the numerical keypad.
    • IVR has been used in interactive radio programs in Tanzania, where listening posts were established for the purpose of interacting with farmers. Listening posts are multi-channel, web-based platforms that gather and analyze feedback and questions from farmers who listen to particular radio shows. The radio station runs the IVR, and farmers can call in to the radio show to submit their questions or responses. These are effective because they are run through trusted radio shows. However, it is important that farmers receive answers to the questions they ask, as this incentivizes future participation.

2. SMS Surveys

    • These make use of mobile messaging capabilities to send questions and receive answers. Usually, the SMS survey respondent will either choose between fixed multiple choice answers or write a freeform response. Responses, however, are limited to 160 characters.
    • One example of this is U-Reporter, a free SMS social monitoring tool for community participation in Uganda. Polls are sent to U-Reporters who answer back in real time, and the results are then shared back with the community.

3. Phone Surveys

    • Phone surveys are run through call centers by enumerators. They function like face-to-face interviews, but over the phone.
    • As an example, phone surveys were used as a monitoring tool by an agriculture extension services provider. Farmers in the area subscribed to receive texts from the provider with tips about when and how to plant crops. From the list of subscribers, prospective respondents were sampled and in-country call centers were contracted to call up to 1,000 service users to inquire about quality of service, behaviour changes and adoption of new farming technologies.
    • The challenges here were that the data were only as good as the call staff’s training. Also, there was an 80% drop-off rate, partly due to the language limitations of call staff.

Finally, Jan provided a rough cost and effectiveness assessment for each method:

  • IVR survey: medium cost, high response
  • SMS survey: low cost, low response
  • Phone survey: high cost, medium response

Jan closed with a question: What is the value of these methods for MERL?

His answer: The surveys are quick and dirty and, to their merit, they produce timely data from remote areas at a reasonable cost. If the data are actually used, these methods can be effective for monitoring. However, they are not yet adequate for use in evaluation.

For more, watch Jan’s Lightning Talk below!