MERL Tech News

Mobile Case Management for Multi-Dimensional Accountability

This is a cross-post from Christopher Robert of Dobility. It was originally published September 13 on the SurveyCTO blog.

At MERL Tech DC 2017, Oxfam’s Emily Tomkys Valteri and I teamed up to lead a session on Mobile case management for multi-dimensional accountability. This blog post shares some highlights from that session. [Note: session slides are available here]

Background

In their Your Word Counts project, Oxfam is collaborating with local and global partners to capture, analyze, and respond to community feedback data using a mobile case management tool. The goal is to inform Oxfam’s Middle East humanitarian response and give those affected by crisis a voice for improved support and services. This project is a scale-up of an earlier pilot project, and both the pilot and the scale-up have been supported by the Humanitarian Innovation Fund.

Oxfam’s use of SurveyCTO’s case-management features has been innovative, and they have been helping to support improvements in the core technology. In this session, we discussed both the core technology and the broader organizational and logistical challenges that Oxfam has encountered in the field.

Mobile case management: an introduction 

In standard applications of mobile data collection, enumerators, inspectors, program officers, or others use a mobile phone or tablet to collect data. Whether they quietly observe things, interview people, or inspect facilities, they ultimately enter some kind of data into a mobile device. In systems like SurveyCTO, data collection officially begins when they click a Fill Blank Form button and choose a digital form to fill out.

Mobile data collection

Mobile case management is much the same, but the process begins with cases and then proceeds to forms. As far as the core technology is concerned, a case might be a clinic, a school, a water point, a household – pretty much any unit that’s meaningful in the given context. Instead of choosing Fill Blank Form and choosing a form, users in the field choose Manage Cases and then choose a particular case from a list that’s filtered specifically for that user (e.g., to include only schools in their area); once they select a case, they then select one of the forms that is outstanding for that case.

Mobile case management

Behind the scenes, the case list is really just a spreadsheet. It includes columns for the unique case ID, the label that should be used to identify the case to users, the list of forms that should be filled for the case, and the users and/or user roles that should see the case listed in their case list. Importantly, the case list is not static: any form can update or add a case, and thus as users fill forms the case list can be dynamically revised and extended. (In SurveyCTO, the case list is simply a server dataset: it can be manually uploaded as a .csv, attached to forms, and updated just like any other dataset.)
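
To make the spreadsheet idea concrete, here is a minimal Python sketch of a case list that is filtered per user and updated as forms come in. The column names (id, label, formids, users, roles), the "|" separator for multiple values, the sample rows, and the helper functions are all illustrative assumptions based on the description above. They are not SurveyCTO's actual schema or API.

```python
import csv
import io

# Hypothetical case list: one row per case, as described in the text above.
CASE_LIST_CSV = """id,label,formids,users,roles
fb-001,Water point broken in Block A,referral|closure,amira,technical
fb-002,Request for more latrines,referral,,meal
fb-003,Thanks for hygiene kits,closure,omar,
"""

COLUMNS = ["id", "label", "formids", "users", "roles"]


def load_cases(text):
    """Read the CSV text into a dict keyed by case ID."""
    return {row["id"]: row for row in csv.DictReader(io.StringIO(text))}


def split_values(field):
    """Split a '|'-separated cell into a list, ignoring empty entries."""
    return [value for value in field.split("|") if value]


def cases_for_user(cases, username, role):
    """Return the cases that list this user or this user's role."""
    return [
        case
        for case in cases.values()
        if username in split_values(case["users"])
        or role in split_values(case["roles"])
    ]


def upsert_case(cases, case_id, **fields):
    """Add a new case, or update columns of an existing one."""
    blank = {column: "" for column in COLUMNS}
    blank["id"] = case_id
    row = cases.setdefault(case_id, blank)
    row.update(fields)
    return row


cases = load_cases(CASE_LIST_CSV)
print([c["id"] for c in cases_for_user(cases, "amira", "technical")])
# → ['fb-001']

# A submitted form adds a new case and shortens the outstanding forms of another.
upsert_case(cases, "fb-004", label="Queue too long at distribution",
            formids="referral", roles="technical")
upsert_case(cases, "fb-001", formids="closure")
print([c["id"] for c in cases_for_user(cases, "amira", "technical")])
# → ['fb-001', 'fb-004']
```

The point of the sketch is only that filtering and updating operate on the same table, so each user's case list changes as soon as a form adds or revises a row.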

Mobile case management: case list

Oxfam’s innovative use case: Your Word Counts 

Oxfam accountability feedback loop. Diagram credit: Oxfam GB.

In Oxfam’s Your Word Counts project, cases represent any kind of feedback from the community. Volunteers and program staff carry mobile phones and log feedback as new cases whenever they interact with community members; technical teams then work to resolve feedback within a week, filling out new forms to update cases as their status changes; and program staff then close the loop with the original community members when possible, before closing the case. Because the data is all available in a single electronic system, in-country, regional, and even global teams can then report on and analyze both the community feedback and the responses over time.

There have been some definite successes in piloting and early scale-up:

  • By listening to community members, recording their feedback, and following up, the community feedback system has helped to build trust.
  • The digital process of recording referrals, updates, and eventually responses has been rapid, speeding responsiveness to feedback overall.
  • Since all digital forms can be updated easily, the system is dynamic and flexible enough to adapt as programs or needs change.
  • The solution appears to be low-cost, scalable, and sustainable.

There have been both organizational and logistical challenges, however. For example:

  • For a system like this to truly be effective, fundamental responsibility for accountability must be shared organization-wide. While MEAL officers (monitoring, evaluation, accountability, and learning officers) can help to set up and manage accountability systems, technical teams, program teams, and senior leadership ultimately have to share ownership and responsibility in order for the system to function and sustain.
  • Globally-predefined feedback categories turned out not to fit well with early deployment contexts, and so the program team needed to re-think how to most effectively categorize feedback. (See Oxfam’s blog post on the subject.)
  • In dynamic in-country settings, staff turnover can be high, posing major logistical and sustainability challenges for systems of all kinds.
  • While community members can add and update cases offline, ultimately an Internet connection is required to synchronize case lists with a central server. In some settings, access to office Internet has been a challenge.
  • Ideally, cases would be easily referred across agencies working in a particular setting, but some agencies have been reluctant to buy into shared digital systems.

Oxfam’s MEAL team is exploring ways to facilitate a broader accountability culture throughout the organization. In country programs, for example, MEAL coordinators are looking to use office whiteboards to track key indicators of feedback performance and engage staff in discussions of what those indicators mean for them. More broadly, Oxfam is looking to highlight best practices in responding to and acting on feedback and seeking other ways to incentivize teams in this area.

Oxfam’s work is ongoing, and you can follow their progress on their project blog.

Mobile case management: Where it’s going 

While Oxfam works to build and support both systems and culture for accountability in their humanitarian response programs, we at Dobility are working to improve the core technology. With Oxfam’s feedback and support, we are currently working to improve the user interface used to filter and browse case lists, both on devices (in the field) and on the web (in the office). We are also working to improve the user interface for those setting up and managing these kinds of case-management systems. If you have specific ideas, please share them by commenting below!

Maturity Models: Visualizing Progress Towards Next-Generation Transparency and Accountability

By Alison Miranda (TAI) and Megan Colnar (Open Society Foundation). This is a cross-post of a piece published on September 17th on the Transparency and Accountability Initiative’s blog.

How can we assess progress on a second-generation way of working in the transparency, accountability and participation (TAP) field? Monitoring, evaluation, research, and learning (MERL) maturity models can provide some inspiration. The 2017 MERL Tech conference in Washington, DC was a two-day bonanza of lightning talks, plenary sessions, and hands-on workshops among participants who use technology for MERL.

Here are key conference takeaways from two MEL practitioners in the TAP field.

1. Making open data useful

Several MERL Tech sessions resonated deeply with the TAP field’s efforts to transition from fighting for transparent and open data towards linking open data to accountability and governance outcomes. Development Gateway and InterAction drew on findings from “Avoiding Data Graveyards” as we explored progress and challenges for International Aid Transparency Initiative (IATI) data use cases. While transparency is an important value, what is gained (or lost) in data use for collaboration when there are many different potential data consumers?

A partnership between Freedom House and DataKind is moving the Freedom in the World study towards a more transparent display of index sub-indicators, and building a more robust – and usable! – data set by reformatting and integrating their data and other secondary big data sets. What could such an initiative yield for the Extractive Industries Transparency Initiative (EITI), for example, if equivalent data sets were available?

And finally, as TAP practitioners are keenly aware, power and politics can overshadow evidence in decision making. Another Development Gateway presentation reminded us that it is important to work with data producers and users to identify decisions that are (or can be) data-driven, and to recognize when other factors are driving decisions. (The incentives to supply open data are a whole other can of worms!)

Drawing on our second-generation TAP approach, more work is needed for the TAP and MERL fields to move from “open data everywhere, all of the time” to planning for, and encouraging, more effective data use.

2. Tech for MERL for improved policy, practice, and outcomes

Among our favorite moments at MERL Tech was when Dobility Founder and CEO Christopher Robert remarked that “the most interesting innovations at MERL Tech aren’t the new, cutting-edge technology developments, but generic technology applied in innovative ways.” Unsurprising for a tech company focused on using affordable technology to enable quality data collection for social impact, but a refreshing reminder amidst the talk of ‘AI’, ‘chatbots’, and ‘blockchains’ for development coursing through the conference.

The TAP field is certainly no stranger to employing technology, from apps to curb trade corruption in Nigeria, to Citizen Helpdesks in Nepal, Liberia, and Mali, to crowdsourced political campaign expenditure monitoring in Bolivia. But our second-generation TAP insights remind us that technology tools are not an end in themselves. MERL and technology are our means for collecting effective data, generating important insights and learning, building larger movements, and gathering context-specific evidence on transparency and accountability.

We are undoubtedly on the precipice of revolutionary technological advancements that can be readily (and maybe even affordably) deployed[1] to solve complex global challenges, but they will still be tools and not solutions.

3. Balancing inclusion and participation with efficiency and accuracy

We explored a constant conundrum for MERL: how to balance inclusion and participation with efficiency and accuracy. Girl Effect and Praekelt Foundation took “mixed methods” research to another level, combining online and offline efforts to understand user needs of adolescent girls and to support user-developed digital media content. Their iterative process showcased an effective way to integrate tech into the balancing act of inclusive – and holistic – design, paired with real-time data use.

This session on technology in citizen-generated data brought to light two case studies of how tech can both help and hinder this balancing act. The World Café discussions underscored the importance of planning for – and recognizing the constraints on – feedback loops. They also provided a helpful reminder that MERL and tech professionals are often considering different “end users” in their design work!

So, which is it – balancing act or zero-sum game between inclusion and efficiency? The MERL community has long applied participatory methods. And tech solutions abound that can help with efficiency, accuracy, and inclusion. Indeed, the second-generation TAP focus on learning and collaboration is grounded in effective data use – but there are many potential “end users” to consider. These principles and practices can force uncomfortable compromises – particularly in the face of finite resources and limited data availability – but they are not at odds with each other. Perhaps the MERL and TAP communities can draw lessons from each other in striking the right balance.

4. Tech sees no development sector silos

One of the things that makes MERL Tech such an exciting conference is the deliberate mixing of tech nerds with MERL nerds. It’s pretty unique in its dual targeting of both types of professionals who share a common purpose of social impact (whereas conferences like ICT4D cast a wider net looking at application of technology to broader development issues). And, though we MERL professionals like to think of design and iteration as squarely within our wheelhouse, being in a room full of tech experts can quickly remind you that our adaptation game has a lot of room to grow. We talk about user-centered design in TAP, but when the tech crowd was asked in plenary “would you even think of designing software or an app without knowing who was going to use it?” they responded with a loud and exuberant laugh.

Tech has long employed systematic approaches to user-centered design, prototyping, iteration, and adaptation, all of which can offer compelling lessons to guide MERL practices and methods. Though we know Context is King, it is exhilarating to know that the tech specialists assembled at the conference work across traditional silos of development work (from health to corruption, and everything in between). End users are, of course, crucial to the final product but the life cycle process and steps follow a regular pattern, regardless of the topic area or users.

The second-generation wave in TAP similarly moved away from project-specific, fragmented, or siloed planning and learning towards a focus on collective approaches and long-term, more organic engagement.

American Evaluation Association President, Kathy Newcomer, quipped that maybe an ‘Academy Awards for Adaptation’ could inspire better informed and more adept evolutions to strategy as circumstances and context shift around us. Adding to this, and borrowing from the tech community, we wonder where we can build more room to beta test, follow real demand, and fail fast. Are we looking towards other sectors and industries enough or continuing to reinvent the wheel?

Alison left thinking:

  • Concepts and practices are colliding across the overlapping MERL, tech, and TAP worlds! In leading the Transparency and Accountability Initiative’s learning strategy, and supporting our work on data use for accountability, I often find myself toggling between different meanings of ‘data’, ‘data users’, and tech applications that can enable both of these themes in our strategy. These worlds don’t have to be compatible all the time, and concepts don’t have to compute immediately (I am personally still working out hypothetical blockchain applications for my MERL work!). But this collision of worlds is a helpful reminder that there are many perspectives to draw from in tackling accountable governance outcomes.
  • Maturity models come in all shapes and sizes, as we saw in the creative depictions created at MERL Tech that included steps, arrows, paths, circles, cycles, and carrots! And the transparency and accountability field is collectively pursuing a next generation of more effective practice that will take unique turns for different accountability actors and outcomes. Regardless of what our organizational or programmatic models look like, MERL Tech reminded me that champions of continuous improvement are needed at all stages of the model – in MERL, in tech for development, and in the TAP field.

Megan left thinking:

  • That I am beginning to feel like I’m in a Dr. Seuss book. We talked ‘big data’, ‘small data’, ‘lean data’, and ‘thick data’. Such jargon-filled conversations can be useful for communicating complex concepts simply with others. Ah, but this is also the problem. This shorthand glosses over the nuances that explain what we actually mean. Jargon is also exclusive – it clearly defines the limits of your community and makes it difficult for newcomers. In TAP, I can’t help but see missed opportunities for connecting our work to other development sectors. How can health outcomes improve without holding governments and service providers accountable for delivering quality healthcare? How can smallholder farmers expect better prices without governments budgeting for and building better roads? Jargon is helpful until it divides us up. We have collective, global problems and we need to figure out how to talk to each other if we’re going to solve them.
  • In general, I’m observing a trend towards organic, participatory, and inclusive processes – in MERL, in TAP, and across the board in development and governance work. This is, almost universally speaking, a good thing. In MERL, a lot of this movement is a backlash to randomistas and to imposing The RCT Gold Standard on social impact work. And, while I confess to being overjoyed that the “RCT-or-bust” mindset is fading out, I can’t help but think we’re on a slippery slope. We need scientific rigor, validation, and objective evidence. There has to be a line between ‘asking some good questions’ and ‘conducting an evaluation’. Collectively, we are working to eradicate unjust systems and eliminate poverty, and these issues require not just our best efforts and intentions, but workable solutions. Listen to Freakonomics’ recent podcast When Helping Hurts and commit with me to find ways to keep participatory and inclusive evaluation techniques rigorous and scientific, too.

[1] https://channels.theinnovationenterprise.com/articles/ai-in-developing-countries

MERL Tech DC Conference wrap up

Over 300 MERL Tech practitioners came together in Washington DC the first week of September for MERL Tech DC.

Kathy Newcomer, American Evaluation Association President, gives her opening talk on Day 2.
Blockchain was one of the most popular sessions.

Core topic areas included organizational change and capacity; evaluation of MERL Tech and ICT4D; big data, small data and data analytics; tech tools to support qualitative methods; new and emerging technologies with a potential role in MERL; inclusion and ways tech can support ‘downward’ accountability; practical sessions on tools and methods; and community building in the MERL Tech sector.

Check out InSTEDD’s fantastic recap of the event in pictures and Tweets.

What does “MERL Tech Maturity” look like?

In plenary, groups worked together to discuss “MERL Tech Maturity Models” – in other words, what are the characteristics of an organization that is fully mature when it comes to MERL Tech. People also spent some time thinking about where their organizations fit on the “MERL Tech Maturity” scale: from brand new or less experienced to fully mature. (We’ll share more about this in a future post).

The Data Turnpike was voted the best depiction of a Maturity Model.

As always, there was plenty of socializing with old and new friends and collaborators too!

Stay tuned for session summaries and more, coming up over the next several weeks here on MERL Tech News!

Building bridges between evaluators and big data analysts

By Michael Bamberger, Independent Evaluation Consultant. Michael has been involved in development evaluation for 50 years and recently wrote the report: “Integrating Big Data into the Monitoring and Evaluation of Development Programs” for UN Global Pulse.

In Part 1 of this series we argued that, while applications of big data and data analytics are expanding rapidly in many areas of development programs, evaluators have been slow to adopt these applications. We predicted that one possible future scenario could be that evaluation may no longer be considered as a separate function, and that it may be treated as one of the outputs of the integrated information systems that will gradually be adopted by many development agencies. Furthermore, many evaluations will use data analytics approaches, rather than conventional evaluation designs. (Image: Big Data session notes from USAIDLearning’s Katherine Haugh [@katherine_haugh], MERL Tech DC 2016.)

Here, in Part 2, we identify some of the reasons why development evaluators have been slow to adopt big data analytics and we propose some promising approaches for building bridges between evaluators and data analysts.

Why have evaluators been slow to adopt big data analytics?

Caroline Heider at the World Bank Independent Evaluation Group identifies four sets of data collection-related challenges affecting the adoption of new technologies by evaluators: ethics, governance, biases (potentially amplified through the use of ICT), and capacity.

We also see:

1. Weak institutional linkages. Over the past few years some development agencies have created data centers to explore ways to exploit new information technologies. These centers are mainly staffed by people with a background in data science or statistics and the institutional links to the agency’s evaluation office are often weak.

2. Many evaluators have limited familiarity with big data/analytics. Evaluation training programs tend to only present conventional experimental, quasi-experimental and mixed-methods/qualitative designs. They usually do not cover smart data analytics (see Part 1 of this blog). Similarly, many data scientists do not have a background in conventional evaluation methodology (though there are of course exceptions).

3. Methodological differences. Many big data approaches do not conform to the basic principles that underpin conventional program evaluation, for example:

  • Data quality: Real-time big data provides one of the potentially most powerful sources of data for development programs. Among other things, real-time data can provide early warning signals of potential diseases (e.g. Google Flu), ethnic tension, drought and poverty (Meier 2015). However, when an evaluator asks if the data is biased or of poor quality, the data analyst may respond “Sure the data is biased (e.g. only captured from mobile phone users or Twitter feeds) and it may be of poor quality. All data is biased and usually of poor quality, but it does not matter because tomorrow we will have new data.” This reflects the very different kinds of data that evaluators and data analysts typically work with, and the difference can be explained, but a statement such as the above can create the impression that data analysts do not take issues of bias and data quality very seriously.
  • Data mining: Many data analytics methods are based on the mining of large data sets to identify patterns of correlation, which are then built into predictive models, normally using Bayesian statistics. Many evaluators frown on data mining because of its potential to identify spurious associations.
  • The role of theory: Most (but not all) evaluators believe that an evaluation design should be based on a theoretical framework (theory of change or program theory) that hypothesizes the processes through which the intended outcomes will be achieved. In contrast, there is plenty of debate among data analysts concerning the role of theory, and whether it is necessary at all. Some even go as far as to claim that data analytics means “the end of theory” (Anderson 2008). This, combined with data mining, creates the impression among some evaluators that data analytics uses whatever data is easily accessible, with no theoretical framework to guide the selection of evaluation questions or to assess the adequacy of available data.
  • Experimental designs versus predictive analytics: Most quantitative evaluation designs are based on an experimental or quasi-experimental design using a pretest/posttest comparison group design. Given the high cost of data collection, statistical power calculations are frequently used to estimate the minimum sample size required to detect an effect of a given size at a chosen level of statistical significance. Usually this means that analysis can only be conducted on the total sample, as sample size does not permit statistical significance testing for sub-samples. In contrast, predictive analytics usually employs Bayesian probability models. Due to the low cost of data collection and analysis, it is usually possible to conduct the analysis on the total population (rather than a sample), so that disaggregated analysis can be conducted to compare sub-populations, and often (particularly when also using machine learning) to compute outcome probabilities for individual subjects. There continue to be heated debates concerning the merits of each approach, and there has been much less discussion of how experimental and predictive analytics approaches could complement each other.
As Pete York at CommunityScience.com observes: “Herein lies the opportunity – we evaluators can’t battle the wave of big data and data science that will transform the way we do research. However, we can force it to have to succumb to the rules of objective rigor via the scientific method. Evaluators/researchers train people how to do it, they can train machines. We are already doing so.” (Personal communication 8/7/17).
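
The power calculations mentioned in the bullet on experimental designs above can be illustrated with a short Python sketch. It assumes a two-group comparison of means with equal group sizes, a two-sided test, a standardized effect size (Cohen's d), and the usual normal approximation. The function names and the example numbers are hypothetical and are not taken from any particular evaluation.

```python
import math
from statistics import NormalDist


def z_total(alpha, power):
    """Sum of the critical value for a two-sided test and the power quantile."""
    norm = NormalDist()
    return norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Minimum n per group to detect a standardized effect (normal approximation)."""
    return math.ceil(2 * (z_total(alpha, power) / effect_size) ** 2)


def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with n per group."""
    return z_total(alpha, power) * math.sqrt(2 / n_per_group)


n = sample_size_per_group(0.5)
print(n)  # → 63

# Splitting the same sample into four sub-samples leaves about 15 per group,
# so only a much larger effect could be detected within any one sub-sample.
print(round(minimum_detectable_effect(n // 4), 2))
```

Under these assumptions, a sample sized for the overall comparison can detect only about twice the effect size within a quarter-sized sub-sample, which is why disaggregated significance testing is often not feasible in conventional designs.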

4. Ethical and political concerns: Many evaluators also have concerns about who designs and markets big data apps and who benefits financially. Many commercial agencies collect data on low income populations (for example their consumption patterns) which may then be sold to consumer products companies with little or no benefit going to the populations from which the information is collected. Some of the algorithms may also include biases against poor and vulnerable groups (O’Neil 2016) that are difficult to detect given the proprietary nature of the algorithms.

Another set of issues concerns whether the ways in which big data are collected and used (for making decisions affecting poor and vulnerable groups) tend to be exclusive (governments and donors use big data to make decisions about programs affecting the poor without consulting them), or whether big data is used to promote inclusion (giving voice to vulnerable groups). These issues are discussed in a recent Rockefeller Foundation blog. There are also many issues around privacy and data security. There is of course no simple answer to these questions, but many of these concerns are often lurking in the background when evaluators are considering the possibility of incorporating big data into their evaluations.

Table 1. Reasons evaluators have been slow to adopt big data and opportunities for bridge building between evaluators and data analysts

Reason for slow adoption | Opportunities for bridge building

1. Weak institutional linkages
  • Strengthening formal and informal links between data centers and evaluators
2. Evaluators have limited knowledge about big data and data analytics
  • Capacity development programs covering both big data and conventional evaluation
  • Collaborative pilot evaluation projects
3. Methodological differences
  • Creating opportunities for dialogue to explore differences and to determine how they can be reconciled
  • Viewing data analytics and evaluation as being complementary rather than competing
4. Ethical and political concerns about big data
  • Greater focus on ethical codes of conduct, privacy and data security
  • Focusing on making approaches to big data and evaluation inclusive and avoiding exclusive/extractive approaches

Building bridges between evaluators and big data/analytics 

There are a number of possible steps that could be taken to build bridges between evaluators and big data analysts, and thus to promote the integration of big data into development evaluation. Catherine Cheney (2016) presents interviews with a number of data scientists and development practitioners stressing that data-driven development needs both social and computer scientists. No single approach is likely to be successful, and the best approach(es) will depend on each specific context, but we could consider:

  • Strengthening the formal and informal linkages between data centers and evaluation offices. It may be possible to achieve this within the existing organizational structure, but it will often require some formal organizational changes in terms of lines of communication. Linda Raftree provides a useful framework for understanding how different “buckets” of data (including among others, traditional data and big data) can be brought together, which suggests one pathway to collaboration between data centers and evaluation offices.
  • Identifying opportunities for collaborative pilot projects. A useful starting point may be to identify opportunities for collaboration on pilot projects in order to test/demonstrate the value-added of cooperation between the data analysts and evaluators. The pilots should be carefully selected to ensure that both groups are involved equally in the design of the initiative. Time should be budgeted to promote team-building so that each team can understand the other’s approach.
  • Promoting dialogue to explore ways to reconcile differences of approach and methodology between big data and evaluation. While many of these differences may at first appear to be based on fundamental differences of approach, some result at least in part from questions of terminology, and in other cases different approaches can be applied at different stages of the evaluation process. For example:
    • Many evaluators are suspicious of real-time data from sources such as Twitter, or of analysis of phone records, due to selection bias and issues of data quality. However, evaluators are familiar with exploratory data (collected, for example, during project visits, or feedback from staff), which is then checked more systematically in a follow-up study. When real-time data is presented in this way, the two teams would be able to discuss, in a non-confrontational way, how many kinds of real-time data could be built into evaluation designs.
    • When using Bayesian probability analysis it is necessary to begin with a prior distribution. The probabilities are then updated as more data becomes available. The results of a conventional experimental design can often be used as an input to the definition of the prior distribution. Consequently, it may be possible to consider experimental designs and Bayesian probability analysis as sequential stages of an evaluation rather than as competing approaches.
  • Integrated capacity development programs for data analysts and evaluators. These activities would both help develop a broader common methodological framework and serve as an opportunity for team building.
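
The idea in the Bayesian bullet above, that an experimental result can define the prior and later data can update it, can be sketched in a few lines of Python. This is a minimal illustration that assumes a binary outcome (for example, whether a household adopted a practice) and uses the standard Beta-Binomial conjugate update. The class name, the helper function, and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about a success probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        """Conjugate update: add the newly observed counts to the parameters."""
        return BetaBelief(self.alpha + successes, self.beta + failures)


def prior_from_trial(successes: int, failures: int) -> BetaBelief:
    """Start from a flat Beta(1, 1) and absorb the trial's treatment-group results."""
    return BetaBelief(1, 1).update(successes, failures)


# Stage 1: a conventional experiment (hypothetical counts) defines the prior.
prior = prior_from_trial(successes=30, failures=70)

# Stage 2: later monitoring data (also hypothetical) updates that prior.
posterior = prior.update(successes=45, failures=55)

print(round(prior.mean, 3), round(posterior.mean, 3))
```

In this sketch the experiment is not discarded when new data arrives. Its counts remain part of the posterior and are progressively outweighed as more data accumulates, which is one way to read the two approaches as sequential stages rather than competitors.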

Conclusion

There are a number of factors that together explain the slow take-up of big data and data analytics by development evaluators. A number of promising approaches are proposed for building bridges to overcome these barriers and promote the integration of big data into development evaluation.

See Part 1 for a list of useful references!

The future of development evaluation in the age of big data

By Michael Bamberger, Independent Evaluation Consultant. Michael has been involved in development evaluation for 50 years and recently wrote the report: “Integrating Big Data into the Monitoring and Evaluation of Development Programs” for UN Global Pulse.

We are living in an increasingly quantified world.

There are multiple sources of data that can be generated and analyzed in real-time. They can be synthesized to capture complex interactions among data streams and to identify previously unsuspected linkages among seemingly unrelated factors [such as the purchase of diapers and increased sales of beer]. We can now quantify and monitor ourselves, our houses (even the contents of our refrigerator!), our communities, our cities, our purchases and preferences, our ecosystem, and multiple dimensions of the state of the world.

These rich sources of data are becoming increasingly accessible to individuals, researchers and businesses through huge numbers of mobile phone and tablet apps and user-friendly data analysis programs.

The influence of digital technology on international development is growing.

Many of these apps and other big data/data analytics tools are now being adopted by international development agencies. Due to their relatively low cost, ease of application, and accessibility in remote rural areas, the approaches are proving particularly attractive to non-profit organizations; and the majority of NGOs probably now use some kind of mobile phone apps.

Apps are widely used for early warning systems, emergency relief, dissemination of information (to farmers, mothers, fishermen and other groups with limited access to markets), identifying and collecting feedback from marginal and vulnerable groups, and permitting rapid analysis of poverty. Data analytics are also used to create integrated databases that synthesize all of the information on topics as diverse as national water resources, human trafficking, updates on conflict zones, climate change and many other development topics.

Table 1: Widely used big data/data analytics applications in international development

Application, followed by the big data/data analytics tools used

Early warning systems for natural and man-made disasters
  • Analysis of Twitter, Facebook and other social media
  • Analysis of radio call-in programs
  • Satellite images and remote sensors
  • Electronic transaction records [ATM, on-line purchases]
Emergency relief
  • GPS mapping and tracking
  • Crowd-sourcing
  • Satellite images
Dissemination of information to small farmers, mothers, fishermen and other traders
  • Mobile phones
  • Internet
Feedback from marginal and vulnerable groups and on sensitive topics
  • Crowd-sourcing
  • Secure hand-held devices [e.g. UNICEF’s “U-Report” device]
Rapid analysis of poverty and identification of low-income groups
  • Analysis of phone records
  • Social media analysis
  • Satellite images [e.g. using thatched roofs as a proxy indicator of low-income households]
  • Electronic transaction records
Creation of an integrated database synthesizing all the multiple sources of data on a development topic
  • National water resources
  • Human trafficking
  • Agricultural conditions in a particular region


Evaluation is lagging behind.

Surprisingly, program evaluation is the area that is lagging behind in terms of the adoption of big data/analytics. The few available studies report that a high proportion of evaluators are not very familiar with big data/analytics and significantly fewer report having used big data in their professional evaluation work. Furthermore, while many international development agencies have created data development centers within the past few years, many of these are staffed by data scientists (many with limited familiarity with conventional evaluation methods) and there are weak institutional links to agency evaluation offices.

A recent study on the current status of the integration of big data into the monitoring and evaluation of development programs identified a number of reasons for the slow adoption of big data/analytics by evaluation offices:

  • Weak institutional links between data development centers and evaluation offices
  • Differences of methodology and the approach to data generation and analysis
  • Issues concerning data quality
  • Concerns by evaluators about the commercial, political and ethical nature of how big data is generated, controlled and used.

(Linda Raftree talks about a number of other reasons why parts of the development sector may be slow to adopt big data.)

Key questions for the future of evaluation in international development…

The above gives rise to two sets of questions concerning the future role of evaluation in international development:

  • The future direction of development evaluation. Given the rapid expansion of big data in international development, it is likely there will be a move towards integrated program information systems. These will begin to generate, analyze and synthesize data for program selection, design, management, monitoring, evaluation and dissemination. A possible scenario is that program evaluation will no longer be considered a specialized function that is the responsibility of a separate evaluation office; rather, it will become one of the outputs generated from the program database. If this happens, evaluation may be designed and implemented not by evaluation specialists using conventional evaluation methods (experimental and quasi-experimental designs, theory-based evaluation) but by data analysts using methods such as predictive analytics and machine learning.

Key Question: Is this scenario credible? If so, how widespread will it become and over what time horizon? Is it likely that evaluation will become one of the outputs of an integrated management information system? And if so, is it likely that many of the evaluation functions will be taken over by big data analysts?

  • The changing role of development evaluators and the evaluation office. We argued that currently many or perhaps most development evaluators are not very familiar with big data/analytics, and even fewer apply these approaches. There are both professional reasons (how evaluators and data scientists are trained) and organizational reasons (the limited formal links between evaluation offices and data centers in many organizations) that explain the limited adoption of big data approaches by evaluators. So, assuming the above scenario proves to be at least partially true, what will be required for evaluators to become sufficiently conversant with these new approaches to be able to contribute to how big data-focused evaluation approaches are designed and implemented? According to Pete York at Communityscience.com, the big challenge and opportunity for evaluators is to ensure that the scientific method becomes an essential part of the data analytics toolkit. Recent studies by the Global Environment Facility (GEF) illustrate some of the ways that big data from sources such as satellite images and remote sensors can be used to strengthen conventional quasi-experimental evaluation designs. In a number of evaluations, these data sources were combined with propensity score matching to select matched samples for pretest-posttest comparison group designs to evaluate the effectiveness of programs to protect forest cover or reserves for mangrove swamps.

Key Question: Assuming there will be a significant change in how the evaluation function is organized and managed, what will be required to bridge the gap between evaluators and data analysts? How likely is it that the evaluators will be able to assume this new role and how likely is it that organizations will make the necessary adjustments to facilitate these transformations?
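
To make the matching idea mentioned above more concrete, here is a minimal sketch of nearest-neighbour matching on a propensity score followed by a pretest-posttest comparison. It is not the procedure used in the GEF studies: all values are invented, the propensity scores are assumed to have been estimated already (for example from satellite-derived covariates), and the function names are hypothetical.

```python
from statistics import mean

# Each unit: (id, propensity score, forest cover before, forest cover after).
# All values are made up for illustration.
treated = [
    ("T1", 0.80, 60.0, 58.0),
    ("T2", 0.55, 40.0, 39.0),
]
comparison = [
    ("C1", 0.78, 61.0, 55.0),
    ("C2", 0.50, 42.0, 38.0),
    ("C3", 0.10, 90.0, 89.0),
]


def match_nearest(treated_units, pool, caliper=0.1):
    """1:1 nearest-neighbour matching on the propensity score, without replacement."""
    available = list(pool)
    pairs = []
    for unit in treated_units:
        if not available:
            break
        best = min(available, key=lambda c: abs(c[1] - unit[1]))
        # Only accept matches whose scores are close enough (the caliper).
        if abs(best[1] - unit[1]) <= caliper:
            pairs.append((unit, best))
            available.remove(best)
    return pairs


def diff_in_diff(pairs):
    """Average (after - before) change of treated units minus that of their matches."""
    treated_change = mean(t[3] - t[2] for t, _ in pairs)
    comparison_change = mean(c[3] - c[2] for _, c in pairs)
    return treated_change - comparison_change


pairs = match_nearest(treated, comparison)
effect = diff_in_diff(pairs)
print([(t[0], c[0]) for t, c in pairs], effect)
```

In this toy data the protected areas lose less forest cover than their matched comparison sites, and the unmatched site C3 is left out because no treated unit resembles it.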

What do you think? How will these scenarios play out?

Note: Stay tuned for Michael’s next post focusing on how to build bridges between evaluators and big data analysts.

Below are some useful references if you’d like to read more on this topic:

Anderson, C (2008) “The end of theory: The data deluge makes the scientific method obsolete” Wired Magazine 6/23/08. The original article in the debate on whether big data analytics requires a theoretical framework.

Bamberger, M., Raftree, L and Olazabal, V (2016) The role of new information and communication technologies in equity-focused evaluation: opportunities and challenges. Evaluation. Vol 22(2) 228–244. A discussion of the ethical issues and challenges with new information technology.

Bamberger, M (2017) Integrating big data into the monitoring and evaluation of development programs. UN Global Pulse with support from the Rockefeller Foundation. Review of progress in the incorporation of new information technology into development programs and the opportunities and challenges of building bridges between evaluators and big data specialists.

Meier, P (2015) Digital Humanitarians: How big data is changing the face of humanitarian response. CRC Press. A review, with detailed case studies, of how digital technology is being used by NGOs and civil society.

O’Neil, C (2016) Weapons of math destruction: How big data increases inequality and threatens democracy. Crown Books. How widely-used digital algorithms negatively affect the poor and marginalized sectors of society.

Petersson, G.K and Breul, J.D (editors) (2017) Cyber society, big data and evaluation. Comparative policy evaluation. Volume 24. Transaction Publications. The evolving role of evaluation in cyber society.

Wolf, G The quantified self [TED Talk]. Quick overview of the multiple self-monitoring measurements that you can collect on yourself.

World Bank (2016). Digital Dividends. World Development Report. Overview of how the expansion of digital technology is affecting all areas of our lives.

Buckets of data for MERL

by Linda Raftree, Independent Consultant and MERL Tech Organizer

It can be overwhelming to get your head around all the different kinds of data and the various approaches to collecting or finding data for development and humanitarian monitoring, evaluation, research and learning (MERL).

Though there are many ways of categorizing data, lately I find myself conceptually organizing data streams into four general buckets when thinking about MERL in the aid and development space:

  1. ‘Traditional’ data. How we’ve been doing things for(pretty much)ever. Researchers, evaluators and/or enumerators are in relative control of the process. They design a specific questionnaire or a data gathering process and go out and collect qualitative or quantitative data; they send out a survey and request feedback; they do focus group discussions or interviews; or they collect data on paper and eventually digitize the data for analysis and decision-making. Increasingly, we’re using digital tools for all of these processes, but they are still quite traditional approaches (and there is nothing wrong with traditional!).
  2. ‘Found’ data. The Internet, digital data and open data have made it lots easier to find, share, and re-use datasets collected by others, whether this is internally in our own organizations, with partners or just in general. These tend to be datasets collected in traditional ways, such as government or agency data sets. In cases where the datasets are digitized and have proper descriptions, clear provenance, consent has been obtained for use/re-use, and care has been taken to de-identify them, they can eliminate the need to collect the same data over again. Data hubs are springing up that aim to collect and organize these data sets to make them easier to find and use.
  3. ‘Seamless’ data. Development and humanitarian agencies are increasingly using digital applications and platforms in their work — whether bespoke or commercially available ones. Data generated by users of these platforms can provide insights that help answer specific questions about their behaviors, and the data is not limited to quantitative data. This data is normally used to improve applications and platform experiences, interfaces, content, etc. but it can also provide clues into a host of other online and offline behaviors, including knowledge, attitudes, and practices. One cautionary note is that because this data is collected seamlessly, users of these tools and platforms may not realize that they are generating data or understand the degree to which their behaviors are being tracked and used for MERL purposes (even if they’ve checked “I agree” to the terms and conditions). This has big implications for privacy that organizations should think about, especially as new regulations are being developed such as the EU’s General Data Protection Regulation (GDPR). The commercial sector is great at this type of data analysis, but the development set are only just starting to get more sophisticated at it.
  4. ‘Big’ data. In addition to data generated ‘seamlessly’ by platforms and applications, there are also ‘big data’ and data that exists on the Internet that can be ‘harvested’ if one only knows how. The term ‘Big data’ describes the application of analytical techniques to search, aggregate, and cross-reference large data sets in order to develop intelligence and insights. (See this post for a good overview of big data and some of the associated challenges and concerns). Data harvesting is a term used for the process of finding and turning ‘unstructured’ content (message boards, a webpage, a PDF file, Tweets, videos, comments), into ‘semi-structured’ data so that it can then be analyzed. (Estimates are that 90 percent of the data on the Internet exists as unstructured content). Currently, big data seems to be more apt for predictive modeling than for looking backward at how well a program performed or what impact it had. Development and humanitarian organizations (self included) are only just starting to better understand concepts around big data and how it might be used for MERL. (This is a useful primer).
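
As a rough illustration of what turning ‘unstructured’ content into ‘semi-structured’ data can mean in practice, here is a minimal sketch that parses free-text posts into records with a date, an author, the text and any hashtags. The post format, the sample messages and the `harvest` function are all invented for this example; real harvesting would also need to deal with consent, privacy and far messier input.

```python
import re

# Invented sample posts in an assumed "date @user: text" format.
posts = [
    "2017-08-02 @ana: water point in Block C broken again #water #repair",
    "2017-08-03 @sam: clinic queue was short today, thanks! #health",
    "this line does not follow the expected format",
]

PATTERN = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2}) @(?P<user>\w+): (?P<text>.*)$")


def harvest(post):
    """Return a dict with date, user, text and tags, or None if the post does not parse."""
    match = PATTERN.match(post)
    if match is None:
        return None
    record = match.groupdict()
    record["tags"] = re.findall(r"#(\w+)", record["text"])
    return record


# Keep only the posts that could be parsed into a record.
records = [r for r in (harvest(p) for p in posts) if r is not None]
for record in records:
    print(record["date"], record["user"], record["tags"])
```

Once content is in this shape it can be counted, filtered or cross-referenced like any other dataset.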

Thinking about these four buckets of data can help MERL practitioners to identify data sources and how they might complement one another in a MERL plan. Categorizing them as such can also help to map out how the different kinds of data will be responsibly collected/found/harvested, stored, shared, used, and maintained/retained/destroyed. Each type of data also has certain implications in terms of privacy, consent and use/re-use and how it is stored and protected. Planning for the use of different data sources and types can also help organizations choose the data management systems needed and identify the resources, capacities and skill sets required (or needing to be acquired) for modern MERL.

Organizations and evaluators are increasingly comfortable using mobile phones and/or tablets to do traditional data gathering, but they often are not using ‘found’ datasets. This may be because these datasets are not very ‘find-able,’ because organizations are not creating them, because re-using data is not a common practice for them, because the data are of questionable quality/integrity or lack descriptors, or for a variety of other reasons.

The use of ‘seamless’ data is something that development and humanitarian agencies might want to get better at. Even though large swaths of the populations that we work with are not yet online, this is changing. And if we are using digital tools and applications in our work, we shouldn’t let that data go to waste if it can help us improve our services or better understand the impact and value of the programs we are implementing. (At the very least, we had better understand what seamless data the tools, applications and platforms we’re using are collecting so that we can manage data privacy and security of our users and ensure they are not being violated by third parties!)

Big data is also new to the development sector, and there may be good reason it is not yet widely used. Many of the populations we are working with are not producing much data — though this is also changing as digital financial services and mobile phone use have become almost universal and the use of smart phones is on the rise. Normally, organizations require new knowledge, skills, partnerships and tools to access and use existing big data sets or to do any data harvesting. Some say that big data along with ‘seamless’ data will one day replace our current form of MERL. As artificial intelligence and machine learning advance, who knows… (and it’s not only MERL practitioners who will be out of a job – but that’s a conversation for another time!)

Not every organization needs to be using all four of these kinds of data, but we should at least be aware that they are out there and consider whether they are of use to our MERL efforts, depending on what our programs look like, who we are working with, and what kind of MERL we are tasked with.

I’m curious how other people conceptualize their buckets of data, and where I’ve missed something or defined these buckets erroneously…. Thoughts?

Better or different or both?

by Linda Raftree, Independent Consultant and MERL Tech Organizer

As we delve into why, when, where, if, and how to incorporate various types of technology and digital data tools and approaches into monitoring, evaluation, research and learning (MERL), it can be helpful to think about MERL technologies from two angles:

  1. Doing our work better:  How can new technologies and approaches help us do what we’ve always done — the things that we know are working and having an impact — but do them better? (E.g., faster, with higher quality, more efficiently, less expensively, with greater reach or more inclusion of different voices)
  2. Doing our work differently: What brand new, previously unthinkable things can be done because of new technologies and approaches? How might these totally new ideas contribute positively to our work or push us to work in an entirely different way?

Sometimes these two things happen simultaneously and sometimes they do not. Some organizations are better at Thing 1, and others are set up well to explore Thing 2. Not all organizations need to feel pressured into doing Thing 2, however, and sometimes it can be a distraction from Thing 1. Some organizations may be better off letting early adopters focus on Thing 2 and investing their own budgets and energy in Thing 1 until innovations have been tried and tested by the early adopters. Organizations may also have staff members or teams working on both Thing 1 and Thing 2 separately. Others may conceptualize this as a process or pathway moving from Thing 2 to Thing 1, where Thing 2 (once tested and evaluated) is a pipeline into Thing 1.

Here are some potentially useful past discussions on the topic of innovations within development organizations that flesh out some of these thoughts:

Many of the new tools and approaches that were considered experimental 10 years ago have moved from being “brand new and innovative” to simply “helping us do what we’ve always done.” Some of these earlier “innovations” are related to digital data and data collection and processing, and they help us do better monitoring, evaluation and research.

On the flip side, monitoring, evaluation and research have played a key role in helping organizations and the sector overall learn more about how, where, when, why and in what contexts these different tools and approaches (including digital data for MERL) can be adopted. MERL on ICT4D and Digital Development approaches can help calibrate the “hype cycle” and weed out the shiny new tools and approaches that are actually not very effective or useful to the sector and highlight those that cause harm or put people at risk.

There are always going to be new tools and approaches that emerge. Humanitarian and development organizations, then, need to think strategically about what kind of organization they are (or want to be) and where they fit on the MERL Tech continuum between Thing 1 and Thing 2.

What capacities does an organization have for working on Thing 2 (brand new and different)? When and for how long should an organization focus on Thing 1, building on what it knows is working or could work, while keeping an eye on the early adopters who are working on Thing 2? When does an organization have enough “proof” to start adopting new tools and approaches that seem to add value? How are these new tools and approaches being monitored, evaluated and researched to improve our use of them?

It’s difficult for widespread adoption to happen in the development space, where there is normally limited time and capacity for failure or for experimentation, without solid MERL. And even with “solid MERL” it can be difficult for organizations to adapt and change due to a multitude of factors, both internal and external.

I’m looking forward to September’s MERL Tech Conference in DC where we have some sessions that explore “the MERL on ICT4MERL?” and others that examine aspects of organizational change related to adopting newer MERL Tech tools and approaches.

(Register here if you haven’t already!)

Discrete choice experiment (DCE) to generate weights for a multidimensional index

In his MERL Tech Lightning Talk, Simone Lombardini, Global Impact Evaluation Adviser, Oxfam, discussed his experience with an innovative method for applying tech to help determine appropriate metrics for measuring concepts that escape easy definition. To frame his talk, he referenced Oxfam’s recent experience with using discrete choice experiments (DCE) to establish a strategy for measuring women’s empowerment.

Two methods already exist, Simone points out, for transforming soft concepts into hard metrics. First, the evaluator could assume full authority and responsibility over defining the metrics. Alternatively, the evaluator could design the evaluation so that relevant stakeholders are incorporated into the process and use their input to help define the metrics.

Though both methods are common, they are missing (for practical reasons) the level of mass input that could make them truly accurate reflections of the social perception of whatever concept is being considered. Tech has a role to play in scaling the quantity of input that can be collected. If used correctly, this could lead to better evaluation metrics.

Simone described this approach as “context-specific” and “multi-dimensional.” The process starts by defining the relevant characteristics (such as those found in empowered women) in their social context, then translating these characteristics into indicators, and finally combining indicators into one empowerment index for evaluating the project.

After the characteristics are defined, a discrete choice experiment can be used to determine the “weight” of each one in a particular social context. A discrete choice experiment (DCE) is a technique that’s frequently been used in health economics and marketing, but not much in impact evaluation. To implement a DCE, researchers present different hypothetical scenarios to respondents and ask them to decide which one they consider to best reflect the concept in question (i.e. women’s empowerment). The responses are used to assess the relative importance of the indicators covered by the DCE, and these weights can then be used to develop an empowerment index.
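
The step from choices to weights to an index can be sketched in a few lines. This is a deliberately simplified illustration, not Oxfam’s analysis: a real DCE would typically be analyzed with a choice model such as a conditional logit, whereas the sketch below swaps in a plain win count. The indicator names, the choice data and both function names are hypothetical.

```python
# Hypothetical binary indicators describing each scenario shown to respondents.
INDICATORS = ["own_income", "household_decisions", "freedom_of_movement"]

# Each entry: (scenario A, scenario B, True if the respondent picked A).
# All choices are invented for illustration.
CHOICES = [
    ((1, 0, 0), (0, 1, 0), True),
    ((1, 0, 0), (0, 0, 1), True),
    ((0, 1, 0), (0, 0, 1), True),
    ((0, 1, 0), (1, 0, 0), False),
    ((0, 0, 1), (1, 0, 0), False),
    ((0, 0, 1), (0, 1, 0), True),
    ((1, 0, 0), (0, 1, 0), False),
    ((1, 1, 0), (0, 0, 1), True),
    ((0, 0, 1), (0, 1, 0), True),
]


def estimate_weights(choices, n_indicators):
    """Win-count weights: how often an indicator was present in the chosen
    scenario and absent from the rejected one, normalized to sum to 1."""
    wins = [0] * n_indicators
    for scenario_a, scenario_b, chose_a in choices:
        chosen, rejected = (scenario_a, scenario_b) if chose_a else (scenario_b, scenario_a)
        for k in range(n_indicators):
            if chosen[k] == 1 and rejected[k] == 0:
                wins[k] += 1
    total = sum(wins)
    return [w / total for w in wins]


def empowerment_index(indicator_values, weights):
    """Weighted sum of a respondent's indicator values."""
    return sum(v * w for v, w in zip(indicator_values, weights))


weights = estimate_weights(CHOICES, len(INDICATORS))
# A hypothetical respondent with her own income and freedom of movement.
score = empowerment_index((1, 0, 1), weights)
print(dict(zip(INDICATORS, weights)), score)
```

The point of the exercise is that the weights come from respondents’ own choices rather than from the evaluator’s judgment alone.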

This process was integrated into the data collection process and added 10 minutes at the end of a one-hour survey, and was made practicable due to the ubiquity of smartphones. The results from Oxfam’s trial run using this method are still being analyzed. For more on this, watch Lombardini’s video below!

Community-led mobile research–What could it look like?

Adam Groves, Head of Programs at On Our Radar, gave a presentation at MERL Tech London in February where he elaborated on a new method for collecting qualitative ethnographic data remotely.

The problem On Our Radar sought to confront, Adam declares, is the cold and impenetrable bureaucratic machinery of complex organizations. To many people, the unresponsiveness and inhumanity of the bureaucracies that provide them with services is dispiriting, and this is a challenge to overcome for anyone that wants to provide a quality service.

On Our Radar’s solution is to enable people to share their real-time experiences of services by recording audio and SMS diaries with their basic mobile phones. Because of the intimacy they capture, these first-person accounts have the capacity to grab the people behind services and make them listen to and experience the customer’s thoughts and feelings as they happened.

Responses obtained from audio and SMS diaries are different from those obtained from other qualitative data collection methods because, unlike solutions that crowdsource feedback, these diaries contain responses from a small group of trained citizen reporters that share their experiences in these diaries over a sustained period of time. The product is a rich and textured insight into the reporters’ emotions and priorities. One can track their journeys through services and across systems.

On Our Radar worked with British Telecom (BT) to implement this technique. The objective was to help BT understand how their customers with dementia experience their services. Over a few weeks, forty people living with dementia recorded audio diaries about their experiences dealing with big companies.

Adam explained how the audio diary method was effective for this project:

  • Because diaries and dialogues are in real time, they captured emotional highs and lows (such as the anxiety of picking up the phone and making a call) that would not be recalled in after-the-fact interviews.
  • Because diaries are focused on individuals and their journeys instead of on discrete interactions with specific services, they showed how encountering seemingly unrelated organizations or relationships impacted users’ experiences of BT. For example, cold calls became terrifying for people with dementia and made them reluctant to answer the phone for anyone.
  • Because this method follows people’s experiences over time, it allows researchers to place individual pain points and problems in the context of a broader experience.
  • Because the data is in first person and in the moment, it moved people emotionally. Data was shared with call center staff and managers, and they found it compelling. It was an emotional human story told in one’s own words. It invited decision makers to walk in other people’s shoes.

On Our Radar’s future projects include working in Sierra Leone with local researchers to understand how households are changing their practices post-Ebola and a major piece of research with the London School of Hygiene and Tropical Medicine in Malaysia and the Philippines to gain insight on people’s understanding of their health systems.

For more, find a video of Adam’s original presentation below!

New Report: Global Innovations in Measurement and Evaluation

On June 26th, New Philanthropy Capital (NPC) released its “Global Innovations in Measurement and Evaluation” report. In it, NPC outlines and elaborates on eight concepts that represent innovations in conducting effective measurement and evaluation of social impact programs. The list of concepts was distilled from conversations with leading evaluation experts about what is exciting in the field and what is most likely to make a long-lasting impact on the practice of evaluation. Below, we feature each of these eight concepts accompanied by brief descriptions of their meanings and implications.

User-Centric

The key to making an evaluation user-centric is to ensure that the service users are truly involved in every stage of the evaluation process. In this way, the power dynamic ceases to be unidirectional as more agency is given to the user. As a result, not only can findings become more compelling to decision makers because of more robust data collection, but those responsible for the program also become accountable to the users in addition to the funders, a shift that is important both ethically and for the trust it builds.

Shared Measurement & Evaluation

Shared measurement and evaluation requires multiple organizations with similar missions, programs or users to work together to measure their own and their combined impact. This involves using the same evaluation metrics and, at a more advanced stage, developing shared measurement tools and methodologies. Pooling data and comparing outcomes creates a bigger dataset that can support stronger conclusions and provide more insights.

Theory-Based Evaluation

The central idea behind theory-based evaluation is to not only measure the outcome of a program but to also get at the reason why it does or does not work. Typically, this approach begins with a theory of change that proposes an explanation for how activities lead to impact, and this theory is then tested and accepted, refuted or qualified. It is important to apply this concept because without an understanding of why programs work, there is a risk that mistakes will be repeated or that attempts to replicate a program will fail when attempted under different conditions.

Impact Management

Impact management is the integration of impact assessment into strategy and performance management by regularly collecting data and responding to it with course corrections designed to improve the outcomes of a program. This method contrasts with assessment strategies that only examine a program at the end of its life cycle. The objective here is to be flexible and adaptive in order to produce a more effective intervention rather than waiting to evaluate it until there is nothing that can be done to change it.

Data Linkage

Data linkage is the act of bringing together different but relevant data about a specified group of users from beyond a single organization or sub-sector dataset. One example could be a homelessness charity that supports its users in accessing social housing linking its data with the local council to see if its users ultimately remained in their homes. In essence, this method allows organizations to leverage the increasing quantities of data to create comparison groups to track the long term impacts of their programs.
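
The homelessness example above can be sketched as a simple join between two datasets on a shared identifier. Everything here is assumed for illustration: the field names, the records and the `link` function are invented, and a real linkage would need user consent, secure handling and usually pseudonymized identifiers.

```python
# Invented records held by the charity.
charity_records = [
    {"person_id": "p01", "housed_on": "2016-03-01"},
    {"person_id": "p02", "housed_on": "2016-05-15"},
    {"person_id": "p03", "housed_on": "2016-07-20"},
]

# Invented records held by the local council.
council_records = [
    {"person_id": "p01", "still_in_tenancy": True},
    {"person_id": "p03", "still_in_tenancy": False},
]


def link(left, right, key):
    """Inner join: keep rows from `left` that have a match in `right` on `key`."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


linked = link(charity_records, council_records, "person_id")
# Share of linked users who remained in their homes.
sustained = sum(r["still_in_tenancy"] for r in linked) / len(linked)
print(len(linked), sustained)
```

Note that p02 drops out because the council holds no record for that person, which is a reminder that linked results only describe the users who could be matched.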

Big Data

Big data is typically understood as the data generated as a by-product of digital transactions and interactions. It is a category that includes people’s social media activity, web searches and digital financial transaction trails. New technology has expanded the human ability to analyze large datasets, and consequently big data has become a powerful tool for helping identify trends and patterns, even if it does not provide explanations for them.

Remote Sensing

Remote sensing uses technology, such as mobile phones, to gather information from afar. This method is useful because it allows one to collect data that may not be typically accessible. Additionally, remote sensing data can be highly detailed, accurate, and in real time. Finally, one of its great strengths is that it is generated passively, which reduces the possibility of introducing researcher bias through human input.

Data Visualization

Data visualization is the practice of presenting data in a graphic form. New technology has made it possible to create a broad range of useful visualizations. The result is that data is now more accessible to non-specialists, and the insights produced through analysis can now be better understood and communicated.

For more details and more examples of real-world applications of these concepts, check out the full “Global Innovations in Measurement and Evaluation” report here.