by Zach Tilton, a Peacebuilding Evaluation Consultant and a Doctoral Research Associate at the Interdisciplinary PhD in Evaluation program at Western Michigan University.
In 2013 Dan Airley quipped“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it….” In 2015 the metaphor was imported to the international development sector by Ben Ramalingam, in 2016 it became a MERL Tech DC lightning talk, and has been ringing in our ears ever since. So, what about 2018? Well, unlike US national trends in teenage sex, there are some signals that big or at least‘bigger’ data is continuing to make its way not only into the realm of digital development, but also evaluation. I recently attended the 2018 MERL Tech DC pre-conference workshop Big Data and Evaluation where participants were introduced to real ways practitioners are putting this trope to bed(sorry, not sorry). In this blog post I share some key conversations from the workshop framed against the ethics of using this new technology, but to do that let me first provide some background.
I entered the workshop on my heels. Given the recent spate of security breaches and revelations about micro-targeting,‘Big Data’ has been somewhat of a boogie-man for myself and others. I have taken some pains to limit my digital data-footprint, have written passionately about big data and surveillance capitalism, and have long been skeptical of big data applications for serving marginalized populations in digital development and peacebuilding. As I found my seat before the workshop started I thought,“Is it appropriate or ethical to use big data for development evaluation?” My mind caught hold of a 2008 Evaluation Café debate between evaluation giants Michael Scriven and Tom Cook on causal inference in evaluation and the ethics of Randomized Control Trials. After hearing Scriven’s concerns about the ethics of withholding interventions from control groups, Cook asks,“But what about the ethics of not doing randomized experiments?” He continues,“What about the ethics of having causal information that is in fact based on weaker evidence and is wrong? When this happens, you carry on for years and years with practices that don’t work whose warrant lies in studies that are logically weaker than experiments provide.”
While I sided with Scriven for most of that debate, this question haunted me. It reminded me of an explanation of structural violence by peace researcher Johan Galtung who writes,“If a person died from tuberculosis in the eighteenth century it would be hard to conceive of this as violence since it might have been quite unavoidable, but if he dies from it today, despite all the medical resources in the world, then violence is present according to our definition.” Galtung’s intellectual work on violence deals with the difference between potential and the actual realizations and what increases that difference. While there are real issues with data responsibility, algorithmic biases, and automated discrimination that need to be addressed, if there are actually existing technologies and resources not being used to address social and material inequities in the world today, is this unethical, even violent?“What about the ethics of not using big data?” I asked myself back. The following are highlights of the actually existing resources for using big data in the evaluation of social amelioration.
Actually Existing Data
During the workshop, Kerry Bruce from Social Impact shared with participants her personal mantra,“We need to do a better job of secondary data analysis before we collect any more primary data.” She challenged us to consider how to make use of the secondary data available to our organizations. She gave examples of potential big data sources such as satellite images, remote sensors, GPS location data, social media, internet searches, call-in radio programs, biometrics, administrative data and integrated data platforms that merge many secondary data files such as public records and social service agency and client files. The key here is there are a ton of actually existing data, many of which are collected passively, digitally, and longitudinally. Despite noting real limitations to accessing existing secondary data, including donor reluctance to fund such work, limited training in appropriate methodologies in research teams, and differences in data availability between contexts, to underscore the potential of using secondary data, she shared a case study where she lead a team to use large amounts of secondary indirect data to identify ecosystems of modern day slavery at a significantly reduced cost than collecting the data first-hand. The outputs of this work will help pinpoint interventions and guide further research into the factors that may lead to predicting and prescribing what works well for stopping people from becoming victims of slavery.
Actually Existing Tech(and math)
Peter York from BCT Partners provided a primer on big data and data science including the reality-check that most of the work is the unsexy “ETL,” or the extraction, transformation, and loading of data. He contextualized the potential of the so-called big data revolution by reminding participants that the V’s of big data, Velocity, Volume, and Variety, are made possible by the technological and social infrastructure of increasingly networked populations and how these digital connections enable the monitoring, capturing, and tracking of ever increasing aspects of our lives in an unprecedented way. He shared,“A lot of what we’ve done in research were hacks because we couldn’t reach entire populations.” With advances in the tech stacks and infrastructure that connect people and their internet-connected devices with each other and the cloud, the utility of inferential statistics and experimental design lessens when entire populations of users are producing observational behavior data. When this occurs, evaluators can apply machine learning to discover the naturally occurring experiments in big data sets, what Peter terms‘Data-driven Quasi-Experimental Design.’ This is exactly what Peter does when he builds causal models to predict and prescribe better programs for child welfare and juvenile justice to automate outcome evaluation, taking cues from precision medicine.
One example of a naturally occurring experiment was the 1854 Broad Street cholera outbreak in which physician John Snow used a dot map to identify a pattern that revealed the source of the outbreak, the Broad Street water pump. By finding patterns in the data John Snow was able to lay the groundwork for rejecting the false Miasma Theory and replace it with a proto-typical Germ Theory. And although he was already skeptical of miasma theory, by using the data to inform his theory-building he was also practicing a form of proto-typical Grounded Theory. Grounded theory is simply building theory inductively, after data collection and analysis, not before, resulting in theory that is grounded in data. Peter explained,“Machine learning is Grounded Theory on steroids. Once we’ve built the theory, found the pattern by machine learning, we can go back and let the machine learning test the theory.” In effect, machine learning is like having a million John Snows to pour over data to find the naturally occurring experiments or patterns in the maps of reality that are big data.
A key aspect of the value of applying machine learning in big data is that patterns more readily present themselves in datasets that are‘wide’ as opposed to‘tall.’ Peter continued,“If you are used to datasets you are thinking in rows. However, traditional statistical models break down with more features, or more columns.” So, Peter and evaluators like him that are applying data science to their evaluative practice are evolving from traditional Frequentist to Bayesian statistical approaches. While there is more to the distinction here, the latter uses prior knowledge, or degrees of belief, to determine the probability of success, where the former does not. This distinction is significant for evaluators who are wanting to move beyond predictive correlation to prescriptive evaluation. Peter expounded,“Prescriptive analytics is figuring out what will best work for each case or situation.” For example, with prediction, we can make statements that a foster child with certain attributes is 70% not likely to find a home. Using the same data points with prescriptive analytics we can find 30 children that are similar to that foster child and find out what they did to find a permanent home. In a way, only using predictive analytics can cause us to surrender while including prescriptive analytics can cause us to endeavor.
The last category of existing resources for applying big data for evaluation was mostly captured by the comments of independent evaluation consultant, Michael Bamberger. He spoke of the latent capacity that existed in evaluation professionals and teams, but that we’re not taking full advantage of big data: “Big data is being used by development agencies, but less by evaluators in these agencies. Evaluators don’t use big data, so there is a big gap.”
He outlined two scenarios for the future of evaluation in this new wave of data analytics: a state of divergence where evaluators are replaced by big data analysts and a state of convergence where evaluators develop a literacy with the principles of big data for their evaluative practice. One problematic consideration with this hypothetical is that many data scientists are not interested in causation, as Peter York noted. To move toward the future of convergence, he shared how big data can enhance the evaluation cycle from appraisal and planning through monitoring, reporting and evaluating sustainability. Michael went on to share a series of caveats emptor that include issues with extractive versus inclusive uses of big data, the fallacy of large numbers, data quality control, and different perspectives on theory, all of which could warrant their own blog posts for development evaluation.
While I deepened my basic understandings of data analytics including the tools and techniques, benefits and challenges, and guidelines for big data and evaluation, my biggest take away is reconsidering big data for social good by considering the ethical dilemma of not using existing data, tech, and capacity to improve development programs, possibly even prescribing specific interventions by identifying their probable efficacy through predictive models before they are deployed.
As we all know, big data and data science are becoming increasingly important in all aspects of our lives. There is a similar rapid growth in the applications of big data in the design and implementation of development programs. Examples range from the use of satellite images and remote sensors in emergency relief and the identification of poverty hotspots, through the use of mobile phones to track migration and to estimate changes in income (by tracking airtime purchases), social media analysis to track sentiments and predict increases in ethnic tension, and using smart phones on Internet of Things (IOT) to monitor health through biometric indicators.
Despite the rapidly increasing role of big data in development programs, there is speculation that evaluators have been slower to adopt big data than have colleagues working in other areas of development programs. Some of the evidence for the slow take-up of big data by evaluators is summarized in “The future of development evaluation in the age of big data”. However, there is currently very limited empirical evidence to test these concerns.
To try to fill this gap, my colleagues Rick Davies and Linda Raftree and I would like to invite those of you who are interested in big data and/or the future of evaluation to complete the attached survey. This survey, which takes about 10 minutes to complete asks evaluators to report on the data collection and data analysis techniques that you use in the evaluations you design, manage or analyze; while at the same time asking data scientists how familiar they are with evaluation tools and techniques.
The survey was originally designed to obtain feedback from participants in the MERL Tech conferences on “Exploring the Role of Technology in Monitoring, Evaluation, Research and Learning in Development” that are held annually in London and Washington, DC, but we would now like to broaden the focus to include a wider range of evaluators and data scientists.
One of the ways in which the findings will be used is to help build bridges between evaluators and data scientists by designing integrated training programs for both professions that introduce the tools and techniques of both conventional evaluation practice and data science, and show how they can be combined to strengthen both evaluations and data science research. “Building bridges between evaluators and big data analysts” summarizes some of the elements of a strategy to bring the two fields closer together.
The findings of the survey will be shared through this and other sites, and we hope this will stimulate a follow-up discussion. Thank you for your cooperation and we hope that the survey and the follow-up discussions will provide you with new ways of thinking about the present and potential role of big data and data science in program evaluation.
At MERL Tech London, 2018, we invited Michael Bamberger and Rick Davies to debate the question of whether the enthusiasm for Big Data in Evaluation is warranted. At their session, through a formal debate (skillfully managed by Shawna Hoffman from The Rockefeller Foundation) they discussed whether Big Data and Evaluation would eventually converge, whether one would dominate the other, how can and should they relate to each other, and what risks and opportunities there are in this relationship.
Following the debate, Michael and Risk wanted to continue the discussion — this time exploring the issues in a more conversational mode on the MERL Tech Blog, because in practice both of them see more than one side to the issue.
So, what do Rick and Michael think — will big data integrate with evaluation — or is it all just hype?
Rick: In the MERL Tech debate I put a lot of emphasis on the possibility that evaluation, as a field, would be overwhelmed by big data / data science rhetoric. But since then I have been thinking about a countervailing development, which is that evaluative thinking is pushing back against unthinking enthusiasm for the use of data science algorithms. I emphasise “evaluative thinking” rather than “evaluators” as a category of people, because a lot of this pushback is coming from people who would not identify themselves as evaluators. There are different strands to this evaluative response.
One is a social justice perspective, reflected in recent books such as “Weapons of Math Destruction”, “Automated Inequality”, and “Algorithms of Oppression” which emphasise the human cost of poorly designed and or poorly supervised use of algorithms using large amounts of data to improve welfare and justice administration. Another strand is more like a form of exploratory philosophy, and has focused on how it might be possible to define “fairness” when designing and evaluating algorithms that have consequences for human welfare[ See 1, 2, 3, 4]. Another strand is perhaps more technical in focus, but still has a value concern. This is the literature on algorithmic transparency. Without transparency it is difficult to assess fairness [See 5, 6, ] Neural networks are often seen as a particular challenge. Associated with this are discussions about “the right to explanation” and what this means in practice[1,]
In parallel there is also some infiltration of data science thinking into mainstream evaluation practice. DFID is funding the World Bank’s Strategic Impact Evaluation Fund (SIEF) latest call for “nimble evaluations” . These are described as rapid and low cost and likely to take the form of an RCT but ones which are focused on improving implementation rather than assessing overall impact . This type of RCT is directly equivalent to A/B testing used by the internet giants to improve the way their platforms engage with their users. Hopefully these nimble approaches may bring a more immediate benefit to the people’s lives than RCTs which have tried to assess the impact of a whole project and then inform the design of subsequent projects.
Another recent development is the World Bank’s Data Science competition , where participants are being challenged to develop predictive models of household poverty status, based on World Bank Household Survey data. The intention is that they should provide a cheaper means of identifying poor households than simply relying on what can be very expensive and time consuming nationwide household surveys. At present the focus on the supporting website is very technical. As far as I can see there is no discussion of how the winning prediction model will be used and an how any risks of adverse effects might be monitored and managed. Yet as I suggested at MERLTech London, most algorithms used for prediction modelling will have errors. The propensity to generate False Positives and False Negatives is machine learning’s equivalent of original sin. It is to be expected, so it should be planned for. Plans should include systematic monitoring of errors and a public policy for correction, redress and compensation.
Michael: These are both important points, and it is interesting to think what conclusions we can draw for the question before us. Concerning the important issue of algorithmic transparency (AT), Rick points out that a number of widely discussed books and articles have pointed out the risk that the lack of AT poses for democracy and particularly for poor and vulnerable groups. Virginia Eubanks, one of the authors cited by Rick, talks about the “digital poorhouse” and how unregulated algorithms can help perpetuate an underclass. However, I think we should examine more carefully how evaluators are contributing to this discussion. My impression, based on very limited evidence is that evaluators are not at the center — or even perhaps the periphery — of this discussion. Much of the concern about these issues is being generated by journalists, public administration specialists or legal specialists. I argued in an earlier MERL Tech post that many evaluators are not very familiar with big data and data analytics and are often not very involved in these debates. This is a hypothesis that we hope readers can help us to test.
Rick’s second point, about the infiltration of data science into evaluation is obviously very central to our discussion. I would agree that the World Bank is one of the leaders in the promotion of data science, and the example of “nimble evaluation” may be a good example of convergence between data science and evaluation. However, there are other examples where the Bank is on the cutting edge of promoting new information technology, but where potential opportunities to integrate technology and evaluation do not seem to have been taken up. An example would be the Bank’s very interesting Big Data Innovation Challenge, which produced many exciting new applications of big data to development (e.g. climate smart agriculture, promoting financial inclusion, securing property rights through geospatial data, and mapping poverty through satellites). The use of data science to strengthen evaluation of the effectiveness of these interventions, however, was not mentioned as one of the objectives or outputs of this very exciting program.
It would also be interesting to explore to what extent the World Bank Data Science competition that Rick mentions resulted in the convergence of data science and evaluation, or whether it was simply testing new applications of data science.
Finally, I would like to mention two interesting chapters in Cybersociety, Big Data and Evaluation edited by Petersson and Breul (2017, Transaction Publications). One chapter (by Hojlund et al) reports on a survey which found that only 50% of professional evaluators claimed to be familiar with the basic concepts of big data, and only about 10% reported having used big data in an evaluation. In another chapter, Forss and Noren reviewed a sample of Terms of Reference (TOR) for evaluations conducted by different development agencies, where they found that none of the 25 TOR specifically required the evaluators to incorporate big data into their evaluation design.
It is difficult to find hard evidence on the extent to which evaluators are familiar with, sympathetic to, or using big data into their evaluations, but the examples mentioned above show that there are important questions about the progress made towards the convergence of evaluation and big data.
We invite readers to share their experiences both on how the two professions are starting to converge, or on the challenges that slow down, or even constrain the process of convergence.
by Shawna Hoffman, Specialist, Measurement, Evaluation and Organizational Performance at the Rockefeller Foundation.
Both the volume of data available at our fingertips and the speed with which it can be accessed and processed have increased exponentially over the past decade. The potential applications of this to support monitoring and evaluation (M&E) of complex development programs has generated great excitement. But is all the enthusiasm warranted? Will big data integrate with evaluation — or is this all just hype?
Before we began, we used Mentimeter to see where the audience stood on the topic:
Once the votes were in, we started.
Both Michael and Rick have fairly balanced and pragmatic viewpoints; however, for the sake of a good debate, and to help unearth the nuances and complexity surrounding the topic, they embraced the challenge of representing divergent and polarized perspectives – with Michael arguing in favor of integration, and Rick arguing against.
“Evaluation is in a state of crisis,” Michael argued, “but help is on the way.” Arguments in favor of the integration of big data and evaluation centered on a few key ideas:
There are strong use cases for integration. Data science tools and techniques can complement conventional evaluation methodology, providing cheap, quick, complexity-sensitive, longitudinal, and easily analyzable data.
Integration is possible. Incentives for cross-collaboration are strong, and barriers to working together are reducing. Traditionally these fields have been siloed, and their relationship has been characterized by a mutual lack of understanding of the other (or even questioning of the other’s motivations or professional rigor). However, data scientists are increasingly recognizing the benefits of mixed methods, and evaluators are seeing the potential to use big data to increase the number of types of evaluation that can be conducted within real-world budget, time and data constraints. There are some compelling examples (explored in this UN Global Pulse Report) of where integration has been successful.
Integration is the right thing to do. New approaches that leverage the strengths of data science and evaluation are potentially powerful instruments for giving voice to vulnerable groups and promoting participatory development and social justice. Without big data, evaluation could miss opportunities to reach the most rural and remote people. Without evaluation (which emphasizes transparency of arguments and evidence), big data algorithms can be opaque “black boxes.”
While this may paint a hopeful picture, Rick cautioned the audience to temper its enthusiasm. He warned of the risk of domination of evaluation by data science discourse, and surfaced some significant practical, technical, and ethical considerations that would make integration challenging.
First, big data are often non-representative, and the algorithms underpinning them are non-transparent. Second, “the mechanistic approaches offered by data science, are antithetical to the very notion of evaluation being about people’s values and necessarily involving their participation and consent,” he argued. It is – and will always be – critical to pay attention to the human element that evaluation brings to bear. Finally, big data are helpful for pattern recognition, but the ability to identify a pattern should not be confused with true explanation or understanding (correlation ≠ causation). Overall, there are many problems that integration would not solve for, and some that it could create or exacerbate.
The debate confirmed that this question is complex, nuanced, and multi-faceted. It helped to remind that there is cause for enthusiasm and optimism, at the same time as a healthy dose of skepticism. What was made very clear is that the future should leverage the respective strengths of these two fields in order to maximize good and minimize potential risks.
In the end, the side in favor of integration of big data and evaluation won the debate by a considerable margin.
The future of integration looks promising, but it’ll be interesting to see how this conversation unfolds as the number of examples of integration continues to grow.
Interested in learning more and exploring this further? Stay tuned for a follow-up post from Michael and Rick. You can also attend MERL Tech DC in September 2018 if you’d like to join in the discussions in person!
In Part 1 of this series we argued that, while applications of big data and data analytics are expanding rapidly in many areas of development programs, evaluators have been slow to adopt these applications. We predicted that one possible future scenario could be that evaluation may no longer be considered as a separate function, and that it may be treated as one of the outputs of the integrated information systems that will gradually be adopted by many development agencies. Furthermore, many evaluations will use data analytics approaches, rather than conventional evaluation designs. (Image: Big Data session notes from USAIDLearning’s Katherine Haugh [@katherine_haugh}. MERL Tech DC 2016).
Here, in Part 2 we identify some of the reasons why development evaluators have been slow to adopt big data analytics and we propose some promising approaches for building bridges between evaluators and data analysts.
Why have evaluators been slow to adopt big data analytics?
1. Weak institutional linkages. Over the past few years some development agencies have created data centers to explore ways to exploit new information technologies. These centers are mainly staffed by people with a background in data science or statistics and the institutional links to the agency’s evaluation office are often weak.
2. Many evaluators have limited familiarity with big data/analytics. Evaluation training programs tend to only present conventional experimental, quasi-experimental and mixed-methods/qualitative designs. They usually do not cover smart data analytics (see Part 1 of this blog). Similarly, many data scientists do not have a background in conventional evaluation methodology (though there are of course exceptions).
3. Methodological differences. Many big data approaches do not conform to the basic principles that underpin conventional program evaluation, for example:
Data quality: real-time big data provides one of the potentially most powerful sources of data for development programs. Among other things, real-time data can provide early warning signals of potential diseases (e.g. Google Flu), ethnic tension, drought and poverty (Meier 2015). However, when an evaluator asks if the data is biased or of poor quality, the data analyst may respond “Sure the data is biased (e.g. only captured from mobile phone users or twitter feeds) and it may be of poor quality. All data is biased and usually of poor quality, but it does not matter because tomorrow we will have new data.” This reflects the very different kinds of data that evaluators and data analysts typically work with, and the difference can be explained, but a statement such as the above can create the impression that data analysts do not take issues of bias and data quality very seriously.
Data mining: Many data analytics methods are based on the mining of large data sets to identify patterns of correlation, which are then built into predictive models, normally using Bayesian statistics. Many evaluators frown on data mining due to its potentially identifying spurious associations.
The role of theory: Most (but not all) evaluators believe that an evaluation design should be based on a theoretical framework (theory of change or program theory) that hypothesizes the processes through which the intended outcomes will be achieved. In contrast, there is plenty of debate among data analysts concerning the role of theory, and whether it is necessary at all. Some even go as far as to claim that data analytics means “the end of theory”(Anderson 2008). This, combined with data mining, creates the impression among some evaluators that data analytics uses whatever data is easily accessible with no theoretical framework to guide the selection of evaluation questions as to assess the adequacy of available data.
Experimental designs versus predictive analytics: Most quantitative evaluation designs are based on an experimental or quasi-experimental design using a pretest/posttest comparison group design. Given the high cost of data collection, statistical power calculations are frequently used to estimate the minimum size of sample required to ensure a certain level of statistical significance. Usually this means that analysis can only be conducted on the total sample, as sample size does not permit statistical significance testing for sub-samples. In contrast, predictive analytics usually employ Bayesian probability models. Due to the low cost of data collection and analysis, it is usually possible to conduct the analysis on the total population (rather than a sample), so that disaggregated analysis can be conducted to compare sub-populations, and often (particularly when also using machine learning) to compute outcome probabilities for individual subjects. There continues to be heated debates concerning the merits of each approach, and there has been much less discussion of how experimental and predictive analytics approaches could complement each other.
As Pete York at CommunityScience.com observes: “Herein lies the opportunity – we evaluators can’t battle the wave of big data and data science that will transform the way we do research. However, we can force it to have to succumb to the rules of objective rigor via the scientific method. Evaluators/researchers train people how to do it, they can train machines. We are already doing so.” (Personal communication 8/7/17)
4. Ethical and political concerns: Many evaluators also have concerns about who designs and markets big data apps and who benefits financially. Many commercial agencies collect data on low income populations (for example their consumption patterns) which may then be sold to consumer products companies with little or no benefit going to the populations from which the information is collected. Some of the algorithms may also include a bias against poor and vulnerable groups (O’Neil 2016) that are difficult to detect given the proprietary nature of the algorithms.
Another set of issues concern whether the ways in which big data are collected and used (for making decisions affecting poor and vulnerable groups) tends to be exclusive (governments and donors use big data to make decisions about programs affecting the poor without consulting them), or whether big data is used to promote inclusion (giving voice to vulnerable groups). These issues are discussed in a recent Rockefeller Foundation blog. There are also many issues around privacy and data security. There is of course no simple answer to these questions, but many of these concerns are often lurking in the background when evaluators are considering the possibility of incorporating big data into their evaluations.
Table 1. Reasons evaluators have been slow to adopt big data and opportunities for bridge building between evaluators and data analysts
Reason for slow adoption
Opportunities for bridge building
1. Weak institutional linkages
Strengthening formal and informal links between data centers and evaluators
2. Evaluators have limited knowledge about big data and data analytics
Capacity development programs covering both big data and conventional evaluation
Collaborative pilot evaluation projects
3. Methodological differences
Creating opportunities for dialogue to explore differences and to determine how they can be reconciled
Viewing data analytics and evaluation as being complementary rather than competing
4. Ethical and political concerns about big data
Greater focus on ethical codes of conduct, privacy and data security
Focusing on making approaches to big data and evaluation inclusive and avoiding exclusive/extractive approaches
Building bridges between evaluators and big data/analytics
There are a number of possible steps that could be taken to build bridges between evaluators and big data analysts, and thus to promote the integration of big data into development evaluation. Catherine Cheney (2016) presents interviews with a number of data scientists and development practitioners stressing that data driven development needs both social and computer scientists. No single approach is likely to be successful, and the best approach(es) will depend on each specific context, but we could consider:
Strengthening the formal and informal linkages between data centers and evaluation offices. It may be possible to achieve this within the existing organizational structure, but it will often require some formal organizational changes in terms of lines of communication. Linda Raftree provides a useful framework for understanding how different “buckets” of data (including among others, traditional data and big data) can be brought together, which suggests one pathway to collaboration between data centers and evaluation offices.
Identifying opportunities for collaborative pilot projects. A useful starting point may be to identify opportunities for collaboration on pilot projects in order to test/demonstrate the value-added of cooperation between the data analysts and evaluators. The pilots should be carefully selected to ensure that both groups are involved equally in the design of the initiative. Time should be budgeted to promote team-building so that each team can understand the other’s approach.
Promoting dialogue to explore ways to reconcile differences of approach and methodology between big data and evaluation. While many of these differences may at first appear to be based on fundamental differences of approach, at least some differences result at least in part from questions of terminology and in other cases it may be that different approaches can be applied at different stages of the evaluation process. For example:
Many evaluators are suspicious of real-time data from sources such as twitter, or analysis of phone records due to selection bias and issues of data quality. However, evaluators are familiar with exploratory data (collected, for example, during project visits, or feedback from staff), which is then checked more systematically in a follow-up study. When presented in this way, the two teams would be able to discuss in a non-confrontational way, how many kinds of real-time data could be built into evaluation designs.
When using Bayesian probability analysis it is necessary to begin with a prior distribution. The probabilities are then updated as more data becomes available. The results of a conventional experimental design can often be used as an input to the definition of the prior distribution. Consequently, it may be possible to consider experimental designs and Bayesian probability analysis as sequential stages of an evaluation rather than as competing approaches.
Integrated capacity development programs for data analysts and evaluators. These activities would both help develop a broader common methodological framework and serve as an opportunity for team building.
There are a number of factors that together explain the slow take-up of big data and data analytics by development evaluators. A number of promising approaches are proposed for building bridges to overcoming these barriers and to promote the integration of big data into development evaluation.
We are living in an increasingly quantified world.
There are multiple sources of data that can be generated and analyzed in real-time. They can be synthesized to capture complex interactions among data streams and to identify previously unsuspected linkages among seemingly unrelated factors [such as the purchase of diapers and increased sales of beer]. We can now quantify and monitor ourselves, our houses (even the contents of our refrigerator!), our communities, our cities, our purchases and preferences, our ecosystem, and multiple dimensions of the state of the world.
These rich sources of data are becoming increasingly accessible to individuals, researchers and businesses through huge numbers of mobile phone and tablet apps and user-friendly data analysis programs.
The influence of digital technology on international development is growing.
Many of these apps and other big data/data analytics tools are now being adopted by international development agencies. Due to their relatively low cost, ease of application, and accessibility in remote rural areas, the approaches are proving particularly attractive to non-profit organizations; and the majority of NGOs probably now use some kind of mobile phone apps.
Apps are widely-used for early warning systems, emergency relief, dissemination of information (to farmers, mothers, fishermen and other groups with limited access to markets), identifying and collecting feedback from marginal and vulnerable groups, and permitting rapid analysis of poverty. Data analytics are also used to create integrated data bases that synthesize all of the information on topics as diverse as national water resources, human trafficking, updates on conflict zones, climate change and many other development topics.
Table 1: Widely used big data/data analytics applications in international development
Big data/data analytics tools
Early warning systems for natural and man-made disasters
Analysis of Twitter, Facebook and other social media
Analysis of radio call-in programs
Satellite images and remote sensors
Electronic transaction records [ATM, on-line purchases]
GPS mapping and tracking
Dissemination of information to small farmers, mothers, fishermen and other traders
Feedback from marginal and vulnerable groups and on sensitive topics
Rapid analysis of poverty and identification of low-income groups
Analysis of phone records
Social media analysis
Satellite images [e.g. using thatched roofs as a proxy indicator of low-income households]
Electronic transaction records
Creation of an integrated data base synthesizing all the multiples sources of data on a development topic
National water resources
Agricultural conditions in a particular region
Evaluation is lagging behind.
Surprisingly, program evaluation is the area that is lagging behind in terms of the adoption of big data/analytics. The few available studies report that a high proportion of evaluators are not very familiar with big data/analytics and significantly fewer report having used big data in their professional evaluation work. Furthermore, while many international development agencies have created data development centers within the past few years, many of these are staffed by data scientists (many with limited familiarity with conventional evaluation methods) and there are weak institutional links to agency evaluation offices.
A recent study on the current status of the integration of big data into the monitoring and evaluation of development programs identified a number of reasons for the slow adoption of big data/analytics by evaluation offices:
Weak institutional links between data development centers and evaluation offices
Differences of methodology and the approach to data generation and analysis
Issues concerning data quality
Concerns by evaluators about the commercial, political and ethical nature of how big data is generated, controlled and used.
Key questions for the future of evaluation in international development…
The above gives rise to two sets of questions concerning the future role of evaluation in international development:
The future direction of development evaluation. Given the rapid expansion of big data in international development, it is likely there will be a move towards integrated program information systems. These will begin to generate, analyze and synthesize data for program selection, design, management, monitoring, evaluation and dissemination. A possible scenario is that program evaluation will no longer be considered a specialized function that is the responsibility of a separate evaluation office, rather it will become one of the outputs generated from the program data base. If this happens, evaluation may be designed and implemented not by evaluation specialists using conventional evaluation methods (experimental and quasi-experimental designs, theory-based evaluation) but by data analysts using methods such as predictive analytics and machine learning.
Key Question: Is this scenario credible? If so how widespread will it become and over what time horizon? Is it likely that evaluation will become one of the outputs of an integrated management information system? And if so is it likely that many of the evaluation functions will be taken over by big data analysts?
The changing role of development evaluators and the evaluation office. We argued that currently many or perhaps most development evaluators are not very familiar with big data/analytics, and even fewer apply these approaches. There are both professional reasons (how evaluators and data scientists are trained) and organizational reasons (the limited formal links between evaluation offices and data centers in many organizations) that explain the limited adoption of big data approaches by evaluators. So, assuming the above scenario proves to be at least partially true, what will be required for evaluators to become sufficiently conversant with these new approaches to be able to contribute to how big data/focused evaluation approaches are designed and implemented? According to Pete York at Communityscience.com, the big challenge and opportunity for evaluators is to ensure that the scientific method becomes an essential part of the data analytics toolkit. Recent studies by the Global Environmental Faciity (GEF) illustrate some of the ways that big data from sources such as satellite images and remote sensors can be used to strengthen conventional quasi-experimental evaluation designs. In a number of evaluations these data sources used propensity score matching to select matched samples for pretest-posttest comparison group designs to evaluate the effectiveness of programs to protect forest cover or reserves for mangrove swamps.
Key Question: Assuming there will be a significant change in how the evaluation function is organized and managed, what will be required to bridge the gap between evaluators and data analysts? How likely is it that the evaluators will be able to assume this new role and how likely is it that organizations will make the necessary adjustments to facilitate these transformations?
What do you think? How will these scenarios play out?
Note: Stay tuned for Michael’s next post focusing on how to build bridges between evaluators and big data analysts.
Below are some useful references if you’d like to read more on this topic:
Bamberger, M., Raftree, L and Olazabal, V (2016) The role of new information and communication technologies in equity–focused evaluation: opportunities and challenges. Evaluation. Vol 22(2) 228–244 . A discussion of the ethical issues and challenges with new information technology
Meier , P (2015) Digital Humanitarians: How big data is changing the face of humanitarian response. CRC Press. A review, with detailed case studies, of how digital technology is being used by NGOs and civil society.
O’Neill, C (2016) The weapons of math destruction: How big data increases inequality and threatens democracy. How widely-used digital algorithms negatively affect the poor and marginalized sectors of society. Crown books.
Petersson, G.K and Breul, J.D (editors) (2017) Cyber society, big data and evaluation. Comparative policy evaluation. Volume 24. Transaction Publications. The evolving role of evaluation in cyber society.