Big data comes with big responsibilities: both the funder and the recipient organization have ethical and data security obligations.
Big data allows organizations to count and bring visibility to marginalized populations and to improve decision-making. However, concerns about data privacy, security, and integrity pose challenges for data collection and data preservation. What does informed consent look like in data collection? What are the potential risks we bring to populations? What are the risks of compliance?
The session highlighted three takeaways organizations should consider when approaching data security.
1) Language Barriers between Evaluators and Data Scientists
Both Roytman and Woods agreed that the divide between evaluators and data scientists stems from a lack of knowledge of each other’s field. How do you ask a question when you don’t know you need to?
In Woods’ experience, the Monitoring and Evaluation team and the IT team each have a role in data security, but they work independently. Because the M+E field is evolving so quickly, teams have little time to stay attuned to data security needs. Additionally, an organization’s limited resources can impede the IT team from supporting programmatic data security.
A potential solution ChildFund has considered is investing in an IT person with a focus on MERL who has experience and knowledge in the international or humanitarian sphere. However, many organizations fall short when it comes to financing data security. In addition, identifying an individual with these skills can be challenging.
2) Data Collection
Data breaches expose confidential information, which puts vulnerable populations at risk of exploitative use of their data and potential harm. As we gather data, we must ask what informed consent looks like. Are we communicating to beneficiaries the risks of releasing their personal information?
In Woods’ experience, ChildFund approaches data security through a child-safeguarding lens across stakeholders and program participants, where all are responsible for data security. Its child safeguarding policy entails data security protocols and privacy; however, Woods mentioned that dissemination and implementation across countries remain a lingering question. Many in-country civil society organizations lack the capacity, knowledge, and resources to implement data security protocols, especially if they are working in a country context that does not have laws, regulations, or frameworks related to data security and privacy. Currently, ChildFund is advocating for refresher trainings on the policy so that all global partners stay current with organizational changes.
3) Data Preservation
The issue of data breaches is a privacy concern when organizations’ data includes individuals’ sensitive information, putting beneficiaries at risk of exploitation by bad actors. Roytman explained that specific actors, risks, and threats affect specific kinds of data, though humanitarian aid organizations are not always a primary target. Nonetheless, this shouldn’t distract organizations from potential risks; rather, it should open discussion about how to identify and mitigate them.
Protecting sensitive data requires a proper security system, something that not all platforms provide, especially if they are free. Ultimately, security is a financial investment that requires time in order to avoid and mitigate risks and potential threats. To increase support and investment in security, ChildFund is working with Dharma to pilot a small program demonstrating the use of big data analytics with a built-in data security system.
Roytman suggested approaching ethical concerns by applying the CIA Triad: Confidentiality, Integrity, and Availability. There will always be tradeoffs, he said. If we don’t properly invest in data security and mitigate potential risks, there will be additional challenges to data collection. If we don’t understand data security, how can we ensure informed consent?
Many organizations find themselves doing more harm than good due to lack of funding. Big data can be an inexpensive approach to collecting large quantities of data, but if it leads to harm, there is a problem. This is a complex issue to resolve; however, as Roytman concluded, the opposite of complexity is not simplicity, but rather transparency.
MERL Tech DC kicked off with a pre-conference workshop on September 5th that focused on what the Blockchain is and how it could influence MEL.
The workshop was broken into four parts: 1) blockchain 101, 2) how the blockchain is influencing and could influence MEL, 3) case studies to demonstrate early lessons learned, and 4) outstanding issues and emerging themes.
This blog focuses and builds on the fourth area. At the end, we provide additional resources that will be helpful to all interested in exploring how the blockchain could disrupt and impact international development at large.
Workshop Takeaways and Afterthoughts
For our purposes here, we have distilled some of the key takeaways from the workshop. This section includes a series of questions that we will respond to and link to various related reference materials.
Who are the main blockchain providers and what are they offering?
Any time a new “innovation” is introduced into the international development space, potential users lack knowledge about what the innovation is, the value it can add, and the costs of implementing it. This lack of knowledge opens the door for “snake oil salesmen” who engage in predatory attempts to sell their services to users who don’t have the knowledge to make informed decisions.
We’ve seen this phenomenon play out with blockchain. Take, for example, the numerous Initial Coin Offerings (ICOs) that defrauded their investors, or the many instances of service providers offering low-quality blockchain education trainings and/or project solutions.
Education is the best defense against being taken advantage of by snake oil salesmen. If you’re looking for general education about blockchain, we’ve included a collection of helpful tools in the table below. If your group is working to determine whether a blockchain solution is right for the problem at hand, the USAID Blockchain Primer offers easy-to-use decision trees that can help you. Beyond these, Mercy Corps has just published Block by Block, which outlines the attributes of various distributed ledgers along lines that are very helpful when considering which distributed ledger technology to use.
Words of warning aside, there are agencies that provide genuine blockchain solutions. For a full list of providers please visit www.blockchainomics.tech, an information database run by The Development CAFE on all things blockchain.
Bottom Line: Beware the snake oil salesmen preaching the benefits of blockchain but silent on the feasibility of their solution. Unless the service provider is just as focused on your problem as you are, be wary that they are just trying to pitch a solution (viable or not) rather than solve the problem. Before approaching companies or service providers, always identify your problem and see if blockchain is indeed a viable solution.
How does governance of the blockchain influence its sustainability?
In the past, we’ve seen technology-led social impact solutions make initial gains that diminish over time until there is no sustained impact. Current evidence shows that many solutions of this sort fail because they are not designed to solve a specific problem in a relevant ecosystem. This insight has given rise to the Digital Development Principles and the Ethical Considerations that should be taken into account for blockchain solutions.
Bottom Line: Impact is achieved and sustained by the people who use a tool. Thus, blockchain, as a tool, does not sustain impacts on its own. People do so by applying knowledge about the principles and ethics needed for impact. Understanding this, our next step is to generate more customized principles and ethical considerations for blockchain solutions through case studies and other desperately needed research.
How do the blockchain, big data, and Artificial Intelligence influence each other?
The blockchain is a new type of distributed ledger system that could have massive social implications. Big Data refers to the exponential increase in data we experience through the Internet of Things (IoT) and other data sources (Smart Infrastructure, etc.). Artificial Intelligence (AI) assists in identifying and analyzing this new data at exponentially faster rates than is currently the case.
A blockchain is a distributed ledger: in essence, a database of transactions. Like any other database, it is a repository, and it is contributing to the growth of Big Data. AI can be used to automate the process of data entry into the blockchain. This is how the three are connected.
The blockchain is considered a leading contender as the ledger of choice for big data because: 1) due to its distributed nature it can handle much larger amounts of data in a more secure fashion than is currently possible with cloud computing, and 2) it is possible to automate the way big data is uploaded to the blockchain. AI tools are easily integrated into blockchain functions to run searches and analyze data, and this opens up the capacity to collect, analyze and report findings on big data in a transparent and secure manner more efficiently than ever before.
Bit by Bit is a very readable and innovative overview of how to conduct social science research in the digital age of big data, artificial intelligence, and the blockchain. It gives the reader a quality introduction to some of the dominant themes and issues to consider when attempting to evaluate a technology-led solution or use technology to conduct social research.
Given its immutability, how can an adaptive management system work with the blockchain?
This is a critical point. The blockchain is an immutable record: it is almost impossible (meaning it has never been done, and there are no simulations in which current technology is able to take control of a properly designed blockchain) to hijack, hack, or alter. Thus the blockchain provides the security needed to mitigate corruption and facilitate audits.
This immutability does not preclude an adaptive management approach, however. Adaptive management requires small, iterative course corrections informed by quality data about what is and is not working. This record of data and course corrections provides a rich data set that is extremely valuable to replication efforts, because it subverts the main barrier to replication: lack of data on what does and does not work. Hence, in this case, the immutability of the blockchain is a value add to adaptive management. This is more a question of good adaptive management practices than of whether the blockchain is a viable tool for these purposes.
It is important to note that you can append information to blocks (not amend them), so there will always be a record of previous mistakes (auditability), but the most recent layer of truth is what is viewed, queried, and verified. Hence, immutability is not a hurdle but a help.
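The append-not-amend idea can be illustrated with a toy hash-chained ledger. This is a deliberately simplified sketch, not a production blockchain (there is no consensus, networking, or signing); the `Ledger` class and its record format are invented for illustration:

```python
import hashlib
import json

def block_hash(block):
    """Deterministic hash of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class Ledger:
    """A toy append-only ledger: each block links to the previous block's hash."""
    def __init__(self):
        self.chain = [{"index": 0, "data": "genesis", "prev": None}]

    def append(self, data):
        prev = block_hash(self.chain[-1])
        self.chain.append({"index": len(self.chain), "data": data, "prev": prev})

    def latest(self, key):
        """The 'current truth' for a key is the most recently appended record."""
        for block in reversed(self.chain):
            if isinstance(block["data"], dict) and block["data"].get("key") == key:
                return block["data"]["value"]
        return None

    def verify(self):
        """Tampering with any earlier block breaks the hash links."""
        return all(
            self.chain[i]["prev"] == block_hash(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

ledger = Ledger()
ledger.append({"key": "beneficiaries", "value": 120})   # original entry
ledger.append({"key": "beneficiaries", "value": 115})   # correction, appended not amended
assert ledger.latest("beneficiaries") == 115            # newest layer of truth
assert ledger.verify()                                  # full audit trail intact
```

The correction does not overwrite the original entry; both remain on the chain, and the most recent one is what a query returns.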
What are the first steps an organization should take when deciding on whether to adopt a blockchain solution?
Each problem that an organization faces is unique, but the following simple steps can help one make a decision:
1) Identify your problem (using tools such as Developmental Evaluation or the Principles of Digital Development)
2) Understand blockchain technology: its concepts, functionality, requirements, and cost
3) See if your problem can be solved by a blockchain rather than a centralized database
4) Consider the advantages and disadvantages
5) Identify the right provider and work with them in developing the blockchain
6) Consider ethical principles and privacy concerns as well as other social inequalities
7) Deploy in pilot phases and evaluate the results using an agile approach
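The "is blockchain a fit?" check in these steps can be approximated as a short decision-tree sketch, loosely in the spirit of the decision trees in the USAID Blockchain Primer. The specific questions and wording here are our assumptions, not the Primer's:

```python
def blockchain_fit(answers):
    """Walk a simplified fit-check: every question must be answered True
    before a blockchain is worth considering over a conventional database."""
    checks = [
        ("needs_shared_database", "you may not need a database at all"),
        ("multiple_writers", "a conventional database is simpler"),
        ("writers_distrust_each_other", "a centralized database with access control suffices"),
        ("can_avoid_trusted_intermediary", "keep the trusted intermediary and a normal database"),
    ]
    for key, otherwise in checks:
        if not answers.get(key, False):
            return f"Blockchain likely unnecessary: {otherwise}."
    return "Blockchain may be a viable candidate; weigh cost and ethics next."

# A shared database with multiple writers who nonetheless trust each other:
print(blockchain_fit({
    "needs_shared_database": True,
    "multiple_writers": True,
    "writers_distrust_each_other": False,
}))
```

The point of such a walk-through is to force the problem definition first, so a provider's pitch can be judged against your actual requirements.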
What can be done to protect PII and other sensitive information on a blockchain?
Blockchain uses cryptography to store its data, so PII and other information cannot be viewed by anyone other than those who have access to the ‘keys’. While developing a blockchain, it’s important to ensure that what goes in is protected and that access to it is regulated. Another critical step is promoting literacy on the use of blockchain and its features among stakeholders.
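One common design pattern, sketched here with invented names, is to keep raw PII off the immutable ledger entirely and store only an opaque, salted-hash reference on-chain. This is a minimal illustration of the idea, not any particular platform's API:

```python
import hashlib
import os

def pii_reference(pii: str, salt: bytes) -> str:
    """Derive an opaque on-chain reference from a PII record. Without the
    salt (the 'key'), the reference cannot be linked back to the person."""
    return hashlib.sha256(salt + pii.encode()).hexdigest()

salt = os.urandom(16)   # held by the data controller, never published
off_chain_store = {}    # access-regulated database holding the raw PII

record = "Jane Doe, beneficiary ID 4471"   # hypothetical PII record
ref = pii_reference(record, salt)
off_chain_store[ref] = record              # raw data stays off-chain

# The chain carries only `ref`; holders of the salt can re-derive it
# and look up the regulated off-chain record.
assert off_chain_store[pii_reference(record, salt)] == record
```

Because the ledger is immutable, anything written to it can never be deleted, which is exactly why sensitive data should go into the regulated off-chain store rather than the chain itself.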
References Correlated to Takeaways
This table organizes current reference materials as related to the main questions we discussed in the workshop. (The question is in the left-hand column and the reference material, with a brief explanation and hyperlink, is in the right-hand column.)
Resources and Considerations
Who are the main blockchain platforms? Who are the providers and what are they offering?
IBM, ConsenSys, Microsoft, AWS, Cognizant, R3, and others are offering products and enterprise solutions.
Block by Block is a valuable comparison tool for assessing various platforms.
How does governance of the blockchain influence its sustainability?
See Beeck Center’s Blockchain Ethical Design Framework. Decentralization (how many nodes), equity amongst nodes, rules, transparency are all factors in long-term sustainability. Likewise the Principles for Digital Development have a lot of evidence behind them for their contributions to sustainability.
How do the blockchain, big data and Artificial Intelligence influence each other?
They can be combined in various ways to strengthen a particular service or product. There is no blanket approach, just as there is no blanket solution to any social impact problem. The key is to know the root cause of the problem at hand and how each tool, used separately and in conjunction, can address those root causes.
Given its immutability, how can an adaptive management system work with the blockchain?
Ask how mistakes are corrected when creating a customized solution or purchasing a product. Usually there will be a way to do that through an easy-to-use user interface.
What are the first steps an organization should take when they are deciding on whether to adopt a blockchain solution?
Participate in demos, and test some of the solutions for your own purposes or use cases. Use the USAID Blockchain Primer and reach out to trusted experts for advice. Given that blockchain implementations are primarily open source, once you have decided that a blockchain is a viable solution for your problem, GitHub is full of open source code that you can modify for your own purposes.
by Zach Tilton, a Peacebuilding Evaluation Consultant and a Doctoral Research Associate at the Interdisciplinary PhD in Evaluation program at Western Michigan University.
In 2013 Dan Ariely quipped, “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it….” In 2015 the metaphor was imported to the international development sector by Ben Ramalingam, in 2016 it became a MERL Tech DC lightning talk, and it has been ringing in our ears ever since. So, what about 2018? Well, unlike US national trends in teenage sex, there are some signals that big, or at least ‘bigger’, data is continuing to make its way not only into the realm of digital development, but also into evaluation. I recently attended the 2018 MERL Tech DC pre-conference workshop Big Data and Evaluation, where participants were introduced to real ways practitioners are putting this trope to bed (sorry, not sorry). In this blog post I share some key conversations from the workshop, framed against the ethics of using this new technology, but to do that let me first provide some background.
I entered the workshop on my heels. Given the recent spate of security breaches and revelations about micro-targeting, ‘Big Data’ has been somewhat of a boogie-man for myself and others. I have taken some pains to limit my digital data-footprint, have written passionately about big data and surveillance capitalism, and have long been skeptical of big data applications for serving marginalized populations in digital development and peacebuilding. As I found my seat before the workshop started I thought, “Is it appropriate or ethical to use big data for development evaluation?” My mind caught hold of a 2008 Evaluation Café debate between evaluation giants Michael Scriven and Tom Cook on causal inference in evaluation and the ethics of Randomized Control Trials. After hearing Scriven’s concerns about the ethics of withholding interventions from control groups, Cook asks, “But what about the ethics of not doing randomized experiments?” He continues, “What about the ethics of having causal information that is in fact based on weaker evidence and is wrong? When this happens, you carry on for years and years with practices that don’t work, whose warrant lies in studies that are logically weaker than experiments provide.”
While I sided with Scriven for most of that debate, this question haunted me. It reminded me of an explanation of structural violence by peace researcher Johan Galtung, who writes, “If a person died from tuberculosis in the eighteenth century it would be hard to conceive of this as violence since it might have been quite unavoidable, but if he dies from it today, despite all the medical resources in the world, then violence is present according to our definition.” Galtung’s intellectual work on violence deals with the difference between the potential and the actual realization, and with what increases that difference. While there are real issues with data responsibility, algorithmic biases, and automated discrimination that need to be addressed, if there are actually existing technologies and resources not being used to address social and material inequities in the world today, is this unethical, even violent? “What about the ethics of not using big data?” I asked myself back. The following are highlights of the actually existing resources for using big data in the evaluation of social amelioration.
Actually Existing Data
During the workshop, Kerry Bruce from Social Impact shared with participants her personal mantra, “We need to do a better job of secondary data analysis before we collect any more primary data.” She challenged us to consider how to make use of the secondary data available to our organizations. She gave examples of potential big data sources such as satellite images, remote sensors, GPS location data, social media, internet searches, call-in radio programs, biometrics, administrative data, and integrated data platforms that merge many secondary data files such as public records and social service agency and client files. The key here is that there is a ton of actually existing data, much of it collected passively, digitally, and longitudinally. She noted real limitations to accessing existing secondary data, including donor reluctance to fund such work, limited training in appropriate methodologies on research teams, and differences in data availability between contexts. Still, to underscore the potential of secondary data, she shared a case study in which she led a team that used large amounts of secondary, indirect data to identify ecosystems of modern-day slavery at a significantly lower cost than collecting the data first-hand. The outputs of this work will help pinpoint interventions and guide further research into the factors that may lead to predicting and prescribing what works well for stopping people from becoming victims of slavery.
Actually Existing Tech (and Math)
Peter York from BCT Partners provided a primer on big data and data science, including the reality check that most of the work is the unsexy “ETL”: the extraction, transformation, and loading of data. He contextualized the potential of the so-called big data revolution by reminding participants that the V’s of big data (Velocity, Volume, and Variety) are made possible by the technological and social infrastructure of increasingly networked populations, and that these digital connections enable the monitoring, capturing, and tracking of ever-increasing aspects of our lives in an unprecedented way. He shared, “A lot of what we’ve done in research were hacks because we couldn’t reach entire populations.” With advances in the tech stacks and infrastructure that connect people and their internet-connected devices with each other and the cloud, the utility of inferential statistics and experimental design lessens when entire populations of users are producing observational behavior data. When this occurs, evaluators can apply machine learning to discover the naturally occurring experiments in big data sets, what Peter terms ‘Data-driven Quasi-Experimental Design.’ This is exactly what Peter does when he builds causal models to predict and prescribe better programs for child welfare and juvenile justice, automating outcome evaluation and taking cues from precision medicine.
One example of a naturally occurring experiment was the 1854 Broad Street cholera outbreak, in which physician John Snow used a dot map to identify a pattern that revealed the source of the outbreak: the Broad Street water pump. By finding patterns in the data, John Snow was able to lay the groundwork for rejecting the false Miasma Theory and replacing it with a proto-typical Germ Theory. And although he was already skeptical of miasma theory, by using the data to inform his theory-building he was also practicing a form of proto-typical Grounded Theory. Grounded theory is simply building theory inductively, after data collection and analysis, not before, resulting in theory that is grounded in data. Peter explained, “Machine learning is Grounded Theory on steroids. Once we’ve built the theory, found the pattern by machine learning, we can go back and let the machine learning test the theory.” In effect, machine learning is like having a million John Snows to pore over data to find the naturally occurring experiments or patterns in the maps of reality that are big data.
A key aspect of the value of applying machine learning to big data is that patterns more readily present themselves in datasets that are ‘wide’ as opposed to ‘tall.’ Peter continued, “If you are used to datasets you are thinking in rows. However, traditional statistical models break down with more features, or more columns.” So Peter, and evaluators like him who are applying data science to their evaluative practice, are evolving from traditional Frequentist to Bayesian statistical approaches. While there is more to the distinction, the latter uses prior knowledge, or degrees of belief, to determine the probability of success, where the former does not. This distinction is significant for evaluators who want to move beyond predictive correlation to prescriptive evaluation. Peter expounded, “Prescriptive analytics is figuring out what will best work for each case or situation.” For example, with prediction, we can make statements that a foster child with certain attributes has a 70% likelihood of not finding a home. Using the same data points with prescriptive analytics, we can find 30 children who are similar to that foster child and find out what they did to find a permanent home. In a way, using only predictive analytics can cause us to surrender, while including prescriptive analytics can cause us to endeavor.
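The prescriptive idea (ask what worked for similar cases rather than stopping at a prediction) can be sketched as a simple similar-case lookup. The records, attributes, and the `prescribe` helper here are entirely hypothetical; real work would use far richer features and proper causal modeling:

```python
from math import dist

# Hypothetical foster-care records: (numeric attributes, action taken, found_home)
cases = [
    ((14, 3, 1), "kinship placement",  True),
    ((15, 4, 1), "kinship placement",  True),
    ((13, 2, 0), "group home",         False),
    ((14, 5, 1), "mentorship program", True),
    ((16, 3, 0), "group home",         False),
]

def prescribe(child, cases, k=3):
    """Find the k most similar cases (Euclidean distance over attributes)
    and return the action most often associated with a successful outcome."""
    nearest = sorted(cases, key=lambda c: dist(child, c[0]))[:k]
    successes = {}
    for _, action, found_home in nearest:
        if found_home:
            successes[action] = successes.get(action, 0) + 1
    return max(successes, key=successes.get) if successes else None

# For a child predicted unlikely to find a home, ask what worked
# for similar children rather than surrendering to the prediction.
print(prescribe((14, 4, 1), cases))   # -> "kinship placement"
```

The shift from "what is this child's probability of not finding a home?" to "which action succeeded for children like this one?" is the move from predictive to prescriptive analytics that Peter describes.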
The last category of existing resources for applying big data to evaluation was mostly captured by the comments of independent evaluation consultant Michael Bamberger. He spoke of the latent capacity that exists in evaluation professionals and teams, but noted that we’re not taking full advantage of big data: “Big data is being used by development agencies, but less by evaluators in these agencies. Evaluators don’t use big data, so there is a big gap.”
He outlined two scenarios for the future of evaluation in this new wave of data analytics: a state of divergence, where evaluators are replaced by big data analysts, and a state of convergence, where evaluators develop a literacy in the principles of big data for their evaluative practice. One problematic consideration with this hypothetical is that many data scientists are not interested in causation, as Peter York noted. To move toward the future of convergence, he shared how big data can enhance the evaluation cycle from appraisal and planning through monitoring, reporting, and evaluating sustainability. Michael went on to share a series of caveats, including issues with extractive versus inclusive uses of big data, the fallacy of large numbers, data quality control, and different perspectives on theory, all of which could warrant their own blog posts for development evaluation.
While I deepened my basic understandings of data analytics including the tools and techniques, benefits and challenges, and guidelines for big data and evaluation, my biggest take away is reconsidering big data for social good by considering the ethical dilemma of not using existing data, tech, and capacity to improve development programs, possibly even prescribing specific interventions by identifying their probable efficacy through predictive models before they are deployed.
As we all know, big data and data science are becoming increasingly important in all aspects of our lives. There is a similarly rapid growth in the applications of big data in the design and implementation of development programs. Examples range from the use of satellite images and remote sensors in emergency relief and the identification of poverty hotspots, through the use of mobile phones to track migration and to estimate changes in income (by tracking airtime purchases), to social media analysis to track sentiments and predict increases in ethnic tension, and the use of smartphones and Internet of Things (IoT) devices to monitor health through biometric indicators.
Despite the rapidly increasing role of big data in development programs, there is speculation that evaluators have been slower to adopt big data than have colleagues working in other areas of development programs. Some of the evidence for the slow take-up of big data by evaluators is summarized in “The future of development evaluation in the age of big data”. However, there is currently very limited empirical evidence to test these concerns.
To try to fill this gap, my colleagues Rick Davies and Linda Raftree and I would like to invite those of you who are interested in big data and/or the future of evaluation to complete the attached survey. This survey, which takes about 10 minutes to complete, asks evaluators to report on the data collection and data analysis techniques they use in the evaluations they design, manage, or analyze, while also asking data scientists how familiar they are with evaluation tools and techniques.
The survey was originally designed to obtain feedback from participants in the MERL Tech conferences on “Exploring the Role of Technology in Monitoring, Evaluation, Research and Learning in Development” that are held annually in London and Washington, DC, but we would now like to broaden the focus to include a wider range of evaluators and data scientists.
One of the ways in which the findings will be used is to help build bridges between evaluators and data scientists by designing integrated training programs for both professions that introduce the tools and techniques of both conventional evaluation practice and data science, and show how they can be combined to strengthen both evaluations and data science research. “Building bridges between evaluators and big data analysts” summarizes some of the elements of a strategy to bring the two fields closer together.
The findings of the survey will be shared through this and other sites, and we hope this will stimulate a follow-up discussion. Thank you for your cooperation and we hope that the survey and the follow-up discussions will provide you with new ways of thinking about the present and potential role of big data and data science in program evaluation.
This year at MERL Tech DC, in addition to the regular conference on September 6th and 7th, we’re offering two full-day, in-depth workshops on September 5th. Join us for a deeper look into the possibilities and pitfalls of Blockchain for MERL and Big Data for Evaluation!
What can Blockchain offer MERL? with Shailee Adinolfi, Michael Cooper, and Val Gandhi, co-hosted by Chemonics International, 1717 H St. NW, Washington, DC 20016.
Tired of the blockchain hype, but still curious on how it will impact MERL? Join us for a full day workshop with development practitioners who have implemented blockchain solutions with social impact goals in various countries. Gain knowledge of the technical promises and drawbacks of blockchain technology as it stands today and brainstorm how it may be able to solve for some of the challenges in MERL in the future. Learn about ethical design principles for blockchain and how to engage with blockchain service providers to ensure that your ideas and programs are realistic and avoid harm. See the agenda here.
Big Data and Evaluation with Michael Bamberger, Kerry Bruce and Peter York, co-hosted by the Independent Evaluation Group at the World Bank – “I” Building, Room: I-1-200, 1850 I St NW, Washington, DC 20006
Join us for a one-day, in-depth workshop on big data and evaluation where you’ll get an introduction to Big Data for Evaluators. We’ll provide an overview of applications of big data in international development evaluation, discuss ways that evaluators are (or could be) using big data and big data analytics in their work. You’ll also learn about the various tools of data science and potential applications, as well as run through specific cases where evaluators have employed big data as one of their methods. We will also address the important question as to why many evaluators have been slower and more reluctant to incorporate big data into their work than have their colleagues in research, program planning, management and other areas such as emergency relief programs. Lastly, we’ll discuss the ethics of using big data in our work. See the agenda here!
At MERL Tech London, 2018, we invited Michael Bamberger and Rick Davies to debate the question of whether the enthusiasm for Big Data in Evaluation is warranted. At their session, through a formal debate (skillfully managed by Shawna Hoffman from The Rockefeller Foundation) they discussed whether Big Data and Evaluation would eventually converge, whether one would dominate the other, how can and should they relate to each other, and what risks and opportunities there are in this relationship.
Following the debate, Michael and Rick wanted to continue the discussion on the MERL Tech Blog, this time exploring the issues in a more conversational mode, because in practice both of them see more than one side to the issue.
So, what do Rick and Michael think — will big data integrate with evaluation — or is it all just hype?
Rick: In the MERL Tech debate I put a lot of emphasis on the possibility that evaluation, as a field, would be overwhelmed by big data / data science rhetoric. But since then I have been thinking about a countervailing development, which is that evaluative thinking is pushing back against unthinking enthusiasm for the use of data science algorithms. I emphasise “evaluative thinking” rather than “evaluators” as a category of people, because a lot of this pushback is coming from people who would not identify themselves as evaluators. There are different strands to this evaluative response.
One is a social justice perspective, reflected in recent books such as “Weapons of Math Destruction”, “Automating Inequality”, and “Algorithms of Oppression”, which emphasise the human cost of poorly designed and/or poorly supervised use of algorithms that apply large amounts of data to welfare and justice administration. Another strand is more like a form of exploratory philosophy, and has focused on how it might be possible to define “fairness” when designing and evaluating algorithms that have consequences for human welfare [see 1, 2, 3, 4]. Another strand is perhaps more technical in focus, but still has a value concern: the literature on algorithmic transparency. Without transparency it is difficult to assess fairness [see 5, 6]. Neural networks are often seen as a particular challenge. Associated with this are discussions about “the right to explanation” and what this means in practice [1].
In parallel there is also some infiltration of data science thinking into mainstream evaluation practice. DFID is funding the World Bank Strategic Impact Evaluation Fund’s (SIEF) latest call for “nimble evaluations”. These are described as rapid and low cost, likely to take the form of an RCT, but focused on improving implementation rather than assessing overall impact. This type of RCT is directly equivalent to the A/B testing used by the internet giants to improve the way their platforms engage with their users. Hopefully these nimble approaches will bring a more immediate benefit to people’s lives than RCTs which have tried to assess the impact of a whole project and then inform the design of subsequent projects.
Another recent development is the World Bank’s Data Science competition, where participants are being challenged to develop predictive models of household poverty status, based on World Bank Household Survey data. The intention is that these should provide a cheaper means of identifying poor households than relying solely on what can be very expensive and time-consuming nationwide household surveys. At present the focus of the supporting website is very technical. As far as I can see there is no discussion of how the winning prediction model will be used, or how any risks of adverse effects might be monitored and managed. Yet as I suggested at MERL Tech London, most algorithms used for prediction modelling will have errors. The propensity to generate false positives and false negatives is machine learning’s equivalent of original sin. It is to be expected, so it should be planned for. Plans should include systematic monitoring of errors and a public policy for correction, redress and compensation.
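The kind of systematic error monitoring suggested here can be made concrete with a small sketch. The snippet below computes false positive and false negative rates for a hypothetical binary poverty classifier; the labels are invented for illustration and the function is not part of any World Bank tooling.

```python
# Sketch: monitoring false positives/negatives for a hypothetical
# poverty-prediction model. All data here is invented for illustration.

def error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate) for binary labels."""
    fp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if p == 0 and a == 1)
    negatives = sum(1 for a in actual if a == 0)
    positives = sum(1 for a in actual if a == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# 1 = household classified as poor, 0 = not poor
actual    = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]  # one missed household, one wrongly flagged

fpr, fnr = error_rates(actual, predicted)
print(f"False positive rate: {fpr:.0%}")  # households wrongly flagged as poor
print(f"False negative rate: {fnr:.0%}")  # poor households the model missed
```

Tracking these two rates over time, and publishing them, is one minimal form of the "systematic monitoring of errors" argued for above: a false negative here is a poor household denied support, so the two error types deserve separate scrutiny rather than a single accuracy figure.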
Michael: These are both important points, and it is interesting to think about what conclusions we can draw for the question before us. Concerning the important issue of algorithmic transparency (AT), Rick notes that a number of widely discussed books and articles have highlighted the risk that the lack of AT poses for democracy, and particularly for poor and vulnerable groups. Virginia Eubanks, one of the authors cited by Rick, talks about the “digital poorhouse” and how unregulated algorithms can help perpetuate an underclass. However, I think we should examine more carefully how evaluators are contributing to this discussion. My impression, based on very limited evidence, is that evaluators are not at the center — or even perhaps the periphery — of this discussion. Much of the concern about these issues is being generated by journalists, public administration specialists or legal specialists. I argued in an earlier MERL Tech post that many evaluators are not very familiar with big data and data analytics and are often not very involved in these debates. This is a hypothesis that we hope readers can help us to test.
Rick’s second point, about the infiltration of data science into evaluation is obviously very central to our discussion. I would agree that the World Bank is one of the leaders in the promotion of data science, and the example of “nimble evaluation” may be a good example of convergence between data science and evaluation. However, there are other examples where the Bank is on the cutting edge of promoting new information technology, but where potential opportunities to integrate technology and evaluation do not seem to have been taken up. An example would be the Bank’s very interesting Big Data Innovation Challenge, which produced many exciting new applications of big data to development (e.g. climate smart agriculture, promoting financial inclusion, securing property rights through geospatial data, and mapping poverty through satellites). The use of data science to strengthen evaluation of the effectiveness of these interventions, however, was not mentioned as one of the objectives or outputs of this very exciting program.
It would also be interesting to explore to what extent the World Bank Data Science competition that Rick mentions resulted in the convergence of data science and evaluation, or whether it was simply testing new applications of data science.
Finally, I would like to mention two interesting chapters in Cybersociety, Big Data and Evaluation edited by Petersson and Breul (2017, Transaction Publications). One chapter (by Hojlund et al) reports on a survey which found that only 50% of professional evaluators claimed to be familiar with the basic concepts of big data, and only about 10% reported having used big data in an evaluation. In another chapter, Forss and Noren reviewed a sample of Terms of Reference (TOR) for evaluations conducted by different development agencies, where they found that none of the 25 TOR specifically required the evaluators to incorporate big data into their evaluation design.
It is difficult to find hard evidence on the extent to which evaluators are familiar with, sympathetic to, or using big data in their evaluations, but the examples mentioned above show that there are important questions about the progress made towards the convergence of evaluation and big data.
We invite readers to share their experiences both on how the two professions are starting to converge, or on the challenges that slow down, or even constrain the process of convergence.
by Shawna Hoffman, Specialist, Measurement, Evaluation and Organizational Performance at the Rockefeller Foundation.
Both the volume of data available at our fingertips and the speed with which it can be accessed and processed have increased exponentially over the past decade. The potential applications of this to support monitoring and evaluation (M&E) of complex development programs have generated great excitement. But is all the enthusiasm warranted? Will big data integrate with evaluation — or is this all just hype?
Before we began, we used Mentimeter to see where the audience stood on the topic:
Once the votes were in, we started.
Both Michael and Rick have fairly balanced and pragmatic viewpoints; however, for the sake of a good debate, and to help unearth the nuances and complexity surrounding the topic, they embraced the challenge of representing divergent and polarized perspectives – with Michael arguing in favor of integration, and Rick arguing against.
“Evaluation is in a state of crisis,” Michael argued, “but help is on the way.” Arguments in favor of the integration of big data and evaluation centered on a few key ideas:
There are strong use cases for integration. Data science tools and techniques can complement conventional evaluation methodology, providing cheap, quick, complexity-sensitive, longitudinal, and easily analyzable data.
Integration is possible. Incentives for cross-collaboration are strong, and barriers to working together are reducing. Traditionally these fields have been siloed, and their relationship has been characterized by a mutual lack of understanding of the other (or even questioning of the other’s motivations or professional rigor). However, data scientists are increasingly recognizing the benefits of mixed methods, and evaluators are seeing the potential to use big data to increase the number of types of evaluation that can be conducted within real-world budget, time and data constraints. There are some compelling examples (explored in this UN Global Pulse Report) of where integration has been successful.
Integration is the right thing to do. New approaches that leverage the strengths of data science and evaluation are potentially powerful instruments for giving voice to vulnerable groups and promoting participatory development and social justice. Without big data, evaluation could miss opportunities to reach the most rural and remote people. Without evaluation (which emphasizes transparency of arguments and evidence), big data algorithms can be opaque “black boxes.”
While this may paint a hopeful picture, Rick cautioned the audience to temper its enthusiasm. He warned of the risk of domination of evaluation by data science discourse, and surfaced some significant practical, technical, and ethical considerations that would make integration challenging.
First, big data are often non-representative, and the algorithms underpinning them are non-transparent. Second, “the mechanistic approaches offered by data science are antithetical to the very notion of evaluation being about people’s values and necessarily involving their participation and consent,” he argued. It is – and will always be – critical to pay attention to the human element that evaluation brings to bear. Finally, big data are helpful for pattern recognition, but the ability to identify a pattern should not be confused with true explanation or understanding (correlation ≠ causation). Overall, there are many problems that integration would not solve, and some that it could create or exacerbate.
The debate confirmed that this question is complex, nuanced, and multi-faceted. It reminded us that there is cause for enthusiasm and optimism, alongside a healthy dose of skepticism. What was made very clear is that the future should leverage the respective strengths of these two fields in order to maximize good and minimize potential risks.
In the end, the side in favor of integration of big data and evaluation won the debate by a considerable margin.
The future of integration looks promising, but it’ll be interesting to see how this conversation unfolds as the number of examples of integration continues to grow.
Interested in learning more and exploring this further? Stay tuned for a follow-up post from Michael and Rick. You can also attend MERL Tech DC in September 2018 if you’d like to join in the discussions in person!
In Part 1 of this series we argued that, while applications of big data and data analytics are expanding rapidly in many areas of development programs, evaluators have been slow to adopt these applications. We predicted that one possible future scenario could be that evaluation may no longer be considered a separate function, and that it may be treated as one of the outputs of the integrated information systems that will gradually be adopted by many development agencies. Furthermore, many evaluations will use data analytics approaches, rather than conventional evaluation designs. (Image: Big Data session notes from USAIDLearning’s Katherine Haugh [@katherine_haugh]. MERL Tech DC 2016).
Here, in Part 2 we identify some of the reasons why development evaluators have been slow to adopt big data analytics and we propose some promising approaches for building bridges between evaluators and data analysts.
Why have evaluators been slow to adopt big data analytics?
1. Weak institutional linkages. Over the past few years some development agencies have created data centers to explore ways to exploit new information technologies. These centers are mainly staffed by people with a background in data science or statistics and the institutional links to the agency’s evaluation office are often weak.
2. Many evaluators have limited familiarity with big data/analytics. Evaluation training programs tend to only present conventional experimental, quasi-experimental and mixed-methods/qualitative designs. They usually do not cover smart data analytics (see Part 1 of this blog). Similarly, many data scientists do not have a background in conventional evaluation methodology (though there are of course exceptions).
3. Methodological differences. Many big data approaches do not conform to the basic principles that underpin conventional program evaluation, for example:
Data quality: real-time big data provides one of the potentially most powerful sources of data for development programs. Among other things, real-time data can provide early warning signals of potential diseases (e.g. Google Flu), ethnic tension, drought and poverty (Meier 2015). However, when an evaluator asks if the data is biased or of poor quality, the data analyst may respond “Sure the data is biased (e.g. only captured from mobile phone users or twitter feeds) and it may be of poor quality. All data is biased and usually of poor quality, but it does not matter because tomorrow we will have new data.” This reflects the very different kinds of data that evaluators and data analysts typically work with, and the difference can be explained, but a statement such as the above can create the impression that data analysts do not take issues of bias and data quality very seriously.
Data mining: Many data analytics methods are based on the mining of large data sets to identify patterns of correlation, which are then built into predictive models, normally using Bayesian statistics. Many evaluators frown on data mining because it can identify spurious associations.
The role of theory: Most (but not all) evaluators believe that an evaluation design should be based on a theoretical framework (theory of change or program theory) that hypothesizes the processes through which the intended outcomes will be achieved. In contrast, there is plenty of debate among data analysts concerning the role of theory, and whether it is necessary at all. Some even go as far as to claim that data analytics means “the end of theory” (Anderson 2008). This, combined with data mining, creates the impression among some evaluators that data analytics uses whatever data is easily accessible, with no theoretical framework to guide the selection of evaluation questions or to assess the adequacy of available data.
Experimental designs versus predictive analytics: Most quantitative evaluation designs are based on an experimental or quasi-experimental design using a pretest/posttest comparison group design. Given the high cost of data collection, statistical power calculations are frequently used to estimate the minimum size of sample required to ensure a certain level of statistical significance. Usually this means that analysis can only be conducted on the total sample, as sample size does not permit statistical significance testing for sub-samples. In contrast, predictive analytics usually employ Bayesian probability models. Due to the low cost of data collection and analysis, it is usually possible to conduct the analysis on the total population (rather than a sample), so that disaggregated analysis can be conducted to compare sub-populations, and often (particularly when also using machine learning) to compute outcome probabilities for individual subjects. There continues to be heated debates concerning the merits of each approach, and there has been much less discussion of how experimental and predictive analytics approaches could complement each other.
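The data-mining concern raised above can be illustrated with a short simulation: if you scan enough unrelated variables against an outcome, some will appear strongly correlated purely by chance. The data below is entirely random, so any "finding" is spurious by construction.

```python
# Sketch: why mining many variables yields spurious correlations.
# Purely simulated data; there is no real relationship anywhere.
import random
import statistics

random.seed(42)
n_obs, n_vars = 100, 1000

outcome = [random.gauss(0, 1) for _ in range(n_obs)]

def corr(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n_obs * sx * sy)

# "Mine" 1,000 unrelated random variables for the strongest correlate
best = max(
    abs(corr([random.gauss(0, 1) for _ in range(n_obs)], outcome))
    for _ in range(n_vars)
)
print(f"Strongest correlation found: {best:.2f}")
```

Even though every variable is pure noise, the best of 1,000 candidates typically shows |r| in the 0.3–0.4 range with only 100 observations, which would look "significant" in a naive screen. This is why evaluators insist that mined associations be treated as hypotheses to test, not findings to report.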
As Pete York at CommunityScience.com observes: “Herein lies the opportunity – we evaluators can’t battle the wave of big data and data science that will transform the way we do research. However, we can force it to have to succumb to the rules of objective rigor via the scientific method. Evaluators/researchers train people how to do it, they can train machines. We are already doing so.” (Personal communication 8/7/17)
4. Ethical and political concerns: Many evaluators also have concerns about who designs and markets big data apps and who benefits financially. Many commercial agencies collect data on low income populations (for example their consumption patterns) which may then be sold to consumer products companies with little or no benefit going to the populations from which the information is collected. Some of the algorithms may also include a bias against poor and vulnerable groups (O’Neil 2016) that are difficult to detect given the proprietary nature of the algorithms.
Another set of issues concern whether the ways in which big data are collected and used (for making decisions affecting poor and vulnerable groups) tends to be exclusive (governments and donors use big data to make decisions about programs affecting the poor without consulting them), or whether big data is used to promote inclusion (giving voice to vulnerable groups). These issues are discussed in a recent Rockefeller Foundation blog. There are also many issues around privacy and data security. There is of course no simple answer to these questions, but many of these concerns are often lurking in the background when evaluators are considering the possibility of incorporating big data into their evaluations.
Table 1. Reasons evaluators have been slow to adopt big data and opportunities for bridge building between evaluators and data analysts

| Reason for slow adoption | Opportunities for bridge building |
|---|---|
| 1. Weak institutional linkages | Strengthening formal and informal links between data centers and evaluators |
| 2. Evaluators have limited knowledge about big data and data analytics | Capacity development programs covering both big data and conventional evaluation; collaborative pilot evaluation projects |
| 3. Methodological differences | Creating opportunities for dialogue to explore differences and to determine how they can be reconciled; viewing data analytics and evaluation as complementary rather than competing |
| 4. Ethical and political concerns about big data | Greater focus on ethical codes of conduct, privacy and data security; focusing on making approaches to big data and evaluation inclusive and avoiding exclusive/extractive approaches |
Building bridges between evaluators and big data/analytics
There are a number of possible steps that could be taken to build bridges between evaluators and big data analysts, and thus to promote the integration of big data into development evaluation. Catherine Cheney (2016) presents interviews with a number of data scientists and development practitioners stressing that data-driven development needs both social and computer scientists. No single approach is likely to be successful, and the best approach(es) will depend on each specific context, but we could consider:
Strengthening the formal and informal linkages between data centers and evaluation offices. It may be possible to achieve this within the existing organizational structure, but it will often require some formal organizational changes in terms of lines of communication. Linda Raftree provides a useful framework for understanding how different “buckets” of data (including among others, traditional data and big data) can be brought together, which suggests one pathway to collaboration between data centers and evaluation offices.
Identifying opportunities for collaborative pilot projects. A useful starting point may be to identify opportunities for collaboration on pilot projects in order to test/demonstrate the value-added of cooperation between the data analysts and evaluators. The pilots should be carefully selected to ensure that both groups are involved equally in the design of the initiative. Time should be budgeted to promote team-building so that each team can understand the other’s approach.
Promoting dialogue to explore ways to reconcile differences of approach and methodology between big data and evaluation. While many of these differences may at first appear fundamental, some result, at least in part, from questions of terminology, and in other cases different approaches can be applied at different stages of the evaluation process. For example:
Many evaluators are suspicious of real-time data from sources such as Twitter, or analysis of phone records, due to selection bias and issues of data quality. However, evaluators are familiar with exploratory data (collected, for example, during project visits, or as feedback from staff), which is then checked more systematically in a follow-up study. When presented in this way, the two teams would be able to discuss, in a non-confrontational way, how many kinds of real-time data could be built into evaluation designs.
When using Bayesian probability analysis it is necessary to begin with a prior distribution. The probabilities are then updated as more data becomes available. The results of a conventional experimental design can often be used as an input to the definition of the prior distribution. Consequently, it may be possible to consider experimental designs and Bayesian probability analysis as sequential stages of an evaluation rather than as competing approaches.
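This sequential idea is easy to sketch with a conjugate Beta-Binomial model. The numbers below are invented: a small experimental study supplies the prior on a programme's success rate, and cheaper large-scale monitoring data then updates it.

```python
# Sketch: an experimental result feeding a Bayesian update as sequential
# stages of one evaluation. All numbers are invented for illustration.

# Stage 1: a small RCT found 30 successes among 50 treated participants.
# Starting from a flat Beta(1, 1), encode this as a Beta prior.
prior_alpha, prior_beta = 1 + 30, 1 + 20   # Beta(31, 21)

# Stage 2: cheap, large-scale monitoring data arrives: 240 successes
# out of 400 new cases. With a Beta-Binomial model, updating is addition.
post_alpha = prior_alpha + 240
post_beta = prior_beta + 160

prior_mean = prior_alpha / (prior_alpha + prior_beta)
post_mean = post_alpha / (post_alpha + post_beta)
print(f"Prior mean success rate:     {prior_mean:.3f}")
print(f"Posterior mean success rate: {post_mean:.3f}")
```

The experimental stage anchors the prior, and each new batch of monitoring data simply adds to the Beta parameters, which is one concrete sense in which experimental designs and Bayesian analysis can be stages of a single evaluation rather than rivals.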
Integrated capacity development programs for data analysts and evaluators. These activities would both help develop a broader common methodological framework and serve as an opportunity for team building.
A number of factors together explain the slow take-up of big data and data analytics by development evaluators. Several promising approaches are proposed for building bridges to overcome these barriers and to promote the integration of big data into development evaluation.
We are living in an increasingly quantified world.
There are multiple sources of data that can be generated and analyzed in real-time. They can be synthesized to capture complex interactions among data streams and to identify previously unsuspected linkages among seemingly unrelated factors [such as the purchase of diapers and increased sales of beer]. We can now quantify and monitor ourselves, our houses (even the contents of our refrigerator!), our communities, our cities, our purchases and preferences, our ecosystem, and multiple dimensions of the state of the world.
These rich sources of data are becoming increasingly accessible to individuals, researchers and businesses through huge numbers of mobile phone and tablet apps and user-friendly data analysis programs.
The influence of digital technology on international development is growing.
Many of these apps and other big data/data analytics tools are now being adopted by international development agencies. Due to their relatively low cost, ease of application, and accessibility in remote rural areas, these approaches are proving particularly attractive to non-profit organizations, and the majority of NGOs probably now use some kind of mobile phone app.
Apps are widely-used for early warning systems, emergency relief, dissemination of information (to farmers, mothers, fishermen and other groups with limited access to markets), identifying and collecting feedback from marginal and vulnerable groups, and permitting rapid analysis of poverty. Data analytics are also used to create integrated data bases that synthesize all of the information on topics as diverse as national water resources, human trafficking, updates on conflict zones, climate change and many other development topics.
Table 1: Widely used big data/data analytics applications in international development

| Application | Big data/data analytics tools |
|---|---|
| Early warning systems for natural and man-made disasters | Analysis of Twitter, Facebook and other social media; analysis of radio call-in programs; satellite images and remote sensors; electronic transaction records [ATM, on-line purchases]; GPS mapping and tracking |
| Dissemination of information to small farmers, mothers, fishermen and other traders; feedback from marginal and vulnerable groups and on sensitive topics; rapid analysis of poverty and identification of low-income groups | Analysis of phone records; social media analysis; satellite images [e.g. using thatched roofs as a proxy indicator of low-income households]; electronic transaction records |
| Creation of an integrated database synthesizing multiple sources of data on a development topic | Examples: national water resources; agricultural conditions in a particular region |
Evaluation is lagging behind.
Surprisingly, program evaluation is the area that is lagging behind in terms of the adoption of big data/analytics. The few available studies report that a high proportion of evaluators are not very familiar with big data/analytics and significantly fewer report having used big data in their professional evaluation work. Furthermore, while many international development agencies have created data development centers within the past few years, many of these are staffed by data scientists (many with limited familiarity with conventional evaluation methods) and there are weak institutional links to agency evaluation offices.
A recent study on the current status of the integration of big data into the monitoring and evaluation of development programs identified a number of reasons for the slow adoption of big data/analytics by evaluation offices:
Weak institutional links between data development centers and evaluation offices
Differences of methodology and the approach to data generation and analysis
Issues concerning data quality
Concerns by evaluators about the commercial, political and ethical nature of how big data is generated, controlled and used.
Key questions for the future of evaluation in international development…
The above gives rise to two sets of questions concerning the future role of evaluation in international development:
The future direction of development evaluation. Given the rapid expansion of big data in international development, it is likely there will be a move towards integrated program information systems. These will begin to generate, analyze and synthesize data for program selection, design, management, monitoring, evaluation and dissemination. A possible scenario is that program evaluation will no longer be considered a specialized function that is the responsibility of a separate evaluation office; rather, it will become one of the outputs generated from the program database. If this happens, evaluation may be designed and implemented not by evaluation specialists using conventional evaluation methods (experimental and quasi-experimental designs, theory-based evaluation) but by data analysts using methods such as predictive analytics and machine learning.
Key Question: Is this scenario credible? If so how widespread will it become and over what time horizon? Is it likely that evaluation will become one of the outputs of an integrated management information system? And if so is it likely that many of the evaluation functions will be taken over by big data analysts?
The changing role of development evaluators and the evaluation office. We argued that currently many, perhaps most, development evaluators are not very familiar with big data/analytics, and even fewer apply these approaches. There are both professional reasons (how evaluators and data scientists are trained) and organizational reasons (the limited formal links between evaluation offices and data centers in many organizations) that explain the limited adoption of big data approaches by evaluators. So, assuming the above scenario proves to be at least partially true, what will be required for evaluators to become sufficiently conversant with these new approaches to be able to contribute to how big data-focused evaluation approaches are designed and implemented? According to Pete York at Communityscience.com, the big challenge and opportunity for evaluators is to ensure that the scientific method becomes an essential part of the data analytics toolkit. Recent studies by the Global Environment Facility (GEF) illustrate some of the ways that big data from sources such as satellite images and remote sensors can be used to strengthen conventional quasi-experimental evaluation designs. In a number of evaluations, these data sources were combined with propensity score matching to select matched samples for pretest-posttest comparison group designs, in order to evaluate the effectiveness of programs to protect forest cover or reserves for mangrove swamps.
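To make the propensity score matching idea concrete, here is a minimal self-contained sketch on simulated data. This is not the GEF's actual pipeline: the covariate, selection model and outcome are all invented, and the point is only to show why matching on the propensity score reduces selection bias relative to a naive comparison of means.

```python
# Sketch: propensity score matching on simulated data. Covariates,
# selection model and outcome are invented for illustration only.
import math
import random

random.seed(1)
n = 2000

# One observed covariate (think: baseline forest cover) drives both
# selection into the programme and the outcome, biasing naive comparisons.
x = [random.gauss(0, 1) for _ in range(n)]
treat = [1 if random.random() < 1 / (1 + math.exp(-xi)) else 0 for xi in x]
y = [2.0 * t + 3.0 * xi + random.gauss(0, 1) for t, xi in zip(treat, x)]

# Fit a tiny logistic regression for the propensity score by gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    grad_w = grad_b = 0.0
    for xi, ti in zip(x, treat):
        p = 1 / (1 + math.exp(-(w * xi + b)))
        grad_w += (p - ti) * xi
        grad_b += (p - ti)
    w -= 0.1 * grad_w / n
    b -= 0.1 * grad_b / n

score = [1 / (1 + math.exp(-(w * xi + b))) for xi in x]
treated = [i for i in range(n) if treat[i] == 1]
control = [i for i in range(n) if treat[i] == 0]

# Match each treated unit to the nearest control on the propensity score
# (1:1 matching with replacement), then average the pairwise differences.
att = 0.0
for i in treated:
    j = min(control, key=lambda c: abs(score[c] - score[i]))
    att += y[i] - y[j]
att /= len(treated)

naive = (sum(y[i] for i in treated) / len(treated)
         - sum(y[i] for i in control) / len(control))
print(f"Naive difference in means: {naive:.2f}")  # inflated by selection
print(f"Matched (ATT) estimate:    {att:.2f}")    # closer to the true 2.0
```

In a real evaluation the propensity model would use many covariates (including remote-sensing variables such as baseline forest cover), but the logic is the same: the matched comparison group mimics the treated group on observables, so the remaining outcome difference is a more credible estimate of programme effect.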
Key Question: Assuming there will be a significant change in how the evaluation function is organized and managed, what will be required to bridge the gap between evaluators and data analysts? How likely is it that the evaluators will be able to assume this new role and how likely is it that organizations will make the necessary adjustments to facilitate these transformations?
What do you think? How will these scenarios play out?
Note: Stay tuned for Michael’s next post focusing on how to build bridges between evaluators and big data analysts.
Below are some useful references if you’d like to read more on this topic:
Bamberger, M., Raftree, L. and Olazabal, V. (2016) The role of new information and communication technologies in equity-focused evaluation: opportunities and challenges. Evaluation 22(2): 228–244. A discussion of the ethical issues and challenges of new information technology.
Meier, P. (2015) Digital Humanitarians: How big data is changing the face of humanitarian response. CRC Press. A review, with detailed case studies, of how digital technology is being used by NGOs and civil society.
O’Neil, C. (2016) Weapons of Math Destruction: How big data increases inequality and threatens democracy. Crown Books. How widely used digital algorithms negatively affect the poor and marginalized sectors of society.
Petersson, G.K. and Breul, J.D. (editors) (2017) Cyber Society, Big Data and Evaluation. Comparative Policy Evaluation, Volume 24. Transaction Publishers. The evolving role of evaluation in cyber society.
by Linda Raftree, Independent Consultant and MERL Tech Organizer
It can be overwhelming to get your head around all the different kinds of data and the various approaches to collecting or finding data for development and humanitarian monitoring, evaluation, research and learning (MERL).
Though there are many ways of categorizing data, lately I find myself conceptually organizing data streams into four general buckets when thinking about MERL in the aid and development space:
‘Traditional’ data. How we’ve been doing things for(pretty much)ever. Researchers, evaluators and/or enumerators are in relative control of the process. They design a specific questionnaire or a data gathering process and go out and collect qualitative or quantitative data; they send out a survey and request feedback; they do focus group discussions or interviews; or they collect data on paper and eventually digitize the data for analysis and decision-making. Increasingly, we’re using digital tools for all of these processes, but they are still quite traditional approaches (and there is nothing wrong with traditional!).
‘Found’ data. The Internet, digital data and open data have made it lots easier to find, share, and re-use datasets collected by others, whether this is internally in our own organizations, with partners or just in general. These tend to be datasets collected in traditional ways, such as government or agency data sets. In cases where the datasets are digitized and have proper descriptions, clear provenance, consent has been obtained for use/re-use, and care has been taken to de-identify them, they can eliminate the need to collect the same data over again. Data hubs are springing up that aim to collect and organize these data sets to make them easier to find and use.
‘Seamless’ data. Development and humanitarian agencies are increasingly using digital applications and platforms in their work — whether bespoke or commercially available ones. Data generated by users of these platforms can provide insights that help answer specific questions about their behaviors, and the data is not limited to quantitative data. This data is normally used to improve applications and platform experiences, interfaces, content, etc. but it can also provide clues into a host of other online and offline behaviors, including knowledge, attitudes, and practices. One cautionary note is that because this data is collected seamlessly, users of these tools and platforms may not realize that they are generating data or understand the degree to which their behaviors are being tracked and used for MERL purposes (even if they’ve checked “I agree” to the terms and conditions). This has big implications for privacy that organizations should think about, especially as new regulations are being developed such as the EU’s General Data Protection Regulation (GDPR). The commercial sector is great at this type of data analysis, but the development sector is only just starting to get more sophisticated at it.
‘Big’ data. In addition to data generated ‘seamlessly’ by platforms and applications, there are also ‘big data’ and data that exists on the Internet that can be ‘harvested’ if one only knows how. The term ‘big data’ describes the application of analytical techniques to search, aggregate, and cross-reference large data sets in order to develop intelligence and insights. (See this post for a good overview of big data and some of the associated challenges and concerns.) Data harvesting is a term used for the process of finding ‘unstructured’ content (message boards, a webpage, a PDF file, Tweets, videos, comments) and turning it into ‘semi-structured’ data so that it can then be analyzed. (Estimates are that 90 percent of the data on the Internet exists as unstructured content.) Currently, big data seems to be more apt for predictive modeling than for looking backward at how well a program performed or what impact it had. Development and humanitarian organizations (self included) are only just starting to better understand concepts around big data and how it might be used for MERL. (This is a useful primer.)
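As a toy illustration of what “turning unstructured content into semi-structured data” can mean in practice, the sketch below pulls hashtags and mentions out of free-text social media posts into structured records. The posts and field names are made up; real harvesting pipelines deal with far messier input and much larger volumes, but the basic move — imposing a schema on raw text — is the same.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Record:
    """A semi-structured record extracted from one unstructured post."""
    text: str
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)

def harvest(posts):
    """Turn free-text posts into records with extracted hashtags/mentions."""
    return [
        Record(
            text=post,
            hashtags=re.findall(r"#(\w+)", post),
            mentions=re.findall(r"@(\w+)", post),
        )
        for post in posts
    ]

posts = [
    "Great session on #MERL today with @evaluator_jane!",
    "New dataset released #opendata #MERL",
]
for r in harvest(posts):
    print(r.hashtags, r.mentions)
```

Once the content is in this shape, standard analysis (counting, filtering, cross-referencing with other datasets) becomes possible — which is where the ‘big data’ techniques described above take over.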
Thinking about these four buckets of data can help MERL practitioners to identify data sources and how they might complement one another in a MERL plan. Categorizing them as such can also help to map out how the different kinds of data will be responsibly collected/found/harvested, stored, shared, used, and maintained/retained/destroyed. Each type of data also has certain implications in terms of privacy, consent and use/re-use and how it is stored and protected. Planning for the use of different data sources and types can also help organizations choose the data management systems needed and identify the resources, capacities and skill sets required (or needing to be acquired) for modern MERL.
Organizations and evaluators are increasingly comfortable using mobile phones and/or tablets to do traditional data gathering, but they often are not using ‘found’ datasets. This may be because these datasets are not very ‘find-able,’ because organizations are not creating them, because re-using data is not a common practice for them, because the data are of questionable quality/integrity or lack descriptors, or for a variety of other reasons.
The use of ‘seamless’ data is something that development and humanitarian agencies might want to get better at. Even though large swaths of the populations that we work with are not yet online, this is changing. And if we are using digital tools and applications in our work, we shouldn’t let that data go to waste if it can help us improve our services or better understand the impact and value of the programs we are implementing. (At the very least, we had better understand what seamless data the tools, applications and platforms we’re using are collecting, so that we can manage our users’ data privacy and security and ensure their rights are not being violated by third parties!)
Big data is also new to the development sector, and there may be good reason it is not yet widely used. Many of the populations we are working with are not producing much data — though this is also changing as digital financial services and mobile phone use have become almost universal and the use of smartphones is on the rise. Normally, organizations require new knowledge, skills, partnerships and tools to access and use existing big data sets or to do any data harvesting. Some say that big data, along with ‘seamless’ data, will one day replace our current form of MERL. As artificial intelligence and machine learning advance, who knows… (and it’s not only MERL practitioners who will be out of a job — but that’s a conversation for another time!)
Not every organization needs to be using all four of these kinds of data, but we should at least be aware that they are out there and consider whether they are of use to our MERL efforts, depending on what our programs look like, who we are working with, and what kind of MERL we are tasked with.
I’m curious how other people conceptualize their buckets of data, and where I’ve missed something or defined these buckets erroneously…. Thoughts?