A new wave of technologies and approaches has the potential to influence how monitoring, evaluation, research and learning (MERL) practitioners do their work. The growth in use of smartphones and the internet, digitization of existing data sets, and collection of digital data make data increasingly available for MERL activities. This changes how MERL is conducted and, in some cases, who conducts it.
We hypothesized that emerging technology is revolutionizing the types of data that can be collected and accessed and the ways that it can be processed and used for better MERL. However, improved research on and documentation of how these technologies are being used is required so the sector can better understand where, when, why, how, and for which populations and which types of MERL these emerging technologies would be appropriate.
The team reviewed the state of the field and found there were three key new areas of data that MERL practitioners should consider:
New kinds of data sources, such as application data, sensor data, data from drones and biometrics. These types of data are providing more access to information and larger volumes of data than ever before.
New types of systems for data storage. The most prominent of these was the distributed ledger technologies (also known as blockchain) and an increasing use of cloud and edge computing. We discuss the implications of these technologies for MERL.
New ways of processing data, mainly from the field of machine learning, specifically supervised and unsupervised learning techniques that could help MERL practitioners manage large volumes of both quantitative and qualitative data.
These new technologies hold great promise for making MERL practices more precise, automated and timely. However, some challenges include:
A need to clearly define problems so the choice of data, tool, or technique is appropriate
Non-representative selection bias when sampling
Reduced MERL practitioner or evaluator control
Change management needs to adapt how organizations manage data
Rapid platform changes and difficulty with assessing the costs
A need for systems thinking which may involve stitching different technologies together
To address emerging challenges and make best use of the new data, tools, and approaches, we found a need for capacity strengthening for MERL practitioners, greater collaboration among social scientists and technologists, a need for increased documentation, and a need for the incorporation of more systems thinking among MERL practitioners.
Finally there remains a need for greater attention to justice, ethics and privacy in emerging technology.
Big data is a big topic in other sectors but its application within monitoring and evaluation (M&E) is limited, with most reports focusing more on its potential rather than actual use. Our paper, “Big Data to Data Science: Moving from ‘What’ to ‘How’ in the MERL Tech Space” probes trends in the use of big data between 2014 and 2019 by a community of early adopters working in monitoring, evaluation, research, and learning (MERL) in the development and humanitarian sectors. We focus on how MERL practitioners actually use big data and what encourages or deters adoption.
First, we collated administrative and publicly available MERL Tech conference data from the 281 sessions accepted for presentation between 2015 and 2019. Of these, we identified 54 sessions that mentioned big data and compared trends between sessions that did and did not mention this topic. In any given year from 2015 to 2019, 16 percent to 26 percent of sessions at MERL Tech conferences were related to the topic of big data. (Conferences were held in Washington DC, London, and Johannesburg).
Our quantitative analysis was complemented by 11 qualitative key informant interviews. We selected interviewees representing diverse viewpoints (implementers, donors, MERL specialists) and a range of subject matter expertise and backgrounds. During interviews, we explored why an interviewee chose to use big data, the benefits and challenges of using big data, reflections on the use of big data in the wider MERL tech community, and opportunities for the future.
Our findings indicate that MERL practitioners are in a fragmented, experimental phase, with use and application of big data varying widely, accompanied by shifting terminologies. One interviewee noted that “big data is sort of an outmoded buzzword” with practitioners now using terms such as ‘artificial intelligence’ and ‘machine learning.’ Our analysis attempted to expand the umbrella of terminologies under which big data and related technologies might fall. Key informant interviews and conference session analysis identified four main types of technologies used to collect big data: satellites, remote sensors, mobile technology, and M&E platforms, as well as a number of other tools and methods. Additionally, our analysis surfaced six main types of tools used to analyze big data: artificial intelligence and machine learning, geospatial analysis, data mining, data visualization, data analysis software packages, and social network analysis.
Barriers to adoption
We also took an in-depth look at barriers to and enablers of use of big data within MERL, as well as benefits and drawbacks. Our analysis found that perceived benefits of big data included enhanced analytical possibilities, increased efficiency, scale, data quality, accuracy, and cost-effectiveness. Big data is contributing to improved targeting and better value for money. It is also enabling remote monitoring in areas that are difficult to access for reasons such as distance, poor infrastructure, or conflict.
Concerns about bias, privacy, and the potential for big data to magnify existing inequalities arose frequently. MERL practitioners cited a number of drawbacks and limitations that make them cautious about using big data. These include lack of trust in the data (including mistrust from members of local communities); misalignment of objectives, capacity, and resources when partnering with big data firms and the corporate sector; and ethical concerns related to privacy, bias, and magnification of inequalities. Barriers to adoption include insufficient resources, absence of relevant use cases, lack of skills for big data, difficulty in determining return on investment, and challenges in pinpointing the tangible value of using big data in MERL.
Our paper includes a series of short case studies of big data applications in MERL. Our research surfaced a need for more systematic and broader sharing of big data use cases and case studies in the development sector.
The field of Big Data is rapidly evolving, thus we expect that shifts have happened already in the field since the beginning of our research in 2018. We recommend several steps for advancing with Big Data / Data Science in the MERL Space, including:
Consider. MERL Tech practitioners should examine relevant learning questions before deciding whether big data is the best tool for the MERL job at hand or whether another source or method could answer them just as well.
Pilot testing of various big data approaches is needed in order to assess their utility and the value they add. Pilot testing should be collaborative; for example, an organization with strong roots at the field level might work with an agency that has technical expertise in relevant areas.
Documenting. The current body of documentation is insufficient to highlight relevant use cases and identify frameworks for determining return on investment in big data for MERL work. The community should do more to document efforts, experiences, successes, and failures in academic and gray literature.
Sharing. There is a hum of activity around big data in the vibrant MERL Tech community. We encourage the MERL Tech community to engage in fora such as communities of practice, salons, events, and other convenings, and to seek less typical avenues for sharing information and learning and to avoid knowledge silos.
Learning. The MERL Tech space is not static; indeed, the terminology and applications of big data have shifted rapidly in the past 5 years and will continue to change over time. The MERL Tech community should participate in new training related to big data, continuing to apply critical thinking to new applications.
Guiding. Big data practitioners are crossing exciting frontiers as they apply new methods to research and learning questions. These new opportunities bring significant responsibility. MERL Tech programs serve people who are often vulnerable — but whose rights and dignity deserve respect. As we move forward with using big data, we must carefully consider, implement, and share guidance for responsible use of these new applications, always honoring the people at the heart of our interventions.
Sometimes we are not clear on why we are collecting data. ‘Just because we can’ is not a valid reason to collect or use data and technology. What purposes are driving our data collection and use of technology? What is the problem we are trying to solve? A lack of specificity can allow us stray into speculative data collection — if we’re collecting data on X, then it’s a good opportunity to collect data on Y “in case we need it in the future”. Do we ever really need it in the future? And if we do go back to it, we often find that because we didn’t collect the data on Y with a specific purpose, it’s not the “right” data for our needs. So, let’s always ask ourselves why are we collecting this data, do we really need it?
Projects are increasingly under pressure to be more efficient and cost-effective in their data collection, yet the need or desire to conduct more robust assessments can requires the collection of data on multiple dimensions within a community. These two dynamics are often in conflict with each other. Here are three questions that can help guide our decision making:
Are there existing data sets that are “good enough” to meet the M&E needs of a project? Often there are, and they are collected regularly enough to be useful. Lean on partners who understand the data space to help map out what exists and what really needs to be collected. Leverage partners who are innovating in the data space – can machine learning and AI-produced data meet 80% of your needs? If so, consider it.
What data are we critically in need of to assess a project? Build an efficient data collection methodology that considers respondent burden and potentially includes multiple channels for receiving responses to increase inclusivity.
What will the data be used for? Sensitive contexts and life or death decisions require a different level of specificity and periodicity than less sensitive projects. Think about data from this lens when deciding which information to collect, how often to collect it, and who to collect it from.
It is worth exploring questions of access in our data collection practices. Who has access to the data and the technology? Do the people about whom the data is, have access to it? Have we considered the harms that could come from the collection, storage, and use of data? For instance, while it can be useful to know where all the clients are who are accessing a pregnancy clinic to design better services, an unintended consequence may involve others having the ability to identify people who are pregnant, which pregnant people might not like these others to know. What can we do to protect the privacy of vulnerable populations? Also, going digital can be helpful, but if a person or community implicated in a data collection endeavour does not have access to technology or to a charging point – are we not just increasing or reinforcing inequality?
While we often advocate for transparency in many parts of our industry, we are not always transparent about our data practices. Are we willing to tell others, to tell community members, why we are collecting data, using technology, and how we are using information? If we are clear on our purpose, but not willing for it to be transparent, then it might be a good reason to reconsider. Yet, transparency does not equate accountability, so what are the mechanisms for ensuring greater accountability towards the people and communities we seek to serve?
Power and patience
One of the issues we’re facing is power imbalances. The demands that are made of us from donors about data, and the technology solutions that are presented to us, all make us feel like we’re not in control. But the rules haven’t been written yet — we get to write them.
One of the lessons from the responsible data workshop leading up to the conference was that organisations can get out in front of demands for data by developing their own data management and privacy policies. From this position it is easier to enter into dialogues and negotiations, with the organisational policy as your backstop. Therefore, it is worth asking, Who has power? For what? Where does it reside and how can we rebalance it?
Literacy underpins much of this – linguistic, digital, identity, ethical literacy. Often when it comes to ‘digital’ we immediately fall under the spell of the tyranny of the urgent. Therefore, in what ways can we adopt a more ‘patient’ or ‘reflective’ practice with respect to digital?
MERL Tech DC kicked off with a pre-conference workshop on September 5th that focused on what the Blockchain is and how it could influence MEL.
The workshop was broken into four parts: 1) blockchain 101, 2) how the blockchain is influencing and could influence MEL, 3) case studies to demonstrate early lessons learned, and 4) outstanding issues and emerging themes.
This blog focuses and builds on the fourth area. At the end, we provide additional resources that will be helpful to all interested in exploring how the blockchain could disrupt and impact international development at large.
Workshop Takeaways and Afterthoughts
For our purposes here, we have distilled some of the key takeaways from the workshop. This section includes a series of questions that we will respond to and link to various related reference materials.
Who are the main blockchain providers and what are they offering?
Any time a new “innovation” is introduced into the international development space, potential users lack knowledge about what the innovation is, the value it can add, and the costs of implementing it. This lack of knowledge opens the door for “snake oil salesmen” who engage in predatory attempts to sell their services to users who don’t have the knowledge to make informed decisions.
We’ve seen this phenomenon play out with blockchain. Take, for example, the numerous Initial Coin Offerings (ICO’s) that defrauded their investors, or the many instances of service providers offering low quality blockchain education trainings and/or project solutions.
Education is the best defense against being taken advantage of by snake oil salesmen. If you’re looking for general education about blockchain, we’ve included a collection of helpful tools in the table below. If your group is working to determine whether a blockchain solution is right for the problem at hand, the USAID Blockchain Primer offers easy to use decision trees that can help you. Beyond these, Mercy Corp has just published Block by Block, which outlines the attributes of various distributed ledgers along some very helpful lines that are useful when considering what distributed ledger technology to use.
Words of warning aside, there are agencies that provide genuine blockchain solutions. For a full list of providers please visit www.blockchainomics.tech, an information database run by The Development CAFE on all things blockchain.
Bottom Line: Beware the snake oil salesmen preaching the benefits of blockchain but silent on the feasibility of their solution. Unless the service provider is just as focused on your problem as you are, be wary that they are just trying to pitch a solution (viable or not) and not solve the problem. Before approaching the companies or service providers, always identify your problem and see if Blockchain is indeed a viable solutions.
How does governance of the blockchain influence its sustainability?
In the past, we’ve seen technology-led social impact solutions make initial gains that diminished over time until there is no sustained impact. Current evidence shows that many solutions of this sort fail because they are not designed to solve a specific problem in a relevant ecosystem. This insight has given rise to the Digital Development Principles and the Ethical Considerations that should be taken into account for blockchain solutions.
Bottom Line: Impact is achieved and sustained by the people who use a tool. Thus, blockchain, as a tool, does not sustain impacts on its own. People do so by applying knowledge about the principles and ethics needed for impact. Understanding this, our next step is to generate more customized principles and ethical considerations for blockchain solutions through case studies and other desperately needed research.
How do the blockchain, big data, and Artificial Intelligence influence each other?
The blockchain is a new type of distributed ledger system that could have massive social implications. Big Data refers to the exponential increase in data we experience through the Internet of Things (IoT) and other data sources (Smart Infrastructure, etc.). Artificial Intelligence (AI) assists in identifying and analyzing this new data at exponentially faster rates than is currently the case.
Blockchain is a distributed ledger, in essence, a database of transactions, just like any other database, it’s a repository, and it is contributing to the growth of Big Data. AI can be used to automate the process of data entry into the blockchain. This is how the three are connected.
The blockchain is considered a leading contender as the ledger of choice for big data because: 1) due to its distributed nature it can handle much larger amounts of data in a more secure fashion than is currently possible with cloud computing, and 2) it is possible to automate the way big data is uploaded to the blockchain. AI tools are easily integrated into blockchain functions to run searches and analyze data, and this opens up the capacity to collect, analyze and report findings on big data in a transparent and secure manner more efficiently than ever before.
Bit by Bit is a very readable and innovative overview of how to conduct social science research in the digital age of big data, artificial intelligence and the blockchain. It gives the reader a quality introduction into some of the dominant themes and issues to consider when attempting to evaluate either a technology lead solution or use technology to conduct social research.
Given its immutability, how can an adaptive management system work with the blockchain?
This is a critical point. The blockchain is an immutable record, it is almost impossible (meaning it has never been done and there are no simulations where current technology is able to take control of a properly designed blockchain) to hijack, hack, or alter. Thus the blockchain provides the security needed to mitigate corruption and facilitate audits.
This immutability does not mitigate any type of adaptive management approach, however. Adaptive Management requires small iterative course corrections informed by quality data around what is and is not working. This data record and the course corrections provide a rich data set that is extremely valuable to replication efforts because they subvert the main barrier to replication — lack of data on what does and does not work. Hence in this case the immutability of the blockchain is a value add to Adaptive Management. This is more of a question of good adaptive management practices rather than whether the blockchain is a viable tool for these purposes.
It is important to note that you can append information on blocks (not amend), so there will always be a record of previous mistakes (auditability), but the most recent layer of truth is what’s being viewed/queried/verified, etc. Hence, immutability is not a hurdle but a help.
What are the first steps an organization should take when deciding on whether to adopt a blockchain solution?
Each problem that an organization faces is unique, but the following simple steps can help one make a decision:
Identify your problem (using tools such as Developmental Evaluation or Principles of Digital Development)
Understand the blockchain technology, concepts, functionality, requirements and cost
See if your problem can be solved by blockchain rather than a centralized database
Consider the advantages and disadvantages
Identify the right provider and work with them in developing the blockchain
Consider ethical principles and privacy concerns as well as other social inequalities
Deploy in pilot phases and evaluate the results using an agile approach
What can be done to protect PII and other sensitive information on a blockchain?
Blockchain uses cryptography to store its data. That PII and other information cannot be viewed by anyone other than those who have access to the ‘keys’. While developing a blockchain, it’s important to ensure that what goes in is protected and that access to is regulated. Another critical step is promoting literacy on the use of blockchain and its features among stakeholders.
References Correlated to Take Aways
This table organizes current reference materials as related to the main questions we discussed in the workshop. (The question is in the left hand column and the reference material with a brief explanation and hyperlink is in the right hand column).
Resources and Considerations
Who are the main blockchain platforms? Who are the providers and what are they offering?
IBM, ConsenSys, Microsoft, AWS, Cognizant, R3, and others, are offering products and enterprise solutions.
Block by Block is a valuable comparison tool for assessing various platforms.
How does governance of the blockchain influence its sustainability?
See Beeck Center’s Blockchain Ethical Design Framework. Decentralization (how many nodes), equity amongst nodes, rules, transparency are all factors in long-term sustainability. Likewise the Principles for Digital Development have a lot of evidence behind them for their contributions to sustainability.
How do the blockchain, big data and Artificial Intelligence influence each other?
They can be combined in various ways to strengthen a particular service or product. There is no blanket approach, just as there is not blanket solution to any social impact problem. The key is to know the root cause of the problem at hand and how the function of each tool used separately and in conjunction can address these root causes.
Given its immutability, how can an adaptive management system work with the blockchain?
Ask how mistakes are corrected when creating a customized solution, or purchasing a product. Usually, there will be a way to do that, through an easy to use, user interface.
What are the first steps an organization should take when they are deciding on whether to adopt a blockchain solution?
Participate in demos, and test some of the solutions for your own purposes or use cases. Use the USAID Blockchain Primer and reach out to trusted experts to provide advice. Given that the blockchain is primarily open source code, once you have decided that a blockchain is a viable solution for your problem, GitHub is full of open source code that you can modify for your own purposes.
I had the opportunity to attend MERL Tech (September 7-8, 2017 Washington, DC). I was struck by the number of very thoughtful and content driven sessions. Coming from an IT/technology perspective, it was so refreshing to hear about the intersection of technology and humanitarian programs and how technology can provide the tools and data to positively impact decision making.
One of the sessions, “Big data, big problems, big solutions: Incorporating responsible data principles in institutional data management” was particularly poignant. The session was presented by Paul Perrin from University of Notre Dame, Alvaro Cobo & Jeff Lundberg from Catholic Relief Services and Gillian Kerr from LogicalOutcomes. The overall theme of the presentation was that in the field of evaluation and ICT4D, we must be thoughtful, diligent and take responsibility for protecting people’s personal and patient data; the potential risk for having a data breach is very high.
Paul started the session by highlighting the fact that data breaches which expose our personal data, credit card information and health information have become a common occurrence. He brought the conversation back to monitoring and evaluation and research and the gray area between the two, leading to confusion about data privacy. Paul’s argument is that evaluation data is used for research later in a project without proper approval of those receiving services. The risk for misuse and incorrect data handling increases significantly.
Alvaro and Jeff talked about a CRS data warehousing project and how they have made data security and data privacy a key focus. The team looked at the data lifecycle – repository design, data collection, storage, utilization, sharing and retention/destruction – and they are applying best data security practices throughout. And finally, Gillian described the very concerning situation that at NGOs, M&E practitioners may not be aware of data security and privacy best practices or don’t have the funds to correctly meet minimum security standards and leave this critical data aspect behind as “too complicated to deal with.”
The presentation team advocates for the following:
Deeper commitment to informed consent
Reasoned use of identifiers
Need to know vs. nice to know
Data security and privacy protocols
Data use agreements and protocols for outside parties
Revisit NGO primary and secondary data IRB requirements
This message resonated with me in a surprising way. Project Balance specializes in developing data collection applications, data warehousing and data visualization. When we embark on a project we are careful to make sure that sensitive data is handled securely and that client/patient data is de-identified appropriately. We make sure that client data can only be viewed by those that should have access; that tables or fields within tables that hold identifying information are encrypted. Encryption is used for internet data transmission and depending on the application the entire database may be encrypted. And in some cases the data capture form that holds a client’s personal and identifying information may require that the user of the system re-log in.
After hearing the presentation I realized Project Balance could do better. As part of our regular software requirements management process, we will now create a separate and specialized data security and privacy plan document, which will enhance our current process. By making this a defined requirements gathering step, the importance of data security and privacy will be highlighted and will help our customers address any gaps that are identified before the system is built.
Many thanks to the session presenters for bringing this topic to the fore and for inspiring me to improve our engagement process!