by Linda Raftree, Independent Consultant and MERL Tech Organizer
It can be overwhelming to get your head around all the different kinds of data and the various approaches to collecting or finding data for development and humanitarian monitoring, evaluation, research and learning (MERL).
Though there are many ways of categorizing data, lately I find myself conceptually organizing data streams into four general buckets when thinking about MERL in the aid and development space:
- ‘Traditional’ data. How we’ve been doing things for(pretty much)ever. Researchers, evaluators and/or enumerators are in relative control of the process. They design a specific questionnaire or a data gathering process and go out and collect qualitative or quantitative data; they send out a survey and request feedback; they do focus group discussions or interviews; or they collect data on paper and eventually digitize the data for analysis and decision-making. Increasingly, we’re using digital tools for all of these processes, but they are still quite traditional approaches (and there is nothing wrong with traditional!).
- ‘Found’ data. The Internet, digital data and open data have made it lots easier to find, share, and re-use datasets collected by others, whether this is internally in our own organizations, with partners or just in general.These tend to be datasets collected in traditional ways, such as government or agency data sets. In cases where the datasets are digitized and have proper descriptions, clear provenance, consent has been obtained for use/re-use, and care has been taken to de-identify them, they can eliminate the need to collect the same data over again. Data hubs are springing up that aim to collect and organize these data sets to make them easier to find and use.
- ‘Seamless’ data. Development and humanitarian agencies are increasingly using digital applications and platforms in their work — whether bespoke or commercially available ones. Data generated by users of these platforms can provide insights that help answer specific questions about their behaviors, and the data is not limited to quantitative data. This data is normally used to improve applications and platform experiences, interfaces, content, etc. but it can also provide clues into a host of other online and offline behaviors, including knowledge, attitudes, and practices. One cautionary note is that because this data is collected seamlessly, users of these tools and platforms may not realize that they are generating data or understand the degree to which their behaviors are being tracked and used for MERL purposes (even if they’ve checked “I agree” to the terms and conditions). This has big implications for privacy that organizations should think about, especially as new regulations are being developed such a the EU’s General Data Protection Regulations (GDPR). The commercial sector is great at this type of data analysis, but the development set are only just starting to get more sophisticated at it.
- ‘Big’ data. In addition to data generated ‘seamlessly’ by platforms and applications, there are also ‘big data’ and data that exists on the Internet that can be ‘harvested’ if one only knows how. The term ‘Big data’ describes the application of analytical techniques to search, aggregate, and cross-reference large data sets in order to develop intelligence and insights. (See this post for a good overview of big data and some of the associated challenges and concerns). Data harvesting is a term used for the process of finding and turning ‘unstructured’ content (message boards, a webpage, a PDF file, Tweets, videos, comments), into ‘semi-structured’ data so that it can then be analyzed. (Estimates are that 90 percent of the data on the Internet exists as unstructured content). Currently, big data seems to be more apt for predictive modeling than for looking backward at how well a program performed or what impact it had. Development and humanitarian organizations (self included) are only just starting to better understand concepts around big data how it might be used for MERL. (This is a useful primer).
Thinking about these four buckets of data can help MERL practitioners to identify data sources and how they might complement one another in a MERL plan. Categorizing them as such can also help to map out how the different kinds of data will be responsibly collected/found/harvested, stored, shared, used, and maintained/ retained/ destroyed. Each type of data also has certain implications in terms of privacy, consent and use/re-use and how it is stored and protected. Planning for the use of different data sources and types can also help organizations choose the data management systems needed and identify the resources, capacities and skill sets required (or needing to be acquired) for modern MERL.
Organizations and evaluators are increasingly comfortable using mobile and/or tablets to do traditional data gathering, but they often are not using ‘found’ datasets. This may be because these datasets are not very ‘find-able,’ because organizations are not creating them, re-using data is not a common practice for them, the data are of questionable quality/integrity, there are no descriptors, or a variety of other reasons.
The use of ‘seamless’ data is something that development and humanitarian agencies might want to get better at. Even though large swaths of the populations that we work with are not yet online, this is changing. And if we are using digital tools and applications in our work, we shouldn’t let that data go to waste if it can help us improve our services or better understand the impact and value of the programs we are implementing. (At the very least, we had better understand what seamless data the tools, applications and platforms we’re using are collecting so that we can manage data privacy and security of our users and ensure they are not being violated by third parties!)
Big data is also new to the development sector, and there may be good reason it is not yet widely used. Many of the populations we are working with are not producing much data — though this is also changing as digital financial services and mobile phone use has become almost universal and the use of smart phones is on the rise. Normally organizations require new knowledge, skills, partnerships and tools to access and use existing big data sets or to do any data harvesting. Some say that big data along with ‘seamless’ data will one day replace our current form of MERL. As artificial intelligence and machine learning advance, who knows… (and it’s not only MERL practitioners who will be out of a job –but that’s a conversation for another time!)
Not every organization needs to be using all four of these kinds of data, but we should at least be aware that they are out there and consider whether they are of use to our MERL efforts, depending on what our programs look like, who we are working with, and what kind of MERL we are tasked with.
I’m curious how other people conceptualize their buckets of data, and where I’ve missed something or defined these buckets erroneously…. Thoughts?