Building bridges between evaluators and big data analysts

By Michael Bamberger, Independent Evaluation Consultant. Michael has been involved in development evaluation for 50 years and recently wrote the report: “Integrating Big Data into the Monitoring and Evaluation of Development Programs” for UN Global Pulse.

In Part 1 of this series we argued that, while applications of big data and data analytics are expanding rapidly in many areas of development programs, evaluators have been slow to adopt them. We predicted one possible future scenario: evaluation may no longer be treated as a separate function, but rather as one of the outputs of the integrated information systems that many development agencies will gradually adopt. Furthermore, many evaluations will use data analytics approaches rather than conventional evaluation designs. (Image: Big Data session notes from USAIDLearning’s Katherine Haugh [@katherine_haugh]. MERL Tech DC 2016.)

Here in Part 2, we identify some of the reasons why development evaluators have been slow to adopt big data analytics and propose some promising approaches for building bridges between evaluators and data analysts.

Why have evaluators been slow to adopt big data analytics?

Caroline Heider at the World Bank Independent Evaluation Group identifies four sets of data collection-related challenges affecting the adoption of new technologies by evaluators: ethics, governance, biases (potentially amplified through the use of ICT), and capacity.

We also see:

1. Weak institutional linkages. Over the past few years some development agencies have created data centers to explore ways to exploit new information technologies. These centers are mainly staffed by people with a background in data science or statistics and the institutional links to the agency’s evaluation office are often weak.

2. Many evaluators have limited familiarity with big data/analytics. Evaluation training programs tend to only present conventional experimental, quasi-experimental and mixed-methods/qualitative designs. They usually do not cover smart data analytics (see Part 1 of this blog). Similarly, many data scientists do not have a background in conventional evaluation methodology (though there are of course exceptions).

3. Methodological differences. Many big data approaches do not conform to the basic principles that underpin conventional program evaluation, for example:

  • Data quality: real-time big data is potentially one of the most powerful sources of data for development programs. Among other things, real-time data can provide early warning signals of disease outbreaks (e.g. Google Flu Trends), ethnic tension, drought and poverty (Meier 2015). However, when an evaluator asks whether the data is biased or of poor quality, the data analyst may respond: “Sure the data is biased (e.g. only captured from mobile phone users or twitter feeds) and it may be of poor quality. All data is biased and usually of poor quality, but it does not matter because tomorrow we will have new data.” This reflects the very different kinds of data that evaluators and data analysts typically work with. The difference can be explained, but a response like this can create the impression that data analysts do not take issues of bias and data quality very seriously.
  • Data mining: Many data analytics methods are based on mining large data sets to identify patterns of correlation, which are then built into predictive models, normally using Bayesian statistics. Many evaluators frown on data mining because it can identify spurious associations.
  • The role of theory: Most (but not all) evaluators believe that an evaluation design should be based on a theoretical framework (theory of change or program theory) that hypothesizes the processes through which the intended outcomes will be achieved. In contrast, there is plenty of debate among data analysts concerning the role of theory and whether it is necessary at all. Some even go as far as to claim that data analytics means “the end of theory” (Anderson 2008). This, combined with data mining, creates the impression among some evaluators that data analytics uses whatever data is easily accessible, with no theoretical framework to guide the selection of evaluation questions or to assess the adequacy of the available data.
  • Experimental designs versus predictive analytics: Most quantitative evaluations are based on an experimental or quasi-experimental design, typically with a pretest/posttest comparison group. Given the high cost of data collection, statistical power calculations are frequently used to estimate the minimum sample size required to detect an effect at a given level of statistical significance. Usually this means that analysis can only be conducted on the total sample, as the sample size does not permit statistical significance testing for sub-samples. In contrast, predictive analytics usually employs Bayesian probability models. Due to the low cost of data collection and analysis, it is usually possible to conduct the analysis on the total population (rather than a sample), so that disaggregated analysis can be conducted to compare sub-populations and often (particularly when also using machine learning) to compute outcome probabilities for individual subjects. There continue to be heated debates concerning the merits of each approach, with much less discussion of how experimental and predictive analytics approaches could complement each other (a rough illustrative sketch of this contrast appears below).
As Pete York at CommunityScience.com observes: “Herein lies the opportunity – we evaluators can’t battle the wave of big data and data science that will transform the way we do research. However, we can force it to have to succumb to the rules of objective rigor via the scientific method. Evaluators/researchers train people how to do it, they can train machines. We are already doing so.”  (Personal communication 8/7/17)
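
To make this contrast more concrete, here is a minimal sketch, assuming Python with numpy, scipy and statsmodels available and using entirely hypothetical numbers. The first step is the kind of power calculation used to size an experimental design; the second applies a simple Bayesian model to low-cost, near-complete population data and reports disaggregated estimates for sub-groups.

```python
import numpy as np
from statsmodels.stats.power import NormalIndPower
from scipy import stats

# 1. Experimental design: a power calculation estimates the minimum sample
#    size per group needed to detect a given effect at a given significance level.
n_per_group = NormalIndPower().solve_power(
    effect_size=0.3,  # hypothetical standardized effect size
    alpha=0.05,       # significance level
    power=0.8,        # desired statistical power
    ratio=1.0,        # equal-sized treatment and comparison groups
)
print(f"Minimum sample per group (experimental design): {int(np.ceil(n_per_group))}")

# 2. Predictive/Bayesian analytics: with low-cost data on (nearly) the whole
#    population, a simple Beta-Binomial model gives an outcome estimate for
#    each sub-group, so results can be disaggregated rather than pooled.
subgroup_counts = {"sub-group A": (430, 1000), "sub-group B": (380, 950)}  # hypothetical counts
for name, (positives, total) in subgroup_counts.items():
    posterior = stats.beta(1 + positives, 1 + total - positives)  # uniform Beta(1,1) prior
    lo, hi = posterior.interval(0.95)
    print(f"{name}: posterior mean {posterior.mean():.2f}, "
          f"95% credible interval ({lo:.2f}, {hi:.2f})")
```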

4. Ethical and political concerns: Many evaluators also have concerns about who designs and markets big data apps and who benefits financially. Many commercial agencies collect data on low-income populations (for example, their consumption patterns) which may then be sold to consumer products companies, with little or no benefit going to the populations from which the information is collected. Some of the algorithms may also include biases against poor and vulnerable groups (O’Neil 2016) that are difficult to detect given the proprietary nature of the algorithms.

Another set of issues concerns whether the ways in which big data is collected and used to make decisions affecting poor and vulnerable groups tend to be exclusive (governments and donors use big data to make decisions about programs affecting the poor without consulting them), or whether big data is used to promote inclusion (giving voice to vulnerable groups). These issues are discussed in a recent Rockefeller Foundation blog. There are also many issues around privacy and data security. There is of course no simple answer to these questions, but many of these concerns are lurking in the background when evaluators consider the possibility of incorporating big data into their evaluations.

Table 1. Reasons evaluators have been slow to adopt big data and opportunities for bridge building between evaluators and data analysts

Reason for slow adoption → Opportunities for bridge building

1. Weak institutional linkages
  • Strengthening formal and informal links between data centers and evaluators
2. Evaluators have limited knowledge about big data and data analytics
  • Capacity development programs covering both big data and conventional evaluation
  • Collaborative pilot evaluation projects
3. Methodological differences
  • Creating opportunities for dialogue to explore differences and to determine how they can be reconciled
  • Viewing data analytics and evaluation as being complementary rather than competing
4. Ethical and political concerns about big data
  • Greater focus on ethical codes of conduct, privacy and data security
  • Focusing on making approaches to big data and evaluation inclusive and avoiding exclusive/extractive approaches

Building bridges between evaluators and big data/analytics 

There are a number of possible steps that could be taken to build bridges between evaluators and big data analysts, and thus to promote the integration of big data into development evaluation. Catherine Cheney (2016) presents interviews with a number of data scientists and development practitioners, stressing that data-driven development needs both social and computer scientists. No single approach is likely to be successful, and the best approach(es) will depend on the specific context, but we could consider:

  • Strengthening the formal and informal linkages between data centers and evaluation offices. It may be possible to achieve this within the existing organizational structure, but it will often require some formal organizational changes in terms of lines of communication. Linda Raftree provides a useful framework for understanding how different “buckets” of data (including, among others, traditional data and big data) can be brought together, which suggests one pathway to collaboration between data centers and evaluation offices.
  • Identifying opportunities for collaborative pilot projects. A useful starting point may be to identify opportunities for collaboration on pilot projects in order to test/demonstrate the value-added of cooperation between the data analysts and evaluators. The pilots should be carefully selected to ensure that both groups are involved equally in the design of the initiative. Time should be budgeted to promote team-building so that each team can understand the other’s approach.
  • Promoting dialogue to explore ways to reconcile differences of approach and methodology between big data and evaluation. While many of these differences may at first appear fundamental, some result at least in part from questions of terminology, and in other cases different approaches can be applied at different stages of the evaluation process. For example:
    • Many evaluators are suspicious of real-time data from sources such as Twitter or the analysis of phone records, due to selection bias and data quality issues. However, evaluators are familiar with exploratory data (collected, for example, during project visits or from staff feedback) that is then checked more systematically in a follow-up study. When framed this way, the two teams can discuss in a non-confrontational way how many kinds of real-time data could be built into evaluation designs.
    • When using Bayesian probability analysis it is necessary to begin with a prior distribution, and the probabilities are then updated as more data become available. The results of a conventional experimental design can often be used as an input to the definition of the prior distribution. Consequently, it may be possible to treat experimental designs and Bayesian probability analysis as sequential stages of an evaluation rather than as competing approaches (see the sketch after this list).
  • Integrated capacity development programs for data analysts and evaluators. These activities would both help develop a broader common methodological framework and serve as an opportunity for team building.
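
As an illustration of the sequential-stages idea above (see the final sub-bullet), here is a small sketch, assuming Python with scipy and entirely hypothetical figures, in which the results of a completed experimental evaluation define the prior and incoming real-time monitoring data update it.

```python
from scipy import stats

# Stage 1: a completed experimental evaluation found 55 successes in 120
# treatment-group cases (hypothetical). Encode this as a Beta prior on the
# success rate for the Bayesian stage that follows.
a, b = 55, 120 - 55
print(f"Prior mean success rate (from the experiment): {stats.beta(a, b).mean():.2f}")

# Stage 2: cheap real-time monitoring data (e.g. mobile phone reporting) arrive
# in batches; Bayesian updating simply adds the new counts to the Beta parameters.
for new_successes, new_total in [(40, 90), (65, 130)]:   # hypothetical batches
    a += new_successes
    b += new_total - new_successes
    posterior = stats.beta(a, b)
    lo, hi = posterior.interval(0.95)
    print(f"Updated mean: {posterior.mean():.2f}, "
          f"95% credible interval ({lo:.2f}, {hi:.2f})")
```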

Conclusion

A number of factors together explain the slow take-up of big data and data analytics by development evaluators. Several promising approaches are proposed for building bridges to overcome these barriers and to promote the integration of big data into development evaluation.

See Part 1 for a list of useful references!

2 thoughts on “Building bridges between evaluators and big data analysts”

  1. My take on “Why have evaluators been slow to adopt big data analytics?”

    1. “Big data? I am having enough trouble finding any useful data! How to analyse big data is ‘a problem we would like to have’” This is what I suspect many evaluators are thinking.

    2. “Data mining is BAD” – because data mining is seen by evaluators as something that is ad hoc and non-transparent, whereas the best data mining practices are systematic and transparent.

    3. “Correlation does not mean causation” – many evaluators have not updated this formulation to the more useful “Association is a necessary but insufficient basis for a strong causal claim”

    4. Evaluators focus on explanatory models and do not give much attention to the uses of predictive models, but both are useful in the real world, and so is the combination of the two.

    5. Lack of appreciation of the limits of manual hypothesis formulation and testing (useful as it can be) as a means of accumulating knowledge. In a project with four outputs and four outcomes there can be 16 different individual causal links between outputs and outcomes, but 2 to the power of 16 possible combinations of these causal links. That’s a lot of theories to choose from (65,536). In this context, search algorithms can be very useful.

    6. Lack of knowledge and confidence in the use of machine learning software. There is still work to be done to make this software more user-friendly. RapidMiner, BigML and EvalC3 are heading in the right direction.

    7. Most evaluators probably don’t know that you can use the above software on small data sets. They don’t only work with large data sets. Yesterday I was using EvalC3 with a data set describing 25 cases only.

    8. The difficulty of understanding some machine learning findings. Decision tree models are eminently readable, but few can explain the internal logic of specific prediction models generated by artificial neural networks. Lack of explainability presents a major problem for public accountability.

  2. Dear Rick,

    Thank you for your very interesting comments. These are all important issues so I have responded in some detail. Hopefully other readers might join in the discussion.

    Point 1. While I am sure that many evaluators’ main concern is the lack of data, and that this group would love to have access to big data, it has been my experience that many evaluators are not very familiar with big data, and others have concerns about who generates and owns it. The 2017 chapter by Hojlund et al., “The current use of big data in evaluation” (in the publication by Petersonn and Breuel that I cited), estimated that only about 50 per cent of evaluators were familiar with the basic principles of big data and only about 10 per cent claimed to have used big data in one of their evaluations.

    There are also situations where evaluators could potentially have access to more data but are not sure how they could analyze it (for technical, time or budget reasons). So there are obviously different scenarios for understanding evaluators’ attitudes to, and knowledge about, big data. The latter group has several concerns. First, the fact that many, perhaps most, apps are developed for profit makes some evaluators worry that the use of these apps in development programs will lead to some form of exploitation of poor and vulnerable groups for profit. Another concern is whether big data will be used by funding agencies and governments to disempower the poor. The ability to collect data remotely means it becomes possible to obtain data on and about the poor without the need to consult them, and in many cases without them even knowing the data is being used to make important decisions about their future. Finally, there is a concern that many of the algorithms used by the apps may have potential biases against the poor. Cathy O’Neil’s “Weapons of math destruction: How big data increases inequality and threatens democracy” documents this concern.

    Points 2 and 3. Data mining, correlation and causation. Many evaluators have been taught that data mining and the generation of spurious correlations are potentially bad. I agree with you that many evaluators assume data mining is ad hoc, and this assumption is related to the perceived lack of a theoretical framework to guide the formulation of the analysis plan. The 2008 paper by Anderson, “The end of theory: the data deluge makes the scientific method obsolete,” and papers with similar titles contributed to this perception. There are also a number of publications on how predictive analytics is used for online marketing research that emphasize the use of data mining to identify factors correlated with consumer purchasing behavior (or in some cases increased click rates); these also suggest that it is not necessary to understand why an association exists, as long as it helps the client increase sales. Siegel (2013) “Predictive analytics: The power to predict who will click, buy, lie or die” is an example of this approach. You are of course correct that data mining can be a rigorous approach based on a well-articulated analytical framework. But it is often not perceived this way by many evaluators, most of whom are not very familiar with predictive analytics.
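
    To illustrate what systematic and transparent data mining can look like in practice, here is a small hypothetical sketch (assuming Python with numpy): correlations are mined in one half of a purely random data set and then checked against a held-out half, a standard safeguard against reporting spurious associations.

```python
import numpy as np

rng = np.random.default_rng(42)
n_cases, n_features = 200, 50
X = rng.normal(size=(n_cases, n_features))  # 50 candidate "predictors" (pure noise)
y = rng.normal(size=n_cases)                # outcome unrelated to any predictor

train, test = slice(0, 100), slice(100, 200)

# Ad hoc mining: pick whichever feature correlates most strongly with the
# outcome in the training half of the data.
train_corrs = [np.corrcoef(X[train, j], y[train])[0, 1] for j in range(n_features)]
best = int(np.argmax(np.abs(train_corrs)))
print(f"Strongest training correlation (feature {best}): {train_corrs[best]:.2f}")

# Systematic check: does the association replicate in the held-out half?
# With random data it almost never does, which is the point of the safeguard.
test_corr = np.corrcoef(X[test, best], y[test])[0, 1]
print(f"Same feature in held-out data: {test_corr:.2f}")
```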

    Point 4. I fully agree that there are great benefits to be achieved from combining explanatory and predictive models. The challenge is the need for bridge-building between evaluators with their explanatory models and data analysts with their predictive models. At present much of the discussion still assumes that the two approaches are competing or incompatible. What is needed are opportunities for the two groups to work together to explore possibilities for integrating the two approaches in the same study.

    Point 5. This is a very interesting point about the limitations of manual hypothesis testing. One approach is to start with theory-based approaches (manual hypotheses) to explore how far you can get and whether you can produce useful findings. This can then guide the use of quantitative search algorithms. One advantage of this two-step approach is that the exploration of these manual hypotheses, often complemented by in-depth qualitative research, can also help identify what kinds of information are required to test some of these hypotheses and to what extent this information can be generated from the available big data sets. Work in fields such as gender equality and women’s empowerment, equity or social exclusion often finds that critical information is not available in conventional data sets. In these cases special gender-responsive data may need to be generated from special studies. This kind of in-depth approach could usefully complement the quantitative approaches by identifying whether there are important kinds of information that are not immediately available from the big data sources that are being used for a particular study.

    This example reflects the interest of many researchers in adopting a mixed methods approach that combines big data with conventional evaluation methods.
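
    As a footnote to this point, the combinatorial arithmetic in your point 5 is easy to verify with a few lines of illustrative Python:

```python
from itertools import product

# Four outputs and four outcomes give 16 candidate output-to-outcome links;
# any subset of those links is a possible causal model.
outputs = [f"output_{i}" for i in range(1, 5)]
outcomes = [f"outcome_{j}" for j in range(1, 5)]
candidate_links = list(product(outputs, outcomes))

n_links = len(candidate_links)  # 16 individual links
n_models = 2 ** n_links         # 65,536 possible combinations of links
print(f"{n_links} candidate links -> {n_models:,} candidate models")
# Exhaustively scoring 65,536 candidate models against a data set is trivial
# for a search algorithm but impractical to do by hand.
```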

    Point 6. I don’t think that machine learning (ML) has yet been taken up by many evaluators, so it would be interesting if you have examples illustrating how ML can be applied in development evaluation. I agree that ML has tremendous potential, but what are the best entry points? This also goes back to your first point, as ML usually requires the kinds of large data sets to which many evaluators do not have access. Several of my colleagues working in countries such as India suggest that central government agencies, and also some line ministries, have huge survey data sets, most of which have not yet been fully exploited by evaluators. ML is one of the tools that could be very useful for working with these potentially very rich data sets. One of the big challenges is that there has been very little interest so far in finding ways to integrate different sectoral data sets, which could greatly enhance their value for evaluation purposes. This is an example of the “silo” effect, where many research professionals and development agencies only wish to work in their particular area.

    Point 7. The fact that ML and other data analytic tools can be used on relatively small data sets is important, as evaluators frequently work with relatively small data sets. However, as you point out, many people assume that ML can only be used with large data sets, so it would be very helpful to provide examples showing its applicability to smaller data sets.
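
    To illustrate the point, here is a small hypothetical sketch (not EvalC3 itself, but an equivalent exercise using scikit-learn in Python): a decision tree fitted to a synthetic data set of only 25 cases, with the resulting rules printed in human-readable form, which also speaks to your point 8 about explainability.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n_cases = 25
# Three hypothetical binary project attributes for each of the 25 cases.
X = rng.integers(0, 2, size=(n_cases, 3))
feature_names = ["local_partner", "prior_funding", "staff_trained"]
# Hypothetical outcome: success when a local partner is present and staff were trained.
y = ((X[:, 0] == 1) & (X[:, 2] == 1)).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# The fitted rules are printed as readable if/else conditions.
print(export_text(tree, feature_names=feature_names))
```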

    Point 8. You are right that the difficulty of understanding the internal logic of many big data analytic tools is still a barrier.

    Thank you again for your very stimulating comments.

    Regards

    Michael
