Tag Archives: remote sensing

Oops! Satellite Imagery Cannot Predict Human Development Indicators

Guest post from Wayan Vota

In many developing country environments, it is difficult or impossible to obtain recent, reliable estimates of human development. Nationally representative household surveys, which are the standard instrument for determining development policy and priorities, are typically too expensive to collect with any regularity.

Recently, however, researchers have shown the potential for remote sensing technologies to provide a possible solution to this data constraint. In particular, recent work indicates that satellite imagery can be processed with deep neural networks to accurately estimate the sub-regional distribution of wealth in sub-Saharan Africa.

Testing Neural Networks to Process Satellite Imagery

In the paper, Can Human Development be Measured with Satellite Imagery?, Andrew Head, Mélanie Manguin, Nhat Tran, and Joshua Blumenstock explore the extent to which the same approach – of using convolutional neural networks to process satellite imagery – can be used to measure a broader set of human development indicators, in a broader range of geographic contexts.

Their analysis produces three main results:

  • They successfully replicate prior work showing that satellite images can accurately infer a wealth-based index of poverty in sub-Saharan Africa.
  • They show that this approach can generalize to predicting poverty in other countries and continents, but that the performance is sensitive to the hyperparameters used to tune the learning algorithm.
  • They find that this approach does not trivially generalize to predicting other measures of development such as educational attainment, access to drinking water, and a variety of health-related indicators.

This paper shows that while satellite imagery and machine learning may provide a powerful paradigm for estimating the wealth of small regions in sub-Saharan Africa, the same approach does not trivially generalize to other geographical contexts or to other measures of human development.

In this assessment, it is important to emphasize what they mean by “trivially,” because in truth the point they are making is somewhat circumspect. Specifically, what they have shown is that the exact framework—of retraining a deep neural network on night-lights data, and then using those features to predict the wealth of small regions in sub-Saharan Africa—cannot be directly applied to predicting arbitrary indicators in any country with uniformly good results.
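The second stage of the framework described above, predicting a survey indicator from image-derived features, can be sketched in a few lines. This is a minimal illustration, not the researchers' implementation: it assumes a single hypothetical feature per survey cluster and a ridge-style penalty, neither of which is specified in this post, and the function names are invented for the example.

```python
def fit_ridge_1d(features, targets, penalty=1.0):
    """Fit targets ~ intercept + slope * feature, penalizing only the slope.

    Assumption: one image-derived feature per survey cluster. A real
    pipeline would use many features and a matrix-based solver.
    """
    n = len(features)
    if n == 0 or n != len(targets):
        raise ValueError("features and targets must be non-empty and equal length")
    mean_x = sum(features) / n
    mean_y = sum(targets) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, targets))
    sxx = sum((x - mean_x) ** 2 for x in features)
    slope = sxy / (sxx + penalty)  # the penalty shrinks the slope toward zero
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, features):
    """Apply a fitted (intercept, slope) pair to new feature values."""
    intercept, slope = model
    return [intercept + slope * x for x in features]


# Made-up numbers, purely illustrative (not data from the paper).
model = fit_ridge_1d([0, 1, 2, 3], [1, 3, 5, 7], penalty=5.0)
print(model)  # (2.5, 1.0)
```

Fitting such a model on clusters from one country and scoring its predictions in another is one simple way to probe the generalizability concerns the paper raises.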

This is an important point to make because absent empirical evidence to the contrary, it is likely that policymakers eager to gain quick access to micro-regional measurements of development might be tempted to do exactly what they have done in this paper, without paying careful attention to the thorny issues of generalizability that they have uncovered in this analysis.

It is not the researchers’ intent to impugn the potential for related approaches to provide important new methods for measuring development, but rather to say that such efforts should proceed with caution, and with careful validation.

Why Satellite Imagery Might Fail to Predict Development

The results showed that while some indicators, such as wealth and education, can be predicted reasonably well in many countries, others are much more brittle, exhibiting high variance between and within countries, and still others are predicted poorly everywhere.

Thus it is useful to distinguish between two possible reasons why the current approach may have failed to generalize to these measures of development.

  • It may be that this exercise is fundamentally not possible, and that no amount of additional work would yield qualitatively different results.
  • It is quite possible that their investigation to date has not been sufficiently thorough, and that more concerted efforts could significantly improve the performance of these models.

Insufficient “signal” in the satellite imagery.

The researchers’ overarching goal is to use information in satellite images to measure different aspects of human development. The premise of such an approach is that the original satellite imagery must contain useful information about the development indicator of interest. Absent such a signal, the model is destined to fail, no matter how sophisticated it is.

The fact that wealth specifically can be measured from satellite imagery is quite intuitive. For instance, there are visual features one might expect to correlate with wealth: large buildings, metal roofs, nicely paved roads, and so forth.

It may be the case that other measures of human development cannot be seen from above. For instance, it may be fundamentally difficult to infer the prevalence of malnutrition from satellite imagery if regions with high and low rates of malnutrition appear similar, even though the researchers hypothesize that such indicators should correlate with the wealth index.

They were, however, surprised by the relative under-performance of models designed to predict access to drinking water, as they expected the satellite-based features to capture proximity to bodies of water, which in turn might affect access to drinking water.

(Over-) reliance on night-lights may not generalize.

Their reliance on night lights might help explain why some indicators were predicted less successfully in some countries than in others. One example in their study is Nepal, where the accuracy in predicting access to electricity was much lower (R² = 0.24) than in the other countries (R² = 0.69, 0.44, and 0.54 in Rwanda, Nigeria, and Haiti, respectively).
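For readers unfamiliar with the metric, the R² (coefficient of determination) figures quoted above compare a model's squared prediction error with that of always guessing the mean. The sketch below shows the standard formula. The numbers are made up for illustration and are not from the paper, which may also compute R² under a cross-validation scheme not described in this post.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Squared error of a baseline that always predicts the mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical electrification rates per cluster (illustrative only).
observed = [0.10, 0.40, 0.55, 0.80, 0.95]
predicted = [0.20, 0.35, 0.60, 0.70, 0.90]
print(round(r_squared(observed, predicted), 3))  # roughly 0.938
```

An R² of 1 means perfect prediction, 0 means no better than predicting the mean, and on held-out data the value can even be negative.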

This may be partly due to the fact that Nepal has a very low population density (half as dense as Haiti and Rwanda) and very high levels of electrification (twice as high as Haiti, Rwanda, and Nigeria).

If the links between electrification, night-lights, and daytime imagery are broken in Nepal, they would expect their modeling approach to fail. More generally, they expect that when a development indicator does not clearly relate to the presence of nighttime lights, it may be unreasonable to expect good performance from the transfer learning process as a whole.

Deep learning vs. supervised feature engineering.

In this paper, the researchers focused explicitly on using the deep/transfer learning approach to extracting information from satellite images. While powerful, it is also possible that other approaches to feature engineering might be more successful than the brute force approach of the convolutional neural network.

For instance, Gros and Tiecke have recently shown how hand-labeled features from satellites, and specifically information about the types of buildings that are present in each image, can be quite effective in predicting population density. Labeling images in this manner is resource intensive, and they did not have the opportunity to test such approaches.

However, they believe that careful encoding of the relevant information from satellite imagery would likely bolster the performance of specific prediction tasks.

Neural Networks Can Still Process Satellite Imagery

Broadly, the researchers remain optimistic that future work using novel sources of data and new computational algorithms can engender significant advances in the measurement of human development.

However, it is imperative that such work proceeds carefully, with appropriate benchmarking and external calibration. Promising new tools for measurement have the potential to be implemented widely, possibly by individuals who do not have extensive expertise in the underlying algorithms.

Applied blindly, these algorithms have the potential to skew subsequent policy in unpredictable and undesirable ways. They view the results of this study as a cautionary example of how a promising algorithm should not be expected to work “off the shelf” in a context that is significantly different from the one in which it was originally developed.

The post Oops! Satellite Imagery Cannot Predict Human Development Indicators appeared first on ICTworks.

New Report: Global Innovations in Measurement and Evaluation

On June 26th, New Philanthropy Capital (NPC) released its “Global Innovations in Measurement and Evaluation” report. In it, NPC outlines and elaborates on eight concepts that represent innovations in conducting effective measurement and evaluation of social impact programs. The list of concepts was distilled from conversations with leading evaluation experts about what is exciting in the field and what is most likely to make a long-lasting impact on the practice of evaluation. Below, we feature each of these eight concepts accompanied by brief descriptions of their meanings and implications.


User-Centric Evaluation

The key to making an evaluation user-centric is to ensure that the service users are truly involved in every stage of the evaluation process. In this way, the power dynamic ceases to be unidirectional as more agency is given to the user. As a result, not only can findings become more compelling to decision makers because of more robust data collection, but those responsible for the program also become accountable to the users in addition to the funders, a shift that is both ethically important and valuable for the trust it builds.

Shared Measurement & Evaluation

Shared measurement and evaluation requires multiple organizations with similar missions, programs or users to work together to measure their own and their combined impact. This involves using the same evaluation metrics and, at a more advanced stage, developing shared measurement tools and methodologies. Pooling data and comparing outcomes creates a bigger dataset that can support stronger conclusions and provide more insights.

Theory-Based Evaluation

The central idea behind theory-based evaluation is to not only measure the outcome of a program but to also get at the reason why it does or does not work. Typically, this approach begins with a theory of change that proposes an explanation for how activities lead to impact, and this theory is then tested and accepted, refuted or qualified. It is important to apply this concept because without an understanding of why programs work, there is a risk that mistakes will be repeated or that attempts to replicate a program will fail when attempted under different conditions.

Impact Management

Impact management is the integration of impact assessment into strategy and performance management by regularly collecting data and responding to it with course corrections designed to improve the outcomes of a program. This method contrasts with assessment strategies that only examine a program at the end of its life cycle. The objective here is to be flexible and adaptive in order to produce a more effective intervention rather than waiting to evaluate it until there is nothing that can be done to change it.

Data Linkage

Data linkage is the act of bringing together different but relevant data about a specified group of users from beyond a single organization or sub-sector dataset. For example, a homelessness charity that helps its users access social housing could link its data with the local council's to see whether those users ultimately remained in their homes. In essence, this method allows organizations to leverage the increasing quantities of data to create comparison groups and track the long-term impacts of their programs.
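At its simplest, linking two datasets amounts to joining records on a shared identifier. The sketch below assumes both organizations hold a common `person_id` field, which is a hypothetical simplification: real linkage usually has to cope with missing or inconsistent identifiers and with data-protection constraints. All field names and values are invented for illustration.

```python
def link_records(charity_records, council_records, key="person_id"):
    """Join two lists of dict records on a shared identifier (inner join)."""
    council_by_id = {row[key]: row for row in council_records}
    linked = []
    for row in charity_records:
        match = council_by_id.get(row[key])
        if match is not None:
            # Merge the two views of the same person into one record.
            linked.append({**row, **match})
    return linked


# Made-up example data.
charity = [
    {"person_id": 1, "housed_on": "2017-01-10"},
    {"person_id": 2, "housed_on": "2017-02-03"},
    {"person_id": 3, "housed_on": "2017-03-21"},
]
council = [
    {"person_id": 1, "still_housed": True},
    {"person_id": 3, "still_housed": False},
]

linked = link_records(charity, council)
retention = sum(r["still_housed"] for r in linked) / len(linked)
print(len(linked), retention)  # 2 0.5
```

Only users found in both datasets appear in the result, so any outcome rate computed from it describes the linked subset rather than everyone the charity served.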

Big Data

Big data is typically understood as the data generated as a by-product of digital transactions and interactions. It is a category that includes people’s social media activity, web searches and digital financial transaction trails. New technology has expanded the human ability to analyze large datasets, and consequently big data has become a powerful tool for helping identify trends and patterns, even if it does not provide explanations for them.

Remote Sensing

Remote sensing uses technology, such as mobile phones, to gather information from afar. This method is useful because it allows one to collect data that may not typically be accessible. Additionally, remote sensing data can be highly detailed, accurate, and available in real time. Finally, one of its great strengths is that it is generated passively, which reduces the possibility of introducing researcher bias through human input.

Data Visualization

Data visualization is the practice of presenting data in a graphic form. New technology has made it possible to create a broad range of useful visualizations. The result is that data is now more accessible to non-specialists, and the insights produced through analysis can now be better understood and communicated.

For more details and more examples of real-world applications of these concepts, check out the full “Global Innovations in Measurement and Evaluation” report here.