January 15, 2019

Oops! Satellite Imagery Cannot Predict Human Development Indicators

Guest post from Wayan Vota

In many developing country environments, it is difficult or impossible to obtain recent, reliable estimates of human development. Nationally representative household surveys, which are the standard instrument for determining development policy and priorities, are typically too expensive to collect with any regularity.

Recently, however, researchers have shown the potential for remote sensing technologies to provide a possible solution to this data constraint. In particular, recent work indicates that satellite imagery can be processed with deep neural networks to accurately estimate the sub-regional distribution of wealth in sub-Saharan Africa.

Testing Neural Networks to Process Satellite Imagery

In the paper, Can Human Development be Measured with Satellite Imagery?, Andrew Head, Mélanie Manguin, Nhat Tran, and Joshua Blumenstock explore the extent to which the same approach – of using convolutional neural networks to process satellite imagery – can be used to measure a broader set of human development indicators, in a broader range of geographic contexts.

Their analysis produces three main results:

They successfully replicate prior work showing that satellite images can accurately infer a wealth-based index of poverty in sub-Saharan Africa.
They show that this approach can generalize to predicting poverty in other countries and continents, but that the performance is sensitive to the hyperparameters used to tune the learning algorithm.
They find that this approach does not trivially generalize to predicting other measures of development such as educational attainment, access to drinking water, and a variety of health-related indicators.

This paper shows that while satellite imagery and machine learning may provide a powerful paradigm for estimating the wealth of small regions in sub-Saharan Africa, the same approach does not trivially generalize to other geographical contexts or to other measures of human development.

In this assessment, it is important to emphasize what they mean by “trivially,” because the point they are making is deliberately narrow. Specifically, what they have shown is that the exact framework (retraining a deep neural network on night-lights data, and then using those features to predict the wealth of small regions in sub-Saharan Africa) cannot be directly applied to predicting arbitrary indicators in any country with uniformly good results.
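To make the framework described above concrete, here is a minimal sketch of its final stage only: fitting a regularized linear model on image-derived features and scoring it on held-out regions. Everything in it is an assumption for illustration. The features are random numbers standing in for the output of a network pretrained on night-lights, the choice of ridge regression is not stated in this post, and the two targets ("wealth" and "other") are synthetic. It is not the researchers' code.

```python
import random


def ridge_fit(X, y, alpha=1.0):
    """Solve (X^T X + alpha * I) w = X^T y by Gaussian elimination.

    No intercept is fitted, so features and targets are assumed to be
    roughly centered (true for the synthetic data below).
    """
    n, d = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def held_out_r2(features, target, n_train, alpha=1.0):
    """Fit on the first n_train regions, score on the remaining ones."""
    w = ridge_fit(features[:n_train], target[:n_train], alpha)
    preds = [sum(x_i * w_i for x_i, w_i in zip(x, w)) for x in features[n_train:]]
    return r_squared(target[n_train:], preds)


rng = random.Random(0)
n_regions, n_features = 200, 5

# Hypothetical stand-in for CNN features extracted from daytime imagery.
features = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_regions)]

# Synthetic "wealth": genuinely driven by the features, plus a little noise.
true_w = [1.0, -0.5, 0.8, 0.3, -1.2]
wealth = [sum(x_i * w_i for x_i, w_i in zip(x, true_w)) + rng.gauss(0, 0.3)
          for x in features]

# Synthetic "other" indicator: unrelated to the features (no signal at all).
other = [rng.gauss(0, 1) for _ in range(n_regions)]

r2_wealth = held_out_r2(features, wealth, n_train=150)
r2_other = held_out_r2(features, other, n_train=150)
print(f"held-out R2, wealth: {r2_wealth:.2f}")
print(f"held-out R2, other:  {r2_other:.2f}")
```

The same features and the same fitting procedure score well on the target they carry information about and poorly on the one they do not, which is the pattern the paper warns about when the framework is reused for a new indicator without validation.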

This is an important point to make because absent empirical evidence to the contrary, it is likely that policymakers eager to gain quick access to micro-regional measurements of development might be tempted to do exactly what they have done in this paper, without paying careful attention to the thorny issues of generalizability that they have uncovered in this analysis.

It is not the researchers’ intent to impugn the potential for related approaches to provide important new methods for measuring development, but rather to say that such efforts should proceed with caution, and with careful validation.

Why Satellite Imagery Might Fail to Predict Development

The results showed that while some indicators like wealth and education can be predicted reasonably well in many countries, other development indicators are much more brittle, exhibiting high variance between and within countries, and still others are predicted poorly everywhere.

Thus it is useful to distinguish between two possible reasons why the current approach may have failed to generalize to these measures of development.

It may be that this exercise is fundamentally not possible, and that no amount of additional work would yield qualitatively different results.
It is quite possible that their investigation to date has not been sufficiently thorough, and that more concerted efforts could significantly improve the performance of these models.

Insufficient “signal” in the satellite imagery.

The researchers’ overarching goal is to use information in satellite images to measure different aspects of human development. The premise of such an approach is that the original satellite imagery must contain useful information about the development indicator of interest. Absent such a signal, the model is destined to fail, no matter how sophisticated it is.

The fact that wealth specifically can be measured from satellite imagery is quite intuitive. For instance, there are visual features one might expect to correlate with wealth: large buildings, metal roofs, nicely paved roads, and so forth.

It may be the case that other measures of human development cannot be seen from above. For instance, it may be a fundamentally difficult task to infer the prevalence of malnutrition from satellite imagery if regions with high and low rates of malnutrition appear similar, even though the researchers hypothesize that such indicators should correlate with the wealth index.

They were, however, surprised by the relative under-performance of models designed to predict access to drinking water, as they expected the satellite-based features to capture proximity to bodies of water, which in turn might affect access to drinking water.

(Over-) reliance on night-lights may not generalize.

Their reliance on night lights might help explain why some indicators were predicted less successfully in some countries than others. One example from their study is Nepal, where the accuracy in predicting access to electricity was much lower (R² = 0.24) than in the other countries (R² = 0.69, 0.44, and 0.54 in Rwanda, Nigeria, and Haiti, respectively).

This may be partly due to the fact that Nepal has a very low population density (half as dense as Haiti and Rwanda) and very high levels of electrification (twice as high as Haiti, Rwanda, and Nigeria).

If the links between electrification, night-lights, and daytime imagery are broken in Nepal, they would expect their modeling approach to fail. More generally, they expect that when a development indicator does not clearly relate to the presence of nighttime lights, it may be unreasonable to expect good performance from the transfer learning process as a whole.
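For readers less familiar with the accuracy figures quoted in this section, the coefficient of determination is conventionally defined as follows. This is the standard textbook definition; the post does not say which variant the paper reports, so treating the quoted values this way is an assumption. Here y_i is the survey-measured value for region i, ŷ_i is the model's prediction, and ȳ is the mean of the measured values.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Under that reading, a value of 0.24 for Nepal would mean the model accounts for roughly a quarter of the variation in electricity access across regions, compared with roughly two thirds (0.69) in Rwanda. A model that predicts no better than the mean scores 0, and on held-out data the value can be negative.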

Deep learning vs. supervised feature engineering.

In this paper, the researchers focused explicitly on using the deep/transfer learning approach to extracting information from satellite images. While powerful, it is also possible that other approaches to feature engineering might be more successful than the brute force approach of the convolutional neural network.

For instance, Gros and Tiecke have recently shown how hand-labeled features from satellite images, and specifically information about the types of buildings that are present in each image, can be quite effective in predicting population density. Labeling images in this manner is resource intensive, and the researchers did not have the opportunity to test such approaches.

However, they believe that careful encoding of the relevant information from satellite imagery would likely bolster the performance of specific prediction tasks.

Neural Networks Can Still Process Satellite Imagery

Broadly, the researchers remain optimistic that future work using novel sources of data and new computational algorithms can engender significant advances in the measurement of human development.

However, it is imperative that such work proceed carefully, with appropriate benchmarking and external calibration. Promising new tools for measurement have the potential to be implemented widely, possibly by individuals who do not have extensive expertise in the underlying algorithms.

Applied blindly, these algorithms have the potential to skew subsequent policy in unpredictable and undesirable ways. They view the results of this study as a cautionary example of how a promising algorithm should not be expected to work “off the shelf” in a context that is significantly different from the one in which it was originally developed.

The post Oops! Satellite Imagery Cannot Predict Human Development Indicators appeared first on ICTworks.