New Tools for Economic and Policy Research

The realm of possibility in economic and policy research is rapidly expanding. The creation of data at unprecedented scales, (relatively) easy access to powerful computing, and increasing interdisciplinary work have given rise to creative uses of tools that were previously little known in this field. Some of these tools are making it easier for us to gather new types of data, while others are helping us add depth to our analyses. Machine Learning (ML), Geographic Information Systems (GIS), and Remote Sensing are among the most popular of these tools.

Machine Learning is the study and construction of algorithms that can learn from data and make predictions on it. The simplest example of Machine Learning is a simple linear regression, one of the most commonly used econometric tools for inference. Using the estimates obtained from a simple regression to predict out-of-sample values of the dependent variable (often referred to as y-hat in econometric texts) is an instance of machine learning. The “Machine” (your statistical software) “Learns” from the first regression, then applies that learning to a new sample.
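To make the "learn, then predict" idea concrete, here is a minimal sketch of a simple regression fitted on one sample and used to compute y-hat for another. The variable names and the schooling and wage figures are made up for illustration; any statistical package would do the same job in one or two commands.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Apply the fitted line to observations that were not used to fit it."""
    return [intercept + slope * xi for xi in x_new]


# Hypothetical "first sample": years of schooling and log hourly wage.
train_schooling = [8, 10, 12, 14, 16]
train_log_wage = [1.9, 2.2, 2.3, 2.6, 2.7]

# Step 1: the machine "learns" the intercept and slope.
intercept, slope = fit_simple_ols(train_schooling, train_log_wage)

# Step 2: it applies that learning to a new sample (out-of-sample y-hat).
new_schooling = [9, 13, 18]
y_hat = predict(intercept, slope, new_schooling)

for years, wage in zip(new_schooling, y_hat):
    print(f"schooling={years:2d}  predicted log wage={wage:.2f}")
```

The only difference from a typical inference exercise is the question being asked: here we care about how good the predictions are in the new sample, not about the standard error of the slope.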

ML is not a new field, but it has gained fresh momentum with the recent application of complex calculations to big data with many observations and many variables. Media content recommendations on Netflix, facial recognition, and self-driving cars all employ ML algorithms of varying complexity.

A Geographic Information System (GIS) is a system designed to capture, store, manipulate, analyze and manage spatial or geographic data. Remote Sensing is the science of identifying, observing, collecting and measuring objects without coming into direct contact with them. This is a very general definition that could even apply to sight, smell and hearing, but in the context of GIS and the analysis of spatial data it refers to the technique of using and analyzing satellite imagery. It allows us to extract information on topography, temperature, slopes and altitudes, among other things. Again, this field is not new, but it has recently gained traction due to improvements in computation and the availability of a vast number of satellite images at relatively low cost.

Using Remote Sensing software, one can take a satellite image and create GIS-compatible files that differentiate water bodies (oceans, seas, lakes, rivers, etc.) from land by applying different spectral filters to the image. Similar filters can be applied to pull out roads, forested areas, and built-up areas from satellite images. PITB has recently done work along these lines to map crops in Pakistan.[1]
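As a rough illustration of what a spectral filter does, the sketch below separates water from land using the Normalized Difference Water Index (NDWI), which compares the green and near-infrared bands: water tends to reflect green light and absorb near-infrared, so water pixels score high. This is one commonly used index, not necessarily the filter any particular software or the work cited above relies on. The reflectance values and the zero threshold are assumptions chosen for the example.

```python
def ndwi(green, nir):
    """NDWI for one pixel: (green - nir) / (green + nir).

    Returns 0.0 when both bands are zero to avoid dividing by zero.
    """
    total = green + nir
    return (green - nir) / total if total else 0.0


def water_mask(green_band, nir_band, threshold=0.0):
    """Return a grid of booleans: True where the pixel looks like water."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Hypothetical 2x3 image: surface reflectance in the green and
# near-infrared bands (made-up numbers, not real satellite data).
green_band = [
    [0.10, 0.09, 0.04],
    [0.11, 0.05, 0.04],
]
nir_band = [
    [0.03, 0.02, 0.30],
    [0.02, 0.28, 0.35],
]

mask = water_mask(green_band, nir_band)

for row in mask:
    print(" ".join("W" if is_water else "." for is_water in row))
```

A real workflow would read the bands from a satellite image file and export the mask as a GIS layer; the threshold usually needs tuning for the sensor and the region.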

Both tools (three if you count GIS and Remote Sensing separately) have incredible potential for economic and policy research. One may argue that ML is not useful in this setting because it does not give causal estimates. While the latter is true, the former may not be, as some policy problems can be framed as prediction problems. Kleinberg et al. (2015)[2] suggest applying ML algorithms to patient characteristics to predict the probability of surviving the year following hip replacement surgery, to help decide whether to perform the surgery. Similarly, ML algorithms can be used in court settings to predict how likely defendants are to flee if released on bail, to help judges decide whether to grant bail before trial.[3]

Another important example is work done by a group of computer scientists and social scientists who combine Remote Sensing (satellite imagery) with machine learning to predict poverty.[4] The group uses survey and satellite data from five African countries to train a machine learning algorithm to identify image features that explain a large share of the variation in local economic conditions. This algorithm can then be used to predict local-level consumption expenditure and asset wealth (commonly used measures of poverty) in areas without survey data. This is an extremely powerful tool that can be used to target community development schemes in areas where survey data are scarce. One can think of various situations where similar applications may be useful for Pakistan as well.
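The logic of "train where you have surveys, predict where you don't" can be sketched in a few lines. The example below uses ridge regression (a linear regression with a penalty that shrinks coefficients) as a stand-in for the prediction step; it is not a reproduction of the cited study's pipeline. The two image features (night-time light intensity and share of metal roofs), the consumption figures, and the penalty value are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(features, target, penalty):
    """Return (intercept, coefficients); the intercept is not penalised."""
    n, k = len(features), len(features[0])
    col_means = [sum(row[j] for row in features) / n for j in range(k)]
    y_mean = sum(target) / n
    xc = [[row[j] - col_means[j] for j in range(k)] for row in features]
    yc = [y - y_mean for y in target]
    # Normal equations with the penalty added to the diagonal.
    xtx = [[sum(r[i] * r[j] for r in xc) + (penalty if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    intercept = y_mean - sum(c * mu for c, mu in zip(coefs, col_means))
    return intercept, coefs


def predict(intercept, coefs, features):
    """Predicted outcome for each row of image features."""
    return [intercept + sum(c * v for c, v in zip(coefs, row))
            for row in features]


# Hypothetical surveyed villages: [night-light intensity, metal-roof share]
survey_features = [[0.1, 0.2], [0.4, 0.1], [0.5, 0.6], [0.8, 0.5], [0.9, 0.9]]
survey_log_consumption = [1.4, 1.9, 2.7, 3.0, 3.7]

# Train on villages that have both imagery and survey data.
intercept, coefs = fit_ridge(survey_features, survey_log_consumption, 0.1)

# Predict for villages that have imagery but no survey.
unsurveyed_features = [[0.2, 0.3], [0.7, 0.8]]
predicted = predict(intercept, coefs, unsurveyed_features)
print([round(p, 2) for p in predicted])
```

In practice the features would be extracted from the images themselves and the penalty chosen by cross-validation, with predictions checked against held-out survey clusters before being used for targeting.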

What does all this mean for policy makers in Pakistan? Many possibilities and opportunities to use rich data for programming and problem solving, provided that the policy maker is creative and open to experimenting and testing. On a wider level, it also means that we need to think seriously about educating our public and private organizations about the importance of data and about systems to record and maintain it. Policy makers across the world are inching towards a data-driven approach to programming and evidence generation, and it’s time we started doing the same.

What does this mean for researchers and students of economics and policy? Acquiring new skills that can help them use these tools to enrich their work. Specifically, students aspiring to be development practitioners or empirical researchers should learn programming languages as part of their coursework to make them more efficient and effective when they enter their respective professional fields.


Sameem Siddique

Doctoral Candidate in Economics at the University of California, San Diego