Revolution #9: Turkeys and Data Resolution

One question we debate in the office – quite a lot – is what resolution of data to include in GeoQuery. On one hand, we have global boundary files – theoretically, a user might request ADM4 in Uganda, and have units of observation on the scale of kilometers. On the other hand, the vast majority of our users are selecting coarser scales of analysis (ADM2 or so), which would argue for including coarser underlying data as well.

This gets confusing, and words like “Ecological Fallacy” get thrown around quite a lot.  Because we like to think of ourselves as responsible curators, we try to ensure that (a) the majority of use cases for a dataset will be reasonable, even if the user has little spatial knowledge, and (b) we always provide the source resolution to users in metadata.

In honor of Thanksgiving, here’s my example of a dataset we definitely would NOT include in GeoQuery – even though it is fundamentally useful for some types of decision makers:

The resolution of the data – U.S. states – would mean that if a user asked for county-level data, they would get averaged state-level data without obvious red flags.  This is “no good” in our book – we want to let users use data with confidence.
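
To make the problem concrete, here is a minimal Python sketch of what happens when a county-level request is served from state-level data. This is not how GeoQuery works internally; the unit names, the values, and the `extract` function are all hypothetical and made up for illustration.

```python
# Hypothetical state-level values (made-up numbers, not real data).
state_values = {"State A": 10.0, "State B": 40.0}

# Hypothetical lookup from each county to its parent state.
county_to_state = {
    "County A1": "State A",
    "County A2": "State A",
    "County B1": "State B",
}

# Levels ordered from coarsest to finest; the names are illustrative only.
LEVELS = ["state", "county"]


def extract(source_level, requested_level):
    """Return per-unit values, plus a flag that is True when the
    requested units are finer than the source data's resolution."""
    finer_than_source = LEVELS.index(requested_level) > LEVELS.index(source_level)
    if requested_level == "state":
        values = dict(state_values)
    else:
        # Each county simply inherits the value of its parent state.
        values = {c: state_values[s] for c, s in county_to_state.items()}
    return values, finer_than_source


values, mismatch = extract(source_level="state", requested_level="county")
print(values)
print("Requested units finer than source data:", mismatch)
```

Both counties in State A come back with the same number, and nothing in the values themselves says they are really state-level figures. The only warning is the separate flag, which is why we always put the source resolution in the metadata.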

In some special cases (i.e., AidData) we provide additional metrics that help users cope with scalar mismatches, but we’re working to figure out a better way to do it. The less complicated, the better.

Short and sweet for this week; the blog will likely skip next week, so we’ll see you in December.

About the author: Daniel Runfola

Dan's research focuses on the use of quantitative modeling techniques to explain the conditions under which aid interventions succeed or fail using spatial data. He specializes in computational geography, machine learning, quasi-observational experimental analyses, human-int data collection methods, and high performance computing approaches. His research has been supported by the World Bank, USAID, the Global Environment Facility, and a number of private foundations and donors. He has published 34 books, articles, and technical reports in support of his research, and is the Project Lead of the GeoQuery project. At William and Mary, Dr. Runfola serves as the director of the Data Science Program, and advises students in the Applied Science Computational Geography Ph.D. program.