One question we debate in the office – quite a lot – is what resolution of data to include in GeoQuery. On one hand, we have global boundary files – theoretically, a user might request ADM4 in Uganda, and have units of observation on the scale of kilometers. On the other hand, the vast majority of our users are selecting coarser scales of analysis (ADM2 or so), which would argue for including coarser underlying data as well.
This gets confusing, and words like “Ecological Fallacy” get thrown around quite a lot. Because we like to think of ourselves as responsible curators, we try to ensure that (a) the majority of use cases for a dataset will be reasonable, even if the user has little spatial knowledge, and (b) we always provide the source resolution to users in metadata.
In honor of Thanksgiving, here’s my example of a dataset we definitely would NOT include in GeoQuery – even though it is fundamentally useful for some types of decision makers:
The resolution of the data – U.S. states – would mean that if a user asked for county-level data, they would get averaged state-level data without obvious red flags. This is “no good” in our book – we want to let users use data with confidence.
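To make the problem concrete, here is a minimal sketch in Python. It is not GeoQuery's actual code: the function names, the counties, and the numbers are all made up for illustration. It shows what happens when state-level values are handed out at the county level: every county silently inherits its state's value, and nothing in the output signals that the variation between counties is gone.

```python
# Hypothetical state-level values (made-up numbers, for illustration only).
state_values = {"VA": 10.0, "MD": 20.0}

# Hypothetical counties, each tagged with the state that contains it.
counties = [
    ("Fairfax", "VA"),
    ("James City", "VA"),
    ("Howard", "MD"),
]


def extract_to_counties(counties, state_values):
    """Give each county the value of the state that contains it."""
    return {name: state_values[state] for name, state in counties}


def distinct_values_per_state(counties, county_values):
    """Count how many distinct county values appear within each state."""
    seen = {}
    for name, state in counties:
        seen.setdefault(state, set()).add(county_values[name])
    return {state: len(values) for state, values in seen.items()}


county_values = extract_to_counties(counties, state_values)
variation = distinct_values_per_state(counties, county_values)

print(county_values)
print(variation)
```

A count of one distinct value per state is the red flag we would want a user to see: the county table looks complete, but it carries no more information than the state table it came from.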
In some special cases (e.g., AidData) we provide additional metrics that help users cope with scalar mismatches, but we’re working to figure out a better way to do it. The less complicated, the better.
Short and sweet for this week; the blog will likely skip next week, so we’ll see you in December.