Revolution #13: “The Cloud”

We’ve had a lot of buzz around the office this week about the cloud, generally fostered by our discovery of Paperspace (a cool startup that makes spinning up your own machines in the cloud much easier).  We’ve dabbled with AWS, Google Compute, and a few other offerings before, but we’re constantly asked (at conferences and internally) the same question: why not use the “Cloud” for GeoQuery’s computational needs?

What we do use

For some background, it’s helpful to know what we do use for GeoQuery.  Currently, all processing routines run on a computer cluster located about 10 steps from my office.  It’s no small thing – the server room takes up a huge chunk of real estate (around three large classrooms) and hosts thousands of dedicated cores.  You can roughly gauge how much work is happening by the decibel level when you stand outside.

This cluster is called “SciClone”, and it has been put together over roughly two decades through NSF and other state and federal grants.  We have a full-time staff here that takes care of the cluster and helps us troubleshoot the wide range of issues that emerge on GeoQuery.

SciClone is, of course, not free – we operate in a queueing environment in which we share resources with other researchers here on campus.  Physicists love to burn cycles on our CPUs, so sometimes we have to wait in line.  We hold priority on some dedicated resources (about 120 cores) to mitigate those waits, but operating in a shared research environment still comes with plenty of drawbacks.
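For readers curious what “waiting in line” looks like in practice, here is a minimal sketch (in Python) of handing one GeoQuery processing job to a PBS/Torque-style batch scheduler, with a choice between a shared campus queue and a dedicated priority allocation.  The queue names, resource request, and script path below are hypothetical illustrations, not SciClone’s actual configuration.

    # Sketch: dispatching one processing job to a PBS/Torque-style scheduler.
    # "shared" and "geoquery_priority" are made-up queue names, and the
    # resource request and script path are placeholders.
    import subprocess

    def submit_job(script_path, use_priority=False):
        """Submit a batch script, waiting in the shared campus queue unless
        we explicitly target our ~120 dedicated priority cores."""
        queue = "geoquery_priority" if use_priority else "shared"
        cmd = [
            "qsub",                                    # Torque/PBS submit command
            "-q", queue,                               # which queue to wait in
            "-l", "nodes=1:ppn=12,walltime=12:00:00",  # modest resource request
            "-N", "geoquery_extract",                  # job name
            script_path,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()                   # qsub prints the new job ID

    if __name__ == "__main__":
        print(submit_job("extract_ndvi.sh", use_priority=True))

The same idea applies under Slurm or any other scheduler; the point is simply that our jobs share a line with everyone else’s unless we spend our priority allocation.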

Why we don’t use “The Cloud”

By all definitions, SciClone is “the cloud”, so this is a bit of an odd statement, but the question is really “why not use a scalable commercial solution like AWS, Azure, Google Compute, etc.?”  So, in bullet-point fashion, here we go:

(1) It’s cheaper to use SciClone.  Because we share resources with physics researchers (and many others), we get to piggy-back on their cores when they aren’t using them.  This means we don’t have to pay for the hardware (excepting the cores we purchase for priority access) – see the toy cost sketch after this list.

(2) SciClone stays online even if we stop getting grants.  If – god forbid – we ran out of money, it wouldn’t mean the end of GeoQuery, because we could still piggy-back on physics and other research groups; processing would just take longer.  If we went the “cloud” route and could no longer pay for storage and processing, we would risk going offline entirely.

(3) We like to experiment.  Sometimes we get a crazy idea that could use thousands of CPU hours, which would cost far more to execute in the cloud than it does locally.  Sneaking in on idle cores, using our own priority allocation, and other options give us a lot of room to mess something up – and get it right next time – without worrying about a huge bill.
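To make point (1) a little more concrete, here is a toy back-of-envelope comparison for a “crazy idea” sized run.  Every number in it – the size of the experiment, the cloud per-core-hour rate, the marginal cost of idle local cores – is a placeholder assumption for illustration, not a quote from any provider or from our budget.

    # Toy cost comparison: a large experimental batch on idle/priority
    # SciClone cores vs. on-demand cloud instances. All figures hypothetical.
    CPU_HOURS = 10_000                  # a "crazy idea" sized experiment
    CLOUD_RATE_PER_CORE_HOUR = 0.05     # assumed on-demand price, USD
    LOCAL_MARGINAL_RATE = 0.0           # idle shared cores cost us nothing extra

    cloud_cost = CPU_HOURS * CLOUD_RATE_PER_CORE_HOUR
    local_cost = CPU_HOURS * LOCAL_MARGINAL_RATE

    print(f"Cloud estimate: ${cloud_cost:,.2f}")
    print(f"Local estimate: ${local_cost:,.2f} (hardware already paid for)")

The real accounting is messier (staff time, the priority cores we did buy, power), but the asymmetry is the point: a failed experiment on borrowed cycles costs us patience, while the same failure on metered instances shows up as a bill.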

We’re still eyeing the cloud for parts of our infrastructure – for example, moving only processing redundancies to the cloud to expand our capacity – but we haven’t quite nailed down how that would work yet.  More soon!

About the author: Daniel Runfola

Dan's research focuses on the use of quantitative modeling techniques to explain the conditions under which aid interventions succeed or fail using spatial data. He specializes in computational geography, machine learning, quasi-observational experimental analyses, human-int data collection methods, and high-performance computing approaches. His research has been supported by the World Bank, USAID, the Global Environment Facility, and a number of private foundations and donors. He has published 34 books, articles, and technical reports in support of his research, and is the Project Lead of the GeoQuery project. At William & Mary, Dr. Runfola serves as the director of the Data Science Program and advises students in the Applied Science Computational Geography Ph.D. program.