How to understand the spread of Zika

When Bardess was recently approached by a few of our partners to take a deeper look at some data related to the Zika virus, we jumped at the opportunity. This was for a number of reasons. Firstly, at the time it was (and continues to be) a growing topic of interest, as a few travel-related cases had recently popped up in the US, and the Olympics were on their way. This promised much more global travel to an area strongly affected by the virus. Secondly, while some data was out there on current (and past) cases of the virus, little had been done outside of academic circles to correlate the various datasets that bear on the spread and impact of Zika.

We began with two premises. The first is well known: that certain types of mosquitoes (Aedes albopictus and Aedes aegypti) are required for the virus to take hold and spread natively. The second: that human travel from affected areas to areas with a prevalent population of suitable mosquitoes would lead to spread of the disease beyond currently affected areas.
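As a rough illustration of how the two premises combine, the sketch below flags a region only when suitable mosquitoes are present and travelers are arriving from affected areas, then ranks the flagged regions by traveler volume. This is not the model behind our application; the `Region` fields, the `at_risk` function, the threshold, and every number are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    has_vector: bool            # premise 1: suitable Aedes mosquitoes present (hypothetical flag)
    inbound_from_affected: int  # premise 2: travelers arriving from affected areas (placeholder count)


def at_risk(regions, min_travelers=1):
    """Keep regions that satisfy both premises, highest traveler volume first."""
    candidates = [
        r for r in regions
        if r.has_vector and r.inbound_from_affected >= min_travelers
    ]
    return sorted(candidates, key=lambda r: r.inbound_from_affected, reverse=True)


# Placeholder regions and counts, for illustration only.
regions = [
    Region("Region A", True, 1200),
    Region("Region B", False, 5000),  # heavy travel, but no suitable mosquitoes
    Region("Region C", True, 300),
    Region("Region D", True, 0),      # mosquitoes present, but no inbound travel
]

ranked = [r.name for r in at_risk(regions)]
print(ranked)
```

Requiring both conditions is the point of the sketch: heavy travel into an area without suitable mosquitoes, or mosquitoes without inbound travel, does not flag a region under these premises.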

Over time, the CDC and others have pulled much of this data together (http://www.cdc.gov/zika/about/index.html), but in a traditional fashion, with one key limitation. The data, while logically grouped, is not interactive, and it requires some effort and mental manipulation by the reader to understand the relationships between key data points. At Bardess we often see this same problem with traditional approaches to inherently data-driven problems. End users frequently have to wade from spreadsheet to spreadsheet and report to report, manually editing and combining data to gain the insights they need to make decisions.

Bardess thought that by taking the same approach we take with our customers’ data, we could gain rapid insight into the spread of the virus, and begin to see whether patterns would emerge that could help us make sense of the available data. Cloudera was the natural choice for storing the data, and there was a fair amount of it. Even more significant than the volume of the data was the fact that it came in many different forms: news articles, tables from web pages, databases of flight routes, and weather feeds. All of these came into play. Cloudera made it simple to first collect all this data, then to clean it up, structure it, and make it accessible.

Then we moved on to the problem of interaction. Enabling direct interaction with data is something of a holy grail across industries, and tools abound, from the lowly yet reliable Excel to the highly tailored and insightful Palantir. Some of the best and most widely used business tools center on surfacing insights from data, and on enabling business users to bring disparate data points together and then act on the knowledge gained. Qlik Sense turned out to be perfect for this. In the space of a few days we had a working model not only for how we wanted to look at the data in the context of the virus itself, but also for how that data could apply to real-world business use cases, such as ensuring that adequate medical facilities, and supplies for those facilities, existed in areas of likely spread.

As we began to show our simple demo around, it started to attract attention. “Could we use this to help ensure that Zika kits are distributed to the areas that need them most?” Of course. How about understanding the historical outbreak in Brazil and other areas in South America? Why Brazil and Colombia? What about Puerto Rico? Was Florida next? (It turns out that it was.) These were all questions asked of the data, and the data spoke. In some cases, the answer was to feed in more data. The Qlik/Cloudera combination enabled these extensions of data with ease, and we continue to add data into the knowledge base to enable insights and answers for even more complex questions.

Most recently we started integrating insights from IBM’s Watson into the application, to see what data was being reported on before it hit the CDC counts, and to track public awareness and concern. What topics are being discussed alongside Zika? What types of topics? These are all questions you can ask yourself. Just go check it out and see what a quick exploration into some publicly available data is like. We have bookmarked a number of interesting conditions in the data. Once you log in, just click the bookmark icon in the nav bar at the top right of the page.

http://demos.bardesscloud.com/sense/app/1d499651-1b17-4471-94c6-46580c1040a1/sheet/BGmSwnp

Mike Prorock
Director, Emerging Technologies, Bardess Group