Data Science Leading the Way for Business Intelligence
Fast-paced and highly competitive industries evolve in a flash, making it imperative to bring top-notch leaders to the table at all positions. Daniel Parton, lead data scientist here at Bardess, is one of those leaders making a difference in the business success of our customers.
Data scientists are the leaders of a new movement to take the vast oceans of data collected by modern businesses and turn them into actionable business intelligence. If you’ve ever felt like you were drowning in data, it’s a data scientist who can throw you a life preserver.
Daniel Parton scries the mysterious depths to reveal hidden insights that become actionable intelligence. Data science creatively reveals information that was once buried in databases, enabling companies to make better, more informed decisions.
Inspired by Parton’s work and in true scientific fashion, we decided to document and share his ingenuity, passion and advice. If you are as fascinated by this work as we are, read this Q&A with Parton, on his past, present and future.
How did you get into data science?
Daniel Parton: Like many data scientists who moved from academia to industry, I was probably doing data science before I’d ever heard the term. The story dates back to when I started my PhD in computational biochemistry at the University of Oxford, where I was using supercomputers to simulate proteins involved in cancer, and to find drug molecules for effective treatment. Although data science wasn’t a commonly used term back then, it turned out that the skills I was learning in statistical analysis and programming would be central to the explosion of data science in industry.
After my PhD, I moved to the US and continued my research at the University of Chicago and Sloan Kettering in New York. I then switched track and decided to go into industry, joining the data science team at Omnicom – one of the world’s leading marketing holding companies. This was obviously quite a different field, but it was really exciting to see the huge impact our models could make for our clients, using many of the same techniques I had been using in academia. Now I’m at Bardess and I’m hugely enjoying working with our clients to determine how data science can provide a real business advantage, while also continuing to be heavily involved in the technical side.
How have you seen data science evolve through the years and where do you think it’s headed next?
As a practicing data scientist, I have found the most important changes to be the development of software libraries like scikit-learn and pandas. These provide such huge productivity gains in machine learning and data wrangling, respectively, that it’s almost unimaginable doing data science without them. And the enabling factor for these developments was not machine learning algorithms so much as good software design.
Looking back further in time, it’s interesting that many of the machine learning algorithms we use today have been around for decades. For example, the core concepts of deep learning date back to the 1960s. However, they weren’t really useful on a wide scale until computers became powerful enough and software libraries for efficient numerical computation became available.
It’s interesting to think about what could be holding back data science today, and what that means for the next evolution of data science. I’m sure that data science will become much more streamlined than its current state, which is still commonly perceived as expensive and time-consuming, with unpredictable results, but huge business value.
My guess is that the revolution will again be driven by software, but this time focused on automation, interfaces, and ops. I can imagine many aspects of machine learning becoming as streamlined as spinning up an EC2 instance, and there are already early efforts starting to emerge along these lines. Of course, there will always be use cases requiring custom data science implementations, so data scientists shouldn’t be fearing for their jobs just yet.
What are the biggest challenges you see for companies that need to integrate data science into their organization?
A data science team brings a set of skills, personalities, and tools into an organization that may not have been present before. This is central to the value that data science adds to a company, but it also presents new challenges. For example, it is important to ensure that information is communicated smoothly between the data science team and the wider organization. Having people who can bridge the data science and business worlds is essential.
Another big challenge is integrating a data science team into existing business workflows, such as data infrastructure, IT provisioning, and security. The image of data scientists as hackers, writing custom algorithms on laptops in Python or Jupyter Notebooks, is a romantic one, but it also hints at a certain level of disorganization. I think this is one reason why enterprise software platforms for analytics are going to become the norm, as they can address this disorganization while still providing the flexibility necessary to let the hacker types do their magic.
What are the most valuable improvements you feel that data science has brought to our customers?
The great thing about what we do is that we transform data into insights and action, every day.
Some of my favorite experiences at Bardess have been presenting the first iteration of a data science project to a client. For some clients, data science is an entirely new venture, so discussions often revolve around how transformative the results are for how the organization functions and views itself. This is something I find really exciting to be involved in. For example, we implemented a customer lifetime value model for a client which provided some surprising suggestions. Some customers appeared much higher or lower in the rankings than expected, but interpretation of the model outputs showed detailed and rational reasoning for these predictions, which wouldn’t have easily occurred to the client’s employees previously. For another client, we used clustering algorithms to segment their transaction data, and the resulting product segments have become central to how the company understands its interactions with customers.
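To make the segmentation idea concrete, here is a minimal sketch of k-means clustering in plain Python. It is not the model built for the client: the product features, the data values, and the `kmeans` function are all illustrative assumptions, and in practice a library such as scikit-learn would normally do this work.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Cluster 2-D points into k groups with basic (Lloyd's) k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-product features: (average unit price, average units per order).
products = [(2.0, 40.0), (2.5, 38.0), (3.0, 42.0), (95.0, 1.0), (100.0, 2.0), (105.0, 1.5)]
labels, centroids = kmeans(products, k=2)
```

With this toy data the cheap, high-volume products end up in one segment and the expensive, low-volume products in the other. Real transaction data would need feature scaling and a considered choice of `k`.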
Other clients may already be heavily driven by data analytics, and for those companies it is great to see how our data science solutions can automate and save on resources (thus creating time for employees to tackle more important tasks), drive efficiency, and ultimately increase revenue (proactive vs reactive). For example, a major tech client was spending hundreds of person-hours each year developing a schedule for their huge annual sales conference (around 20,000 attendees). We built a custom constraint optimization algorithm to schedule sessions, avoiding both overfilling and underfilling of rooms, ensuring an equal distribution of session categories across timeslots, and avoiding putting speakers in back-to-back sessions. Now we are starting to use a similar algorithm to optimize resource management for a huge analytics team at a pharmaceutical company. These are the types of projects that instantly add value for a client, and we love to hear the happy responses of teams that no longer waste their time on repetitive, menial work when they realize they can let the computers do the heavy lifting.
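The scheduling constraints described above can be sketched as a small backtracking search. This is only an illustration under stated assumptions, not the algorithm built for the client: it finds any feasible schedule rather than optimizing one, it treats "equal distribution of categories" as a simple per-timeslot cap, and every room, session, speaker, and threshold below is made up.

```python
from itertools import product

# Hypothetical inputs, invented for illustration.
ROOMS = {"Hall A": 500, "Room B": 100}  # room -> seat capacity
TIMESLOTS = [0, 1, 2]                   # consecutive slot indices
SESSIONS = {                            # session -> (speaker, category, expected attendance)
    "Keynote": ("Ana", "strategy", 450),
    "Pricing 101": ("Ben", "sales", 80),
    "Roadmap": ("Ana", "product", 400),
    "Demo Lab": ("Cy", "product", 90),
}
MIN_FILL = 0.5        # underfilling: expect at least half the seats to be taken
MAX_PER_CATEGORY = 1  # category balance: at most one session per category per slot


def is_allowed(session, slot, room, schedule):
    """Check one candidate placement against everything already scheduled."""
    speaker, category, attendance = SESSIONS[session]
    capacity = ROOMS[room]
    if attendance > capacity or attendance < MIN_FILL * capacity:
        return False  # room would be overfilled or underfilled
    same_category = 0
    for other, (other_slot, other_room) in schedule.items():
        other_speaker, other_category, _ = SESSIONS[other]
        if other_slot == slot and other_room == room:
            return False  # room already in use in this slot
        if other_speaker == speaker and abs(other_slot - slot) <= 1:
            return False  # speaker would be double-booked or back-to-back
        if other_slot == slot and other_category == category:
            same_category += 1
    return same_category < MAX_PER_CATEGORY


def solve(pending, schedule=None):
    """Place sessions one at a time, backtracking when a choice dead-ends."""
    schedule = {} if schedule is None else schedule
    if not pending:
        return schedule
    session, rest = pending[0], pending[1:]
    for slot, room in product(TIMESLOTS, ROOMS):
        if is_allowed(session, slot, room, schedule):
            result = solve(rest, {**schedule, session: (slot, room)})
            if result is not None:
                return result
    return None  # no feasible placement for this session


schedule = solve(list(SESSIONS))
```

Here `schedule` maps each session to a `(timeslot, room)` pair, or is `None` if the constraints cannot all be met. A conference with thousands of sessions would call for a dedicated solver and soft constraints with penalties rather than exhaustive search.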
Sometimes even a failed data science project can be beneficial. I remember having initial discussions about a marketing spend optimization model for a major national insurance firm, which would allocate spend across a range of marketing channels based on ROI. We spent a couple days building a simple proof-of-concept model, which very quickly revealed some key deficiencies in the client’s data reporting infrastructure. These deficiencies turned out to be affecting other functions within the company and they became a top-level priority for the company to fix. Months later, after rectifying the problem, they came back to us and we returned to build out the model for them.
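For readers curious what a simple proof-of-concept spend allocator might look like, here is a minimal sketch. It is not the model discussed above: the channel names, the logarithmic response curves, and all numbers are assumptions chosen purely for illustration. The idea is to hand out the budget in small steps, each time to the channel with the highest marginal return.

```python
import math

# Hypothetical response curves: revenue(spend) = scale * log(1 + spend / saturation).
# Channel names and parameters are invented for illustration.
CHANNELS = {
    "search": (120.0, 50.0),
    "tv": (200.0, 400.0),
    "email": (30.0, 10.0),
}


def revenue(channel, spend):
    """Assumed diminishing-returns revenue for a given spend on one channel."""
    scale, saturation = CHANNELS[channel]
    return scale * math.log1p(spend / saturation)


def allocate(budget, step=10.0):
    """Greedily give each budget increment to the channel with the best marginal return."""
    spend = {name: 0.0 for name in CHANNELS}
    remaining = budget
    while remaining >= step:
        best = max(
            CHANNELS,
            key=lambda c: revenue(c, spend[c] + step) - revenue(c, spend[c]),
        )
        spend[best] += step
        remaining -= step
    return spend


plan = allocate(budget=1000.0)
```

Because the assumed curves flatten as spend grows, the greedy rule spreads the budget across channels instead of pouring everything into the one with the best initial ROI. Fitting such curves requires reliable spend and revenue reporting per channel, which is exactly where a project like this can expose data problems.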
How do you maintain your status as a leader in the industry?
Our best source of knowledge about the industry is undoubtedly our interaction with clients and partners. A nice thing about Bardess is that we have clients in just about every vertical, which is incredibly helpful in developing a 360-degree view of industry expectations and the state of the art. Of course, there are many other things we do to keep abreast of the latest developments. I attend many data science talks and meetups in the NYC area, and talking with other attendees is often just as interesting as the main event. There are also some great blogs written by leaders in the field such as Andrew Gelman; it’s a fascinating time for data science right now and there are many great discussions happening across different fora.
Based on your experience, what should every business be taking advantage of in the practice of data science?
First, take advantage of software tools which help integrate data science as a core function within your business. Technical solutions for collaboration, governance, security, and integration with data infrastructure are vital.
Second, take cues from businesses which have integrated data science successfully. These companies have solved key organizational and workflow challenges. For example, the responsible parties for a data science project need to be able to translate between technical data science jargon and business understanding. And data science projects should have clearly defined business goals – not just “do data science on X dataset”. Otherwise data scientists can become siloed outside of both your technical infrastructure and your company culture.
What are some of your favorite tools to use and why?
DataRobot is a really exciting tool – a highly automated platform for machine learning which helps to democratize data science and create “citizen data scientists.” It’s one of the first such examples I have seen on the market. It was particularly exciting to me, since I’d tried to write a similar piece of software myself before. My attempt was just a Python library with no visual interface, and probably broken in many ways, although in my defense, I was trying to do this on my own in my spare time… So, when I first saw DataRobot, I was just happy that somebody else had already done this.
Cloudera Data Science Workbench tackles a different problem in data science, that of operations. It has tons of useful features that help integrate a data science team as a successful business function, including collaborative workspaces, notebooks, containerized compute environments, data connectors, and job schedulers. Cloudera also has some other really nice tools, e.g. Hue is a really nice SQL editor for working with Hive and Impala.
I still spend a lot of time doing data science by coding machine learning projects from scratch, and some of my favorite tools are IPython, ipdb (a debugger for IPython), PyCharm (for any software project larger than a script) and trusty old vi…
What resources do you find most useful today?
There’s plenty of advice already out there on what algorithms and programming languages are useful to know as a data scientist, but some of the resources I find myself recommending are Andrew Ng’s Coursera course on Machine Learning (great primer on theory) and Wes McKinney’s book Python for Data Analysis (great resource for learning to use Python for data prep). As for more advanced topics, I really enjoyed Bayesian Methods for Hackers, which helps demystify the dark art of Bayesian inference – a powerful statistical modeling technique to have in the arsenal.
Thank you for your time, Daniel. This has been an illuminating look at a mystifying topic. Your way of expressing concepts made the elements of data science come to life.
Bardess is a technology consulting and solutions company focused exclusively on leading-edge data analytics. With scientists like Daniel Parton and our full team of solution experts and data architects, you can trust Bardess to bring you the valuable insights you need from business intelligence.
Sara Gizinski is part of the Hero Enablement team at Bardess Ltd. She is passionate about and committed to enabling data heroes with our clients.