The saying “Work Smarter, Not Harder” does not just apply to physical fitness. Even in today’s rapidly progressing technology landscape, there are still gaps in database automation that can only be closed with manual effort. If you’ve ever gone through the process of selecting a database vendor that’s right for your company, you know the sales pitch focuses on how everything is automated. Automation is king, and yet, behind the scenes there is certain to be a human element. It may rear its ugly head during database audits, when strange anomalies in your data can only be explained by manual errors. True, any manual intervention in your data increases the chances of data corruption and can lower the quality of the data in your database. However, we do not give credit where credit is due. Human involvement in data quality is not only essential, it is critical. But there’s a time and a place for manual effort, and the line is often blurred on where that effort should be placed.
I recently attended a webinar on Big Data that discussed Dell’s success with using Statistica Analytics Platform to extract, structure, analyze and report data from Hadoop. While the tool itself sounds amazing for dealing with a small percentage of the data that resides in Hadoop (because the size of the data in Statistica is limited to the size your computer can handle), the decisions behind how the data will be organized and interpreted in Statistica must be made manually. This includes interpreting the pre-processed data to ensure it is “clean” enough and analyzing the health of the data based on certain checkpoints (such as redundancy, invariance, outliers, etc.). The tool does seem user-friendly and provides a large number of options for handling data, and it offers a text mining section to structure data that is typically quite difficult to organize, such as social media text. Yet the options are only as good as how well they are applied to the type and quality of data being used. Further, the tool will only use a percentage of the Big Data, so human involvement is essential to ensure that this subset is representative of the larger dataset and tells the most accurate story. Manual efforts are required to guard against misuse of data, especially when powerful tools are telling powerful stories that will make or break business decisions.
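To make the point about representativeness concrete, here is a minimal sketch of one check an analyst might run before trusting a subset. It is not part of Statistica or Hadoop; the function names, the `tolerance` threshold, and the region data are all hypothetical. It compares how often each category appears in the sample against the full dataset and reports the categories that are off by more than the tolerance.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the total as a fraction."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def representativeness_gaps(full_values, sample_values, tolerance=0.05):
    """Report categories whose share in the sample differs from the
    full dataset by more than `tolerance` (an assumed threshold)."""
    full = category_shares(full_values)
    sample = category_shares(sample_values)
    gaps = {}
    for category in set(full) | set(sample):
        diff = sample.get(category, 0.0) - full.get(category, 0.0)
        if abs(diff) > tolerance:
            gaps[category] = round(diff, 3)
    return gaps


# Hypothetical data: the sample over-represents "east" and drops "north".
full_data = ["east"] * 50 + ["west"] * 30 + ["north"] * 20
sample = ["east"] * 8 + ["west"] * 2
gaps = representativeness_gaps(full_data, sample)
print(gaps)
```

A check like this only surfaces the gaps. Deciding whether a skewed sample is acceptable for the question being asked is still a human call.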
The key question is: “Are there gaps in the automated solution that can only be resolved manually?” Anything done manually means more hours of work on something an automated process might do faster. So if something has to be done manually, make sure it is something that can be done significantly better by hand than via automation, and that the effort pays off.
The first step is to identify the gaps in the automated solution where manual effort is not merely helpful, but required. Even setting up an automated solution requires human involvement:
- Defining the data and how it is captured
- Integrating data into a centralized database
- Setting up business rules
- Understanding where data issues reside and why
- Preventing data issues moving forward
When incoming data from several sources enters your centralized database, you have already developed file transfer agreements, defined data dictionaries and understood the business rules. Yet this is never enough to ensure bad data does not enter your pristine database. Never trust that the incoming data is actually what it says it is. Rather, there must always be a process in place that fully examines what the data “is” before it enters the database, and this process must include human involvement.
- Monitoring the quality of incoming data over time: Erroneous data, unexpected values, and values in the wrong field are all ways a database can become corrupted.
- Interpretation of what is in the database: Using elegant business reporting tools will not tell the full story without understanding the underlying data, its strengths and weaknesses. The best way to identify strengths and weaknesses in the data is to test it over time. Constant involvement in checking the data means getting rid of the weeds by assessing the data values and their respective fields, and understanding how the data will be used to address business needs, including actionable insights.
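As an illustration of monitoring incoming data over time, the sketch below tracks the share of unexpected values in one field across batches and lists the batches a person should look at. The field name, the expected values, the `max_rate` threshold, and the weekly batches are all assumptions made for the example, not part of any particular tool.

```python
def unexpected_rate(batch, field, expected_values):
    """Fraction of records whose `field` is missing or not an expected value."""
    if not batch:
        return 0.0
    bad = sum(1 for record in batch if record.get(field) not in expected_values)
    return bad / len(batch)


def batches_needing_review(batches, field, expected_values, max_rate=0.02):
    """Return (label, rate) for each batch whose unexpected-value rate
    exceeds `max_rate` (an assumed tolerance set by the data owner)."""
    flagged = []
    for label, batch in batches:
        rate = unexpected_rate(batch, field, expected_values)
        if rate > max_rate:
            flagged.append((label, rate))
    return flagged


# Hypothetical feed: in week 2 a ZIP code lands in the state field
# and one record arrives with no state at all.
EXPECTED_STATES = {"CA", "NY", "TX"}
week_1 = [{"state": "CA"}] * 6 + [{"state": "NY"}] * 4
week_2 = [{"state": "TX"}] * 8 + [{"state": "94105"}, {"city": "Austin"}]
flagged = batches_needing_review(
    [("week 1", week_1), ("week 2", week_2)], "state", EXPECTED_STATES
)
print(flagged)
```

The automated part only counts and compares. Working out why week 2 degraded (a vendor format change, a shifted column, a new legitimate value) is where the manual review comes in.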
The best time to use manual intervention is when you are establishing standards in data quality, monitoring those standards across vendors and their incoming data, monitoring the quality of the centralized database and how the data is used, and keeping constant checks on data quality and function over time. Some areas will always be problematic for an automated solution without human intervention:
- Flagged ‘Alerts’: While there can be automated flags to alert you when data is questionable, the resulting data may need to be reviewed manually to determine its handling.
- Unstructured Data: Perhaps the data being captured is unstructured, or allows for values that do not fall within a list of expected values. Unstructured data may be meaningless if it cannot somehow be translated, and even if you have an automated solution that will translate unstructured data, the outcome may still not yield actionable insights, because those depend on the context of the unstructured data.
- Beyond Business Rules: Business rules within an automated solution can only go so far, and translating complex data may not cover the breadth and scope of what the data is actually suggesting.
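One way to picture the hand-off between automated flags and manual review is a triage step: records that pass the business rules go straight through, while anything questionable is queued with the reasons attached instead of being silently dropped or "fixed". This is a minimal sketch; the `status` and `amount` fields, the rules, and the sample records are hypothetical.

```python
def triage(records, expected_statuses):
    """Split records into those that pass the rules and those that
    need a human decision, keeping the reasons for each flag."""
    accepted, review_queue = [], []
    for record in records:
        reasons = []
        if record.get("status") not in expected_statuses:
            reasons.append("status not in expected list")
        amount = record.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            reasons.append("amount missing, non-numeric, or negative")
        if reasons:
            # Queue for manual review rather than guessing at a fix.
            review_queue.append({"record": record, "reasons": reasons})
        else:
            accepted.append(record)
    return accepted, review_queue


# Hypothetical incoming records.
incoming = [
    {"id": 1, "status": "open", "amount": 120.0},
    {"id": 2, "status": "pending approval??", "amount": 75},
    {"id": 3, "status": "closed", "amount": -40},
    {"id": 4, "status": "reopened"},
]
accepted, review_queue = triage(incoming, {"open", "closed"})
for item in review_queue:
    print(item["record"]["id"], item["reasons"])
```

The rules catch what they were written to catch. A reviewer still has to decide whether "reopened" is an error or a legitimate status the rules should now include.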
Instituting standards in data quality includes human involvement. Data quality depends on how the data is captured, defined and stored. The automation process can flag erroneous data, yet the definition of what counts as erroneous comes from people who know the weaknesses of the incoming data. At the same time, humans can only absorb so much information, and too much information is being collected at too fast a rate. The good news is that we have automated solutions, powerful apps and software to translate data, and human involvement. The bad news is that we do not always know how to combine these pillars in a way that will yield the best results. In the world of database automation, working smarter, not harder means allowing the human element to shine where it is most needed.