February 15, 2017

Brain Trust: Do You Have Faith in Your Data?

With data streaming in from all directions, security and reliability are paramount. 

By Marc Wilkinson, Chief Technologist and Mobility Global Practice Lead, Enterprise Services

Do you trust your data? It’s a critical question. As enterprises harness data streams from connected devices to drive decision-making and automate processes, data accuracy can mean the difference between value and disaster. Before enlisting data to make decisions and generate insights, ask a couple of questions: Do I trust the source? Can I write my decision tree in such a way that I can smooth out manipulation or glitches in data transport? 

Even simple data errors can have dramatic results. Imagine the outcome if an aircraft altitude sensor reported readings that were 40 feet too high, whether because of a hardware fault, a network error, or deliberate manipulation. Relying on such data to automate flight controls might lead to an extremely rough and bumpy landing, or worse.

A self-driving car has a vast wealth of information at its disposal. But how does it perform when its data sources contradict one another? How do we factor in the moral and ethical issues that arise when automated decision systems must negotiate a lose-lose scenario? Too much trust in unverified data generates its own set of problems. Blithely acting on or automating decisions based on such data could lead to foolish things happening fast.

Know Your Confidence Levels

Often the problem with data accuracy is that there is no indication or measure of confidence. In mechanical engineering, for example, parts are machined to tolerances of 1/32 of an inch. In car engines, spark plug gaps are set with feeler gauges to tolerances measured in thousandths of an inch. If the gap is off by a couple of thousandths, the engine will not run properly.

When writing applications or designing analytics processes, you need to decide how accurate the data has to be. Do I need temperature tolerances to a tenth of a degree, or is a single degree okay? Does my wearable fitness tracker need to be accurate to plus or minus one step, or is 100 steps okay?

For example, you may wear a fitness monitor, a simple step counter. And like many technology professionals, you may also carry a smartphone and another device that counts steps. Do they ever agree? Or do they vary by 5, 10, or even 20 percent? What if you were making a critical business decision based on this data? You now have three different results for the same metric. Which is right? How do you reconcile the difference?
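
To make that reconciliation concrete, here is a minimal sketch in Python of how you might quantify the disagreement before trusting any single device. The device names, the counts, and the 5 percent threshold are illustrative assumptions, not a recommendation.

    # Quantify disagreement between devices reporting the same metric.
    # Device names, counts, and the 5 percent threshold are invented for illustration.
    readings = {
        "fitness_band": 9480,
        "smartphone": 10020,
        "smartwatch": 10350,
    }

    def relative_spread(values):
        """Spread of the readings as a fraction of their mean."""
        mean = sum(values) / len(values)
        return (max(values) - min(values)) / mean

    spread = relative_spread(list(readings.values()))
    if spread > 0.05:  # more than 5 percent disagreement between devices
        print(f"Devices disagree by {spread:.1%}; reconcile before trusting any one value")
    else:
        print(f"Devices agree within {spread:.1%}; the average is probably safe to use")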

It’s crucial that you determine acceptable variances from baseline or from expectation before writing your applications or designing your analytics processes. What is it you are trying to do? What trust levels are acceptable? Put boundaries around the data. If the data is wrong and you put too much trust in it, you will experience bad—even disastrous—results.
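
As a sketch of what putting boundaries around the data might look like in code, the guard below accepts a reading only if it falls within an agreed tolerance of the expected baseline. The baseline, tolerance, and incoming value are hypothetical numbers chosen for illustration.

    # Accept a reading only if it sits within the agreed variance from baseline.
    # Baseline, tolerance, and the incoming value are hypothetical numbers.
    BASELINE_TEMP_C = 21.0   # expected value for this sensor
    TOLERANCE_C = 0.5        # acceptable variance, decided before go-live

    def within_bounds(reading, baseline=BASELINE_TEMP_C, tolerance=TOLERANCE_C):
        """Return True only when the reading falls inside the agreed tolerance."""
        return abs(reading - baseline) <= tolerance

    incoming = 23.7  # hypothetical reading from the field
    if within_bounds(incoming):
        print("Reading accepted; safe to feed into automated decisions")
    else:
        print("Reading outside the agreed variance; route to review, do not automate")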

Adjust Your Values

Ultimately, data itself, not the database management system or the application, needs to be treated as an asset in its own right. Data needs accuracy standards and a set of security and audit controls around it. One option is to determine where data first becomes “active” and decrypt it only if security protocols check out, assessed at the time of use rather than the time of connection. The downside is that adding more layers of security has implications for management and performance, and frequently has a negative impact on usability.
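
Here is a minimal sketch of that assess-at-the-time-of-use idea, using the Fernet cipher from the Python cryptography package. The session check and the record contents are assumptions made for illustration, not a prescribed design.

    # Decrypt a record only after re-validating the caller's security context
    # at the moment of use, not at connection time. The session rule and the
    # record contents are illustrative assumptions.
    import time
    from cryptography.fernet import Fernet  # third-party: pip install cryptography

    key = Fernet.generate_key()
    cipher = Fernet(key)
    stored = cipher.encrypt(b'{"altitude_ft": 12450}')  # data encrypted at rest

    def use_record(encrypted, session):
        """The data becomes 'active' here, and only if the session is still valid now."""
        if session["expires_at"] < time.time():
            raise PermissionError("Session no longer valid; refusing to decrypt")
        return cipher.decrypt(encrypted)

    session = {"expires_at": time.time() + 60}  # hypothetical session state
    print(use_record(stored, session))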

We might also see incidents where data is received and processed in the wrong order, whether because of a performance issue or an attempted intrusion. That can be disastrous. Imagine the impact on world banking if stock trades were executed out of order; the system simply wouldn’t function.
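
One common defence is to stamp every message with a sequence number and refuse to process anything out of turn. The sketch below buffers early arrivals and releases them strictly in order; the trade messages and their layout are assumptions for illustration.

    # Release messages strictly in sequence-number order, buffering early arrivals.
    # The trade messages and field layout are invented for illustration.
    import heapq

    class OrderedProcessor:
        def __init__(self):
            self.expected = 1   # next sequence number we are willing to process
            self.pending = []   # min-heap of (seq, message) that arrived early

        def receive(self, seq, message):
            heapq.heappush(self.pending, (seq, message))
            # Drain everything that is now contiguous with the expected sequence.
            while self.pending and self.pending[0][0] == self.expected:
                seq, message = heapq.heappop(self.pending)
                print(f"Executing trade #{seq}: {message}")
                self.expected += 1

    proc = OrderedProcessor()
    proc.receive(2, "SELL 100 XYZ")  # arrives early: buffered, not executed
    proc.receive(1, "BUY 100 XYZ")   # fills the gap: both now execute in order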

Beware of Manipulation

Data manipulation is even sneakier, and potentially far more disruptive. You may not even be aware there is a problem with your data, depending on where the intrusion occurs. Yet with data manipulation, every enterprise (and personal) decision you make is influenced by someone else. This video succinctly explains this concept. 

The system as a whole, including the application, analytics, and visualization, needs to account for the limits of data accuracy introduced by flaws in the measurement process and by potential manipulation. When you factor in that sensor or device accuracy can degrade with temperature or time, data reconciliation becomes a massive challenge, and the resulting incongruence can cause serious, costly errors.
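
One way to keep that degradation visible is to carry an explicit uncertainty with every reading and widen it as the calibration ages, as in the sketch below. The linear drift rate and the readings themselves are invented numbers, not manufacturer specifications.

    # Carry an uncertainty with each reading and widen it as calibration ages.
    # The drift rate and readings are invented, not manufacturer figures.
    from dataclasses import dataclass

    @dataclass
    class Reading:
        value: float          # measured temperature, degrees C
        base_error: float     # accuracy when freshly calibrated, degrees C
        days_since_cal: int   # age of the last calibration

        def error_band(self, drift_per_day=0.002):
            """Assume simple linear drift: uncertainty grows with calibration age."""
            return self.base_error + drift_per_day * self.days_since_cal

    a = Reading(value=68.2, base_error=0.1, days_since_cal=30)
    b = Reading(value=68.9, base_error=0.1, days_since_cal=400)

    # Two readings only reconcile if the gap fits inside their combined error bands.
    gap = abs(a.value - b.value)
    if gap <= a.error_band() + b.error_band():
        print("Readings are consistent once their stated uncertainty is considered")
    else:
        print(f"Readings differ by {gap:.1f} C beyond their combined error bands")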

Bake Accuracy In

In a world that is increasingly just-in-time and automated, these errors can have a dramatic impact. That’s why we need to be very careful about trusting data implicitly (a bit like fake news and believing everything you read on the Internet) and about fully automating decisions in circumstances where we cannot guarantee the accuracy of data sources.

If we are to ever consider IoT-based systems “secure” and the data they generate accurate, security has to be much more actively baked into the integration and communications framework. Data reconciliation and active data validation processes have to be applied across the system. When writing applications and designing systems, confidence ranges and accuracy requirements must be determined and factored into decision-making processes. Bottom line: We need to secure the interpretation of the insight.
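
One way to secure the interpretation of the insight is to make the confidence requirement part of the decision path itself, as in this sketch. The threshold and the escalation rule are assumptions chosen for illustration.

    # Automate only when the data's confidence meets the stated requirement;
    # otherwise escalate to a person. Threshold and messages are illustrative.
    def decide(insight, confidence, required=0.95):
        if confidence >= required:
            return f"AUTOMATE: act on '{insight}'"
        return f"ESCALATE: confidence {confidence:.0%} is below {required:.0%}; refer to a human"

    print(decide("reduce inventory by 8%", confidence=0.97))  # automated
    print(decide("reduce inventory by 8%", confidence=0.80))  # escalated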

Read more from Marc Wilkinson about the transformative power of the IoT frontier.