The “Big Data Tsunami” was the theme of my last post. Today I want to share with you another angle on Big Data. This angle mirrors the way physicists look at dark matter in the universe: it is there, we can calculate its mass, but it eludes our powers of observation.
Dark Data is the untapped mass of under-utilized (or unutilized) data whose existence goes widely unrecognized in businesses of all trades. Yet this dark data might contain valuable information, if only we were able to tap it.
The problem is that most of this Dark Data exists in unstructured formats (free-text descriptions, free-text observations, notes, etc.) or non-textual formats (pictures, videos, audio files, and more).
Infosys has recently announced its BigDataEdge platform, which promises to radically simplify the task of analyzing Big Data. The company published an infographic that nicely explains the issue (see below). Many companies are developing similar systems that will allow businesses to gain valuable information from all this data, which today remains hidden in the closet.
The challenge lies in how to formalize unstructured data. The Infosys approach includes (quoting from the announcement):
- A rich visual interface, with more than 50 customizable dashboards and 250 built-in algorithms. These algorithms, a set of reusable business rules both function and industry-specific, enable business teams to self-serve the process of building insights while minimizing the need for technical intervention
- Over 50 data source connectors, which allow easy access to structured and unstructured data residing across enterprise and external sources. This would enable acceleration of discovery of relevant information from existing, underutilized data
- A powerful collaboration wall and pre-built workflows that allow teams across functions to interact on insights and collectively implement decisions
- A Logical Data Warehouse providing a virtual data management architecture, eliminates the need for physical availability of data to build and test insights
- ‘Out-of-the-box’ applications for specific industry needs such as fraud detection and prevention, predictive analytics and monitoring, and customer micro-segmentation that deliver faster returns on investment
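To make the formalization challenge concrete, here is a minimal sketch of my own (not part of BigDataEdge, and the note format is invented for illustration) that turns free-text lab observations into structured records using simple pattern matching:

```python
import re

# Hypothetical free-text observations, as one might find in a lab journal.
notes = [
    "2013-02-21: sample A heated to 85 C, yield 42%",
    "2013-02-22: sample B heated to 90 C, yield 47%",
]

# A simple pattern that pulls out date, sample id, temperature, and yield.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}): sample (?P<sample>\w+) "
    r"heated to (?P<temp>\d+) C, yield (?P<yield_pct>\d+)%"
)

# Each matching note becomes a dictionary, i.e. a structured record.
records = [m.groupdict() for m in (pattern.match(n) for n in notes) if m]
print(records)
```

Real-world notes are, of course, far messier than this toy format; that is exactly where heavier machinery (natural-language processing, entity extraction) comes in, but the goal is the same: free text in, structured records out.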
But how do we make all this data accessible? One approach is to keep all the data in a virtual data center, a.k.a. the Cloud. This way, system-to-system interfaces do not get in the way of data aggregation.
Just marvel with me at what this could mean for research: mining existing research data, observations, and notes in lab journals from all the experiments that were filed away because the results did not corroborate the original hypothesis. Would you agree that undetected gold nuggets are still buried in the mud of unstructured information? Imagine that all this information could be tested against new hypotheses, could be checked for weak correlations and connections undetected before.
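As a toy illustration of hunting for such correlations in archived measurements (my own sketch with made-up numbers, not any specific tool):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented archived measurements: temperature vs. yield from old experiments.
temperature = [80, 85, 90, 95, 100]
yield_pct   = [40, 42, 47, 46, 51]

r = pearson(temperature, yield_pct)
print(round(r, 2))  # → 0.95, a strong correlation in this toy data
```

Run systematically across thousands of variable pairs pulled from old records, even such a simple screen can surface relationships nobody thought to look for when the data was first filed away.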
We are only at the beginning of a new development here. More interesting inventions and innovations lie ahead of us.
Where do you see interesting new developments coming in this space?