The 5 Vs of Big Data
By Fred Senekal, Head of Research and Development at Learning Machines
Paying attention to the 5 Vs of data (Volume, Velocity, Variety, Veracity, and Value) is critical if a company is to derive strategic value from its big data. With these informing the decision-making process, an organisation can transform into a data-led environment.
Gathering, processing, and visualising data can help extract value and financial benefits from it. But true insights can only be unlocked when the 5 Vs are managed. Given the speed at which companies are embracing digitalisation, incorporating big data and big data engineering principles into existing systems can be transformative.
Volume and Velocity
These two are inextricably linked. Volume refers to the amount of data at the organisation’s disposal, while velocity refers to the speed at which that data accumulates. Organisations must process not only batch data but also data that is streaming into the business in real time. With 1 820TB of data created, 11 million instant messages sent, and almost 700 000 Google searches performed every 60 seconds, the amount of data being generated is exploding.
According to the Seagate Data Age 2025 Report, the world’s data will approach 175ZB by 2025, up from more than 60ZB at the end of last year. Put another way: in 2000, Google received 32.8 million searches a day. By the end of 2018, this had grown to 5.6 billion. With more than 2.4 billion monthly active Facebook users, 1 billion Instagrammers, and 320 million Twitter users, keeping on top of this data has become virtually impossible.
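The difference between batch and streaming processing mentioned above can be illustrated with a minimal sketch. The function names and the sample transaction amounts are made up for illustration; real systems would typically rely on dedicated batch and stream processing tools rather than plain Python.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def batch_total(amounts: list[float]) -> float:
    """Batch: the full data set is available before processing starts."""
    return sum(amounts)


def streaming_totals(amounts: Iterable[float]) -> Iterator[float]:
    """Streaming: update the result each time a new record arrives."""
    running = 0.0
    for amount in amounts:
        running += amount
        yield running


# Made-up sample values, for illustration only.
transactions = [120.0, 35.5, 980.0, 12.25]

batch_result = batch_total(transactions)
stream_results = list(streaming_totals(iter(transactions)))

print(batch_result)
print(stream_results)
```

In the batch case a single answer is produced once all the data is in; in the streaming case an up-to-date answer is available after every record, which is what matters when velocity is high.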
Variety
Variety refers to the different forms that data can take. Data falls into one of three categories: structured, semi-structured, and unstructured.
Structured data is organised in the traditional way and conforms to a formal structure. Separate metadata describes the content of the data, which is typically stored in a relational database. An example is a bank statement that contains a date, time, amount, and descriptor.
For semi-structured data, the metadata is not as clearly separated from the data itself and is often intertwined with it. An example is a Web page, where the HTML markup is intertwined with the content.
The third category is unstructured data, which accounts for approximately 80% of the world’s data. Getting value from it requires new methods of interpretation. Examples include text files, emails, images, voicemails, social media posts, and audio files. The list is virtually endless.
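To make the three categories concrete, here is a minimal sketch using only the Python standard library. The bank statement row, the Web page, and the email text are invented samples, and the class name TextExtractor is hypothetical. The regular expression at the end is only a stand-in for the far richer interpretation methods that unstructured data usually needs.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Structured: a bank statement row with a fixed schema
# (date, time, amount, descriptor). Sample values are invented.
statement_csv = "date,time,amount,descriptor\n2020-03-01,09:15,-250.00,Groceries\n"
rows = list(csv.DictReader(io.StringIO(statement_csv)))
amount = float(rows[0]["amount"])


# Semi-structured: a Web page, where markup and content are intertwined.
class TextExtractor(HTMLParser):
    """Collects the text content found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


page = "<html><body><h1>Statement</h1><p>Balance: 1 200.00</p></body></html>"
parser = TextExtractor()
parser.feed(page)
parser.close()
page_text = parser.chunks

# Unstructured: free text with no schema at all. A simple pattern match
# pulls out anything that looks like a monetary amount.
email = "Hi, I paid 250.00 for groceries yesterday and 80.50 for fuel today."
amounts_mentioned = [float(m) for m in re.findall(r"\d+\.\d{2}", email)]

print(amount, page_text, amounts_mentioned)
```

The structured row can be queried by field name straight away, the Web page needs its markup separated from its content first, and the email offers no structure to lean on.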
Veracity
The fourth V, veracity, deals with the assurance of quality, integrity, credibility, and accuracy of the data. Does the information contradict other trusted sources? Does it conform to the range of values it can take? Are there legal or regulatory constraints on the data, and do individuals have the right level of access to it?
These are all important questions that inform this pillar of data, which is the cornerstone on which accurate decisions are made. Everyone at an organisation must pay careful attention to data quality, whether they are data capturers who need to make sure the data is correctly captured, programmers who need to put validation checks in place, or managers who make available the tools and infrastructure to handle the data.
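As a minimal sketch of what such validation checks might look like, the example below tests a captured record against an allowed range of values and against a trusted source. The CustomerRecord type, the age limits, and the trusted_ages lookup are all hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CustomerRecord:
    customer_id: str
    age: int


# Assumed business rule: plausible ages fall within this range.
MIN_AGE, MAX_AGE = 0, 120


def find_problems(record: CustomerRecord, trusted_ages: dict[str, int]) -> list[str]:
    """Return a list of data quality problems found in a captured record."""
    problems = []
    # Does the value conform to the range of values it can take?
    if not MIN_AGE <= record.age <= MAX_AGE:
        problems.append("age outside allowed range")
    # Does the value contradict a trusted source, if one exists?
    trusted = trusted_ages.get(record.customer_id)
    if trusted is not None and trusted != record.age:
        problems.append("age contradicts trusted source")
    return problems


# Hypothetical trusted reference data.
trusted_ages = {"C001": 34, "C002": 51}

clean = find_problems(CustomerRecord("C001", 34), trusted_ages)
suspect = find_problems(CustomerRecord("C002", 150), trusted_ages)

print(clean)
print(suspect)
```

Checks like these are cheap to run at the point of capture, where a problem is far easier to correct than after the data has fed into a decision.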
Value
Of course, all the data in the world means nothing if no insights can be generated from it. Simply put, value refers to how useful the data is in the decision-making process. In many ways, value can only be unlocked through comprehensive data transformation.
In this regard, the process takes data to information, then to knowledge and insight, and finally to the decision itself. Business leaders must remember that data is only an asset insofar as it helps drive the decision-making capabilities of an organisation. This requires advanced analytics and machine learning capabilities to be in place.
The more consideration given to these 5 Vs, the more successful organisations tend to be in their data management, data governance, and decision-making capabilities.