Data is being collected all around us, and it is coming to us at incredible speed and in very different formats and from very different sources, such as images and voice recordings. Every like, click, credit card swipe, or social media post is a new piece of data that can be used to better describe the present and predict the future.
The concept of Big Data can be described along four dimensions: Volume, Velocity, Variety and Veracity.
Volume refers to the size of the data. The sheer volume of the data requires different technologies than traditional storage and processing can offer. In other words, the data sets in Big Data are too large to process with a regular laptop or desktop computer. An example of a high-volume data set would be all credit card transactions made in Europe on a single day.
Velocity refers to the speed at which this data is collected and made available for processing. An example of data that is generated with high velocity would be Twitter messages or Facebook posts.
The concept of Variety refers to the different sources and formats of data. Generally, data comes in one of three types: structured (such as rows and columns in a database table), semi-structured (such as JSON or XML documents) and unstructured (such as images and voice recordings).
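To make the three format types a bit more concrete, here is a minimal Python sketch. All the values in it (the customer columns, the JSON fields, the city name) are made up for illustration; the only real detail is that a PNG image file starts with the four bytes shown.

```python
import csv
import io
import json

# Structured: every record has the same fixed columns (hypothetical data).
structured = "customer_id,amount,currency\n1,19.99,EUR\n2,5.00,EUR\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: fields are labeled, but records may nest or differ (hypothetical data).
semi_structured = '{"customer_id": 1, "amount": 19.99, "tags": ["gift"], "address": {"city": "Lisbon"}}'
record = json.loads(semi_structured)

# Unstructured: raw bytes with no labeled fields, here the first four bytes of a PNG image.
unstructured = bytes([0x89, 0x50, 0x4E, 0x47])

print(rows[0]["amount"])           # looked up by column name
print(record["address"]["city"])   # looked up by following nested keys
print(len(unstructured))           # nothing to look up, only a length in bytes
```

Notice that the structured and semi-structured data can be queried by name, while the unstructured bytes need extra processing (for example image recognition) before they tell us anything.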
Finally, Veracity refers to the quality and certainty of data. High veracity data has many records that are valuable to analyze and that contribute in a meaningful way to the overall results. Low veracity data, on the other hand, contains a high percentage of meaningless data.
We are collecting more data than ever before, and the ways to link that data are also evolving quickly. For instance, think about an online purchase you made in the last month, and the email address you provided to receive the confirmation. This email address can be used to find your social media accounts and suddenly reveal your age, your interests, your likes, photos, etc. This additional data can be used to predict what other purchases you’re likely to make. This example illustrates why data is such valuable information for businesses, organizations and governments.
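The linking idea in this example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the purchases, the profiles, the email addresses and the `link_records` function are all hypothetical, and real-world record linkage is usually messier than an exact email match.

```python
# Hypothetical purchase records collected by an online shop.
purchases = [
    {"email": "ana@example.com", "item": "running shoes"},
    {"email": "ben@example.com", "item": "coffee grinder"},
]

# Hypothetical social media profiles, keyed by the same email address.
profiles = {
    "ana@example.com": {"age": 29, "interests": ["marathons", "travel"]},
}


def link_records(purchases, profiles):
    """Attach profile data to each purchase whose email matches a profile."""
    linked = []
    for purchase in purchases:
        profile = profiles.get(purchase["email"])
        if profile is not None:
            # Merge the purchase fields and the profile fields into one record.
            linked.append({**purchase, **profile})
    return linked


linked = link_records(purchases, profiles)
print(linked)
```

Only the purchase with a matching profile ends up in the linked list, now enriched with an age and a list of interests that the shop never asked for.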
In the age of big data, artificial intelligence (AI), data science, and machine learning have become buzzwords that are often incorrectly used interchangeably. They are all related in some way, but each covers a different range of methods and goals. That’s why I decided to start this series of posts, where I will explain in general terms the differences between those concepts. See my next post to discover more about AI.