BIG DATA PROBLEM



What is Big Data?

Ever wondered how much data big tech giants like Facebook and Twitter generate?

According to a recent report, Facebook generates about 4 petabytes of data per day. Since one petabyte is 1,000,000 gigabytes, that is roughly 4 million gigabytes every day. The total adds up to about 120 petabytes in a month and more than an exabyte (1,000 petabytes) in a year.
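
The arithmetic behind those figures is easy to check. Here is a minimal sketch (the 4 PB/day rate comes from the report above; the 30-day month is a rounding assumption):

```python
PB_PER_DAY = 4                       # reported daily generation rate
GB_PER_PB = 1_000_000                # 1 petabyte = 1,000,000 gigabytes

daily_gb = PB_PER_DAY * GB_PER_PB    # 4,000,000 GB per day
monthly_pb = PB_PER_DAY * 30         # ~120 PB per month (30-day month assumed)
yearly_eb = PB_PER_DAY * 365 / 1000  # ~1.46 EB per year (1 EB = 1,000 PB)

print(f"{daily_gb:,} GB/day, {monthly_pb} PB/month, ~{yearly_eb:.2f} EB/year")
```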


Data generated at this scale is what we collectively refer to as Big Data.

“Big data is a term that describes the large volume of data — both structured and unstructured — that inundates a business on a day-to-day basis.”


Characteristics of Big Data

Some of the characteristics of these data sets include the following:



1. Volume

The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.


2. Variety

The type and nature of the data. Earlier technologies such as RDBMSs could handle structured data efficiently and effectively. However, the shift from structured to semi-structured and unstructured data (variety) challenged the existing tools and technologies. Big Data technologies evolved with the prime intention of capturing, storing, and processing semi-structured and unstructured data that is generated at high speed (velocity) and is huge in size (volume).
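
To make the distinction concrete, here is an illustrative sketch (the records and fields are invented for demonstration) showing the same kind of event in structured, semi-structured, and unstructured form:

```python
import json

# Structured: fixed schema, fits neatly into an RDBMS row
structured_row = ("u123", "2024-01-15", 4.99)   # (user_id, date, amount)

# Semi-structured: self-describing, fields may vary per record (e.g. JSON)
semi_structured = json.loads(
    '{"user": "u123", "date": "2024-01-15", "tags": ["promo", "mobile"]}'
)

# Unstructured: no schema at all; meaning must be extracted by processing
unstructured = "u123 bought a premium plan on Jan 15 and loved it!"

print(structured_row, semi_structured["tags"], len(unstructured.split()))
```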


3. Velocity

The speed at which the data is generated and processed to meet the demands and challenges of growth and development. Big data is often available in real time. Compared to small data, big data is produced far more continuously. Two kinds of velocity relate to big data: the frequency of generation and the frequency of handling, recording, and publishing.
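
A minimal sketch of those two velocities (the generation and handling rates here are invented purely to illustrate the gap between them):

```python
GENERATED_PER_SEC = 100_000   # assumed event generation rate
PROCESSED_PER_SEC = 60_000    # assumed single-node handling rate

# When handling is slower than generation, unprocessed events accumulate:
backlog = 0
for second in range(5):
    backlog += GENERATED_PER_SEC - PROCESSED_PER_SEC
    print(f"after {second + 1}s: backlog of {backlog:,} events")
# This growing backlog is exactly what high-velocity systems must absorb.
```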


4. Veracity

Veracity extends the definition of big data to cover data quality and data value. The quality of captured data can vary greatly, which affects the accuracy of any analysis built on it.
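
A tiny sketch of a veracity check (the records and validation rules are hypothetical examples) showing how low-quality records are screened out before analysis:

```python
records = [
    {"user": "u1", "age": 34},
    {"user": "u2", "age": -5},     # implausible value
    {"user": None, "age": 28},     # missing identifier
    {"user": "u4", "age": 41},
]

# Keep only records that pass basic quality rules; bad data skews results
clean = [r for r in records if r["user"] and 0 < r["age"] < 120]

avg_age = sum(r["age"] for r in clean) / len(clean)
print(f"{len(clean)}/{len(records)} records usable, mean age {avg_age:.1f}")
```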

The Problem

Volume