Of late, I have been working in the field of data science. I thought I would share with you all what I have been learning. I will start with the term Big Data.
What's Big Data?
Over the last few years, you have probably heard everyone talking about “Big Data”. It has become a buzzword in most discussions. You might be wondering what Big Data is all about.
- Is it huge data?
- Is it large amounts of data?
- Is it meant for big people?
- What is it all about?
- And, more importantly, how do you identify big data, and what is it used for?
Today, I want to put the concept in clear and simple terms so that you can understand what it is and identify for yourself what counts as big data in any context.
Let us define Big Data formally first –
Big Data is massive, unstructured data that has variety, velocity, volume, and veracity, and that can be used in decision making.
Massive Unstructured Data
Of late, more and more data is becoming available from various sources and in different formats.
Variety
When we say variety, we mean the different types of data sources that are becoming available – text, audio, video, click streams, log files and more.
Velocity
Data is frequently time-sensitive and must be used as it streams into the enterprise to maximize its value.
Volume
Terabytes and even petabytes of information are becoming available. This calls for scalable storage and a distributed approach to querying.
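To give a feel for what a distributed approach to querying means, here is a minimal sketch in Python. It is only an illustration: the function names and the sample data are made up for this post, and the "partitions" are processed one after another on a single machine, whereas a real system would send each one to a different machine.

```python
from collections import Counter

def count_partition(lines):
    """Map step: count the words inside one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def merge_counts(partials):
    """Reduce step: combine the per-partition counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def distributed_word_count(partitions):
    # Assumption: each partition is small enough for one worker.
    # Here the workers are simulated by a simple loop.
    return merge_counts(count_partition(p) for p in partitions)

# Hypothetical sample data, split into two partitions
partitions = [
    ["big data is big", "data flows in"],
    ["data is everywhere"],
]
result = distributed_word_count(partitions)
print(result.most_common(3))
```

The point is that no single step ever needs to see all of the data at once; each partition is summarized on its own and only the small summaries are merged.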
Veracity
With information flowing in from so many different sources, at such speed, and in huge quantities that need to be used immediately, the important questions are: How reliable is the data? How accurate is it?
Data can be ambiguous, inconsistent, or plain spam, and such data needs to be filtered out before major decisions are taken.
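As a rough illustration of this kind of filtering, here is a small Python sketch. The record layout, the spam phrases, and the function names are all assumptions made up for this example; real veracity checks are far more involved.

```python
# Hypothetical spam phrases, for illustration only
SPAM_PHRASES = {"free prize", "click here"}

def is_trustworthy(record, seen_ids):
    """Return True if the record passes a few basic checks."""
    # Incomplete: a required field is missing or empty
    if not record.get("id") or not record.get("text"):
        return False
    # Duplicate: a record with the same id was already accepted
    if record["id"] in seen_ids:
        return False
    # Spam: crude phrase match on the lower-cased text
    text = record["text"].lower()
    if any(phrase in text for phrase in SPAM_PHRASES):
        return False
    return True

def filter_records(records):
    """Keep only the records that pass every check, in their original order."""
    seen_ids = set()
    kept = []
    for record in records:
        if is_trustworthy(record, seen_ids):
            seen_ids.add(record["id"])
            kept.append(record)
    return kept

# Made-up sample records
records = [
    {"id": "a1", "text": "Delivery was late by two days"},
    {"id": "a1", "text": "Delivery was late by two days"},  # duplicate
    {"id": "a2", "text": "WIN a FREE prize now"},           # spam
    {"id": "a3", "text": ""},                               # incomplete
    {"id": "a4", "text": "Great product, works as described"},
]
cleaned = filter_records(records)
print([r["id"] for r in cleaned])
```

Only the first and the last record survive; the duplicate, the spam, and the incomplete record are dropped before anyone bases a decision on them.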
Decision Making based on Data
The importance of data in decision making is recognized worldwide. And data is flowing in at high volume and high velocity, from many sources of different types, bringing uncertainty along with it.
Data Scientists – The most sought-after people in the corporate world
The task is to work out
- what data is relevant,
- what sources are available,
- how to gather the data as it flows in,
- how to clean it and shape it,
- how to model it to draw insights,
- how to report the findings.
Each of these is a big topic on its own.
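To show how these steps hang together, here is a toy end-to-end sketch in Python. Everything in it is an assumption for illustration: the source names, the numbers, and the "model", which is nothing more than an average per source.

```python
from statistics import mean

def gather(sources):
    """Gather: yield (source_name, value) pairs from every source."""
    for name, values in sources.items():
        for value in values:
            yield name, value

def clean(readings):
    """Clean: drop readings whose value is missing."""
    return [(name, value) for name, value in readings if value is not None]

def model(readings):
    """Model: average value per source (a stand-in for a real model)."""
    grouped = {}
    for name, value in readings:
        grouped.setdefault(name, []).append(value)
    return {name: mean(values) for name, values in grouped.items()}

def report(insights):
    """Report: one readable line per source, sorted by source name."""
    return [f"{name}: average {avg:.1f}" for name, avg in sorted(insights.items())]

# Made-up sample data; None marks a missing reading
sources = {"web_clicks": [12, None, 18], "store_visits": [7, 9]}
summary = report(model(clean(gather(sources))))
for line in summary:
    print(line)
```

Each function here is a few lines long; in practice every one of them grows into a project of its own.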
Now, I think you can appreciate the need for data scientists in the corporate world and how interesting and promising this career opportunity is.