What’s Big Data?

Big data is data that contains greater variety, arrives in increasing volumes, and moves with more velocity. These properties are known as the three Vs.
  • Volume
    • The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a web page or a mobile app, or readings from sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
  • Velocity
    • Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.
  • Variety
    • Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.

🙌🏻My Opinion

Big Data refers to extremely large and complex datasets that cannot be easily managed, processed, or analyzed with traditional data processing tools.
Volume: Depending on business needs, industry, and number of users, data volume can reach terabytes (TB), petabytes (PB), exabytes (EB), or even more.
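To get a feel for these scales, here is a minimal Python sketch of the unit arithmetic. It assumes decimal (SI) units, where each step is a factor of 1000; the helper names `to_bytes` and `human_readable` and the traffic numbers in the usage note are made up for illustration.

```python
# Decimal (SI) storage units expressed in bytes (assumption: powers of 1000).
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a number of bytes."""
    return int(amount * UNITS[unit])


def human_readable(num_bytes):
    """Format a byte count using the largest unit that keeps the value >= 1."""
    for unit in ("EB", "PB", "TB", "GB", "MB", "KB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.1f} {unit}"
    return f"{num_bytes} B"


# Hypothetical workload: one billion events per day at about 1 KB each.
daily_bytes = 1_000_000_000 * to_bytes(1, "KB")
print(human_readable(daily_bytes))        # daily volume
print(human_readable(daily_bytes * 365))  # yearly volume
```

With these invented numbers, a single event log already grows by about a terabyte per day, which is why volume alone can push a system beyond traditional tools.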
Velocity: At certain times or in certain businesses, data is generated very quickly. The data pipeline needs to ingest that data efficiently, process it correctly, and deliver the results to downstream data applications.
  • For example, on November 11th (Singles’ Day) in China, millions of users visit Alibaba’s Taobao website to browse, buy, and add items to or remove them from their carts. These actions generate huge volumes of log data, which can be used to build a home page recommendation system or other applications that make a great difference to a company.
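The idea of turning a stream of user actions into something a recommendation feature could use can be sketched in a few lines of Python. This is only a toy illustration, not how Taobao or any real system works: the `Event` shape, the action names, and the `WEIGHTS` values are all assumptions, and a production pipeline would read from a message queue rather than an in-memory list.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    """One user action taken from a (hypothetical) clickstream log."""
    user_id: str
    action: str
    item_id: str


# Hypothetical weights: how strongly each action signals interest in an item.
WEIGHTS = {"browse": 1, "add_to_cart": 3, "remove_from_cart": -3, "buy": 5}


def score_stream(events):
    """Consume events one at a time and keep a running score per item."""
    scores = Counter()
    for event in events:
        # Unknown actions contribute nothing to the score.
        scores[event.item_id] += WEIGHTS.get(event.action, 0)
    return scores


def top_items(scores, n):
    """Return the n highest-scoring item ids, best first."""
    return [item for item, _ in scores.most_common(n)]


events = [
    Event("u1", "browse", "phone"),
    Event("u1", "add_to_cart", "phone"),
    Event("u2", "browse", "laptop"),
    Event("u2", "buy", "laptop"),
    Event("u3", "add_to_cart", "headset"),
    Event("u3", "remove_from_cart", "headset"),
]
print(top_items(score_stream(events), 2))
```

Because the scores are updated event by event, the ranking is available at any moment without re-reading the whole log, which is the property a high-velocity pipeline needs.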
Variety: Data can be structured, semi-structured, or unstructured.
  • Structured Data: Relational data stored as tables in OLTP databases such as Oracle, SQL Server, and MySQL.
  • Semi-Structured Data: Log data, XML, JSON…
  • Unstructured Data: Music, Video, Audio…
Big data also involves the use of advanced technologies and techniques to handle massive volumes of data from various sources, often in real time.
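To make the three Variety categories more concrete, the following Python sketch handles one small sample of each kind using only the standard library. The sample data, field names, and function names are invented for illustration; real sources would be database tables, log files, and media files.

```python
import csv
import io
import json


def load_structured(table_text):
    """Structured: every row has the same named columns, like a database table."""
    return list(csv.DictReader(io.StringIO(table_text)))


def load_semi_structured(log_lines):
    """Semi-structured: each JSON record describes itself and fields may vary."""
    return [json.loads(line) for line in log_lines]


def describe_unstructured(blob):
    """Unstructured: raw bytes carry no schema, so we can only attach metadata
    until some preprocessing (e.g. transcription) extracts meaning."""
    return {"size_bytes": len(blob)}


# Hypothetical sample data for each category.
orders = load_structured("order_id,user_id,amount\n1001,u1,25.5\n1002,u2,9.5\n")
logs = load_semi_structured([
    '{"user": "u1", "action": "browse", "item": "phone"}',
    '{"user": "u2", "action": "buy", "item": "laptop", "coupon": "X1"}',
])
audio = describe_unstructured(bytes([0, 17, 34, 51]))

print(sum(float(row["amount"]) for row in orders))  # fixed columns: direct access
print([record.get("coupon") for record in logs])    # optional field: may be None
print(audio)                                        # only metadata is available
```

The amount of code needed before the data becomes usable grows from structured to unstructured, which matches the extra preprocessing described in the definition above.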