The Three V's of Big Data - How EdgeSet is transforming data processing

Chris Forno | 5 min. read | December 10, 2024

This visual guide compares EdgeSet to traditional data processing systems like spreadsheets, databases, data warehouses, and data lakes in the context of the three V's of Big Data:

Variety - the variety of data types, Velocity - the speed at which data is processed, and Volume - the amount of data stored and analyzed.

Variety: The range of data types involved: structured databases, unstructured text and media, and semi-structured formats like JSON, drawn from multiple sources and platforms.
Velocity: The speed at which data is generated, collected, and processed. Data is now created and updated in real time, requiring systems that can capture, analyze, and respond to information quickly.
Volume: The massive amount of data generated every second from various sources. For context, the volume of global data grew from roughly 2 zettabytes in 2010 to an estimated 149 zettabytes in 2024, driven by internet, mobile, social media, IoT, and cloud computing technologies.
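The volume figures in this guide range from megabytes to zettabytes, and each step up is a factor of 1,000. As a rough aid for comparing them, the sketch below converts between decimal (SI) units. The `UNITS` list and the `to_bytes` and `convert` helpers are illustrative names chosen for this example, not part of any product.

```python
# Decimal (SI) storage units; each is 1,000 times larger than the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, src, dst):
    """Convert a quantity from one unit to another."""
    return to_bytes(value, src) / to_bytes(1, dst)


# How many exabytes and terabytes make up one zettabyte.
zb_in_eb = convert(1, "ZB", "EB")
zb_in_tb = convert(1, "ZB", "TB")
print(zb_in_eb, zb_in_tb)
```

Seen this way, the jump from a spreadsheet's megabytes to a data lake's exabytes is twelve orders of magnitude.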


Spreadsheets

Traditional spreadsheets handle small volumes of structured data with basic data types and manual updates.

Variety: Single source. Data is generally input directly by users.
Velocity: Manual. Data is updated manually.
Volume: Megabytes. Data must fit in memory (RAM).
Setup time: Minutes. Portable format, easily used and shared.

Databases

Databases introduce better data management with increased velocity through real-time transactions.

Variety: Single source. Needs strict input schema.
Velocity: Streaming, on demand, manual, batch. Data transactions measured in milliseconds.
Volume: Gigabytes. Indexes should fit in memory.
Setup time: Hours. Works directly with operational data.
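To make the strict-schema and transaction points concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for any relational database. The `orders` table, its columns, and the `insert_order` helper are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is declared up front; rows that violate it are rejected.
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount > 0)
    )
    """
)


def insert_order(customer, amount):
    """Insert one order in its own transaction; return True if accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer, amount) VALUES (?, ?)",
                (customer, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = insert_order("alice", 19.99)
rejected = insert_order(None, 5.00)    # violates NOT NULL
negative = insert_order("bob", -1.00)  # violates the CHECK constraint
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(accepted, rejected, negative, row_count)
```

Only the first insert is stored. The other two are rolled back, which is the behavior a spreadsheet cannot enforce.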

Data Warehouses

A data warehouse is a centralized repository for structured data. It uses an ETL (extract, transform, load) process to clean and organize data, supporting business intelligence and reporting tasks.

Variety: Multi source. Only supports structured sources.
Velocity: Manual, batch. Updated in batches, usually nightly.
Volume: Terabytes. Must fit on disk(s).
Setup time: Months. Involves extensive ETL processes.
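The batch ETL pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the two source datasets (`CRM_ROWS`, `BILLING_ROWS`), the `revenue_by_customer` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical records from two structured source systems.
CRM_ROWS = [
    {"customer": " Alice ", "region": "emea"},
    {"customer": "Bob", "region": "AMER"},
]
BILLING_ROWS = [
    {"customer": "alice", "amount": "19.99"},
    {"customer": "BOB", "amount": "5.00"},
    {"customer": "bob", "amount": "7.50"},
]


def extract():
    """Pull the current snapshot from each source."""
    return CRM_ROWS, BILLING_ROWS


def transform(crm_rows, billing_rows):
    """Normalize names and regions, then total the billing per customer."""
    regions = {r["customer"].strip().lower(): r["region"].upper() for r in crm_rows}
    totals = {}
    for row in billing_rows:
        key = row["customer"].strip().lower()
        totals[key] = totals.get(key, 0.0) + float(row["amount"])
    return [
        (name, regions.get(name, "UNKNOWN"), round(total, 2))
        for name, total in sorted(totals.items())
    ]


def load(conn, rows):
    """Write the cleaned rows into the warehouse table in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_customer "
        "(customer TEXT PRIMARY KEY, region TEXT, total REAL)"
    )
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO revenue_by_customer VALUES (?, ?, ?)", rows
        )


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))  # one batch run, e.g. nightly
report = warehouse.execute(
    "SELECT customer, region, total FROM revenue_by_customer ORDER BY customer"
).fetchall()
print(report)
```

Reports read from `revenue_by_customer`, so they only reflect the sources as of the last batch run.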

Data Lakes

A data lake is a storage system that holds massive amounts of raw data in its native format, supporting flexible analytics and querying without predefined schemas.

Variety: Mixed source. Structured and semi-structured in homogeneous storage.
Velocity: On demand*, manual, batch. Updated in micro-batches.
Volume: Exabytes. Spans across many machines.
Setup time: Months. Slower integration and governance processes.

* Partial support for on-demand data processing.
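Storing raw data without a predefined schema means structure is applied only when the data is read. The sketch below illustrates that idea with two raw objects kept exactly as they arrived. The `RAW_OBJECTS` dictionary stands in for object storage, and the paths, fields, and `read_records` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical raw objects, stored in their native formats.
RAW_OBJECTS = {
    "events/2024-12-10.jsonl": (
        '{"user": "alice", "action": "login"}\n'
        '{"user": "bob", "action": "purchase", "amount": 5.0}\n'
    ),
    "exports/users.csv": "user,country\nalice,DE\nbob,US\n",
}


def read_records(path):
    """Parse a raw object at read time, based on its format."""
    raw = RAW_OBJECTS[path]
    if path.endswith(".jsonl"):
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    if path.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {path}")


# Each query decides which fields it cares about; missing fields are tolerated.
purchases = [
    r for r in read_records("events/2024-12-10.jsonl")
    if r.get("action") == "purchase"
]
countries = {r["user"]: r["country"] for r in read_records("exports/users.csv")}
print(purchases, countries)
```

The flexibility comes at a cost: every reader has to know how to interpret each format, which is part of why integration and governance tend to be slower.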

EdgeSet

EdgeSet is a data integration platform that reduces ETL/ELT processes and enables real-time analytics across diverse, large-scale data sources without moving the data.

Variety: Mixed source, mixed format. Supports sources in different native formats.
Velocity: On demand, manual, batch. Queries are always up-to-date.
Volume: Petabytes. Data is joined on a single machine.
Setup time: Hours. Built on distributed query engine.
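The general idea of querying sources in place, rather than copying them first, can be sketched as follows. This is not EdgeSet's API or implementation, which this guide does not describe. The `Source` class, the `join` function, and the sample data are hypothetical, and a simple local hash join stands in for whatever the real query engine does.

```python
import csv
import io
import json


class Source:
    """A hypothetical live source that is re-read on every query."""

    def __init__(self, fmt, payload):
        self.fmt = fmt
        self.payload = payload  # stands in for a remote system's current state

    def scan(self):
        """Yield records parsed from the source's native format."""
        if self.fmt == "csv":
            yield from csv.DictReader(io.StringIO(self.payload))
        elif self.fmt == "jsonl":
            for line in self.payload.splitlines():
                if line.strip():
                    yield json.loads(line)
        else:
            raise ValueError(f"unsupported format: {self.fmt}")


def join(left, right, key):
    """Hash join performed locally at query time; nothing is persisted."""
    index = {}
    for row in right.scan():
        index.setdefault(row[key], []).append(row)
    for row in left.scan():
        for match in index.get(row[key], []):
            yield {**match, **row}


users = Source("csv", "user,country\nalice,DE\nbob,US\n")
orders = Source("jsonl", '{"user": "bob", "amount": 5.0}\n')

first = list(join(orders, users, "user"))

# A change at the source is visible to the next query without a reload step.
orders.payload += '{"user": "alice", "amount": 19.99}\n'
second = list(join(orders, users, "user"))
print(first, second)
```

Because each query scans the sources directly, the second result includes the new order with no batch job in between.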