The Three V's of Big Data - How EdgeSet is transforming data processing
Chris Forno | 5 min. read | December 10, 2024
This visual guide compares EdgeSet to traditional data processing systems (spreadsheets, databases, data warehouses, and data lakes) in the context of the three V's of Big Data:
Variety (the range of data types and sources), Velocity (the speed at which data is processed), and Volume (the amount of data stored and analyzed). Each system's typical setup time is listed as well.
Spreadsheets
Traditional spreadsheets handle small volumes of structured data with basic data types and manual updates.
VARIETY: SINGLE SOURCE. Data is generally input directly by users.
VELOCITY: MANUAL. Data is updated manually.
VOLUME: MEGABYTES. Data must fit in memory (RAM).
SETUP TIME: MINUTES. Portable format, easily used and shared.
Databases
Databases introduce better data management with increased velocity through real-time transactions.
VARIETY: SINGLE SOURCE. Needs strict input schema.
VELOCITY: STREAMING, ON DEMAND, MANUAL, BATCH. Data transactions measured in milliseconds.
VOLUME: GIGABYTES. Indexes should fit in memory.
SETUP TIME: HOURS. Works directly with operational data.
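To make the strict-schema point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample values are hypothetical and chosen only for illustration. The elapsed time it measures depends on the machine and is not a benchmark.

```python
import sqlite3
import time

# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " amount REAL NOT NULL CHECK (typeof(amount) = 'real')"
    ")"
)


def insert_order(amount):
    """Insert one order in its own transaction; return (accepted, elapsed_ms)."""
    start = time.perf_counter()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders (amount) VALUES (?)", (amount,))
        accepted = True
    except sqlite3.IntegrityError:
        accepted = False  # the row violates the schema and is rejected
    elapsed_ms = (time.perf_counter() - start) * 1000
    return accepted, elapsed_ms


ok, ms = insert_order(19.99)
bad_ok, _ = insert_order("not a number")
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok, bad_ok, row_count, f"{ms:.3f} ms")
```

The well-formed row is stored and the malformed one is refused at write time. That is the trade-off a strict schema makes: less variety in exchange for predictable data.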
Data Warehouses
A data warehouse is a centralized repository for structured data. It uses an ETL process to clean and organize data, supporting business intelligence and reporting tasks.
VARIETY: MULTI SOURCE. Only supports structured sources.
VELOCITY: MANUAL, BATCH. Updated in batches, usually nightly.
VOLUME: TERABYTES. Must fit on disk(s).
SETUP TIME: MONTHS. Involves extensive ETL processes.
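The batch ETL pattern described above can be sketched in a few lines. This is an illustrative toy, not a production pipeline: the two source lists, the `customer_revenue` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical structured sources; in practice these would be operational systems.
crm_rows = [
    {"customer_id": 1, "name": " Ada ", "country": "uk"},
    {"customer_id": 2, "name": "Grace", "country": "US"},
]
billing_rows = [
    {"customer_id": 1, "total": "120.50"},
    {"customer_id": 2, "total": "80"},
    {"customer_id": 3, "total": "15"},  # no CRM record, so it never reaches the warehouse
]


def transform(crm, billing):
    """Clean both sources and merge them into warehouse-ready tuples."""
    totals = {row["customer_id"]: float(row["total"]) for row in billing}
    cleaned = []
    for row in crm:
        if row["customer_id"] not in totals:
            continue  # skip customers without billing data
        cleaned.append((
            row["customer_id"],
            row["name"].strip(),
            row["country"].upper(),
            totals[row["customer_id"]],
        ))
    return cleaned


def run_nightly_batch(conn):
    """Extract, transform, and load one batch; return the number of rows loaded."""
    rows = transform(crm_rows, billing_rows)
    with conn:
        conn.execute("DELETE FROM customer_revenue")  # full refresh on each run
        conn.executemany("INSERT INTO customer_revenue VALUES (?, ?, ?, ?)", rows)
    return len(rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_revenue ("
    " customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, total REAL)"
)
loaded = run_nightly_batch(warehouse)
report = warehouse.execute(
    "SELECT country, SUM(total) FROM customer_revenue GROUP BY country ORDER BY country"
).fetchall()
print(loaded, report)
```

Reports only reflect the most recent batch. Anything that changes in the sources after the run stays invisible until the next one, which is why velocity is limited to batch updates.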
Data Lakes
A data lake is a storage system that holds massive amounts of raw data in its native format, supporting flexible analytics and querying without pre-defined schemas.
VARIETY: MIXED SOURCE. Structured and semi-structured in homogeneous storage.
VELOCITY: ON DEMAND*, MANUAL, BATCH. Updated in micro-batches.
*Partial support for on-demand data processing.
VOLUME: EXABYTES. Spans across many machines.
SETUP TIME: MONTHS. Slower integration and governance processes.
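Querying without a pre-defined schema is often called schema-on-read. Below is a minimal sketch of that idea, assuming a tiny "lake" held in a Python dictionary. The object paths, field names, and helper functions are hypothetical; a real lake would keep these objects in distributed storage.

```python
import csv
import io
import json

# Hypothetical raw objects, stored exactly as they arrived (native format).
lake = {
    "events/2024-12-09.jsonl": (
        '{"user": "ada", "ms": 120}\n'
        '{"user": "grace", "ms": 95, "region": "us"}\n'
    ),
    "events/2024-12-10.csv": "user,ms\nada,101\n",
}


def read_events(path, raw):
    """Apply a schema at read time: every record becomes {'user': str, 'ms': int}."""
    if path.endswith(".jsonl"):
        records = (json.loads(line) for line in raw.splitlines() if line)
    elif path.endswith(".csv"):
        records = csv.DictReader(io.StringIO(raw))
    else:
        raise ValueError(f"unsupported format: {path}")
    for record in records:
        # Extra fields such as "region" are ignored rather than rejected.
        yield {"user": record["user"], "ms": int(record["ms"])}


def average_latency(user):
    """Scan every object in the lake and average one user's latency."""
    values = [
        event["ms"]
        for path, raw in lake.items()
        for event in read_events(path, raw)
        if event["user"] == user
    ]
    return sum(values) / len(values)


print(average_latency("ada"))
```

Nothing is validated when data lands, so ingestion is cheap, but every query pays the cost of interpreting the raw formats.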
EdgeSet
EdgeSet is a data integration platform that reduces ETL/ELT processes and enables real-time analytics across diverse, large-scale data sources without moving the data.
VARIETY: MIXED SOURCE, MIXED FORMAT. Supports sources in different native formats.
VELOCITY: ON DEMAND, MANUAL, BATCH. Queries are always up-to-date.
VOLUME: PETABYTES. Data is joined on a single machine.
SETUP TIME: HOURS. Built on a distributed query engine.
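The general idea of querying sources in place and joining them at query time can be illustrated with a toy example. This is not EdgeSet's API or implementation; it is only a sketch of the pattern, and the sources, field names, and functions are hypothetical. Plain Python functions stand in for connectors to a CSV source and a JSON source.

```python
import csv
import io
import json

# Hypothetical sources that stay where they are, each in its native format.
_customers_csv = "id,name\n1,Ada\n2,Grace\n"
_orders_json = [{"customer_id": 1, "total": 40.0}, {"customer_id": 2, "total": 15.5}]


def read_customers():
    """Read the CSV source at query time."""
    return list(csv.DictReader(io.StringIO(_customers_csv)))


def read_orders():
    """Read the JSON source at query time (round-trip simulates a fresh fetch)."""
    return json.loads(json.dumps(_orders_json))


def revenue_by_customer():
    """Join both sources in memory on one machine, at query time."""
    names = {int(row["id"]): row["name"] for row in read_customers()}
    revenue = {}
    for order in read_orders():
        name = names.get(order["customer_id"])
        if name is not None:
            revenue[name] = revenue.get(name, 0.0) + order["total"]
    return revenue


before = revenue_by_customer()
_orders_json.append({"customer_id": 1, "total": 10.0})  # the source changes
after = revenue_by_customer()  # no reload step; the next query sees the change
print(before, after)
```

Because each query reads the sources directly, there is no copy to go stale. The second call reflects the new order without any batch job in between.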