HeliosDB-Lite Storage Modes Tutorial¶
A hands-on guide to per-column storage optimization with real-world applications.
Prerequisites: HeliosDB-Lite REPL running (./heliosdb-lite repl -m for in-memory mode)
Table of Contents¶
- Understanding Storage Modes
- Dictionary Encoding: E-Commerce Orders
- Content-Addressed Storage: Document Deduplication
- Columnar Storage: Time-Series Analytics
- Mixed Modes: Event Tracking System
- Migration Strategies
See also: Advanced Features Tutorial for EXPLAIN, Vector Search, and Materialized Views.
1. Understanding Storage Modes¶
What Are Storage Modes?¶
HeliosDB-Lite supports four per-column storage strategies:
| Mode | Best For | Compression | Use Case |
|---|---|---|---|
DEFAULT |
General data | RocksDB LZ4 (3-7x) | OLTP, mixed workloads |
DICTIONARY |
Low-cardinality strings | 50-95% | Status codes, categories |
CONTENT_ADDRESSED |
Large duplicate values | ~100% dedup | Templates, JSON payloads |
COLUMNAR |
Numeric sequences | 20-50% + faster aggregates | Time-series, analytics |
Your First Storage Mode¶
CREATE TABLE demo (id INT PRIMARY KEY, category TEXT, value FLOAT8);
INSERT INTO demo VALUES (1, 'electronics', 99.99);
INSERT INTO demo VALUES (2, 'electronics', 149.99);
INSERT INTO demo VALUES (3, 'clothing', 29.99);
INSERT INTO demo VALUES (4, 'electronics', 199.99);
INSERT INTO demo VALUES (5, 'clothing', 49.99);
SELECT * FROM demo ORDER BY id;
Output:
┌────┬─────────────┬────────┐
│ id │ category │ value │
├────┼─────────────┼────────┤
│ 1 │ electronics │ 99.99 │
│ 2 │ electronics │ 149.99 │
│ 3 │ clothing │ 29.99 │
│ 4 │ electronics │ 199.99 │
│ 5 │ clothing │ 49.99 │
└────┴─────────────┴────────┘
Now migrate to dictionary encoding:
Data is preserved - queries work identically:
INSERT INTO demo VALUES (6, 'electronics', 79.99);
INSERT INTO demo VALUES (7, 'home', 299.99);
SELECT category, COUNT(*), AVG(value) FROM demo GROUP BY category;
Output:
┌─────────────┬───────┬────────┐
│ group_0 │ agg_0 │ agg_1 │
├─────────────┼───────┼────────┤
│ clothing │ 2 │ 39.99 │
│ electronics │ 4 │ 132.49 │
│ home │ 1 │ 299.99 │
└─────────────┴───────┴────────┘
2. Dictionary Encoding: E-Commerce Orders¶
Scenario¶
E-commerce platform with order status and shipping country having limited unique values but appearing millions of times.
Create the Schema¶
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
status TEXT STORAGE DICTIONARY,
shipping_country TEXT STORAGE DICTIONARY,
payment_method TEXT STORAGE DICTIONARY,
order_total FLOAT8,
created_at INT8
);
Populate Data¶
INSERT INTO orders VALUES (1001, 501, 'pending', 'USA', 'credit_card', 125.50, 1705320000);
INSERT INTO orders VALUES (1002, 502, 'pending', 'USA', 'paypal', 89.99, 1705320100);
INSERT INTO orders VALUES (1003, 503, 'shipped', 'Canada', 'credit_card', 245.00, 1705320200);
INSERT INTO orders VALUES (1004, 501, 'delivered', 'USA', 'credit_card', 67.25, 1705320300);
INSERT INTO orders VALUES (1005, 504, 'pending', 'UK', 'paypal', 189.99, 1705320400);
INSERT INTO orders VALUES (1006, 505, 'shipped', 'Germany', 'credit_card', 312.50, 1705320500);
INSERT INTO orders VALUES (1007, 502, 'delivered', 'USA', 'apple_pay', 45.00, 1705320600);
INSERT INTO orders VALUES (1008, 506, 'cancelled', 'France', 'credit_card', 78.99, 1705320700);
INSERT INTO orders VALUES (1009, 503, 'pending', 'Canada', 'paypal', 156.75, 1705320800);
INSERT INTO orders VALUES (1010, 507, 'shipped', 'USA', 'credit_card', 423.00, 1705320900);
Analyze Order Patterns¶
SELECT status, COUNT(*) as order_count, SUM(order_total) as total_revenue
FROM orders
GROUP BY status
ORDER BY order_count DESC;
Output:
┌───────────┬───────┬────────┐
│ group_0 │ agg_0 │ agg_1 │
├───────────┼───────┼────────┤
│ pending │ 4 │ 562.23 │
│ shipped │ 3 │ 980.5 │
│ delivered │ 2 │ 112.25 │
│ cancelled │ 1 │ 78.99 │
└───────────┴───────┴────────┘
Customer Analysis with Joins¶
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name TEXT,
email TEXT,
tier TEXT STORAGE DICTIONARY
);
INSERT INTO customers VALUES (501, 'Alice Johnson', 'alice@email.com', 'gold');
INSERT INTO customers VALUES (502, 'Bob Smith', 'bob@email.com', 'silver');
INSERT INTO customers VALUES (503, 'Carol White', 'carol@email.com', 'gold');
INSERT INTO customers VALUES (504, 'David Brown', 'david@email.com', 'bronze');
INSERT INTO customers VALUES (505, 'Eva Martinez', 'eva@email.com', 'silver');
SELECT c.name, c.tier, COUNT(*) as total_orders, SUM(o.order_total) as lifetime_value
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.tier
ORDER BY lifetime_value DESC;
Output:
┌───────────────┬────────┬───────┬────────┐
│ group_1 │ group_2│ agg_0 │ agg_1 │
├───────────────┼────────┼───────┼────────┤
│ Carol White │ gold │ 2 │ 401.75 │
│ Eva Martinez │ silver │ 1 │ 312.5 │
│ Alice Johnson │ gold │ 2 │ 192.75 │
│ David Brown │ bronze │ 1 │ 189.99 │
│ Bob Smith │ silver │ 2 │ 134.99 │
└───────────────┴────────┴───────┴────────┘
Filter by Dictionary-Encoded Columns¶
SELECT o.order_id, c.name, o.status, o.order_total
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'pending' AND o.shipping_country = 'USA'
ORDER BY o.order_total DESC;
3. Content-Addressed Storage: Document Deduplication¶
Scenario¶
Notification system where many users receive identical email templates. Store once, reference everywhere.
Create Tables¶
CREATE TABLE email_templates (
template_id INT PRIMARY KEY,
template_name TEXT,
html_content TEXT STORAGE CONTENT_ADDRESSED,
category TEXT STORAGE DICTIONARY
);
CREATE TABLE notifications (
notification_id INT PRIMARY KEY,
user_id INT,
template_id INT,
sent_at INT8,
status TEXT STORAGE DICTIONARY
);
Insert Templates¶
INSERT INTO email_templates VALUES (
1,
'Welcome Email',
'<html><head><style>body{font-family:Arial,sans-serif;margin:0;padding:20px;background:#f5f5f5}.container{max-width:600px;margin:0 auto;background:white;border-radius:8px;padding:40px}.header{text-align:center;border-bottom:2px solid #007bff;padding-bottom:20px}.title{color:#333;font-size:28px;margin:20px 0}.content{color:#666;line-height:1.6;padding:20px 0}.button{display:inline-block;background:#007bff;color:white;padding:12px 30px;text-decoration:none;border-radius:5px;margin:20px 0}.footer{text-align:center;color:#999;font-size:12px;padding-top:20px;border-top:1px solid #eee}</style></head><body><div class="container"><div class="header"><h1 class="title">Welcome!</h1></div></div></body></html>',
'onboarding'
);
INSERT INTO email_templates VALUES (
2,
'Order Confirmation',
'<html><body>Your order has been confirmed!</body></html>',
'transactional'
);
INSERT INTO email_templates VALUES (
3,
'Password Reset',
'<html><body>Click here to reset your password.</body></html>',
'security'
);
SELECT template_id, template_name, category FROM email_templates ORDER BY template_id;
Log Notifications¶
INSERT INTO notifications VALUES (1, 1001, 1, 1705400000, 'delivered');
INSERT INTO notifications VALUES (2, 1002, 1, 1705400100, 'delivered');
INSERT INTO notifications VALUES (3, 1003, 1, 1705400200, 'delivered');
INSERT INTO notifications VALUES (4, 1004, 2, 1705400300, 'delivered');
INSERT INTO notifications VALUES (5, 1005, 2, 1705400400, 'delivered');
INSERT INTO notifications VALUES (6, 1001, 2, 1705400500, 'delivered');
INSERT INTO notifications VALUES (7, 1006, 3, 1705400600, 'delivered');
INSERT INTO notifications VALUES (8, 1007, 1, 1705400700, 'bounced');
Analyze Usage¶
SELECT t.template_name, t.category, COUNT(*) as times_sent
FROM email_templates t
INNER JOIN notifications n ON t.template_id = n.template_id
GROUP BY t.template_id, t.template_name, t.category
ORDER BY times_sent DESC;
Output:
┌────────────────────┬───────────────┬───────┐
│ group_1 │ group_2 │ agg_0 │
├────────────────────┼───────────────┼───────┤
│ Welcome Email │ onboarding │ 4 │
│ Order Confirmation │ transactional │ 3 │
│ Password Reset │ security │ 1 │
└────────────────────┴───────────────┴───────┘
Storage benefit: With CAS, 100K notifications using 3 templates stores template content only 3 times instead of 100K times.
4. Columnar Storage: Time-Series Analytics¶
Scenario¶
IoT monitoring system with sensors reporting metrics every second. Aggregations over time ranges are critical.
Create Schema¶
CREATE TABLE sensor_readings (
reading_id INT PRIMARY KEY,
sensor_id INT,
sensor_type TEXT STORAGE DICTIONARY,
location TEXT STORAGE DICTIONARY,
timestamp INT8 STORAGE COLUMNAR,
temperature FLOAT8 STORAGE COLUMNAR,
humidity FLOAT8 STORAGE COLUMNAR,
pressure FLOAT8 STORAGE COLUMNAR
);
Insert Sensor Data¶
INSERT INTO sensor_readings VALUES (1, 101, 'environmental', 'building_a_floor1', 1705500000, 22.5, 45.2, 1013.25);
INSERT INTO sensor_readings VALUES (2, 101, 'environmental', 'building_a_floor1', 1705500060, 22.6, 45.0, 1013.20);
INSERT INTO sensor_readings VALUES (3, 101, 'environmental', 'building_a_floor1', 1705500120, 22.7, 44.8, 1013.15);
INSERT INTO sensor_readings VALUES (4, 102, 'environmental', 'building_a_floor2', 1705500000, 21.0, 48.5, 1013.30);
INSERT INTO sensor_readings VALUES (5, 102, 'environmental', 'building_a_floor2', 1705500060, 21.2, 48.2, 1013.25);
INSERT INTO sensor_readings VALUES (6, 201, 'industrial', 'building_b_warehouse', 1705500000, 18.0, 35.0, 1012.00);
INSERT INTO sensor_readings VALUES (7, 201, 'industrial', 'building_b_warehouse', 1705500060, 18.2, 35.5, 1012.05);
INSERT INTO sensor_readings VALUES (8, 301, 'outdoor', 'campus_entrance', 1705500000, 5.2, 78.0, 1015.50);
INSERT INTO sensor_readings VALUES (9, 301, 'outdoor', 'campus_entrance', 1705500060, 5.0, 79.0, 1015.55);
Aggregations (Columnar Advantage)¶
SELECT sensor_type, COUNT(*) as readings, AVG(temperature) as avg_temp, AVG(humidity) as avg_humidity
FROM sensor_readings
GROUP BY sensor_type
ORDER BY avg_temp DESC;
Output:
┌───────────────┬───────┬───────┬────────┐
│ group_0 │ agg_0 │ agg_1 │ agg_2 │
├───────────────┼───────┼───────┼────────┤
│ environmental │ 5 │ 22.0 │ 46.34 │
│ industrial │ 2 │ 18.1 │ 35.25 │
│ outdoor │ 2 │ 5.1 │ 78.5 │
└───────────────┴───────┴───────┴────────┘
Min/Max Analysis¶
SELECT location, MIN(temperature) as min_temp, MAX(temperature) as max_temp
FROM sensor_readings
GROUP BY location
ORDER BY location;
Output:
┌──────────────────────┬───────┬───────┐
│ group_0 │ agg_0 │ agg_1 │
├──────────────────────┼───────┼───────┤
│ building_a_floor1 │ 22.5 │ 22.7 │
│ building_a_floor2 │ 21.0 │ 21.2 │
│ building_b_warehouse │ 18.0 │ 18.2 │
│ campus_entrance │ 5.0 │ 5.2 │
└──────────────────────┴───────┴───────┘
5. Mixed Modes: Event Tracking System¶
Scenario¶
SaaS application event tracking combining all storage modes for optimal performance.
Create Schema¶
CREATE TABLE events (
event_id INT PRIMARY KEY,
event_type TEXT STORAGE DICTIONARY,
source_app TEXT STORAGE DICTIONARY,
user_country TEXT STORAGE DICTIONARY,
device_type TEXT STORAGE DICTIONARY,
timestamp INT8 STORAGE COLUMNAR,
response_time_ms INT STORAGE COLUMNAR,
payload_size INT STORAGE COLUMNAR,
error_stack TEXT STORAGE CONTENT_ADDRESSED,
user_id INT,
session_id TEXT,
request_path TEXT
);
Insert Events¶
INSERT INTO events VALUES (1, 'page_view', 'web_app', 'USA', 'desktop', 1705600000, 45, 1024, NULL, 1001, 'sess_abc', '/dashboard');
INSERT INTO events VALUES (2, 'page_view', 'web_app', 'USA', 'mobile', 1705600010, 120, 512, NULL, 1002, 'sess_def', '/products');
INSERT INTO events VALUES (3, 'page_view', 'web_app', 'UK', 'desktop', 1705600020, 52, 1024, NULL, 1003, 'sess_ghi', '/dashboard');
INSERT INTO events VALUES (4, 'api_call', 'web_app', 'USA', 'desktop', 1705600050, 23, 256, NULL, 1001, 'sess_abc', '/api/users');
INSERT INTO events VALUES (5, 'api_call', 'mobile_app', 'USA', 'mobile', 1705600060, 156, 512, NULL, 1002, 'sess_def', '/api/products');
INSERT INTO events VALUES (6, 'error', 'web_app', 'USA', 'desktop', 1705600080, 5000, 2048, 'Error: Connection timeout at db.js:142', 1007, 'sess_stu', '/api/users/123');
INSERT INTO events VALUES (7, 'error', 'web_app', 'UK', 'mobile', 1705600090, 5200, 2048, 'Error: Connection timeout at db.js:142', 1008, 'sess_vwx', '/api/users/456');
INSERT INTO events VALUES (8, 'transaction', 'web_app', 'USA', 'desktop', 1705600150, 234, 512, NULL, 1001, 'sess_abc', '/checkout');
INSERT INTO events VALUES (9, 'login', 'mobile_app', 'France', 'mobile', 1705600190, 203, 192, NULL, 1014, 'sess_nop', '/auth/login');
Analytics Queries¶
SELECT event_type, COUNT(*) as count, AVG(response_time_ms) as avg_response_ms
FROM events
GROUP BY event_type
ORDER BY count DESC;
Output:
┌─────────────┬───────┬───────┐
│ group_0 │ agg_0 │ agg_1 │
├─────────────┼───────┼───────┤
│ page_view │ 3 │ 72.33 │
│ api_call │ 2 │ 89.5 │
│ error │ 2 │ 5100 │
│ transaction │ 1 │ 234 │
│ login │ 1 │ 203 │
└─────────────┴───────┴───────┘
SELECT device_type, COUNT(*) as events, SUM(payload_size) as total_bytes
FROM events
GROUP BY device_type
ORDER BY events DESC;
Output:
┌─────────┬───────┬───────┐
│ group_0 │ agg_0 │ agg_1 │
├─────────┼───────┼───────┤
│ desktop │ 5 │ 4864 │
│ mobile │ 4 │ 3264 │
└─────────┴───────┴───────┘
Error Analysis (CAS Benefit)¶
SELECT error_stack, COUNT(*) as occurrences
FROM events
WHERE event_type = 'error' AND error_stack IS NOT NULL
GROUP BY error_stack
ORDER BY occurrences DESC;
Output:
┌────────────────────────────────────────┬───────┐
│ group_0 │ agg_0 │
├────────────────────────────────────────┼───────┤
│ Error: Connection timeout at db.js:142 │ 2 │
└────────────────────────────────────────┴───────┘
Same error stack stored once, referenced twice.
6. Migration Strategies¶
Online Schema Evolution¶
Migrate existing columns to new storage modes without downtime.
CREATE TABLE products (
id INT PRIMARY KEY,
name TEXT,
category TEXT,
brand TEXT,
price FLOAT8
);
INSERT INTO products VALUES (1, 'Laptop Pro', 'electronics', 'TechCorp', 1299.99);
INSERT INTO products VALUES (2, 'Wireless Mouse', 'electronics', 'TechCorp', 49.99);
INSERT INTO products VALUES (3, 'USB Cable', 'electronics', 'CableCo', 9.99);
INSERT INTO products VALUES (4, 'Office Chair', 'furniture', 'ComfortPlus', 299.99);
Migrate to Dictionary¶
SELECT category, brand, COUNT(*), AVG(price)
FROM products
GROUP BY category, brand
ORDER BY COUNT(*) DESC;
ALTER TABLE products ALTER COLUMN category SET STORAGE DICTIONARY;
ALTER TABLE products ALTER COLUMN brand SET STORAGE DICTIONARY;
SELECT category, brand, COUNT(*), AVG(price)
FROM products
GROUP BY category, brand
ORDER BY COUNT(*) DESC;
Results are identical - migration is transparent.
Migrate to Columnar¶
ALTER TABLE products ALTER COLUMN price SET STORAGE COLUMNAR;
SELECT category, COUNT(*), MIN(price), MAX(price), SUM(price)
FROM products
GROUP BY category;
Output:
┌─────────────┬───────┬────────┬─────────┬─────────┐
│ group_0 │ agg_0 │ agg_1 │ agg_2 │ agg_3 │
├─────────────┼───────┼────────┼─────────┼─────────┤
│ electronics │ 3 │ 9.99 │ 1299.99 │ 1359.97 │
│ furniture │ 1 │ 299.99 │ 299.99 │ 299.99 │
└─────────────┴───────┴────────┴─────────┴─────────┘
Revert to Default¶
ALTER TABLE products ALTER COLUMN category SET STORAGE DEFAULT;
ALTER TABLE products ALTER COLUMN brand SET STORAGE DEFAULT;
ALTER TABLE products ALTER COLUMN price SET STORAGE DEFAULT;
SELECT * FROM products ORDER BY id;
All data preserved:
┌────┬────────────────┬─────────────┬─────────────┬─────────┐
│ id │ name │ category │ brand │ price │
├────┼────────────────┼─────────────┼─────────────┼─────────┤
│ 1 │ Laptop Pro │ electronics │ TechCorp │ 1299.99 │
│ 2 │ Wireless Mouse │ electronics │ TechCorp │ 49.99 │
│ 3 │ USB Cable │ electronics │ CableCo │ 9.99 │
│ 4 │ Office Chair │ furniture │ ComfortPlus │ 299.99 │
└────┴────────────────┴─────────────┴─────────────┴─────────┘
Quick Reference¶
Choosing the Right Mode¶
| Column Characteristics | Recommended Mode | Example |
|---|---|---|
| < 64K unique strings | DICTIONARY |
status, country, category |
| Large values (>1KB) with duplicates | CONTENT_ADDRESSED |
templates, JSON payloads |
| Numeric sequences for aggregations | COLUMNAR |
timestamps, metrics, prices |
| Unique text, mixed queries | DEFAULT |
names, descriptions, IDs |
SQL Syntax¶
-- At table creation
CREATE TABLE t (
col1 TEXT STORAGE DICTIONARY,
col2 TEXT STORAGE CONTENT_ADDRESSED,
col3 FLOAT8 STORAGE COLUMNAR
);
-- Online migration
ALTER TABLE t ALTER COLUMN col SET STORAGE DICTIONARY;
ALTER TABLE t ALTER COLUMN col SET STORAGE CONTENT_ADDRESSED;
ALTER TABLE t ALTER COLUMN col SET STORAGE COLUMNAR;
ALTER TABLE t ALTER COLUMN col SET STORAGE DEFAULT;
Performance Guidelines¶
- Dictionary: Best when same values repeat thousands of times
- Content-Addressed: Best for values >1KB that duplicate across rows
- Columnar: Best for numeric columns with SUM/AVG/COUNT queries
- Mixed Modes: Real-world tables often combine multiple modes
See Advanced Features Tutorial for EXPLAIN, Vector Search, Materialized Views, and Bulk Loading.