HeliosDB-Lite Storage Modes Tutorial

A hands-on guide to per-column storage optimization with real-world applications.

Prerequisites: HeliosDB-Lite REPL running (./heliosdb-lite repl -m for in-memory mode)


Table of Contents

  1. Understanding Storage Modes
  2. Dictionary Encoding: E-Commerce Orders
  3. Content-Addressed Storage: Document Deduplication
  4. Columnar Storage: Time-Series Analytics
  5. Mixed Modes: Event Tracking System
  6. Migration Strategies

See also: Advanced Features Tutorial for EXPLAIN, Vector Search, and Materialized Views.


1. Understanding Storage Modes

What Are Storage Modes?

HeliosDB-Lite supports four per-column storage strategies:

┌───────────────────┬─────────────────────────┬────────────────────────────┬──────────────────────────┐
│ Mode              │ Best For                │ Compression                │ Use Case                 │
├───────────────────┼─────────────────────────┼────────────────────────────┼──────────────────────────┤
│ DEFAULT           │ General data            │ RocksDB LZ4 (3-7x)         │ OLTP, mixed workloads    │
│ DICTIONARY        │ Low-cardinality strings │ 50-95%                     │ Status codes, categories │
│ CONTENT_ADDRESSED │ Large duplicate values  │ ~100% dedup                │ Templates, JSON payloads │
│ COLUMNAR          │ Numeric sequences       │ 20-50% + faster aggregates │ Time-series, analytics   │
└───────────────────┴─────────────────────────┴────────────────────────────┴──────────────────────────┘

Your First Storage Mode

CREATE TABLE demo (id INT PRIMARY KEY, category TEXT, value FLOAT8);

INSERT INTO demo VALUES (1, 'electronics', 99.99);
INSERT INTO demo VALUES (2, 'electronics', 149.99);
INSERT INTO demo VALUES (3, 'clothing', 29.99);
INSERT INTO demo VALUES (4, 'electronics', 199.99);
INSERT INTO demo VALUES (5, 'clothing', 49.99);

SELECT * FROM demo ORDER BY id;

Output:

┌────┬─────────────┬────────┐
│ id │ category    │ value  │
├────┼─────────────┼────────┤
│ 1  │ electronics │ 99.99  │
│ 2  │ electronics │ 149.99 │
│ 3  │ clothing    │ 29.99  │
│ 4  │ electronics │ 199.99 │
│ 5  │ clothing    │ 49.99  │
└────┴─────────────┴────────┘

Now migrate to dictionary encoding:

ALTER TABLE demo ALTER COLUMN category SET STORAGE DICTIONARY;

SELECT * FROM demo ORDER BY id;

Data is preserved and queries work identically. New rows are encoded through the dictionary as they arrive:

INSERT INTO demo VALUES (6, 'electronics', 79.99);
INSERT INTO demo VALUES (7, 'home', 299.99);

SELECT category, COUNT(*), AVG(value) FROM demo GROUP BY category;

Output:

┌─────────────┬───────┬────────┐
│ group_0     │ agg_0 │ agg_1  │
├─────────────┼───────┼────────┤
│ clothing    │ 2     │ 39.99  │
│ electronics │ 4     │ 132.49 │
│ home        │ 1     │ 299.99 │
└─────────────┴───────┴────────┘

DROP TABLE demo;

2. Dictionary Encoding: E-Commerce Orders

Scenario

An e-commerce platform where order status, shipping country, and payment method each take only a handful of distinct values but appear across millions of rows - ideal candidates for dictionary encoding.

Create the Schema

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    status TEXT STORAGE DICTIONARY,
    shipping_country TEXT STORAGE DICTIONARY,
    payment_method TEXT STORAGE DICTIONARY,
    order_total FLOAT8,
    created_at INT8
);

Populate Data

INSERT INTO orders VALUES (1001, 501, 'pending', 'USA', 'credit_card', 125.50, 1705320000);
INSERT INTO orders VALUES (1002, 502, 'pending', 'USA', 'paypal', 89.99, 1705320100);
INSERT INTO orders VALUES (1003, 503, 'shipped', 'Canada', 'credit_card', 245.00, 1705320200);
INSERT INTO orders VALUES (1004, 501, 'delivered', 'USA', 'credit_card', 67.25, 1705320300);
INSERT INTO orders VALUES (1005, 504, 'pending', 'UK', 'paypal', 189.99, 1705320400);
INSERT INTO orders VALUES (1006, 505, 'shipped', 'Germany', 'credit_card', 312.50, 1705320500);
INSERT INTO orders VALUES (1007, 502, 'delivered', 'USA', 'apple_pay', 45.00, 1705320600);
INSERT INTO orders VALUES (1008, 506, 'cancelled', 'France', 'credit_card', 78.99, 1705320700);
INSERT INTO orders VALUES (1009, 503, 'pending', 'Canada', 'paypal', 156.75, 1705320800);
INSERT INTO orders VALUES (1010, 507, 'shipped', 'USA', 'credit_card', 423.00, 1705320900);

Analyze Order Patterns

SELECT status, COUNT(*) as order_count, SUM(order_total) as total_revenue
FROM orders
GROUP BY status
ORDER BY order_count DESC;

Output:

┌───────────┬───────┬────────┐
│ group_0   │ agg_0 │ agg_1  │
├───────────┼───────┼────────┤
│ pending   │ 4     │ 562.23 │
│ shipped   │ 3     │ 980.5  │
│ delivered │ 2     │ 112.25 │
│ cancelled │ 1     │ 78.99  │
└───────────┴───────┴────────┘

Customer Analysis with Joins

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name TEXT,
    email TEXT,
    tier TEXT STORAGE DICTIONARY
);

INSERT INTO customers VALUES (501, 'Alice Johnson', 'alice@email.com', 'gold');
INSERT INTO customers VALUES (502, 'Bob Smith', 'bob@email.com', 'silver');
INSERT INTO customers VALUES (503, 'Carol White', 'carol@email.com', 'gold');
INSERT INTO customers VALUES (504, 'David Brown', 'david@email.com', 'bronze');
INSERT INTO customers VALUES (505, 'Eva Martinez', 'eva@email.com', 'silver');

SELECT c.name, c.tier, COUNT(*) as total_orders, SUM(o.order_total) as lifetime_value
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.tier
ORDER BY lifetime_value DESC;

Output:

┌───────────────┬────────┬───────┬────────┐
│ group_1       │ group_2│ agg_0 │ agg_1  │
├───────────────┼────────┼───────┼────────┤
│ Carol White   │ gold   │ 2     │ 401.75 │
│ Eva Martinez  │ silver │ 1     │ 312.5  │
│ Alice Johnson │ gold   │ 2     │ 192.75 │
│ David Brown   │ bronze │ 1     │ 189.99 │
│ Bob Smith     │ silver │ 2     │ 134.99 │
└───────────────┴────────┴───────┴────────┘

Filter by Dictionary-Encoded Columns

SELECT o.order_id, c.name, o.status, o.order_total
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'pending' AND o.shipping_country = 'USA'
ORDER BY o.order_total DESC;
DROP TABLE orders;
DROP TABLE customers;

3. Content-Addressed Storage: Document Deduplication

Scenario

Notification system where many users receive identical email templates. Store once, reference everywhere.
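Under the hood, content-addressed storage keys each value by a hash of its bytes, so identical values collapse to a single stored copy. A rough Python sketch of the idea (the hash choice and structure are assumptions for illustration, not HeliosDB-Lite's actual implementation):

```python
import hashlib

# Illustrative content-addressed store: values are keyed by the SHA-256
# digest of their bytes, so duplicates are stored exactly once.
class ContentStore:
    def __init__(self):
        self.blobs = {}                      # digest -> value

    def put(self, value: str) -> str:
        digest = hashlib.sha256(value.encode()).hexdigest()
        self.blobs[digest] = value           # re-storing a duplicate is a no-op
        return digest                        # rows keep only this reference

    def get(self, digest: str) -> str:
        return self.blobs[digest]

store = ContentStore()
template = "<html><body>Your order has been confirmed!</body></html>"
ref1 = store.put(template)
ref2 = store.put(template)                   # same content, same reference
assert ref1 == ref2
assert len(store.blobs) == 1                 # stored once, referenced twice
```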

Create Tables

CREATE TABLE email_templates (
    template_id INT PRIMARY KEY,
    template_name TEXT,
    html_content TEXT STORAGE CONTENT_ADDRESSED,
    category TEXT STORAGE DICTIONARY
);

CREATE TABLE notifications (
    notification_id INT PRIMARY KEY,
    user_id INT,
    template_id INT,
    sent_at INT8,
    status TEXT STORAGE DICTIONARY
);

Insert Templates

INSERT INTO email_templates VALUES (
    1,
    'Welcome Email',
    '<html><head><style>body{font-family:Arial,sans-serif;margin:0;padding:20px;background:#f5f5f5}.container{max-width:600px;margin:0 auto;background:white;border-radius:8px;padding:40px}.header{text-align:center;border-bottom:2px solid #007bff;padding-bottom:20px}.title{color:#333;font-size:28px;margin:20px 0}.content{color:#666;line-height:1.6;padding:20px 0}.button{display:inline-block;background:#007bff;color:white;padding:12px 30px;text-decoration:none;border-radius:5px;margin:20px 0}.footer{text-align:center;color:#999;font-size:12px;padding-top:20px;border-top:1px solid #eee}</style></head><body><div class="container"><div class="header"><h1 class="title">Welcome!</h1></div></div></body></html>',
    'onboarding'
);

INSERT INTO email_templates VALUES (
    2,
    'Order Confirmation',
    '<html><body>Your order has been confirmed!</body></html>',
    'transactional'
);

INSERT INTO email_templates VALUES (
    3,
    'Password Reset',
    '<html><body>Click here to reset your password.</body></html>',
    'security'
);

SELECT template_id, template_name, category FROM email_templates ORDER BY template_id;

Log Notifications

INSERT INTO notifications VALUES (1, 1001, 1, 1705400000, 'delivered');
INSERT INTO notifications VALUES (2, 1002, 1, 1705400100, 'delivered');
INSERT INTO notifications VALUES (3, 1003, 1, 1705400200, 'delivered');
INSERT INTO notifications VALUES (4, 1004, 2, 1705400300, 'delivered');
INSERT INTO notifications VALUES (5, 1005, 2, 1705400400, 'delivered');
INSERT INTO notifications VALUES (6, 1001, 2, 1705400500, 'delivered');
INSERT INTO notifications VALUES (7, 1006, 3, 1705400600, 'delivered');
INSERT INTO notifications VALUES (8, 1007, 1, 1705400700, 'bounced');

Analyze Usage

SELECT t.template_name, t.category, COUNT(*) as times_sent
FROM email_templates t
INNER JOIN notifications n ON t.template_id = n.template_id
GROUP BY t.template_id, t.template_name, t.category
ORDER BY times_sent DESC;

Output:

┌────────────────────┬───────────────┬───────┐
│ group_1            │ group_2       │ agg_0 │
├────────────────────┼───────────────┼───────┤
│ Welcome Email      │ onboarding    │ 4     │
│ Order Confirmation │ transactional │ 3     │
│ Password Reset     │ security      │ 1     │
└────────────────────┴───────────────┴───────┘

Storage benefit: With CAS, 100K notifications using 3 templates stores template content only 3 times instead of 100K times.
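The savings are easy to quantify: row count times template size for naive storage, versus one copy per distinct template plus a small per-row reference. A back-of-the-envelope calculation (the byte sizes are illustrative assumptions):

```python
# Back-of-the-envelope CAS savings for the scenario above.
notifications = 100_000
distinct_templates = 3
template_bytes = 2_000       # assumed average template size
ref_bytes = 32               # assumed per-row hash reference (e.g. SHA-256)

naive = notifications * template_bytes
cas = distinct_templates * template_bytes + notifications * ref_bytes
print(naive // 1_000_000, "MB vs", cas // 1_000_000, "MB")  # 200 MB vs 3 MB
```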

DROP TABLE notifications;
DROP TABLE email_templates;

4. Columnar Storage: Time-Series Analytics

Scenario

IoT monitoring system with sensors reporting metrics every second. Aggregations over time ranges are critical.
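The core idea behind columnar storage: values of one column are laid out contiguously, so an aggregate scans only the bytes it needs instead of whole rows. A simplified Python sketch of the layout difference (not the actual storage engine):

```python
# Row layout: AVG over one field still walks every whole row.
rows = [
    (1, 101, 1705500000, 22.5),
    (2, 101, 1705500060, 22.6),
    (3, 101, 1705500120, 22.7),
]

# Columnar layout: each column is a contiguous array; AVG(temperature)
# reads only the temperature array and skips every other column.
columns = {
    "reading_id":  [r[0] for r in rows],
    "sensor_id":   [r[1] for r in rows],
    "timestamp":   [r[2] for r in rows],
    "temperature": [r[3] for r in rows],
}

temps = columns["temperature"]
avg_temp = sum(temps) / len(temps)
print(round(avg_temp, 1))   # 22.6
```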

Create Schema

CREATE TABLE sensor_readings (
    reading_id INT PRIMARY KEY,
    sensor_id INT,
    sensor_type TEXT STORAGE DICTIONARY,
    location TEXT STORAGE DICTIONARY,
    timestamp INT8 STORAGE COLUMNAR,
    temperature FLOAT8 STORAGE COLUMNAR,
    humidity FLOAT8 STORAGE COLUMNAR,
    pressure FLOAT8 STORAGE COLUMNAR
);

Insert Sensor Data

INSERT INTO sensor_readings VALUES (1, 101, 'environmental', 'building_a_floor1', 1705500000, 22.5, 45.2, 1013.25);
INSERT INTO sensor_readings VALUES (2, 101, 'environmental', 'building_a_floor1', 1705500060, 22.6, 45.0, 1013.20);
INSERT INTO sensor_readings VALUES (3, 101, 'environmental', 'building_a_floor1', 1705500120, 22.7, 44.8, 1013.15);
INSERT INTO sensor_readings VALUES (4, 102, 'environmental', 'building_a_floor2', 1705500000, 21.0, 48.5, 1013.30);
INSERT INTO sensor_readings VALUES (5, 102, 'environmental', 'building_a_floor2', 1705500060, 21.2, 48.2, 1013.25);
INSERT INTO sensor_readings VALUES (6, 201, 'industrial', 'building_b_warehouse', 1705500000, 18.0, 35.0, 1012.00);
INSERT INTO sensor_readings VALUES (7, 201, 'industrial', 'building_b_warehouse', 1705500060, 18.2, 35.5, 1012.05);
INSERT INTO sensor_readings VALUES (8, 301, 'outdoor', 'campus_entrance', 1705500000, 5.2, 78.0, 1015.50);
INSERT INTO sensor_readings VALUES (9, 301, 'outdoor', 'campus_entrance', 1705500060, 5.0, 79.0, 1015.55);

Aggregations (Columnar Advantage)

SELECT sensor_type, COUNT(*) as readings, AVG(temperature) as avg_temp, AVG(humidity) as avg_humidity
FROM sensor_readings
GROUP BY sensor_type
ORDER BY avg_temp DESC;

Output:

┌───────────────┬───────┬───────┬────────┐
│ group_0       │ agg_0 │ agg_1 │ agg_2  │
├───────────────┼───────┼───────┼────────┤
│ environmental │ 5     │ 22.0  │ 46.34  │
│ industrial    │ 2     │ 18.1  │ 35.25  │
│ outdoor       │ 2     │ 5.1   │ 78.5   │
└───────────────┴───────┴───────┴────────┘

Min/Max Analysis

SELECT location, MIN(temperature) as min_temp, MAX(temperature) as max_temp
FROM sensor_readings
GROUP BY location
ORDER BY location;

Output:

┌──────────────────────┬───────┬───────┐
│ group_0              │ agg_0 │ agg_1 │
├──────────────────────┼───────┼───────┤
│ building_a_floor1    │ 22.5  │ 22.7  │
│ building_a_floor2    │ 21.0  │ 21.2  │
│ building_b_warehouse │ 18.0  │ 18.2  │
│ campus_entrance      │ 5.0   │ 5.2   │
└──────────────────────┴───────┴───────┘

DROP TABLE sensor_readings;

5. Mixed Modes: Event Tracking System

Scenario

Event tracking for a SaaS application, combining all four storage modes in a single table for optimal performance.

Create Schema

CREATE TABLE events (
    event_id INT PRIMARY KEY,
    event_type TEXT STORAGE DICTIONARY,
    source_app TEXT STORAGE DICTIONARY,
    user_country TEXT STORAGE DICTIONARY,
    device_type TEXT STORAGE DICTIONARY,
    timestamp INT8 STORAGE COLUMNAR,
    response_time_ms INT STORAGE COLUMNAR,
    payload_size INT STORAGE COLUMNAR,
    error_stack TEXT STORAGE CONTENT_ADDRESSED,
    user_id INT,
    session_id TEXT,
    request_path TEXT
);

Insert Events

INSERT INTO events VALUES (1, 'page_view', 'web_app', 'USA', 'desktop', 1705600000, 45, 1024, NULL, 1001, 'sess_abc', '/dashboard');
INSERT INTO events VALUES (2, 'page_view', 'web_app', 'USA', 'mobile', 1705600010, 120, 512, NULL, 1002, 'sess_def', '/products');
INSERT INTO events VALUES (3, 'page_view', 'web_app', 'UK', 'desktop', 1705600020, 52, 1024, NULL, 1003, 'sess_ghi', '/dashboard');
INSERT INTO events VALUES (4, 'api_call', 'web_app', 'USA', 'desktop', 1705600050, 23, 256, NULL, 1001, 'sess_abc', '/api/users');
INSERT INTO events VALUES (5, 'api_call', 'mobile_app', 'USA', 'mobile', 1705600060, 156, 512, NULL, 1002, 'sess_def', '/api/products');
INSERT INTO events VALUES (6, 'error', 'web_app', 'USA', 'desktop', 1705600080, 5000, 2048, 'Error: Connection timeout at db.js:142', 1007, 'sess_stu', '/api/users/123');
INSERT INTO events VALUES (7, 'error', 'web_app', 'UK', 'mobile', 1705600090, 5200, 2048, 'Error: Connection timeout at db.js:142', 1008, 'sess_vwx', '/api/users/456');
INSERT INTO events VALUES (8, 'transaction', 'web_app', 'USA', 'desktop', 1705600150, 234, 512, NULL, 1001, 'sess_abc', '/checkout');
INSERT INTO events VALUES (9, 'login', 'mobile_app', 'France', 'mobile', 1705600190, 203, 192, NULL, 1014, 'sess_nop', '/auth/login');

Analytics Queries

SELECT event_type, COUNT(*) as count, AVG(response_time_ms) as avg_response_ms
FROM events
GROUP BY event_type
ORDER BY count DESC;

Output:

┌─────────────┬───────┬───────┐
│ group_0     │ agg_0 │ agg_1 │
├─────────────┼───────┼───────┤
│ page_view   │ 3     │ 72.33 │
│ api_call    │ 2     │ 89.5  │
│ error       │ 2     │ 5100  │
│ transaction │ 1     │ 234   │
│ login       │ 1     │ 203   │
└─────────────┴───────┴───────┘

SELECT device_type, COUNT(*) as events, SUM(payload_size) as total_bytes
FROM events
GROUP BY device_type
ORDER BY events DESC;

Output:

┌─────────┬───────┬───────┐
│ group_0 │ agg_0 │ agg_1 │
├─────────┼───────┼───────┤
│ desktop │ 5     │ 4864  │
│ mobile  │ 4     │ 3264  │
└─────────┴───────┴───────┘

Error Analysis (CAS Benefit)

SELECT error_stack, COUNT(*) as occurrences
FROM events
WHERE event_type = 'error' AND error_stack IS NOT NULL
GROUP BY error_stack
ORDER BY occurrences DESC;

Output:

┌────────────────────────────────────────┬───────┐
│ group_0                                │ agg_0 │
├────────────────────────────────────────┼───────┤
│ Error: Connection timeout at db.js:142 │ 2     │
└────────────────────────────────────────┴───────┘

Same error stack stored once, referenced twice.

DROP TABLE events;

6. Migration Strategies

Online Schema Evolution

Migrate existing columns to new storage modes without downtime.

CREATE TABLE products (
    id INT PRIMARY KEY,
    name TEXT,
    category TEXT,
    brand TEXT,
    price FLOAT8
);

INSERT INTO products VALUES (1, 'Laptop Pro', 'electronics', 'TechCorp', 1299.99);
INSERT INTO products VALUES (2, 'Wireless Mouse', 'electronics', 'TechCorp', 49.99);
INSERT INTO products VALUES (3, 'USB Cable', 'electronics', 'CableCo', 9.99);
INSERT INTO products VALUES (4, 'Office Chair', 'furniture', 'ComfortPlus', 299.99);

Migrate to Dictionary

SELECT category, brand, COUNT(*), AVG(price)
FROM products
GROUP BY category, brand
ORDER BY COUNT(*) DESC;

ALTER TABLE products ALTER COLUMN category SET STORAGE DICTIONARY;
ALTER TABLE products ALTER COLUMN brand SET STORAGE DICTIONARY;

SELECT category, brand, COUNT(*), AVG(price)
FROM products
GROUP BY category, brand
ORDER BY COUNT(*) DESC;

Results are identical - migration is transparent.
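One way to picture why reads keep working mid-migration: the column is re-encoded cell by cell in the background, and the read path can decode both the old and new representations. A hypothetical sketch of that dual-read pattern (this is a mental model, not HeliosDB-Lite's actual mechanism):

```python
# Hypothetical online-migration sketch: cells are rewritten one at a
# time; reads work throughout because both encodings stay decodable.
class Column:
    def __init__(self, raw):
        self.cells = [("raw", v) for v in raw]   # (encoding, payload)
        self.dictionary, self.index = [], {}

    def migrate_step(self, i):
        """Re-encode one cell to dictionary form (background work)."""
        kind, v = self.cells[i]
        if kind == "raw":
            if v not in self.index:
                self.index[v] = len(self.dictionary)
                self.dictionary.append(v)
            self.cells[i] = ("dict", self.index[v])

    def read(self, i):
        """Reads decode either encoding, so queries never block."""
        kind, payload = self.cells[i]
        return payload if kind == "raw" else self.dictionary[payload]

col = Column(["electronics", "electronics", "furniture"])
col.migrate_step(0)                       # partially migrated...
assert col.read(0) == "electronics"       # ...but reads see original values
assert col.read(2) == "furniture"
for i in range(3):
    col.migrate_step(i)
assert [col.read(i) for i in range(3)] == ["electronics", "electronics", "furniture"]
```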

Migrate to Columnar

ALTER TABLE products ALTER COLUMN price SET STORAGE COLUMNAR;

SELECT category, COUNT(*), MIN(price), MAX(price), SUM(price)
FROM products
GROUP BY category;

Output:

┌─────────────┬───────┬────────┬─────────┬─────────┐
│ group_0     │ agg_0 │ agg_1  │ agg_2   │ agg_3   │
├─────────────┼───────┼────────┼─────────┼─────────┤
│ electronics │ 3     │ 9.99   │ 1299.99 │ 1359.97 │
│ furniture   │ 1     │ 299.99 │ 299.99  │ 299.99  │
└─────────────┴───────┴────────┴─────────┴─────────┘

Revert to Default

ALTER TABLE products ALTER COLUMN category SET STORAGE DEFAULT;
ALTER TABLE products ALTER COLUMN brand SET STORAGE DEFAULT;
ALTER TABLE products ALTER COLUMN price SET STORAGE DEFAULT;

SELECT * FROM products ORDER BY id;

All data preserved:

┌────┬────────────────┬─────────────┬─────────────┬─────────┐
│ id │ name           │ category    │ brand       │ price   │
├────┼────────────────┼─────────────┼─────────────┼─────────┤
│ 1  │ Laptop Pro     │ electronics │ TechCorp    │ 1299.99 │
│ 2  │ Wireless Mouse │ electronics │ TechCorp    │ 49.99   │
│ 3  │ USB Cable      │ electronics │ CableCo     │ 9.99    │
│ 4  │ Office Chair   │ furniture   │ ComfortPlus │ 299.99  │
└────┴────────────────┴─────────────┴─────────────┴─────────┘

DROP TABLE products;

Quick Reference

Choosing the Right Mode

┌─────────────────────────────────────┬───────────────────┬─────────────────────────────┐
│ Column Characteristics              │ Recommended Mode  │ Example                     │
├─────────────────────────────────────┼───────────────────┼─────────────────────────────┤
│ < 64K unique strings                │ DICTIONARY        │ status, country, category   │
│ Large values (>1KB) with duplicates │ CONTENT_ADDRESSED │ templates, JSON payloads    │
│ Numeric sequences for aggregations  │ COLUMNAR          │ timestamps, metrics, prices │
│ Unique text, mixed queries          │ DEFAULT           │ names, descriptions, IDs    │
└─────────────────────────────────────┴───────────────────┴─────────────────────────────┘
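The table above reads as a simple decision rule. A hypothetical helper encoding that heuristic (the thresholds mirror the table; this is not a HeliosDB-Lite API):

```python
# Hypothetical mode-selection heuristic mirroring the table above.
def recommend_mode(is_numeric, distinct_count, avg_value_bytes, duplicate_ratio):
    if is_numeric:
        return "COLUMNAR"                 # numeric sequences, fast aggregates
    if avg_value_bytes > 1024 and duplicate_ratio > 0.5:
        return "CONTENT_ADDRESSED"        # large values repeated across rows
    if distinct_count < 65536:
        return "DICTIONARY"               # low-cardinality strings
    return "DEFAULT"                      # unique text, mixed queries

assert recommend_mode(False, 5, 8, 0.0) == "DICTIONARY"            # status
assert recommend_mode(False, 3, 4096, 0.9) == "CONTENT_ADDRESSED"  # templates
assert recommend_mode(True, 10**6, 8, 0.0) == "COLUMNAR"           # timestamps
assert recommend_mode(False, 10**7, 40, 0.0) == "DEFAULT"          # emails
```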

SQL Syntax

-- At table creation
CREATE TABLE t (
    col1 TEXT STORAGE DICTIONARY,
    col2 TEXT STORAGE CONTENT_ADDRESSED,
    col3 FLOAT8 STORAGE COLUMNAR
);

-- Online migration
ALTER TABLE t ALTER COLUMN col SET STORAGE DICTIONARY;
ALTER TABLE t ALTER COLUMN col SET STORAGE CONTENT_ADDRESSED;
ALTER TABLE t ALTER COLUMN col SET STORAGE COLUMNAR;
ALTER TABLE t ALTER COLUMN col SET STORAGE DEFAULT;

Performance Guidelines

  1. Dictionary: Best when same values repeat thousands of times
  2. Content-Addressed: Best for values >1KB that duplicate across rows
  3. Columnar: Best for numeric columns with SUM/AVG/COUNT queries
  4. Mixed Modes: Real-world tables often combine multiple modes

See Advanced Features Tutorial for EXPLAIN, Vector Search, Materialized Views, and Bulk Loading.