HeliosDB-Lite Advanced Features Tutorial¶
Hands-on guide to EXPLAIN, Vector Search, Materialized Views, and Bulk Loading.
Prerequisites: HeliosDB-Lite REPL running (./heliosdb-lite repl -m for in-memory mode)
See also: Storage Modes Tutorial for per-column storage optimization.
Table of Contents¶
- Query Analysis with EXPLAIN
- Vector Search with HNSW Indexes
- Materialized Views
- Bulk Loading for Performance
1. Query Analysis with EXPLAIN¶
Basic EXPLAIN¶
CREATE TABLE events (
event_id INT PRIMARY KEY,
event_type TEXT STORAGE DICTIONARY,
response_time_ms INT STORAGE COLUMNAR
);
INSERT INTO events VALUES (1, 'page_view', 45);
INSERT INTO events VALUES (2, 'api_call', 23);
INSERT INTO events VALUES (3, 'page_view', 67);
EXPLAIN SELECT event_type, COUNT(*) FROM events GROUP BY event_type;
Shows the query execution plan.
EXPLAIN Modes¶
Verbose mode shows cost estimates and cardinality.
Analyze mode executes the query and shows actual timing.
Output Formats¶
EXPLAIN (FORMAT JSON) SELECT * FROM events;
EXPLAIN (FORMAT YAML) SELECT event_type, SUM(response_time_ms) FROM events GROUP BY event_type;
Machine-readable formats for programmatic analysis.
2. Vector Search with HNSW Indexes¶
Create a Knowledge Base¶
CREATE TABLE knowledge_articles (
article_id INT PRIMARY KEY,
title TEXT,
content TEXT STORAGE CONTENT_ADDRESSED,
category TEXT STORAGE DICTIONARY,
embedding VECTOR(4)
);
Insert Articles with Embeddings¶
INSERT INTO knowledge_articles VALUES (
1,
'Getting Started Guide',
'Welcome to our platform. This guide helps you get started.',
'onboarding',
'[0.1, 0.8, 0.2, 0.5]'
);
INSERT INTO knowledge_articles VALUES (
2,
'Password Reset Help',
'To reset your password, click Forgot Password on the login page.',
'security',
'[0.3, 0.2, 0.9, 0.1]'
);
INSERT INTO knowledge_articles VALUES (
3,
'Billing FAQ',
'Common questions about billing, invoices, and payment methods.',
'billing',
'[0.7, 0.1, 0.3, 0.8]'
);
INSERT INTO knowledge_articles VALUES (
4,
'API Documentation',
'Complete reference for REST API endpoints and authentication.',
'developer',
'[0.2, 0.6, 0.4, 0.7]'
);
INSERT INTO knowledge_articles VALUES (
5,
'Account Settings',
'Manage your profile, notifications, and privacy settings.',
'onboarding',
'[0.15, 0.75, 0.25, 0.55]'
);
Create HNSW Index¶
Semantic Search¶
Find articles similar to a query embedding (e.g., "how do I change my password"):
SELECT article_id, title, category, embedding <-> '[0.25, 0.3, 0.85, 0.15]' as distance
FROM knowledge_articles
ORDER BY embedding <-> '[0.25, 0.3, 0.85, 0.15]'
LIMIT 3;
Output:
┌────────────┬───────────────────────┬──────────┬──────────┐
│ article_id │ title │ category │ distance │
├────────────┼───────────────────────┼──────────┼──────────┤
│ 2 │ Password Reset Help │ security │ 0.122 │
│ 4 │ API Documentation │ developer│ 0.548 │
│ 5 │ Account Settings │ onboarding│ 0.612 │
└────────────┴───────────────────────┴──────────┴──────────┘
Distance Operators¶
<->L2 distance (Euclidean)<=>Cosine distance<#>Inner product
SELECT title, category, embedding <=> '[0.1, 0.7, 0.3, 0.5]' as cosine_distance
FROM knowledge_articles
WHERE category = 'onboarding'
ORDER BY cosine_distance
LIMIT 2;
3. Materialized Views¶
Create Base Table¶
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
category TEXT STORAGE DICTIONARY,
amount FLOAT8 STORAGE COLUMNAR,
sale_date INT8 STORAGE COLUMNAR
);
INSERT INTO sales VALUES (1, 101, 'electronics', 299.99, 1705600000);
INSERT INTO sales VALUES (2, 102, 'electronics', 149.99, 1705600100);
INSERT INTO sales VALUES (3, 201, 'furniture', 599.99, 1705600200);
INSERT INTO sales VALUES (4, 101, 'electronics', 299.99, 1705600300);
INSERT INTO sales VALUES (5, 301, 'clothing', 49.99, 1705600400);
INSERT INTO sales VALUES (6, 102, 'electronics', 149.99, 1705600500);
INSERT INTO sales VALUES (7, 201, 'furniture', 599.99, 1705600600);
INSERT INTO sales VALUES (8, 302, 'clothing', 79.99, 1705600700);
Create Materialized View¶
CREATE MATERIALIZED VIEW category_summary AS
SELECT
category,
COUNT(*) as sale_count,
SUM(amount) as total_revenue,
AVG(amount) as avg_sale
FROM sales
GROUP BY category;
Query the View¶
Output:
┌─────────────┬────────────┬───────────────┬──────────┐
│ category │ sale_count │ total_revenue │ avg_sale │
├─────────────┼────────────┼───────────────┼──────────┤
│ furniture │ 2 │ 1199.98 │ 599.99 │
│ electronics │ 4 │ 899.96 │ 224.99 │
│ clothing │ 2 │ 129.98 │ 64.99 │
└─────────────┴────────────┴───────────────┴──────────┘
Refresh After Changes¶
INSERT INTO sales VALUES (9, 103, 'electronics', 999.99, 1705600800);
REFRESH MATERIALIZED VIEW category_summary;
SELECT * FROM category_summary ORDER BY total_revenue DESC;
Product-Level View¶
CREATE MATERIALIZED VIEW product_summary AS
SELECT
product_id,
category,
COUNT(*) as times_sold,
SUM(amount) as total_revenue
FROM sales
GROUP BY product_id, category;
SELECT * FROM product_summary ORDER BY total_revenue DESC;
4. Bulk Loading for Performance¶
Create Table with Optimized Storage¶
CREATE TABLE bulk_events (
id INT PRIMARY KEY,
event_type TEXT STORAGE DICTIONARY,
region TEXT STORAGE DICTIONARY,
timestamp INT8 STORAGE COLUMNAR,
value FLOAT8 STORAGE COLUMNAR
);
Insert Data¶
INSERT INTO bulk_events VALUES (1, 'click', 'us-east', 1705700000, 1.5);
INSERT INTO bulk_events VALUES (2, 'click', 'us-west', 1705700001, 2.3);
INSERT INTO bulk_events VALUES (3, 'view', 'eu-west', 1705700002, 0.8);
INSERT INTO bulk_events VALUES (4, 'click', 'us-east', 1705700003, 1.9);
INSERT INTO bulk_events VALUES (5, 'purchase', 'us-east', 1705700004, 99.99);
INSERT INTO bulk_events VALUES (6, 'view', 'ap-south', 1705700005, 0.5);
INSERT INTO bulk_events VALUES (7, 'click', 'eu-west', 1705700006, 2.1);
INSERT INTO bulk_events VALUES (8, 'view', 'us-west', 1705700007, 0.7);
INSERT INTO bulk_events VALUES (9, 'purchase', 'eu-west', 1705700008, 149.99);
INSERT INTO bulk_events VALUES (10, 'click', 'ap-south', 1705700009, 1.2);
Note: For maximum performance (500K-700K rows/sec), use the Rust API:
// Enable bulk mode via API
db.set_bulk_load_mode(true);
// ... inserts ...
db.set_bulk_load_mode(false);
// Or use direct_bulk_load for 700K-900K rows/sec
db.direct_bulk_load("bulk_events", &rows, batch_size)?;
Analyze Results¶
SELECT event_type, region, COUNT(*), SUM(value) as total_value
FROM bulk_events
GROUP BY event_type, region
ORDER BY total_value DESC;
Output:
┌───────────┬──────────┬───────┬─────────────┐
│ group_0 │ group_1 │ agg_0 │ agg_1 │
├───────────┼──────────┼───────┼─────────────┤
│ purchase │ eu-west │ 1 │ 149.99 │
│ purchase │ us-east │ 1 │ 99.99 │
│ click │ us-east │ 2 │ 3.4 │
│ click │ us-west │ 1 │ 2.3 │
│ click │ eu-west │ 1 │ 2.1 │
│ click │ ap-south │ 1 │ 1.2 │
│ view │ eu-west │ 1 │ 0.8 │
│ view │ us-west │ 1 │ 0.7 │
│ view │ ap-south │ 1 │ 0.5 │
└───────────┴──────────┴───────┴─────────────┘
Performance Comparison¶
| Method | Throughput |
|---|---|
| Standard INSERT | 30K-50K rows/sec |
| Bulk Load Mode | 500K-700K rows/sec |
| direct_bulk_load API | 700K-900K rows/sec |
Combining Features¶
Real-World Example: Analytics Pipeline¶
CREATE TABLE raw_events (
id INT PRIMARY KEY,
event_type TEXT STORAGE DICTIONARY,
user_country TEXT STORAGE DICTIONARY,
timestamp INT8 STORAGE COLUMNAR,
value FLOAT8 STORAGE COLUMNAR,
embedding VECTOR(4)
);
INSERT INTO raw_events VALUES (1, 'search', 'USA', 1705800000, 1.0, '[0.1, 0.2, 0.3, 0.4]');
INSERT INTO raw_events VALUES (2, 'click', 'USA', 1705800001, 2.5, '[0.2, 0.3, 0.4, 0.5]');
INSERT INTO raw_events VALUES (3, 'search', 'UK', 1705800002, 1.0, '[0.1, 0.25, 0.35, 0.45]');
INSERT INTO raw_events VALUES (4, 'purchase', 'Germany', 1705800003, 99.99, '[0.8, 0.1, 0.2, 0.9]');
INSERT INTO raw_events VALUES (5, 'click', 'France', 1705800004, 3.0, '[0.3, 0.4, 0.5, 0.6]');
CREATE INDEX idx_events_embedding ON raw_events USING hnsw (embedding);
CREATE MATERIALIZED VIEW event_summary AS
SELECT event_type, user_country, COUNT(*), SUM(value), AVG(value)
FROM raw_events
GROUP BY event_type, user_country;
SELECT * FROM event_summary ORDER BY SUM(value) DESC;
SELECT id, event_type, embedding <-> '[0.15, 0.25, 0.35, 0.45]' as distance
FROM raw_events
ORDER BY distance
LIMIT 3;
EXPLAIN (VERBOSE) SELECT event_type, AVG(value) FROM raw_events GROUP BY event_type;
DROP MATERIALIZED VIEW event_summary;
DROP TABLE raw_events;
Quick Reference¶
EXPLAIN Options¶
| Option | Description |
|---|---|
VERBOSE |
Show costs and cardinality |
ANALYZE |
Execute and show timing |
FORMAT JSON |
JSON output |
FORMAT YAML |
YAML output |
Vector Distance Operators¶
| Operator | Distance Type |
|---|---|
<-> |
L2 (Euclidean) |
<=> |
Cosine |
<#> |
Inner Product |
Materialized View Commands¶
CREATE MATERIALIZED VIEW name AS SELECT ...;
REFRESH MATERIALIZED VIEW name;
DROP MATERIALIZED VIEW name;
Bulk Loading (Rust API)¶
db.set_bulk_load_mode(true);
// INSERT statements
db.set_bulk_load_mode(false);
// Or for maximum speed:
db.direct_bulk_load("table", &rows, batch_size)?;
See Storage Modes Tutorial for Dictionary, Content-Addressed, and Columnar storage optimization.