Skip to content

HeliosDB-Lite Advanced Features Tutorial

Hands-on guide to EXPLAIN, Vector Search, Materialized Views, and Bulk Loading.

Prerequisites: HeliosDB-Lite REPL running (./heliosdb-lite repl -m for in-memory mode)

See also: Storage Modes Tutorial for per-column storage optimization.


Table of Contents

  1. Query Analysis with EXPLAIN
  2. Vector Search with HNSW Indexes
  3. Materialized Views
  4. Bulk Loading for Performance

1. Query Analysis with EXPLAIN

Basic EXPLAIN

CREATE TABLE events (
    event_id INT PRIMARY KEY,
    event_type TEXT STORAGE DICTIONARY,
    response_time_ms INT STORAGE COLUMNAR
);

INSERT INTO events VALUES (1, 'page_view', 45);
INSERT INTO events VALUES (2, 'api_call', 23);
INSERT INTO events VALUES (3, 'page_view', 67);

EXPLAIN SELECT event_type, COUNT(*) FROM events GROUP BY event_type;

Shows the query execution plan.

EXPLAIN Modes

EXPLAIN (VERBOSE) SELECT event_type, AVG(response_time_ms) FROM events GROUP BY event_type;

Verbose mode shows cost estimates and cardinality.

EXPLAIN (ANALYZE) SELECT * FROM events WHERE event_type = 'page_view';

Analyze mode executes the query and shows actual timing.

Output Formats

EXPLAIN (FORMAT JSON) SELECT * FROM events;

EXPLAIN (FORMAT YAML) SELECT event_type, SUM(response_time_ms) FROM events GROUP BY event_type;

Machine-readable formats for programmatic analysis.

DROP TABLE events;

2. Vector Search with HNSW Indexes

Create a Knowledge Base

CREATE TABLE knowledge_articles (
    article_id INT PRIMARY KEY,
    title TEXT,
    content TEXT STORAGE CONTENT_ADDRESSED,
    category TEXT STORAGE DICTIONARY,
    embedding VECTOR(4)
);

Insert Articles with Embeddings

INSERT INTO knowledge_articles VALUES (
    1,
    'Getting Started Guide',
    'Welcome to our platform. This guide helps you get started.',
    'onboarding',
    '[0.1, 0.8, 0.2, 0.5]'
);

INSERT INTO knowledge_articles VALUES (
    2,
    'Password Reset Help',
    'To reset your password, click Forgot Password on the login page.',
    'security',
    '[0.3, 0.2, 0.9, 0.1]'
);

INSERT INTO knowledge_articles VALUES (
    3,
    'Billing FAQ',
    'Common questions about billing, invoices, and payment methods.',
    'billing',
    '[0.7, 0.1, 0.3, 0.8]'
);

INSERT INTO knowledge_articles VALUES (
    4,
    'API Documentation',
    'Complete reference for REST API endpoints and authentication.',
    'developer',
    '[0.2, 0.6, 0.4, 0.7]'
);

INSERT INTO knowledge_articles VALUES (
    5,
    'Account Settings',
    'Manage your profile, notifications, and privacy settings.',
    'onboarding',
    '[0.15, 0.75, 0.25, 0.55]'
);

Create HNSW Index

CREATE INDEX idx_kb_embedding ON knowledge_articles USING hnsw (embedding);

Find articles similar to a query embedding (e.g., "how do I change my password"):

SELECT article_id, title, category, embedding <-> '[0.25, 0.3, 0.85, 0.15]' as distance
FROM knowledge_articles
ORDER BY embedding <-> '[0.25, 0.3, 0.85, 0.15]'
LIMIT 3;

Output:

┌────────────┬───────────────────────┬──────────┬──────────┐
│ article_id │ title                 │ category │ distance │
├────────────┼───────────────────────┼──────────┼──────────┤
│ 2          │ Password Reset Help   │ security │ 0.122    │
│ 4          │ API Documentation     │ developer│ 0.548    │
│ 5          │ Account Settings      │ onboarding│ 0.612   │
└────────────┴───────────────────────┴──────────┴──────────┘

Distance Operators

  • <-> L2 distance (Euclidean)
  • <=> Cosine distance
  • <#> Inner product
SELECT title, category, embedding <=> '[0.1, 0.7, 0.3, 0.5]' as cosine_distance
FROM knowledge_articles
WHERE category = 'onboarding'
ORDER BY cosine_distance
LIMIT 2;
DROP TABLE knowledge_articles;

3. Materialized Views

Create Base Table

CREATE TABLE sales (
    sale_id INT PRIMARY KEY,
    product_id INT,
    category TEXT STORAGE DICTIONARY,
    amount FLOAT8 STORAGE COLUMNAR,
    sale_date INT8 STORAGE COLUMNAR
);

INSERT INTO sales VALUES (1, 101, 'electronics', 299.99, 1705600000);
INSERT INTO sales VALUES (2, 102, 'electronics', 149.99, 1705600100);
INSERT INTO sales VALUES (3, 201, 'furniture', 599.99, 1705600200);
INSERT INTO sales VALUES (4, 101, 'electronics', 299.99, 1705600300);
INSERT INTO sales VALUES (5, 301, 'clothing', 49.99, 1705600400);
INSERT INTO sales VALUES (6, 102, 'electronics', 149.99, 1705600500);
INSERT INTO sales VALUES (7, 201, 'furniture', 599.99, 1705600600);
INSERT INTO sales VALUES (8, 302, 'clothing', 79.99, 1705600700);

Create Materialized View

CREATE MATERIALIZED VIEW category_summary AS
SELECT
    category,
    COUNT(*) as sale_count,
    SUM(amount) as total_revenue,
    AVG(amount) as avg_sale
FROM sales
GROUP BY category;

Query the View

SELECT * FROM category_summary ORDER BY total_revenue DESC;

Output:

┌─────────────┬────────────┬───────────────┬──────────┐
│ category    │ sale_count │ total_revenue │ avg_sale │
├─────────────┼────────────┼───────────────┼──────────┤
│ furniture   │ 2          │ 1199.98       │ 599.99   │
│ electronics │ 4          │ 899.96        │ 224.99   │
│ clothing    │ 2          │ 129.98        │ 64.99    │
└─────────────┴────────────┴───────────────┴──────────┘

Refresh After Changes

INSERT INTO sales VALUES (9, 103, 'electronics', 999.99, 1705600800);

REFRESH MATERIALIZED VIEW category_summary;

SELECT * FROM category_summary ORDER BY total_revenue DESC;

Product-Level View

CREATE MATERIALIZED VIEW product_summary AS
SELECT
    product_id,
    category,
    COUNT(*) as times_sold,
    SUM(amount) as total_revenue
FROM sales
GROUP BY product_id, category;

SELECT * FROM product_summary ORDER BY total_revenue DESC;
DROP MATERIALIZED VIEW product_summary;
DROP MATERIALIZED VIEW category_summary;
DROP TABLE sales;

4. Bulk Loading for Performance

Create Table with Optimized Storage

CREATE TABLE bulk_events (
    id INT PRIMARY KEY,
    event_type TEXT STORAGE DICTIONARY,
    region TEXT STORAGE DICTIONARY,
    timestamp INT8 STORAGE COLUMNAR,
    value FLOAT8 STORAGE COLUMNAR
);

Insert Data

INSERT INTO bulk_events VALUES (1, 'click', 'us-east', 1705700000, 1.5);
INSERT INTO bulk_events VALUES (2, 'click', 'us-west', 1705700001, 2.3);
INSERT INTO bulk_events VALUES (3, 'view', 'eu-west', 1705700002, 0.8);
INSERT INTO bulk_events VALUES (4, 'click', 'us-east', 1705700003, 1.9);
INSERT INTO bulk_events VALUES (5, 'purchase', 'us-east', 1705700004, 99.99);
INSERT INTO bulk_events VALUES (6, 'view', 'ap-south', 1705700005, 0.5);
INSERT INTO bulk_events VALUES (7, 'click', 'eu-west', 1705700006, 2.1);
INSERT INTO bulk_events VALUES (8, 'view', 'us-west', 1705700007, 0.7);
INSERT INTO bulk_events VALUES (9, 'purchase', 'eu-west', 1705700008, 149.99);
INSERT INTO bulk_events VALUES (10, 'click', 'ap-south', 1705700009, 1.2);

Note: For maximum performance (500K-700K rows/sec), use the Rust API:

// Enable bulk mode via API
db.set_bulk_load_mode(true);
// ... inserts ...
db.set_bulk_load_mode(false);

// Or use direct_bulk_load for 700K-900K rows/sec
db.direct_bulk_load("bulk_events", &rows, batch_size)?;

Analyze Results

SELECT event_type, region, COUNT(*), SUM(value) as total_value
FROM bulk_events
GROUP BY event_type, region
ORDER BY total_value DESC;

Output:

┌───────────┬──────────┬───────┬─────────────┐
│ group_0   │ group_1  │ agg_0 │ agg_1       │
├───────────┼──────────┼───────┼─────────────┤
│ purchase  │ eu-west  │ 1     │ 149.99      │
│ purchase  │ us-east  │ 1     │ 99.99       │
│ click     │ us-east  │ 2     │ 3.4         │
│ click     │ us-west  │ 1     │ 2.3         │
│ click     │ eu-west  │ 1     │ 2.1         │
│ click     │ ap-south │ 1     │ 1.2         │
│ view      │ eu-west  │ 1     │ 0.8         │
│ view      │ us-west  │ 1     │ 0.7         │
│ view      │ ap-south │ 1     │ 0.5         │
└───────────┴──────────┴───────┴─────────────┘

Performance Comparison

Method Throughput
Standard INSERT 30K-50K rows/sec
Bulk Load Mode 500K-700K rows/sec
direct_bulk_load API 700K-900K rows/sec
DROP TABLE bulk_events;

Combining Features

Real-World Example: Analytics Pipeline

CREATE TABLE raw_events (
    id INT PRIMARY KEY,
    event_type TEXT STORAGE DICTIONARY,
    user_country TEXT STORAGE DICTIONARY,
    timestamp INT8 STORAGE COLUMNAR,
    value FLOAT8 STORAGE COLUMNAR,
    embedding VECTOR(4)
);

INSERT INTO raw_events VALUES (1, 'search', 'USA', 1705800000, 1.0, '[0.1, 0.2, 0.3, 0.4]');
INSERT INTO raw_events VALUES (2, 'click', 'USA', 1705800001, 2.5, '[0.2, 0.3, 0.4, 0.5]');
INSERT INTO raw_events VALUES (3, 'search', 'UK', 1705800002, 1.0, '[0.1, 0.25, 0.35, 0.45]');
INSERT INTO raw_events VALUES (4, 'purchase', 'Germany', 1705800003, 99.99, '[0.8, 0.1, 0.2, 0.9]');
INSERT INTO raw_events VALUES (5, 'click', 'France', 1705800004, 3.0, '[0.3, 0.4, 0.5, 0.6]');

CREATE INDEX idx_events_embedding ON raw_events USING hnsw (embedding);

CREATE MATERIALIZED VIEW event_summary AS
SELECT event_type, user_country, COUNT(*), SUM(value), AVG(value)
FROM raw_events
GROUP BY event_type, user_country;

SELECT * FROM event_summary ORDER BY SUM(value) DESC;

SELECT id, event_type, embedding <-> '[0.15, 0.25, 0.35, 0.45]' as distance
FROM raw_events
ORDER BY distance
LIMIT 3;

EXPLAIN (VERBOSE) SELECT event_type, AVG(value) FROM raw_events GROUP BY event_type;

DROP MATERIALIZED VIEW event_summary;
DROP TABLE raw_events;

Quick Reference

EXPLAIN Options

Option Description
VERBOSE Show costs and cardinality
ANALYZE Execute and show timing
FORMAT JSON JSON output
FORMAT YAML YAML output

Vector Distance Operators

Operator Distance Type
<-> L2 (Euclidean)
<=> Cosine
<#> Inner Product

Materialized View Commands

CREATE MATERIALIZED VIEW name AS SELECT ...;
REFRESH MATERIALIZED VIEW name;
DROP MATERIALIZED VIEW name;

Bulk Loading (Rust API)

db.set_bulk_load_mode(true);
// INSERT statements
db.set_bulk_load_mode(false);

// Or for maximum speed:
db.direct_bulk_load("table", &rows, batch_size)?;

See Storage Modes Tutorial for Dictionary, Content-Addressed, and Columnar storage optimization.