HeliosDB-Lite SQLite Layer Architecture - User Guide¶
A user-friendly explanation of how HeliosDB-Lite works under the hood, designed for developers who want to understand the system without diving into the source code.
Overview: What Makes HeliosDB-Lite Different?¶
HeliosDB-Lite is built from the ground up in Rust for performance, safety, and advanced features - but it's designed to look and feel exactly like SQLite from a Python perspective.
The magic: HeliosDB-Lite implements the Python DB-API 2.0 interface (same as SQLite) while using a completely different engine underneath.
┌─────────────────────────────────────────────────────────────┐
│ Your Python Application │
│ import heliosdb_sqlite as sqlite3 │
└──────────────────────────────┬──────────────────────────────┘
│
│ Python DB-API 2.0
│ (Same interface as SQLite)
│
┌──────────────────────────────▼──────────────────────────────┐
│ HeliosDB-Lite SQLite Compatibility Layer │
│ (Python bindings) │
└──────────────────────────────┬──────────────────────────────┘
│
│ PyO3 (Python ↔ Rust bridge)
│
┌──────────────────────────────▼──────────────────────────────┐
│ HeliosDB-Lite Core Engine │
│ (Written in Rust) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ SQL Parser & │ │ Query Exec │ │ Storage │ │
│ │ Planner │ │ Engine │ │ Engine │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Concurrency │ │ Vector Search│ │ Time-Travel │ │
│ │ Control │ │ (HNSW) │ │ (MVCC) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└──────────────────────────────────────────────────────────┘
How HeliosDB-Lite SQLite Layer Works¶
When You Import¶
What happens: 1. Python loads the HeliosDB-Lite module (compiled Rust library with Python bindings) 2. The module registers itself as a SQLite-compatible database driver 3. All standard DB-API 2.0 methods (connect, execute, etc.) become available 4. No configuration required - it just works
Behind the scenes: - The Python module is a thin wrapper around the Rust core - PyO3 (Python-Rust bridge) handles type conversions automatically - Function calls are fast (near-native performance)
When You Connect to a Database¶
What happens:
- File detection:
- HeliosDB-Lite checks if the file exists
- If it's an existing SQLite database, it can read it directly
-
If it's a new file, it creates a HeliosDB-Lite database
-
Engine initialization:
- Opens or creates the database file
- Initializes the storage engine
- Sets up transaction management
-
Prepares connection handle
-
Connection object returned:
- You get a Python connection object
- It implements all standard SQLite connection methods
- Plus optional HeliosDB-Lite advanced features
File format: - HeliosDB-Lite uses a compatible format for basic data - SQLite can read HeliosDB-Lite databases (for basic tables) - HeliosDB-Lite can read SQLite databases - Advanced features (vectors, branches) stored in extended sections
When You Execute a Query¶
cursor = conn.cursor()
cursor.execute('SELECT * FROM users WHERE age > ?', (18,))
results = cursor.fetchall()
What happens:
1. Python → Rust Bridge
┌─────────────────────────────────────────────────────┐
│ cursor.execute(sql, params) │
│ ↓ │
│ PyO3 converts Python types to Rust types │
│ ↓ │
│ Rust receives: &str (SQL) + Vec<Value> (params) │
└─────────────────────────────────────────────────────┘
2. SQL Parsing
┌─────────────────────────────────────────────────────┐
│ Parse SQL into Abstract Syntax Tree (AST) │
│ Validate syntax │
│ Extract table references, columns, conditions │
└─────────────────────────────────────────────────────┘
3. Query Planning
┌─────────────────────────────────────────────────────┐
│ Analyze query and data statistics │
│ Choose optimal execution strategy │
│ Determine index usage │
│ Optimize joins and filters │
└─────────────────────────────────────────────────────┘
4. Execution
┌─────────────────────────────────────────────────────┐
│ Access storage engine for table data │
│ Apply filters (WHERE age > 18) │
│ Use indexes if available │
│ Collect matching rows │
└─────────────────────────────────────────────────────┘
5. Result Conversion
┌─────────────────────────────────────────────────────┐
│ Convert Rust data structures to Python types │
│ List of tuples/rows returned to Python │
│ Cursor stores results for fetchall()/fetchone() │
└─────────────────────────────────────────────────────┘
Performance optimization: - Prepared statement caching - Query plan reuse - Zero-copy operations where possible - Parallel execution for complex queries
The Three Modes Explained¶
HeliosDB-Lite can run in three different modes, giving you flexibility in how you deploy and use it.
Mode 1: Embedded Mode (Default)¶
How it works:
┌──────────────────────────────────────────────┐
│ Your Python Application │
│ │
│ ┌────────────────────────────────────┐ │
│ │ HeliosDB-Lite Engine │ │
│ │ (runs in same process) │ │
│ └────────────────────────────────────┘ │
│ ↓ │
│ Database File │
│ (app.db) │
└──────────────────────────────────────────────┘
Characteristics: - Database engine runs inside your application process - Direct memory access (fastest) - No network overhead - Similar to how SQLite works - Perfect for single-application deployments
Use when: - Single application accessing database - Maximum performance needed - Embedded/desktop applications - Simple deployments
Mode 2: Server Mode (PostgreSQL-Compatible)¶
How it works:
┌─────────────────┐ ┌─────────────────┐
│ Application 1 │ │ Application 2 │
│ (Python) │ │ (Node.js) │
└────────┬────────┘ └────────┬────────┘
│ │
│ PostgreSQL Protocol │
│ │
└────────────┬──────────────┘
│
┌────────────▼─────────────┐
│ HeliosDB-Lite Server │
│ (Network daemon) │
│ │
│ ┌──────────────┐ │
│ │ Query Router │ │
│ └──────┬───────┘ │
│ │ │
│ ┌──────▼────────┐ │
│ │ Core Engine │ │
│ └──────┬────────┘ │
└───────────┼──────────────┘
│
┌──────▼──────┐
│ Database │
│ (app.db) │
└─────────────┘
Characteristics: - Database runs as separate server process - Multiple clients can connect - PostgreSQL wire protocol (use psycopg2, pg, etc.) - Network overhead, but enables multi-client access - Horizontal scaling possible
Use when: - Multiple applications need database access - Microservices architecture - Remote database access needed - Team development (shared database)
Mode 3: Hybrid Mode (Best of Both Worlds)¶
How it works:
┌─────────────────────────────────────────────┐
│ Your Python Application │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ Direct Embedded Access (Fast) │ │
│ │ ↓ │ │
│ │ HeliosDB-Lite Core Engine │ │
│ └──────────────┬───────────────────────┘ │
│ │ │
└─────────────────┼───────────────────────────┘
│
┌────────▼─────────┐
│ │
│ Database File │
│ │
└────────▲─────────┘
│
┌─────────────────┼───────────────────────────┐
│ Server Thread │ │
│ ↓ │ │
│ PostgreSQL Protocol Server │
│ (Accepts remote connections) │
└────────────────────────────────────────────┘
↑
│ PostgreSQL Protocol
│
┌────────┴─────────┐
│ Remote Client │
│ (Dashboard, │
│ Monitoring, │
│ Admin tools) │
└──────────────────┘
Characteristics: - Your application gets direct embedded access (fast!) - Server thread runs in background for remote access - Same database, different access methods - No performance penalty for local access - Enables remote monitoring/management
Use when: - Main application needs fast local access - Also need remote admin/monitoring - Want flexibility without choosing - Development/production parity
How Advanced Features Work¶
Vector Search¶
Architecture:
cursor.execute('''
SELECT id, title,
vector_distance(embedding, ?, 'cosine') as similarity
FROM articles
ORDER BY similarity
LIMIT 10
''', (query_embedding,))
What happens: 1. Vector column stored as special data type (VECTOR(n)) 2. HNSW (Hierarchical Navigable Small World) index created automatically 3. Query planner recognizes vector_distance function 4. Uses specialized vector search algorithm (not full table scan) 5. Results ranked by similarity
Storage:
Regular Table Storage:
┌──────┬────────┬──────────────────────────────────┐
│ id │ title │ content │
├──────┼────────┼──────────────────────────────────┤
│ 1 │ "AI" │ "Machine learning introduction" │
└──────┴────────┴──────────────────────────────────┘
Vector Extension Storage:
┌──────┬──────────────────────────────────────────┐
│ id │ embedding (384-dim vector) │
├──────┼──────────────────────────────────────────┤
│ 1 │ [0.23, 0.45, 0.12, ..., 0.67] │
└──────┴──────────────────────────────────────────┘
↓
HNSW Index (graph structure for fast search)
Performance: - Approximate nearest neighbor search: O(log n) - Much faster than brute force: O(n) - Trade-off: slight accuracy for massive speed
Time-Travel Queries¶
Architecture:
How it works:
MVCC (Multi-Version Concurrency Control)
Current Data View:
┌──────┬─────────┬──────────────────────┐
│ id │ name │ version/timestamp │
├──────┼─────────┼──────────────────────┤
│ 1 │ "Alice" │ 2024-06-01 15:00 │
└──────┴─────────┴──────────────────────┘
Historical Versions (stored separately):
┌──────┬─────────┬──────────────────────┐
│ id │ name │ version/timestamp │
├──────┼─────────┼──────────────────────┤
│ 1 │ "A" │ 2024-01-01 09:00 │
│ 1 │ "Al" │ 2024-01-01 10:30 │
│ 1 │ "Alice" │ 2024-06-01 15:00 │ ← Current
└──────┴─────────┴──────────────────────┘
Query with AS OF TIMESTAMP:
- Looks up version <= specified timestamp
- Returns historical state
- No impact on current data
Storage overhead: - Only changed values stored (delta compression) - Configurable retention period - Automatic cleanup of old versions
Database Branching¶
Architecture:
branch = conn.create_branch('experiment')
branch_conn = sqlite3.connect('app.db', branch='experiment')
How it works:
Main Database:
┌────────────────────────────────────┐
│ Main branch (active) │
│ │
│ ┌──────────────────────────────┐ │
│ │ Table: users │ │
│ │ 1000 rows │ │
│ └──────────────────────────────┘ │
└────────────────────────────────────┘
│
│ create_branch('experiment')
│
▼
┌────────────────────────────────────┐
│ Experiment branch (copy-on-write) │
│ │
│ ┌──────────────────────────────┐ │
│ │ Only stores CHANGES │ │
│ │ from main branch │ │
│ │ │ │
│ │ Modified rows: 50 │ │
│ │ Deleted rows: 10 │ │
│ │ New rows: 5 │ │
│ └──────────────────────────────┘ │
└────────────────────────────────────┘
Reading from branch:
1. Check if row modified in branch → use branch version
2. If not modified → use main branch version
3. Apply deletes from branch
4. Add new rows from branch
Result: Full database view without duplicating unchanged data
Benefits: - Instant branch creation (no copying) - Minimal storage overhead - Safe experimentation - Easy merge or discard
Data Storage Explanation¶
File Structure¶
myapp.db (HeliosDB-Lite database file)
│
├─ Header (metadata)
│ ├─ Format version
│ ├─ Page size
│ ├─ Schema version
│ └─ Feature flags
│
├─ Schema section
│ ├─ Table definitions
│ ├─ Index definitions
│ ├─ View definitions
│ └─ Constraints
│
├─ Data pages (table data)
│ ├─ B-tree structure (like SQLite)
│ ├─ Row storage
│ └─ Index structures
│
├─ Advanced features section (optional)
│ ├─ Vector indexes (HNSW graphs)
│ ├─ Time-travel versions (MVCC)
│ ├─ Branch metadata
│ └─ Encryption metadata
│
└─ Transaction log (WAL)
├─ Uncommitted changes
└─ Recovery information
Compatibility: - Basic sections compatible with SQLite - Advanced sections ignored by SQLite (safe) - HeliosDB-Lite reads both formats seamlessly
Performance Characteristics¶
Memory Usage¶
Per Connection:
- Base overhead: ~100 KB
- Query cache: ~10 MB (configurable)
- Connection buffers: ~1 MB
Per Table:
- Schema metadata: ~1-10 KB
- Index overhead: Varies (5-20% of table size)
Vector Search:
- HNSW index: ~20-40% of vector data size
- In-memory: Optional, speeds up queries 10-100x
Time-Travel:
- Historical versions: Configurable retention
- Delta compression reduces overhead to ~5-15%
Query Performance¶
Operation | SQLite | HeliosDB-Lite
----------------------|-------------|---------------
Single row read | 10 μs | 12 μs (comparable)
Indexed search | 100 μs | 105 μs (comparable)
Full table scan | 10 ms | 11 ms (comparable)
Single write | 50 μs | 75 μs (slight overhead)
Concurrent reads | Excellent | Excellent (same)
Concurrent writes | Blocked | Parallel! (major win)
Vector search (10k) | N/A | 2 ms (exclusive)
Time-travel query | N/A | 15 ms (exclusive)
Key takeaway: - Similar performance for standard operations - Much better with concurrent writes - Advanced features add new capabilities (not slower alternatives)
Concurrency Model¶
How HeliosDB-Lite Handles Concurrent Access¶
SQLite approach (locks):
Thread 1: BEGIN → Write → [Lock database] → COMMIT → [Unlock]
Thread 2: [WAIT...] → BEGIN → Write
Thread 3: [WAIT...] → BEGIN
HeliosDB-Lite approach (MVCC):
Thread 1: BEGIN → Write to version 1 → COMMIT
Thread 2: BEGIN → Write to version 2 → COMMIT (simultaneous!)
Thread 3: BEGIN → Read version 1 → done (doesn't block writes)
Conflict resolution:
- Non-conflicting writes: Both succeed
- Conflicting writes: Last commit wins (configurable)
- Readers never block writers
- Writers never block readers
Implementation:
Transaction Log:
Time ─────────────────────────────▶
T1: [BEGIN] ────── [Write row A] ────── [COMMIT v1]
T2: [BEGIN] ────── [Write row B] ────── [COMMIT v2]
T3: [BEGIN] [Read] [COMMIT]
All transactions see consistent snapshot
No locks needed!
What Happens When You Import¶
Detailed Import Process¶
Step-by-step:
-
Python loads .so/.dll (compiled Rust library)
-
Module initialization (happens once)
-
Python receives module with all methods
-
No configuration needed - ready to use!
Threading and Async Support¶
Thread Safety¶
HeliosDB-Lite is thread-safe by default (unlike SQLite):
# This works safely in HeliosDB-Lite
conn = sqlite3.connect('app.db', check_same_thread=False)
def worker(thread_id):
cursor = conn.cursor() # Safe from multiple threads
cursor.execute('INSERT INTO logs VALUES (?, ?)', (thread_id, 'message'))
conn.commit()
threads = [Thread(target=worker, args=(i,)) for i in range(100)]
# All threads can use same connection safely!
How it's safe: - Rust's ownership model prevents data races - Internal synchronization (lock-free where possible) - MVCC eliminates most contention
Async/Await Support¶
import asyncio
import heliosdb_sqlite as sqlite3
async def async_query():
loop = asyncio.get_event_loop()
# Run blocking operation in thread pool
def blocking_query():
conn = sqlite3.connect('app.db')
cursor = conn.cursor()
cursor.execute('SELECT * FROM users')
return cursor.fetchall()
results = await loop.run_in_executor(None, blocking_query)
return results
# Or use async connection pool
from heliosdb_sqlite.asyncio import AsyncConnectionPool
pool = AsyncConnectionPool('app.db')
async with pool.connection() as conn:
cursor = conn.cursor()
results = await cursor.execute('SELECT * FROM users')
Summary: Why This Architecture Matters¶
For Drop-In Compatibility: - Implements exact same Python interface as SQLite - You change one import line, everything works - Existing code, queries, databases all compatible
For Performance: - Rust core = fast and safe - MVCC = no write locks - Parallel execution where possible - Optimized for modern workloads
For Advanced Features: - Vector search: Built-in HNSW indexes - Time-travel: MVCC enables historical queries - Branching: Copy-on-write for safe experimentation - Encryption: Transparent at storage layer
For Flexibility: - Three modes: embedded, server, hybrid - Start simple, scale when needed - No architectural changes required
What's Next?¶
Now that you understand how HeliosDB-Lite works:
- Try it out:
HELIOSDB_SQLITE_DROP_IN_GUIDE.md - Migrate your app:
HELIOSDB_SQLITE_MIGRATION_PATTERNS.md - Explore advanced features:
HELIOSDB_SQLITE_ADVANCED_FEATURES.md - Troubleshoot issues:
HELIOSDB_SQLITE_TROUBLESHOOTING.md - Read FAQ:
HELIOSDB_SQLITE_FAQ.md
The architecture is designed to be invisible when you don't need it, and powerful when you do.
Happy building with HeliosDB-Lite!