HeliosDB-Lite SQLite Layer Architecture - User Guide¶

A user-friendly explanation of how HeliosDB-Lite works under the hood, designed for developers who want to understand the system without diving into the source code.

Overview: What Makes HeliosDB-Lite Different?¶

HeliosDB-Lite is built from the ground up in Rust for performance, safety, and advanced features - but it's designed to look and feel exactly like SQLite from a Python perspective.

The magic: HeliosDB-Lite implements the Python DB-API 2.0 interface (same as SQLite) while using a completely different engine underneath.

┌─────────────────────────────────────────────────────────────┐
│                    Your Python Application                  │
│                  import heliosdb_sqlite as sqlite3          │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ Python DB-API 2.0
                               │ (Same interface as SQLite)
                               │
┌──────────────────────────────▼──────────────────────────────┐
│              HeliosDB-Lite SQLite Compatibility Layer       │
│                         (Python bindings)                    │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ PyO3 (Python ↔ Rust bridge)
                               │
┌──────────────────────────────▼──────────────────────────────┐
│                   HeliosDB-Lite Core Engine                 │
│                      (Written in Rust)                      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ SQL Parser & │  │ Query Exec   │  │  Storage     │     │
│  │   Planner    │  │   Engine     │  │  Engine      │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ Concurrency  │  │ Vector Search│  │ Time-Travel  │     │
│  │   Control    │  │   (HNSW)     │  │   (MVCC)     │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└──────────────────────────────────────────────────────────┘

How HeliosDB-Lite SQLite Layer Works¶

When You Import¶

import heliosdb_sqlite as sqlite3

What happens: 1. Python loads the HeliosDB-Lite module (compiled Rust library with Python bindings) 2. The module registers itself as a SQLite-compatible database driver 3. All standard DB-API 2.0 methods (connect, execute, etc.) become available 4. No configuration required - it just works

Behind the scenes: - The Python module is a thin wrapper around the Rust core - PyO3 (Python-Rust bridge) handles type conversions automatically - Function calls are fast (near-native performance)

When You Connect to a Database¶

conn = sqlite3.connect('myapp.db')

What happens:

File detection:
HeliosDB-Lite checks if the file exists
If it's an existing SQLite database, it can read it directly
If it's a new file, it creates a HeliosDB-Lite database
Engine initialization:
Opens or creates the database file
Initializes the storage engine
Sets up transaction management
Prepares connection handle
Connection object returned:
You get a Python connection object
It implements all standard SQLite connection methods
Plus optional HeliosDB-Lite advanced features

File format: - HeliosDB-Lite uses a compatible format for basic data - SQLite can read HeliosDB-Lite databases (for basic tables) - HeliosDB-Lite can read SQLite databases - Advanced features (vectors, branches) stored in extended sections

When You Execute a Query¶

cursor = conn.cursor()
cursor.execute('SELECT * FROM users WHERE age > ?', (18,))
results = cursor.fetchall()

What happens:

1. Python → Rust Bridge
   ┌─────────────────────────────────────────────────────┐
   │ cursor.execute(sql, params)                         │
   │   ↓                                                 │
   │ PyO3 converts Python types to Rust types            │
   │   ↓                                                 │
   │ Rust receives: &str (SQL) + Vec<Value> (params)    │
   └─────────────────────────────────────────────────────┘

2. SQL Parsing
   ┌─────────────────────────────────────────────────────┐
   │ Parse SQL into Abstract Syntax Tree (AST)           │
   │ Validate syntax                                     │
   │ Extract table references, columns, conditions       │
   └─────────────────────────────────────────────────────┘

3. Query Planning
   ┌─────────────────────────────────────────────────────┐
   │ Analyze query and data statistics                   │
   │ Choose optimal execution strategy                   │
   │ Determine index usage                               │
   │ Optimize joins and filters                          │
   └─────────────────────────────────────────────────────┘

4. Execution
   ┌─────────────────────────────────────────────────────┐
   │ Access storage engine for table data                │
   │ Apply filters (WHERE age > 18)                      │
   │ Use indexes if available                            │
   │ Collect matching rows                               │
   └─────────────────────────────────────────────────────┘

5. Result Conversion
   ┌─────────────────────────────────────────────────────┐
   │ Convert Rust data structures to Python types        │
   │ List of tuples/rows returned to Python              │
   │ Cursor stores results for fetchall()/fetchone()     │
   └─────────────────────────────────────────────────────┘

Performance optimization: - Prepared statement caching - Query plan reuse - Zero-copy operations where possible - Parallel execution for complex queries

The Three Modes Explained¶

HeliosDB-Lite can run in three different modes, giving you flexibility in how you deploy and use it.

Mode 1: Embedded Mode (Default)¶

How it works:

conn = sqlite3.connect('app.db')  # Default mode

┌──────────────────────────────────────────────┐
│          Your Python Application             │
│                                              │
│   ┌────────────────────────────────────┐    │
│   │   HeliosDB-Lite Engine             │    │
│   │   (runs in same process)           │    │
│   └────────────────────────────────────┘    │
│                    ↓                         │
│              Database File                   │
│              (app.db)                        │
└──────────────────────────────────────────────┘

Characteristics: - Database engine runs inside your application process - Direct memory access (fastest) - No network overhead - Similar to how SQLite works - Perfect for single-application deployments

Use when: - Single application accessing database - Maximum performance needed - Embedded/desktop applications - Simple deployments

Mode 2: Server Mode (PostgreSQL-Compatible)¶

How it works:

server = sqlite3.start_server('app.db', port=5432)

┌─────────────────┐         ┌─────────────────┐
│   Application 1 │         │  Application 2  │
│   (Python)      │         │  (Node.js)      │
└────────┬────────┘         └────────┬────────┘
         │                           │
         │ PostgreSQL Protocol       │
         │                           │
         └────────────┬──────────────┘
                      │
         ┌────────────▼─────────────┐
         │  HeliosDB-Lite Server    │
         │  (Network daemon)        │
         │                          │
         │    ┌──────────────┐      │
         │    │ Query Router │      │
         │    └──────┬───────┘      │
         │           │              │
         │    ┌──────▼────────┐     │
         │    │ Core Engine   │     │
         │    └──────┬────────┘     │
         └───────────┼──────────────┘
                     │
              ┌──────▼──────┐
              │ Database    │
              │ (app.db)    │
              └─────────────┘

Characteristics: - Database runs as separate server process - Multiple clients can connect - PostgreSQL wire protocol (use psycopg2, pg, etc.) - Network overhead, but enables multi-client access - Horizontal scaling possible

Use when: - Multiple applications need database access - Microservices architecture - Remote database access needed - Team development (shared database)

Mode 3: Hybrid Mode (Best of Both Worlds)¶

How it works:

conn = sqlite3.connect('app.db', heliosdb_mode='hybrid')

┌─────────────────────────────────────────────┐
│         Your Python Application             │
│                                             │
│  ┌──────────────────────────────────────┐   │
│  │  Direct Embedded Access (Fast)       │   │
│  │  ↓                                   │   │
│  │  HeliosDB-Lite Core Engine           │   │
│  └──────────────┬───────────────────────┘   │
│                 │                           │
└─────────────────┼───────────────────────────┘
                  │
         ┌────────▼─────────┐
         │                  │
         │  Database File   │
         │                  │
         └────────▲─────────┘
                  │
┌─────────────────┼───────────────────────────┐
│  Server Thread  │                           │
│  ↓              │                           │
│  PostgreSQL Protocol Server                │
│  (Accepts remote connections)              │
└────────────────────────────────────────────┘
         ↑
         │ PostgreSQL Protocol
         │
┌────────┴─────────┐
│  Remote Client   │
│  (Dashboard,     │
│   Monitoring,    │
│   Admin tools)   │
└──────────────────┘

Characteristics: - Your application gets direct embedded access (fast!) - Server thread runs in background for remote access - Same database, different access methods - No performance penalty for local access - Enables remote monitoring/management

Use when: - Main application needs fast local access - Also need remote admin/monitoring - Want flexibility without choosing - Development/production parity

How Advanced Features Work¶

Vector Search¶

Architecture:

cursor.execute('''
    SELECT id, title,
           vector_distance(embedding, ?, 'cosine') as similarity
    FROM articles
    ORDER BY similarity
    LIMIT 10
''', (query_embedding,))

What happens: 1. Vector column stored as special data type (VECTOR(n)) 2. HNSW (Hierarchical Navigable Small World) index created automatically 3. Query planner recognizes vector_distance function 4. Uses specialized vector search algorithm (not full table scan) 5. Results ranked by similarity

Storage:

Regular Table Storage:
┌──────┬────────┬──────────────────────────────────┐
│  id  │ title  │  content                         │
├──────┼────────┼──────────────────────────────────┤
│  1   │ "AI"   │ "Machine learning introduction"  │
└──────┴────────┴──────────────────────────────────┘

Vector Extension Storage:
┌──────┬──────────────────────────────────────────┐
│  id  │  embedding (384-dim vector)              │
├──────┼──────────────────────────────────────────┤
│  1   │  [0.23, 0.45, 0.12, ..., 0.67]          │
└──────┴──────────────────────────────────────────┘
          ↓
    HNSW Index (graph structure for fast search)

Performance: - Approximate nearest neighbor search: O(log n) - Much faster than brute force: O(n) - Trade-off: slight accuracy for massive speed

Time-Travel Queries¶

Architecture:

cursor.execute('''
    SELECT * FROM users
    AS OF TIMESTAMP '2024-01-01 10:00:00'
''')

How it works:

MVCC (Multi-Version Concurrency Control)

Current Data View:
┌──────┬─────────┬──────────────────────┐
│  id  │  name   │  version/timestamp   │
├──────┼─────────┼──────────────────────┤
│  1   │ "Alice" │  2024-06-01 15:00    │
└──────┴─────────┴──────────────────────┘

Historical Versions (stored separately):
┌──────┬─────────┬──────────────────────┐
│  id  │  name   │  version/timestamp   │
├──────┼─────────┼──────────────────────┤
│  1   │ "A"     │  2024-01-01 09:00    │
│  1   │ "Al"    │  2024-01-01 10:30    │
│  1   │ "Alice" │  2024-06-01 15:00    │ ← Current
└──────┴─────────┴──────────────────────┘

Query with AS OF TIMESTAMP:
- Looks up version <= specified timestamp
- Returns historical state
- No impact on current data

Storage overhead: - Only changed values stored (delta compression) - Configurable retention period - Automatic cleanup of old versions

Database Branching¶

Architecture:

branch = conn.create_branch('experiment')
branch_conn = sqlite3.connect('app.db', branch='experiment')

How it works:

Main Database:
┌────────────────────────────────────┐
│  Main branch (active)              │
│                                    │
│  ┌──────────────────────────────┐  │
│  │ Table: users                 │  │
│  │ 1000 rows                    │  │
│  └──────────────────────────────┘  │
└────────────────────────────────────┘
         │
         │ create_branch('experiment')
         │
         ▼
┌────────────────────────────────────┐
│  Experiment branch (copy-on-write) │
│                                    │
│  ┌──────────────────────────────┐  │
│  │ Only stores CHANGES           │  │
│  │ from main branch             │  │
│  │                              │  │
│  │ Modified rows: 50            │  │
│  │ Deleted rows: 10             │  │
│  │ New rows: 5                  │  │
│  └──────────────────────────────┘  │
└────────────────────────────────────┘

Reading from branch:
1. Check if row modified in branch → use branch version
2. If not modified → use main branch version
3. Apply deletes from branch
4. Add new rows from branch

Result: Full database view without duplicating unchanged data

Benefits: - Instant branch creation (no copying) - Minimal storage overhead - Safe experimentation - Easy merge or discard

Data Storage Explanation¶

File Structure¶

myapp.db (HeliosDB-Lite database file)
│
├─ Header (metadata)
│  ├─ Format version
│  ├─ Page size
│  ├─ Schema version
│  └─ Feature flags
│
├─ Schema section
│  ├─ Table definitions
│  ├─ Index definitions
│  ├─ View definitions
│  └─ Constraints
│
├─ Data pages (table data)
│  ├─ B-tree structure (like SQLite)
│  ├─ Row storage
│  └─ Index structures
│
├─ Advanced features section (optional)
│  ├─ Vector indexes (HNSW graphs)
│  ├─ Time-travel versions (MVCC)
│  ├─ Branch metadata
│  └─ Encryption metadata
│
└─ Transaction log (WAL)
   ├─ Uncommitted changes
   └─ Recovery information

Compatibility: - Basic sections compatible with SQLite - Advanced sections ignored by SQLite (safe) - HeliosDB-Lite reads both formats seamlessly

Performance Characteristics¶

Memory Usage¶

Per Connection:
- Base overhead: ~100 KB
- Query cache: ~10 MB (configurable)
- Connection buffers: ~1 MB

Per Table:
- Schema metadata: ~1-10 KB
- Index overhead: Varies (5-20% of table size)

Vector Search:
- HNSW index: ~20-40% of vector data size
- In-memory: Optional, speeds up queries 10-100x

Time-Travel:
- Historical versions: Configurable retention
- Delta compression reduces overhead to ~5-15%

Query Performance¶

Operation             | SQLite      | HeliosDB-Lite
----------------------|-------------|---------------
Single row read       | 10 μs       | 12 μs         (comparable)
Indexed search        | 100 μs      | 105 μs        (comparable)
Full table scan       | 10 ms       | 11 ms         (comparable)
Single write          | 50 μs       | 75 μs         (slight overhead)
Concurrent reads      | Excellent   | Excellent     (same)
Concurrent writes     | Blocked     | Parallel!     (major win)
Vector search (10k)   | N/A         | 2 ms          (exclusive)
Time-travel query     | N/A         | 15 ms         (exclusive)

Key takeaway: - Similar performance for standard operations - Much better with concurrent writes - Advanced features add new capabilities (not slower alternatives)

Concurrency Model¶

How HeliosDB-Lite Handles Concurrent Access¶

SQLite approach (locks):

Thread 1: BEGIN → Write → [Lock database] → COMMIT → [Unlock]
Thread 2:                    [WAIT...]                  → BEGIN → Write
Thread 3:                    [WAIT...]                           → BEGIN

HeliosDB-Lite approach (MVCC):

Thread 1: BEGIN → Write to version 1 → COMMIT
Thread 2: BEGIN → Write to version 2 → COMMIT  (simultaneous!)
Thread 3: BEGIN → Read version 1    → done     (doesn't block writes)

Conflict resolution:
- Non-conflicting writes: Both succeed
- Conflicting writes: Last commit wins (configurable)
- Readers never block writers
- Writers never block readers

Implementation:

Transaction Log:

Time ─────────────────────────────▶

T1: [BEGIN] ────── [Write row A] ────── [COMMIT v1]
T2:         [BEGIN] ────── [Write row B] ────── [COMMIT v2]
T3:                 [BEGIN] [Read] [COMMIT]

All transactions see consistent snapshot
No locks needed!

What Happens When You Import¶

Detailed Import Process¶

import heliosdb_sqlite as sqlite3

Step-by-step:

Python loads .so/.dll (compiled Rust library)

HeliosDB-Lite module ────▶ [OS loads shared library]

Module initialization (happens once)

// Rust code (simplified)
#[pymodule]
fn heliosdb_sqlite(py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Connection>()?;
    m.add_class::<Cursor>()?;
    m.add_function(wrap_pyfunction!(connect, m)?)?;
    // ... register all DB-API 2.0 functions
    Ok(())
}

Python receives module with all methods

# Now available:
sqlite3.connect(...)
sqlite3.Connection
sqlite3.Cursor
sqlite3.Row
# etc. - full DB-API 2.0 interface

No configuration needed - ready to use!

Threading and Async Support¶

Thread Safety¶

HeliosDB-Lite is thread-safe by default (unlike SQLite):

# This works safely in HeliosDB-Lite
conn = sqlite3.connect('app.db', check_same_thread=False)

def worker(thread_id):
    cursor = conn.cursor()  # Safe from multiple threads
    cursor.execute('INSERT INTO logs VALUES (?, ?)', (thread_id, 'message'))
    conn.commit()

threads = [Thread(target=worker, args=(i,)) for i in range(100)]
# All threads can use same connection safely!

How it's safe: - Rust's ownership model prevents data races - Internal synchronization (lock-free where possible) - MVCC eliminates most contention

Async/Await Support¶

import asyncio
import heliosdb_sqlite as sqlite3

async def async_query():
    loop = asyncio.get_event_loop()

    # Run blocking operation in thread pool
    def blocking_query():
        conn = sqlite3.connect('app.db')
        cursor = conn.cursor()
        cursor.execute('SELECT * FROM users')
        return cursor.fetchall()

    results = await loop.run_in_executor(None, blocking_query)
    return results

# Or use async connection pool
from heliosdb_sqlite.asyncio import AsyncConnectionPool

pool = AsyncConnectionPool('app.db')
async with pool.connection() as conn:
    cursor = conn.cursor()
    results = await cursor.execute('SELECT * FROM users')

Summary: Why This Architecture Matters¶

For Drop-In Compatibility: - Implements exact same Python interface as SQLite - You change one import line, everything works - Existing code, queries, databases all compatible

For Performance: - Rust core = fast and safe - MVCC = no write locks - Parallel execution where possible - Optimized for modern workloads

For Advanced Features: - Vector search: Built-in HNSW indexes - Time-travel: MVCC enables historical queries - Branching: Copy-on-write for safe experimentation - Encryption: Transparent at storage layer

For Flexibility: - Three modes: embedded, server, hybrid - Start simple, scale when needed - No architectural changes required

What's Next?¶

Now that you understand how HeliosDB-Lite works:

Try it out: HELIOSDB_SQLITE_DROP_IN_GUIDE.md
Migrate your app: HELIOSDB_SQLITE_MIGRATION_PATTERNS.md
Explore advanced features: HELIOSDB_SQLITE_ADVANCED_FEATURES.md
Troubleshoot issues: HELIOSDB_SQLITE_TROUBLESHOOTING.md
Read FAQ: HELIOSDB_SQLITE_FAQ.md

The architecture is designed to be invisible when you don't need it, and powerful when you do.

Happy building with HeliosDB-Lite!