
HeliosDB Lite - Branch Storage User Guide

Overview

HeliosDB Lite's branch storage provides Git-like database branching with copy-on-write semantics. Create instant branches for development, testing, or staging environments without duplicating data.

Key Features

  • Instant Branch Creation: Create branches in <10ms regardless of database size
  • Copy-on-Write: Modified data is copied only when written, not at branch creation time
  • Storage Efficiency: <2% overhead per branch for metadata, shared storage for unchanged data
  • MVCC Integration: Branch-aware transactions with snapshot isolation guarantees
  • Hierarchical Branches: Support for multi-level branch hierarchies
  • Minimal Read Overhead: <5% read performance overhead for current branch

Architecture

Branch Hierarchy

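Branches form a hierarchy rooted at main. Each branch has exactly one parent, so the hierarchy is a tree (a special case of a directed acyclic graph):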

main (root)
 ├── dev
 │   └── feature-x
 └── staging
     └── hotfix-1

Copy-on-Write Mechanism

  1. Branch Creation: Only metadata is created; no data is copied
  2. Reads: A read checks the branch itself first, then walks up the parent chain until the key is found
  3. First Write: The first modification of a key on a branch writes a branch-local copy (copy-on-write)
  4. Subsequent Operations: Later reads and writes of that key use the branch's own copy
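A minimal in-memory model can make these steps concrete. This is a sketch, not HeliosDB Lite's implementation: BranchStore, Branch, and their methods are hypothetical, each key holds a whole value, and MVCC timestamps are omitted. Reads fall back to the parent's current value, which matches the behavior shown in the Hierarchical Branches example.

```rust
use std::collections::HashMap;

/// One branch: an optional parent plus only the keys written on this branch.
/// Hypothetical model for illustration, not the HeliosDB Lite API.
struct Branch {
    parent: Option<String>,
    data: HashMap<Vec<u8>, Vec<u8>>,
}

#[derive(Default)]
struct BranchStore {
    branches: HashMap<String, Branch>,
}

impl BranchStore {
    /// Branch creation stores metadata only; no data is copied.
    fn create_branch(&mut self, name: &str, parent: Option<&str>) {
        self.branches.insert(
            name.to_string(),
            Branch { parent: parent.map(str::to_string), data: HashMap::new() },
        );
    }

    /// Reads check the branch itself, then walk up the parent chain.
    fn get(&self, branch: &str, key: &[u8]) -> Option<&Vec<u8>> {
        let mut current = self.branches.get(branch);
        while let Some(b) = current {
            if let Some(value) = b.data.get(key) {
                return Some(value);
            }
            current = b.parent.as_deref().and_then(|p| self.branches.get(p));
        }
        None
    }

    /// Writes land only in the target branch: the first write of a key
    /// creates the branch's own copy, and parents are never modified.
    fn put(&mut self, branch: &str, key: &[u8], value: &[u8]) {
        if let Some(b) = self.branches.get_mut(branch) {
            b.data.insert(key.to_vec(), value.to_vec());
        }
    }
}

fn main() {
    let mut store = BranchStore::default();
    store.create_branch("main", None);
    store.put("main", b"key1", b"main_value");
    store.create_branch("dev", Some("main"));

    // dev inherits main's value until it writes its own copy
    assert_eq!(store.get("dev", b"key1").map(|v| v.as_slice()), Some(&b"main_value"[..]));
    store.put("dev", b"key1", b"dev_value");
    assert_eq!(store.get("dev", b"key1").map(|v| v.as_slice()), Some(&b"dev_value"[..]));
    assert_eq!(store.get("main", b"key1").map(|v| v.as_slice()), Some(&b"main_value"[..]));
}
```

Storing only written keys per branch is what keeps creation constant-time; the trade-off is that a miss on a deep branch costs one lookup per ancestor.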

Key Format

Physical keys encode branch information:

data:<branch_id>:<user_key>:<timestamp>

Example:
data:0000000002:users:123:0000000100
     │          │         │
     │          │         └─ Timestamp
     │          └─────────── User key (table:row_id)
     └────────────────────── Branch ID (2 = dev)
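The sketch below builds and parses keys in this layout. It assumes, based only on the example above, that the branch ID and timestamp are zero-padded 10-digit decimal fields and that the user key is text; encode_key and decode_key are hypothetical helpers, not library functions. Because the user key can itself contain ':', the parser peels the fixed fields off both ends instead of splitting on every separator.

```rust
/// Builds a physical key in the data:<branch_id>:<user_key>:<timestamp> layout.
/// Assumption: 10-digit zero padding, as in the example key. Values above
/// 9,999,999,999 would exceed that width and break the fixed layout.
fn encode_key(branch_id: u64, user_key: &str, timestamp: u64) -> String {
    format!("data:{:010}:{}:{:010}", branch_id, user_key, timestamp)
}

/// Splits a physical key back into (branch_id, user_key, timestamp).
/// The branch ID is taken from the front and the timestamp from the back,
/// so a user key such as "users:123" survives intact.
fn decode_key(key: &str) -> Option<(u64, &str, u64)> {
    let rest = key.strip_prefix("data:")?;
    let (branch, rest) = rest.split_once(':')?;
    let (user_key, ts) = rest.rsplit_once(':')?;
    Some((branch.parse::<u64>().ok()?, user_key, ts.parse::<u64>().ok()?))
}

fn main() {
    let key = encode_key(2, "users:123", 100);
    assert_eq!(key, "data:0000000002:users:123:0000000100");
    assert_eq!(decode_key(&key), Some((2, "users:123", 100)));

    // Zero padding makes lexicographic order match numeric order, so an
    // ordered key-value store keeps a key's versions sorted by timestamp.
    assert!(encode_key(2, "users:123", 99) < encode_key(2, "users:123", 100));
}
```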

Usage Examples

Basic Branch Operations

use heliosdb_lite::{Config, storage::{StorageEngine, BranchOptions}};

// Open database
let config = Config::in_memory();
let engine = StorageEngine::open_in_memory(&config)?;

// Create a development branch
let branch_id = engine.create_branch(
    "dev",                    // Branch name
    Some("main"),             // Parent branch (None = current)
    BranchOptions::default(), // Options
)?;

// List all branches
let branches = engine.list_branches()?;
for branch in branches {
    println!("{}: {:?}", branch.name, branch.state);
}

// Drop a branch (the second argument is if_exists; with false, a missing branch is an error)
engine.drop_branch("dev", false)?;

Branch Transactions

// Begin transaction on a specific branch
let mut tx = engine.begin_branch_transaction("dev")?;

// Read (checks current branch, then parent chain)
let value = tx.get(&b"users:123".to_vec())?;

// Write (copy-on-write)
tx.put(b"users:123".to_vec(), b"new_data".to_vec())?;

// Commit
tx.commit()?;

Branch Isolation Example

// Insert in main branch
engine.put(b"key1", b"main_value")?;

// Create dev branch
engine.create_branch("dev", Some("main"), BranchOptions::default())?;

// Read from dev (sees main's value)
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"main_value".to_vec()));

// Modify in dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"key1".to_vec(), b"dev_value".to_vec())?;
tx.commit()?;

// Main branch is unchanged
assert_eq!(engine.get(b"key1")?, Some(b"main_value".to_vec()));

// Dev branch has new value
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"dev_value".to_vec()));

Hierarchical Branches

// Create branch hierarchy: main -> dev -> feature
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;

// Write to main
engine.put(b"config", b"production")?;

// Read from feature branch (traverses: feature -> dev -> main)
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"production".to_vec()));

// Write to dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"config".to_vec(), b"development".to_vec())?;
tx.commit()?;

// Feature now sees dev's value
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"development".to_vec()));

Branch Options

use std::collections::HashMap;
use heliosdb_lite::storage::BranchOptions;

let mut metadata = HashMap::new();
metadata.insert("owner".to_string(), "alice".to_string());
metadata.insert("purpose".to_string(), "feature-dev".to_string());

let options = BranchOptions {
    replication_factor: Some(3),  // Reserved for distributed mode (not yet supported)
    region: Some("us-west".to_string()),
    metadata,
};

engine.create_branch("feature", Some("dev"), options)?;

Branch Metadata

Each branch carries metadata, including usage statistics:

let branch = engine.get_branch("dev")?;

println!("Name: {}", branch.name);
println!("Branch ID: {}", branch.branch_id);
println!("Parent ID: {:?}", branch.parent_id);
println!("Created at: {}", branch.created_at);
println!("State: {:?}", branch.state);
println!("Modified keys: {}", branch.stats.modified_keys);
println!("Storage bytes: {}", branch.stats.storage_bytes);

Performance Characteristics

Branch Creation

  • Latency: <10ms regardless of database size
  • Throughput: 1000+ branches/second
  • Storage: ~500 bytes of metadata per branch

Read Operations

  • Current Branch Hit: <0.1ms overhead vs. non-branched read
  • Parent Chain Lookup: <0.5ms overhead (proportional to chain depth)
  • Throughput: ~95% of non-branched read throughput

Write Operations

  • Copy-on-Write: <0.2ms overhead for first write to a key
  • Subsequent Writes: Same as non-branched write
  • Throughput: ~95% of non-branched write throughput

Storage Overhead

Example: 1GB database, 5 branches, 10% data modified per branch

Original:   1GB
Branches:   5 × (500 bytes metadata + 100MB data) ≈ 500MB
Total:      ≈1.5GB (50% overhead, i.e. 10% per branch, proportional to the data each branch modified)
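The same arithmetic as a small function. It assumes, as the example does, that each branch copies exactly the fraction of data it modifies, and it ignores extra MVCC versions of a key and any compression; estimate_total_bytes is a hypothetical helper.

```rust
/// Rough copy-on-write storage estimate: the base data plus, for each branch,
/// its metadata and a copy of the fraction of data it modified.
fn estimate_total_bytes(base_bytes: u64, branches: u64, modified_fraction: f64, metadata_bytes: u64) -> u64 {
    let copied_per_branch = (base_bytes as f64 * modified_fraction).round() as u64;
    base_bytes + branches * (metadata_bytes + copied_per_branch)
}

fn main() {
    const GB: u64 = 1_000_000_000;
    let total = estimate_total_bytes(GB, 5, 0.10, 500);
    // 1 GB + 5 × (500 B + 100 MB) = 1,500,002,500 bytes ≈ 1.5 GB
    assert_eq!(total, 1_500_002_500);
    let overhead = (total - GB) as f64 / GB as f64;
    println!("total = {total} bytes, overhead = {:.1}%", overhead * 100.0);
}
```

The metadata term is negligible next to the copied data, which is why branch overhead tracks how much each branch writes rather than how many branches exist.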

Best Practices

1. Branch Naming

Use descriptive, hierarchical names:

main
├── dev
├── staging
└── production-fixes
    └── hotfix-2024-11-18

2. Branch Lifecycle

// Create for specific purpose
let branch_id = engine.create_branch("feature-auth", Some("dev"), BranchOptions::default())?;

// Do work...
let mut tx = engine.begin_branch_transaction("feature-auth")?;
// ... perform operations
tx.commit()?;

// Clean up when done
engine.drop_branch("feature-auth", false)?;

3. Avoid Deep Hierarchies

Keep branch hierarchies shallow (≤5 levels) for optimal read performance:

✓ Good:  main -> dev -> feature
✗ Avoid: main -> dev -> team -> user -> feature -> sub-feature

4. Regular Cleanup

Drop branches you no longer need, such as abandoned experiments:

// Get all branches
let branches = engine.list_branches()?;

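// Drop inactive branches; should_cleanup is an application-defined predicate
// and must exclude main, which cannot be dropped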
for branch in branches {
    if should_cleanup(&branch) {
        engine.drop_branch(&branch.name, false)?;
    }
}
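Because a branch with children cannot be dropped (see Cannot Drop Rules under Limitations), dropping in list order can fail when a parent is listed before its child. One way around this is to drop the deepest branches first. The sketch uses a hypothetical BranchInfo struct standing in for the entries returned by list_branches and assumes numeric branch IDs; it is not part of the HeliosDB Lite API.

```rust
use std::collections::HashMap;

/// Stand-in for a branch listing entry (hypothetical; the real type returned
/// by list_branches may differ). parent_id is None for the root.
struct BranchInfo {
    name: String,
    branch_id: u64,
    parent_id: Option<u64>,
}

/// Depth of a branch: 0 for the root, 1 for its children, and so on.
fn depth(id: u64, parents: &HashMap<u64, Option<u64>>) -> usize {
    let mut d = 0;
    let mut current = parents.get(&id).copied().flatten();
    while let Some(p) = current {
        d += 1;
        current = parents.get(&p).copied().flatten();
    }
    d
}

/// Names to drop, deepest first, so each child goes before its parent.
/// The root (no parent) is never included.
fn drop_order<'a>(branches: &'a [BranchInfo], should_cleanup: impl Fn(&BranchInfo) -> bool) -> Vec<&'a str> {
    let parents: HashMap<u64, Option<u64>> =
        branches.iter().map(|b| (b.branch_id, b.parent_id)).collect();
    let mut selected: Vec<&BranchInfo> = branches
        .iter()
        .filter(|&b| b.parent_id.is_some() && should_cleanup(b))
        .collect();
    selected.sort_by_key(|b| std::cmp::Reverse(depth(b.branch_id, &parents)));
    selected.into_iter().map(|b| b.name.as_str()).collect()
}

fn main() {
    let branches = vec![
        BranchInfo { name: "main".into(), branch_id: 1, parent_id: None },
        BranchInfo { name: "dev".into(), branch_id: 2, parent_id: Some(1) },
        BranchInfo { name: "feature".into(), branch_id: 3, parent_id: Some(2) },
    ];
    // Hypothetical policy: clean up every non-root branch
    let order = drop_order(&branches, |_| true);
    assert_eq!(order, vec!["feature", "dev"]);
}
```

Each name could then be passed to engine.drop_branch(name, false) in order. A selected branch whose child is not selected will still fail to drop, so a real policy needs to account for that case.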

Limitations

Current Limitations

  1. No Merge Support: Merge functionality is not yet implemented
  2. No Garbage Collection: Data from dropped branches is marked for deletion, but its storage is not yet reclaimed
  3. No Branch Permissions: All branches have the same access level
  4. No Distributed Branching: Branches are local to a single node

Cannot Drop Rules

  • Main Branch: Cannot drop the main (root) branch
  • Parent with Children: Cannot drop a branch that has child branches

// This fails: main cannot be dropped
assert!(engine.drop_branch("main", false).is_err());

// This fails: dev has the child branch 'feature'
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;
assert!(engine.drop_branch("dev", false).is_err());

Troubleshooting

Branch Not Found

match engine.get_branch("unknown") {
    Ok(branch) => println!("Found: {}", branch.name),
    Err(e) => println!("Error: {}", e), // "Branch 'unknown' not found"
}

Cannot Drop Branch

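// Check whether the branch has children: a child is any branch
// whose parent_id points at this branch's ID
let branch = engine.get_branch("dev")?;
let has_children = engine
    .list_branches()?
    .iter()
    .any(|b| b.parent_id == Some(branch.branch_id));

// The second argument (if_exists) only suppresses "not found" errors;
// it does not allow dropping a branch that still has children
engine.drop_branch("dev", true)?; // No error if "dev" does not exist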

Read Performance Issues

If reads are slow on a branch:

  1. Check hierarchy depth: Deep hierarchies cause multiple lookups
  2. Verify parent chain: Each parent adds ~0.1ms overhead
  3. Consider flattening: Recreate branch from main if too deep

Implementation Details

Key Components

  1. BranchManager: Manages branch metadata and lifecycle
  2. BranchTransaction: Branch-aware MVCC transactions
  3. BranchRegistry: Global branch ID registry
  4. Parent Chain Cache: Cached parent relationships for fast lookups
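A parent chain cache can be sketched as a map from branch ID to its full lookup order: the branch, then each ancestor up to the root. Registry, register, remove, and parent_chain are hypothetical names and do not describe HeliosDB Lite's internals. Since a branch with children cannot be dropped, removing a branch only invalidates its own cache entry. The chain length also shows why deep hierarchies slow reads: a key missing from the branch is checked once per ancestor.

```rust
use std::collections::HashMap;

/// Hypothetical registry plus parent chain cache (illustrative only).
#[derive(Default)]
struct Registry {
    parents: HashMap<u64, Option<u64>>,  // branch ID -> parent ID (None = root)
    chain_cache: HashMap<u64, Vec<u64>>, // branch ID -> [self, parent, ..., root]
}

impl Registry {
    fn register(&mut self, id: u64, parent: Option<u64>) {
        // A new leaf branch changes no existing chain, so the cache stays valid.
        self.parents.insert(id, parent);
    }

    fn remove(&mut self, id: u64) {
        // Only childless branches can be dropped, so no other cached chain
        // contains this ID; only its own entry needs invalidating.
        self.parents.remove(&id);
        self.chain_cache.remove(&id);
    }

    /// Lookup order for reads on `id`, computed once and then cached.
    fn parent_chain(&mut self, id: u64) -> &[u64] {
        if !self.chain_cache.contains_key(&id) {
            let mut chain = vec![id];
            let mut current = self.parents.get(&id).copied().flatten();
            while let Some(p) = current {
                chain.push(p);
                current = self.parents.get(&p).copied().flatten();
            }
            self.chain_cache.insert(id, chain);
        }
        &self.chain_cache[&id]
    }
}

fn main() {
    let mut reg = Registry::default();
    reg.register(1, None);    // main
    reg.register(2, Some(1)); // dev
    reg.register(3, Some(2)); // feature
    assert_eq!(reg.parent_chain(3).to_vec(), vec![3, 2, 1]);
    assert_eq!(reg.parent_chain(1).to_vec(), vec![1]);
}
```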

Thread Safety

All branch operations are thread-safe:

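use std::sync::Arc;
use std::thread;
use heliosdb_lite::{Config, storage::{StorageEngine, BranchOptions}};

let config = Config::in_memory();
let engine = Arc::new(StorageEngine::open_in_memory(&config)?);
engine.create_branch("dev", Some("main"), BranchOptions::default())?;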

// Safe concurrent access
let handles: Vec<_> = (0..10).map(|i| {
    let engine = Arc::clone(&engine);
    thread::spawn(move || {
        let mut tx = engine.begin_branch_transaction("dev").unwrap();
        tx.put(format!("key{}", i).into_bytes(), b"value".to_vec()).unwrap();
        tx.commit().unwrap();
    })
}).collect();

for handle in handles {
    handle.join().unwrap();
}

Future Enhancements

Planned Features

  1. Branch Merging: Three-way merge with conflict detection
  2. Garbage Collection: Automatic cleanup of dropped branch data
  3. Branch Snapshots: Create lightweight snapshots within branches
  4. Branch Permissions: Fine-grained access control per branch
  5. Distributed Branching: Cross-region branch replication
  6. Branch Triggers: Execute code on branch events (create, merge, drop)

SQL Integration (Future)

-- Create branch
CREATE DATABASE BRANCH dev FROM main AS OF NOW;

-- Switch to branch
SET branch = dev;

-- Merge branch
MERGE DATABASE BRANCH dev INTO main
WITH (
    conflict_resolution = 'source_wins',
    delete_branch_after = true
);

-- Drop branch
DROP DATABASE BRANCH dev;

-- List branches
SELECT * FROM pg_database_branches();


Conclusion

Branch storage in HeliosDB Lite lets you create and manage database variants with minimal overhead. The copy-on-write design makes branch creation instant regardless of database size, keeps writes on a branch isolated from its parent, and consumes extra storage only for the data a branch actually modifies.

For questions or issues, please refer to the architecture document or open an issue on GitHub.