HeliosDB Lite - Branch Storage User Guide¶
Overview¶
HeliosDB Lite's branch storage provides Git-like database branching with copy-on-write semantics. Create instant branches for development, testing, or staging environments without duplicating data.
Key Features¶
- Instant Branch Creation: Create branches in <10ms regardless of database size
- Copy-on-Write: Modified data is copied only when written, not at branch creation time
- Storage Efficiency: <2% overhead per branch for metadata, shared storage for unchanged data
- MVCC Integration: Branch-aware transactions with snapshot isolation guarantees
- Hierarchical Branches: Support for multi-level branch hierarchies
- Minimal Read Overhead: <5% read performance overhead for current branch
Architecture¶
Branch Hierarchy¶
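Branches form a directed acyclic graph (DAG) rooted at main. Each branch records one parent, so a hierarchy such as main -> dev -> feature is a chain of parent links.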
Copy-on-Write Mechanism¶
- Branch Creation: Only metadata is created, no data is copied
- Reads Before a Write: A key the branch has not modified is resolved through the parent chain
- First Write: The branch stores its own copy of the key on first modification (copy-on-write)
- Subsequent Operations: Reads and writes of that key are served from the branch's own data
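The mechanism above can be illustrated with a toy in-memory model. This is a sketch only, not the HeliosDB Lite implementation: `BranchStore` and its methods are hypothetical names, and MVCC timestamps are left out to keep the lookup logic visible.

```rust
use std::collections::HashMap;

/// Toy model of copy-on-write branches (hypothetical, not the HeliosDB Lite API).
#[derive(Default)]
pub struct BranchStore {
    /// Branch name -> parent name (None for the root branch).
    parents: HashMap<String, Option<String>>,
    /// (branch name, key) -> value written directly on that branch.
    data: HashMap<(String, Vec<u8>), Vec<u8>>,
}

impl BranchStore {
    /// Start with a single root branch called "main".
    pub fn new() -> Self {
        let mut store = Self::default();
        store.parents.insert("main".to_string(), None);
        store
    }

    /// Creating a branch records metadata only; no data is copied.
    pub fn create_branch(&mut self, name: &str, parent: &str) -> Result<(), String> {
        if !self.parents.contains_key(parent) {
            return Err(format!("Branch '{}' not found", parent));
        }
        if self.parents.contains_key(name) {
            return Err(format!("Branch '{}' already exists", name));
        }
        self.parents.insert(name.to_string(), Some(parent.to_string()));
        Ok(())
    }

    /// A write lands in the branch's own key space and never touches the parent.
    pub fn put(&mut self, branch: &str, key: &[u8], value: &[u8]) {
        self.data.insert((branch.to_string(), key.to_vec()), value.to_vec());
    }

    /// A read checks the branch first, then walks up the parent chain.
    pub fn get(&self, branch: &str, key: &[u8]) -> Option<Vec<u8>> {
        let mut current = Some(branch.to_string());
        while let Some(name) = current {
            if let Some(value) = self.data.get(&(name.clone(), key.to_vec())) {
                return Some(value.clone());
            }
            current = self.parents.get(&name).cloned().flatten();
        }
        None
    }
}
```

In this model a branch that has never written a key pays one extra map lookup per ancestor, which is the same reason the guide recommends shallow hierarchies.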
Key Format¶
Physical keys encode branch information:
data:<branch_id>:<user_key>:<timestamp>
Example:
data:0000000002:users:123:0000000100
     │          │         │
     │          │         └─ Timestamp
     │          └─────────── User key (table:row_id)
     └────────────────────── Branch ID (2 = dev)
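As a rough illustration of this layout, the helpers below build and parse such a key. They are hypothetical (`encode_key` and `decode_key` are not HeliosDB Lite functions), and the 10-digit zero padding is an assumption inferred from the example rather than a documented guarantee.

```rust
/// Build a physical key in the `data:<branch_id>:<user_key>:<timestamp>` layout.
/// Assumption: both numbers are zero-padded to 10 digits, as in the example.
pub fn encode_key(branch_id: u64, user_key: &str, timestamp: u64) -> String {
    format!("data:{:010}:{}:{:010}", branch_id, user_key, timestamp)
}

/// Split a physical key back into (branch_id, user_key, timestamp).
/// The user key may itself contain ':' (for example "users:123"), so the
/// branch ID is taken from the front and the timestamp from the back.
pub fn decode_key(physical: &str) -> Option<(u64, String, u64)> {
    let rest = physical.strip_prefix("data:")?;
    let (branch, rest) = rest.split_once(':')?;
    let (user_key, timestamp) = rest.rsplit_once(':')?;
    Some((branch.parse().ok()?, user_key.to_string(), timestamp.parse().ok()?))
}
```

Fixed-width padding keeps keys for the same branch and user key in timestamp order under a plain byte-wise comparison, which is one plausible reason for the format.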
Usage Examples¶
Basic Branch Operations¶
use heliosdb_lite::{Config, storage::{StorageEngine, BranchOptions}};
// Open database
let config = Config::in_memory();
let engine = StorageEngine::open_in_memory(&config)?;
// Create a development branch
let branch_id = engine.create_branch(
"dev", // Branch name
Some("main"), // Parent branch (None = current)
BranchOptions::default(), // Options
)?;
// List all branches
let branches = engine.list_branches()?;
for branch in branches {
println!("{}: {:?}", branch.name, branch.state);
}
// Drop a branch
engine.drop_branch("dev", false)?;
Branch Transactions¶
// Begin transaction on a specific branch
let mut tx = engine.begin_branch_transaction("dev")?;
// Read (checks current branch, then parent chain)
let value = tx.get(&b"users:123".to_vec())?;
// Write (copy-on-write)
tx.put(b"users:123".to_vec(), b"new_data".to_vec())?;
// Commit
tx.commit()?;
Branch Isolation Example¶
// Insert in main branch
engine.put(b"key1", b"main_value")?;
// Create dev branch
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
// Read from dev (sees main's value)
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"main_value".to_vec()));
// Modify in dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"key1".to_vec(), b"dev_value".to_vec())?;
tx.commit()?;
// Main branch is unchanged
assert_eq!(engine.get(b"key1")?, Some(b"main_value".to_vec()));
// Dev branch has new value
let tx = engine.begin_branch_transaction("dev")?;
assert_eq!(tx.get(&b"key1".to_vec())?, Some(b"dev_value".to_vec()));
Hierarchical Branches¶
// Create branch hierarchy: main -> dev -> feature
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;
// Write to main
engine.put(b"config", b"production")?;
// Read from feature branch (traverses: feature -> dev -> main)
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"production".to_vec()));
// Write to dev
let mut tx = engine.begin_branch_transaction("dev")?;
tx.put(b"config".to_vec(), b"development".to_vec())?;
tx.commit()?;
// Feature now sees dev's value
let tx = engine.begin_branch_transaction("feature")?;
assert_eq!(tx.get(&b"config".to_vec())?, Some(b"development".to_vec()));
Branch Options¶
use std::collections::HashMap;
use heliosdb_lite::storage::BranchOptions;
let mut metadata = HashMap::new();
metadata.insert("owner".to_string(), "alice".to_string());
metadata.insert("purpose".to_string(), "feature-dev".to_string());
let options = BranchOptions {
replication_factor: Some(3), // For distributed mode
region: Some("us-west".to_string()),
metadata,
};
engine.create_branch("feature", Some("dev"), options)?;
Branch Metadata¶
Each branch exposes its name, IDs, creation time, state, and usage statistics:
let branch = engine.get_branch("dev")?;
println!("Name: {}", branch.name);
println!("Branch ID: {}", branch.branch_id);
println!("Parent ID: {:?}", branch.parent_id);
println!("Created at: {}", branch.created_at);
println!("State: {:?}", branch.state);
println!("Modified keys: {}", branch.stats.modified_keys);
println!("Storage bytes: {}", branch.stats.storage_bytes);
Performance Characteristics¶
Branch Creation¶
- Latency: <10ms regardless of database size
- Throughput: 1000+ branches/second
- Storage: ~500 bytes of metadata per branch
Read Operations¶
- Current Branch Hit: <0.1ms overhead vs. non-branched read
- Parent Chain Lookup: <0.5ms overhead (proportional to chain depth)
- Throughput: ~95% of non-branched read throughput
Write Operations¶
- Copy-on-Write: <0.2ms overhead for first write to a key
- Subsequent Writes: Same as non-branched write
- Throughput: ~95% of non-branched write throughput
Storage Overhead¶
Example: 1GB database, 5 branches, 10% data modified per branch
Original: 1GB
Branches: 5 × (500 bytes metadata + 100MB data) ≈ 500MB
Total: ≈1.5GB (50% overhead: 5 branches × 10% modified data each; metadata is negligible)
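The same arithmetic can be written as a small estimator. This is a sketch for capacity planning, with hypothetical function names; it assumes branches do not share modified data with each other and treats 1GB as 1,000,000,000 bytes.

```rust
/// Estimated total bytes for a base database plus `branches` branches, each of
/// which has modified `modified_fraction` of the base data.
pub fn estimated_total_bytes(
    base_bytes: u64,
    branches: u64,
    modified_fraction: f64,
    metadata_bytes: u64,
) -> u64 {
    // Data copied by one branch through copy-on-write.
    let copied_per_branch = (base_bytes as f64 * modified_fraction).round() as u64;
    base_bytes + branches * (metadata_bytes + copied_per_branch)
}

/// Storage overhead relative to the base database (0.5 means 50%).
pub fn overhead_ratio(base_bytes: u64, total_bytes: u64) -> f64 {
    (total_bytes - base_bytes) as f64 / base_bytes as f64
}
```

For the example above, `estimated_total_bytes(1_000_000_000, 5, 0.10, 500)` gives 1,500,002,500 bytes: the 2,500 bytes of metadata are lost in rounding, and the copied data accounts for essentially all of the overhead.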
Best Practices¶
1. Branch Naming¶
Use descriptive names that reflect a branch's place in the hierarchy, such as feature-auth for a feature branch created from dev.
2. Branch Lifecycle¶
// Create for specific purpose
let branch_id = engine.create_branch("feature-auth", Some("dev"), options)?;
// Do work...
let mut tx = engine.begin_branch_transaction("feature-auth")?;
// ... perform operations
tx.commit()?;
// Clean up when done
engine.drop_branch("feature-auth", false)?;
3. Avoid Deep Hierarchies¶
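Keep branch hierarchies shallow (≤5 levels) for optimal read performance. A read that misses the current branch falls back to each parent in turn, so lookup cost grows with chain depth.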
4. Regular Cleanup¶
Drop finished or abandoned branches (merging is not yet supported, so a branch's changes are not carried into its parent):
// Get all branches
let branches = engine.list_branches()?;
// Drop inactive branches; should_cleanup is an application-defined
// predicate (not shown)
for branch in branches {
    if should_cleanup(&branch) {
        engine.drop_branch(&branch.name, false)?;
    }
}
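The `should_cleanup` predicate is left to the application. One possible policy is sketched below under stated assumptions: `BranchInfo` is a stand-in for the real branch metadata type, `created_at` is assumed to be a Unix timestamp in seconds, and the extra `now` and `max_age_secs` parameters are additions for illustration.

```rust
/// Stand-in for the metadata returned by `list_branches` (field types are assumptions).
pub struct BranchInfo {
    pub name: String,
    /// Assumed to be a Unix timestamp in seconds.
    pub created_at: u64,
}

/// Hypothetical policy: never touch long-lived branches, and drop any other
/// branch that is older than `max_age_secs`.
pub fn should_cleanup(branch: &BranchInfo, now: u64, max_age_secs: u64) -> bool {
    const PROTECTED: [&str; 2] = ["main", "dev"];
    if PROTECTED.contains(&branch.name.as_str()) {
        return false;
    }
    // saturating_sub avoids underflow if a timestamp lies in the future.
    now.saturating_sub(branch.created_at) > max_age_secs
}
```

An age-based rule is only one option; a real policy might also consult `branch.state` or `branch.stats.modified_keys`, and must still drop child branches before their parents.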
Limitations¶
Current Limitations¶
- No Merge Support: Merge functionality is not yet implemented
- No Garbage Collection: Dropped branch data is marked but not yet cleaned up
- No Branch Permissions: All branches have the same access level
- No Distributed Branching: Branches are local to a single node
Drop Restrictions¶
- Main Branch: Cannot drop the main (root) branch
- Parent with Children: Cannot drop a branch that has child branches
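// Fails: main (the root branch) cannot be dropped
assert!(engine.drop_branch("main", false).is_err());
// Fails: dev still has the child branch 'feature'
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
engine.create_branch("feature", Some("dev"), BranchOptions::default())?;
assert!(engine.drop_branch("dev", false).is_err());
// Works: drop the child first, then the parent
engine.drop_branch("feature", false)?;
engine.drop_branch("dev", false)?;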
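The two rules can be expressed as a check over a child-to-parent map. The sketch below is a standalone model, not HeliosDB Lite code: `DropError` and `check_drop` are hypothetical names, and the real engine's error type may differ.

```rust
use std::collections::HashMap;

/// Reasons a drop is rejected in this model (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum DropError {
    NotFound,
    IsRoot,
    HasChildren,
}

/// Decide whether `name` may be dropped, given a branch -> parent map in which
/// the root branch maps to None.
pub fn check_drop(parents: &HashMap<&str, Option<&str>>, name: &str) -> Result<(), DropError> {
    let parent = parents.get(name).ok_or(DropError::NotFound)?;
    if parent.is_none() {
        return Err(DropError::IsRoot);
    }
    // Any branch that names `name` as its parent is a child.
    if parents.values().any(|p| *p == Some(name)) {
        return Err(DropError::HasChildren);
    }
    Ok(())
}
```

With main -> dev -> feature, this rejects main as the root and dev as a parent, and accepts feature, which matches the order needed to tear a hierarchy down leaf first.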
Troubleshooting¶
Branch Not Found¶
match engine.get_branch("unknown") {
Ok(branch) => println!("Found: {}", branch.name),
Err(e) => println!("Error: {}", e), // "Branch 'unknown' not found"
}
Cannot Drop Branch¶
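// A drop fails while the branch still has children. No helper for listing
// children is shown here; compare parent_id values in the branch metadata.
let branch = engine.get_branch("dev")?;
println!("dev has ID {}", branch.branch_id);
// The if_exists flag (second argument) only suppresses the error for a
// branch that does not exist; it does not bypass the drop rules.
engine.drop_branch("dev", true)?;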
Read Performance Issues¶
If reads are slow on a branch:
- Check hierarchy depth: Deep hierarchies cause multiple lookups
- Verify parent chain: Each parent adds ~0.1ms overhead
- Consider flattening: Recreate branch from main if too deep
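To check the first two points, chain depth can be computed from parent links. The sketch below works on a plain branch-to-parent map that an application might build from branch metadata; `chain_depth` and `estimated_chain_overhead_ms` are hypothetical helpers, and the 0.1ms figure is this guide's approximate per-parent cost rather than a measured value.

```rust
use std::collections::HashMap;

/// Number of parents between `branch` and the root, given a branch -> parent
/// map in which the root maps to None. Returns None for an unknown branch, a
/// dangling parent, or a chain longer than the map (which indicates a cycle).
pub fn chain_depth(parents: &HashMap<&str, Option<&str>>, branch: &str) -> Option<usize> {
    let mut depth = 0;
    let mut current = branch;
    loop {
        match parents.get(current)? {
            None => return Some(depth),
            Some(parent) => {
                depth += 1;
                if depth > parents.len() {
                    return None;
                }
                current = *parent;
            }
        }
    }
}

/// Rough worst-case read overhead for a key found only at the root,
/// using the guide's figure of about 0.1ms per parent.
pub fn estimated_chain_overhead_ms(depth: usize) -> f64 {
    depth as f64 * 0.1
}
```

A branch whose depth exceeds the recommended limit of 5 is a candidate for flattening.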
Implementation Details¶
Key Components¶
- BranchManager: Manages branch metadata and lifecycle
- BranchTransaction: Branch-aware MVCC transactions
- BranchRegistry: Global branch ID registry
- Parent Chain Cache: Cached parent relationships for fast lookups
Thread Safety¶
All branch operations are thread-safe:
use std::sync::Arc;
use std::thread;
let engine = Arc::new(StorageEngine::open_in_memory(&config)?);
// The branch must exist before any thread opens a transaction on it
engine.create_branch("dev", Some("main"), BranchOptions::default())?;
// Safe concurrent access: each thread writes a distinct key
let handles: Vec<_> = (0..10).map(|i| {
    let engine = Arc::clone(&engine);
    thread::spawn(move || {
        let mut tx = engine.begin_branch_transaction("dev").unwrap();
        tx.put(format!("key{}", i).into_bytes(), b"value".to_vec()).unwrap();
        tx.commit().unwrap();
    })
}).collect();
// Wait for every writer to finish
for handle in handles {
    handle.join().unwrap();
}
Future Enhancements¶
Planned Features¶
- Branch Merging: Three-way merge with conflict detection
- Garbage Collection: Automatic cleanup of dropped branch data
- Branch Snapshots: Create lightweight snapshots within branches
- Branch Permissions: Fine-grained access control per branch
- Distributed Branching: Cross-region branch replication
- Branch Triggers: Execute code on branch events (create, merge, drop)
SQL Integration (Future)¶
-- Create branch
CREATE DATABASE BRANCH dev FROM main AS OF NOW;
-- Switch to branch
SET branch = dev;
-- Merge branch
MERGE DATABASE BRANCH dev INTO main
WITH (
conflict_resolution = 'source_wins',
delete_branch_after = true
);
-- Drop branch
DROP DATABASE BRANCH dev;
-- List branches
SELECT * FROM pg_database_branches();
Conclusion¶
Branch storage in HeliosDB Lite uses copy-on-write to create isolated database branches in under 10ms, with roughly 5% read and write throughput overhead on the current branch. Merging and garbage collection are not yet implemented.
For questions or issues, please refer to the architecture document or open an issue on GitHub.