Branch Storage Architecture - Part 1 of 3¶
Architecture Design¶
Navigation: Index | Part 1 | Part 2 →
HeliosDB Lite - Branch Storage Architecture¶
Copy-on-Write Branch Storage Backend Design¶
Version: 2.2 Created: November 18, 2025 Status: Architecture Design Document Target: HeliosDB Lite v0.2.0 Scope: Database branching storage backend with efficient copy-on-write
Executive Summary¶
This document defines the storage architecture for HeliosDB Lite's database branching feature, implementing efficient copy-on-write (CoW) semantics with minimal storage overhead. The design leverages RocksDB's MVCC capabilities and introduces a hierarchical branch metadata system that enables instant branch creation, efficient storage sharing, and conflict-aware merging.
Key Design Principles: 1. Zero-Copy Branch Creation - Branches are created instantly via metadata only 2. Shared Storage - Unchanged data is shared between branches (CoW) 3. MVCC Integration - Branches integrate seamlessly with existing snapshot isolation 4. Efficient Garbage Collection - Automatic cleanup of unreferenced versions 5. Conflict Detection - Three-way merge with configurable resolution strategies
Performance Targets: - Branch Creation: <10ms regardless of database size - Branch Read Overhead: <5% vs. non-branched reads - Storage Overhead: <2% per branch for metadata - Merge Performance: Linear in changed data (not total data)
Table of Contents¶
- Architecture Overview
- Branch Metadata Schema
- Copy-on-Write Mechanism
- Key Space Design
- Data Structures
- API Design
- Branch Isolation
- Merge Strategies
- Garbage Collection
- MVCC Integration
- Performance Considerations
- Testing Strategy
1. Architecture Overview¶
1.1 High-Level Design¶
┌─────────────────────────────────────────────────────────────────┐
│ Branch Storage Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Branch Manager │ │
│ │ • Branch metadata CRUD │ │
│ │ • Branch lineage tracking │ │
│ │ • Merge coordination │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼───────────────────────────────────┐ │
│ │ Copy-on-Write Layer │ │
│ │ • Versioned key resolution │ │
│ │ • Branch-aware reads/writes │ │
│ │ • Version visibility rules │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼───────────────────────────────────┐ │
│ │ MVCC Storage Layer │ │
│ │ • Snapshot isolation (existing) │ │
│ │ • Multi-version tuples │ │
│ │ • Timestamp-based visibility │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼───────────────────────────────────┐ │
│ │ RocksDB Backend │ │
│ │ • Key-value storage │ │
│ │ • SST files & memtables │ │
│ │ • Compaction & compression │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
1.2 Branch Hierarchy Model¶
Branches form a directed acyclic graph (DAG) where:
- Root Branch (main): The default branch, created automatically
- Child Branches: Created from a parent branch at a specific point in time
- Orphan Branches: Branches whose parent has been dropped (still valid)
Example Branch DAG:
main (root)
│
├─────┐
│ │
dev staging@T100
│ │
│ └─── feature-x@T150
│
└─── hotfix@T200
1.3 Copy-on-Write Principles¶
- Branch Creation: Copy metadata only, no data duplication
- First Write: Data is copied on first modification in a branch
- Read Priority: Reads check branch-specific versions first, then parent
- Storage Sharing: Unchanged data remains shared across branches
- Isolation: Each branch maintains independent version history
2. Branch Metadata Schema¶
2.1 Branch Metadata Structure¶
Each branch has comprehensive metadata stored in RocksDB:
Key: branch:meta:<branch_name>
Value: BranchMetadata (bincode serialized)
BranchMetadata {
name: String,
branch_id: BranchId, // Unique identifier
parent_id: Option<BranchId>, // None for root branch
created_at: Timestamp, // Creation timestamp
created_from_snapshot: SnapshotId, // Parent snapshot at branch point
state: BranchState, // Active, Dropped, Merged
merge_base: Option<SnapshotId>, // Last merge point (for merge tracking)
metadata: BranchMetadataExtras, // Options, stats, etc.
}
2.2 Branch State Transitions¶
┌─────────┐
│ Create │
└────┬────┘
│
▼
┌──────────┐ Merge ┌────────┐
│ Active ├─────────────────▶│ Merged │
└─────┬────┘ └────────┘
│
│ Drop
▼
┌─────────┐
│ Dropped │
└─────────┘
States:
- Active: Branch is active, can be read/written
- Merged: Branch has been merged into target (read-only)
- Dropped: Branch has been dropped (GC eligible)
2.3 Branch Registry¶
Global branch registry for efficient lookup:
Key: branch:registry
Value: BranchRegistry (bincode serialized)
BranchRegistry {
branches: HashMap<BranchId, BranchName>,
main_branch: BranchId,
next_branch_id: u64,
}
2.4 Branch Lineage Index¶
Parent-child relationship index for GC and merge:
Key: branch:children:<parent_branch_id>
Value: Vec<BranchId> (bincode serialized)
Enables:
- Finding all children of a branch
- Preventing parent drop while children exist
- Efficient lineage traversal
3. Copy-on-Write Mechanism¶
3.1 Key Versioning Strategy¶
Each data key is versioned with branch and timestamp information:
Physical Key Format:
┌──────────┬─────────┬────────────┬───────────┬───────────┐
│ Prefix │ BranchID│ UserKey │ Timestamp │ Version │
│ (data:) │ (u64) │ (variable) │ (u64) │ (u32) │
└──────────┴─────────┴────────────┴───────────┴───────────┘
Example:
data:0000000001:users:123:0000000100:00000001
│ │ │ │
│ │ │ └─ Version counter
│ │ └─────────── Timestamp
│ └───────────────────── User key (table:row_id)
└─────────────────────────────── Branch ID (1 = main)
Benefits:
- Efficient range scans per branch
- Natural clustering of branch data
- Simple visibility checks
3.2 Write Path - Copy-on-Write¶
fn write_to_branch(
branch_id: BranchId,
user_key: &str,
value: &[u8],
timestamp: Timestamp,
) -> Result<()> {
// 1. Check if key exists in current branch
let existing = get_latest_version_in_branch(branch_id, user_key)?;
if existing.is_some() {
// Key already modified in this branch - overwrite
let physical_key = encode_key(branch_id, user_key, timestamp);
db.put(physical_key, value)?;
} else {
// 2. Check parent branches for existing version
let parent_version = get_from_parent_chain(branch_id, user_key)?;
// 3. Copy-on-write: Create new version in current branch
let physical_key = encode_key(branch_id, user_key, timestamp);
db.put(physical_key, value)?;
// Parent version remains unchanged (shared storage)
}
Ok(())
}
3.3 Read Path - Branch Chain Resolution¶
fn read_from_branch(
branch_id: BranchId,
user_key: &str,
snapshot: SnapshotId,
) -> Result<Option<Value>> {
// 1. Try current branch first
if let Some(value) = get_visible_version_in_branch(
branch_id,
user_key,
snapshot
)? {
return Ok(Some(value));
}
// 2. Walk up parent chain
let mut current_branch = branch_id;
while let Some(parent) = get_parent_branch(current_branch)? {
// Get parent snapshot at branch point
let branch_meta = get_branch_metadata(current_branch)?;
let parent_snapshot = branch_meta.created_from_snapshot;
// Read from parent at its snapshot
if let Some(value) = get_visible_version_in_branch(
parent.branch_id,
user_key,
parent_snapshot
)? {
return Ok(Some(value));
}
current_branch = parent.branch_id;
}
// 3. Not found in any branch
Ok(None)
}
3.4 Storage Deduplication¶
Unchanged data is automatically shared:
Before Branch:
main: key1 -> v1, key2 -> v2, key3 -> v3
After CREATE BRANCH dev FROM main:
main: key1 -> v1, key2 -> v2, key3 -> v3
dev: (empty - all reads delegate to main)
After UPDATE key2 in dev:
main: key1 -> v1, key2 -> v2, key3 -> v3
dev: key2 -> v2' (only modified data stored)
key1, key3 still read from main
Storage Overhead: 1 value + metadata (~100 bytes)
4. Key Space Design¶
4.1 Key Prefixes¶
All keys use prefixed namespaces for organization:
// Branch metadata
const BRANCH_META_PREFIX: &[u8] = b"branch:meta:";
const BRANCH_REGISTRY: &[u8] = b"branch:registry";
const BRANCH_CHILDREN_PREFIX: &[u8] = b"branch:children:";
// Branch data (versioned)
const BRANCH_DATA_PREFIX: &[u8] = b"data:";
// Branch catalog (table schemas per branch)
const BRANCH_CATALOG_PREFIX: &[u8] = b"branch:catalog:";
// Branch counters (row IDs per branch)
const BRANCH_COUNTER_PREFIX: &[u8] = b"branch:counter:";
// GC tracking
const GC_REFS_PREFIX: &[u8] = b"gc:refs:";
const GC_PENDING_PREFIX: &[u8] = b"gc:pending:";
4.2 Key Encoding Functions¶
/// Encode a branch data key
fn encode_branch_data_key(
branch_id: BranchId,
user_key: &str,
timestamp: Timestamp,
) -> Vec<u8> {
let mut key = Vec::new();
key.extend_from_slice(BRANCH_DATA_PREFIX);
key.extend_from_slice(&branch_id.to_be_bytes());
key.push(b':');
key.extend_from_slice(user_key.as_bytes());
key.push(b':');
key.extend_from_slice(×tamp.to_be_bytes());
key
}
/// Decode a branch data key
fn decode_branch_data_key(key: &[u8]) -> (BranchId, String, Timestamp) {
// Parse components from key
// Returns (branch_id, user_key, timestamp)
}
/// Encode a branch metadata key
fn encode_branch_meta_key(branch_name: &str) -> Vec<u8> {
let mut key = Vec::new();
key.extend_from_slice(BRANCH_META_PREFIX);
key.extend_from_slice(branch_name.as_bytes());
key
}
4.3 Key Ordering Properties¶
RocksDB's lexicographic key ordering provides natural clustering:
Sorted Key Space:
┌─────────────────────────────────────────────────────────┐
│ branch:children:1 -> [2, 3] │
│ branch:children:2 -> [4] │
│ branch:meta:dev -> BranchMetadata { ... } │
│ branch:meta:main -> BranchMetadata { ... } │
│ branch:registry -> BranchRegistry { ... } │
│ data:0000000001:table1:row1:0000000100 -> value │ ← main branch
│ data:0000000001:table1:row2:0000000105 -> value │
│ data:0000000002:table1:row1:0000000120 -> value' │ ← dev branch
│ data:0000000002:table1:row3:0000000125 -> value │
└─────────────────────────────────────────────────────────┘
Benefits:
- Sequential scans within a branch
- Efficient prefix searches
- Natural compaction grouping
5. Data Structures¶
5.1 Core Rust Structures¶
/// Branch identifier (globally unique)
pub type BranchId = u64;
/// Branch metadata
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BranchMetadata {
/// Branch name (user-visible)
pub name: String,
/// Unique branch ID
pub branch_id: BranchId,
/// Parent branch ID (None for root)
pub parent_id: Option<BranchId>,
/// Creation timestamp
pub created_at: Timestamp,
/// Snapshot at branch point
pub created_from_snapshot: SnapshotId,
/// Current state
pub state: BranchState,
/// Last merge base (for three-way merge)
pub merge_base: Option<SnapshotId>,
/// Additional metadata
pub options: BranchOptions,
/// Statistics
pub stats: BranchStats,
}
/// Branch state
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum BranchState {
/// Branch is active
Active,
/// Branch has been merged
Merged {
/// Target branch
into_branch: BranchId,
/// Merge timestamp
at_timestamp: Timestamp,
},
/// Branch has been dropped
Dropped {
/// Drop timestamp
at_timestamp: Timestamp,
},
}
/// Branch creation/merge options
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct BranchOptions {
/// Replication factor (for distributed mode)
pub replication_factor: Option<usize>,
/// Region hint (for distributed mode)
pub region: Option<String>,
/// Custom metadata
pub metadata: HashMap<String, String>,
}
/// Branch statistics
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct BranchStats {
/// Number of modified keys in this branch
pub modified_keys: u64,
/// Approximate storage size (bytes)
pub storage_bytes: u64,
/// Number of commits in this branch
pub commit_count: u64,
/// Last activity timestamp
pub last_modified: Timestamp,
}
/// Branch registry
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BranchRegistry {
/// Map of branch ID -> branch name
pub branches: HashMap<BranchId, String>,
/// Main branch ID
pub main_branch: BranchId,
/// Next available branch ID
pub next_branch_id: u64,
}
impl BranchRegistry {
/// Get next branch ID (auto-increment)
pub fn next_id(&mut self) -> BranchId {
let id = self.next_branch_id;
self.next_branch_id += 1;
id
}
}
5.2 Branch Manager Structure¶
/// Branch manager - coordinates all branch operations
pub struct BranchManager {
/// Storage engine reference
storage: Arc<StorageEngine>,
/// Branch registry (cached)
registry: Arc<RwLock<BranchRegistry>>,
/// Branch metadata cache
metadata_cache: Arc<RwLock<HashMap<BranchId, BranchMetadata>>>,
}
impl BranchManager {
/// Create new branch manager
pub fn new(storage: Arc<StorageEngine>) -> Result<Self> {
let registry = Self::load_or_create_registry(&storage)?;
Ok(Self {
storage,
registry: Arc::new(RwLock::new(registry)),
metadata_cache: Arc::new(RwLock::new(HashMap::new())),
})
}
/// Create a new branch
pub fn create_branch(
&self,
name: &str,
parent: Option<&str>,
as_of: AsOfClause,
options: BranchOptions,
) -> Result<BranchId>;
/// Drop a branch
pub fn drop_branch(&self, name: &str, if_exists: bool) -> Result<()>;
/// Merge branches
pub fn merge_branch(
&self,
source: &str,
target: &str,
options: MergeOptions,
) -> Result<MergeResult>;
/// Get branch metadata
pub fn get_branch(&self, name: &str) -> Result<BranchMetadata>;
/// List all branches
pub fn list_branches(&self) -> Result<Vec<BranchMetadata>>;
/// Get current branch (from context)
pub fn current_branch(&self) -> Result<BranchId>;
}
5.3 Branch Transaction Structure¶
/// Branch-aware transaction
pub struct BranchTransaction {
/// Underlying MVCC transaction
tx: Transaction,
/// Branch ID
branch_id: BranchId,
/// Branch metadata (cached)
branch_meta: BranchMetadata,
/// Parent chain (cached for reads)
parent_chain: Vec<(BranchId, SnapshotId)>,
}
impl BranchTransaction {
/// Read from branch with parent fallback
pub fn get(&self, key: &Key) -> Result<Option<Vec<u8>>>;
/// Write to branch (copy-on-write)
pub fn put(&mut self, key: Key, value: Vec<u8>) -> Result<()>;
/// Delete from branch
pub fn delete(&mut self, key: Key) -> Result<()>;
/// Commit transaction
pub fn commit(self) -> Result<()>;
/// Rollback transaction
pub fn rollback(self) -> Result<()>;
}
6. API Design¶
6.1 Branch Management API¶
```rust
/// Branch creation
pub fn create_branch(
storage: &StorageEngine,
branch_name: &str,
parent: Option<&str>,
as_of: AsOfClause,
options: BranchOptions,
) -> Result
// 1. Validate branch name
if branch_mgr.get_branch(branch_name).is_ok() {
return Err(Error::branch_exists(branch_name));
}
// 2. Resolve parent branch
let parent_id = if let Some(parent_name) = parent {
Some(branch_mgr.get_branch(parent_name)?.branch_id)
} else {
// None = CURRENT branch from context
Some(branch_mgr.current_branch()?)
};
// 3. Resolve AS OF snapshot
let snapshot_id = resolve_as_of_clause(&as_of, storage)?;
// 4. Allocate branch ID
let mut registry = branch_mgr.registry.write();
let branch_id = registry.next_id();
// 5. Create metadata
let metadata = BranchMetadata {
name: branch_name.to_string(),
branch_id,
parent_id,
created_at: storage.current_timestamp(),
created_from_snapshot: snapshot_id,
state: BranchState::Active,
merge_base: None,
options,
stats: BranchStats::default(),
};
// 6. Persist metadata
let key = encode_branch_meta_key(branch_name);
let value = bincode::serialize(&metadata)?;
storage.put(&key, &value)?;
// 7. Update registry
registry.branches.insert(branch_id, branch_name.to_string());
let registry_value = bincode::serialize(&*registry)?;
storage.put(BRANCH_REGISTRY, ®istry_value)?;
// 8. Update parent's children list
if let Some(parent_id) = parent_id {
add_child_branch(storage, parent_id, branch_id)?;
}
Ok(branch_id)
}
/// Branch deletion pub fn drop_branch( storage: &StorageEngine, branch_name: &str, if_exists: bool, ) -> Result<()> { let branch_mgr = BranchManager::new(Arc::clone(&storage.inner))?;
// 1. Get branch metadata
let metadata = match branch_mgr.get_branch(branch_name) {
Ok(meta) => meta,
Err(_) if if_exists => return Ok(()),
Err(e) => return Err(e),
};
// 2. Prevent dropping main branch
if metadata.branch_id == branch_mgr.registry.read().main_branch {
return Err(Error::cannot_drop_main_branch());
}
// 3. Check for child branches
let children = get_child_branches(storage, metadata.branch_id)?;
if !children.is_empty() {
return Err(Error::branch_has_children(
branch_name,
children.len(),
));
}
// 4. Mark as dropped (soft delete)
let mut updated_meta = metadata.clone();
updated_meta.state = BranchState::Dropped {
at_timestamp: storage.current_timestamp(),
};
let key = encode_branch_meta_key(branch_name);
let value = bincode::serialize(&updated_meta)?;
storage.put(&key, &value)?;
// 5. Remove from registry
let mut registry = branch_mgr.registry.write();
registry.branches.remove(&metadata.branch_id);
let registry_value = bincode::serialize(&*registry)?;
storage.put(BRANCH_REGISTRY, ®istry_value)?;
// 6. Schedule GC for branch data
schedule_branch_gc(storage, metadata.branch_id)?;
Ok(())
}
/// Branch merging
pub fn merge_branch(
storage: &StorageEngine,
source_name: &str,
target_name: &str,
options: MergeOptions,
) -> Result
// 1. Get source and target metadata
let source = branch_mgr.get_branch(source_name)?;
let target = branch_mgr.get_branch(target_name)?;